You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have just spent the last day tracking down a bug in my code that turned out to be the result of the collect environment mistakenly being used outside of the collect actor's run method.
Because the collect env was being stepped outside of actor.run, the actual environment state became out of sync with self._time_step in the Actor class. I was only able to detect this problem because my environment has illegal actions dependent on the state, and once every so often my runs would crash when an agent tried to take an illegal action due to receiving the incorrect observation. In cases where there are no illegal actions, such an error would be even harder to spot and the first time step of the actor.run would always be incorrect.
I see two options to prevent people from encountering the same problem in the future:
At the start of the actor.run step synchronise the internal variable tracking the time step with self._time_step = self._env.current_time_step().
If we want to encourage or force users to not use the collect environment outside of actor.run, the method could check if self._time_step equals self._env.current_time_step() and raise a warning or exception if they do not match.
The argument against option 1. is that doesn't account for the policy state, which afaik cannot be recovered if the environment was stepped elsewhere. In this case, I think that both options could be combined into something along the lines of this:
if not self._policy_state:
# silently synchronise actor with env
self._time_step = self._env.current_time_step()
elif self._time_step != self._env.current_time_step():
# alert user that they are making a mistake
raise Exception("Environment was stepped outside of actor run. Actor tracking variables are out of sync with the environment.")
I'm happy to put together a PR if this seems reasonable
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
-
agents/tf_agents/train/actor.py
Line 152 in 5f7d836
I have just spent the last day tracking down a bug in my code that turned out to be the result of the collect environment mistakenly being used outside of the collect actor's run method.
Because the collect env was being stepped outside of
actor.run
, the actual environment state became out of sync withself._time_step
in theActor
class. I was only able to detect this problem because my environment has illegal actions dependent on the state, and once every so often my runs would crash when an agent tried to take an illegal action due to receiving the incorrect observation. In cases where there are no illegal actions, such an error would be even harder to spot and the first time step of theactor.run
would always be incorrect.I see two options to prevent people from encountering the same problem in the future:
actor.run
step synchronise the internal variable tracking the time step withself._time_step = self._env.current_time_step()
.actor.run
, the method could check ifself._time_step
equalsself._env.current_time_step()
and raise a warning or exception if they do not match.The argument against option 1. is that doesn't account for the policy state, which afaik cannot be recovered if the environment was stepped elsewhere. In this case, I think that both options could be combined into something along the lines of this:
I'm happy to put together a PR if this seems reasonable
Beta Was this translation helpful? Give feedback.
All reactions