Change Proposal to Actor Run Time Step Handling #780

DylanCope · 2022-10-05T23:00:39Z


Line 152 in 5f7d836

self._time_step, self._policy_state = self._driver.run(

I have just spent the last day tracking down a bug in my code that turned out to be the result of the collect environment mistakenly being used outside of the collect actor's run method.

Because the collect env was being stepped outside of actor.run, the actual environment state became out of sync with self._time_step in the Actor class. I was only able to detect this problem because my environment has illegal actions dependent on the state, and once every so often my runs would crash when an agent tried to take an illegal action due to receiving the incorrect observation. In cases where there are no illegal actions, such an error would be even harder to spot and the first time step of the actor.run would always be incorrect.
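To make the failure mode concrete, here is a minimal sketch using toy stand-ins rather than the real TF-Agents classes. `CountingEnv` and `CachingActor` are hypothetical names invented for this illustration; the only thing they share with the real Actor is the pattern of caching the last time step between calls to run.

```python
class CountingEnv:
    """Toy stand-in for a collect environment (hypothetical, not TF-Agents).

    The "time step" is just an integer counter so that staleness is easy to see.
    """

    def __init__(self):
        self._count = 0

    def current_time_step(self):
        return self._count

    def step(self, action):
        self._count += 1
        return self._count


class CachingActor:
    """Toy stand-in for the Actor: it caches the last time step it saw."""

    def __init__(self, env):
        self._env = env
        self._time_step = env.current_time_step()

    def run(self):
        # The action is chosen from the cached time step, not the live one.
        acted_on = self._time_step
        self._time_step = self._env.step(action=0)
        return acted_on


env = CountingEnv()
actor = CachingActor(env)

actor.run()                     # actor and env agree here
env.step(action=0)              # env stepped outside of actor.run
live = env.current_time_step()  # what the env actually reports now
stale = actor.run()             # what the actor acted on

print(f"actor acted on {stale}, env was at {live}")
```

In this toy setup the actor acts on an observation that is one step behind the environment, which is the same mismatch that led to the illegal actions described above.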

I see two options to prevent people from encountering the same problem in the future:

1. At the start of the actor.run step, synchronise the internal variable tracking the time step with self._time_step = self._env.current_time_step().
2. If we want to encourage or force users to not use the collect environment outside of actor.run, the method could check if self._time_step equals self._env.current_time_step() and raise a warning or exception if they do not match.

The argument against option 1 is that it doesn't account for the policy state, which afaik cannot be recovered if the environment was stepped elsewhere. In this case, I think that both options could be combined into something along the lines of this:

if not self._policy_state:
    # silently synchronise actor with env
    self._time_step = self._env.current_time_step()
elif self._time_step != self._env.current_time_step():
    # alert user that they are making a mistake
    raise Exception("Environment was stepped outside of actor run. Actor tracking variables are out of sync with the environment.")
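One caveat with the snippet above: a time step is a tuple-like structure, and comparing tuples with `!=` may raise a `ValueError` when the fields are multi-element NumPy arrays, because the truth value of an array comparison is ambiguous. The sketch below pulls the check into a standalone function with a pluggable per-field comparison. It is only an illustration under stated assumptions: `TimeStep` here is a hypothetical namedtuple stand-in, `time_steps_match` and `sync_or_raise` are made-up helper names, and the fields are assumed to be flat (not nested) values.

```python
import collections
import operator

# Hypothetical stand-in for a time step; the field names are an assumption.
TimeStep = collections.namedtuple(
    "TimeStep", ["step_type", "reward", "discount", "observation"]
)


def time_steps_match(a, b, field_equal=operator.eq):
    """Return True if every pair of corresponding fields compares equal.

    For array-valued fields, pass a function that returns a single bool,
    for example numpy.array_equal.
    """
    return all(field_equal(x, y) for x, y in zip(a, b))


def sync_or_raise(cached_time_step, env_time_step, policy_state):
    """Return the time step the actor should use for its next step."""
    if not policy_state:
        # No policy state to lose: silently synchronise with the env.
        return env_time_step
    if not time_steps_match(cached_time_step, env_time_step):
        # Alert the user that the env was stepped elsewhere.
        raise RuntimeError(
            "Environment was stepped outside of actor run. Actor tracking "
            "variables are out of sync with the environment."
        )
    return cached_time_step


cached = TimeStep(step_type=1, reward=0.0, discount=1.0, observation=(0, 1))
live = cached._replace(observation=(1, 1))

# Stateless policy: the env's time step wins.
synced = sync_or_raise(cached, live, policy_state=())

# Stateful policy: the mismatch is reported instead of hidden.
try:
    sync_or_raise(cached, live, policy_state=("some recurrent state",))
    raised = False
except RuntimeError:
    raised = True
```

Note that `not policy_state` treats an empty tuple as "no state"; a policy state that is itself a bare array would need a different emptiness test.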

I'm happy to put together a PR if this seems reasonable.
