Description
Proposal
I would like to propose an ActionRepeat wrapper that would allow the wrapped environment to repeat step() a specified number of times.
Motivation
I am working on implementing models like PlaNet and Dreamer, mostly with MuJoCo environments. These implementations almost always include a term like action_repeat. I think the proposed wrapper would simplify this kind of implementation.
Pitch
Assuming that the overridden step() is called at time step t, it would return the following:
observation[t + n_repeat], sum(reward[t: t + n_repeat + 1]), terminated, truncated, info[t + n_repeat]
That means the overridden step() would call the parent step() at least once, and it would assert that n_repeat is non-negative (>= 0).
If termination or truncation is reached within the action repetition, the loop would exit early, and the reward would be summed up to that point, while the observation and info from that time step would be returned.
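A minimal sketch of how this could behave, to make the pitch concrete. It assumes that n_repeat counts extra repetitions (so the inner step() runs up to n_repeat + 1 times), which is one reading of the slice and the >= 0 check above. The ActionRepeat class here is a plain duck-typed stand-in rather than a gymnasium.Wrapper subclass, and CountingEnv is a hypothetical toy environment used only so the example runs without any dependencies.

```python
class ActionRepeat:
    """Sketch of the proposed wrapper (hypothetical, not an existing class)."""

    def __init__(self, env, n_repeat):
        # Assumption: n_repeat is the number of *extra* repetitions,
        # so 0 means "behave like the unwrapped environment".
        assert n_repeat >= 0, "n_repeat must be non-negative"
        self.env = env
        self.n_repeat = n_repeat

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        total_reward = 0.0
        # The inner step() always runs at least once.
        for _ in range(self.n_repeat + 1):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            # Exit early and return the last observation and info seen.
            if terminated or truncated:
                break
        return obs, total_reward, terminated, truncated, info


class CountingEnv:
    """Toy stand-in: reward 1.0 per step, terminates after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        return self.t, 1.0, terminated, False, {"t": self.t}


env = ActionRepeat(CountingEnv(horizon=5), n_repeat=2)
env.reset()
first = env.step(0)   # three inner steps: t = 1, 2, 3
second = env.step(0)  # terminates at t = 5, after only two inner steps
print(first)
print(second)
```

In the second call the loop stops as soon as the toy environment terminates, so the summed reward covers two inner steps instead of three.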
Alternatives
No response
Additional context
No response
Checklist
- I have checked that there is no similar issue in the repo