### Describe the bug
The doc on autoresetting states this:

> **Disabled Mode**
>
> No automatic resetting occurs and users need to manually reset the sub-environment through a mask, `env.reset(mask=np.array([True, False, ...], dtype=bool))`. The easier way of generating this mask is `np.logical_or(terminations, truncations)`. This makes training code closer to single vector training code, however, can be slower is some cases due to running another function.
```python
import gymnasium as gym
import numpy as np
from collections import deque

# Initialize environment, buffer and episode_start
envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(2)],
    autoreset_mode=gym.vector.AutoresetMode.DISABLED
)
replay_buffer = deque(maxlen=100)
observations, _ = envs.reset()

while True:  # Training loop
    actions = policy(observations)
    next_observations, rewards, terminations, truncations, infos = envs.step(actions)

    # Add to replay buffer
    for i in range(envs.num_envs):
        replay_buffer.append((observations[i], actions[i], rewards[i], terminations[i], next_observations[i]))

    # update observation
    autoreset = np.logical_or(terminations, truncations)
    if np.any(autoreset):
        observations = envs.reset(options={"mask": autoreset})
    else:
        observations = next_observations

envs.close()
```
This is misleading:

```python
env.reset(mask=np.array([True, False, ...], dtype=bool))
```

It should be

```python
env.reset(options={"reset_mask": np.array([True, False, ...], dtype=bool)})
```
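To show the fix in isolation, here is a minimal sketch of the corrected call, assuming the `reset_mask` options key as above and using random actions in place of a policy:

```python
import gymnasium as gym
import numpy as np

# Same setup as the doc example, with autoresetting disabled.
envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(2)],
    autoreset_mode=gym.vector.AutoresetMode.DISABLED,
)
observations, _ = envs.reset(seed=0)

# Step with random actions until at least one sub-environment finishes an episode.
terminations = truncations = np.zeros(envs.num_envs, dtype=bool)
while not np.any(np.logical_or(terminations, truncations)):
    observations, rewards, terminations, truncations, infos = envs.step(
        envs.action_space.sample()
    )

# reset() only takes `seed` and `options`, so the mask has to travel inside
# `options` under the "reset_mask" key; only the finished sub-environments reset.
reset_mask = np.logical_or(terminations, truncations)
observations, infos = envs.reset(options={"reset_mask": reset_mask})
envs.close()
```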
This is also wrong:

```python
if np.any(autoreset):
    observations = envs.reset(options={"mask": autoreset})
else:
    observations = next_observations
```

It should be

```python
if np.any(autoreset):
    observations = envs.reset(options={"reset_mask": autoreset})
else:
    observations = next_observations
```
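For reference, here is a sketch of the full example with both corrections applied. Note that I've also unpacked the `(observations, infos)` tuple returned by `reset()`, matching the initial `envs.reset()` call, and added a stand-in for the user-supplied `policy` so the snippet runs on its own:

```python
import gymnasium as gym
import numpy as np
from collections import deque

# Initialize environments and replay buffer
envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(2)],
    autoreset_mode=gym.vector.AutoresetMode.DISABLED,
)
replay_buffer = deque(maxlen=100)
observations, _ = envs.reset()
policy = lambda obs: envs.action_space.sample()  # stand-in for the user's policy

while True:  # Training loop
    actions = policy(observations)
    next_observations, rewards, terminations, truncations, infos = envs.step(actions)

    # Add to replay buffer
    for i in range(envs.num_envs):
        replay_buffer.append(
            (observations[i], actions[i], rewards[i], terminations[i], next_observations[i])
        )

    # Update observations, manually resetting only the finished sub-environments
    autoreset = np.logical_or(terminations, truncations)
    if np.any(autoreset):
        observations, _ = envs.reset(options={"reset_mask": autoreset})
    else:
        observations = next_observations

envs.close()
```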
I'm happy to propose a patch, but I don't know where the doc is hosted.
### Code example
### System info
No response
### Additional context
No response
### Checklist
- I have checked that there is no similar issue in the repo