Commit eec83c4

Update vector env docs (#1053)
1 parent 6d5a0f3 commit eec83c4

File tree

1 file changed: +45, -31 lines


Diff for: gymnasium/vector/vector_env.py

@@ -30,32 +30,45 @@ class VectorEnv(Generic[ObsType, ActType, ArrayType]):
     """Base class for vectorized environments to run multiple independent copies of the same environment in parallel.

     Vector environments can provide a linear speed-up in the steps taken per second through sampling multiple
-    sub-environments at the same time. To prevent terminated environments waiting until all sub-environments have
-    terminated or truncated, the vector environments automatically reset sub-environments after they terminate or truncated (within the same step call).
-    As a result, the step's observation and info are overwritten by the reset's observation and info.
-    To preserve this data, the observation and info for the final step of a sub-environment is stored in the info parameter,
-    using `"final_observation"` and `"final_info"` respectively. See :meth:`step` for more information.
+    sub-environments at the same time. Gymnasium contains two generalised vector environments: :class:`AsyncVectorEnv`
+    and :class:`SyncVectorEnv`, along with several custom vector environment implementations.
+    :func:`reset` and :func:`step` batch `observations`, `rewards`, `terminations`, `truncations` and
+    `info` for each sub-environment; see the example below. For `rewards`, `terminations`, and `truncations`,
+    the data is packaged into a NumPy array of shape `(num_envs,)`. For `observations` (and `actions`), the batching
+    process is dependent on the type of observation (and action) space, and is generally optimised for neural network
+    input/output. For `info`, the data is kept as a dictionary such that a key gives the data for all sub-environments.
+
+    For creating environments, :func:`make_vec` is the vector equivalent of :func:`make`, easily creating
+    vector environments with several unique arguments for modifying environment qualities, the number of environments,
+    the vectorizer type and the vectorizer arguments.

-    The vector environments batches `observations`, `rewards`, `terminations`, `truncations` and `info` for each
-    sub-environment. In addition, :meth:`step` expects to receive a batch of actions for each parallel environment.
-
-    Gymnasium contains two generalised Vector environments: :class:`AsyncVectorEnv` and :class:`SyncVectorEnv` along with
-    several custom vector environment implementations.
-
-    The Vector Environments have the additional attributes for users to understand the implementation
-
-    - :attr:`num_envs` - The number of sub-environment in the vector environment
-    - :attr:`observation_space` - The batched observation space of the vector environment
-    - :attr:`single_observation_space` - The observation space of a single sub-environment
-    - :attr:`action_space` - The batched action space of the vector environment
-    - :attr:`single_action_space` - The action space of a single sub-environment
+    Note:
+        The info parameter of :meth:`reset` and :meth:`step` was originally implemented before v0.25 as a list
+        of dictionaries, one for each sub-environment. However, this was modified in v0.25+ to be a dictionary
+        with a NumPy array for each key. To use the old info style, utilise the :class:`DictInfoToList` wrapper.

     Examples:
         >>> import gymnasium as gym
         >>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync", wrappers=(gym.wrappers.TimeAwareObservation,))
         >>> envs = gym.wrappers.vector.ClipReward(envs, min_reward=0.2, max_reward=0.8)
         >>> envs
         <ClipReward, SyncVectorEnv(CartPole-v1, num_envs=3)>
+        >>> envs.num_envs
+        3
+        >>> envs.action_space
+        MultiDiscrete([2 2 2])
+        >>> envs.observation_space
+        Box([[-4.80000019e+00 -3.40282347e+38 -4.18879032e-01 -3.40282347e+38
+           0.00000000e+00]
+          [-4.80000019e+00 -3.40282347e+38 -4.18879032e-01 -3.40282347e+38
+           0.00000000e+00]
+          [-4.80000019e+00 -3.40282347e+38 -4.18879032e-01 -3.40282347e+38
+           0.00000000e+00]], [[4.80000019e+00 3.40282347e+38 4.18879032e-01 3.40282347e+38
+           5.00000000e+02]
+          [4.80000019e+00 3.40282347e+38 4.18879032e-01 3.40282347e+38
+           5.00000000e+02]
+          [4.80000019e+00 3.40282347e+38 4.18879032e-01 3.40282347e+38
+           5.00000000e+02]], (3, 5), float64)
         >>> observations, infos = envs.reset(seed=123)
         >>> observations
         array([[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282,  0.        ],
@@ -64,7 +77,8 @@ class VectorEnv(Generic[ObsType, ActType, ArrayType]):
         >>> infos
         {}
         >>> _ = envs.action_space.seed(123)
-        >>> observations, rewards, terminations, truncations, infos = envs.step(envs.action_space.sample())
+        >>> actions = envs.action_space.sample()
+        >>> observations, rewards, terminations, truncations, infos = envs.step(actions)
         >>> observations
         array([[ 0.01734283,  0.15089367, -0.02859527, -0.33293587,  1.        ],
                [ 0.02909703, -0.16717631,  0.04740972,  0.3319138 ,  1.        ],
@@ -79,17 +93,18 @@ class VectorEnv(Generic[ObsType, ActType, ArrayType]):
         {}
         >>> envs.close()

-    Note:
-        The info parameter of :meth:`reset` and :meth:`step` was originally implemented before v0.25 as a list
-        of dictionary for each sub-environment. However, this was modified in v0.25+ to be a
-        dictionary with a NumPy array for each key. To use the old info style, utilise the :class:`DictInfoToList` wrapper.
+    To avoid having to wait for all sub-environments to terminate before resetting, implementations will autoreset
+    sub-environments on episode end (`terminated or truncated is True`). As a result, when adding observations
+    to a replay buffer, this requires knowing when the observation (and info) for each sub-environment is the first
+    observation from an autoreset. We recommend using an additional variable to store this information.

-    Note:
-        All parallel environments should share the identical observation and action spaces.
-        In other words, a vector of multiple different environments is not supported.
+    The vector environments have additional attributes for users to understand the implementation:

-    Note:
-        :func:`make_vec` is the equivalent function to :func:`make` for vector environments.
+    - :attr:`num_envs` - The number of sub-environments in the vector environment
+    - :attr:`observation_space` - The batched observation space of the vector environment
+    - :attr:`single_observation_space` - The observation space of a single sub-environment
+    - :attr:`action_space` - The batched action space of the vector environment
+    - :attr:`single_action_space` - The action space of a single sub-environment
     """

     metadata: dict[str, Any] = {}
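
The batched shapes the new docstring describes can be checked directly. The following is a minimal sketch, not taken from this commit, assuming Gymnasium's `make_vec` API and the standard `CartPole-v1` environment (single observation shape `(4,)`):

    # Sketch: batched return shapes from a vector environment
    import gymnasium as gym
    import numpy as np

    envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
    observations, infos = envs.reset(seed=42)

    actions = envs.action_space.sample()
    observations, rewards, terminations, truncations, infos = envs.step(actions)

    # rewards, terminations and truncations are NumPy arrays of shape (num_envs,)
    assert rewards.shape == (3,)
    assert terminations.shape == (3,) and terminations.dtype == np.bool_

    # observations are batched by the space: (num_envs,) + single observation shape
    assert observations.shape == (3, 4)
    assert envs.single_observation_space.shape == (4,)

    envs.close()
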
@@ -149,9 +164,8 @@ def step(
             Batch of (observations, rewards, terminations, truncations, infos)

         Note:
-            As the vector environments autoreset for a terminating and truncating sub-environments,
-            the returned observation and info is not the final step's observation or info which is instead stored in
-            info as `"final_observation"` and `"final_info"`.
+            As the vector environments autoreset terminating and truncating sub-environments, the reset will occur
+            on the next step after `terminated or truncated is True`.

         Example:
             >>> import gymnasium as gym
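Finally, for the `DictInfoToList` wrapper the note points users to: a short sketch, again not from the commit, assuming the wrapper is available as `gymnasium.wrappers.vector.DictInfoToList` (as in Gymnasium 1.0):

    # Sketch: converting the dict-of-arrays info format back to a list of dicts
    import gymnasium as gym

    envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
    envs = gym.wrappers.vector.DictInfoToList(envs)

    _, infos = envs.reset(seed=0)
    # infos is now one dict per sub-environment, e.g. [{}, {}, {}]
    assert isinstance(infos, list) and len(infos) == envs.num_envs
    envs.close()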