
Commit de909da

Update the classic control arguments doc sections (#898)
1 parent af3d6d7 commit de909da

19 files changed

Lines changed: 159 additions & 143 deletions

gymnasium/envs/box2d/bipedal_walker.py

Lines changed: 8 additions & 4 deletions
````diff
@@ -142,11 +142,15 @@ class BipedalWalker(gym.Env, EzPickle):
 if the walker exceeds the right end of the terrain length.
 
 ## Arguments
-To use the _hardcore_ environment, you need to specify the
-`hardcore=True` argument like below:
+
+To use the _hardcore_ environment, you need to specify `hardcore=True`:
+
 ```python
-import gymnasium as gym
-env = gym.make("BipedalWalker-v3", hardcore=True)
+>>> import gymnasium as gym
+>>> env = gym.make("BipedalWalker-v3", hardcore=True, render_mode="rgb_array")
+>>> env
+<TimeLimit<OrderEnforcing<PassiveEnvChecker<BipedalWalker<BipedalWalker-v3>>>>>
+
 ```
 
 ## Version History
````
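The nested `<TimeLimit<OrderEnforcing<PassiveEnvChecker<...>>>>` repr in the new doctest reflects how `gymnasium.make` stacks wrappers around the base environment, innermost first. A toy mock (stand-in classes, not Gymnasium's actual wrappers) of how such a repr arises:

```python
# Toy sketch of the wrapper stack behind the doctest's repr.
# These classes are stand-ins, not Gymnasium's real implementation.
class Wrapper:
    def __init__(self, env):
        self.env = env

    def __repr__(self):
        # Each wrapper surrounds its inner env's repr with its own name.
        return f"<{type(self).__name__}{self.env!r}>"

class TimeLimit(Wrapper): pass
class OrderEnforcing(Wrapper): pass
class PassiveEnvChecker(Wrapper): pass

class BipedalWalker:
    def __repr__(self):
        return "<BipedalWalker<BipedalWalker-v3>>"

# make() applies the checker, order enforcer, then the time limit.
env = TimeLimit(OrderEnforcing(PassiveEnvChecker(BipedalWalker())))
print(env)  # <TimeLimit<OrderEnforcing<PassiveEnvChecker<BipedalWalker<BipedalWalker-v3>>>>>
```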

gymnasium/envs/box2d/car_racing.py

Lines changed: 28 additions & 20 deletions
````diff
@@ -115,7 +115,7 @@ class CarRacing(gym.Env, EzPickle):
 state RGB buffer. From left to right: true speed, four ABS sensors,
 steering wheel position, and gyroscope.
 To play yourself (it's rather fast for humans), type:
-```
+```shell
 python gymnasium/envs/box2d/car_racing.py
 ```
 Remember: it's a powerful rear-wheel drive car - don't press the accelerator
@@ -139,46 +139,54 @@ class CarRacing(gym.Env, EzPickle):
 A top-down 96x96 RGB image of the car and race track.
 
 ## Rewards
-The reward is -0.1 every frame and +1000/N for every track tile visited,
-where N is the total number of tiles visited in the track. For example,
-if you have finished in 732 frames, your reward is
-1000 - 0.1*732 = 926.8 points.
+The reward is -0.1 every frame and +1000/N for every track tile visited, where N is the total number of tiles
+visited in the track. For example, if you have finished in 732 frames, your reward is 1000 - 0.1*732 = 926.8 points.
 
 ## Starting State
 The car starts at rest in the center of the road.
 
 ## Episode Termination
-The episode finishes when all the tiles are visited. The car can also go
-outside the playfield - that is, far off the track, in which case it will
-receive -100 reward and die.
+The episode finishes when all the tiles are visited. The car can also go outside the playfield -
+that is, far off the track, in which case it will receive -100 reward and die.
 
 ## Arguments
-`lap_complete_percent` dictates the percentage of tiles that must be visited by
-the agent before a lap is considered complete.
 
-Passing `domain_randomize=True` enables the domain randomized variant of the environment.
-In this scenario, the background and track colours are different on every reset.
+```python
+>>> import gymnasium as gym
+>>> env = gym.make("CarRacing-v2", render_mode="rgb_array", lap_complete_percent=0.95, domain_randomize=False, continuous=False)
+>>> env
+<TimeLimit<OrderEnforcing<PassiveEnvChecker<CarRacing<CarRacing-v2>>>>>
+
+```
+
+* `lap_complete_percent=0.95` dictates the percentage of tiles that must be visited by
+the agent before a lap is considered complete.
 
-Passing `continuous=False` converts the environment to use discrete action space.
-The discrete action space has 5 actions: [do nothing, left, right, gas, brake].
+* Passing `domain_randomize=True` enables the domain randomized variant of the environment.
+In this scenario, the background and track colours are different on every reset.
+
+* Passing `continuous=False` converts the environment to use a discrete action space.
+The discrete action space has 5 actions: [do nothing, left, right, gas, brake].
 
 ## Reset Arguments
+
 Passing the option `options["randomize"] = True` will change the current colour of the environment on demand.
 Correspondingly, passing the option `options["randomize"] = False` will not change the current colour of the environment.
 `domain_randomize` must be `True` on init for this argument to work.
-Example usage:
+
 ```python
-import gymnasium as gym
-env = gym.make("CarRacing-v1", domain_randomize=True)
+>>> import gymnasium as gym
+>>> env = gym.make("CarRacing-v2", domain_randomize=True)
 
 # normal reset, this changes the colour scheme by default
-env.reset()
+>>> obs, _ = env.reset()
 
 # reset with colour scheme change
-env.reset(options={"randomize": True})
+>>> randomize_obs, _ = env.reset(options={"randomize": True})
 
 # reset with no colour scheme change
-env.reset(options={"randomize": False})
+>>> non_random_obs, _ = env.reset(options={"randomize": False})
+
 ```
 
 ## Version History
````
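The reward arithmetic described in the `## Rewards` section can be sketched in plain Python. This is only a sketch of the stated formula, not the environment's internal accounting; the function name and the `total_tiles` value are illustrative:

```python
def episode_return(frames: int, tiles_visited: int, total_tiles: int) -> float:
    """Return under CarRacing's stated scheme: -0.1 per frame,
    +1000/N per visited tile, over a track of N tiles in total."""
    return (1000.0 / total_tiles) * tiles_visited - 0.1 * frames

# Visiting every tile of a 300-tile track in 732 frames:
print(episode_return(frames=732, tiles_visited=300, total_tiles=300))  # ~926.8
```

Because the tile bonus totals exactly +1000 when the whole track is visited, the finishing return depends only on the frame count, which matches the doc's 1000 - 0.1*732 = 926.8 example.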

gymnasium/envs/box2d/lunar_lander.py

Lines changed: 37 additions & 51 deletions
````diff
@@ -93,7 +93,7 @@ class LunarLander(gym.Env, EzPickle):
 can learn to fly and then land on its first attempt.
 
 To see a heuristic landing, run:
-```
+```shell
 python gymnasium/envs/box2d/lunar_lander.py
 ```
 <!-- To play yourself, run: -->
@@ -145,74 +145,60 @@ class LunarLander(gym.Env, EzPickle):
 > them is destroyed.
 
 ## Arguments
-To use the _continuous_ environment, you need to specify the
-`continuous=True` argument like below:
+
+Lunar Lander has a large number of arguments:
+
 ```python
-import gymnasium as gym
-env = gym.make(
-    "LunarLander-v2",
-    continuous: bool = False,
-    gravity: float = -10.0,
-    enable_wind: bool = False,
-    wind_power: float = 15.0,
-    turbulence_power: float = 1.5,
-)
+>>> import gymnasium as gym
+>>> env = gym.make("LunarLander-v2", continuous=False, gravity=-10.0,
+...                enable_wind=False, wind_power=15.0, turbulence_power=1.5)
+>>> env
+<TimeLimit<OrderEnforcing<PassiveEnvChecker<LunarLander<LunarLander-v2>>>>>
+
 ```
-If `continuous=True` is passed, continuous actions (corresponding to the throttle of the engines) will be used and the
-action space will be `Box(-1, +1, (2,), dtype=np.float32)`.
-The first coordinate of an action determines the throttle of the main engine, while the second
-coordinate specifies the throttle of the lateral boosters.
-Given an action `np.array([main, lateral])`, the main engine will be turned off completely if
-`main < 0` and the throttle scales affinely from 50% to 100% for `0 <= main <= 1` (in particular, the
-main engine doesn't work with less than 50% power).
-Similarly, if `-0.5 < lateral < 0.5`, the lateral boosters will not fire at all. If `lateral < -0.5`, the left
-booster will fire, and if `lateral > 0.5`, the right booster will fire. Again, the throttle scales affinely
-from 50% to 100% between -1 and -0.5 (and 0.5 and 1, respectively).
-
-`gravity` dictates the gravitational constant, this is bounded to be within 0 and -12.
-
-If `enable_wind=True` is passed, there will be wind effects applied to the lander.
-The wind is generated using the function `tanh(sin(2 k (t+C)) + sin(pi k (t+C)))`.
-`k` is set to 0.01.
-`C` is sampled randomly between -9999 and 9999.
-
-`wind_power` dictates the maximum magnitude of linear wind applied to the craft. The recommended value for `wind_power` is between 0.0 and 20.0.
-`turbulence_power` dictates the maximum magnitude of rotational wind applied to the craft. The recommended value for `turbulence_power` is between 0.0 and 2.0.
+
+* `continuous` determines if discrete or continuous actions (corresponding to the throttle of the engines) will be used, with the
+action space being `Discrete(4)` or `Box(-1, +1, (2,), dtype=np.float32)` respectively.
+For continuous actions, the first coordinate of an action determines the throttle of the main engine, while the second
+coordinate specifies the throttle of the lateral boosters. Given an action `np.array([main, lateral])`, the main
+engine will be turned off completely if `main < 0` and the throttle scales affinely from 50% to 100% for
+`0 <= main <= 1` (in particular, the main engine doesn't work with less than 50% power).
+Similarly, if `-0.5 < lateral < 0.5`, the lateral boosters will not fire at all. If `lateral < -0.5`, the left
+booster will fire, and if `lateral > 0.5`, the right booster will fire. Again, the throttle scales affinely
+from 50% to 100% between -1 and -0.5 (and 0.5 and 1, respectively).
+
+* `gravity` dictates the gravitational constant; this is bounded to be within 0 and -12. Default is -10.0.
+
+* `enable_wind` determines if there will be wind effects applied to the lander. The wind is generated using
+the function `tanh(sin(2 k (t+C)) + sin(pi k (t+C)))` where `k` is set to 0.01 and `C` is sampled randomly between -9999 and 9999.
+
+* `wind_power` dictates the maximum magnitude of linear wind applied to the craft. The recommended value for
+`wind_power` is between 0.0 and 20.0.
+
+* `turbulence_power` dictates the maximum magnitude of rotational wind applied to the craft.
+The recommended value for `turbulence_power` is between 0.0 and 2.0.
 
 ## Version History
 - v2: Count energy spent and in v0.24, added turbulence with wind power and turbulence_power parameters
-- v1: Legs contact with ground added in state vector; contact with ground
-  give +10 reward points, and -10 if then lose contact; reward
-  renormalized to 200; harder initial random push.
+- v1: Legs contact with ground added in state vector; contact with ground give +10 reward points,
+  and -10 if then lose contact; reward renormalized to 200; harder initial random push.
 - v0: Initial version
 
 ## Notes
 
 There are several unexpected bugs with the implementation of the environment.
 
-1. The position of the side thursters on the body of the lander changes, depending on the orientation of the lander.
-This in turn results in an orientation depentant torque being applied to the lander.
+1. The position of the side thrusters on the body of the lander changes, depending on the orientation of the lander.
+This in turn results in an orientation dependent torque being applied to the lander.
 
 2. The units of the state are not consistent. I.e.
 * The angular velocity is in units of 0.4 radians per second. In order to convert to radians per second, the value needs to be multiplied by a factor of 2.5.
 
 For the default values of VIEWPORT_W, VIEWPORT_H, SCALE, and FPS, the scale factors equal:
-'x': 10
-'y': 6.666
-'vx': 5
-'vy': 7.5
-'angle': 1
-'angular velocity': 2.5
+'x': 10, 'y': 6.666, 'vx': 5, 'vy': 7.5, 'angle': 1, 'angular velocity': 2.5
 
 After the correction has been made, the units of the state are as follows:
-'x': (units)
-'y': (units)
-'vx': (units/second)
-'vy': (units/second)
-'angle': (radians)
-'angular velocity': (radians/second)
-
+'x': (units), 'y': (units), 'vx': (units/second), 'vy': (units/second), 'angle': (radians), 'angular velocity': (radians/second)
 
 <!-- ## References -->
````
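The continuous-action throttle rules and the wind function in the new `## Arguments` bullets are easy to mis-read, so here is a plain-Python sketch of both. These are illustrative helpers written from the doc's description, not the environment's actual code:

```python
import math

def main_engine_throttle(main: float) -> float:
    """Fraction of full main-engine power for action coordinate `main` in [-1, 1]."""
    if main < 0:
        return 0.0                        # engine fully off below 0
    return 0.5 + 0.5 * min(main, 1.0)     # scales affinely 50%..100% on [0, 1]

def lateral_booster(lateral: float):
    """Which side booster fires, and at what power fraction."""
    if -0.5 < lateral < 0.5:
        return None, 0.0                  # dead zone: neither booster fires
    mag = min(abs(lateral), 1.0)
    power = 0.5 + (mag - 0.5)             # scales affinely 50%..100% on [0.5, 1]
    return ("left" if lateral < 0 else "right"), power

def wind(t: float, C: float, k: float = 0.01) -> float:
    """Dimensionless wind term tanh(sin(2k(t+C)) + sin(pi k (t+C)));
    the environment scales it by `wind_power`."""
    return math.tanh(math.sin(2 * k * (t + C)) + math.sin(math.pi * k * (t + C)))
```

Note that `tanh` bounds the wind term to (-1, 1), which is why `wind_power` (and analogously `turbulence_power`) sets the *maximum* magnitude applied to the craft.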

gymnasium/envs/classic_control/acrobot.py

Lines changed: 18 additions & 17 deletions
````diff
@@ -96,15 +96,19 @@ class AcrobotEnv(Env):
 
 ## Arguments
 
-No additional arguments are currently supported during construction.
+Acrobot only has `render_mode` as a keyword for `gymnasium.make`.
+On reset, the `options` parameter allows the user to change the bounds used to determine the new random state.
 
 ```python
-import gymnasium as gym
-env = gym.make('Acrobot-v1')
-```
+>>> import gymnasium as gym
+>>> env = gym.make('Acrobot-v1', render_mode="rgb_array")
+>>> env
+<TimeLimit<OrderEnforcing<PassiveEnvChecker<AcrobotEnv<Acrobot-v1>>>>>
+>>> env.reset(seed=123, options={"low": -0.2, "high": 0.2})  # default low=-0.1, high=0.1
+(array([ 0.997341  ,  0.07287608,  0.9841162 , -0.17752565, -0.11185605,
+       -0.12625128], dtype=float32), {})
 
-On reset, the `options` parameter allows the user to change the bounds used to determine
-the new random state.
+```
 
 By default, the dynamics of the acrobot follow those described in Sutton and Barto's book
 [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/11/node4.html).
@@ -118,20 +122,17 @@ class AcrobotEnv(Env):
 
 See the following note for details:
 
-> The dynamics equations were missing some terms in the NIPS paper which
-are present in the book. R. Sutton confirmed in personal correspondence
-that the experimental results shown in the paper and the book were
-generated with the equations shown in the book.
-However, there is the option to run the domain with the paper equations
-by setting `book_or_nips = 'nips'`
-
+> The dynamics equations were missing some terms in the NIPS paper which are present in the book.
+R. Sutton confirmed in personal correspondence that the experimental results shown in the paper and the book were
+generated with the equations shown in the book. However, there is the option to run the domain with the paper equations
+by setting `book_or_nips = 'nips'`
 
 ## Version History
 
 - v1: Maximum number of steps increased from 200 to 500. The observation space for v0 provided direct readings of
 `theta1` and `theta2` in radians, having a range of `[-pi, pi]`. The v1 observation space as described here provides the
 sine and cosine of each angle instead.
-- v0: Initial versions release (1.0.0) (removed from gymnasium for v1)
+- v0: Initial versions release
 
 ## References
 - Sutton, R. S. (1996). Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding.
@@ -383,8 +384,8 @@ def close(self):
 
 
 def wrap(x, m, M):
-    """Wraps ``x`` so m <= x <= M; but unlike ``bound()`` which
-    truncates, ``wrap()`` wraps x around the coordinate system defined by m,M.\n
+    """Wraps `x` so m <= x <= M; but unlike `bound()` which
+    truncates, `wrap()` wraps x around the coordinate system defined by m,M.\n
     For example, m = -180, M = 180 (degrees), x = 360 --> returns 0.
 
     Args:
@@ -439,7 +440,7 @@ def rk4(derivs, y0, t):
     >>> yout = rk4(derivs, y0, t)
 
     Args:
-        derivs: the derivative of the system and has the signature ``dy = derivs(yi)``
+        derivs: the derivative of the system and has the signature `dy = derivs(yi)`
         y0: initial state vector
        t: sample times
````
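The `wrap` docstring touched by this hunk can be exercised with a small stand-alone version. This is a sketch matching the documented behaviour (wrap `x` into `[m, M]` by whole periods of `M - m`), not Acrobot's exact implementation:

```python
def wrap(x: float, m: float, M: float) -> float:
    """Wrap x into [m, M] by shifting in whole periods of (M - m)."""
    diff = M - m
    while x > M:
        x -= diff
    while x < m:
        x += diff
    return x

# The docstring's own example: m = -180, M = 180 (degrees), x = 360 --> 0
print(wrap(360, -180, 180))  # 0
```

Unlike truncation (`bound`), wrapping preserves the periodic structure of an angle, which is why Acrobot uses it on `theta1` and `theta2`.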

gymnasium/envs/classic_control/cartpole.py

Lines changed: 20 additions & 16 deletions
````diff
@@ -74,29 +74,33 @@ class CartPoleEnv(gym.Env[np.ndarray, Union[int, np.ndarray]]):
 
 ## Arguments
 
-Cartpole only has ``render_mode`` as a keyword for ``gymnasium.make``.
+Cartpole only has `render_mode` as a keyword for `gymnasium.make`.
 On reset, the `options` parameter allows the user to change the bounds used to determine the new random state.
 
-Examples:
-    >>> import gymnasium as gym
-    >>> env = gym.make("CartPole-v1", render_mode="rgb_array")
-    >>> env
-    <TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v1>>>>>
-    >>> env.reset(seed=123, options={"low": 0, "high": 1})
-    (array([0.6823519 , 0.05382102, 0.22035988, 0.18437181], dtype=float32), {})
+```python
+>>> import gymnasium as gym
+>>> env = gym.make("CartPole-v1", render_mode="rgb_array")
+>>> env
+<TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v1>>>>>
+>>> env.reset(seed=123, options={"low": -0.1, "high": 0.1})  # default low=-0.05, high=0.05
+(array([ 0.03647037, -0.0892358 , -0.05592803, -0.06312564], dtype=float32), {})
+
+```
 
 ## Vectorized environment
 
 To increase steps per second, users can use a custom vector environment or an environment vectorizer.
 
-Examples:
-    >>> import gymnasium as gym
-    >>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="vector_entry_point")
-    >>> envs
-    CartPoleVectorEnv(CartPole-v1, num_envs=3)
-    >>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
-    >>> envs
-    SyncVectorEnv(CartPole-v1, num_envs=3)
+```python
+>>> import gymnasium as gym
+>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="vector_entry_point")
+>>> envs
+CartPoleVectorEnv(CartPole-v1, num_envs=3)
+>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
+>>> envs
+SyncVectorEnv(CartPole-v1, num_envs=3)
+
+```
 """
 
 metadata = {
````
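The `options={"low": ..., "high": ...}` reset bounds in the new doctest control the uniform range all four CartPole state variables (position, velocity, angle, angular velocity) are drawn from. A stdlib-only sketch of that sampling; the real environment uses its own seeded NumPy generator, so the exact numbers differ:

```python
import random

def sample_initial_state(seed: int, low: float = -0.05, high: float = 0.05):
    """Draw [position, velocity, angle, angular velocity] uniformly from [low, high]."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(4)]

# Widened bounds, as in the doctest's options={"low": -0.1, "high": 0.1}
state = sample_initial_state(123, low=-0.1, high=0.1)
```

Widening the bounds makes starting states harder (larger initial pole angles), which is the main reason to override the defaults.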

gymnasium/envs/classic_control/continuous_mountain_car.py

Lines changed: 11 additions & 6 deletions
````diff
@@ -91,17 +91,22 @@ class Continuous_MountainCarEnv(gym.Env):
 
 ## Arguments
 
+Continuous Mountain Car has two parameters for `gymnasium.make`, `render_mode` and `goal_velocity`.
+On reset, the `options` parameter allows the user to change the bounds used to determine the new random state.
+
 ```python
-import gymnasium as gym
-gym.make('MountainCarContinuous-v0')
-```
+>>> import gymnasium as gym
+>>> env = gym.make("MountainCarContinuous-v0", render_mode="rgb_array", goal_velocity=0.1)  # default goal_velocity=0
+>>> env
+<TimeLimit<OrderEnforcing<PassiveEnvChecker<Continuous_MountainCarEnv<MountainCarContinuous-v0>>>>>
+>>> env.reset(seed=123, options={"low": -0.7, "high": -0.5})  # default low=-0.6, high=-0.4
+(array([-0.5635296,  0.       ], dtype=float32), {})
 
-On reset, the `options` parameter allows the user to change the bounds used to determine
-the new random state.
+```
 
 ## Version History
 
-* v0: Initial versions release (1.0.0)
+* v0: Initial versions release
 """
 
 metadata = {
````
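`goal_velocity` raises the bar for success: the episode only ends once the car is past the flag *and* moving at least that fast. A hypothetical sketch of the check it feeds into; the function name and the 0.45 goal position are assumptions for illustration, not taken from this diff:

```python
def reached_goal(position: float, velocity: float,
                 goal_position: float = 0.45, goal_velocity: float = 0.0) -> bool:
    # Success requires both crossing the goal position and meeting
    # the minimum velocity set by the `goal_velocity` argument.
    # (goal_position value is an assumption for this sketch.)
    return position >= goal_position and velocity >= goal_velocity

print(reached_goal(0.5, 0.05))                      # True with default goal_velocity=0
print(reached_goal(0.5, 0.05, goal_velocity=0.1))   # False once the bar is raised
```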

0 commit comments
