Commit f88a618: Add tutorial Load custom quadruped robot environments using Gymnasium/MuJoCo/Ant-v5 framework (#838)

Load custom quadruped robot environments
========================================

In this tutorial we will see how to use the `MuJoCo/Ant-v5` framework to create a quadruped walking environment, using a model file (ending in `.xml`), without having to create a new class.

Steps:

0. Get the **MJCF** (or **URDF**) model file of your robot.
   - 0.a. Create your own model (see the [Guide](https://mujoco.readthedocs.io/en/stable/modeling.html)), or
   - 0.b. Find a ready-made model (in this tutorial, we will use a model from the [**MuJoCo Menagerie**](https://github.com/google-deepmind/mujoco_menagerie) collection).
1. Load the model with the `xml_file` argument.
2. Tweak the environment parameters to get the desired behavior.
   - 2.1. Tweak the environment simulation parameters.
   - 2.2. Tweak the environment termination parameters.
   - 2.3. Tweak the environment reward parameters.
   - 2.4. Tweak the environment observation parameters.
3. Train an agent to move your robot.

The reader is expected to be familiar with the `Gymnasium` API & library, the basics of robotics, and the included `Gymnasium/MuJoCo` environments and the robot models they use. Familiarity with the **MJCF** file model format and the `MuJoCo` simulator is not required, but is recommended.

Setup
------

We will need `gymnasium>=1.0.0`.

```sh
pip install "gymnasium>=1.0.0"
```

Step 0.1 - Download a Robot Model
-------------------------

In this tutorial we will load the [Unitree Go1](https://github.com/google-deepmind/mujoco_menagerie/blob/main/unitree_go1/README.md) robot from the excellent [MuJoCo Menagerie](https://github.com/google-deepmind/mujoco_menagerie) robot model collection.

![Unitree Go1 robot in a flat terrain scene](https://github.com/google-deepmind/mujoco_menagerie/blob/main/unitree_go1/go1.png?raw=true)

`Go1` is a quadruped robot; controlling it to move is a significant learning problem, much harder than that of the `Gymnasium/MuJoCo/Ant` environment.

We can download the whole MuJoCo Menagerie collection (which includes `Go1`):

```sh
git clone https://github.com/google-deepmind/mujoco_menagerie.git
```

You can use any other quadruped robot with this tutorial; just adjust the environment parameter values for your robot.

Step 1 - Load the model
-------------------------

To load the model, all we have to do is use the `xml_file` argument with the `Ant-v5` framework.

```py
import gymnasium
import numpy as np

env = gymnasium.make('Ant-v5', xml_file='./mujoco_menagerie/unitree_go1/scene.xml')
```

Although this is enough to load the model, we will need to tweak some environment parameters to get the desired behavior. For now, we will also explicitly set the simulation, termination, reward, and observation arguments, which we will tweak in the next step.

```py
env = gymnasium.make(
    'Ant-v5',
    xml_file='./mujoco_menagerie/unitree_go1/scene.xml',
    forward_reward_weight=0,
    ctrl_cost_weight=0,
    contact_cost_weight=0,
    healthy_reward=0,
    main_body=1,
    healthy_z_range=(0, np.inf),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0,
    frame_skip=1,
    max_episode_steps=1000,
)
```

Step 2 - Tweaking the Environment Parameters
-------------------------

Tweaking the environment parameters is essential to get the desired behavior for learning.
In the following subsections, the reader is encouraged to consult the [documentation of the arguments](https://gymnasium.farama.org/main/environments/mujoco/ant/#arguments) for more detailed information.

Step 2.1 - Tweaking the Environment Simulation Parameters
-------------------------

The arguments of interest are `frame_skip`, `reset_noise_scale` and `max_episode_steps`.

We want to tweak the `frame_skip` parameter to get `dt` to an acceptable value (typical values are `dt` $\in [0.01, 0.1]$ seconds).

Reminder: $dt = frame\_skip \times model.opt.timestep$, where `model.opt.timestep` is the integrator time step selected in the MJCF model file.

The `Go1` model we are using has an integrator timestep of `0.002`, so by selecting `frame_skip=25` we can set the value of `dt` to `0.05s`.

To avoid overfitting the policy, `reset_noise_scale` should be set to a value appropriate to the size of the robot. We want the value to be as large as possible without the initial distribution of states being invalid (`Terminal` regardless of control actions); for `Go1` we choose a value of `0.1`.

And `max_episode_steps` determines the number of steps per episode before `truncation`. Here we set it to 1000 to be consistent with the base `Gymnasium/MuJoCo` environments, but you can set it higher if needed.

```py
env = gymnasium.make(
    'Ant-v5',
    xml_file='./mujoco_menagerie/unitree_go1/scene.xml',
    forward_reward_weight=0,
    ctrl_cost_weight=0,
    contact_cost_weight=0,
    healthy_reward=0,
    main_body=1,
    healthy_z_range=(0, np.inf),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,   # set to avoid policy overfitting
    frame_skip=25,           # set dt=0.05
    max_episode_steps=1000,  # kept at 1000
)
```

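The `dt` relationship above can be sketched with simple arithmetic (the timestep value below is the one from the `Go1` MJCF file):

```python
# Reminder: dt = frame_skip * model.opt.timestep
model_timestep = 0.002  # integrator timestep from the Go1 MJCF file
frame_skip = 25

dt = frame_skip * model_timestep
print(dt)  # about 0.05 s, inside the typical [0.01, 0.1] range
```

Once the environment is created, the same value should be reported by the `env.unwrapped.dt` property of the MuJoCo environments.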
Step 2.2 - Tweaking the Environment Termination Parameters
-------------------------

Termination is important for robot environments to avoid sampling "useless" time steps.

The arguments of interest are `terminate_when_unhealthy` and `healthy_z_range`.

We want to set `healthy_z_range` to terminate the environment when the robot falls over, or jumps really high. Here we have to choose a value that is logical for the height of the robot; for `Go1` we choose `(0.195, 0.75)`.
Note: `healthy_z_range` checks the absolute value of the height of the robot, so if your scene contains different levels of elevation, it should be set to `(-np.inf, np.inf)`.

We could also set `terminate_when_unhealthy=False` to disable termination altogether, which is not desirable in the case of `Go1`.

```py
env = gymnasium.make(
    'Ant-v5',
    xml_file='./mujoco_menagerie/unitree_go1/scene.xml',
    forward_reward_weight=0,
    ctrl_cost_weight=0,
    contact_cost_weight=0,
    healthy_reward=0,
    main_body=1,
    healthy_z_range=(0.195, 0.75),  # set to avoid sampling steps where the robot has fallen or jumped too high
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)
```

Note: If you need a different termination condition, you can write your own `TerminationWrapper` (see the [documentation](https://gymnasium.farama.org/main/api/wrappers/)).

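As a sketch of the custom-termination idea: a minimal, duck-typed wrapper (the class name is hypothetical, and a real implementation would subclass `gymnasium.Wrapper`) that terminates the episode once the torso height leaves a chosen range:

```python
class HeightTermination:
    """Terminate when the height reading in the observation leaves z_range.

    Duck-typed stand-in for a gymnasium.Wrapper subclass, shown this way so
    it does not require a MuJoCo installation. With
    exclude_current_positions_from_observation=False, obs[2] is the torso
    height (z position).
    """

    def __init__(self, env, z_range=(0.195, 0.75)):
        self.env = env
        self.z_range = z_range

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        z = obs[2]
        terminated = terminated or not (self.z_range[0] <= z <= self.z_range[1])
        return obs, reward, terminated, truncated, info
```

The same pattern works for any condition you can compute from the observation or `info` dictionary.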
Step 2.3 - Tweaking the Environment Reward Parameters
-------------------------

The arguments of interest are `forward_reward_weight`, `ctrl_cost_weight`, `contact_cost_weight`, `healthy_reward`, and `main_body`.

For the arguments `forward_reward_weight`, `ctrl_cost_weight`, `contact_cost_weight` and `healthy_reward` we have to pick values that make sense for our robot. You can use the default `MuJoCo/Ant` parameters for reference and tweak them if a change is needed for your environment. In the case of `Go1`, we only change the `ctrl_cost_weight`, since it has a higher actuator force range.

For the argument `main_body` we have to choose which body part is the main body (usually called something like "torso" or "trunk" in the model file) for the calculation of the `forward_reward`; in the case of `Go1` it is the `"trunk"` (Note: in most cases, including this one, it can be left at the default value).

```py
env = gymnasium.make(
    'Ant-v5',
    xml_file='./mujoco_menagerie/unitree_go1/scene.xml',
    forward_reward_weight=1,   # kept the same as the 'Ant' environment
    ctrl_cost_weight=0.05,     # changed because of the stronger motors of `Go1`
    contact_cost_weight=5e-4,  # kept the same as the 'Ant' environment
    healthy_reward=1,          # kept the same as the 'Ant' environment
    main_body=1,               # represents the "trunk" of the `Go1` robot
    healthy_z_range=(0.195, 0.75),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)
```

Note: If you need a different reward function, you can write your own `RewardWrapper` (see the [documentation](https://gymnasium.farama.org/main/api/wrappers/reward_wrappers/)).

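To make the weights concrete, the Ant-style reward can be sketched in plain Python. The velocity, action, and contact values below are made-up examples, and the formula is a paraphrase of the one in the `Ant-v5` documentation:

```python
# Go1 weights chosen above
forward_reward_weight = 1.0
ctrl_cost_weight = 0.05
contact_cost_weight = 5e-4
healthy_reward = 1.0

x_velocity = 0.8            # example forward velocity of the main body (m/s)
action = [0.5] * 12         # Go1 has 12 actuated joints
contact_forces = [0.0] * 6  # example clipped external contact forces

forward_reward = forward_reward_weight * x_velocity
ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
contact_cost = contact_cost_weight * sum(f * f for f in contact_forces)

reward = forward_reward + healthy_reward - ctrl_cost - contact_cost
print(reward)  # about 1.65
```

With the stronger `Go1` motors, a larger `ctrl_cost_weight` keeps the control penalty meaningful relative to the forward reward.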
Step 2.4 - Tweaking the Environment Observation Parameters
-------------------------

The arguments of interest are `include_cfrc_ext_in_observation` and `exclude_current_positions_from_observation`.

Here, for `Go1`, we have no particular reason to change them.

```py
env = gymnasium.make(
    'Ant-v5',
    xml_file='./mujoco_menagerie/unitree_go1/scene.xml',
    forward_reward_weight=1,
    ctrl_cost_weight=0.05,
    contact_cost_weight=5e-4,
    healthy_reward=1,
    main_body=1,
    healthy_z_range=(0.195, 0.75),
    include_cfrc_ext_in_observation=True,              # kept the same as the 'Ant' environment
    exclude_current_positions_from_observation=False,  # kept the same as the 'Ant' environment
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)
```

Note: If you need additional observation elements (such as additional sensors), you can write your own `ObservationWrapper` (see the [documentation](https://gymnasium.farama.org/main/api/wrappers/observation_wrappers/)).

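As a sketch of the additional-sensors idea (hypothetical and duck-typed, like the termination example; a real implementation would subclass `gymnasium.ObservationWrapper` and also widen `observation_space`):

```python
class ExtraSensors:
    """Append extra sensor readings to each observation.

    `read_sensors` is any callable returning a sequence of floats; this is a
    duck-typed stand-in for a gymnasium.ObservationWrapper subclass.
    """

    def __init__(self, env, read_sensors):
        self.env = env
        self.read_sensors = read_sensors

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        obs = list(obs) + list(self.read_sensors())  # widened observation
        return obs, reward, terminated, truncated, info
```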
Step 3 - Train your Agent
-------------------------

Finally, we are done. We can use an RL algorithm to train an agent to walk/run the `Go1` robot.
Note: If you have followed this guide with your own robot model, you may discover during training that some environment parameters were not as desired; feel free to go back to step 2 and change anything as needed.

```py
import gymnasium

env = gymnasium.make(
    'Ant-v5',
    xml_file='./mujoco_menagerie/unitree_go1/scene.xml',
    forward_reward_weight=1,
    ctrl_cost_weight=0.05,
    contact_cost_weight=5e-4,
    healthy_reward=1,
    main_body=1,
    healthy_z_range=(0.195, 0.75),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)
...  # run your RL algorithm
```

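The `# run your RL algorithm` placeholder boils down to the standard Gymnasium interaction loop. A generic sketch (the `policy` callable is a placeholder for your trained agent or learning algorithm, e.g. one from a library such as Stable-Baselines3):

```python
def rollout(env, policy, num_steps=1000):
    """Run a Gymnasium-style env for num_steps, resetting on episode end."""
    obs, info = env.reset()
    total_reward = 0.0
    for _ in range(num_steps):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        if terminated or truncated:
            obs, info = env.reset()
    return total_reward
```

A random policy such as `lambda obs: env.action_space.sample()` is a useful first smoke test before plugging in a learning algorithm.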
![image](https://github.com/Kallinteris-Andreas/Gymnasium-kalli/assets/30759571/bf1797a3-264d-47de-b14c-e3c16072f695)

<iframe id="odysee-iframe" style="width:100%; aspect-ratio:16 / 9;" src="https://odysee.com/$/embed/@Kallinteris-Andreas:7/video0-step-0-to-step-1000:1?r=6fn5jA9uZQUZXGKVpwtqjz1eyJcS3hj3" allowfullscreen></iframe>
<!--
Which can run up to `4.7 m/s` according to the manufacturer
-->

Epilogue
-------------------------

You can follow this guide to create most quadruped environments.
To create humanoid/bipedal robots, you can also follow this guide using the `Gymnasium/MuJoCo/Humanoid-v5` framework.

Author: [@kallinteris-andreas](https://github.com/Kallinteris-Andreas)
