This repository provides the essential codebase to run custom RL policies on a Unitree H1-2 robot.
Currently, RL training uses Isaac Lab, sim2sim validation is done with MuJoCo, and the sim2real deployment pipeline depends on unitree_sdk2_python.
The project is structured around three main packages:
- robot_assets: Manages the models of the robot.
- robot_deploy: Handles deployment-related functionalities (sim2sim and sim2real).
- robot_tasks: Contains the core definition of the training environment for the robot.
Both robot_deploy and robot_assets are adapted from the template provided for Isaac Lab Projects to facilitate integration.
This project is packaged with uv.
To start a training run:

    uv run scripts/rsl_rl/train.py --task Isaac-Velocity-Flat-H12_12dof-v0

This command initiates the training process for the specified task.
To deploy a trained policy onto a real robot, use:

    uv run --package robot_deploy scripts/deploy/main.py

Note: the --package robot_deploy flag avoids pulling in the dependencies of the robot_tasks package used for training (e.g. isaacsim, isaaclab).
To evaluate a policy in simulation, set the use_mujoco flag to True in the config.yaml file before running the main.py script. The script then spawns a MuJoCo simulator instance in the background and communicates with it through DDS, as when deploying on a real robot.
It is also possible to run the trained policy in simulation directly (i.e. without using DDS) by passing the --sim flag to the main.py script. In this case, the policy runs synchronously in the simulator.
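
The three ways of running main.py described above can be summarized by a small decision function. This is only an illustrative sketch: the function name `select_run_mode`, the mode labels, and the choice to let `--sim` take precedence when both options are set are assumptions, not taken from the robot_deploy code.

```python
def select_run_mode(sim_flag: bool, use_mujoco: bool) -> str:
    """Return a label for the run mode implied by the two options.

    sim_flag   -- True when main.py is started with --sim
    use_mujoco -- value of use_mujoco in config.yaml
    The returned labels are hypothetical names used for illustration.
    """
    if sim_flag:
        # Assumption: --sim wins if both options are set.
        # The policy is stepped synchronously in the simulator, without DDS.
        return "direct_sim"
    if use_mujoco:
        # A MuJoCo instance runs in the background and is reached through DDS.
        return "mujoco_dds"
    # Neither option set: commands go to the real robot.
    return "real_robot"
```

For example, `select_run_mode(sim_flag=False, use_mujoco=True)` yields the DDS-backed MuJoCo mode, which exercises the same communication path as a real deployment.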
Currently, three input methods are implemented to control the robot's behavior:
- Unitree controller: when the code is not run in simulation (i.e. no --sim flag AND use_mujoco: False in the configuration file), the default input device is the Unitree controller
- Gamepad: when running in simulation, the code will automatically attempt to detect and use any connected gamepad
- Keyboard: if no gamepad is detected, keyboard inputs are read through the MuJoCo simulator window
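
The fallback order in the list above can be sketched as follows. The function name `select_input_device` and the returned labels are hypothetical; the actual detection logic in robot_deploy may be organized differently.

```python
def select_input_device(in_simulation: bool, gamepad_connected: bool) -> str:
    """Pick an input device following the fallback order described above.

    in_simulation     -- True if --sim is passed or use_mujoco is True
    gamepad_connected -- True if a gamepad was detected
    The returned labels are hypothetical names used for illustration.
    """
    if not in_simulation:
        # On the real robot, the Unitree controller is the default device.
        return "unitree_controller"
    if gamepad_connected:
        # In simulation, a connected gamepad is preferred.
        return "gamepad"
    # Otherwise, keys are read through the MuJoCo simulator window.
    return "keyboard"
```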
The default keybindings are:
- start: initialize the robot
- select: kill the robot and activate damping mode
- L1/R1: switch between control policies specified in the configuration file
- (in MuJoCo only) B: toggle the elastic band maintaining the robot in its standing position
- (in MuJoCo only) L2/R2: modify the length of the elastic band
The keyboard inputs are mapped to the generic controller commands as follows:
| Keyboard key | Mapped key |
|---|---|
| Enter | Start |
| Escape | Select |
| A/B/X/Y | A/B/X/Y |
| J | L1 |
| K | R1 |
| I | L2 |
| O | R2 |
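
The table above amounts to a simple lookup from keyboard keys to controller buttons. A minimal sketch, assuming keys are identified by the plain strings used in the table (the names `KEYBOARD_TO_CONTROLLER` and `map_key` are hypothetical, and the real code may use numeric key codes instead):

```python
from typing import Optional

# Keyboard key -> generic controller button, as listed in the table.
KEYBOARD_TO_CONTROLLER = {
    "Enter": "Start",
    "Escape": "Select",
    "A": "A",
    "B": "B",
    "X": "X",
    "Y": "Y",
    "J": "L1",
    "K": "R1",
    "I": "L2",
    "O": "R2",
}


def map_key(key: str) -> Optional[str]:
    """Return the controller button for a keyboard key, or None if unmapped."""
    return KEYBOARD_TO_CONTROLLER.get(key)
```

Returning `None` for unmapped keys lets the caller ignore stray key presses instead of raising an error.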
This project is licensed under the BSD 2-Clause License - see the LICENSE file for details.
- Valentin Guillet: Core developer
- Côme Perrot: Core developer
- Constant Roux: Core developer
- Victor Lutz: Robot model integration
- Alessandro Trovatello: RSL-RL training implementation
- Olivier Stasse: Project supervisor
To cite this work in a publication:
@misc{h1v2Isaac2025,
author = {Valentin Guillet and Côme Perrot and Constant Roux and Olivier Stasse},
title = {h1v2-Isaac: Reinforcement Learning Framework for Unitree H1-2 Robot},
year = {2025},
howpublished = {\url{https://github.com/Gepetto/h1v2-Isaac}},
}