Teaching DX1, my quadruped robot, how to walk using reinforcement learning.
DX1.MOV
This project began by designing a custom robot dog model in Fusion 360 and importing it into Isaac Sim as a USD file. The model is split into separate USD files for base geometry, physics properties, and sensor configurations. Terrain and rigid body simulation parameters were then configured to support reinforcement learning training.
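For context, the sketch below shows how such an asset might be registered as an Isaac Lab-style articulation config. This is an illustration only: the USD path, joint names, spawn height, and actuator gains are placeholder assumptions, and the exact module paths depend on the installed Isaac Lab version.

```python
import omni.isaac.lab.sim as sim_utils
from omni.isaac.lab.actuators import ImplicitActuatorCfg
from omni.isaac.lab.assets import ArticulationCfg

# Hypothetical config for the DX1 USD asset; paths, joint names, and
# gains below are placeholders, not the project's actual values.
DX1_CFG = ArticulationCfg(
    spawn=sim_utils.UsdFileCfg(
        usd_path="path/to/dx1.usd",  # placeholder path
        rigid_props=sim_utils.RigidBodyPropertiesCfg(max_depenetration_velocity=1.0),
        articulation_props=sim_utils.ArticulationRootPropertiesCfg(
            enabled_self_collisions=False, solver_position_iteration_count=4
        ),
    ),
    init_state=ArticulationCfg.InitialStateCfg(
        pos=(0.0, 0.0, 0.35),   # spawn height above the terrain (illustrative)
        joint_pos={".*": 0.0},  # default joint angles, matched by regex
    ),
    actuators={
        "legs": ImplicitActuatorCfg(
            joint_names_expr=[".*"],  # one actuator group for all joints here
            stiffness=25.0,           # PD gains are illustrative only
            damping=0.5,
        ),
    },
)
```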
Choosing the right reward functions is critical for a proper walking gait to emerge; otherwise, the agent can learn unwanted behaviors like crawling (dragging the body) or jumping (excessive vertical motion). This project uses the PPO algorithm, chosen for its on-policy nature and built-in clipping mechanism. On-policy learning provides more stable, conservative updates by only using data gathered with the current policy, whereas off-policy methods like SAC reuse old data (more sample-efficient, but prone to distribution mismatch). PPO's clipping constrains how far each update can move the policy, which helps in a setting where large policy changes can destabilize training.
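As a refresher on the clipping mechanism, here is a minimal PyTorch sketch of the clipped surrogate loss that PPO optimizes (not the training code used in this project); log_probs, old_log_probs, and advantages are assumed to come from rollouts collected with the current policy.

```python
import torch

def ppo_clipped_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper, returned as a loss to minimize."""
    # Probability ratio between the updated policy and the policy that collected the data.
    ratio = torch.exp(log_probs - old_log_probs)
    # Unclipped and clipped surrogate terms.
    surrogate = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the element-wise minimum removes the incentive to push the ratio outside the clip range.
    return -torch.min(surrogate, clipped).mean()

# Example usage with dummy tensors.
lp = torch.randn(64)
old_lp = lp + 0.05 * torch.randn(64)
adv = torch.randn(64)
loss = ppo_clipped_loss(lp, old_lp, adv)
```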
Primary task rewards:
- track_lin_vel_xy_exp: Primary task reward for tracking desired linear velocity commands in the xy-plane.
- track_ang_vel_z_exp: Rewards tracking desired angular velocity (yaw) commands.
Behavior-shaping rewards:
- feet_air_time: Rewards proper foot lifting above a threshold (0.5 s), preventing crawling behavior and encouraging rhythmic step cycles.
- flat_orientation_l2: Maintains stable body orientation during locomotion.
- undesired_contacts: Prevents contact on non-foot body parts (thighs, base), ensuring only the feet touch the ground.
Stability and efficiency penalties:
- lin_vel_z_l2: Penalizes vertical velocity to prevent jumping and keep the robot grounded.
- ang_vel_xy_l2: Penalizes unwanted angular velocities (roll/pitch) for stability.
- dof_torques_l2: Encourages energy-efficient movements by penalizing joint torques.
- action_rate_l2: Penalizes rapid action changes for smoother control.
Together, these rewards shape a stable, natural-looking, omnidirectional walking gait that tracks velocity commands while avoiding crawling and jumping behaviors.
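To make the shaping concrete, here is a minimal PyTorch sketch of how a few of these terms could be computed per environment step. The tracking kernel width, thresholds, and signs/weights are illustrative assumptions, not the project's tuned values.

```python
import torch

def track_lin_vel_xy_exp(cmd_vel_xy, base_vel_xy, std=0.5):
    """Exponential kernel on xy velocity tracking error; approaches 1.0 when tracking is perfect."""
    error = torch.sum((cmd_vel_xy - base_vel_xy) ** 2, dim=-1)
    return torch.exp(-error / std**2)

def lin_vel_z_l2(base_vel_z):
    """Squared vertical velocity penalty; discourages hopping and jumping."""
    return base_vel_z ** 2

def feet_air_time_reward(air_time, first_contact, cmd_vel_xy, threshold=0.5):
    """Reward air time beyond a threshold on touchdown, only while a motion command is active."""
    # air_time: (num_envs, num_feet) seconds each foot has been airborne
    # first_contact: (num_envs, num_feet) bool, True on the step a foot touches down
    reward = torch.sum((air_time - threshold) * first_contact.float(), dim=-1)
    # Gate the reward so a robot commanded to stand still is not pushed to step in place.
    return reward * (torch.norm(cmd_vel_xy, dim=-1) > 0.1).float()

def action_rate_l2(action, prev_action):
    """Squared change in actions between steps; encourages smooth control."""
    return torch.sum((action - prev_action) ** 2, dim=-1)
```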
To bridge the sim-to-real gap, domain randomization is applied during training: the robot's mass distribution, initial poses and velocities, and joint positions are randomized, and periodic external disturbances (lateral pushes every 10-15 s) are injected. This makes the policy robust to real-world variations in terrain and unmodeled dynamics, improving transfer from simulation to the physical robot.
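Randomizations like these are typically expressed as event terms in Isaac Lab's manager-based workflow. The sketch below is an assumption-laden illustration, not the project's actual config: the body names, randomization ranges, and exact mdp function parameters are placeholders and vary by Isaac Lab version.

```python
from omni.isaac.lab.managers import EventTermCfg as EventTerm
from omni.isaac.lab.managers import SceneEntityCfg
import omni.isaac.lab.envs.mdp as mdp

# Illustrative mass randomization; the range and body name are placeholders.
add_base_mass = EventTerm(
    func=mdp.randomize_rigid_body_mass,
    mode="startup",  # randomize once per environment at startup
    params={
        "asset_cfg": SceneEntityCfg("robot", body_names="base"),
        "mass_distribution_params": (-0.5, 0.5),  # kg added to the base link
        "operation": "add",
    },
)

# Illustrative periodic push; matches the 10-15 s disturbance interval described above.
push_robot = EventTerm(
    func=mdp.push_by_setting_velocity,
    mode="interval",                # applied periodically during an episode
    interval_range_s=(10.0, 15.0),  # one push every 10-15 s
    params={"velocity_range": {"x": (-0.5, 0.5), "y": (-0.5, 0.5)}},
)
```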
