Compatible with mjlab v1.0.0's API.
Warning
This repository is for LEARNING PURPOSES only.
The current project structure (a direct fork of mjlab) was chosen to make it easy to synchronize with upstream changes during development, which has been ongoing since last October.
However, this is NOT the recommended structure.
For standard projects, please adopt the External Project Structure. Recommended Codebase Reference: mujocolab/anymal_c_velocity.
homie_demo.mp4
mjlab (Zakka, K., Yi, B., Liao, Q., & Le Lay, L. (2025). MJLab: Isaac Lab API, powered by MuJoCo-Warp, for RL and robotics research) combines Isaac Lab's proven API with best-in-class MuJoCo physics to provide lightweight, modular abstractions for RL robotics research and sim-to-real deployment.
From source:
```bash
git clone https://github.com/Nagi-ovo/mjlab-homierl.git
cd mjlab-homierl
```

Important
This repository aims to reproduce the lower-body locomotion reinforcement learning part of the following work using mjlab: HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit (arXiv:2502.13013).
Note
You can also download both locomotion policies from Hugging Face to skip training and start playing :)
```bash
# Train the H1 locomotion policy (single GPU)
uv run train Mjlab-Homie-Unitree-H1 --env.scene.num-envs 4096

# Multi-GPU training
uv run train Mjlab-Homie-Unitree-H1 \
  --gpu-ids 0 1 \
  --env.scene.num-envs 4096

# With-hands variant (single GPU)
uv run train Mjlab-Homie-Unitree-H1-with_hands --env.scene.num-envs 4096

# With-hands variant (multi-GPU)
uv run train Mjlab-Homie-Unitree-H1-with_hands \
  --gpu-ids 0 1 \
  --env.scene.num-envs 4096
```

```bash
# Play trained policies
uv run play Mjlab-Homie-Unitree-H1 --checkpoint-file MODEL_PATH --num-envs 30 --viewer viser

uv run play Mjlab-Homie-Unitree-H1-with_hands --checkpoint-file homie_rl.pt --num-envs 30 --viewer viser --device cuda:0
```

Use built-in agents to sanity check your MDP before training.

```bash
uv run play Mjlab-Your-Task-Id --agent zero   # Sends zero actions.
uv run play Mjlab-Your-Task-Id --agent random # Sends uniform random actions.
```

Note
A reference implementation (for diffing) is vendored in this repo from HOMIE's original implementation repository: legged_gym/ (env/MDP) and rsl_rl/ (HIMPPO + estimator).
The mjlab reimplementation lives under src/mjlab/tasks/homie/ (env/MDP) and src/mjlab/rl/himppo/ + src/mjlab/tasks/homie/rl/ (algo/runner/export).
- HIMPPO (symmetry + estimator): Ported into mjlab (`src/mjlab/rl/himppo/`) from the reference `rsl_rl/rsl_rl/`
- Runner wiring: HOMIE tasks use `HomieHimOnPolicyRunner`, so HIMPPO is the default training loop (see `src/mjlab/tasks/homie/config/h1/__init__.py`, `src/mjlab/tasks/homie/rl/him_runner.py`)
- Observation layout + history: One-step layout with actor history=6 / critic history=1 matches the reference (see `src/mjlab/tasks/homie/homie_env_cfg.py`, `src/mjlab/tasks/homie/mdp/observations.py`)
- H1 adaptation: Mirror maps + lower-body `act_dim=10` are adapted to Unitree H1 (the reference G1 uses `act_dim=12`)
- Progressive upper-body motion disturbances: Implemented via `UpperBodyPoseAction`, which generates smooth, randomized pose targets for torso and arm joints
- Curriculum ratio scaling: `upper_body_action_curriculum` gradually increases disturbance amplitude (0→1) based on velocity-tracking performance (threshold: 0.8)
- Smooth interpolation: Uses lerp with a configurable rate (default 0.05) to avoid abrupt pose changes
- Default pose initialization: Upper body starts at default positions and progressively adds noise
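The lerp step above can be sketched in a few lines. This is a minimal, dependency-free version for a single pose vector (the actual implementation operates on per-env torch tensors via `torch.lerp`); the rate of 0.05 mirrors the stated default:

```python
def step_upper_body(current, target, rate=0.05):
    """One control step of smooth interpolation toward the sampled pose target.

    Each step moves the current pose a fixed fraction of the remaining
    distance toward the target, so pose changes are never abrupt.
    """
    return [c + rate * (t - c) for c, t in zip(current, target)]

# Example: approach a sampled target pose from the default (zero) pose.
pose = [0.0, 0.0, 0.0]
target = [0.4, -0.2, 0.1]
for _ in range(120):
    pose = step_upper_body(pose, target)
# After many steps the pose has converged very close to the target.
```

Because the fraction is constant, convergence is exponential: the residual after `n` steps is `(1 - rate) ** n` of the initial distance.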
Original Implementation Details:
- Samples new upper-body targets every `upper_interval` control steps (computed from `upper_interval_s`; default `upper_interval_s = 1.0` s in the reference HOMIE config)
- Uses an exponential distribution for the joint ratio: `-1.0 / (20 * (1 - ratio * 0.99)) * log(1 - u + u * exp(-20 * (1 - ratio * 0.99)))`
- Target range: `(action_min, action_max) * random_joint_ratio`
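The exponential-distribution formula above is an inverse-CDF sample of an exponential distribution truncated to [0, 1]. A scalar transcription (the reference computes the same expression per joint with torch tensors):

```python
import math
import random

def sample_joint_ratio(curriculum_ratio, u=None):
    """Inverse-CDF sample of the truncated-exponential joint ratio in [0, 1].

    curriculum_ratio in [0, 1] controls the rate lam = 20 * (1 - ratio * 0.99):
    a small curriculum_ratio gives a large rate, concentrating samples near 0;
    as it approaches 1 the distribution flattens toward uniform.
    """
    if u is None:
        u = random.random()  # uniform sample to push through the inverse CDF
    lam = 20.0 * (1.0 - curriculum_ratio * 0.99)
    return -1.0 / lam * math.log(1.0 - u + u * math.exp(-lam))
```

Note the endpoints: `u = 0` maps to 0 and `u = 1` maps to exactly 1, so the ratio always stays in [0, 1] regardless of the curriculum value.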
mjlab Implementation:
- Samples new upper-body goal targets on an interval schedule (2.0 s by default; see `src/mjlab/tasks/homie/config/h1/env_cfgs.py`)
- Uses uniform joint deltas from a configurable range (scaled by the curriculum ratio and clamped to joint limits; see `src/mjlab/tasks/homie/config/h1/env_cfgs.py`)
- Cleaner separation: The policy controls only the lower body (legs); the upper body follows internal targets
- Three task modes: Velocity tracking, squatting (height control), and standing
- Coupled 3-mode command resampling (HOMIE): At each resampling instant, mutually exclusive modes are sampled: 1/3 squat (height), 1/2 velocity tracking, 1/6 standing; twist and height commands are coupled via `homie_three_mode` (see `src/mjlab/tasks/homie/mdp/velocity_command.py`)
- Height tracking reward: Tracks relative height (pelvis to lowest foot) with a configurable target range
- Task-specific observations: Height command included in policy observations
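The 1/3 / 1/2 / 1/6 mode split can be realized with a single uniform draw per environment. This is a hedged sketch of the mode partition only; the real `homie_three_mode` term works on per-env tensors and also samples the coupled twist/height values:

```python
import random

def sample_mode(u=None):
    """Sample one of HOMIE's three mutually exclusive command modes.

    Partitions the unit interval according to the stated probabilities:
    [0, 1/3) -> squat, [1/3, 5/6) -> velocity tracking, [5/6, 1) -> standing.
    """
    if u is None:
        u = random.random()
    if u < 1.0 / 3.0:
        return "squat"      # sample a height target, zero twist command
    elif u < 1.0 / 3.0 + 1.0 / 2.0:
        return "velocity"   # sample a twist command, height held at inactive value
    else:
        return "standing"   # zero twist, height held at inactive value
```

Because the three intervals tile [0, 1), exactly one mode is active per resampling instant, which is what makes twist and height commands mutually exclusive.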
Range Configuration:
- H1 squat range: 0.40-0.98m (pelvis height relative to the lowest foot site)
- Inactive height for non-squat mode (velocity + standing): 0.98m (standing)
Note
The reference HOMIE config uses resampling_time=4.0s for command updates; mjlab currently uses a fixed 5.0s resampling in make_homie_env_cfg().
- Knee deviation penalty: Penalizes using knees incorrectly during height changes
- Feet parallel rewards: Encourages symmetric gait and flat foot contact
- No-fly + air-time (LeggedGym-style): `no_fly` and `feet_air_time` align with the reference single-support and first-contact landing logic
- Velocity/height tracking: Dual tracking rewards for locomotion and squatting
- Posture rewards: Variable posture constraints (tighter when standing, looser when walking/running)
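Tracking rewards of this kind typically use a LeggedGym-style exponential kernel over the tracking error. A sketch, where `sigma = 0.25` is a common LeggedGym default and not necessarily the value configured in this repo:

```python
import math

def tracking_reward(error, sigma=0.25):
    """Exponential tracking kernel: exp(-error^2 / sigma).

    Returns 1.0 at zero error and decays smoothly as the tracked quantity
    (e.g. commanded velocity or relative height) deviates from its target;
    sigma sets how forgiving the kernel is.
    """
    return math.exp(-(error ** 2) / sigma)
```

The same kernel serves both tracking terms; only the error definition differs (twist error for locomotion, pelvis-to-foot height error for squatting).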
- Random push via external wrenches: Periodic horizontal force pulses applied to the base (`pelvis`) to simulate external disturbances (see `src/mjlab/tasks/homie/config/h1/env_cfgs.py`)
- Hand load (0–5 kg): Constant downward force per hand (per-env, sampled once) applied via `xfrc_applied`; uses wrist links when hands are enabled, otherwise elbow links
- No curriculum: Push and hand load are active from the start at full range
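The hand-load scheme amounts to a one-time per-hand mass draw converted to a constant downward force. A simplified single-env sketch (the actual code writes the equivalent force into MuJoCo's `xfrc_applied` for the wrist, or elbow, bodies every step):

```python
import random

def sample_hand_loads(max_mass_kg=5.0, g=9.81):
    """Sample one constant downward load per hand.

    Each hand draws a mass in [0, max_mass_kg] once per episode; the
    equivalent wrench force is (0, 0, -mass * g), i.e. pure downward pull
    with no torque component.
    """
    masses = (random.uniform(0.0, max_mass_kg), random.uniform(0.0, max_mass_kg))
    forces = tuple((0.0, 0.0, -m * g) for m in masses)  # (fx, fy, fz) per hand
    return masses, forces

masses, forces = sample_hand_loads()
```

Applying the load as an external wrench rather than a body-mass change keeps the model compiled once, at the cost of ignoring the payload's inertia.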
- Exponential distribution sampling: mjlab uses a uniform distribution instead of the original's exponential distribution for joint targets
Reference implementation (legged_gym)
```python
# From legged_robot.py line 117-124
if (self.common_step_counter % self.cfg.domain_rand.upper_interval == 0):
    # (NOTE) implementation of upper-body curriculum
    self.random_upper_ratio = min(self.action_curriculum_ratio, 1.0)
    uu = torch.rand(self.num_envs, self.num_actions - self.num_lower_dof, device=self.device)
    # Exponential distribution for joint ratio
    self.random_upper_ratio = -1.0 / (20 * (1 - self.random_upper_ratio * 0.99)) * torch.log(1 - uu + uu * np.exp(-20 * (1 - self.random_upper_ratio * 0.99)))
    self.random_joint_ratio = self.random_upper_ratio * torch.rand(self.num_envs, self.num_actions - self.num_lower_dof).to(self.device)
    rand_pos = torch.rand(self.num_envs, self.num_actions - self.num_lower_dof, device=self.device) - 0.5
    self.random_upper_actions = ((self.action_min[:, self.num_lower_dof:] * (rand_pos >= 0)) + (self.action_max[:, self.num_lower_dof:] * (rand_pos < 0))) * self.random_joint_ratio
```
- Per-joint curriculum ratio: The original computes a random ratio per joint; mjlab samples a single ratio per environment (uniform in `[0, max_ratio]`) and applies it across all upper-body joints
Reference implementation (legged_gym)
```python
# From legged_robot.py line 122
# Each joint gets its own ratio = global_ratio * random_value
self.random_joint_ratio = self.random_upper_ratio * torch.rand(self.num_envs, self.num_actions - self.num_lower_dof).to(self.device)
# Shape: [num_envs, num_upper_body_dofs], different ratio per joint
```
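For contrast, a sketch of the mjlab-side scheme: one uniform ratio per environment, shared across all upper-body joints. `delta_range` here is a hypothetical stand-in for the configured delta range:

```python
import random

def sample_upper_targets(num_envs, num_joints, max_ratio, delta_range=0.5):
    """Per-env ratio scheme: one ratio per environment, broadcast over joints.

    Each env draws a single curriculum-scaled ratio in [0, max_ratio], then
    scales an independent uniform delta per joint; all joints in one env
    share the same ratio, unlike the per-joint reference code above.
    """
    targets = []
    for _ in range(num_envs):
        ratio = random.uniform(0.0, max_ratio)  # single ratio per env
        targets.append([ratio * random.uniform(-delta_range, delta_range)
                        for _ in range(num_joints)])
    return targets

targets = sample_upper_targets(num_envs=4, num_joints=5, max_ratio=1.0)
```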
- Sampling cadence / interpolation: The reference samples a new goal every `upper_interval` steps (computed from `upper_interval_s`) and integrates `delta_upper_actions` each step; mjlab samples new goals on a 2.0 s schedule but interpolates toward them every step via `torch.lerp`
Reference implementation (legged_gym)
```python
# From legged_robot.py line 117-126
# Updates happen every upper_interval steps (upper_interval = ceil(upper_interval_s / dt))
if (self.common_step_counter % self.cfg.domain_rand.upper_interval == 0):
    # Sample new goal targets
    self.random_upper_actions = ...
    # Compute delta for smooth interpolation
    self.delta_upper_actions = (self.random_upper_actions - self.current_upper_actions) / (self.cfg.domain_rand.upper_interval)

# Every simulation step (line 126):
self.current_upper_actions += self.delta_upper_actions
actions = torch.cat((actions, self.current_upper_actions), dim=-1)
```
- Hand payload mass randomization: Not implemented as mass changes in MuJoCo; mjlab-homierl applies an equivalent downward load (0–5kg per hand) via external wrenches instead
Reference implementation (legged_gym)
```python
# From legged_robot.py line 764-765
if self.cfg.domain_rand.randomize_payload_mass:
    self.payload = torch_rand_float(self.cfg.domain_rand.payload_mass_range[0], self.cfg.domain_rand.payload_mass_range[1], (self.num_envs, 1), device=self.device)
    self.hand_payload = torch_rand_float(self.cfg.domain_rand.hand_payload_mass_range[0], self.cfg.domain_rand.hand_payload_mass_range[1], (self.num_envs, 2), device=self.device)

# Applied in _process_rigid_body_props (line 488-490):
props[self.torso_body_index].mass = self.default_rigid_body_mass[self.torso_body_index] + self.payload[env_id, 0]
props[self.left_hand_index].mass = self.default_rigid_body_mass[self.left_hand_index] + self.hand_payload[env_id, 0]
props[self.right_hand_index].mass = self.default_rigid_body_mass[self.right_hand_index] + self.hand_payload[env_id, 1]
```
- Body COM displacement: Original randomizes torso center-of-mass displacement
Reference implementation (legged_gym)
```python
# From legged_robot.py line 769-770
if self.cfg.domain_rand.randomize_body_displacement:
    self.body_displacement = torch_rand_float(self.cfg.domain_rand.body_displacement_range[0], self.cfg.domain_rand.body_displacement_range[1], (self.num_envs, 3), device=self.device)

# Applied in _process_rigid_body_props (line 494-495):
if self.cfg.domain_rand.randomize_body_displacement:
    props[self.torso_body_index].com = self.default_body_com + gymapi.Vec3(self.body_displacement[env_id, 0], self.body_displacement[env_id, 1], self.body_displacement[env_id, 2])
```
- Joint injection noise: Random per-DOF noise injected at the torque level each step
Reference implementation (legged_gym)
```python
# From legged_robot.py line 137-138
# Randomize joint injections every step
if self.cfg.domain_rand.randomize_joint_injection:
    self.joint_injection = torch_rand_float(self.cfg.domain_rand.joint_injection_range[0], self.cfg.domain_rand.joint_injection_range[1], (self.num_envs, self.num_dof), device=self.device) * self.torque_limits.unsqueeze(0)

# Added to torques in _compute_torques (line 552):
torques = self.p_gains * self.Kp_factors * (self.joint_pos_target - self.dof_pos) - self.d_gains * self.Kd_factors * self.dof_vel
torques = torques + self.actuation_offset + self.joint_injection
```
- Actuation offset: Per-DOF torque offset randomization
Reference implementation (legged_gym)
```python
# From legged_robot.py line 761-762
if self.cfg.domain_rand.randomize_actuation_offset:
    self.actuation_offset = torch_rand_float(self.cfg.domain_rand.actuation_offset_range[0], self.cfg.domain_rand.actuation_offset_range[1], (self.num_envs, self.num_dof), device=self.device) * self.torque_limits.unsqueeze(0)

# Re-randomized on reset (line 273-274):
if self.cfg.domain_rand.randomize_actuation_offset:
    self.actuation_offset[env_ids] = torch_rand_float(self.cfg.domain_rand.actuation_offset_range[0], self.cfg.domain_rand.actuation_offset_range[1], (len(env_ids), self.num_dof), device=self.device) * self.torque_limits.unsqueeze(0)
```
- Delay simulation: Action delay randomization for sim-to-real transfer
Reference implementation (legged_gym)
```python
# From legged_robot.py line 130-134
self.delayed_actions = self.actions.clone().view(1, self.num_envs, self.num_actions).repeat(self.cfg.control.decimation, 1, 1)
delay_steps = torch.randint(0, self.cfg.control.decimation, (self.num_envs, 1), device=self.device)
if self.cfg.domain_rand.delay:
    for i in range(self.cfg.control.decimation):
        self.delayed_actions[i] = self.last_actions + (self.actions - self.last_actions) * (i >= delay_steps)
```
- Mixed control mode (M): Original uses torque control for lower-body (legs) and position control for upper-body (torso+arms) simultaneously
Reference implementation (legged_gym)
```python
# From legged_robot.py line 558-564
elif control_type == "M":  # Mixed mode
    # Compute PD torques for all joints
    torques = self.p_gains * self.Kp_factors * (self.joint_pos_target - self.dof_pos) - self.d_gains * self.Kd_factors * self.dof_vel
    torques = torques + self.actuation_offset + self.joint_injection
    torques = torch.clip(torques, -self.torque_limits, self.torque_limits)
    # Return: [torques for lower body, position targets for upper body]
    return torch.cat((torques[..., :self.num_lower_dof], self.joint_pos_target[..., self.num_lower_dof:]), dim=-1)

# Applied in step() (line 144-146):
self.torques = self._compute_torques(self.actions).view(self.torques.shape)
# Send both force and position commands
self.gym.set_dof_actuation_force_tensor(self.sim, gymtorch.unwrap_tensor(self.torques))
self.gym.set_dof_position_target_tensor(self.sim, gymtorch.unwrap_tensor(self.torques))
```
Key Differences: Original vs. mjlab Implementation
Reference legged_gym (Mixed Control Mode, G1):
- Lower-body (legs): Policy outputs 12 dimensions → converted to torque control
- Upper-body (torso+arms): Randomly generated position targets → position control
- Simultaneously sends torque commands and position targets to the simulator
mjlab Current Implementation (Pure Position Control):
- Lower-body (legs): Policy outputs 10 dimensions (H1) → converted to position targets → position control
- Upper-body (torso+arms): Randomly generated position targets → position control
- All joints use position control
Why Mixed Mode Matters:
- Real-world robots often use Mixed mode: Torque control for legs to handle ground contact, and position control for the upper body to track poses.
- Consistency reduces sim-to-real gap: Training with the target deployment control mode from the start is critical.
- Control Dynamics: Torque control is more direct but requires the policy to maintain stability; position control is inherently more stable but may lack the necessary compliance or response speed.
TODO: For sim-to-real deployment, it is recommended to implement Mixed control mode or verify the performance of pure position control on the physical hardware.
- Random push implementation: mjlab-homierl implements periodic pushes (4s cadence) but uses MuJoCo external wrenches instead of instantaneous base-velocity impulses
Reference implementation (legged_gym)
```python
# From g1_29dof_config.py line 198
upper_interval_s = 1  # Update upper-body targets every 1 second
push_interval_s = 4   # Push robot every 4 seconds

# Used in legged_robot.py:
# Line 117: Upper-body updates
if (self.common_step_counter % self.cfg.domain_rand.upper_interval == 0):
    ...  # Sample new upper-body targets

# Line 514: Robot pushing
if self.cfg.domain_rand.push_robots and (self.common_step_counter % self.cfg.domain_rand.push_interval == 0):
    self._push_robots()
```
- Action delay buffer: Implements delayed action execution with random delay steps
Reference implementation (legged_gym)
```python
# From legged_robot.py line 130-134
# Create a buffer for decimation steps
self.delayed_actions = self.actions.clone().view(1, self.num_envs, self.num_actions).repeat(self.cfg.control.decimation, 1, 1)
# Random delay: 0 to decimation-1 steps
delay_steps = torch.randint(0, self.cfg.control.decimation, (self.num_envs, 1), device=self.device)
if self.cfg.domain_rand.delay:
    for i in range(self.cfg.control.decimation):
        # Linearly interpolate from last_action to new action based on delay
        self.delayed_actions[i] = self.last_actions + (self.actions - self.last_actions) * (i >= delay_steps)
```
- IMU angular velocity: mjlab uses a dedicated IMU ang-vel sensor (`robot/imu_ang_vel`) in the one-step observation
- IMU projected gravity: mjlab currently uses the root-link `projected_gravity` (the reference uses the IMU link orientation)
Reference implementation (legged_gym)
```python
# From legged_robot.py line 316-317, 336-337
# Uses IMU link instead of base link for observations
imu_ang_vel = quat_rotate_inverse(self.rigid_body_states[:, self.imu_index, 3:7], self.rigid_body_states[:, self.imu_index, 10:13])
imu_projected_gravity = quat_rotate_inverse(self.rigid_body_states[:, self.imu_index, 3:7], self.gravity_vec)

# Observations include IMU readings (line 318-325):
current_obs = torch.cat((
    self.commands[:, :3] * self.commands_scale,
    self.commands[:, 4].unsqueeze(1),
    imu_ang_vel * self.obs_scales.ang_vel,
    imu_projected_gravity,
    (self.dof_pos - self.default_dof_pos) * self.obs_scales.dof_pos,
    self.dof_vel * self.obs_scales.dof_vel,
    self.actions[:, :12],
), dim=-1)
```
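The projected-gravity observation is just the world gravity direction rotated into the body frame (what `quat_rotate_inverse` computes). A dependency-free scalar sketch, assuming a (w, x, y, z) unit quaternion:

```python
def projected_gravity(w, x, y, z):
    """Rotate the world gravity direction (0, 0, -1) into the body frame.

    Equivalent to R^T @ (0, 0, -1), where R is the body-to-world rotation
    matrix of the quaternion; the result is the negated third row of R.
    """
    gx = -2.0 * (x * z - w * y)
    gy = -2.0 * (y * z + w * x)
    gz = -(1.0 - 2.0 * (x * x + y * y))
    return (gx, gy, gz)

# Identity orientation: gravity points straight down in the body frame.
g_flat = projected_gravity(1.0, 0.0, 0.0, 0.0)
# 90-degree roll about x: gravity now lies along the body's -y axis.
c = 0.5 ** 0.5
g_rolled = projected_gravity(c, c, 0.0, 0.0)
```

When the body is upright the observation is (0, 0, -1); any tilt shows up as nonzero x/y components, which is why it works as an attitude signal.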
- Joint power tracking: Tracks 100-step history of joint power consumption
Reference implementation (legged_gym)
```python
# From legged_robot.py line 777
# Buffer initialization
self.joint_powers = torch.zeros(self.num_envs, 100, self.num_dof, dtype=torch.float, device=self.device, requires_grad=False)

# Updated every step (line 191-193):
joint_power = torch.abs(self.torques * self.dof_vel).unsqueeze(1)
# Rolling window: shift left and append new power
self.joint_powers = torch.cat((self.joint_powers[:, 1:], joint_power), dim=1)

# Reset on episode end (line 258):
self.joint_powers[env_ids] = 0.
```
Full documentation is available at mujocolab.github.io/mjlab.
Run tests:
```bash
make test       # Run all tests
make test-fast  # Skip slow integration tests
```

Format code:

```bash
uvx pre-commit install
make format
```

Compile documentation locally:

```bash
uv pip install -r docs/requirements.txt
make docs
```

mjlab is licensed under the Apache License, Version 2.0.
- HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit (arXiv:2502.13013).
- mjlab: Zakka, K., Yi, B., Liao, Q., & Le Lay, L. (2025). MJLab: Isaac Lab API, powered by MuJoCo-Warp, for RL and robotics research. (Version 0.1.0) [Computer software].
