mjlab-homierl

Compatible with mjlab v1.0.0's API.

Warning

This repository is for LEARNING PURPOSES only.

The current project structure (a direct fork of mjlab) was chosen to make synchronizing with upstream changes easy during development (ongoing since last October).

However, this is NOT the recommended structure.

For standard projects, please adopt the External Project Structure. Recommended Codebase Reference: mujocolab/anymal_c_velocity.

Demo video: homie_demo.mp4

What is mjlab?

mjlab (Zakka, K., Yi, B., Liao, Q., & Le Lay, L. (2025). MJLab: Isaac Lab API, powered by MuJoCo-Warp, for RL and robotics research) combines Isaac Lab's proven API with best-in-class MuJoCo physics to provide lightweight, modular abstractions for RL robotics research and sim-to-real deployment.


Installation

From source:

git clone https://github.com/Nagi-ovo/mjlab-homierl.git
cd mjlab-homierl

Training HOMIE

Important

This repository aims to reproduce the lower-body locomotion reinforcement learning part of the following work using mjlab: HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit (arXiv:2502.13013).

Note

You can also download both locomotion policies from Hugging Face to skip training and start playing :)

Train (w/o hands)

# Single-GPU training
uv run train Mjlab-Homie-Unitree-H1 --env.scene.num-envs 4096

# Multi-GPU training
uv run train Mjlab-Homie-Unitree-H1 \
  --gpu-ids 0 1 \
  --env.scene.num-envs 4096

Train (with Robotiq hands)

# Single-GPU training
uv run train Mjlab-Homie-Unitree-H1-with_hands --env.scene.num-envs 4096

# Multi-GPU training
uv run train Mjlab-Homie-Unitree-H1-with_hands \
  --gpu-ids 0 1 \
  --env.scene.num-envs 4096

Play (w/o hands)

uv run play Mjlab-Homie-Unitree-H1 --checkpoint-file MODEL_PATH --num-envs 30 --viewer viser

Play (with Robotiq hands)

uv run play Mjlab-Homie-Unitree-H1-with_hands --checkpoint-file homie_rl.pt --num-envs 30 --viewer viser --device cuda:0

Sanity-check with Dummy Agents

Use built-in agents to sanity check your MDP before training.

uv run play Mjlab-Your-Task-Id --agent zero  # Sends zero actions.
uv run play Mjlab-Your-Task-Id --agent random  # Sends uniform random actions.

Implemented Key Contributions

Note

The reference implementation (for diffing) is vendored in this repo from HOMIE's original implementation: legged_gym/ (env/MDP) and rsl_rl/ (HIMPPO + estimator). The mjlab reimplementation lives under src/mjlab/tasks/homie/ (env/MDP) and src/mjlab/rl/himppo/ + src/mjlab/tasks/homie/rl/ (algo/runner/export).

0. RL Algorithm + Observation Layout (HIMPPO)

  • HIMPPO (symmetry + estimator): Ported into mjlab (src/mjlab/rl/himppo/) from the reference rsl_rl/rsl_rl/
  • Runner wiring: HOMIE tasks use HomieHimOnPolicyRunner so HIMPPO is the default training loop (see src/mjlab/tasks/homie/config/h1/__init__.py, src/mjlab/tasks/homie/rl/him_runner.py)
  • Observation layout + history: One-step layout + actor history=6 / critic history=1 matches the reference (see src/mjlab/tasks/homie/homie_env_cfg.py, src/mjlab/tasks/homie/mdp/observations.py)
  • H1 adaptation: Mirror maps + lower-body act_dim=10 are adapted to Unitree H1 (reference G1 uses act_dim=12)
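The one-step layout plus per-network history described above can be sketched with a rolling buffer (names and shapes here are illustrative, not the actual mjlab API):

```python
import torch

def roll_history(buf: torch.Tensor, new_obs: torch.Tensor) -> torch.Tensor:
    """Drop the oldest one-step observation and append the newest.
    buf: [num_envs, history_len, obs_dim], new_obs: [num_envs, obs_dim]."""
    return torch.cat((buf[:, 1:], new_obs.unsqueeze(1)), dim=1)

num_envs, obs_dim = 4, 3
actor_buf = torch.zeros(num_envs, 6, obs_dim)   # actor history = 6
critic_buf = torch.zeros(num_envs, 1, obs_dim)  # critic history = 1
obs = torch.ones(num_envs, obs_dim)
actor_buf = roll_history(actor_buf, obs)
actor_input = actor_buf.flatten(1)  # [num_envs, 6 * obs_dim]
```

The actor consumes the flattened 6-step window while the critic sees only the latest step, matching the history=6 / history=1 split above.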

1. Upper-Body Curriculum Learning

  • Progressive upper-body motion disturbances: Implemented via UpperBodyPoseAction that generates smooth, randomized pose targets for torso and arm joints
  • Curriculum ratio scaling: upper_body_action_curriculum gradually increases disturbance amplitude (0→1) based on velocity tracking performance (threshold: 0.8)
  • Smooth interpolation: Uses lerp with configurable rate (default 0.05) to avoid abrupt pose changes
  • Default pose initialization: Upper body starts at default positions and progressively adds noise
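A minimal sketch of the lerp-based smoothing (the 0.05 rate matches the default quoted above; function and variable names are illustrative):

```python
import torch

def smooth_toward(current: torch.Tensor, target: torch.Tensor,
                  rate: float = 0.05) -> torch.Tensor:
    """One interpolation step toward the sampled pose target."""
    return torch.lerp(current, target, rate)

pose = torch.zeros(2, 9)  # e.g. torso + arm joints
goal = torch.ones(2, 9)
for _ in range(3):
    pose = smooth_toward(pose, goal)
# the remaining gap shrinks geometrically: (1 - 0.05) ** steps
```

Because each call closes a fixed fraction of the remaining gap, pose changes stay smooth even when a freshly sampled target jumps far from the current pose.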

Original Implementation Details:

  • Samples new upper-body targets every upper_interval control steps (computed from upper_interval_s; default upper_interval_s=1.0s in the reference HOMIE config)
  • Uses exponential distribution for joint ratio: -1.0 / (20 * (1-ratio*0.99)) * log(1 - u + u * exp(-20 * (1-ratio*0.99)))
  • Target range: (action_min, action_max) * random_joint_ratio

mjlab Implementation:

  • Samples new upper-body goal targets on an interval schedule (2.0s by default; see src/mjlab/tasks/homie/config/h1/env_cfgs.py)
  • Uses uniform joint deltas from a configurable range (scaled by curriculum ratio and clamped to joint limits; see src/mjlab/tasks/homie/config/h1/env_cfgs.py)
  • Cleaner separation: policy controls only lower-body (legs), upper-body follows internal targets
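The uniform-delta sampling described above could look roughly like this (a sketch with hypothetical names; the actual ranges live in src/mjlab/tasks/homie/config/h1/env_cfgs.py):

```python
import torch

def sample_upper_goals(default_pose: torch.Tensor, delta_range: float,
                       curriculum_ratio: float,
                       joint_limits: torch.Tensor) -> torch.Tensor:
    """Uniform per-joint deltas, scaled by the curriculum ratio and
    clamped to joint limits. joint_limits: [num_joints, 2] = (lo, hi)."""
    deltas = torch.empty_like(default_pose).uniform_(-delta_range, delta_range)
    goals = default_pose + curriculum_ratio * deltas
    return goals.clamp(joint_limits[:, 0], joint_limits[:, 1])

default_pose = torch.zeros(4, 9)
limits = torch.stack([torch.full((9,), -1.5), torch.full((9,), 1.5)], dim=1)
goals = sample_upper_goals(default_pose, delta_range=2.0,
                           curriculum_ratio=0.5, joint_limits=limits)
```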

2. Multi-Task Framework

  • Three task modes: Velocity tracking, squatting (height control), and standing
  • Coupled 3-mode command resampling (HOMIE): At each resampling instant, sample mutually-exclusive modes: 1/3 squat (height), 1/2 velocity tracking, 1/6 standing; twist+height are coupled via homie_three_mode (see src/mjlab/tasks/homie/mdp/velocity_command.py)
  • Height tracking reward: Tracks relative height (pelvis to lowest foot) with configurable target range
  • Task-specific observations: Height command included in policy observations
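The coupled three-mode resampling above can be sketched as follows (mode constants, the ±1.0 m/s twist range, and variable names are illustrative; the real logic lives in homie_three_mode):

```python
import torch

SQUAT, VELOCITY, STAND = 0, 1, 2

def sample_modes(num_envs: int) -> torch.Tensor:
    """Mutually exclusive modes: 1/3 squat, 1/2 velocity, 1/6 standing."""
    u = torch.rand(num_envs)
    modes = torch.full((num_envs,), STAND, dtype=torch.long)
    modes[u < 1/3 + 1/2] = VELOCITY
    modes[u < 1/3] = SQUAT
    return modes

num_envs = 4096
modes = sample_modes(num_envs)
# Couple height and twist to the mode: squat samples a height target in
# the H1 range, the other modes hold the inactive 0.98 m height, and
# only velocity mode gets a nonzero twist command.
height_cmd = torch.where(modes == SQUAT,
                         torch.empty(num_envs).uniform_(0.40, 0.98),
                         torch.full((num_envs,), 0.98))
vel_cmd = torch.where(modes == VELOCITY,
                      torch.empty(num_envs).uniform_(-1.0, 1.0),
                      torch.zeros(num_envs))
```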

Range Configuration:

  • H1 squat range: 0.40–0.98 m (pelvis height relative to the lowest foot site)
  • Inactive height for the non-squat modes (velocity tracking and standing): 0.98 m

Note

The reference HOMIE config uses resampling_time=4.0s for command updates; mjlab currently uses a fixed 5.0s resampling in make_homie_env_cfg().

3. Reward Shaping

  • Knee deviation penalty: Penalizes using knees incorrectly during height changes
  • Feet parallel rewards: Encourages symmetric gait and flat foot contact
  • No-fly + air-time (LeggedGym-style): no_fly and feet_air_time align with the reference single-support and first-contact landing logic
  • Velocity/height tracking: Dual tracking rewards for locomotion and squatting
  • Posture rewards: Variable posture constraints (tighter when standing, looser when walking/running)
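The LeggedGym-style no_fly and feet_air_time terms can be sketched like this (the 0.5 s air-time threshold is the common LeggedGym default, an assumption here; tensor names are illustrative):

```python
import torch

def no_fly(contacts: torch.Tensor) -> torch.Tensor:
    """Reward single support: exactly one foot in contact.
    contacts: [num_envs, num_feet] bool."""
    return (contacts.sum(dim=1) == 1).float()

def feet_air_time(air_time: torch.Tensor, first_contact: torch.Tensor,
                  threshold: float = 0.5) -> torch.Tensor:
    """On the first contact after a swing, reward time spent in the air
    beyond a threshold (positive for long steps, negative for short)."""
    return ((air_time - threshold) * first_contact.float()).sum(dim=1)

contacts = torch.tensor([[True, False], [True, True], [False, False]])
air = torch.tensor([[0.7, 0.0], [0.3, 0.3], [0.0, 0.0]])
first = torch.tensor([[True, False], [True, False], [False, False]])
```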

4. Disturbance Randomization (Random Push + Hand Load)

  • Random push via external wrenches: Periodic horizontal force pulses applied to the base (pelvis) to simulate external disturbances (see src/mjlab/tasks/homie/config/h1/env_cfgs.py)
  • Hand load (0–5kg): Constant downward force per hand (per-env, sampled once) applied via xfrc_applied; uses wrist links when hands are enabled, otherwise elbow links
  • No curriculum: Push and hand load are active from the start at full range
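The per-env hand-load sampling can be sketched as a constant downward wrench (names are illustrative; the real code writes into xfrc_applied for the wrist or elbow bodies, as noted above):

```python
import torch

GRAVITY = 9.81  # m/s^2

def sample_hand_load(num_envs: int,
                     mass_range: tuple = (0.0, 5.0)) -> torch.Tensor:
    """Sample a payload mass per hand once per env and convert it to a
    constant downward force [num_envs, 2, 3] (N), suitable for the
    force components of xfrc_applied on the two hand bodies."""
    mass = torch.empty(num_envs, 2).uniform_(*mass_range)  # kg per hand
    wrench = torch.zeros(num_envs, 2, 3)
    wrench[..., 2] = -mass * GRAVITY  # force along -z (world down)
    return wrench

forces = sample_hand_load(8)
```

Because the mass is sampled once per environment and held fixed, the policy must remain robust to a persistent, unobserved payload rather than a transient disturbance.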

⚠️ Missing/Different Components

1. Upper-Body Motion Details

  • Exponential distribution sampling: mjlab uses a uniform distribution instead of the original's exponential distribution for joint targets
Reference implementation (legged_gym)
# From legged_robot.py line 117-124
if (self.common_step_counter % self.cfg.domain_rand.upper_interval == 0):
    # (NOTE) implementation of upper-body curriculum
    self.random_upper_ratio = min(self.action_curriculum_ratio, 1.0)
    uu = torch.rand(self.num_envs, self.num_actions - self.num_lower_dof, device=self.device)
    # Exponential distribution for joint ratio
    self.random_upper_ratio = -1.0 / (20 * (1-self.random_upper_ratio*0.99))*torch.log(1 - uu + uu * np.exp(-20 * (1-self.random_upper_ratio*0.99)))
    self.random_joint_ratio = self.random_upper_ratio * torch.rand(self.num_envs, self.num_actions - self.num_lower_dof).to(self.device)
    rand_pos = torch.rand(self.num_envs, self.num_actions - self.num_lower_dof, device=self.device) - 0.5
    self.random_upper_actions = ((self.action_min[:, self.num_lower_dof:] * (rand_pos >= 0)) + (self.action_max[:, self.num_lower_dof:] * (rand_pos < 0) ))* self.random_joint_ratio
  • Per-joint curriculum ratio: Original computes a random ratio per joint; mjlab samples a single ratio per environment (uniform in [0, max_ratio]) and applies it across all upper-body joints
Reference implementation (legged_gym)
# From legged_robot.py line 122
# Each joint gets its own ratio = global_ratio * random_value
self.random_joint_ratio = self.random_upper_ratio * torch.rand(self.num_envs, self.num_actions - self.num_lower_dof).to(self.device)
# Shape: [num_envs, num_upper_body_dofs], different ratio per joint
  • Sampling cadence / interpolation: Reference samples a new goal every upper_interval steps (computed from upper_interval_s) and integrates delta_upper_actions each step; mjlab samples new goals on a 2.0s schedule but interpolates toward them every step via torch.lerp
Reference implementation (legged_gym)
# From legged_robot.py line 117-126
# Updates happen every upper_interval steps (upper_interval = ceil(upper_interval_s / dt))
if (self.common_step_counter % self.cfg.domain_rand.upper_interval == 0):
    # Sample new goal targets
    self.random_upper_actions = ...
    # Compute delta for smooth interpolation
    self.delta_upper_actions = (self.random_upper_actions - self.current_upper_actions) / (self.cfg.domain_rand.upper_interval)

# Every simulation step (line 126):
self.current_upper_actions += self.delta_upper_actions
actions = torch.cat((actions, self.current_upper_actions), dim=-1)

2. Domain Randomization

  • Hand payload mass randomization: Not implemented as mass changes in MuJoCo; mjlab-homierl applies an equivalent downward load (0–5kg per hand) via external wrenches instead
Reference implementation (legged_gym)
# From legged_robot.py line 764-765
if self.cfg.domain_rand.randomize_payload_mass:
    self.payload = torch_rand_float(self.cfg.domain_rand.payload_mass_range[0], self.cfg.domain_rand.payload_mass_range[1], (self.num_envs, 1), device=self.device)
    self.hand_payload = torch_rand_float(self.cfg.domain_rand.hand_payload_mass_range[0], self.cfg.domain_rand.hand_payload_mass_range[1], (self.num_envs ,2), device=self.device)

# Applied in _process_rigid_body_props (line 488-490):
props[self.torso_body_index].mass = self.default_rigid_body_mass[self.torso_body_index] + self.payload[env_id, 0]
props[self.left_hand_index].mass = self.default_rigid_body_mass[self.left_hand_index] + self.hand_payload[env_id, 0]
props[self.right_hand_index].mass = self.default_rigid_body_mass[self.right_hand_index] + self.hand_payload[env_id, 1]
  • Body COM displacement: Original randomizes torso center-of-mass displacement
Reference implementation (legged_gym)
# From legged_robot.py line 769-770
if self.cfg.domain_rand.randomize_body_displacement:
    self.body_displacement = torch_rand_float(self.cfg.domain_rand.body_displacement_range[0], self.cfg.domain_rand.body_displacement_range[1], (self.num_envs, 3), device=self.device)

# Applied in _process_rigid_body_props (line 494-495):
if self.cfg.domain_rand.randomize_body_displacement:
    props[self.torso_body_index].com = self.default_body_com + gymapi.Vec3(self.body_displacement[env_id, 0], self.body_displacement[env_id, 1], self.body_displacement[env_id, 2])
  • Joint injection noise: Action-level noise injection at torque level
Reference implementation (legged_gym)
# From legged_robot.py line 137-138
# Randomize Joint Injections every step
if self.cfg.domain_rand.randomize_joint_injection:
    self.joint_injection = torch_rand_float(self.cfg.domain_rand.joint_injection_range[0], self.cfg.domain_rand.joint_injection_range[1], (self.num_envs, self.num_dof), device=self.device) * self.torque_limits.unsqueeze(0)

# Added to torques in _compute_torques (line 552):
torques = self.p_gains * self.Kp_factors * (self.joint_pos_target - self.dof_pos) - self.d_gains * self.Kd_factors * self.dof_vel
torques = torques + self.actuation_offset + self.joint_injection
  • Actuation offset: Per-DOF torque offset randomization
Reference implementation (legged_gym)
# From legged_robot.py line 761-762
if self.cfg.domain_rand.randomize_actuation_offset:
    self.actuation_offset = torch_rand_float(self.cfg.domain_rand.actuation_offset_range[0], self.cfg.domain_rand.actuation_offset_range[1], (self.num_envs, self.num_dof), device=self.device) * self.torque_limits.unsqueeze(0)

# Re-randomized on reset (line 273-274):
if self.cfg.domain_rand.randomize_actuation_offset:
    self.actuation_offset[env_ids] = torch_rand_float(self.cfg.domain_rand.actuation_offset_range[0], self.cfg.domain_rand.actuation_offset_range[1], (len(env_ids), self.num_dof), device=self.device) * self.torque_limits.unsqueeze(0)
  • Delay simulation: Action delay randomization for sim-to-real transfer
Reference implementation (legged_gym)
# From legged_robot.py line 130-134
self.delayed_actions = self.actions.clone().view(1, self.num_envs, self.num_actions).repeat(self.cfg.control.decimation, 1, 1)
delay_steps = torch.randint(0, self.cfg.control.decimation, (self.num_envs, 1), device=self.device)
if self.cfg.domain_rand.delay:
    for i in range(self.cfg.control.decimation):
        self.delayed_actions[i] = self.last_actions + (self.actions - self.last_actions) * (i >= delay_steps)

3. Control Architecture

  • Mixed control mode (M): Original uses torque control for lower-body (legs) and position control for upper-body (torso+arms) simultaneously
Reference implementation (legged_gym)
# From legged_robot.py line 558-564
elif control_type=="M":  # Mixed mode
    # Compute PD torques for all joints
    torques = self.p_gains * self.Kp_factors * (
            self.joint_pos_target - self.dof_pos) - self.d_gains * self.Kd_factors * self.dof_vel
    
    torques = torques + self.actuation_offset + self.joint_injection
    torques = torch.clip(torques, -self.torque_limits, self.torque_limits)
    # Return: [torques for lower body, position targets for upper body]
    return torch.cat((torques[..., :self.num_lower_dof], self.joint_pos_target[..., self.num_lower_dof:]), dim=-1)

# Applied in step() (line 144-146):
self.torques = self._compute_torques(self.actions).view(self.torques.shape)
# Send both force and position commands
self.gym.set_dof_actuation_force_tensor(self.sim, gymtorch.unwrap_tensor(self.torques))
self.gym.set_dof_position_target_tensor(self.sim, gymtorch.unwrap_tensor(self.torques))

Key Differences: Original vs. mjlab Implementation

Reference legged_gym (Mixed Control Mode, G1):

  • Lower-body (legs): Policy outputs 12 dimensions → converted to torque control
  • Upper-body (torso+arms): Randomly generated position targets → position control
  • Simultaneously sends torque commands and position targets to the simulator

mjlab Current Implementation (Pure Position Control):

  • Lower-body (legs): Policy outputs 10 dimensions (H1) → converted to position targets → position control
  • Upper-body (torso+arms): Randomly generated position targets → position control
  • All joints use position control

Why Mixed Mode Matters:

  1. Real-world robots often use Mixed mode: Torque control for legs to handle ground contact, and position control for the upper body to track poses.
  2. Consistency reduces sim-to-real gap: Training with the target deployment control mode from the start is critical.
  3. Control Dynamics: Torque control is more direct but requires the policy to maintain stability; position control is inherently more stable but may lack the necessary compliance or response speed.

TODO: For sim-to-real deployment, it is recommended to implement Mixed control mode or verify the performance of pure position control on the physical hardware.

  • Random push implementation: mjlab-homierl implements periodic pushes (4s cadence) but uses MuJoCo external wrenches instead of instantaneous base-velocity impulses
Reference implementation (legged_gym)
# From g1_29dof_config.py line 198
upper_interval_s = 1  # Update upper-body targets every 1 second
push_interval_s = 4   # Push robot every 4 seconds

# Used in legged_robot.py:
# Line 117: Upper-body updates
if (self.common_step_counter % self.cfg.domain_rand.upper_interval == 0):
    # Sample new upper-body targets

# Line 514: Robot pushing
if self.cfg.domain_rand.push_robots and (self.common_step_counter % self.cfg.domain_rand.push_interval == 0):
    self._push_robots()
  • Action delay buffer: Implements delayed action execution with random delay steps
Reference implementation (legged_gym)
# From legged_robot.py line 130-134
# Create a buffer for decimation steps
self.delayed_actions = self.actions.clone().view(1, self.num_envs, self.num_actions).repeat(self.cfg.control.decimation, 1, 1)
# Random delay: 0 to decimation-1 steps
delay_steps = torch.randint(0, self.cfg.control.decimation, (self.num_envs, 1), device=self.device)
if self.cfg.domain_rand.delay:
    for i in range(self.cfg.control.decimation):
        # Linearly interpolate from last_action to new action based on delay
        self.delayed_actions[i] = self.last_actions + (self.actions - self.last_actions) * (i >= delay_steps)

4. Observation/State Details

  • IMU angular velocity: mjlab uses a dedicated IMU ang-vel sensor (robot/imu_ang_vel) in the one-step observation
  • IMU projected gravity: mjlab currently uses root-link projected_gravity (reference uses the IMU link orientation)
Reference implementation (legged_gym)
# From legged_robot.py line 316-317, 336-337
# Uses IMU link instead of base link for observations
imu_ang_vel = quat_rotate_inverse(self.rigid_body_states[:, self.imu_index,3:7], self.rigid_body_states[:, self.imu_index,10:13])
imu_projected_gravity = quat_rotate_inverse(self.rigid_body_states[:, self.imu_index,3:7], self.gravity_vec)

# Observations include IMU readings (line 318-325):
current_obs = torch.cat((self.commands[:, :3] * self.commands_scale,
                        self.commands[:, 4].unsqueeze(1),
                        imu_ang_vel  * self.obs_scales.ang_vel,
                        imu_projected_gravity,
                        (self.dof_pos - self.default_dof_pos) * self.obs_scales.dof_pos,
                        self.dof_vel * self.obs_scales.dof_vel,
                        self.actions[:, :12],
                        ),dim=-1)
  • Joint power tracking: Tracks 100-step history of joint power consumption
Reference implementation (legged_gym)
# From legged_robot.py line 777
# Buffer initialization
self.joint_powers = torch.zeros(self.num_envs, 100, self.num_dof, dtype=torch.float, device=self.device, requires_grad=False)

# Updated every step (line 191-193):
joint_power = torch.abs(self.torques * self.dof_vel).unsqueeze(1)
# Rolling window: shift left and append new power
self.joint_powers = torch.cat((self.joint_powers[:, 1:], joint_power), dim=1)

# Reset on episode end (line 258):
self.joint_powers[env_ids] = 0.

Documentation

Full documentation is available at mujocolab.github.io/mjlab.


Development

Run tests:

make test          # Run all tests
make test-fast     # Skip slow integration tests

Format code:

uvx pre-commit install
make format

Compile documentation locally:

uv pip install -r docs/requirements.txt
make docs

License

mjlab is licensed under the Apache License, Version 2.0.


Acknowledgments

About

Fork of mjlab which supports unitree H1 & robotiq_2f85, personal implementation of the locomotion policy training in "HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit [RSS2025]"
