Weidong Huang* ·
Jingwen Zhang*, † ·
Jiongye Li ·
Shibowen Zhang ·
Jiayang Wu ·
Jiayi Wang ·
Hangxin Liu ·
Yaodong Yang ·
Yao Su†
*Equal contribution · †Corresponding authors
We release ECO training/evaluation code together with baseline implementations: PPO, IPO, P3O, and CRPO.
- Energy as an explicit constraint (not a reward term): ECO reformulates motor energy consumption as an inequality constraint for more interpretable and tunable energy optimization.
- Stable + energy-efficient humanoid walking: Achieves robust locomotion while driving energy down to a target budget via PPO-Lagrangian (primal-dual updates).
- Real-world validation on BRUCE: Demonstrates sim-to-real deployment with substantially reduced energy consumption compared to MPC and standard PPO.
- Emergent efficient behaviors: Reduced body shaking, lighter steps, and less flexed knees—without manually prescribing “efficient gait” heuristics.
ECO (Energy-Constrained Optimization) is a constrained reinforcement learning framework for humanoid locomotion that separates task rewards (e.g., velocity tracking, stability) from energy optimization by treating energy as an explicit constraint.
Instead of hand-tuning many reward weights (a process that is often non-intuitive and time-consuming), ECO uses physically meaningful thresholds for its constraints:
- Energy constraint: discounted cumulative motor power (torque × joint velocity)
- Reference motion / symmetry constraint: mirror-consistency loss to encourage stable and symmetric gait
ECO is trained with PPO-Lagrangian, which dynamically adjusts the Lagrange multipliers to satisfy constraints during learning.
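As a rough illustration of this primal-dual scheme (a minimal sketch with assumed names and shapes, not the released ECO API), the multiplier grows while the discounted energy cost exceeds its budget and relaxes back toward zero once the policy is under budget:

```python
# Minimal PPO-Lagrangian sketch (illustrative only; not the released ECO code).
import torch

def motor_power_cost(torques: torch.Tensor, joint_vel: torch.Tensor) -> torch.Tensor:
    """Instantaneous motor power per env: sum over joints of |torque * joint velocity|."""
    return (torques * joint_vel).abs().sum(dim=-1)

class EnergyLagrangeMultiplier:
    """Dual-ascent update for one inequality constraint J_cost(pi) <= budget."""

    def __init__(self, budget: float, lr: float = 1e-2):
        self.budget = budget        # physically meaningful power budget
        self.lr = lr
        self.lam = torch.zeros(())  # Lagrange multiplier, kept >= 0

    def update(self, discounted_cost: torch.Tensor) -> None:
        # Grow lambda while the energy constraint is violated; shrink it
        # back toward zero once the policy is under budget.
        violation = discounted_cost.mean() - self.budget
        self.lam = torch.clamp(self.lam + self.lr * violation, min=0.0)

    def penalized_advantage(self, adv_reward: torch.Tensor, adv_cost: torch.Tensor) -> torch.Tensor:
        # Advantage fed to the PPO clipped surrogate; dividing by (1 + lambda)
        # keeps the advantage scale stable as lambda changes.
        return (adv_reward - self.lam * adv_cost) / (1.0 + self.lam)
```

The same dual update applies to the mirror-consistency constraint, with the symmetry loss taking the place of the motor-power cost.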
Project website (demos + videos): https://sites.google.com/view/eco-humanoid
- Constrained RL baselines compared: PPO-Lag (ECO), IPO, P3O, CRPO, plus MPC and standard PPO (reward shaping).
- Sim-to-sim transfer: policies transferred from Isaac Gym to MuJoCo / Gazebo with stable walking and consistent energy reduction.
- Sim-to-real on BRUCE: ECO maintains low motor power near the specified budget while remaining robust.
For qualitative demos and detailed plots, see the project website: https://sites.google.com/view/eco-humanoid
Run all commands from the repository root. Tested environment: Ubuntu 22.04.5 LTS with NVIDIA GeForce RTX 4090.
# create and activate conda env
conda create -n bruce_gym python=3.8 -y
conda activate bruce_gym
# install pytorch (CUDA 11.7)
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 \
--extra-index-url https://download.pytorch.org/whl/cu117
# install Isaac Gym (Preview 4) after PyTorch
# download Isaac Gym from the official NVIDIA Isaac Gym website,
# unzip it, then run:
cd isaacgym/python
pip install -e .
cd ../..
# install the project and its dependencies (setup.py / requirements.txt)
pip install -e .

Before running Isaac Gym training/play on Ubuntu 22.04, set:
export CC=/usr/bin/gcc-11
export CXX=/usr/bin/g++-11
# use your own environment path (recommended)
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
# optional: if Isaac Gym still cannot find libpython3.8.so.1.0,
# locate it manually and export that directory:
find / -name "libpython3.8.so.1.0" 2>/dev/null
# example only, replace with your own path:
# example only, replace with your own path:
# export LD_LIBRARY_PATH=/home/<your_username>/miniconda3/envs/bruce_gym/lib:$LD_LIBRARY_PATH

Training setup notes:
- To reproduce the training curve from the original paper, use `mesh_type = 'plane'` and `num_envs = 4096`.
- Using `num_envs = 8192` uses more GPU memory but can converge faster.
- Using `mesh_type = 'trimesh'` is recommended when targeting a more robust sim-to-real policy.
- The current paper-reproduction command setting is `lin_vel_x = [0.1, 0.1]` and `lin_vel_y = [0.0, 0.0]` (in `commands.ranges`).
- To train policies that can run at different velocity commands, widen the command ranges (especially `lin_vel_x`, and optionally `lin_vel_y`) instead of fixing them to single values.
- These parameters can be changed in the config files under `bruce_gym/envs/custom/` (for example `bruce_gym/envs/custom/brucewalk_config.py` and related `brucewalk_config_*.py` files), as in the sketch below.
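For orientation, here is a hypothetical excerpt of such a config (the nested-class layout follows the common legged_gym convention; the actual class names and structure in the repo may differ):

```python
# bruce_gym/envs/custom/brucewalk_config.py -- illustrative excerpt, not verbatim
class BruceWalkCfg:
    class terrain:
        mesh_type = 'plane'      # 'trimesh' for a more robust sim-to-real policy

    class env:
        num_envs = 4096          # 8192 can converge faster but needs more GPU memory

    class commands:
        class ranges:
            lin_vel_x = [0.1, 0.1]   # widen (e.g. [0.0, 0.8]) for multi-speed policies
            lin_vel_y = [0.0, 0.0]
```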
# ECO (PPO-Lagrangian)
python bruce_gym/scripts/train.py \
--task=bruce_ppolag \
--headless \
--run_name=eco_ppolag_train \
--sim_device=cuda:0 \
--rl_device=cuda:0
# PPO baseline
python bruce_gym/scripts/train.py \
--task=bruce_ppo \
--headless \
--run_name=baseline_ppo_train \
--sim_device=cuda:0 \
--rl_device=cuda:0
# IPO baseline
python bruce_gym/scripts/train.py \
--task=bruce_ipo \
--headless \
--run_name=baseline_ipo_train \
--sim_device=cuda:0 \
--rl_device=cuda:0
# P3O baseline
python bruce_gym/scripts/train.py \
--task=bruce_p3o \
--headless \
--run_name=baseline_p3o_train \
--sim_device=cuda:0 \
--rl_device=cuda:0
# CRPO baseline
python bruce_gym/scripts/train.py \
--task=bruce_crpo \
--headless \
--run_name=baseline_crpo_train \
--sim_device=cuda:0 \
--rl_device=cuda:0

Trained models are stored in `logs/exp/`. Before evaluation, choose the target run directory from `logs/exp/<timestamp>_<run_name>` and use that path for `--load_run` / `--load_model`.
python bruce_gym/scripts/play.py \
--task=bruce_ppolag \
--load_run=<run_name_or_absolute_run_path> \
--checkpoint=<checkpoint_id>

python bruce_gym/scripts/sim2sim_bruce.py \
--load_model=<absolute_run_path> \
--checkpoint=<checkpoint_id>

Optional: add `--terrain` to use the terrain MJCF.
Gazebo playback depends on the BRUCE official library.
Install that library first, otherwise bruce_gym/scripts/gazebo/* will fail on BRUCE interface imports.
Also replace ~/.gazebo/models/bruce/model.sdf with bruce_gym/scripts/gazebo/model.sdf before running Gazebo playback.
Typical setup:
# from your BRUCE official library repo
pip install -e .
pip install termcolor==1.1.0
pip install numba==0.49.0
python3 -m Startups.memory_manager

Start Gazebo first:

gzserver bruce.world

Then run:
python bruce_gym/scripts/gazebo/test_checkpoint.py \
--task=bruce_ppolag \
--load_run=<run_name_or_absolute_run_path> \
--checkpoint=<checkpoint_id>

If your local package path is `humanoid/` instead of `bruce_gym/`, replace the script prefix accordingly.
If you find this work useful, please consider citing:
@ARTICLE{eco_humanoid,
author={Huang, Weidong and Zhang, Jingwen and Li, Jiongye and Zhang, Shibowen and Wu, Jiayang and Wang, Jiayi and Liu, Hangxin and Yang, Yaodong and Su, Yao},
journal={IEEE Transactions on Automation Science and Engineering},
title={ECO: Energy-Constrained Optimization With Reinforcement Learning for Humanoid Walking},
year={2026},
volume={23},
number={},
pages={4861-4876},
keywords={Legged locomotion;Humanoid robots;Energy efficiency;Energy consumption;Tuning;Optimization;Costs;Stability criteria;Reinforcement learning;Automation;Humanoid and bipedal locomotion;constrained reinforcement learning;legged robots},
doi={10.1109/TASE.2026.3662755}}