This repository provides the Wasserstein Adversarial Behavior Imitation (WASABI) algorithm that enables Solo to acquire agile skills through adversarial imitation from rough, partial demonstrations using NVIDIA Isaac Gym.
Paper: Learning Agile Skills via Adversarial Imitation of Rough Partial Demonstrations
Project website: https://sites.google.com/view/corl2022-wasabi/home
Maintainer: Chenhao Li
Affiliation: Autonomous Learning Group, Max Planck Institute for Intelligent Systems, and Robotic Systems Lab, ETH Zurich
Contact: [email protected]
- Create a new Python virtual environment with `python 3.8`.
- Install `pytorch 1.10` with `cuda-11.3` (a sanity-check sketch follows these installation steps):
  ```
  pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
  ```
- Install Isaac Gym:
  - Download and install Isaac Gym Preview 4.
    ```
    cd isaacgym/python
    pip install -e .
    ```
  - Try running an example:
    ```
    cd examples
    python 1080_balls_of_solitude.py
    ```
  - For troubleshooting, check the docs in `isaacgym/docs/index.html`.
- Install `solo_gym`:
  ```
  git clone https://github.com/martius-lab/wasabi.git
  cd solo_gym
  pip install -e .
  ```
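After these steps, a quick sanity check can confirm the installation. This is a minimal sketch; it assumes a working NVIDIA driver and the CUDA 11.3 PyTorch build installed above.

```python
# Note: Isaac Gym must be imported before torch in the same process.
from isaacgym import gymapi
import torch

print(torch.__version__)          # expected: 1.10.0+cu113
print(torch.cuda.is_available())  # expected: True with a working GPU driver
print(gymapi.acquire_gym())       # acquires the Isaac Gym interface singleton
```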
- The Solo environment is defined by an env file `solo8.py` and a config file `solo8_config.py` under `solo_gym/envs/solo8/`. The config file sets both the environment parameters in class `Solo8FlatCfg` and the training parameters in class `Solo8FlatCfgPPO`.
- The provided code exemplifies the training of Solo 8 with handheld wave motions. 20 recorded demonstrations are augmented with perturbations to 1000 trajectories of 130 frames each and stored in `resources/robots/solo8/datasets/motion_data.pt`. The state dimension indices are specified in `reference_state_idx_dict.json`. To train with other demonstrations, replace `motion_data.pt` and adapt the reward functions defined in `solo_gym/envs/solo8/solo8.py` accordingly (a sketch of inspecting the dataset follows this list).
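For reference, a minimal sketch of inspecting the bundled dataset. The file names and the 1000×130 trajectory count are taken from above, but the exact structure stored in `motion_data.pt` and the location of `reference_state_idx_dict.json` are assumptions.

```python
import json
import torch

# Load the augmented demonstration data (1000 trajectories x 130 frames, per above).
motion_data = torch.load("resources/robots/solo8/datasets/motion_data.pt")

# Map state dimension names to tensor indices (file location assumed).
with open("resources/robots/solo8/datasets/reference_state_idx_dict.json") as f:
    state_idx = json.load(f)

print(type(motion_data))  # layout of the stored object is an assumption
print(state_idx)          # e.g. which indices hold base pose, joint angles, ...
```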
- Train a policy:
  ```
  python scripts/train.py --task solo8
  ```
  - The trained policy is saved in `logs/<experiment_name>/<date_time>_<run_name>/model_<iteration>.pt`, where `<experiment_name>` and `<run_name>` are defined in the train config.
  - To disable rendering, append `--headless`.
- Play a trained policy:
  ```
  python scripts/play.py
  ```
  - By default, the loaded policy is the last model of the last run in the experiment folder.
  - Other runs/model iterations can be selected by setting `load_run` and `checkpoint` in the train config (see the sketch after this list).
  - Use `u` and `j` to command the forward velocity.
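A sketch of what selecting a specific run could look like. Only `load_run`, `checkpoint`, and the class name `Solo8FlatCfgPPO` appear above; the `runner` sub-class and the `-1` defaults follow common legged_gym-style configs and are assumptions.

```python
# In solo_gym/envs/solo8/solo8_config.py (sketch, not the verbatim file):
class Solo8FlatCfgPPO:
    class runner:
        load_run = "Jan01_00-00-00_example_run"  # assumed: -1 would pick the last run
        checkpoint = 500                         # assumed: -1 would pick the last model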
To cite this work:
```
@inproceedings{li2023learning,
  title={Learning agile skills via adversarial imitation of rough partial demonstrations},
  author={Li, Chenhao and Vlastelica, Marin and Blaes, Sebastian and Frey, Jonas and Grimminger, Felix and Martius, Georg},
  booktitle={Conference on Robot Learning},
  pages={342--352},
  year={2023},
  organization={PMLR}
}
```
The code is built upon the open-sourced Isaac Gym Environments for Legged Robots and the accompanying PPO implementation. We refer to the original repositories for more details.
