- Update the `prefix` parameter in `environment.yml`.
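  For reference, `prefix` is usually the last line of `environment.yml` and should point at the conda path on your own machine. The value below is only a placeholder:

  ```yaml
  # Placeholder path: replace <username> (and the env name) with your own.
  prefix: /home/<username>/anaconda3/envs/reviwo
  ```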
- Build the Python environment with the following command:

  ```bash
  conda env create -f environment.yml
  ```
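  Once the build finishes, activate the environment before running any of the commands below. The environment name is taken from the `name:` field of `environment.yml`; `reviwo` here is a placeholder:

  ```bash
  conda activate reviwo
  ```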
- Collect the multi-view data from Metaworld with the following command. Make sure you have installed MuJoCo; we recommend mujoco-210:

  ```bash
  python collect_data/collect_multi_view_data.py
  ```
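  If MuJoCo is not installed yet, a typical mujoco-210 setup on Linux looks like the following. This is a sketch of the standard procedure, not a step taken from this repository:

  ```bash
  # Download MuJoCo 2.1.0 and unpack it where mujoco-py looks by default.
  mkdir -p ~/.mujoco
  wget https://github.com/deepmind/mujoco/releases/download/2.1.0/mujoco210-linux-x86_64.tar.gz
  tar -xzf mujoco210-linux-x86_64.tar.gz -C ~/.mujoco

  # Expose the MuJoCo shared libraries at runtime (add this to your shell rc).
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin

  # Python bindings commonly used with Metaworld.
  pip install mujoco-py
  ```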
- Train the view-invariant encoder by running the command below. The training configs are located at `configs/config.yaml`:

  ```bash
  python tokenizer_main.py
  ```
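  To double-check the configuration before launching training, the YAML file can be loaded and printed directly. This is a minimal sketch using PyYAML; the repository's own config parsing may differ:

  ```python
  import yaml  # pip install pyyaml

  # Print the training configuration that tokenizer_main.py reads.
  with open("configs/config.yaml") as f:
      config = yaml.safe_load(f)

  for key, value in config.items():
      print(f"{key}: {value}")
  ```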
- Collect the single-view data for COMBO with the following command:

  ```bash
  python collect_data/collect_world_model_training_data.py --env_name ${your_metaworld_env_name}
  ```
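  For example, with one of the standard Metaworld tasks (`drawer-open-v2` is just a sample task name; substitute the task you intend to train on):

  ```bash
  python collect_data/collect_world_model_training_data.py --env_name drawer-open-v2
  ```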
- Run COMBO with the following command. A self-trained checkpoint is provided at `checkpoints/multiview_v0/model.pth`, matching the default model config in `configs/config.yaml`. We provide three settings for evaluation (example invocations follow the list):
  - Training View:

    ```bash
    python rl_main.py --env_name ${your_metaworld_env_name} --env_mode "normal"
    ```
  - Novel View (CIP):

    ```bash
    python rl_main.py --env_name ${your_metaworld_env_name} --env_mode "novel" --camera_change ${change_of_azimuth}
    ```
  - Shaking View (CSH):

    ```bash
    python rl_main.py --env_name ${your_metaworld_env_name} --env_mode "shake"
    ```
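For instance, a full evaluation sweep on a sample task might look like this (the task name and azimuth offset are placeholders):

```bash
# Placeholders: substitute your own Metaworld task and azimuth change.
python rl_main.py --env_name drawer-open-v2 --env_mode "normal"
python rl_main.py --env_name drawer-open-v2 --env_mode "novel" --camera_change 30
python rl_main.py --env_name drawer-open-v2 --env_mode "shake"
```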
We would like to thank the authors of OfflineRLKit for their great work and for generously providing the source code, which inspired our work and helped us greatly in the implementation.
If you find our work helpful, please consider citing:
```bibtex
@inproceedings{pang2025reviwo,
  title={Learning View-invariant World Models for Visual Robotic Manipulation},
  author={Pang, Jingcheng and Tang, Nan and Li, Kaiyuan and Tang, Yuting and Cai, Xin-Qiang and Zhang, Zhen-Yu and Niu, Gang and Sugiyama, Masashi and Yu, Yang},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}
```