InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation

Sirui Xu*, Dongting Li*, Yucheng Zhang*, Xiyan Xu*, Qi Long*, Ziyin Wang*, Yunzhi Lu, Shuchang Dong, Hezi Jiang, Akshat Gupta, Yu-Xiong Wang, Liang-Yan Gui
University of Illinois Urbana-Champaign
*Equal contribution
CVPR 2025

News

  • [2025-04-20] Initial release of the InterAct dataset

TODO

  • Release comprehensive text descriptions, data processing workflows, visualization tools, and usage guidelines
  • Publish the paper on arXiv
  • Release the evaluation pipeline for the benchmark
  • Release the dataset with unified SMPL representation
  • Release HOI correction and augmentation data and pipeline
  • Release retargeted HOI dataset with unified human shape
  • Release baseline constructions for HOI generative tasks

General Description

We introduce InterAct, a comprehensive large-scale 3D human-object interaction (HOI) dataset. Originally comprising 21.81 hours of HOI data consolidated from diverse sources, the dataset is meticulously refined by correcting contact artifacts and augmented with varied motion patterns, extending the total duration to approximately 30 hours. It also includes 34.1K sequence-level detailed text descriptions.

Dataset Download

The InterAct dataset is consolidated according to the licenses of its original data sources. For data approved for redistribution, direct download links are provided; for others, we supply processing code to convert the raw data into our standardized format.

Please follow the steps below to download, process, and organize the data.

1. Request authorization

Please fill out this form to request non-commercial access to InterAct. Once authorized, you'll receive the download links. Organize the data from neuraldome, imhd, and chairs according to the following directory structure.

data
├── neuraldome
│   ├── objects
│   │   ├── baseball
│   │   │   ├── baseball.obj             # object mesh
│   │   │   └── sample_points.npy        # sampled object pointcloud
│   │   └── ...
│   ├── objects_bps
│   │   ├── baseball
│   │   │   └── baseball.npy             # static bps representation
│   │   └── ...
│   ├── sequences
│   │   ├── subject01_baseball_0
│   │   │   ├── action.npy
│   │   │   ├── action.txt
│   │   │   ├── human.npz
│   │   │   ├── markers.npy
│   │   │   ├── joints.npy
│   │   │   ├── motion.npy
│   │   │   ├── object.npz
│   │   │   └── text.txt
│   │   └── ...
│   └── sequences_canonical
│       ├── subject01_baseball_0
│       │   ├── action.npy
│       │   ├── action.txt
│       │   ├── human.npz
│       │   ├── markers.npy
│       │   ├── joints.npy
│       │   ├── motion.npy
│       │   ├── object.npz
│       │   └── text.txt
│       └── ...
├── imhd
├── chairs
└── annotations
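
After downloading, a quick script like the one below can confirm that each sequence folder contains the expected files. This is only an illustrative sketch, not part of the official tooling; the subset names, folder names, and file list are taken from the tree above.

    # check_layout.py -- illustrative sanity check, not part of the official tooling
    from pathlib import Path

    DATA_ROOT = Path("data")                             # adjust if your data lives elsewhere
    SUBSETS = ["neuraldome", "imhd", "chairs"]           # subsets obtained via the request form
    EXPECTED = ["human.npz", "object.npz", "text.txt"]   # minimal per-sequence files from the tree above

    for subset in SUBSETS:
        for split in ["sequences", "sequences_canonical"]:
            seq_root = DATA_ROOT / subset / split
            if not seq_root.is_dir():
                print(f"[missing] {seq_root}")
                continue
            for seq in sorted(p for p in seq_root.iterdir() if p.is_dir()):
                missing = [f for f in EXPECTED if not (seq / f).exists()]
                if missing:
                    print(f"[incomplete] {seq.relative_to(DATA_ROOT)}: missing {missing}")
    print("layout check finished")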

2. Process from scratch

The GRAB, BEHAVE, and InterCap datasets are available for academic research under custom licenses from the Max Planck Institute for Intelligent Systems. Note that we do not distribute the original motion data; instead, we provide the text labels annotated by our team. To download these datasets, please visit their respective websites and agree to the terms of their licenses.

Please follow these steps to get started:
  1. Download SMPL+H, SMPL-X, and DMPL models.

    Download the SMPL+H model from SMPL+H (choose the Extended SMPL+H model used in the AMASS project), the DMPL model from DMPL (choose DMPLs compatible with SMPL), and the SMPL-X model from SMPL-X. Then place all the models under ./models/. The ./models/ folder tree should be:

    models
    ├── smplh
    │   ├── female
    │   │   └── model.npz
    │   ├── male
    │   │   └── model.npz
    │   ├── neutral
    │   │   └── model.npz
    │   ├── SMPLH_FEMALE.pkl
    │   └── SMPLH_MALE.pkl
    └── smplx
        ├── SMPLX_FEMALE.npz
        ├── SMPLX_FEMALE.pkl
        ├── SMPLX_MALE.npz
        ├── SMPLX_MALE.pkl
        ├── SMPLX_NEUTRAL.npz
        └── SMPLX_NEUTRAL.pkl
    

    Please follow smplx tools to merge SMPL-H and MANO parameters.
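
    To verify that the merged models load correctly, a minimal sketch along these lines can be used. It assumes the smplx Python package is installed (see the environment setup in the next step) and follows the ./models/ layout above; it is not the repository's own loading code.

    import torch
    import smplx

    # Load the merged SMPL-H model from the ./models/ tree above.
    model = smplx.create(
        model_path="models",          # root folder containing smplh/ and smplx/
        model_type="smplh",
        gender="male",
        use_pca=False,                # full hand pose instead of PCA components
    )

    # Run a forward pass with a neutral pose and shape.
    output = model(
        body_pose=torch.zeros(1, 63),   # 21 body joints x 3 axis-angle values
        betas=torch.zeros(1, 10),
    )
    print(output.vertices.shape)        # expected: (1, 6890, 3)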

  2. Prepare Environment

  • Option A: From environment.yml

    Create the Conda environment:

    conda env create -f environment.yml

    To install PyTorch3D, please follow the official instructions: PyTorch3D.

    Install remaining packages:

    pip install git+https://github.com/otaheri/chamfer_distance
    pip install git+https://github.com/otaheri/bps_torch
    python -m spacy download en_core_web_sm
    
  • Option B: Manual setup

    Create and activate a fresh environment:

    conda create -n interact python=3.8
    conda activate interact
    pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118

    To install PyTorch3D, please follow the official instructions: PyTorch3D.

    Install remaining packages:

    pip install -r requirements.txt
    python -m spacy download en_core_web_sm
    
  3. Prepare raw data
  • BEHAVE

    Download the motion data from this link, and put them into ./data/behave/sequences. Download object data from this link, and put them into ./data/behave/objects.

    Expected File Structure:

    data/behave/
    ├── sequences
    │   └── data_name
    │       ├── object_fit_all.npz        # object's pose sequences
    │       └── smpl_fit_all.npz          # human's pose sequences
    └── objects
        └── object_name
            ├── object_name.jpg       # one photo of the object
            ├── object_name.obj       # reconstructed 3D scan of the object
            ├── object_name.obj.mtl   # mesh material property
            ├── object_name_tex.jpg   # mesh texture
            └── object_name_fxxx.ply  # simplified object mesh
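
    To confirm the download, the archives can be inspected directly with numpy. The sketch below simply lists whatever keys they contain (data_name is the placeholder from the tree above); it is illustrative only.

    import numpy as np

    seq_dir = "data/behave/sequences/data_name"      # placeholder sequence folder
    obj_fit = np.load(f"{seq_dir}/object_fit_all.npz", allow_pickle=True)
    smpl_fit = np.load(f"{seq_dir}/smpl_fit_all.npz", allow_pickle=True)

    # Print every stored array with its shape and dtype (key names are not assumed here).
    for name, archive in [("object", obj_fit), ("smpl", smpl_fit)]:
        for key in archive.files:
            print(name, key, archive[key].shape, archive[key].dtype)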
  • OMOMO

    Download the dataset from this link.

    Expected File Structure:

    data/omomo/raw
    ├── omomo_text_anno_json_data               # Annotation JSON data
    ├── captured_objects
    │   └── object_name_cleaned_simplified.obj  # Simplified object mesh
    ├── test_diffusion_manip_seq_joints24.p     # Test sequences
    └── train_diffusion_manip_seq_joints24.p    # Train sequences
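
    The .p files are serialized Python objects. A short sketch like the following can be used to peek at them; joblib is assumed here (plain pickle is the fallback), and the field names are printed rather than assumed.

    import joblib

    data = joblib.load("data/omomo/raw/train_diffusion_manip_seq_joints24.p")
    print(type(data), len(data))

    # Look at one entry to see which fields are available.
    first = next(iter(data.values())) if isinstance(data, dict) else data[0]
    print(first.keys() if hasattr(first, "keys") else type(first))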
  • InterCap

    Download InterCap from the project website. Please download the version with "new results via newly trained LEMO hand models".

    Expected File Structure:

    data/intercap/raw
    ├── 01
    │   └── 01
    │       └── Seg_id
    │           ├── res.pkl                    # Human and Object Motion
    │           └── Mesh
    │               └── 00000_second_obj.ply   # Object mesh
    └── ...
  • GRAB

    Download GRAB from the project website.

    Expected File Structure:

    data/grab/raw
    ├── grab
    │   ├── s1
    │   │   └── seq_name.npz      # Human and Object Motion
    │   └── ...
    └── tool
        ├── object_meshes         # Object mesh
        ├── object_settings
        ├── subject_meshes        # Subject mesh
        └── subject_settings
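
    A raw GRAB sequence can be inspected in the same way; the path below follows the tree above, with seq_name as a placeholder, and this is only an illustrative sketch.

    import numpy as np

    seq = np.load("data/grab/raw/grab/s1/seq_name.npz", allow_pickle=True)
    # List every stored field with its shape and dtype (names are printed, not assumed).
    for key in seq.files:
        print(key, seq[key].shape, seq[key].dtype)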
  4. Data Processing

After organizing the raw data, execute the following steps to process the datasets into our standard representations.

  • Run the processing scripts for each dataset:

    python process/process_behave.py
    python process/process_grab.py
    python process/process_intercap.py
    python process/process_omomo.py
  • Canonicalize the object mesh:

    python process/canonicalize_obj.py
    
  • Segment the sequences according to annotations and generate associated text files:

    python process/process_text.py
    python process/process_text_omomo.py

    After processing, the directory structure under data/ should include all sub-datasets:

    data
    ├── annotation
    ├── behave
    │   ├── objects
    │   │   └── object_name
    │   │       └── object_name.obj
    │   └── sequences
    │       └── id
    │           ├── human.npz
    │           ├── object.npz
    │           └── text.txt
    ├── omomo
    │   ├── objects
    │   │   └── object_name
    │   │       └── object_name.obj
    │   └── sequences
    │       └── id
    │           ├── human.npz
    │           ├── object.npz
    │           └── text.txt
    ├── intercap
    │   ├── objects
    │   │   └── object_name
    │   │       └── object_name.obj
    │   └── sequences
    │       └── id
    │           ├── human.npz
    │           ├── object.npz
    │           └── text.txt
    └── grab
        ├── objects
        │   └── object_name
        │       └── object_name.obj
        └── sequences
            └── id
                ├── human.npz
                ├── object.npz
                └── text.txt
    
    
  • Canonicalize the human data by running:

    python process/canonicalize_human.py
    
    # or multi_thread for speedup
    python process/canonicalize_human_multi_thread.py
  • Sample object keypoints (see the combined sampling/BPS sketch after this list):

    python process/sample_obj.py
  • Extract motion representations:

    python process/motion_representation.py  
  • Process the object bps for training:

    python process/process_bps.py
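
    The scripts above handle the sampling and BPS steps. Purely as a conceptual illustration of what object keypoint sampling and a basis point set (BPS) encoding do, here is a from-scratch sketch with placeholder paths; it uses trimesh for surface sampling and plain numpy for the encoding, and it is not the repository's implementation.

    import numpy as np
    import trimesh

    # 1) Sample a point cloud from a canonicalized object mesh (cf. process/sample_obj.py).
    mesh = trimesh.load("data/behave/objects/object_name/object_name.obj", force="mesh")
    points, _ = trimesh.sample.sample_surface(mesh, 1024)

    # Center the cloud and scale it into the unit ball so the basis points cover it.
    points = points - points.mean(axis=0)
    points = points / np.linalg.norm(points, axis=1).max()

    # 2) Encode it against a fixed, random set of basis points (cf. process/process_bps.py).
    rng = np.random.default_rng(0)
    basis = rng.normal(size=(1024, 3))
    basis /= np.linalg.norm(basis, axis=1, keepdims=True)       # directions on the unit sphere
    basis *= rng.uniform(0.0, 1.0, size=(1024, 1)) ** (1 / 3)   # radii uniform in the unit ball

    # Distance from each basis point to its nearest surface sample: a fixed-length descriptor.
    dists = np.linalg.norm(basis[:, None, :] - points[None, :, :], axis=-1).min(axis=1)
    print(dists.shape)   # (1024,) regardless of the mesh size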

Data Loading

To load and explore our data, please refer to the demo notebook.
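
The demo notebook is the reference entry point. For a quick look without the notebook, a sequence can also be opened directly with numpy, as in the sketch below; the path uses the canonical sequence from the directory tree above, and the field names are listed rather than assumed.

    import numpy as np

    seq_dir = "data/neuraldome/sequences_canonical/subject01_baseball_0"

    human = np.load(f"{seq_dir}/human.npz", allow_pickle=True)
    obj = np.load(f"{seq_dir}/object.npz", allow_pickle=True)
    markers = np.load(f"{seq_dir}/markers.npy")
    with open(f"{seq_dir}/text.txt") as f:
        text = f.read().strip()

    print("human fields:", human.files)
    print("object fields:", obj.files)
    print("markers:", markers.shape)
    print("description:", text)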

Visualization

To visualize the dataset, execute the following steps:

  1. Run the visualization script:

    python visualization/visualize.py [dataset_name]

    Replace [dataset_name] with one of the following: behave, neuraldome, intercap, omomo, grab, imhd, chairs.

  2. To visualize markers, run:

    python visualization/visualize_markers.py
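
    For a quick standalone look at the markers without the full visualizer, a matplotlib scatter plot of a single frame can be used, assuming markers.npy stores per-frame 3D marker positions of shape (T, M, 3). This is only a rough sketch, not the repository's renderer.

    import numpy as np
    import matplotlib.pyplot as plt

    markers = np.load("data/neuraldome/sequences_canonical/subject01_baseball_0/markers.npy")
    frame = markers[0]                                  # first frame, shape (M, 3)

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(frame[:, 0], frame[:, 1], frame[:, 2], s=5)
    ax.set_title("markers, frame 0")
    plt.show()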

Citation

If you find this repository useful for your work, please cite:

@inproceedings{xu2025interact,
    title     = {{InterAct}: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation},
    author    = {Xu, Sirui and Li, Dongting and Zhang, Yucheng and Xu, Xiyan and Long, Qi and Wang, Ziyin and Lu, Yunzhi and Dong, Shuchang and Jiang, Hezi and Gupta, Akshat and Wang, Yu-Xiong and Gui, Liang-Yan},
    booktitle = {CVPR},
    year      = {2025},
}

Please also consider citing the specific sub-dataset you used from InterAct as follows:

@inproceedings{taheri2020grab,
    title     = {{GRAB}: A Dataset of Whole-Body Human Grasping of Objects},
    author    = {Taheri, Omid and Ghorbani, Nima and Black, Michael J. and Tzionas, Dimitrios},
    booktitle = {ECCV},
    year      = {2020},
}

@inproceedings{brahmbhatt2019contactdb,
    title     = {{ContactDB}: Analyzing and Predicting Grasp Contact via Thermal Imaging},
    author    = {Brahmbhatt, Samarth and Ham, Cusuh and Kemp, Charles C. and Hays, James},
    booktitle = {CVPR},
    year      = {2019},
}

@inproceedings{bhatnagar2022behave,
    title     = {{BEHAVE}: Dataset and Method for Tracking Human Object Interactions},
    author    = {Bhatnagar, Bharat Lal and Xie, Xianghui and Petrov, Ilya and Sminchisescu, Cristian and Theobalt, Christian and Pons-Moll, Gerard},
    booktitle = {CVPR},
    year      = {2022},
}

@article{huang2024intercap, 
    title     = {{InterCap}: Joint Markerless {3D} Tracking of Humans and Objects in Interaction from Multi-view {RGB-D} Images}, 
    author    = {Huang, Yinghao and Taheri, Omid and Black, Michael J. and Tzionas, Dimitrios}, 
    journal   = {IJCV}, 
    year      = {2024}
}

@inproceedings{huang2022intercap,
    title     = {{InterCap}: {J}oint Markerless {3D} Tracking of Humans and Objects in Interaction},
    author    = {Huang, Yinghao and Taheri, Omid and Black, Michael J. and Tzionas, Dimitrios},
    booktitle = {GCPR},
    year      = {2022}, 
}

@inproceedings{jiang2023full,
    title     = {Full-Body Articulated Human-Object Interaction},
    author    = {Jiang, Nan and Liu, Tengyu and Cao, Zhexuan and Cui, Jieming and Zhang, Zhiyuan and Chen, Yixin and Wang, He and Zhu, Yixin and Huang, Siyuan},
    booktitle = {ICCV},
    year      = {2023}
}

@inproceedings{zhang2023neuraldome,
    title     = {{NeuralDome}: A Neural Modeling Pipeline on Multi-View Human-Object Interactions},
    author    = {Zhang, Juze and Luo, Haimin and Yang, Hongdi and Xu, Xinru and Wu, Qianyang and Shi, Ye and Yu, Jingyi and Xu, Lan and Wang, Jingya},
    booktitle = {CVPR},
    year      = {2023},
}

@article{li2023object,
    title     = {Object Motion Guided Human Motion Synthesis},
    author    = {Li, Jiaman and Wu, Jiajun and Liu, C Karen},
    journal   = {ACM Trans. Graph.},
    year      = {2023}
}

@inproceedings{zhao2024imhoi,
    author    = {Zhao, Chengfeng and Zhang, Juze and Du, Jiashen and Shan, Ziwei and Wang, Junye and Yu, Jingyi and Wang, Jingya and Xu, Lan},
    title     = {{I'M HOI}: Inertia-aware Monocular Capture of 3D Human-Object Interactions},
    booktitle = {CVPR},
    year      = {2024},
}