Data Collection for Visuomotor Diffusion Policy

This document describes the data collection system for the cube stacking project, designed to gather trajectory data for training visuomotor diffusion policies.

Overview

The data collection system captures trajectories generated by the MoveIt2-driven stacking_manager_node as it autonomously performs the cube stacking task. This provides expert demonstrations for training visuomotor policies.

The system records:

Robot joint states (6 arm joints)
Gripper positions (mapped to a 0-100 range)
Camera images (424x240 resolution)

All three streams are sampled together at 10 Hz. Joint angles are saved in degrees, which the project prefers for better model sensitivity.
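
The unit handling described above can be sketched in a few lines. This is a minimal illustration, not the State Logger's actual code: it assumes joint positions arrive in radians (the ROS convention for revolute joints) and that the gripper reading is mapped linearly between two raw limits. The function names and the `open_raw` / `closed_raw` parameters are hypothetical, and rounding to one decimal is only chosen to resemble the example entries.

```python
import math


def joints_to_degrees(positions_rad):
    """Convert joint positions from radians to degrees, rounded to one decimal."""
    return [round(math.degrees(p), 1) for p in positions_rad]


def gripper_to_percent(raw, open_raw, closed_raw):
    """Linearly map a raw gripper reading to 0 (open) .. 100 (closed).

    open_raw and closed_raw are hypothetical limits; the real values
    depend on the gripper joint and are not given in this document.
    """
    span = closed_raw - open_raw
    if span == 0:
        raise ValueError("open_raw and closed_raw must differ")
    percent = (raw - open_raw) / span * 100.0
    # Clamp so readings slightly outside the limits stay in range.
    return round(min(100.0, max(0.0, percent)))
```

For example, `joints_to_degrees([math.pi / 2])` yields `[90.0]`, and a raw reading halfway between the two limits maps to 50.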

System Components

The data collection system consists of two main components:

Trajectory Data Interfaces Package
- Custom ROS 2 interfaces for data collection
- Service definitions for starting and stopping episode recording

Trajectory Data Collector Package
- State Logger Node: records joint states, gripper positions, and camera images
- Provides services for starting and stopping data collection episodes

Data Structure

Data is saved in the following directory structure:

~/mycobot_episodes_degrees/
└── episode_YYYYMMDD_HHMMSS_mmm/
    ├── states.json       # Joint states and gripper positions
    └── frame_dir/        # Camera images
        ├── image_00000.png
        ├── image_00001.png
        └── ...

The states.json file contains an array of entries, each with:

angles: Array of 6 joint angles in degrees
gripper_value: Gripper position mapped to a 0-100 range (0=open, 100=closed)
image: Path to the corresponding image file, relative to the episode directory
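
To make the layout concrete, here is a minimal sketch of a writer that produces this directory structure. It is not the State Logger implementation: the `EpisodeWriter` class and its method names are hypothetical, it takes already-encoded PNG bytes instead of ROS image messages, and it stores `gripper_value` as a single-element list only because the example entry in the Data Format section does.

```python
import json
from pathlib import Path


class EpisodeWriter:
    """Collects samples and writes them as states.json plus frame_dir/ images."""

    def __init__(self, root, episode_name):
        self.episode_dir = Path(root).expanduser() / episode_name
        (self.episode_dir / "frame_dir").mkdir(parents=True, exist_ok=True)
        self.entries = []

    def add_sample(self, angles_deg, gripper_value, png_bytes):
        """Save one frame to disk and remember the matching state entry."""
        if len(angles_deg) != 6:
            raise ValueError("expected 6 joint angles")
        # Frames are numbered in recording order: image_00000.png, image_00001.png, ...
        image_rel = f"frame_dir/image_{len(self.entries):05d}.png"
        (self.episode_dir / image_rel).write_bytes(png_bytes)
        self.entries.append({
            "angles": list(angles_deg),
            "gripper_value": [gripper_value],
            "image": image_rel,
        })

    def close(self):
        """Write states.json once the episode is finished and return its path."""
        path = self.episode_dir / "states.json"
        path.write_text(json.dumps(self.entries, indent=2))
        return path
```

Keeping the image path relative to the episode directory means an episode folder can be moved or copied without rewriting states.json.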

Usage

Launching Data Collection

The recommended way to collect multiple episodes is the collect_multiple_episodes.sh script, located in the root of your ros2_ws workspace.

cd ~/ros2_ws
./collect_multiple_episodes.sh

This script automates the process by:

Allowing configuration of the output directory, total number of episodes, and cube randomization parameters.
Launching collect_data.launch.py for each episode.
Monitoring each episode for completion or critical failures (e.g., planning, gripper, perception issues).
Generating a detailed failure report (FAILED_EPISODE_*.txt) if an episode aborts due to critical errors, preserving the data for manual review.
Recommending the use of check_episodes.py for managing and cleaning up collected data.

This script orchestrates the collect_data.launch.py file, which in turn runs the stacking_manager_node (using MoveIt2 for autonomous stacking) and the state_logger_node (for recording the demonstration).

For detailed behavior of the script, including failure detection patterns and reporting, refer to the comments within collect_multiple_episodes.sh itself.

Manual Single Episode Collection (for Debugging/Testing)

To collect a single episode, for example when testing or debugging, launch collect_data.launch.py directly:

# Single episode (saves to ~/mycobot_episodes_degrees by default)
# This will also use MoveIt2 via stacking_manager_node to perform the task.
ros2 launch mycobot_stacking_project collect_data.launch.py

However, for bulk data collection, collect_multiple_episodes.sh is strongly preferred.

Data Format

The states.json file contains an array of entries in the following format:

[
  {
    "angles": [10.5, -15.2, 30.8, -45.1, 60.3, -75.6],
    "gripper_value": [50],
    "image": "frame_dir/image_00000.png"
  }
]

angles: Joint angles in degrees for the 6 robot joints
gripper_value: Single-element array holding the gripper position, mapped to a 0-100 range (0=open, 100=closed)
image: Path to the corresponding image file, relative to the episode directory
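
A short loader shows how this format might be read back for training or inspection. It is a sketch under the assumptions visible in the example entry (six angles, `gripper_value` as a one-element array, image paths relative to the episode directory); the `Sample` dataclass and `load_episode` function are hypothetical names, not part of the project.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Sample:
    angles: list       # six joint angles in degrees
    gripper: float     # 0 = open, 100 = closed
    image_path: Path   # resolved against the episode directory


def load_episode(episode_dir):
    """Read states.json from one episode directory and validate each entry."""
    episode_dir = Path(episode_dir).expanduser()
    with open(episode_dir / "states.json") as f:
        entries = json.load(f)

    samples = []
    for i, entry in enumerate(entries):
        angles = [float(a) for a in entry["angles"]]
        if len(angles) != 6:
            raise ValueError(f"entry {i}: expected 6 angles, got {len(angles)}")
        # gripper_value is stored as a one-element array in the example above.
        gripper = float(entry["gripper_value"][0])
        if not 0.0 <= gripper <= 100.0:
            raise ValueError(f"entry {i}: gripper value {gripper} outside 0-100")
        samples.append(Sample(angles, gripper, episode_dir / entry["image"]))
    return samples
```

The loader only resolves image paths; decoding the PNG files is left to whichever image library the training code already uses.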

Integration with Stacking Manager

The Stacking Manager Node (from mycobot_stacking_project package), driven by MoveIt2, executes the cube stacking task. During data collection:

It calls the State Logger service to start recording at the beginning of its MoveIt2-planned stacking task.
It calls the State Logger service to stop recording when the MoveIt2 task completes or fails.
It passes a unique episode identifier based on the current timestamp.
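
A timestamp-based identifier matching the episode_YYYYMMDD_HHMMSS_mmm directory pattern could be built as follows. This is an illustrative sketch: the function name `make_episode_id` is hypothetical, and whether the node uses local time or UTC is not stated here, so local time is assumed.

```python
from datetime import datetime


def make_episode_id(now=None):
    """Return an ID such as episode_20240102_030405_678 (date, time, milliseconds)."""
    now = now or datetime.now()  # assumption: local time
    millis = now.microsecond // 1000
    return f"episode_{now:%Y%m%d_%H%M%S}_{millis:03d}"
```

The millisecond suffix keeps identifiers distinct when episodes start within the same second.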

The collect_multiple_episodes.sh script automates running this MoveIt2-driven process multiple times.

Troubleshooting

Common Issues

Launch Errors: Clean up any processes left over from a previous run before launching
Missing Data: Ensure the robot and camera are publishing on the expected topics
Data Location: Episodes are saved to ~/mycobot_episodes_degrees/ with joint angles in degrees

Checking Collected Data

To check if data was collected successfully:

# Check episodes directory
ls -la ~/mycobot_episodes_degrees/

# Validate episodes
python3 check_episodes.py scan --dir ~/mycobot_episodes_degrees

# Check JSON data structure
head -n 10 ~/mycobot_episodes_degrees/episode_*/states.json
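
For a quick programmatic check alongside the commands above, the sketch below compares the number of entries in each states.json with the number of PNG files in frame_dir. It is not a substitute for check_episodes.py, whose behavior is not described here; the `summarize_episodes` function is a hypothetical helper, and the 10 Hz rate is taken from the overview.

```python
import json
from pathlib import Path

SAMPLE_RATE_HZ = 10  # recording rate stated in the overview


def summarize_episodes(root):
    """Return (name, n_entries, n_images, approx_seconds, status) per episode."""
    root = Path(root).expanduser()
    report = []
    for episode_dir in sorted(root.glob("episode_*")):
        states_file = episode_dir / "states.json"
        if not states_file.is_file():
            report.append((episode_dir.name, 0, 0, 0.0, "missing states.json"))
            continue
        entries = json.loads(states_file.read_text())
        n_images = len(list((episode_dir / "frame_dir").glob("image_*.png")))
        # An episode is consistent when it is non-empty and every entry has a frame.
        status = "ok" if entries and len(entries) == n_images else "mismatch"
        seconds = len(entries) / SAMPLE_RATE_HZ
        report.append((episode_dir.name, len(entries), n_images, seconds, status))
    return report
```

Calling `summarize_episodes("~/mycobot_episodes_degrees")` returns one tuple per episode directory; any row whose status is not "ok" is worth inspecting by hand.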

Using Collected Data

The collected data is used to train visuomotor diffusion policy models:

Training: Use the training system in the DP/ directory
Format: Joint angles are stored in degrees for better model sensitivity
Processing: Compatible with PyTorch and standard ML frameworks

License

This data collection system is licensed under the MIT License - see the LICENSE file for details.
