Skip to content

Latest commit

 

History

History
164 lines (125 loc) · 8.97 KB

File metadata and controls

164 lines (125 loc) · 8.97 KB

Robot Data Preparation Guide

Overview

This guide shows how to convert your robot data to work with our flavor of the LeRobot dataset V2 format (LeRobot docs) -- GR00T LeRobot. While we have added additional structure, our schema maintains full compatibility with the upstream LeRobot v2. The additional metadata and structure allow for more detailed specification and language annotations for your robot data.

The TLDR: Add a meta/modality.json file to your LeRobot v2 dataset and follow the schema below.

LeRobot v2 Requirements

If you already have a dataset in the LeRobot v2 format, you can skip this section.

If you have a dataset in the LeRobot v3.0 format, please use this script to convert it to the LeRobot v2 format.

Why LeRobot v2? GR00T currently uses the LeRobot v2 data format because many upstream datasets (DROID, LIBERO, Bridge, etc.) are published in v2. We plan to support both v2 and v3 formats natively in a future release. For now, please convert v3 datasets to v2 using the script above.

If you have a dataset in another format, please convert it to the LeRobot v2 format satisfying the following requirements.

Structure Requirements

The folder should follow a similar structure as below and contain these core folders and files:

.
├─meta 
│ ├─episodes.jsonl
│ ├─modality.json # -> GR00T LeRobot specific
│ ├─info.json
│ └─tasks.jsonl
├─videos
│ └─chunk-000
│   └─observation.images.ego_view
│     └─episode_000001.mp4
│     └─episode_000000.mp4
└─data
  └─chunk-000
    ├─episode_000001.parquet
    └─episode_000000.parquet

Video Observations (video/chunk-*)

The videos folder will contain the mp4 files associated with each episode following episode_00000X.mp4 naming format where X indicates the episode number. Requirements:

  • Must be stored as MP4 files.
  • Should be named using the format: observation.images.<video_name>

Data (data/chunk-*)

The data folder will contain all of the parquet files associated with each episode following episode_00000X.parquet naming format where X indicates the episode number. Each parquet file will contain:

  • State information: stored as observation.state which is a 1D concatenated array of all state modalities.
  • Action: stored as action which is a 1D concatenated array of all action modalities.
  • Timestamp: stored as timestamp which is a float point number of the starting time.
  • Annotations: stored as annotation.<annotation_source>.<annotation_type>(.<annotation_name>) (see the annotation field in the example configuration for example naming.). No other columns should have the annotation prefix, see the (multiple-annotation-support) if interested in adding multiple annotations.

Example Parquet File

Here is a sample of the cube_to_bowl dataset that is present in the demo_data directory.

{
    "observation.state":[-0.01,...,0],       // 1D array: all state modalities concatenated per modality.json order
    "action":[-0.010,...,0],                 // 1D array: all action modalities concatenated per modality.json order
    "timestamp":0.049,                       // float: wall-clock time of this observation (seconds)
    "annotation.human.action.task_description":0,  // int: index into meta/tasks.jsonl for the language instruction
    "task_index":0,                          // int: task identifier (same as annotation index for single-task)
    "annotation.human.validity":1,           // int: index into meta/tasks.jsonl for validity label
    "episode_index":0,                       // int: which episode this frame belongs to
    "index":0,                               // int: global frame index across all episodes in the dataset
    "next.reward":0,                         // float: reward at the next timestep (0 if unused)
    "next.done":false                        // bool: true if this is the last frame of the episode
}

Meta

  • episodes.jsonl contains a list of all the episodes in the entire dataset. Each episode contains a list of tasks and the length of the episode.
  • tasks.jsonl contains a list of all the tasks in the entire dataset.
  • info.json contains the dataset information.

meta/tasks.jsonl

Here is a sample of the meta/tasks.jsonl file that contains the task descriptions.

{"task_index": 0, "task": "pick the squash from the counter and place it in the plate"}
{"task_index": 1, "task": "valid"}

You can refer the task index in the parquet file to get the task description. So in this case, the annotation.human.action.task_description for the first observation is "pick the squash from the counter and place it in the plate" and annotation.human.validity is "valid".

tasks.jsonl contains a list of all the tasks in the entire dataset.

meta/episodes.jsonl

Here is a sample of the meta/episodes.jsonl file that contains the episode information.

{"episode_index": 0, "tasks": [...], "length": 416}
{"episode_index": 1, "tasks": [...], "length": 470}

episodes.jsonl contains a list of all the episodes in the entire dataset. Each episode contains a list of tasks and the length of the episode.

GR00T LeRobot Specific Requirements

The meta/modality.json Configuration

We require an additional metadata file meta/modality.json that is not present in the standard LeRobot format. This file provides detailed metadata about state and action modalities, enabling:

  • Separate Data Storage and Interpretation:
    • State and Action: Stored as concatenated float32 arrays. The modality.json file supplies the metadata necessary to interpret these arrays as distinct, fine-grained fields.
    • Video: Stored as separate files, with the configuration file allowing them to be renamed to a standardized format.
    • Annotations: Keeps track of all annotation fields. If there are no annotations, do not include the annotation field in the configuration file.
  • Fine-Grained Splitting: Divides the state and action arrays into more semantically meaningful fields.
  • Clear Mapping: Explicit mapping of data dimensions.
  • Sophisticated Data Transformations: Supports field-specific normalization and rotation transformations during training.

Schema

{
    "state": {
        "<state_key>": {
            "start": <int>,         // Starting index in the state array
            "end": <int>            // Ending index in the state array
        }
    },
    "action": {
        "<action_key>": {
            "start": <int>,         // Starting index in the action array
            "end": <int>            // Ending index in the action array
        }
    },
    "video": {
        "<new_key>": {
            "original_key": "<original_video_key>"
        }
    },
    "annotation": {
        "<annotation_key>": {}  // Empty dictionary to maintain consistency with other modalities
    }
}

Example

For a concrete example of modality.json and the full dataset structure, see the publicly available datasets on HuggingFace: nvidia/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim.

You can also find a working example in the included demo data at demo_data/cube_to_bowl_5/meta/modality.json.

Notes

  • All indices are zero-based and follow Python's array slicing convention ([start:end]).

GR00T LeRobot Extensions to Standard LeRobot

GR00T LeRobot is a flavor of the standard LeRobot format with more opinionated requirements:

  • We will compute meta/stats.json and meta/relative_stats.json for each dataset, and store them in the meta folder.
  • Proprioceptive states must always be included in the "observation.state" keys.
  • We support multi-channel annotation formats (e.g., coarsegrained, finetuned), allowing users to add as many annotation channels as needed via the annotation.<annotation_source>.<annotation_type> key.
  • We require an additional metadata file meta/modality.json that is not present in the standard LeRobot format.

Multiple Annotation Support

To support multiple annotations within a single parquet file, users may add extra columns to the parquet file. Users should treat these columns the same way as the task_index column in the original LeRobot v2 dataset:

In LeRobot v2, actual language descriptions are stored in a row of the meta/tasks.jsonl file, while the parquet file stores only the corresponding index in the task_index column. We follow the same convention and store the corresponding index for each annotation in the annotation.<annotation_source>.<annotation_type> column. Although the task_index column may still be used for the default annotation, a dedicated column annotation.<annotation_source>.<annotation_type> is required to ensure it is loadable by our custom data loader.