@jadechoghari (Member) commented Nov 3, 2025

What this does

Feat(support): Add Behavior 1k

Example script for loading a single BEHAVIOR-1K task:

python examples/behavior_1k/load_behavior_1k_dataset.py --repo-id lerobot/behavior1k-task0000

BEHAVIOR-1K in LeRobotDataset v3.0 format: https://huggingface.co/collections/lerobot/behavior-1k

michel-aractingi and others added 9 commits October 24, 2025 14:17
Copilot AI review requested due to automatic review settings November 3, 2025 12:25
@jadechoghari added the enhancement and dataset labels Nov 3, 2025
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds support for the BEHAVIOR-1K dataset in LeRobot v3.0 format. The changes introduce scripts for converting BEHAVIOR-1K datasets to v3.0 format, loading the converted datasets, and defining dataset-specific constants.

  • Adds a custom wrapper BehaviorLeRobotDatasetV3 extending LeRobotDataset with BEHAVIOR-1K specific features (modality/camera selection, chunk streaming)
  • Implements conversion utilities to migrate BEHAVIOR-1K datasets from legacy format to LeRobot v3.0
  • Defines BEHAVIOR-1K constants including robot configurations, camera settings, and task mappings

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| examples/behavior_1k/load_behavior_1k_dataset.py | Test script demonstrating dataset loading with various configurations |
| examples/behavior_1k/convert_to_lerobot_v3.py | Conversion script for migrating BEHAVIOR-1K datasets to v3.0 format |
| examples/behavior_1k/behaviour_1k_constants.py | Constants file defining robot types, camera intrinsics, action/proprioception indices, and task mappings |
| examples/behavior_1k/behavior_lerobot_dataset_v3.py | Custom dataset wrapper extending LeRobotDataset with BEHAVIOR-1K specific functionality |

def load_behavior1k_dataset_with_multiple_modalities(repo_id, root):
    """Test loading multiple modalities and cameras."""
    logging.info("\n" + "=" * 80)
    logging.info("Testing multi-modality loading with repo_id: {repo_id}")
Copilot AI Nov 3, 2025

Missing f-string prefix. The variable repo_id won't be interpolated into the log message. Change to f"Testing multi-modality loading with repo_id: {repo_id}".

Suggested change
- logging.info("Testing multi-modality loading with repo_id: {repo_id}")
+ logging.info(f"Testing multi-modality loading with repo_id: {repo_id}")
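
For illustration, a minimal sketch of the difference between the two string literals. The repo id is taken from the example command in the PR description; the variable names plain and formatted are only illustrative and do not appear in the PR.

```python
# Illustrative value, taken from the example command in the PR description.
repo_id = "lerobot/behavior1k-task0000"

# Without the f prefix the braces are ordinary characters and stay in the text.
plain = "Testing multi-modality loading with repo_id: {repo_id}"

# With the f prefix the expression inside the braces is evaluated and inserted.
formatted = f"Testing multi-modality loading with repo_id: {repo_id}"

print(plain)
print(formatted)
```

Only the second message carries the actual repo id, which is what the log line is meant to report.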

Comment on lines +55 to +57
# script to convert one single task to v3.1
# TASK = 1
NEW_ROOT = Path("/fsx/jade_choghari/tmp/bb")
Copilot AI Nov 3, 2025

Hardcoded user-specific path and unused constants should be removed. The NEW_ROOT constant is never used in the code and contains a user-specific path. The comment mentions v3.1 but the code targets v3.0. Remove lines 55-57.

Suggested change
- # script to convert one single task to v3.1
- # TASK = 1
- NEW_ROOT = Path("/fsx/jade_choghari/tmp/bb")

write_info(info, new_root)


def load_jsonlines(fpath: Path) -> list[any]:
Copilot AI Nov 3, 2025

Invalid type hint: lowercase any is the built-in function, not a type. Use Any (capitalized), which must be imported from the typing module: add from typing import Any at the top of the file and use list[Any] as the return type.
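
A minimal sketch of the corrected signature. Only the function name and signature come from the PR; the body shown here is an assumption (one JSON value per non-empty line) and may differ from the actual implementation.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def load_jsonlines(fpath: Path) -> list[Any]:
    # Assumed body: parse one JSON value per non-empty line of the file.
    with open(fpath, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Usage with a throwaway file; the records are made up for the example.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.jsonl"
    sample.write_text('{"episode_index": 0}\n{"episode_index": 10}\n', encoding="utf-8")
    records = load_jsonlines(sample)
    print(records)
```

With list[Any], a type checker treats the elements as arbitrary parsed JSON values, whereas list[any] refers to the built-in any() function and is rejected by type checkers.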

task_dir_name = f"task-00{task_index}"
videos_dir = root / "videos" / task_dir_name / video_key
ep_paths = sorted(videos_dir.glob("*.mp4"))
print("ep_paths", ep_paths)
Copilot AI Nov 3, 2025

Debug print statement should be removed or replaced with proper logging. Use logging.debug(f"Episode paths: {ep_paths}") instead.

Suggested change
- print("ep_paths", ep_paths)
+ logging.debug(f"Episode paths: {ep_paths}")
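
As a rough sketch of why a debug-level log is preferable to print here: the message is emitted only when the logger is configured for DEBUG, so normal runs stay quiet. The logger name, helper function, and file names below are hypothetical and not part of the PR.

```python
import io
import logging
from pathlib import Path

# Hypothetical logger name, used only for this example.
logger = logging.getLogger("behavior_1k_conversion")


def log_episode_paths(ep_paths: list[Path]) -> None:
    # Lazy %-style arguments are formatted only if the record is actually emitted.
    logger.debug("Episode paths: %s", ep_paths)


# Capture output in memory so the effect of the log level is easy to inspect.
stream = io.StringIO()
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

# Made-up file names for the example.
paths = [Path("episode_00000010.mp4"), Path("episode_00000020.mp4")]

logger.setLevel(logging.INFO)
log_episode_paths(paths)
output_at_info = stream.getvalue()  # nothing is written at INFO level

logger.setLevel(logging.DEBUG)
log_episode_paths(paths)
output_at_debug = stream.getvalue()  # the message appears at DEBUG level
```

The same call site therefore serves both normal runs and debugging sessions, which a bare print cannot do.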

},
}

# Camera resolutions and corresponding intrinstics
Copilot AI Nov 3, 2025

Corrected spelling of 'intrinstics' to 'intrinsics'.

Suggested change
- # Camera resolutions and corresponding intrinstics
+ # Camera resolutions and corresponding intrinsics

Comment on lines +552 to +553
import argparse
from pathlib import Path
Copilot AI Nov 3, 2025

Duplicate imports: argparse and Path are already imported at the top of the file (lines 18 and 22). Remove these duplicate imports from the if __name__ == "__main__": block.

Suggested change
- import argparse
- from pathlib import Path

ep_ids_set.add(ep_video["episode_index"])
# we skip this check because ep_ids have a step of 10, whereas we convert with a step of 1
# if len(ep_ids_set) != 1:
# raise ValueError(f"Number of episodes is not the same ({ep_ids_set}).")
Copilot AI Nov 3, 2025

This comment appears to contain commented-out code.

Suggested change
- # raise ValueError(f"Number of episodes is not the same ({ep_ids_set}).")
