feat(datasets): add lerobot-dataset-quality to flag outlier episodes #3761
Closed
maeste wants to merge 1 commit into
Adds a read-only analysis tool that computes deterministic per-episode quality metrics from recorded actions (duration, median/p95 jerk, peak velocity, static fraction, end-pose consistency) and flags statistical outliers via the IQR rule. Complements lerobot-dataset-viz for datasets too large to review episode by episode, and feeds candidate episodes to lerobot-edit-dataset --operation.type=delete_episodes.

- src/lerobot/scripts/lerobot_dataset_quality.py: core metrics + CLI
- pyproject.toml: lerobot-dataset-quality entry point
- tests/scripts/test_dataset_quality.py: unit tests for metrics, outlier detection, episode grouping and the end-to-end report
- docs/source/using_dataset_tools.mdx: user-facing documentation

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This was referenced Jun 10, 2026
Summary / Motivation
Once a teleoperation dataset grows past a few dozen episodes, reviewing each one in lerobot-dataset-viz is impractical, so bad demonstrations (long, struggling attempts, sharp corrections, hesitations, wrong end poses) silently stay in the training set. This PR adds lerobot-dataset-quality, a read-only CLI that computes deterministic per-episode metrics from the recorded actions and flags statistical outliers, turning "watch 200 episodes" into "watch these 8". The flagged list feeds directly into the existing lerobot-dataset-viz → lerobot-edit-dataset --operation.type delete_episodes workflow.
Related issues
What changed
- src/lerobot/scripts/lerobot_dataset_quality.py: new analysis tool. Core logic is in pure, importable functions (compute_episode_metrics, detect_outliers, evaluate_dataset_quality) with a thin argparse main() following the lerobot-dataset-viz CLI conventions (--repo-id, --root). Only the action column is read, with no image/video decoding, so it runs in seconds.
- Per-episode metrics: n_frames/duration_s, median_jerk and p95_jerk (smoothness vs. isolated spikes), max_velocity, static_fraction (hesitations), and final-pose distance from the across-episode mean. Outliers are flagged with the IQR rule (--k-iqr, default 1.5).
- pyproject.toml: new lerobot-dataset-quality entry point.
- tests/scripts/test_dataset_quality.py: 11 unit tests on synthetic trajectories with known properties (smooth vs. spiked vs. holding), outlier detection (uniform set → no flags; long episode → duration_high; divergent end pose → final_state_high), episode grouping/sorting, and the end-to-end report including error paths.
- docs/source/using_dataset_tools.mdx: user-facing documentation section.

No breaking changes; purely additive and read-only.
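For readers unfamiliar with the IQR rule, here is a minimal standalone sketch of how a single per-episode metric could be screened with it. It is not the PR's detect_outliers: the function name iqr_outliers, its return shape, and the choice of the "inclusive" quantile method are assumptions made for illustration, and only the default k = 1.5 is taken from the description above.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return {index: "low" | "high"} for values outside the IQR fences.

    Hypothetical helper for illustration only; needs at least two values.
    """
    # Quartiles Q1 and Q3 ("inclusive" treats the data as the full population).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr

    flags = {}
    for i, v in enumerate(values):
        if v < lower:
            flags[i] = "low"
        elif v > upper:
            flags[i] = "high"
    return flags


# Made-up episode durations in seconds; episode 6 is a long attempt.
durations_s = [10.0, 10.5, 9.8, 10.2, 10.1, 9.9, 25.0]
flagged = iqr_outliers(durations_s)
print(flagged)  # {6: 'high'}
```

Because the fences are built from quartiles, not from the mean and standard deviation, one extreme episode barely moves them. A uniform set of episodes gives an IQR of zero and produces no flags.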
How was this tested (or how to run locally)
- tests/scripts/test_dataset_quality.py: uv run pytest tests/scripts/test_dataset_quality.py -v (11 passed)
- pre-commit run passes on all touched files (ruff, mypy, bandit, typos, prettier).
- lerobot-dataset-viz (over-long struggled attempts and episodes ending in the wrong pose).
- lerobot-dataset-quality --repo-id lerobot/pusht (any dataset with an action feature works) or add --output-format json for raw numbers.

Checklist (required before merge)
- pre-commit run -a
- pytest

Reviewer notes
- median_jerk vs p95_jerk is deliberate: the median is robust to isolated corrective spikes, while the p95 is there to catch exactly those spikes.
- The tool suggests a lerobot-edit-dataset delete command but never modifies anything itself; the docs stress reviewing flagged episodes visually before deleting.

🤖 Generated with Claude Code
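The median vs. p95 distinction in the reviewer notes can be illustrated with a small self-contained sketch. It assumes jerk is estimated as the third finite difference of the action sequence and summarised by its per-frame Euclidean norm. The helper names (finite_diff, jerk_magnitudes, percentile), the 30 fps rate, and the synthetic trajectory are all hypothetical and may differ from what the PR's compute_episode_metrics actually does.

```python
import math
import statistics


def finite_diff(rows, dt):
    """First finite difference of a list of equal-length vectors, divided by dt."""
    return [[(b - a) / dt for a, b in zip(prev, cur)]
            for prev, cur in zip(rows, rows[1:])]


def jerk_magnitudes(actions, fps):
    """Per-frame jerk norm from three successive finite differences."""
    dt = 1.0 / fps
    jerk = finite_diff(finite_diff(finite_diff(actions, dt), dt), dt)
    return [math.hypot(*row) for row in jerk]


def percentile(values, q):
    """Linearly interpolated percentile, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Synthetic 1-D episode: a constant-velocity ramp with two abrupt corrections.
actions = [[0.01 * t] for t in range(60)]
actions[20][0] += 0.5
actions[40][0] += 0.5

jerk = jerk_magnitudes(actions, fps=30)
median_jerk = statistics.median(jerk)  # stays near zero: most frames are smooth
p95_jerk = percentile(jerk, 95)        # dominated by the two corrections
print(f"median={median_jerk:.3g}, p95={p95_jerk:.3g}")
```

In this sketch each one-frame correction disturbs four consecutive jerk samples, so two corrections affect 8 of the 57 samples: enough to dominate the 95th percentile while leaving the median essentially untouched. A single correction in a much longer episode would affect fewer than 5% of samples and would not show up in a p95 computed this way.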