Conversation

@matthewtrepte (Contributor)

Description

Updates the training benchmarks used by the unit tests (fast mode) and by nightly regression testing (full mode).
Nightly mode is tracked here.

Changes

  • Add new environments and remove the deprecated environment
  • Fix sb3 workflow
  • Update timestamp to capture beginning of training, not end

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions bot added the isaac-lab label (Related to Isaac Lab team) on Dec 26, 2025
greptile-apps bot (Contributor) commented on Dec 26, 2025

Greptile Summary

Updates training benchmarks for unit tests and nightly regression testing. Adds new environments (OpenArm manipulation tasks, Dexsuite Kuka-Allegro, AutoMate/Forge tasks, additional velocity locomotion variants, and LocoManip), removes deprecated Isaac-Navigation-Flat-Anymal-C-v0, and reorganizes config entries with descriptive task categories. Fixes sb3 workflow support by adding proper log paths and metric tags (rollout/ep_rew_mean, rollout/ep_len_mean). Improves timestamp accuracy by capturing at session start rather than finish. Replaces carb tokens dependency with custom repository path resolution. Enhances error reporting with specific threshold violation messages and better validation checks.

Key changes:

  • Added 3 new fast mode test entries for sb3, rl_games, and rsl_rl workflows
  • Added 20+ new environment configurations in full mode across manipulation, locomotion, and factory tasks
  • Fixed timestamp to capture training start time instead of end time
  • Added sb3 workflow support throughout the benchmarking infrastructure (a log-parsing sketch follows this list)
  • Improved error handling with detailed failure messages
  • Removed carb dependency for better portability
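
The sb3 additions described above come down to reading Stable-Baselines3's standard TensorBoard scalars from the run's log directory. Below is a minimal sketch of what such an extraction could look like, assuming the runs write TensorBoard event files; the function and argument names are illustrative only and are not the PR's actual helpers (which the diagram names _parse_tf_logs and _extract_log_val).

```python
# Minimal sketch: pull the last value of an sb3 metric from a TensorBoard log directory.
# The tag names are the standard Stable-Baselines3 scalars cited in the summary; the
# function and argument names here are illustrative only.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

SB3_TAGS = {
    "reward": "rollout/ep_rew_mean",          # mean episode reward
    "episode_length": "rollout/ep_len_mean",  # mean episode length
}

def read_last_sb3_scalar(log_dir: str, metric: str) -> float:
    """Return the most recent logged value for the given sb3 metric."""
    acc = EventAccumulator(log_dir)
    acc.Reload()  # load scalar events from disk
    events = acc.Scalars(SB3_TAGS[metric])
    if not events:
        raise ValueError(f"No events found for tag {SB3_TAGS[metric]} in {log_dir}")
    return events[-1].value
```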

Critical issue:

  • full_test mode max_iterations was changed from 500 to 10, which will dramatically reduce test coverage for full environment testing
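
While verifying, a quick check against the config file could look like the sketch below; the key layout (full_test.max_iterations) is an assumption inferred from the review comments, not confirmed against the actual file.

```python
# Hypothetical sanity check: flag an unexpectedly low max_iterations in full_test mode.
# The key path below is assumed from the review; adjust it to the file's real layout.
import yaml

with open("source/isaaclab_tasks/test/benchmarking/configs.yaml") as f:
    cfg = yaml.safe_load(f)

max_iters = cfg["full_test"]["max_iterations"]  # assumed location of the setting
if max_iters < 500:
    print(f"Warning: full_test max_iterations is {max_iters}; expected >= 500")
```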

Confidence Score: 3/5

  • Generally safe but contains a critical config change that drastically reduces test coverage
  • The PR makes well-structured improvements to the benchmarking infrastructure with proper error handling and sb3 support. However, the max_iterations change from 500 to 10 in full_test mode (line 20) appears unintentional and would severely reduce test coverage, making full environment testing nearly useless. All other changes are logical additions of new environments and improvements to timestamp/error handling.
  • Review configs.yaml line 20 carefully - the max_iterations change from 500 to 10 for full_test mode needs verification

Important Files Changed

  • source/isaaclab_tasks/test/benchmarking/configs.yaml: adds new environments (OpenArm, Dexsuite, AutoMate, Forge, velocity variants, LocoManip), removes Isaac-Navigation-Flat-Anymal-C-v0, reorganizes entries with task categories, and changes full_test max_iterations from 500 to 10
  • source/isaaclab_tasks/test/benchmarking/conftest.py: captures the timestamp at session start (pytest_sessionstart) instead of at finish and passes it to process_kpi_data for accurate training start time tracking (see the sketch after this list)
  • source/isaaclab_tasks/test/benchmarking/env_benchmark_test_utils.py: replaces carb tokens with a custom _get_repo_path(), adds sb3 workflow support (log paths and metric tags), improves error messages with threshold details, and adds a log validation check
  • source/isaaclab_tasks/test/benchmarking/test_environments_training.py: captures training stderr on failure, validates that workflow_experiment_name_variable exists before using it, and moves duration timing to capture the full training time
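
To make the conftest.py change concrete, a minimal sketch of capturing the timestamp at session start is shown below. Apart from the pytest hook names, the variable names and the hand-off to the KPI post-processing step are illustrative assumptions rather than the PR's exact code.

```python
# conftest.py sketch: record the timestamp when the session (and training) begins,
# rather than when it ends. Only the pytest hook names are standard; everything else
# here is illustrative.
from datetime import datetime, timezone

START_TIMESTAMP = None  # set once at session start


def pytest_sessionstart(session):
    # Runs before any test: capture the wall-clock time at which training begins.
    global START_TIMESTAMP
    START_TIMESTAMP = datetime.now(timezone.utc).isoformat()


def pytest_sessionfinish(session, exitstatus):
    # Runs after all tests: make the start time available to the KPI post-processing
    # step (the diagram's process_kpi_data call) so payloads record the training start.
    session.config._benchmark_start_timestamp = START_TIMESTAMP  # illustrative hand-off
```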

Sequence Diagram

sequenceDiagram
    participant pytest
    participant conftest
    participant test_environments_training
    participant train_job
    participant utils as env_benchmark_test_utils
    
    Note over pytest,conftest: Session Start
    pytest->>conftest: pytest_sessionstart()
    conftest->>conftest: Store START_TIMESTAMP = now()
    
    Note over pytest,test_environments_training: Test Execution Loop (per task)
    pytest->>test_environments_training: test_train_environments()
    test_environments_training->>utils: get_env_configs()
    utils-->>test_environments_training: env_configs
    test_environments_training->>utils: get_env_config()
    utils-->>test_environments_training: env_config
    
    test_environments_training->>train_job: train_job(workflow, task, env_config)
    train_job->>train_job: start_time = time.time()
    train_job->>train_job: subprocess.run(training command)
    train_job->>train_job: duration = time.time() - start_time
    train_job-->>test_environments_training: duration
    
    test_environments_training->>utils: evaluate_job(workflow, task, env_config, duration)
    utils->>utils: _retrieve_logs()
    utils->>utils: _get_repo_path() [NEW]
    utils->>utils: Check sb3 log path [NEW]
    utils->>utils: _parse_tf_logs()
    utils->>utils: _extract_log_val() with sb3 tags [NEW]
    utils->>utils: Compare thresholds, add failure msg [NEW]
    utils-->>test_environments_training: kpi_payload
    
    test_environments_training->>conftest: Store in GLOBAL_KPI_STORE
    
    Note over pytest,conftest: Session Finish
    pytest->>conftest: pytest_sessionfinish()
    conftest->>utils: process_kpi_data(GLOBAL_KPI_STORE, tag, START_TIMESTAMP) [UPDATED]
    utils->>utils: Accumulate totals/successes/failures
    utils->>utils: Add timestamp from START [UPDATED]
    utils-->>conftest: Updated kpi_payloads
    conftest->>utils: output_payloads() (if --save_kpi_payload)
    utils->>utils: _get_repo_path() [NEW]
    utils->>utils: Write to logs/kpi.json
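
For reference, the _get_repo_path() replacement for carb tokens named in the diagram could be as simple as walking up from the test file until a repository marker is found; the marker files used below are assumptions and may differ from the PR's actual resolution logic.

```python
# Hypothetical sketch of a carb-free repository root lookup: walk up the directory
# tree from this file until a known repo marker is found.
from pathlib import Path


def _get_repo_path() -> Path:
    """Return the repository root without relying on carb tokens."""
    here = Path(__file__).resolve()
    for parent in here.parents:
        # isaaclab.sh sits at the repo root; .git covers source checkouts.
        if (parent / "isaaclab.sh").exists() or (parent / ".git").exists():
            return parent
    raise FileNotFoundError(f"Could not locate the repository root above {here}")
```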

@greptile-apps bot left a comment

Additional Comments (1)

  1. source/isaaclab_tasks/test/benchmarking/configs.yaml, line 20 (link)

    logic: max_iterations changed from 500 to 10

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

4 files reviewed, 1 comment
