Conversation

@matthewtrepte (Contributor)

Description

Updates the training benchmarks used by the unit tests (fast mode) and by nightly regression testing (full mode).
Nightly mode is tracked here.

Changes

  • Add new environments and remove the deprecated environment
  • Fix sb3 workflow
  • Update timestamp to capture beginning of training, not end

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions bot added the isaac-lab label (Related to Isaac Lab team) on Dec 26, 2025
greptile-apps bot (Contributor) commented on Dec 26, 2025

Greptile Summary

Updates training benchmarks for unit tests and nightly regression testing. Adds new environments (OpenArm manipulation tasks, Dexsuite Kuka-Allegro, AutoMate/Forge tasks, additional velocity locomotion variants, and LocoManip), removes deprecated Isaac-Navigation-Flat-Anymal-C-v0, and reorganizes config entries with descriptive task categories. Fixes sb3 workflow support by adding proper log paths and metric tags (rollout/ep_rew_mean, rollout/ep_len_mean). Improves timestamp accuracy by capturing at session start rather than finish. Replaces carb tokens dependency with custom repository path resolution. Enhances error reporting with specific threshold violation messages and better validation checks.

Key changes:

  • Added 3 new fast mode test entries for sb3, rl_games, and rsl_rl workflows
  • Added 20+ new environment configurations in full mode across manipulation, locomotion, and factory tasks
  • Fixed timestamp to capture training start time instead of end time
  • Added sb3 workflow support throughout the benchmarking infrastructure (a log-parsing sketch follows this list)
  • Improved error handling with detailed failure messages
  • Removed carb dependency for better portability
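
The sb3 additions described above come down to reading Stable-Baselines3's standard TensorBoard scalars from the run's log directory. Below is a minimal sketch of what such an extraction could look like, assuming the runs write TensorBoard event files; the function and argument names are illustrative only and are not the PR's actual helpers (which the diagram names _parse_tf_logs and _extract_log_val).

```python
# Minimal sketch: pull the last value of an sb3 metric from a TensorBoard log directory.
# The tag names are the standard Stable-Baselines3 scalars cited in the summary; the
# function and argument names here are illustrative only.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

SB3_TAGS = {
    "reward": "rollout/ep_rew_mean",          # mean episode reward
    "episode_length": "rollout/ep_len_mean",  # mean episode length
}

def read_last_sb3_scalar(log_dir: str, metric: str) -> float:
    """Return the most recent logged value for the given sb3 metric."""
    acc = EventAccumulator(log_dir)
    acc.Reload()  # load scalar events from disk
    events = acc.Scalars(SB3_TAGS[metric])
    if not events:
        raise ValueError(f"No events found for tag {SB3_TAGS[metric]} in {log_dir}")
    return events[-1].value
```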

Critical issue:

  • full_test mode max_iterations was changed from 500 to 10, which will dramatically reduce test coverage for full environment testing
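
While verifying, a quick check against the config file could look like the sketch below; the key layout (full_test.max_iterations) is an assumption inferred from the review comments, not confirmed against the actual file.

```python
# Hypothetical sanity check: flag an unexpectedly low max_iterations in full_test mode.
# The key path below is assumed from the review; adjust it to the file's real layout.
import yaml

with open("source/isaaclab_tasks/test/benchmarking/configs.yaml") as f:
    cfg = yaml.safe_load(f)

max_iters = cfg["full_test"]["max_iterations"]  # assumed location of the setting
if max_iters < 500:
    print(f"Warning: full_test max_iterations is {max_iters}; expected >= 500")
```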

Confidence Score: 3/5

  • Generally safe but contains a critical config change that drastically reduces test coverage
  • The PR makes well-structured improvements to the benchmarking infrastructure with proper error handling and sb3 support. However, the max_iterations change from 500 to 10 in full_test mode (line 20) appears unintentional and would severely reduce test coverage, making full environment testing nearly useless. All other changes are logical additions of new environments and improvements to timestamp/error handling.
  • Review configs.yaml line 20 carefully - the max_iterations change from 500 to 10 for full_test mode needs verification

Important Files Changed

  • source/isaaclab_tasks/test/benchmarking/configs.yaml: adds new environments (OpenArm, Dexsuite, AutoMate, Forge, velocity variants, LocoManip), removes Isaac-Navigation-Flat-Anymal-C-v0, reorganizes entries with task categories, and changes full_test max_iterations from 500 to 10
  • source/isaaclab_tasks/test/benchmarking/conftest.py: captures the timestamp at session start (pytest_sessionstart) instead of at finish and passes it to process_kpi_data for accurate training start time tracking (see the sketch after this list)
  • source/isaaclab_tasks/test/benchmarking/env_benchmark_test_utils.py: replaces carb tokens with a custom _get_repo_path(), adds sb3 workflow support (log paths and metric tags), improves error messages with threshold details, and adds a log validation check
  • source/isaaclab_tasks/test/benchmarking/test_environments_training.py: captures training stderr on failure, validates that workflow_experiment_name_variable exists before using it, and moves duration timing to capture the full training time
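
To make the conftest.py change concrete, a minimal sketch of capturing the timestamp at session start is shown below. Apart from the pytest hook names, the variable names and the hand-off to the KPI post-processing step are illustrative assumptions rather than the PR's exact code.

```python
# conftest.py sketch: record the timestamp when the session (and training) begins,
# rather than when it ends. Only the pytest hook names are standard; everything else
# here is illustrative.
from datetime import datetime, timezone

START_TIMESTAMP = None  # set once at session start


def pytest_sessionstart(session):
    # Runs before any test: capture the wall-clock time at which training begins.
    global START_TIMESTAMP
    START_TIMESTAMP = datetime.now(timezone.utc).isoformat()


def pytest_sessionfinish(session, exitstatus):
    # Runs after all tests: make the start time available to the KPI post-processing
    # step (the diagram's process_kpi_data call) so payloads record the training start.
    session.config._benchmark_start_timestamp = START_TIMESTAMP  # illustrative hand-off
```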

Sequence Diagram

sequenceDiagram
    participant pytest
    participant conftest
    participant test_environments_training
    participant train_job
    participant utils as env_benchmark_test_utils
    
    Note over pytest,conftest: Session Start
    pytest->>conftest: pytest_sessionstart()
    conftest->>conftest: Store START_TIMESTAMP = now()
    
    Note over pytest,test_environments_training: Test Execution Loop (per task)
    pytest->>test_environments_training: test_train_environments()
    test_environments_training->>utils: get_env_configs()
    utils-->>test_environments_training: env_configs
    test_environments_training->>utils: get_env_config()
    utils-->>test_environments_training: env_config
    
    test_environments_training->>train_job: train_job(workflow, task, env_config)
    train_job->>train_job: start_time = time.time()
    train_job->>train_job: subprocess.run(training command)
    train_job->>train_job: duration = time.time() - start_time
    train_job-->>test_environments_training: duration
    
    test_environments_training->>utils: evaluate_job(workflow, task, env_config, duration)
    utils->>utils: _retrieve_logs()
    utils->>utils: _get_repo_path() [NEW]
    utils->>utils: Check sb3 log path [NEW]
    utils->>utils: _parse_tf_logs()
    utils->>utils: _extract_log_val() with sb3 tags [NEW]
    utils->>utils: Compare thresholds, add failure msg [NEW]
    utils-->>test_environments_training: kpi_payload
    
    test_environments_training->>conftest: Store in GLOBAL_KPI_STORE
    
    Note over pytest,conftest: Session Finish
    pytest->>conftest: pytest_sessionfinish()
    conftest->>utils: process_kpi_data(GLOBAL_KPI_STORE, tag, START_TIMESTAMP) [UPDATED]
    utils->>utils: Accumulate totals/successes/failures
    utils->>utils: Add timestamp from START [UPDATED]
    utils-->>conftest: Updated kpi_payloads
    conftest->>utils: output_payloads() (if --save_kpi_payload)
    utils->>utils: _get_repo_path() [NEW]
    utils->>utils: Write to logs/kpi.json
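
For reference, the _get_repo_path() replacement for carb tokens named in the diagram could be as simple as walking up from the test file until a repository marker is found; the marker files used below are assumptions and may differ from the PR's actual resolution logic.

```python
# Hypothetical sketch of a carb-free repository root lookup: walk up the directory
# tree from this file until a known repo marker is found.
from pathlib import Path


def _get_repo_path() -> Path:
    """Return the repository root without relying on carb tokens."""
    here = Path(__file__).resolve()
    for parent in here.parents:
        # isaaclab.sh sits at the repo root; .git covers source checkouts.
        if (parent / "isaaclab.sh").exists() or (parent / ".git").exists():
            return parent
    raise FileNotFoundError(f"Could not locate the repository root above {here}")
```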

@greptile-apps bot left a comment

Additional Comments (1)

  1. source/isaaclab_tasks/test/benchmarking/configs.yaml, line 20 (link)

    logic: max_iterations changed from 500 to 10

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

4 files reviewed, 1 comment
