@iansseijelly iansseijelly commented Dec 23, 2025

Description

This PR addresses issue #4274.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Screenshots

Training throughput before and after the patch, measured while training on the task Isaac-Velocity-Flat-Spot-v0.

[Image: delayed_buffer_perf_fix]

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

…n the hot path, avoiding unnecessary kernel and synchronization
@github-actions bot added the labels bug (Something isn't working) and isaac-lab (Related to Isaac Lab team) on Dec 23, 2025

greptile-apps bot commented Dec 24, 2025

Greptile Summary

Optimized circular buffer performance by eliminating GPU-CPU synchronization in the hot path, addressing the performance issue identified in #4274.

Key optimizations:

  • Cached max_length as an integer to avoid repeated .item() calls that trigger GPU sync
  • Added _all_initialized flag to skip first-push checks after warmup, avoiding torch.any().item() calls
  • Removed unnecessary .clone() in DelayBuffer.compute() return path
  • Changed to in-place assignment ([:]) in DelayedPDActuator to avoid tensor reallocation
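
As a rough illustration of the first optimization, the sketch below models the cost of reading the buffer length from a device value on every append versus caching it once as a Python integer. It is a pure-Python toy, not the Isaac Lab code: `DeviceScalar`, `SlowBuffer`, `FastBuffer`, and `count_syncs` are hypothetical names, and the counter only stands in for the GPU-CPU synchronization that a real `.item()` call on a CUDA tensor would trigger.

```python
class DeviceScalar:
    """Hypothetical stand-in for a 0-dim GPU tensor; item() models a GPU-CPU sync."""

    sync_count = 0  # how many simulated syncs have happened

    def __init__(self, value):
        self._value = value

    def item(self):
        DeviceScalar.sync_count += 1
        return self._value


class SlowBuffer:
    """Reads max length through item() on every append (one simulated sync per step)."""

    def __init__(self, max_len):
        self._max_len_tensor = DeviceScalar(max_len)
        self._pointer = -1

    def append(self):
        self._pointer = (self._pointer + 1) % self._max_len_tensor.item()


class FastBuffer:
    """Caches max length as a plain int at construction, so append never syncs."""

    def __init__(self, max_len):
        self._max_len_tensor = DeviceScalar(max_len)
        self._max_len = max_len  # cached Python int (assumed to mirror the PR's idea)
        self._pointer = -1

    def append(self):
        self._pointer = (self._pointer + 1) % self._max_len


def count_syncs(buffer, steps):
    """Run `steps` appends and return the number of simulated syncs."""
    DeviceScalar.sync_count = 0
    for _ in range(steps):
        buffer.append()
    return DeviceScalar.sync_count


slow_syncs = count_syncs(SlowBuffer(4), 100)
fast_syncs = count_syncs(FastBuffer(4), 100)
print(slow_syncs, fast_syncs)
```

In this toy model the uncached version pays one simulated sync per step, while the cached version pays none in the hot path.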

Performance impact:
According to the PR description, profiling showed excessive time spent in aten::item (89% CPU time) and torch.any checks. These optimizations eliminate both bottlenecks in steady-state operation after warmup.

Code correctness:

  • The in-place assignment pattern is safe because CircularBuffer.__getitem__ returns indexed views
  • The _all_initialized flag is correctly reset on buffer reset
  • Initialization logic still handles first-push correctly before setting the flag
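
The flag logic described above can be sketched in pure Python as follows. This is a simplified model under stated assumptions, not the actual `CircularBuffer` implementation: only the names `_all_initialized` and `_num_pushes` come from this review, while `FlaggedBuffer`, `first_push_checks`, and the list-based storage are hypothetical. The per-batch zero check stands in for the `torch.any().item()` call that the flag is meant to skip.

```python
class FlaggedBuffer:
    """Toy circular buffer that skips the first-push check once every batch is warm."""

    def __init__(self, batch_size, max_len):
        self._max_len = max_len
        self._num_pushes = [0] * batch_size
        self._data = [[None] * max_len for _ in range(batch_size)]
        self._pointer = -1
        self._all_initialized = False
        self.first_push_checks = 0  # hypothetical counter for the expensive check

    def reset(self, batch_ids=None):
        ids = range(len(self._num_pushes)) if batch_ids is None else batch_ids
        for i in ids:
            self._num_pushes[i] = 0
        # Any reset may leave a batch without history, so re-enable the check.
        self._all_initialized = False

    def append(self, values):
        self._pointer = (self._pointer + 1) % self._max_len
        for i, value in enumerate(values):
            self._data[i][self._pointer] = value
        if not self._all_initialized:
            self.first_push_checks += 1  # models the any().item() sync
            for i, value in enumerate(values):
                if self._num_pushes[i] == 0:
                    # First push for this batch: back-fill the whole history.
                    self._data[i] = [value] * self._max_len
            # Every batch has now received at least one push.
            self._all_initialized = True
        for i in range(len(values)):
            self._num_pushes[i] += 1


buf = FlaggedBuffer(batch_size=2, max_len=3)
for step in range(5):
    buf.append([step, step * 10])
checks_after_warmup = buf.first_push_checks

buf.reset(batch_ids=[1])  # partial reset clears the flag
buf.append([5, 50])
checks_after_reset = buf.first_push_checks
```

In this sketch the check runs once during warmup and once more after the partial reset; every other append takes the fast path.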

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk - well-targeted performance optimizations
  • The changes are focused performance optimizations that eliminate GPU-CPU synchronization overhead without altering logic. The coordinated changes across three files (caching max_length, skipping checks after warmup, removing clone, using in-place assignment) work together correctly. Minor deduction because there are no new tests to verify the optimizations don't break edge cases, though the logic appears sound.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| source/isaaclab/isaaclab/utils/buffers/circular_buffer.py | Optimized to avoid GPU-CPU synchronization in hot path by caching max_length as integer and tracking initialization state |
| source/isaaclab/isaaclab/utils/buffers/delay_buffer.py | Removed unnecessary .clone() call since consumer now uses in-place assignment |
| source/isaaclab/isaaclab/actuators/actuator_pd.py | Changed to in-place assignment to avoid unnecessary tensor allocation and copying |
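
Why dropping the clone can be safe once the consumer assigns in place may be easier to see with a small analogy. The sketch below uses plain Python lists rather than tensors, so it only mirrors the aliasing behaviour and says nothing about allocation cost on the GPU; `buffer_row` and `joint_positions` are hypothetical stand-ins for the buffer's storage and the actuator's own storage.

```python
buffer_row = [0.1, 0.2, 0.3]       # storage owned by the delay buffer (stand-in)
joint_positions = [0.0, 0.0, 0.0]  # storage owned by the actuator (stand-in)
alias_before = joint_positions

# In-place assignment copies the values into the existing storage.
joint_positions[:] = buffer_row
same_object = joint_positions is alias_before

# Later writes to the buffer do not leak into the actuator's copy.
buffer_row[0] = 9.9
unaffected = joint_positions[0] == 0.1

# Plain rebinding without a copy would alias the buffer's storage instead.
rebound = buffer_row
buffer_row[1] = 7.7
aliased = rebound[1] == 7.7
```

Because the in-place form already copies the values into storage the consumer owns, an extra defensive copy on the producer side adds nothing in this model.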

Sequence Diagram

sequenceDiagram
    participant Actuator as DelayedPDActuator
    participant DelayBuf as DelayBuffer
    participant CircBuf as CircularBuffer
    
    Note over Actuator,CircBuf: Hot Path (called every physics step)
    
    Actuator->>DelayBuf: compute(control_action.joint_positions)
    DelayBuf->>CircBuf: append(data)
    
    alt First time after reset
        CircBuf->>CircBuf: Check _all_initialized flag (false)
        CircBuf->>CircBuf: Check is_first_push = (_num_pushes == 0)
        CircBuf->>CircBuf: Call .any().item() (GPU sync)
        CircBuf->>CircBuf: Initialize buffer if needed
        CircBuf->>CircBuf: Set _all_initialized = true
    else All batches initialized (optimized path)
        CircBuf->>CircBuf: Skip initialization check
        Note over CircBuf: No GPU-CPU sync needed!
    end
    
    CircBuf->>CircBuf: Increment _num_pushes
    DelayBuf->>CircBuf: __getitem__(time_lags)
    CircBuf-->>DelayBuf: Return delayed data (view)
    DelayBuf-->>Actuator: Return delayed data (no clone)
    Actuator->>Actuator: In-place assign with [:]
    
    Note over Actuator,CircBuf: Optimizations Applied:<br/>1. Cached max_length as int (avoid .item())<br/>2. Skip initialization check after warmup<br/>3. Removed unnecessary .clone()<br/>4. In-place assignment in actuator


greptile-apps bot commented Dec 24, 2025

Greptile found no issues!

From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".
