Contributor

@OrangeEarth15 OrangeEarth15 commented Oct 14, 2025

Introduces a fine-grained load-balancing optimization for 3DGUT rendering, inspired by Balanced 3DGS, that addresses GPU load imbalance.

Implementation

Transitions from thread-per-pixel to warp-per-pixel parallelism:

  • Each warp (32 threads) now processes one pixel cooperatively instead of one thread per pixel
  • Subdivides each 16×16 tile into 64 virtual tiles (2×2 pixels each)
  • Enables particle-level parallelism across warp threads
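
To make the tile arithmetic above concrete, here is a minimal host-side C++ sketch of how a thread could be mapped to a pixel under this scheme. It is not taken from the PR's kernel: the function `assignThread`, the struct `PixelAssignment`, the constants, and in particular the launch layout (one 128-thread block per virtual tile, i.e. 4 warps for 4 pixels) are assumptions chosen for illustration.

```cpp
#include <cassert>

// Hypothetical constants mirroring the numbers in the description.
constexpr int kTileSize = 16;        // 16x16 pixel tile
constexpr int kVirtualTileSize = 2;  // 2x2 pixel virtual tile
constexpr int kWarpSize = 32;        // threads per warp
constexpr int kVirtualTilesPerRow = kTileSize / kVirtualTileSize;                // 8
constexpr int kVirtualTilesPerTile = kVirtualTilesPerRow * kVirtualTilesPerRow;  // 64
constexpr int kPixelsPerVirtualTile = kVirtualTileSize * kVirtualTileSize;       // 4
// Assumption: one block per virtual tile, one warp per pixel.
constexpr int kThreadsPerBlock = kPixelsPerVirtualTile * kWarpSize;              // 128

struct PixelAssignment {
    int x;     // pixel x inside the 16x16 tile
    int y;     // pixel y inside the 16x16 tile
    int lane;  // lane index inside the warp (0..31)
};

// Maps (virtual tile index, thread index in block) to a pixel and a lane.
// All 32 lanes of a warp receive the same pixel; they differ only in `lane`,
// which would select the particle each lane evaluates.
PixelAssignment assignThread(int virtualTileIdx, int threadIdxInBlock) {
    const int warp = threadIdxInBlock / kWarpSize;  // which pixel of the 2x2 block
    const int lane = threadIdxInBlock % kWarpSize;
    const int vx = virtualTileIdx % kVirtualTilesPerRow;
    const int vy = virtualTileIdx / kVirtualTilesPerRow;
    const int px = vx * kVirtualTileSize + warp % kVirtualTileSize;
    const int py = vy * kVirtualTileSize + warp / kVirtualTileSize;
    return {px, py, lane};
}
```

Under these assumptions, iterating over all 64 virtual tiles and the 4 warps of each block visits every pixel of the 16×16 tile exactly once, which is the property the subdivision needs.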

Configuration

# configs/render/3dgut.yaml
splat:
  fine_grained_load_balancing: true  # default: false

Changes

  • Add renderBalanced kernel with warp-per-pixel processing
  • Add evalForwardNoKBufferBalanced for cooperative particle traversal
  • Add compile-time flag FINE_GRAINED_LOAD_BALANCING
  • Add config parameter fine_grained_load_balancing
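
As a rough illustration of what cooperative particle traversal can look like, the following C++ sketch simulates on the CPU how 32 lanes might composite one pixel together. It is not the code of `evalForwardNoKBufferBalanced`: the names `Sample`, `compositeSequential`, and `compositeWarpStyle` are hypothetical, color is reduced to a single channel, and early termination is left out. On a GPU, the prefix product and the sum would typically be done with warp shuffles such as `__shfl_up_sync` and `__shfl_xor_sync`.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr int kWarpSize = 32;

// One particle's contribution to a pixel (single color channel for brevity).
struct Sample {
    float alpha;
    float color;
};

// Reference: front-to-back compositing as a single thread would do it.
float compositeSequential(const std::vector<Sample>& samples) {
    float transmittance = 1.0f;
    float color = 0.0f;
    for (const Sample& s : samples) {
        color += transmittance * s.alpha * s.color;
        transmittance *= 1.0f - s.alpha;
    }
    return color;
}

// Warp-style: particles are consumed in batches of 32, one per lane.
float compositeWarpStyle(const std::vector<Sample>& samples) {
    float transmittance = 1.0f;  // transmittance carried across batches
    float color = 0.0f;
    for (std::size_t base = 0; base < samples.size(); base += kWarpSize) {
        // Step 1: each lane loads its particle; idle lanes keep alpha = 0.
        std::array<float, kWarpSize> alpha{};
        std::array<float, kWarpSize> laneColor{};
        for (int lane = 0; lane < kWarpSize; ++lane) {
            if (base + lane < samples.size()) {
                alpha[lane] = samples[base + lane].alpha;
                laneColor[lane] = samples[base + lane].color;
            }
        }
        // Step 2: exclusive prefix product of (1 - alpha) gives each lane the
        // transmittance in front of its particle (a warp scan on the GPU).
        std::array<float, kWarpSize> prefix{};
        float running = 1.0f;
        for (int lane = 0; lane < kWarpSize; ++lane) {
            prefix[lane] = running;
            running *= 1.0f - alpha[lane];
        }
        // Step 3: sum the per-lane contributions (a warp reduction on the GPU).
        float batchColor = 0.0f;
        for (int lane = 0; lane < kWarpSize; ++lane) {
            batchColor += prefix[lane] * alpha[lane] * laneColor[lane];
        }
        color += transmittance * batchColor;
        transmittance *= running;
    }
    return color;
}
```

Both functions compute the same blend up to floating-point rounding. The point of the warp-style form is that the per-particle work in steps 1 and 3 is independent across lanes, so pixels covered by many particles no longer serialize on one thread.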

Performance

Benchmark on the MipNeRF360 dataset (8× downsampled):
Render kernel: 3.19 ms → 1.78 ms (~1.8× speedup, 44% reduction in kernel time)
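
For reference, the speedup and reduction figures follow directly from the two reported kernel times. The symbols $t_{\text{old}}$ and $t_{\text{new}}$ are introduced here only for this calculation:

```latex
\begin{align*}
\text{speedup}   &= \frac{t_{\text{old}}}{t_{\text{new}}} = \frac{3.19}{1.78} \approx 1.79 \\
\text{reduction} &= \frac{t_{\text{old}} - t_{\text{new}}}{t_{\text{old}}} = \frac{3.19 - 1.78}{3.19} = \frac{1.41}{3.19} \approx 0.44
\end{align*}
```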

Collaborator

@wilsonCernWq wilsonCernWq left a comment

Awesome

@wilsonCernWq wilsonCernWq merged commit 1a863bd into nv-tlabs:main Oct 15, 2025
1 check passed