fix(recipe): bump NCCL all-reduce bandwidth threshold to 300 Gbps#350

Merged
xdu31 merged 1 commit into NVIDIA:main from xdu31:nccl-limit on Mar 11, 2026

xdu31 commented on Mar 11, 2026

Summary

Bump NCCL all-reduce bandwidth threshold from 100 Gbps to 300 Gbps in the H100 EKS training recipe.

Changes

  • recipes/overlays/h100-eks-ubuntu-training.yaml: Update nccl-all-reduce-bw constraint from >= 100 to >= 300
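The effect of the change can be illustrated with a minimal Python sketch. It assumes the constraint is a string such as ">= 300" that is compared against a measured all-reduce bandwidth in Gbps. The helpers parse_constraint and satisfies are hypothetical and are not the recipe validator used in this repository, and the 250 Gbps measurement is an invented example value.

```python
import operator

# Hypothetical mapping from constraint operators to comparison functions;
# this is NOT the repository's validator, only an illustration.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_constraint(expr: str) -> tuple[str, float]:
    """Split a constraint such as '>= 300' into ('>=', 300.0)."""
    expr = expr.strip()
    # Try two-character operators first so '>=' is not read as '>'.
    for op in sorted(_OPS, key=len, reverse=True):
        if expr.startswith(op):
            return op, float(expr[len(op):])
    raise ValueError(f"unsupported constraint: {expr!r}")


def satisfies(measured_gbps: float, expr: str) -> bool:
    """Return True if the measured bandwidth meets the constraint."""
    op, threshold = parse_constraint(expr)
    return _OPS[op](measured_gbps, threshold)


if __name__ == "__main__":
    measured = 250.0  # hypothetical measured all-reduce bandwidth, in Gbps
    for constraint in (">= 100", ">= 300"):
        print(constraint, satisfies(measured, constraint))
```

Under these assumptions, a cluster measuring 250 Gbps passes the old ">= 100" constraint but fails the new ">= 300" one, so the bump makes the check reject clusters that the previous threshold would have accepted.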

Test plan

  • Existing unit tests pass (make test)

xdu31 requested a review from a team as a code owner on March 11, 2026 22:02
mchmarny added this to the M1 - Repo Opening milestone on Mar 11, 2026
xdu31 merged commit ca0551d into NVIDIA:main on Mar 11, 2026
58 of 60 checks passed