
@Xiaoming-AMD
Collaborator

Recent Megatron changes wrap the model in DDP twice: once on the default stream, and once inside a `torch.cuda.Stream()` context block. This double construction can cause deadlocks or hangs in ROCm multi-GPU training.
This branch introduces a runtime monkey-patch in Primus to neutralize the second DDP construction without modifying Megatron source files.
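As a hedged illustration of the general technique only (not the actual Primus patch, which may neutralize the second construction differently), the sketch makes a wrapper constructor idempotent at runtime: wrapping a module that is already wrapped returns it unchanged. Every name here (`DistributedDataParallel` as a plain-Python stand-in, `fake_megatron`, `build_model`, `patch_ddp_idempotent`) is hypothetical, and the code avoids torch so it runs anywhere.

```python
import types


# Hypothetical stand-in for Megatron's DDP wrapper class; plain Python so the
# sketch runs without torch or GPUs.
class DistributedDataParallel:
    instances = 0  # counts how many wrapper objects were actually constructed

    def __init__(self, module):
        DistributedDataParallel.instances += 1
        self.module = module


# Hypothetical namespace that the caller reads DDP from at call time.
fake_megatron = types.SimpleNamespace(DDP=DistributedDataParallel)


def build_model(ns, model):
    """Hypothetical caller that wraps twice, mimicking the reported pattern."""
    model = ns.DDP(model)  # first wrap (default stream in the real code)
    model = ns.DDP(model)  # second wrap (inside a side-stream block in the real code)
    return model


def patch_ddp_idempotent(ns):
    """Replace ns.DDP with a factory that skips already-wrapped modules.

    Returns the original class so the caller can undo the patch.
    """
    original = ns.DDP

    def wrap_once(module, *args, **kwargs):
        if isinstance(module, original):
            return module  # already wrapped: neutralize the second construction
        return original(module, *args, **kwargs)

    ns.DDP = wrap_once
    return original


# Usage: patch, build, then restore the original symbol.
original = patch_ddp_idempotent(fake_megatron)
wrapped = build_model(fake_megatron, "model")
print(DistributedDataParallel.instances)  # 1: only one wrapper was built
print(wrapped.module)  # model
fake_megatron.DDP = original  # undo the patch
```

Replacing the attribute works here only because `build_model` looks up `ns.DDP` at call time; code that ran `from module import DDP` before patching keeps its own binding to the original class, so a real patch has to target the namespace the caller actually reads. Swapping a class for a function also breaks `isinstance(x, ns.DDP)` checks elsewhere, which is one reason to keep a handle to the original class.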

@Xiaoming-AMD merged commit 1baa042 into main on Nov 3, 2025.
1 check passed