Closed
Conversation
…ight support - Add bergson/magic_patch.py: runtime monkey-patch for twice-differentiable DTensor redistribution (pytorch/pytorch#160509), replacing the old magic_wmdp_setup.sh that modified torch source files on disk - Add per_token mode to DataStream for [n_examples, max_length] weight tensors - Support 2D [B, T] per-token weights in weighted_causal_lm_ce - Fix backward weight_grads accumulation when autograd returns None
for more information, see https://pre-commit.ci
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The weight gradient from autograd.grad should always be a tensor since data.weights participates in the computation graph via weighted_causal_lm_ce. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Multiple concurrent DCP async_save calls each create their own Gloo process group. With consecutive saves at steps 20-24 (last_start logic), up to 5 saves were in-flight simultaneously. Background threads from these saves may call distributed operations that conflict, causing all ranks to deadlock in fut.result() until the NCCL watchdog times out. Limit to one concurrent save at a time: wait for the previous save to complete before starting the next one. Each save still overlaps with at least one training step, so async I/O benefit is preserved. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Raises a clear ValueError at init time when the dataset doesn't have enough examples for the requested number of batches, instead of crashing with an IndexError mid-training. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PyTorch's Future.result() waits for done callbacks to complete before returning. The destroy_process_group callback was invoked from DCP's background thread after each save, but destroy_process_group may do a barrier on the Gloo group. Since ranks complete their I/O at different times, the fast rank would deadlock waiting for the slow rank to also call destroy_process_group, while the slow rank was still in fut.result(). DCP holds its own reference to the process group, keeping it alive for the duration of the background I/O. GC will clean it up afterwards. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
for more information, see https://pre-commit.ci
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strip per_token parameter from DataStream and 2D weight path from weighted_causal_lm_ce to keep the merge scope minimal. The per-token code is preserved on the magic-per-token branch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
root seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.