@cijose
Contributor

@cijose cijose commented Oct 21, 2025

Summary

In all the pre-training runs we did with FSDP2, we observed that averaging the gradients during FSDP2 gradient reduction resulted in stable runs, while summing the gradients led to NaNs in large-scale setups. This was overlooked in the DINOv3 codebase, and this PR fixes it.
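
To illustrate why the choice between summing and averaging matters more as the number of ranks grows, here is a minimal toy sketch in plain Python. It does not use PyTorch or real collectives and is not the change made in this PR; the helper names (`reduce_sum`, `reduce_avg`, `simulate`) and the gradient values are hypothetical. Under the simplifying assumption that every rank computes a similar local gradient, a sum reduction yields a gradient whose norm grows linearly with the world size, while an average keeps it at the per-rank scale.

```python
import math

# Toy simulation (no torch, no real collectives): each "rank" holds a local
# gradient for the same parameter, computed on its own micro-batch.

def reduce_sum(per_rank_grads):
    """Element-wise sum across ranks, mimicking a SUM reduction."""
    return [sum(vals) for vals in zip(*per_rank_grads)]

def reduce_avg(per_rank_grads):
    """Element-wise mean across ranks, mimicking an AVG reduction."""
    world_size = len(per_rank_grads)
    return [s / world_size for s in reduce_sum(per_rank_grads)]

def grad_norm(grad):
    """Euclidean norm of a gradient given as a list of floats."""
    return math.sqrt(sum(g * g for g in grad))

def simulate(world_size, local_grad=(0.5, -0.25)):
    # Assumption: all ranks see the same local gradient (hypothetical values).
    per_rank = [list(local_grad) for _ in range(world_size)]
    return grad_norm(reduce_sum(per_rank)), grad_norm(reduce_avg(per_rank))

for world_size in (1, 8, 64, 512):
    sum_norm, avg_norm = simulate(world_size)
    print(f"world_size={world_size:4d}  sum_norm={sum_norm:9.4f}  avg_norm={avg_norm:.4f}")
```

In this sketch the summed gradient is `world_size` times larger than the averaged one, so with a fixed learning rate the effective step size scales with the number of ranks. That is one plausible route to the instabilities described above at large scale, though the summary itself does not state the exact mechanism.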

Test plan

We have thoroughly tested this setup in all the pre-training runs we did for DINOv3.

@cijose cijose requested a review from patricklabatut October 21, 2025 20:01
@meta-cla meta-cla bot added the CLA Signed label Oct 21, 2025
@cijose cijose requested a review from qasfb October 21, 2025 20:28
@cijose cijose merged commit d13af7d into main Oct 22, 2025
1 of 2 checks passed
@cijose cijose deleted the cijose/fix_bug_sum_grad branch October 22, 2025 17:33