This directory contains in-depth guides, tutorials, and discussions about optimizing and using Megatron for various use cases.
- A practical guide to enabling Megatron-FSDP training, including a quick-start example for DeepSeek-V3, required and recommended configurations, and instructions for converting checkpoints from `torch_dist` to `fsdp_dtensor`.
- Spectral Descent: Orthogonalizing Momentum via Newton-Schulz Iteration. A discussion of Muon and related higher-order optimizers in Megatron Core, including layer-wise distributed optimizers, tensor-parallel Newton-Schulz execution modes, and performance results on NVIDIA GB300.
If you'd like to contribute a guide or tutorial, please follow this structure:

- Create a new directory: `docs/discussions/your-guide-name/`
- Add your main guide: `docs/discussions/your-guide-name/your-guide-name.md`
- Create an images directory: `docs/discussions/your-guide-name/images/`
- Update this README.md with a link to your guide.
Each guide should be self-contained with its own images and supporting files.
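The directory layout for a new guide can be created with a few lines of Python. This is a minimal sketch under stated assumptions: `scaffold_guide` and the placeholder heading it writes are hypothetical and not part of Megatron's tooling, and adding the link to this README.md is still done by hand.

```python
from pathlib import Path


def scaffold_guide(repo_root: str, guide_name: str) -> Path:
    """Create docs/discussions/<guide_name>/ with an images/ subdirectory
    and a <guide_name>.md file. Hypothetical helper, not part of Megatron."""
    guide_dir = Path(repo_root) / "docs" / "discussions" / guide_name
    # images/ lives inside the guide directory so the guide stays self-contained;
    # parents=True also creates guide_dir, exist_ok=True makes reruns safe.
    (guide_dir / "images").mkdir(parents=True, exist_ok=True)
    guide_file = guide_dir / f"{guide_name}.md"
    # Only write a placeholder heading if the guide does not exist yet,
    # so rerunning the helper never overwrites an existing draft.
    if not guide_file.exists():
        guide_file.write_text(f"# {guide_name}\n", encoding="utf-8")
    return guide_file
```

For example, `scaffold_guide(".", "my-guide")` run from the repository root would create `docs/discussions/my-guide/my-guide.md` and `docs/discussions/my-guide/images/`.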