
Megatron Discussions

This directory contains in-depth guides, tutorials, and discussions about optimizing and using Megatron for various use cases.

Megatron-FSDP User Guide

A practical guide to enabling Megatron-FSDP training, including a quick-start example for DeepSeek-V3, required and recommended configurations, and instructions for converting checkpoints from torch_dist to fsdp_dtensor.

Spectral Descent: Orthogonalizing Momentum via Newton-Schulz Iteration

A discussion of Muon and related higher-order optimizers in Megatron Core, including layer-wise distributed optimizers, tensor parallel Newton-Schulz execution modes, and performance results on NVIDIA GB300.

If you'd like to contribute a guide or tutorial, please follow this structure:

Each guide should be self-contained with its own images and supporting files.