
Changelog

NVIDIA Megatron-Bridge 0.2.0

  • Model Collection Support

    • LLM
      • HuggingFace conversion + training recipes (see the conversion sketch after this list):
        • GPT-OSS
        • Qwen3 Next
        • Nemotron-H
        • Nemotron Nano v2
        • Moonlight
        • OLMoE
        • GLM 4.5
        • Gemma 3
      • HuggingFace conversion support:
        • Llama Nemotron
        • Mistral
        • Gemma
        • Gemma 2
    • VLM
      • Nemotron Nano v2 VL
      • Qwen 3 VL
      • Qwen2.5 VL
      • Gemma3 VL
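
A minimal conversion sketch, assuming Megatron-Bridge's documented AutoBridge entry point; the model ID is illustrative, and the exact method names (from_hf_pretrained, to_megatron_provider) follow the project's documented pattern but may differ between versions:

```python
# Hedged sketch: loading a HuggingFace checkpoint into Megatron via the
# AutoBridge entry point. Treat the method names as illustrative.
from megatron.bridge import AutoBridge

# Download the HuggingFace checkpoint and build a bridge that maps its
# weights onto the matching Megatron model definition.
bridge = AutoBridge.from_hf_pretrained("openai/gpt-oss-20b")  # illustrative model ID

# Obtain a Megatron model provider; parallelism settings can be adjusted
# on the provider before the model is materialized.
provider = bridge.to_megatron_provider()
```
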
  • Performance

    • Megatron-Bridge support for new benchmarks
      • Benchmarks for the GB300 system (same workloads as the GB200 system)
      • GPT-OSS 120B
      • Qwen3-Next 80B-A3B
      • Support for linear attention (Gated Delta Networks) on Blackwell
      • Pre-training with NVFP4 precision: Llama3 8B, Llama3 70B, Llama3.1 405B
    • Megatron-Bridge support for benchmarks previously available only in NeMo 2.0
      • Nemotron-H 56B
      • Fine-tuning (SFT and LoRA): Llama3 8B and Llama3 70B
    • HybridEP: DeepSeek V3 benchmarks on GB200 and GB300 systems now use HybridEP
    • CUDA Graphs (a capture sketch follows this list)
      • Full-model-iteration CUDA graph used for dense models: Llama3 8B, Llama3 70B, Llama3.1 405B
      • Fine-grained, Transformer-component-specific CUDA graphs used for MoE models
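
The full-model-iteration CUDA graph captures the forward pass, backward pass, and optimizer step once and replays them with a single CPU launch. Below is a minimal sketch of the general technique using PyTorch's public CUDA graph API, not Megatron-Bridge's internal capture logic:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
static_in = torch.randn(8, 1024, device="cuda")

# Warm up on a side stream so lazy allocations happen before capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        opt.zero_grad(set_to_none=True)
        model(static_in).sum().backward()
        opt.step()
torch.cuda.current_stream().wait_stream(s)

# Capture one full iteration (forward + backward + optimizer step).
g = torch.cuda.CUDAGraph()
opt.zero_grad(set_to_none=True)
with torch.cuda.graph(g):
    loss = model(static_in).sum()
    loss.backward()
    opt.step()

# Replay: refill the static input buffer, then launch the entire
# captured iteration with a single CPU-side call.
static_in.copy_(torch.randn(8, 1024, device="cuda"))
g.replay()
```
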
  • NVIDIA Model Optimization Integration

    • Knowledge Distillation
    • Post-training quantization export (see the PTQ sketch after this list)
    • Quantization-aware training
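
As a post-training quantization example, here is a minimal sketch against the TensorRT Model Optimizer (nvidia-modelopt) API; the FP8_DEFAULT_CFG config name follows ModelOpt's documentation, and the toy model and calibration data stand in for a real Megatron model and calibration set:

```python
import torch
import modelopt.torch.quantization as mtq  # nvidia-modelopt

# Stand-ins: real usage would pass the trained model and a real dataloader.
model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU()).cuda()
calib_data = [torch.randn(4, 16, device="cuda") for _ in range(8)]

def forward_loop(m):
    # Run calibration batches so the inserted quantizers collect
    # activation ranges.
    with torch.no_grad():
        for batch in calib_data:
            m(batch)

# Insert quantizers, calibrate, and return the quantized model,
# which can then be exported for deployment.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)
```
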
  • Enhanced LoRA support

    • Support for expert layers
    • Support for merging adapters for export to HuggingFace (@HollowMan6; see the merge sketch below)
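
Merging an adapter folds the low-rank update into the base weight so the exported HuggingFace checkpoint needs no LoRA machinery at inference time. A sketch of the standard merge math in plain PyTorch (not Megatron-Bridge's actual export routine):

```python
import torch

def merge_lora(w, lora_a, lora_b, alpha, r):
    """Fold a LoRA adapter into its base weight: W' = W + (alpha / r) * B @ A.

    Standard LoRA merge math, not Megatron-Bridge's export code.
    Shapes (illustrative): w (out, in), lora_a (r, in), lora_b (out, r).
    """
    return w + (alpha / r) * (lora_b @ lora_a)

# Tiny usage example with illustrative shapes.
w = torch.randn(32, 64)
a, b = torch.randn(8, 64), torch.zeros(32, 8)
merged = merge_lora(w, a, b, alpha=16, r=8)
```
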
  • Fine-tuning dataset improvements: OpenAI messages-format conversion and chat-template support (see the example below)
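
For reference, the OpenAI messages format is a list of role/content dicts, and chat-template support means the model's own template renders a conversation into a training string. A small example using the HuggingFace tokenizer API; the model ID is illustrative:

```python
from transformers import AutoTokenizer

# One conversation in the OpenAI messages format targeted by the new
# dataset conversion.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What does Megatron-Bridge do?"},
    {"role": "assistant", "content": "It converts checkpoints between HuggingFace and Megatron."},
]

# Render the conversation with the model's own chat template.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
text = tok.apply_chat_template(messages, tokenize=False)
print(text)
```
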

  • Integration with NVIDIA-DLFW-Inspect for tensor statistics collection and monitoring

  • Support for sample-based training (see the sketch below)
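
Sample-based training schedules a run by total samples consumed rather than optimizer iterations, which keeps learning-rate and stopping conditions consistent if the global batch size ramps up mid-run. An illustrative sketch of the bookkeeping; the names here are assumptions, not Megatron-Bridge config fields:

```python
# Illustrative only; train_samples / consumed_samples are assumed names.
train_samples = 1_000_000   # stop after this many samples, not iterations
global_batch_size = 256
consumed_samples = 0

def run_one_training_step():
    pass  # stand-in for a real forward/backward/optimizer step

while consumed_samples < train_samples:
    run_one_training_step()
    consumed_samples += global_batch_size
    # Schedules keyed to consumed_samples stay consistent even if the
    # global batch size ramps up during training.
```
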

  • Broader Community Adoption: Megatron-Bridge has been integrated into the training pipelines of VeRL (PR), Slime (PR), and Sky-RL (PR).

  • Special thanks to the community contributors for this release: @HollowMan6, @fzyzcjy, @erictang000, @hawkoli1987.

NVIDIA Megatron-Bridge 0.1.0rc4

  • Fix docs build
  • Update performance scripts

NVIDIA Megatron-Bridge 0.1.0rc3

  • Model Collection Support
    • Llama
    • Qwen 2, Qwen 3, Qwen 3 MoE
    • DeepSeek
    • Mamba
  • Migration guide from NeMo 2 to Megatron-Bridge
  • Contribution guide for adding a new model
  • Checkpoint conversion from Hugging Face to Megatron
  • Performance
    • MoE LLM
      • Switched MoE routing to dropless with balanced gating
      • Fusion of operators in router function
      • Global permutation fusion with A2A dispatcher
      • EP A2A communication overlap with computation in both 1F1B pipelining and non-pipelined training
      • Precision-aware optimizer update to support BF16 states
    • Megatron FSDP
      • Migration from MCore FSDP to Megatron FSDP
      • Fusion of weight gradient copy to reduce-scatter communication buffer to WGRAD GEMM
      • Removed redundant optimizer operations
      • Use ZeRO-1 (optimizer and master parameter sharding) in the replica domain of hybrid FSDP to further lower memory usage
      • IB SHARP support for the IB AllReduce of hybrid FSDP (patch requiring NCCL 2.28)
    • MXFP8
      • Improved activation-gradient all-gather overlap performance via userbuffers
      • Parameter all-gather overlapped with computation, with the communication buffer shared with reduce-scatter
      • Fusion of MXFP8 scaling factor swizzling kernels
      • Use PDL (Programmatic Dependent Launch) for quantization kernels to lower CPU overhead
    • Others
      • Full-iteration CUDA graph for dense models without pipelining
      • Fusion of activation and cast (currently tensor-wise scaling only)
      • Store SwiGLU input in FP8 to save activation memory (see the sketch below)
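
To illustrate the last item: storing the SwiGLU input in FP8 between forward and backward halves that activation's memory at the cost of a cheap cast. A minimal autograd sketch with a naive per-tensor cast (production kernels add scaling factors to preserve dynamic range):

```python
import torch
import torch.nn.functional as F

class SwiGLUWithFP8Store(torch.autograd.Function):
    """Minimal sketch: keep the SwiGLU input in FP8 between forward and
    backward to halve its activation memory. Naive per-tensor cast only."""

    @staticmethod
    def forward(ctx, x):
        a, b = x.chunk(2, dim=-1)                         # gate and up projections
        ctx.save_for_backward(x.to(torch.float8_e4m3fn))  # FP8 instead of BF16/FP32
        return F.silu(a) * b

    @staticmethod
    def backward(ctx, grad_out):
        (x_fp8,) = ctx.saved_tensors
        a, b = x_fp8.to(grad_out.dtype).chunk(2, dim=-1)  # dequantize for backward
        sig = torch.sigmoid(a)
        dsilu = sig * (1 + a * (1 - sig))                 # d/da silu(a)
        grad_a = grad_out * b * dsilu
        grad_b = grad_out * F.silu(a)
        return torch.cat([grad_a, grad_b], dim=-1)

# Usage: x holds the concatenated gate and up projections.
x = torch.randn(4, 256, requires_grad=True)
SwiGLUWithFP8Store.apply(x).sum().backward()
```
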

NVIDIA Megatron-Bridge 0.1.0a0

  • Llama and Qwen
  • Pretrain/SFT
  • PEFT
  • Recipe structure with examples for plain Python & NeMo Run usage