Is your feature request related to a problem? Please describe.
Megatron-LM does not currently support Gemma 4 dense models such as Gemma 4 E4B. This makes it difficult to load, validate, and train Gemma 4 dense checkpoints in Megatron Core while preserving Hugging Face parity.
Describe the solution you'd like
Based on the scope discussion in this issue, the implementation is split across two layers:
Describe alternatives you've considered
One alternative is to keep Gemma 4 support outside Megatron-LM as a downstream fork. However, that makes checkpoint conversion, validation, and maintenance harder, especially as Megatron Core APIs evolve.
Additional context
A working implementation covering all Gemma-specific components is available in NVIDIA-NeMo/Megatron-Bridge#4148, including:
- Gemma 4 unit tests and HF layer/block parity checks
- Parity checks of the converted Megatron checkpoint against the original HF Gemma 4 E4B model
- Parity checks for converted checkpoints at tensor-parallel sizes 1 and 2 (TP1, TP2)
The concrete implementation review is tracked in #5090.
This issue serves as the feature tracking item.