
[Feature-request] Add Megatron Core support for Gemma 4 dense models (e.g. E4B) #5089

@DOGEUNNKIM


Is your feature request related to a problem? Please describe.

Megatron-LM does not currently support Gemma 4 dense models such as Gemma 4 E4B. This makes it difficult to load, validate, and train Gemma 4 dense checkpoints in Megatron Core while preserving Hugging Face parity.

Describe the solution you'd like

Based on the scope discussion in this issue, the implementation is split across two layers:

Describe alternatives you've considered

One alternative is to keep Gemma 4 support outside Megatron-LM as a downstream fork. However, that makes checkpoint conversion, validation, and maintenance harder, especially as Megatron Core APIs evolve.

Additional context

A working implementation covering all Gemma-specific components is available in NVIDIA-NeMo/Megatron-Bridge#4148, including:

  • Gemma 4 unit tests and HF layer/block parity checks
  • Converted Megatron checkpoint parity against the original HF Gemma 4 E4B model
  • Tensor-parallel parity checks for converted checkpoints at TP=1 and TP=2
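The TP=1/TP=2 parity checks listed above compare the outputs of the same converted weights under different tensor-parallel sizes. The following is a rough illustration only, not the code from Megatron-Bridge#4148. It simulates TP=2 for a single linear layer in plain Python. Column-parallel sharding splits the weight along the output dimension and concatenates the per-rank outputs, which stands in for an all-gather. Row-parallel sharding splits along the input dimension and sums the partial results, which stands in for an all-reduce. Both results are compared against the unsharded TP=1 output with an absolute tolerance. All helper names, shapes, and the tolerance are hypothetical.

```python
import random


def matvec(weight, x):
    """Compute y = W @ x for W given as a list of rows (out_features x in_features)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weight]


def split_rows(weight, tp):
    """Column-parallel split: shard the output dimension (rows of W) across tp ranks."""
    assert len(weight) % tp == 0, "out_features must be divisible by tp"
    chunk = len(weight) // tp
    return [weight[r * chunk:(r + 1) * chunk] for r in range(tp)]


def split_cols(weight, tp):
    """Row-parallel split: shard the input dimension (columns of W) across tp ranks."""
    assert len(weight[0]) % tp == 0, "in_features must be divisible by tp"
    chunk = len(weight[0]) // tp
    return [[row[r * chunk:(r + 1) * chunk] for row in weight] for r in range(tp)]


def column_parallel(weight, x, tp):
    # Each simulated rank computes a slice of the output; concatenation plays the all-gather.
    out = []
    for shard in split_rows(weight, tp):
        out.extend(matvec(shard, x))
    return out


def row_parallel(weight, x, tp):
    # Each simulated rank sees a slice of the input and produces a partial sum;
    # summing the partials element-wise plays the all-reduce.
    chunk = len(x) // tp
    partials = [
        matvec(shard, x[r * chunk:(r + 1) * chunk])
        for r, shard in enumerate(split_cols(weight, tp))
    ]
    return [sum(vals) for vals in zip(*partials)]


def max_abs_diff(a, b):
    return max(abs(u - v) for u, v in zip(a, b))


def tp_parity(out_features=8, in_features=6, tp=2, atol=1e-9, seed=0):
    """Return (passed, column_diff, row_diff) for a random layer under a simulated TP size."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-1, 1) for _ in range(in_features)] for _ in range(out_features)]
    x = [rng.uniform(-1, 1) for _ in range(in_features)]
    reference = matvec(weight, x)  # unsharded TP=1 result
    col_diff = max_abs_diff(reference, column_parallel(weight, x, tp))
    row_diff = max_abs_diff(reference, row_parallel(weight, x, tp))
    return col_diff <= atol and row_diff <= atol, col_diff, row_diff


if __name__ == "__main__":
    ok, col_diff, row_diff = tp_parity()
    print(f"TP=2 parity: {ok} (column diff={col_diff:.2e}, row diff={row_diff:.2e})")
```

The row-parallel path regroups the floating-point additions, so it can differ from TP=1 by rounding error. That is why the comparison uses a tolerance instead of exact equality. A real checkpoint check would apply the same idea to full-model logits or hidden states.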

The concrete implementation review is tracked in #5090.

This issue serves as the feature tracking item.
