[Feature-request] Add Megatron Core support for Gemma 4 dense models (e.g. E4B) #5089

**Is your feature request related to a problem? Please describe.**

Megatron-LM does not currently support Gemma 4 dense models such as Gemma 4 E4B. This makes it difficult to load, validate, and train Gemma 4 dense checkpoints in Megatron Core while preserving Hugging Face parity.

**Describe the solution you'd like**

Based on the scope discussion in this issue, the implementation is split across two layers:

- Megatron Core (generic): GEGLU support and any generic hooks needed for Gemma 4 architectures, kept default-off so existing GPT models are unaffected; tracked in #5090.
- Megatron-Bridge (Gemma-specific): dual RoPE, per-layer embeddings, shared KV wiring, HF checkpoint conversion, parity checks, and example scripts; tracked in NVIDIA-NeMo/Megatron-Bridge#4148.
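
For reviewers unfamiliar with GEGLU, here is a minimal pure-Python sketch of the gated MLP it refers to: `down(gelu(gate(x)) * up(x))`. This is not the Megatron Core implementation. The function names (`gelu_tanh`, `matvec`, `geglu_mlp`) are hypothetical, and the tanh-approximated GELU is an assumption about which GELU variant the Gemma 4 config selects.

```python
import math


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (assumed variant; check the HF config)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def matvec(weight, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def geglu_mlp(x, w_gate, w_up, w_down):
    """GEGLU MLP: the GELU-activated gate scales the up projection elementwise."""
    gate = [gelu_tanh(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Toy example: identity gate/up projections, down projection sums the hidden units.
identity = [[1.0, 0.0], [0.0, 1.0]]
out = geglu_mlp([1.0, 0.0], identity, identity, [[1.0, 1.0]])
```

The only difference from the SwiGLU path already used by GPT-style models is the activation applied to the gate, which is why a default-off activation switch is enough to leave existing models unaffected.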

**Describe alternatives you've considered**

One alternative is to keep Gemma 4 support outside Megatron-LM as a downstream fork. However, that makes checkpoint conversion, validation, and maintenance harder, especially as Megatron Core APIs evolve.

**Additional context**

A working implementation covering all Gemma-specific components is available in NVIDIA-NeMo/Megatron-Bridge#4148, including:
- Gemma 4 unit tests and HF layer/block parity checks
- Converted Megatron checkpoint parity against the original HF Gemma 4 E4B model
- TP1 and TP2 (tensor-parallel size 1 and 2) parity checks for converted checkpoints
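
As a rough illustration of what a parity check compares, the sketch below tests two flat lists of output values (for example, logits from the HF model and from the converted Megatron checkpoint) against an absolute-plus-relative tolerance. `ParityReport`, `check_parity`, and the default tolerances are hypothetical; the actual checks in the linked PR may use different metrics and thresholds.

```python
from dataclasses import dataclass


@dataclass
class ParityReport:
    max_abs_diff: float  # largest elementwise |reference - candidate|
    passed: bool         # True if every element is within tolerance


def check_parity(reference, candidate, atol=1e-3, rtol=1e-3):
    """Compare two equal-length sequences of floats elementwise.

    An element passes when |r - c| <= atol + rtol * |r|.
    The tolerances are illustrative defaults, not values from the PR.
    """
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    worst = 0.0
    passed = True
    for r, c in zip(reference, candidate):
        diff = abs(r - c)
        worst = max(worst, diff)
        if diff > atol + rtol * abs(r):
            passed = False
    return ParityReport(max_abs_diff=worst, passed=passed)


# Hypothetical usage: compare HF outputs with outputs from a TP1 or TP2 run.
report = check_parity([1.0, 2.0], [1.0, 2.0005])
```

Running the same comparison for each tensor-parallel size makes it easier to tell a conversion error (fails at TP1) from a sharding error (passes at TP1, fails at TP2).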

The concrete implementation review is tracked in #5090. 

This issue serves as the feature tracking item.

