feat: megatron bridge adaptation by gursimar · Pull Request #1056 · inclusionAI/AReaL

gursimar · 2026-03-19T00:24:13Z

Description

This is the first PR integrating Megatron-Bridge into AReaL. See RFC #1055 for more details.

Implementation details

1. Introduced a new parameter, bridge_type, whose default value is mbridge, so it does not break or change the flow of existing code:

actor:
  megatron:
    bridge_type: mbridge

2. Implemented model creation through megatron-bridge when bridge_type is set to megatron-bridge:

- make_mcore_model creates the model according to bridge_type.
- get_bridge creates the appropriate bridge based on the YAML parameter.
- _save_model_to_hf and _load_model_from_hf are adapted to support megatron-bridge.

3. Testing

Qwen3-0.6B works as expected with TP=1, TP=2, and TP=2 with PP=2.

4. Resolved dependency conflicts

5. Added documentation pages
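The bridge_type switch described above amounts to a config field with a backward-compatible default plus a dispatch step. The sketch below illustrates that pattern only; it is not the code in this PR. MegatronConfigSketch, make_bridge, BRIDGE_BUILDERS, the hf_path argument, and the two builder functions are hypothetical stand-ins, and the real get_bridge presumably constructs actual mbridge or megatron-bridge objects instead of returning strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical builders standing in for real bridge construction. In AReaL these
# would wrap mbridge or megatron-bridge; here they just return a description.
def _build_mbridge(hf_path: str) -> str:
    return f"mbridge bridge for {hf_path}"

def _build_megatron_bridge(hf_path: str) -> str:
    return f"megatron-bridge bridge for {hf_path}"

# Registry keyed by the two bridge_type values named in this PR.
BRIDGE_BUILDERS: Dict[str, Callable[[str], str]] = {
    "mbridge": _build_mbridge,
    "megatron-bridge": _build_megatron_bridge,
}

@dataclass
class MegatronConfigSketch:
    """Hypothetical stand-in for the actor.megatron config section."""

    # Defaulting to "mbridge" keeps existing YAML files on the old code path.
    bridge_type: str = "mbridge"

    def __post_init__(self) -> None:
        # Fail fast on typos instead of deep inside model creation.
        if self.bridge_type not in BRIDGE_BUILDERS:
            raise ValueError(
                f"unknown bridge_type {self.bridge_type!r}; "
                f"expected one of {sorted(BRIDGE_BUILDERS)}"
            )

def make_bridge(cfg: MegatronConfigSketch, hf_path: str) -> str:
    """Hypothetical analogue of get_bridge: pick a builder from cfg.bridge_type."""
    return BRIDGE_BUILDERS[cfg.bridge_type](hf_path)

print(make_bridge(MegatronConfigSketch(), "Qwen/Qwen3-0.6B"))
# → mbridge bridge for Qwen/Qwen3-0.6B
print(make_bridge(MegatronConfigSketch(bridge_type="megatron-bridge"), "Qwen/Qwen3-0.6B"))
# → megatron-bridge bridge for Qwen/Qwen3-0.6B
```

Keeping the builders in a registry means adding a third bridge later only touches the dictionary, and validating in __post_init__ surfaces a misspelled bridge_type at config load time.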

Type of Change

Checklist

I have read the Contributing Guide
Pre-commit hooks pass (pre-commit run --all-files)
Relevant tests pass; new tests added for new functionality
Documentation updated (if applicable; built with ./docs/build_all.sh)
Branch is up to date with main
Self-reviewed via /review-pr command
This PR was created by a coding agent via /create-pr
This PR is a breaking change

Breaking Change Details (if applicable):
No breaking change; it is backward compatible.

Additional Context


gemini-code-assist · 2026-03-19T00:24:33Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the Megatron-Bridge library into AReaL, providing an alternative backend for managing Megatron models alongside the existing mbridge. This enhancement offers greater flexibility in model conversion and creation workflows, allowing users to leverage different bridge implementations. The changes are designed to be fully backward compatible, ensuring that existing setups continue to operate seamlessly while enabling the adoption of the new bridge for future development and optimization.

Highlights

New Bridge Type Parameter: Introduced a new bridge_type parameter within MegatronEngineConfig to allow selection between mbridge (default) and megatron-bridge for model handling.
Megatron-Bridge Integration: Implemented comprehensive support for creating Megatron models using the megatron-bridge backend, including specific configurations for variable sequence lengths and P2P communication.
Adapted Model Persistence: Modified the model saving (_save_model_to_hf) and loading (_load_model_from_hf) functions to correctly interact with the megatron-bridge implementation.
Backward Compatibility: Ensured that the existing mbridge functionality remains the default, maintaining backward compatibility for current configurations.
Initial Testing: Conducted initial functional testing with Qwen3-0.6B across several Tensor Parallelism (TP) and Pipeline Parallelism (PP) setups (TP=1; TP=2; TP=2 with PP=2).


gemini-code-assist

Code Review

This pull request introduces support for megatron-bridge as an alternative to mbridge for creating Megatron models. This is controlled by a new bridge_type configuration parameter. The changes correctly adapt model creation, saving, and loading logic based on the selected bridge. The implementation looks solid, but I have a few suggestions to improve code quality by removing a debug print statement and refactoring some duplicated and repetitive code blocks.

areal/engine/megatron_engine.py

areal/models/mcore/registry.py

garrett4wade · 2026-03-19T05:35:42Z

Hi @gursimar, thanks for the contribution, but IMO the current form has several issues that should be addressed:

Could you please update pyproject.toml to resolve the dependency conflict?
Could you please add a minimal unit test for the integration, and benchmark save/load speed with megatron-bridge vs. mbridge?
If megatron-bridge is generally preferred, or if there are trade-offs, we should draft a new document about this feature.

gursimar · 2026-03-20T22:08:22Z

Validated that accuracy is similar.
The experiment was conducted on 8 GPUs with Qwen3-0.6B on GSM8K.
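For context on the parallel layouts tested in this PR: in Megatron-style training, the GPUs not consumed by tensor and pipeline parallelism form data-parallel replicas. The helper below is a simplified, hypothetical sketch (it ignores context and expert parallelism) that computes the data-parallel size. The 8-GPU world size matches this accuracy run, but the parallel layout of the run itself is not stated, so the loop is only illustrative.

```python
def data_parallel_size(world_size: int, tp: int, pp: int) -> int:
    """Number of data-parallel replicas left after TP x PP model parallelism.

    Simplified: ignores context and expert parallelism.
    """
    model_parallel = tp * pp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by TP*PP={model_parallel}"
        )
    return world_size // model_parallel

# The three layouts tested in this PR, on an assumed 8-GPU world.
for tp, pp in [(1, 1), (2, 1), (2, 2)]:
    print(f"TP={tp}, PP={pp} -> DP={data_parallel_size(8, tp, pp)}")
# → TP=1, PP=1 -> DP=8
# → TP=2, PP=1 -> DP=4
# → TP=2, PP=2 -> DP=2
```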

gursimar · 2026-03-20T23:39:49Z

@garrett4wade
Save/load benchmark results, produced with the attached script:

Using Qwen3-0.6B (TP=1, PP=1)

baseline	runs	save (s)	load (s)	total (s)
mbridge-fast	10	1.64 ±0.08	0.23 ±0.00	1.87 ±0.08
mbridge-standard	10	1.73 ±0.04	0.61 ±0.02	2.34 ±0.04
megatron-bridge	10	3.05 ±0.03	0.41 ±0.02	3.46 ±0.03

Using Qwen2.5-14B-Instruct (TP=1, PP=1)

baseline	runs	save (s)	load (s)	total (s)
mbridge-fast	10	84.14 ±1.47	5.52 ±0.19	89.66 ±1.43
mbridge-standard	10	86.69 ±1.27	6.38 ±0.25	93.07 ±1.23
megatron-bridge	10	86.68 ±0.88	5.43 ±0.11	92.11 ±0.90
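To put the two tables in relative terms, the snippet below divides each megatron-bridge mean by the corresponding mbridge-fast mean. The numbers are copied from the tables above; nothing is re-measured, and the standard deviations are ignored.

```python
# Mean times in seconds copied from the tables above: (save, load, total).
results = {
    "Qwen3-0.6B": {
        "mbridge-fast": (1.64, 0.23, 1.87),
        "megatron-bridge": (3.05, 0.41, 3.46),
    },
    "Qwen2.5-14B-Instruct": {
        "mbridge-fast": (84.14, 5.52, 89.66),
        "megatron-bridge": (86.68, 5.43, 92.11),
    },
}

def slowdown(model: str) -> dict:
    """Ratio megatron-bridge / mbridge-fast per phase (> 1 means slower)."""
    fast = results[model]["mbridge-fast"]
    mb = results[model]["megatron-bridge"]
    return {
        phase: round(m / f, 2)
        for phase, m, f in zip(("save", "load", "total"), mb, fast)
    }

for model in results:
    print(model, slowdown(model))
# → Qwen3-0.6B {'save': 1.86, 'load': 1.78, 'total': 1.85}
# → Qwen2.5-14B-Instruct {'save': 1.03, 'load': 0.98, 'total': 1.03}
```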

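megatron-bridge has somewhat slower save/load times than mbridge. The gap is large on Qwen3-0.6B (3.46 s vs. 1.87 s total for mbridge-fast) but small on Qwen2.5-14B-Instruct (92.11 s vs. 89.66 s), where its load time is actually the fastest of the three.
Nevertheless, there are important reasons why megatron-bridge can still be useful; see RFC #1055.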

I think it's better to address optimized save/load in a separate, future PR.

Script: benchmark_bridges.py
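The attached script is not reproduced here. As a rough, hypothetical sketch of how rows in the "mean ±std over N runs" form above can be produced, the harness below times a save callable and a load callable repeatedly with time.perf_counter and summarizes them with the statistics module. time_save_load, format_row, save_fn, and load_fn are placeholders, not names from benchmark_bridges.py; this sketch uses the sample standard deviation, and which estimator the real script uses is not stated.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple

def time_save_load(
    save_fn: Callable[[], None],
    load_fn: Callable[[], None],
    runs: int = 10,
) -> Dict[str, Tuple[float, float]]:
    """Run save_fn then load_fn `runs` times; return (mean, sample std) in seconds per phase."""
    if runs < 2:
        raise ValueError("need at least 2 runs to estimate a standard deviation")
    samples: Dict[str, List[float]] = {"save": [], "load": [], "total": []}
    for _ in range(runs):
        t0 = time.perf_counter()
        save_fn()  # hypothetical: export Megatron weights to a Hugging Face checkpoint
        t1 = time.perf_counter()
        load_fn()  # hypothetical: load that checkpoint back into the Megatron model
        t2 = time.perf_counter()
        samples["save"].append(t1 - t0)
        samples["load"].append(t2 - t1)
        samples["total"].append(t2 - t0)
    return {
        phase: (statistics.mean(xs), statistics.stdev(xs))
        for phase, xs in samples.items()
    }

def format_row(name: str, runs: int, stats: Dict[str, Tuple[float, float]]) -> str:
    """Format one tab-separated row in the same layout as the tables above."""
    cells = [f"{m:.2f} ±{s:.2f}" for m, s in (stats["save"], stats["load"], stats["total"])]
    return "\t".join([name, str(runs), *cells])

if __name__ == "__main__":
    # Dummy workloads standing in for real bridge save/load calls.
    stats = time_save_load(lambda: time.sleep(0.02), lambda: time.sleep(0.01), runs=3)
    print(format_row("dummy-bridge", 3, stats))
```

Timing each phase inside the same loop iteration keeps "total" equal to save plus load for every run, so the three columns stay consistent with each other.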

Let me know if anything else needs to be addressed before merging this PR.

- Tested the megatron-bridge integration with TP, PP > 1, with mbridge backward compatibility preserved
- Darwin on x86_64 needs special handling, as torch > 2.9.1 no longer supports it
- Some package conflicts caused by megatron-bridge are resolved by overriding those packages to previous versions

gemini-code-assist bot reviewed Mar 19, 2026

areal/engine/megatron_engine.py (outdated, resolved)

areal/models/mcore/registry.py (outdated, resolved)

areal/models/mcore/registry.py (resolved)

gursimar changed the title from "Megatron bridge adaptation" to "feat: megatron bridge adaptation" on Mar 20, 2026

gursimar force-pushed the megatron-bridge-adaptation branch from 7ec6c3b to f49f94b on March 20, 2026 21:38

gursimar force-pushed the megatron-bridge-adaptation branch from f49f94b to da01bc4 on March 20, 2026 23:40

chore: added docs for the megatron-bridge feature (fe09077)

gursimar wants to merge 2 commits into inclusionAI:main from gursimar:megatron-bridge-adaptation
