
mirror: feat: Add EXAONE 4.0 model bridge (LG AI Research) #4298

Open: ko3n1g wants to merge 14 commits into main from ko3n1g/mirror/pr-2532

ko3n1g commented Jun 11, 2026

Contributor
Claude summary

Mirror of #2532 by @Bias92 — copied into the upstream repo so the full CI pipeline runs natively (cross-fork PRs cannot trigger it).

Commits are copied verbatim with authorship preserved. Review and discussion remain on #2532.

Bias92 and others added 14 commits February 26, 2026 04:17
Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Signed-off-by: 김재우 <pewpewplay315@gmail.com>
Move duplicated TERowParallelLinearLayerNorm class into
models/common/te_layers.py and update Gemma2, Gemma3, and EXAONE
imports. No behavior change on the normal no-bias path; adds a
defensive assertion for deferred bias.

Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Drop two fields from Exaone4ModelProvider that duplicate parent defaults:
- share_embeddings_and_output_weights (parent: True)
- rotary_percent (parent: 1.0)

Per reviewer feedback on PR #2532.

Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Signed-off-by: 노란토끼 <83907395+Bias92@users.noreply.github.com>
Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Drop the EXAONE-specific GPTModelProvider subclasses and configure the plain GPTModelProvider returned by the base bridge instead. This follows the Qwen3 provider_bridge pattern and keeps only the custom EXAONE layer spec in the provider module.

Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Signed-off-by: adityavavreNVDA <avavre@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
copy-pr-bot Bot commented Jun 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

"MiniMaxM2Bridge",
"OlMoEBridge",
"OlMoEModelProvider",
"Exaone4Bridge",

Bug: Exaone4Bridge is listed twice in __all__ — once at line 197 (correct alphabetical position) and again here. The duplicate should be removed.

Suggested change (delete this line):
"Exaone4Bridge",
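
A duplicate like this is easy to catch mechanically. The following is a minimal sketch, not part of the PR: `find_duplicates` is a hypothetical helper, and the `exported` list is an illustrative excerpt rather than the real contents of `__all__`.

```python
from collections import Counter


def find_duplicates(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)  # Counter keeps insertion order of first occurrences
    return [name for name in counts if counts[name] > 1]


# Hypothetical excerpt of an __all__ list; the real list is much longer.
exported = [
    "Exaone4Bridge",
    "MiniMaxM2Bridge",
    "OlMoEBridge",
    "OlMoEModelProvider",
    "Exaone4Bridge",
]

duplicates = find_duplicates(exported)
print(duplicates)
```

A check of this kind could run in a unit test against the package's actual `__all__`, assuming the project wants to enforce uniqueness there.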

@@ -0,0 +1,4 @@
from megatron.bridge.models.exaone.exaone4_bridge import Exaone4Bridge

Missing NVIDIA copyright header. Per project rules, all new Python files under src/ must include the Apache 2.0 copyright header (tests are exempt).

Suggested change
- from megatron.bridge.models.exaone.exaone4_bridge import Exaone4Bridge
+ # Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ from megatron.bridge.models.exaone.exaone4_bridge import Exaone4Bridge
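
For illustration, a header rule like the one cited above can be approximated with a small text check. This is a rough sketch under stated assumptions: `has_license_header` and its `max_lines` parameter are hypothetical, and the project's real lint rule (if any) may match the header differently.

```python
def has_license_header(source: str, max_lines: int = 20) -> bool:
    """Heuristic: look for copyright and Apache 2.0 markers in leading comments."""
    head = source.splitlines()[:max_lines]
    # Only comment lines count; a marker inside code or a string is ignored.
    comments = "\n".join(line for line in head if line.lstrip().startswith("#"))
    return "Copyright (c)" in comments and "Apache License, Version 2.0" in comments


# Abbreviated header used only for this example.
EXAMPLE_HEADER = (
    "# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.\n"
    "#\n"
    '# Licensed under the Apache License, Version 2.0 (the "License");\n'
)
EXAMPLE_IMPORT = "from megatron.bridge.models.exaone.exaone4_bridge import Exaone4Bridge\n"

with_header = has_license_header(EXAMPLE_HEADER + EXAMPLE_IMPORT)
without_header = has_license_header(EXAMPLE_IMPORT)
```

The check only inspects text, so it never imports the module it is examining.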

Comment on lines +43 to +47
output_size,
config=config,
**kwargs,
)
self.post_layernorm = TENorm(config, output_size)

Nit: This assert is new — the original Gemma2/Gemma3 implementations did not have it. While the check is correct in practice (all current callers set add_bias_linear=False), note that assert is stripped by python -O. If this is meant as a safety invariant for future models adopting this class, consider using if bias is not None: raise ValueError(...) instead.
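
The difference between the two guard styles can be sketched as follows. The function names are hypothetical and stand in for the check inside the shared layer class; the PR's actual code is not reproduced here.

```python
from typing import Optional


def require_no_bias_assert(bias: Optional[object]) -> None:
    """Guard written with assert: the check disappears under `python -O`."""
    assert bias is None, "deferred bias is not supported"


def require_no_bias(bias: Optional[object]) -> None:
    """Guard written with an explicit raise: enforced regardless of -O."""
    if bias is not None:
        raise ValueError("deferred bias is not supported; expected bias to be None")


# The no-bias path passes both guards.
require_no_bias_assert(None)
require_no_bias(None)

# A non-None bias is always rejected by the explicit guard.
try:
    require_no_bias(object())
    rejected = False
except ValueError:
    rejected = True
```

Under `python -O`, `__debug__` is `False` and assert statements are not compiled in, so only the second form still protects a future caller that passes a bias.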

@claude

claude Bot commented Jun 11, 2026

Contributor

Review - EXAONE 4.0 Bridge

Clean new-model bridge following the established LLM pattern (Qwen2/Gemma2). The TERowParallelLinearLayerNorm refactor into models/common/te_layers.py is a nice dedup.

Items:
(1) Bug: Duplicate Exaone4Bridge in __all__ (src/megatron/bridge/models/__init__.py): it appears at line 197 (correct alphabetical spot) and again at line 226 (out of order). The second entry should be removed.
(2) Bug: Missing copyright header (src/megatron/bridge/models/exaone/__init__.py): new source files under src/ require the NVIDIA Apache 2.0 header.
(3) Observation: The refactored TERowParallelLinearLayerNorm adds a new assert bias is None guard that is absent from the original Gemma2/Gemma3 implementations. It is safe in practice, but assert is stripped by python -O. Consider raise ValueError.
(4) Missing: No functional conversion roundtrip test (test_exaone4_conversion.py): a toy-model HF-to-Megatron roundtrip GPU test is missing.
(5) Missing: No unit test for the te_layers.py module, which is now shared by Gemma2, Gemma3, and EXAONE.

Suggested test cases: No perf tests impacted.

@ko3n1g

ko3n1g commented Jun 11, 2026

Contributor Author

/ok to test 97ab5b2

yaoyu-33 added the labels area:model (Model implementations and HF bridge logic), community-request, feature (New capabilities, enhancements, or enablement work), and needs-review (PR is ready for code review and waiting on a reviewer) Jun 11, 2026

5 participants