mirror: feat: Add EXAONE 4.0 model bridge (LG AI Research) #4298
ko3n1g wants to merge 14 commits into
Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Signed-off-by: 김재우 <pewpewplay315@gmail.com>
Move duplicated TERowParallelLinearLayerNorm class into models/common/te_layers.py and update Gemma2, Gemma3, and EXAONE imports. No behavior change on the normal no-bias path; adds a defensive assertion for deferred bias.
Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Drop two fields from Exaone4ModelProvider that duplicate parent defaults:
- share_embeddings_and_output_weights (parent: True)
- rotary_percent (parent: 1.0)
Per reviewer feedback on PR #2532.
Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Signed-off-by: 노란토끼 <83907395+Bias92@users.noreply.github.com>
Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Drop the EXAONE-specific GPTModelProvider subclasses and configure the plain GPTModelProvider returned by the base bridge instead. This follows the Qwen3 provider_bridge pattern and keeps only the custom EXAONE layer spec in the provider module.
Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Signed-off-by: adityavavreNVDA <avavre@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
| "MiniMaxM2Bridge", | ||
| "OlMoEBridge", | ||
| "OlMoEModelProvider", | ||
| "Exaone4Bridge", |
Bug: Exaone4Bridge is listed twice in __all__ — once at line 197 (correct alphabetical position) and again here. The duplicate should be removed.
| "Exaone4Bridge", |
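A duplicate like this is easy to miss in a long export list. As a minimal sketch of a check that would catch it, the helper below reports names that occur more than once. The function name `find_duplicate_exports` and the sample list are hypothetical illustrations, not part of the repository.

```python
from collections import Counter


def find_duplicate_exports(names):
    """Return the names that appear more than once, in first-seen order."""
    counts = Counter(names)
    # dict.fromkeys keeps the first occurrence of each name and preserves order.
    return [name for name in dict.fromkeys(names) if counts[name] > 1]


# Hypothetical excerpt of an __all__ list, not the real module contents.
example_all = [
    "Exaone4Bridge",
    "MiniMaxM2Bridge",
    "OlMoEBridge",
    "OlMoEModelProvider",
    "Exaone4Bridge",
]

duplicates = find_duplicate_exports(example_all)
print(duplicates)  # prints ['Exaone4Bridge']
```

A check of this kind could run against a package's `__all__` in a unit test, assuming the project wants to enforce uniqueness there.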
| @@ -0,0 +1,4 @@ | |||
| from megatron.bridge.models.exaone.exaone4_bridge import Exaone4Bridge | |||
Missing NVIDIA copyright header. Per project rules, all new Python files under src/ must include the Apache 2.0 copyright header (tests are exempt).
| from megatron.bridge.models.exaone.exaone4_bridge import Exaone4Bridge | |
| # Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. | |
| # | |
| # Licensed under the Apache License, Version 2.0 (the "License"); | |
| # you may not use this file except in compliance with the License. | |
| # You may obtain a copy of the License at | |
| # | |
| # http://www.apache.org/licenses/LICENSE-2.0 | |
| # | |
| # Unless required by applicable law or agreed to in writing, software | |
| # distributed under the License is distributed on an "AS IS" BASIS, | |
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
| # See the License for the specific language governing permissions and | |
| # limitations under the License. | |
| from megatron.bridge.models.exaone.exaone4_bridge import Exaone4Bridge |
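One way such a rule could be checked automatically is sketched below. This is an assumption-laden illustration only: the helper `has_license_header`, the marker strings, and the 20-line window are hypothetical choices, and the project's actual enforcement mechanism is not shown in this PR.

```python
# Hypothetical markers; a real check might match the full header text instead.
REQUIRED_MARKERS = (
    "NVIDIA CORPORATION. All rights reserved.",
    "Licensed under the Apache License, Version 2.0",
)


def has_license_header(source: str, max_lines: int = 20) -> bool:
    """Return True if every marker appears within the first max_lines lines."""
    head = "\n".join(source.splitlines()[:max_lines])
    return all(marker in head for marker in REQUIRED_MARKERS)


# The file as originally submitted: a bare import with no header.
without_header = (
    "from megatron.bridge.models.exaone.exaone4_bridge import Exaone4Bridge\n"
)

# The same file with an abbreviated header prepended.
with_header = (
    "# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.\n"
    "#\n"
    '# Licensed under the Apache License, Version 2.0 (the "License");\n'
    + without_header
)

print(has_license_header(without_header), has_license_header(with_header))
```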
| output_size, | ||
| config=config, | ||
| **kwargs, | ||
| ) | ||
| self.post_layernorm = TENorm(config, output_size) |
Nit: This assert is new — the original Gemma2/Gemma3 implementations did not have it. While the check is correct in practice (all current callers set add_bias_linear=False), note that assert is stripped by python -O. If this is meant as a safety invariant for future models adopting this class, consider using if bias is not None: raise ValueError(...) instead.
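The difference between the two guards can be sketched as follows. The helper name `ensure_no_deferred_bias` and the error message are hypothetical stand-ins for the check inside the shared class; the real code is not reproduced here.

```python
def ensure_no_deferred_bias(bias):
    """Hypothetical guard: reject a deferred bias with an explicit exception.

    Unlike `assert bias is None`, this check still runs when Python is
    started with -O, because only assert statements are removed in that mode.
    """
    if bias is not None:
        raise ValueError(
            "Expected no deferred bias (add_bias_linear=False), but got one."
        )


# The normal no-bias path passes silently.
ensure_no_deferred_bias(None)

# A non-None bias is rejected regardless of interpreter optimization flags.
rejected = False
try:
    ensure_no_deferred_bias(object())
except ValueError:
    rejected = True

print(rejected)  # prints True
```

The trade-off is one extra branch on each call in exchange for an invariant that cannot be silently disabled at runtime.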
Review - EXAONE 4.0 Bridge
Clean new-model bridge following the established LLM pattern (Qwen2/Gemma2). The TERowParallelLinearLayerNorm refactor into models/common/te_layers.py is a nice dedup.
Items:
(1) Bug: Duplicate Exaone4Bridge in __all__ (src/megatron/bridge/models/__init__.py): appears at line 197 (correct alphabetical spot) and again at line 226 (out of order). The second entry should be removed.
(2) Bug: Missing copyright header (src/megatron/bridge/models/exaone/__init__.py): New source files under src/ require the NVIDIA Apache 2.0 header.
(3) Observation: The refactored TERowParallelLinearLayerNorm adds a new assert bias is None guard absent from the original Gemma2/Gemma3 implementations. Safe in practice, but assert is stripped by python -O. Consider raise ValueError.
(4) Missing: No functional conversion roundtrip test (test_exaone4_conversion.py): the toy-model HF-to-Megatron roundtrip GPU test is missing.
(5) Missing: No unit test for the te_layers.py module, which is now shared by Gemma2, Gemma3, and EXAONE.
Suggested test cases: No perf tests impacted.
/ok to test 97ab5b2
Claude summary
Mirror of #2532 by @Bias92 — copied into the upstream repo so the full CI pipeline runs natively (cross-fork PRs cannot trigger it).
Bias92/Megatron-Bridge:feat/exaone4-bridge
Commits are copied verbatim with authorship preserved. Review and discussion remain on #2532.