_sources/advanced/arch-support-beyond-megatron.md (0 additions & 2 deletions)
@@ -1,7 +1,5 @@
 # Supporting Model Architectures Beyond Megatron-LM
 
-## Background
-
 While the Megatron-LM framework is highly efficient for parallel training, it can lack the flexibility to support rapidly evolving model architectures like Qwen3Next. Natively supporting the unique structures of these models, such as Gated-Delta-Net, often requires invasive and time-consuming modifications to Megatron's core codebase.
 
 To accelerate the adoption of these cutting-edge models, slime introduces a more agile approach: **instead of deeply re-engineering Megatron, we directly import and wrap the model's official HuggingFace implementation**, embedding it as a "black-box" module into Megatron's parallel training pipeline.
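The wrap-and-embed idea described in this paragraph can be sketched as follows. This is a minimal illustrative sketch, not slime's actual code: the class names (`HFBlackBoxLayer`, `ToyHFLayer`) and the Megatron-style `forward(hidden_states, attention_mask)` signature are assumptions chosen for the example.

```python
# Hypothetical sketch: adapt a HuggingFace-style decoder layer so it can be
# dropped into a Megatron-style pipeline as an opaque "black box".
# All names here are illustrative; slime's real wrapper differs.

class HFBlackBoxLayer:
    """Presents an HF-style layer behind a Megatron-style forward signature."""

    def __init__(self, hf_layer):
        self.hf_layer = hf_layer  # the official HF implementation, untouched

    def forward(self, hidden_states, attention_mask=None):
        # A Megatron-style pipeline passes (hidden_states, attention_mask);
        # HF layers typically return a tuple whose first element is the
        # new hidden states, so unwrap it here.
        outputs = self.hf_layer(hidden_states, attention_mask=attention_mask)
        return outputs[0] if isinstance(outputs, tuple) else outputs


class ToyHFLayer:
    """Stand-in for an HF layer (e.g. a Gated-Delta-Net block)."""

    def __call__(self, hidden_states, attention_mask=None):
        return (hidden_states * 2,)  # HF convention: tuple of outputs


layer = HFBlackBoxLayer(ToyHFLayer())
print(layer.forward(3))  # -> 6
```

Because the HF implementation stays untouched inside the wrapper, new architectures can be adopted without modifying Megatron's core codebase.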
_sources/index.rst (6 additions & 0 deletions)
@@ -6,6 +6,12 @@ slime is an LLM post-training framework for RL scaling, providing two core capab
 - High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang;
 - Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines.
 
+slime is the RL framework behind `GLM-4.5 <https://z.ai/blog/glm-4.5>`_ and `GLM-4.6 <https://z.ai/blog/glm-4.6>`_; apart from models from Z.ai, it also supports the following models:
+
+- Qwen3 series (Qwen3Next, Qwen3MoE, Qwen3), Qwen2.5 series;
+- DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1);
 <h1>Supporting Model Architectures Beyond Megatron-LM<a class="headerlink" href="#supporting-model-architectures-beyond-megatron-lm" title="Link to this heading">#</a></h1>
-<section id="background">
-<h2>Background<a class="headerlink" href="#background" title="Link to this heading">#</a></h2>
 <p>While the Megatron-LM framework is highly efficient for parallel training, it can lack the flexibility to support rapidly evolving model architectures like Qwen3Next. Natively supporting the unique structures of these models, such as Gated-Delta-Net, often requires invasive and time-consuming modifications to Megatron’s core codebase.</p>
 <p>To accelerate the adoption of these cutting-edge models, slime introduces a more agile approach: <strong>instead of deeply re-engineering Megatron, we directly import and wrap the model’s official HuggingFace implementation</strong>, embedding it as a “black-box” module into Megatron’s parallel training pipeline.</p>
 <p>This document uses Qwen3Next 80B-A3B as an example to illustrate this concept.</p>
-</section>
 <section id="principle-and-core-components">
 <h2>Principle and Core Components<a class="headerlink" href="#principle-and-core-components" title="Link to this heading">#</a></h2>
 <p>Megatron’s model instantiation is a two-step process: first, it generates a “layer specification” (<code class="docutils literal notranslate"><span class="pre">ModuleSpec</span></code>) based on the configuration, and then it instantiates the actual PyTorch modules according to that spec.</p>
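The spec-then-build pattern described in that paragraph can be sketched with a minimal, self-contained example. The names `ModuleSpec` and `build_module` mirror Megatron's, but the implementations below are simplified stand-ins (Megatron's real versions in `megatron.core` are richer), and `GatedDeltaNetLayer` is a hypothetical module used only for illustration.

```python
from dataclasses import dataclass, field

# Simplified stand-ins for Megatron's two-step instantiation pattern:
# step 1 produces a declarative layer spec, step 2 builds the module from it.

@dataclass
class ModuleSpec:
    module: type                        # the class to instantiate later
    params: dict = field(default_factory=dict)  # constructor arguments


def build_module(spec: ModuleSpec, **extra):
    # Step 2: instantiate the actual module from the layer specification.
    return spec.module(**spec.params, **extra)


class GatedDeltaNetLayer:
    """Hypothetical wrapped HF layer standing in for the real module."""

    def __init__(self, hidden_size):
        self.hidden_size = hidden_size


# Step 1: generate the layer spec from the configuration.
spec = ModuleSpec(module=GatedDeltaNetLayer, params={"hidden_size": 2048})
layer = build_module(spec)
print(type(layer).__name__, layer.hidden_size)  # GatedDeltaNetLayer 2048
```

Because the spec only names a class and its parameters, swapping in a wrapped HuggingFace module is a matter of pointing the spec at a different class, without touching the instantiation machinery.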
 <li><p>High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang;</p></li>
 <li><p>Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines.</p></li>
 </ul>
+<p>slime is the RL framework behind <a class="reference external" href="https://z.ai/blog/glm-4.5">GLM-4.5</a> and <a class="reference external" href="https://z.ai/blog/glm-4.6">GLM-4.6</a>; apart from models from Z.ai, it also supports the following models:</p>
+<ul class="simple">
+<li><p>Qwen3 series (Qwen3Next, Qwen3MoE, Qwen3), Qwen2.5 series;</p></li>
+<li><p>DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1);</p></li>