
Commit 7ebfb62

deploy: 0beb909
1 parent a74b9b8 commit 7ebfb62


14 files changed, +26 -9 lines changed

-255 Bytes
Binary file not shown.

.doctrees/environment.pickle

-598 Bytes
Binary file not shown.

.doctrees/index.doctree

1.93 KB
Binary file not shown.

_sources/advanced/arch-support-beyond-megatron.md

Lines changed: 0 additions & 2 deletions
@@ -1,7 +1,5 @@
 # Supporting Model Architectures Beyond Megatron-LM
 
-## Background
-
 While the Megatron-LM framework is highly efficient for parallel training, it can lack the flexibility to support rapidly evolving model architectures like Qwen3Next. Natively supporting the unique structures of these models, such as Gated-Delta-Net, often requires invasive and time-consuming modifications to Megatron's core codebase.
 
 To accelerate the adoption of these cutting-edge models, slime introduces a more agile approach: **instead of deeply re-engineering Megatron, we directly import and wrap the model's official HuggingFace implementation**, embedding it as a "black-box" module into Megatron's parallel training pipeline.

_sources/index.rst

Lines changed: 6 additions & 0 deletions
@@ -6,6 +6,12 @@ slime is an LLM post-training framework for RL scaling, providing two core capab
 - High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang;
 - Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines.
 
+slime is the RL-framework behind [GLM-4.5](https://z.ai/blog/glm-4.5) and [GLM-4.6](https://z.ai/blog/glm-4.6) and apart from models from Z.ai, we also supports the following models:
+
+- Qwen3 series (Qwen3Next, Qwen3MoE, Qwen3), Qwen2.5 series;
+- DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1);
+- Llama 3.
+
 .. toctree::
    :maxdepth: 1
    :caption: Get Started

advanced/arch-support-beyond-megatron.html

Lines changed: 0 additions & 5 deletions
@@ -435,7 +435,6 @@ <h2> Contents </h2>
 </div>
 <nav aria-label="Page">
 <ul class="visible nav section-nav flex-column">
-<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#background">Background</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#principle-and-core-components">Principle and Core Components</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#current-limitations">Current Limitations</a></li>
 </ul>
@@ -451,12 +450,9 @@ <h2> Contents </h2>
 
 <section class="tex2jax_ignore mathjax_ignore" id="supporting-model-architectures-beyond-megatron-lm">
 <h1>Supporting Model Architectures Beyond Megatron-LM<a class="headerlink" href="#supporting-model-architectures-beyond-megatron-lm" title="Link to this heading">#</a></h1>
-<section id="background">
-<h2>Background<a class="headerlink" href="#background" title="Link to this heading">#</a></h2>
 <p>While the Megatron-LM framework is highly efficient for parallel training, it can lack the flexibility to support rapidly evolving model architectures like Qwen3Next. Natively supporting the unique structures of these models, such as Gated-Delta-Net, often requires invasive and time-consuming modifications to Megatron’s core codebase.</p>
 <p>To accelerate the adoption of these cutting-edge models, slime introduces a more agile approach: <strong>instead of deeply re-engineering Megatron, we directly import and wrap the model’s official HuggingFace implementation</strong>, embedding it as a “black-box” module into Megatron’s parallel training pipeline.</p>
 <p>This document uses Qwen3Next 80B-A3B as an example to illustrate this concept.</p>
-</section>
 <section id="principle-and-core-components">
 <h2>Principle and Core Components<a class="headerlink" href="#principle-and-core-components" title="Link to this heading">#</a></h2>
 <p>Megatron’s model instantiation is a two-step process: first, it generates a “layer specification” (<code class="docutils literal notranslate"><span class="pre">ModuleSpec</span></code>) based on the configuration, and then it instantiates the actual PyTorch modules according to that spec.</p>
@@ -539,7 +535,6 @@ <h2>Current Limitations<a class="headerlink" href="#current-limitations" title="
 </div>
 <nav class="bd-toc-nav page-toc">
 <ul class="visible nav section-nav flex-column">
-<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#background">Background</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#principle-and-core-components">Principle and Core Components</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#current-limitations">Current Limitations</a></li>
 </ul>

index.html

Lines changed: 6 additions & 0 deletions
@@ -445,6 +445,12 @@ <h1>slime Documentation<a class="headerlink" href="#slime-documentation" title="
 <li><p>High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang;</p></li>
 <li><p>Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines.</p></li>
 </ul>
+<p>slime is the RL-framework behind [GLM-4.5](<a class="reference external" href="https://z.ai/blog/glm-4.5">https://z.ai/blog/glm-4.5</a>) and [GLM-4.6](<a class="reference external" href="https://z.ai/blog/glm-4.6">https://z.ai/blog/glm-4.6</a>) and apart from models from Z.ai, we also supports the following models:</p>
+<ul class="simple">
+<li><p>Qwen3 series (Qwen3Next, Qwen3MoE, Qwen3), Qwen2.5 series;</p></li>
+<li><p>DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1);</p></li>
+<li><p>Llama 3.</p></li>
+</ul>
 <div class="toctree-wrapper compound">
 <p aria-level="2" class="caption" role="heading"><span class="caption-text">Get Started</span></p>
 <ul>

objects.inv

-23 Bytes
Binary file not shown.

searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

zh/.doctrees/environment.pickle

0 Bytes
Binary file not shown.
