_sources/advanced/arch-support-beyond-megatron.md (0 additions & 2 deletions)
@@ -1,7 +1,5 @@
 # Supporting Model Architectures Beyond Megatron-LM
 
-## Background
-
 While the Megatron-LM framework is highly efficient for parallel training, it can lack the flexibility to support rapidly evolving model architectures like Qwen3Next. Natively supporting the unique structures of these models, such as Gated-Delta-Net, often requires invasive and time-consuming modifications to Megatron's core codebase.
 
 To accelerate the adoption of these cutting-edge models, slime introduces a more agile approach: **instead of deeply re-engineering Megatron, we directly import and wrap the model's official HuggingFace implementation**, embedding it as a "black-box" module into Megatron's parallel training pipeline.
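The wrap-and-embed idea described in this paragraph can be sketched as follows. This is a minimal illustrative sketch, not slime's actual code: the class names (`HFBlackBoxLayer`, `ToyHFLayer`) and the Megatron-style `forward(hidden_states, attention_mask)` signature are assumptions chosen for the example.

```python
# Hypothetical sketch: adapt a HuggingFace-style decoder layer so it can be
# dropped into a Megatron-style pipeline as an opaque "black box".
# All names here are illustrative; slime's real wrapper differs.

class HFBlackBoxLayer:
    """Presents an HF-style layer behind a Megatron-style forward signature."""

    def __init__(self, hf_layer):
        self.hf_layer = hf_layer  # the official HF implementation, untouched

    def forward(self, hidden_states, attention_mask=None):
        # A Megatron-style pipeline passes (hidden_states, attention_mask);
        # HF layers typically return a tuple whose first element is the
        # new hidden states, so unwrap it here.
        outputs = self.hf_layer(hidden_states, attention_mask=attention_mask)
        return outputs[0] if isinstance(outputs, tuple) else outputs


class ToyHFLayer:
    """Stand-in for an HF layer (e.g. a Gated-Delta-Net block)."""

    def __call__(self, hidden_states, attention_mask=None):
        return (hidden_states * 2,)  # HF convention: tuple of outputs


layer = HFBlackBoxLayer(ToyHFLayer())
print(layer.forward(3))  # -> 6
```

Because the HF implementation stays untouched inside the wrapper, new architectures can be adopted without modifying Megatron's core codebase.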
_sources/index.rst (6 additions & 0 deletions)
@@ -6,6 +6,12 @@ slime is an LLM post-training framework for RL scaling, providing two core capab
 - High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang;
 - Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines.
 
+slime is the RL framework behind `GLM-4.5 <https://z.ai/blog/glm-4.5>`_ and `GLM-4.6 <https://z.ai/blog/glm-4.6>`_; apart from models from Z.ai, it also supports the following models:
+
+- Qwen3 series (Qwen3Next, Qwen3MoE, Qwen3), Qwen2.5 series;
+- DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1);
 <h1>Supporting Model Architectures Beyond Megatron-LM<a class="headerlink" href="#supporting-model-architectures-beyond-megatron-lm" title="Link to this heading">#</a></h1>
-<section id="background">
-<h2>Background<a class="headerlink" href="#background" title="Link to this heading">#</a></h2>
 <p>While the Megatron-LM framework is highly efficient for parallel training, it can lack the flexibility to support rapidly evolving model architectures like Qwen3Next. Natively supporting the unique structures of these models, such as Gated-Delta-Net, often requires invasive and time-consuming modifications to Megatron’s core codebase.</p>
 <p>To accelerate the adoption of these cutting-edge models, slime introduces a more agile approach: <strong>instead of deeply re-engineering Megatron, we directly import and wrap the model’s official HuggingFace implementation</strong>, embedding it as a “black-box” module into Megatron’s parallel training pipeline.</p>
 <p>This document uses Qwen3Next 80B-A3B as an example to illustrate this concept.</p>
-</section>
 <section id="principle-and-core-components">
 <h2>Principle and Core Components<a class="headerlink" href="#principle-and-core-components" title="Link to this heading">#</a></h2>
 <p>Megatron’s model instantiation is a two-step process: first, it generates a “layer specification” (<code class="docutils literal notranslate"><span class="pre">ModuleSpec</span></code>) based on the configuration, and then it instantiates the actual PyTorch modules according to that spec.</p>
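The spec-then-build pattern described in that paragraph can be sketched with a minimal, self-contained example. The names `ModuleSpec` and `build_module` mirror Megatron's, but the implementations below are simplified stand-ins (Megatron's real versions in `megatron.core` are richer), and `GatedDeltaNetLayer` is a hypothetical module used only for illustration.

```python
from dataclasses import dataclass, field

# Simplified stand-ins for Megatron's two-step instantiation pattern:
# step 1 produces a declarative layer spec, step 2 builds the module from it.

@dataclass
class ModuleSpec:
    module: type                        # the class to instantiate later
    params: dict = field(default_factory=dict)  # constructor arguments


def build_module(spec: ModuleSpec, **extra):
    # Step 2: instantiate the actual module from the layer specification.
    return spec.module(**spec.params, **extra)


class GatedDeltaNetLayer:
    """Hypothetical wrapped HF layer standing in for the real module."""

    def __init__(self, hidden_size):
        self.hidden_size = hidden_size


# Step 1: generate the layer spec from the configuration.
spec = ModuleSpec(module=GatedDeltaNetLayer, params={"hidden_size": 2048})
layer = build_module(spec)
print(type(layer).__name__, layer.hidden_size)  # GatedDeltaNetLayer 2048
```

Because the spec only names a class and its parameters, swapping in a wrapped HuggingFace module is a matter of pointing the spec at a different class, without touching the instantiation machinery.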
 <li><p>High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang;</p></li>
 <li><p>Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines.</p></li>
 </ul>
+<p>slime is the RL framework behind <a class="reference external" href="https://z.ai/blog/glm-4.5">GLM-4.5</a> and <a class="reference external" href="https://z.ai/blog/glm-4.6">GLM-4.6</a>; apart from models from Z.ai, it also supports the following models:</p>
+<ul class="simple">
+<li><p>Qwen3 series (Qwen3Next, Qwen3MoE, Qwen3), Qwen2.5 series;</p></li>
+<li><p>DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1);</p></li>