docs/chapter09_alignment/industrial-post-training.md (80 additions, 2 deletions)
@@ -752,12 +752,18 @@ Tulu 3 fully open-sources its data, code, and training recipe; its theme is multi-stage po
## References
### Chinese Companies and Labs
#### MiniMax
[^minimax_m2_1]: [MiniMax M2.1: Post-Training Experience and Insights for Agent Models](https://www.minimax.io/news/post-training-experience-and-insights-for-agent-models)
[^minimax_m1]: [MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention](https://arxiv.org/abs/2506.13585)
[^minimax_webexplorer]: [WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents](https://arxiv.org/abs/2509.06501)
[^kimi_k1_5]: [Kimi k1.5: Scaling Reinforcement Learning with LLMs](https://arxiv.org/abs/2501.12599)
[^kimi_k2]: [Kimi K2: Open Agentic Intelligence](https://arxiv.org/abs/2507.20534)
[^kimi_researcher]: [Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities](https://moonshotai.github.io/Kimi-Researcher/)
#### ByteDance Seed / Doubao
[^seed1_5_thinking]: [Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning](https://arxiv.org/abs/2504.13914)
[^vapo]: [VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks](https://arxiv.org/abs/2504.05118)
@@ -798,70 +808,100 @@ Tulu 3 fully open-sources its data, code, and training recipe; its theme is multi-stage po
[^seed1_8]: [Official Release of Seed1.8: A Generalized Agentic Model](https://seed.bytedance.com/en/blog/official-release-of-seed1-8-a-generalized-agentic-model)
#### DeepSeek
[^deepseek_math]: [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://arxiv.org/abs/2402.03300)
[^deepseek_r1]: [DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning](https://arxiv.org/abs/2501.12948)
[^deepseek_v3_2]: [DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models](https://arxiv.org/abs/2512.02556)
#### Zhipu Z.ai / GLM
[^glm_4_5]: [GLM-4.5: Agentic, Reasoning, and Coding Foundation Models](https://arxiv.org/abs/2508.06471)
[^glm_5]: [GLM-5: from Vibe Coding to Agentic Engineering](https://arxiv.org/html/2602.15763v1)
[^apple_fm]: [Apple Intelligence Foundation Language Models](https://machinelearning.apple.com/research/apple-intelligence-foundation-language-models)
[^apple_fm_2025]: [Apple Intelligence Foundation Language Models Tech Report 2025](https://machinelearning.apple.com/research/apple-foundation-models-tech-report-2025)
#### xAI Grok
[^grok_1]: [xAI Grok-1 Model Card](https://x.ai/news/grok/model-card)
[^grok_4]: [xAI Grok 4](https://x.ai/news/grok-4)
@@ -946,16 +1002,22 @@ Tulu 3 fully open-sources its data, code, and training recipe; its theme is multi-stage po
[^grok_4_1_card]: [xAI Grok 4.1 Model Card](https://data.x.ai/2025-11-17-grok-4-1-model-card.pdf)
[^nova_report]: [The Amazon Nova Family of Models: Technical Report and Model Card](https://www.isi.edu/results/publications/31887/the-amazon-nova-family-of-models-technical-report-and-model-card/)
@@ -964,26 +1026,42 @@ Tulu 3 fully open-sources its data, code, and training recipe; its theme is multi-stage po
[^nova_forge]: [Amazon Nova Forge](https://aws.amazon.com/nova/forge/)