
Open-Code-Zero

Welcome to the official repo for Open-Code-Zero! We pioneered the replication of DeepSeek-R1-Zero's self-reflective reasoning in the pure code domain, using minimalist RL (no math data!) on a 7B coder model. Key discoveries:

  • 🚀 Emergent Long-CoT: Achieved sophisticated self-correction with just 15k problems and ~600 training steps.
  • 🧠 System-2 Awakening: Chaotic "quick thinking" gives way to structured, critical analysis as training stabilizes.
  • 💻 Coder Advantage: Code-specialized models avoid the language-switching instability seen in general-purpose LLMs, enabling cleaner reasoning.
  • 🫢 First to show that code-domain LLMs can intrinsically evolve DeepSeek-R1-Zero-style reasoning without math data. Dive in for paradigm-shifting examples!

Training Settings & Code

We find that with a simple outcome-based reward, the model gradually learns on its own to adopt a more sophisticated reasoning pattern during training. Training settings and code are coming in just a few days! Stay tuned and star our repo if you are interested :)
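Until the training code is released, here is a hypothetical sketch of what a simple outcome-based reward for code could look like. It assumes each problem ships with assertion-style unit tests and that the model puts its final program in a fenced Python block. Only the outcome is scored: the extracted program earns 1.0 if every test passes and 0.0 otherwise, and the reasoning text earns nothing by itself. The names `extract_code` and `outcome_reward`, the answer format, and the 10-second timeout are illustrative assumptions, not this repo's implementation.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical answer format: the final program sits in a fenced python block.
# The fence is built at runtime so it does not clash with Markdown fences.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_code(completion: str) -> Optional[str]:
    """Return the last fenced Python block in a model completion, or None."""
    blocks = CODE_BLOCK.findall(completion)
    return blocks[-1] if blocks else None


def outcome_reward(completion: str, test_code: str, timeout: float = 10.0) -> float:
    """Binary outcome reward (assumed design): 1.0 if the program passes all tests.

    Intermediate reasoning is not rewarded; only the final program's result counts.
    NOTE: this runs untrusted model output, so real training needs a sandbox.
    """
    code = extract_code(completion)
    if code is None:
        return 0.0  # no parsable answer is treated as a failure
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        # Append the assertion-style tests after the candidate program.
        script.write_text(code + "\n\n" + test_code)
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # infinite loops or slow solutions get no reward
    # A failing assert raises, so the interpreter exits with a nonzero code.
    return 1.0 if result.returncode == 0 else 0.0
```

With a binary signal like this, no reward is ever given for how the model reasons. Any self-reflection that appears has to emerge because it helps the final program pass.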