Open-Code-Zero

Welcome to the official repo for Open-Code-Zero! We pioneered the replication of DeepSeek-R1-Zero’s self-reflective reasoning purely in the code domain, using minimalist RL (no math data!) on a 7B coder model. Key discoveries:

🚀 Emergent Long-CoT: Achieved sophisticated self-correction with just 15k problems and ~600 training steps.
🧠 System-2 Awakening: Chaotic "quick thinking" gives way to structured, critical analysis as training stabilizes.
💻 Coder Advantage: Code-specialized models avoid language-switching instability seen in general LLMs, enabling cleaner reasoning.
🫢 First to prove code-domain LLMs can intrinsically evolve DeepSeek-R1-Zero-style reasoning without math. Dive in for paradigm-shifting examples!

Training Settings & Code

We find that with a simple outcome-based reward, the model gradually and naturally learns to adopt a more sophisticated reasoning pattern during training. Training settings and code are coming soon, in just a few days! Stay tuned and star our repo if you are interested :)
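Until the training code is released, here is a minimal sketch of what an outcome-based reward for code could look like. It is an illustration, not this repo's implementation: the function name `outcome_reward`, the binary 1.0/0.0 scoring, and the use of assertion-style test cases are all assumptions. The idea is that only the final result is scored (do the tests pass?), with no reward for intermediate reasoning steps.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def outcome_reward(solution_code: str, test_code: str, timeout_s: float = 5.0) -> float:
    """Hypothetical binary reward: 1.0 if the solution passes all tests, else 0.0.

    Assumes `test_code` consists of plain `assert` statements that exercise
    the functions defined in `solution_code`.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Write the candidate solution followed by its tests into one script.
        script = Path(tmp) / "candidate.py"
        script.write_text(solution_code + "\n\n" + test_code + "\n")
        try:
            # Run in a separate interpreter so crashes and infinite loops
            # in generated code cannot take down the training process.
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0
    # A non-zero exit code means an assertion failed or the code raised.
    return 1.0 if result.returncode == 0 else 0.0


if __name__ == "__main__":
    tests = "assert add(1, 2) == 3"
    print(outcome_reward("def add(a, b):\n    return a + b", tests))  # 1.0
    print(outcome_reward("def add(a, b):\n    return a - b", tests))  # 0.0
```

A subprocess with a timeout is only a basic safeguard; running untrusted model output in a real training loop would need proper sandboxing.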
