Content
🚀News • 📖Paper_List • ✨Motivation
📈Analysis • 🖥️Algorithms • 🗂️Scaling_Law • 💯Code_Evaluator
Links
Project Page • Algorithm Paper • Scaling Law Paper • Insights Blog
- [2026.3.10] The papers were uploaded to arXiv.
This is the project page for MicroCoder and a brief summary of the papers below:
- Breaking Training Bottlenecks: Effective Reinforcement Learning for Modern Coding Models (Algorithm Paper)
  Zongqian Li 1, 2, Shaohan Huang 1, Zewen Chi 1, Yixuan Su 2, Lexin Zhou 3, Li Dong 1, Nigel Collier 2, Furu Wei 1
  1 Microsoft, 2 University of Cambridge, 3 Princeton University
- Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems (Scaling Law Paper)
  Zongqian Li 1, 2, Tengchao Lv 1, Shaohan Huang 1, Yixuan Su 2, Qinzheng Sun 1, Qiufeng Yin 1, Ying Xin 1, Scarlett Li 1, Lei Cui 1, Nigel Collier 2, Furu Wei 1
  1 Microsoft, 2 University of Cambridge
- MicroCoder-Insights: Training Recipes for Modern Coding Models (Insights Blog)
- Cross-generational training effectiveness: Current training methods demonstrate substantial improvements on Qwen 2.5 models but minimal improvements on Qwen 3 models, revealing generation-specific training bottlenecks
- Dataset difficulty gap: Mainstream datasets pose substantial difficulty for Qwen 2.5 while appearing relatively simple for Qwen 3's capabilities, indicating the need for more challenging training corpora
- Fundamental behavioral differences: Output behavior differs fundamentally between generations. Qwen 3 models exhibit pronounced upward trends in response length during training, whereas Qwen 2.5 models show stable or decreasing lengths; across the progression from Qwen 2.5 Instruct to Qwen 3 Instruct to Qwen 3 Thinking, outputs show increasing length and variance
Figure: Algorithm: GRPO+, Max Response Length: 8K, Test Dataset: LiveCodeBench v6, Train Batch Size: 64
MicroCoder-Insights: Training Recipes for Modern Coding Models
Through comprehensive analysis across more than thirty controlled experiments, we reveal 34 key training insights across seven main aspects: the code evaluator, temperature, training data, context length and extension, truncation mask strategies, batch size and on-policy training, and KL loss and clip ratio.
Breaking Training Bottlenecks: Effective Reinforcement Learning for Modern Coding Models
To address training bottlenecks, we propose MicroCoder-GRPO, an enhanced Group Relative Policy Optimization approach with three key innovations:
- conditional truncation masking to enhance long output potential while maintaining training stability,
- diversity-determined temperature selection to maintain and encourage output diversity,
- and removal of KL loss with high clipping ratios to facilitate exploration.
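
As a rough illustration, the first and third modifications can be sketched as a clipped surrogate objective with an asymmetric (high) upper clip ratio, no KL term, and a conditional mask over truncated responses. All names, default values, and the exact masking condition below are assumptions for illustration; the paper defines the actual objective.

```python
import numpy as np

def grpo_plus_surrogate(logp_new, logp_old, advantages, truncated,
                        eps_low=0.2, eps_high=0.4):
    """Illustrative GRPO-style clipped surrogate (per-sample).

    Assumptions (not from the paper): eps_high > eps_low stands in for the
    "high clipping ratio"; there is no KL penalty term; the conditional
    truncation mask drops truncated samples only when their advantage is
    negative, so long outputs are not punished merely for hitting the
    length limit while correct truncated outputs still contribute.
    """
    ratio = np.exp(logp_new - logp_old)                  # importance ratio
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    per_sample = np.minimum(ratio * advantages, clipped * advantages)
    keep = ~(truncated & (advantages < 0))               # conditional mask
    return -(per_sample * keep).sum() / max(keep.sum(), 1)
```

With identical old and new log-probs the ratio is 1 everywhere, so the loss reduces to the negative mean advantage over the unmasked samples, which makes the masking behavior easy to check in isolation.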
The modifications of MicroCoder-GRPO relative to vanilla GRPO are shown as the red components in the equations:
Figure: Temperature: 1.2, Train Dataset: MicroCoder-Dataset, Test Dataset: LiveCodeBench v6, Train Batch Size: 64
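
The diversity-determined temperature selection in the second modification above can be sketched as a simple search over candidate temperatures, scored by a diversity proxy. Everything here (the distinct-n metric, the candidate grid, the function names) is an illustrative assumption, not the paper's procedure:

```python
def distinct_n(samples, n=2):
    """Diversity proxy: fraction of unique n-grams across a batch of
    sampled token sequences (assumption: the paper's actual diversity
    measure is not reproduced here)."""
    grams = [tuple(toks[i:i + n])
             for toks in samples
             for i in range(len(toks) - n + 1)]
    return len(set(grams)) / max(len(grams), 1)

def select_temperature(sample_fn, candidates=(0.8, 1.0, 1.2)):
    """Hypothetical selection loop: sample a batch at each candidate
    temperature and keep the one with the most diverse outputs."""
    return max(candidates, key=lambda t: distinct_n(sample_fn(t)))
```

`sample_fn(t)` is a stand-in for drawing a batch of responses from the policy at temperature `t`; in practice this search would run periodically during training rather than once.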
@misc{li2026breakingtrainingbottleneckseffective,
title={Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models},
author={Zongqian Li and Shaohan Huang and Zewen Chi and Yixuan Su and Lexin Zhou and Li Dong and Nigel Collier and Furu Wei},
year={2026},
eprint={2603.07777},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2603.07777},
}
@misc{li2026scalingdatadifficultyimproving,
title={Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems},
author={Zongqian Li and Tengchao Lv and Shaohan Huang and Yixuan Su and Qinzheng Sun and Qiufeng Yin and Ying Xin and Scarlett Li and Lei Cui and Nigel Collier and Furu Wei},
year={2026},
eprint={2603.07779},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2603.07779},
}



