
MicroCoder: Breaking Training Bottlenecks for Modern Coding Models

Content

🚀 News · 📖 Paper List · ✨ Motivation · 📈 Analysis · 🖥️ Algorithms · 🗂️ Scaling Law · 💯 Code Evaluator · 📌 Citation · 🔖 License

Links

Project Page · Algorithm Paper · Scaling Law Paper · Insights Blog

 

🚀 News

  • [2026.3.10] The papers were uploaded to arXiv.
 
 
 

📖 Paper List

This is the project page for MicroCoder and a brief summary of the papers below:

  • Breaking Training Bottlenecks: Effective Reinforcement Learning for Modern Coding Models
    Zongqian Li 1, 2, Shaohan Huang 1, Zewen Chi 1, Yixuan Su 2, Lexin Zhou 3, Li Dong 1, Nigel Collier 2, Furu Wei 1
    Microsoft 1, University of Cambridge 2, Princeton University 3
    Algorithm Paper
  • Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems
    Zongqian Li 1, 2, Tengchao Lv 1, Shaohan Huang 1, Yixuan Su 2, Qinzheng Sun 1, Qiufeng Yin 1, Ying Xin 1, Scarlett Li 1, Lei Cui 1, Nigel Collier 2, Furu Wei 1
    Microsoft 1, University of Cambridge 2
    Scaling Law Paper
  • MicroCoder-Insights: Training Recipes for Modern Coding Models
    Insights Blog
 
 
 

✨ Motivation

  • Cross-generational training effectiveness: current training methods yield substantial improvements on Qwen 2.5 models but minimal improvements on Qwen 3 models, revealing generation-specific training bottlenecks.
  • Dataset difficulty gap: mainstream datasets are difficult for Qwen 2.5 but relatively simple given Qwen 3's capabilities, indicating the need for more challenging training corpora.
  • Fundamental behavioral differences: output behavior differs fundamentally between generations. Qwen 3 models exhibit pronounced upward trends in response length during training, whereas Qwen 2.5 models show stable or decreasing lengths; across the series progression from Qwen 2.5 Instruct to Qwen 3 Instruct to Qwen 3 Thinking, standard outputs grow in both length and variance.

Figure: Algorithm: GRPO+, Max Response Length: 8K, Test Dataset: LiveCodeBench v6, Train Batch Size: 64

 
 
 

📈 Analysis: MicroCoder-Insights

MicroCoder-Insights: Training Recipes for Modern Coding Models

Through comprehensive analysis across more than thirty controlled experiments, we distill 34 key training insights spanning seven main aspects: the code evaluator, temperature, training data, context length and its extension, truncation-mask strategies, batch size and on-policy training, and KL loss and clip ratio.

 
 
 

🖥️ Algorithms: MicroCoder-GRPO

Breaking Training Bottlenecks: Effective Reinforcement Learning for Modern Coding Models

To address these training bottlenecks, we propose MicroCoder-GRPO, an enhanced Group Relative Policy Optimization approach with three key innovations:

  • conditional truncation masking, which preserves long-output potential while maintaining training stability;
  • diversity-determined temperature selection, which maintains and encourages output diversity;
  • removal of the KL loss combined with a high clipping ratio, which facilitates exploration.

The modifications of MicroCoder-GRPO relative to GRPO are highlighted in red in the equations below:

Notation:

  • $\theta$ / $\theta_{\text{old}}$: current / old (reference) policy parameters; $\pi_{\theta}$, $\pi_{\theta_{\text{old}}}$: the corresponding policies
  • $T(D)$: training temperature determined by output diversity $D$
  • $\beta_0$: KL loss weight (set to 0)
  • $\varepsilon$: clipping trust-region parameter; $\varepsilon_{\text{high}}$: high clipping value
  • $L_{\max}$: maximum response length; $\rho$: masking probability; $m$: repeat-check parameter (128 tokens)
  • $q$: query; $Q$: set of queries; $P(Q)$: probability distribution over queries
  • $G$: number of sampled outputs; $o_i$: output $i$; $r_i$: reward for output $i$; $A_i$: advantage score for output $i$
  • $U(0,1)$: uniform distribution over $[0,1]$; $\mathbb{I}[\cdot]$: indicator function; $\mathbf{D}_{\text{KL}}$: KL divergence
  • $\text{non-incorrect}(o_i)$: whether output $i$ is non-incorrect; $\neg\text{repeat}(o_i, m)$: non-repetition check (the final 128 tokens differ from the preceding 128 tokens)
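The conditional truncation mask described above can be sketched as follows. This is a minimal illustration assuming token-ID sequences and a reward convention where negative reward marks an incorrect output; the function and variable names are ours, not taken from the paper's code:

```python
import random

def truncation_mask(output_tokens, reward, max_len, rho, m=128):
    """Decide whether a length-truncated output is masked out of the loss.

    Per the paper's description: an output that hits the length limit is
    masked with probability rho (I[u < rho], u ~ U(0,1)), but only if it
    is non-incorrect and its final m tokens differ from the preceding m
    tokens (the repetition check with m = 128).
    """
    truncated = len(output_tokens) >= max_len
    if not truncated:
        return False  # complete outputs always contribute to the loss
    non_incorrect = reward >= 0  # assumption: negative reward = incorrect
    not_repeating = output_tokens[-m:] != output_tokens[-2 * m:-m]
    if non_incorrect and not_repeating:
        return random.random() < rho  # mask with probability rho
    return False
```

A degenerate repetitive tail (final 128 tokens identical to the preceding 128) or an incorrect answer is never masked, so such truncated outputs still receive their negative learning signal.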

Figure: Temperature: 1.2, Train Dataset: MicroCoder-Dataset, Test Dataset: LiveCodeBench v6, Train Batch Size: 64
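For context, the group-relative advantage $A_i$ that MicroCoder-GRPO inherits from GRPO normalizes each reward $r_i$ against the group of $G$ outputs sampled for the same query. A minimal sketch of this standard GRPO normalization (not taken from the paper's code):

```python
def group_relative_advantages(rewards):
    """Standard GRPO advantage: z-score each reward within its group.

    rewards: scalar rewards r_i for the G outputs o_i sampled from the
    same query q. Returns A_i = (r_i - mean) / (std + eps).
    """
    g = len(rewards)
    mean = sum(rewards) / g
    var = sum((r - mean) ** 2 for r in rewards) / g
    std = var ** 0.5
    eps = 1e-8  # avoid division by zero when all rewards are equal
    return [(r - mean) / (std + eps) for r in rewards]
```

Because the advantages are centered within each group, outputs are rewarded only relative to their siblings for the same query, which removes the need for a learned value baseline.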

 
 
 

🗂️ Scaling Law: Data Difficulty

Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems

 
 
 

💯 Code Evaluator: MicroCoder-Evaluator

 
 
 

📌 Citation

@misc{li2026breakingtrainingbottleneckseffective,
      title={Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models}, 
      author={Zongqian Li and Shaohan Huang and Zewen Chi and Yixuan Su and Lexin Zhou and Li Dong and Nigel Collier and Furu Wei},
      year={2026},
      eprint={2603.07777},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2603.07777}, 
}
@misc{li2026scalingdatadifficultyimproving,
      title={Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems}, 
      author={Zongqian Li and Tengchao Lv and Shaohan Huang and Yixuan Su and Qinzheng Sun and Qiufeng Yin and Ying Xin and Scarlett Li and Lei Cui and Nigel Collier and Furu Wei},
      year={2026},
      eprint={2603.07779},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.07779}, 
}
 
 
 

🔖 License

 
 
 
