Haozhe Wang♠,♥,♦, Qixin Xu♥,♦, Che Liu♤, Junhong Wu♡, †Fangzhen Lin♠, †Wenhu Chen♥
HKUST♠, University of Waterloo♥, M‑A‑P♦, Tsinghua University♣, Imperial College London♤, UCAS♡*
Reinforcement Learning (RL) has been a game-changer for teaching LLMs complex reasoning, but how it works has been a mystery. Puzzling behaviors such as sudden "aha moments" and performance boosts from longer answers ("length-scaling") have been observed but not understood.
In this work, we reveal that these are not random quirks. They are the hallmarks of an emergent reasoning hierarchy, where the model learns to reason much like a human: by separating high-level strategic planning from low-level procedural execution. We show this process unfolds in two overlapping phases and leverage this insight to create a more efficient RL algorithm.
We follow the VeRL installation guide and use transformers==4.52.4.
A detailed environment configuration is provided in requirements.txt.
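As a minimal sketch (assuming a fresh Python environment and that VeRL is already installed following its own documentation), the remaining dependencies can be installed roughly like this:

```bash
# Sketch: install the pinned dependencies after setting up VeRL
pip install -r requirements.txt
pip install transformers==4.52.4
```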
We provide bash scripts in example_scripts.
They support multi-node training on Llama-3, Qwen2.5-Base, Qwen3-Base, Qwen3-Instruct, MiMO-VL-Instruct, and Qwen2.5-VL-Instruct.
Set the following environment variables:

```bash
export WANDB_API_KEY=""
export workdir=/path/to/this/repo
export trainfile=/download/from/hf
export valfile=/download/from/hf
export WORLD_SIZE=/how/many/nodes
export RANK=/rank/index/of/the/current/node
```
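With these variables exported on every node, training can be started by running one of the provided scripts on each node; the script name below is a placeholder, not an actual file in this repo:

```bash
# Hypothetical launch sketch: run the same command on every node,
# with RANK set to that node's index (0 .. WORLD_SIZE-1).
bash ${workdir}/example_scripts/your_chosen_script.sh
```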
Download the train and dev sets from Hugging Face (a download sketch follows the list):
- For VL experiments: JasperHaozhe/HICRA_RLDATA_VL_ViRL7B
- For Math experiments: JasperHaozhe/HICRA_RLDATA_Math
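One way to fetch them (a sketch assuming the huggingface_hub CLI is installed; the --local-dir paths are placeholders):

```bash
# Sketch: download the datasets with the Hugging Face CLI
huggingface-cli download JasperHaozhe/HICRA_RLDATA_Math --repo-type dataset --local-dir ./data/math
huggingface-cli download JasperHaozhe/HICRA_RLDATA_VL_ViRL7B --repo-type dataset --local-dir ./data/vl
# Point trainfile / valfile at the downloaded files.
```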
If you find our work useful for your research, please consider citing our paper:
```bibtex
@article{wang2025emergent,
  title={Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning},
  author={Wang, Haozhe and Xu, Qixin and Liu, Che and Wu, Junhong and Lin, Fangzhen and Chen, Wenhu},
  journal={arXiv preprint arXiv:2509.03646},
  year={2025}
}
```