This repository contains the official implementation for the paper "Think Socially via Cognitive Reasoning". We introduce Cognitive Reasoning, a new paradigm modeled on human social cognition, and CogFlow, a framework to instill this capability in LLMs.
CogFlow enables models to navigate complex social situations by generating a structured "cognitive flow" of interconnected cognitive units (e.g., observation, attribution). This approach moves beyond rigid logical deduction, which is often ill-suited for the ambiguous and interpretive nature of social interactions.

Figure: The CogFlow framework, from data collection via simulation to model optimization with SFT and RL.
- Features
- Getting Started
- Dataset Preparation
- Training Pipeline
- Analysis & Evaluation
- Citation
- Acknowledgements

## Features
- Cognitive Reasoning Paradigm: A structured approach that allows LLMs to interpret and respond to social situations more effectively.
- Cognitive Flow Simulation: A novel data generation process using tree-structured planning to simulate the associative nature of human thought.
- SFT + RL Framework: A complete training pipeline that first instills foundational skills via Supervised Fine-Tuning (SFT) and then refines them using multi-objective Reinforcement Learning (RL).
- Analysis Tools: Includes scripts for automated evaluation and visualizing the model's internal attention mechanisms to understand the cognitive flow.
## Getting Started

Please first clone the repository:

```bash
git clone https://github.com/thu-coai/CogFlow
cd CogFlow
```
This project requires three separate Conda environments due to differing dependencies for data generation, SFT, and RL.

- For Data Generation (`cogflow_data`):

  ```bash
  conda create -n cogflow_data python=3.11
  conda activate cogflow_data
  pip install openai==1.109.1 zhipuai==2.1.5.20241204 numpy==2.2.0 filelock==3.16.1
  ```

- For SFT & Reward Model (`cogflow_sft`):

  ```bash
  conda create -n cogflow_sft python=3.11
  conda activate cogflow_sft
  # We have only tested on these versions; others may not work as expected.
  pip install transformers==4.50.0 torch==2.6.0 accelerate==1.5.2 tensorboard==2.19.0 deepspeed==0.16.5
  cd Llama-Factory
  pip install -e ".[torch,metrics]"
  pip install openai==1.109.1
  cd ..
  ```

- For Reinforcement Learning (`cogflow_rl`):

  ```bash
  conda create -n cogflow_rl python=3.10
  conda activate cogflow_rl
  pip install transformers==4.54.0 accelerate==1.9.0 torch==2.6.0 deepspeed==0.17.2 torchdata==0.11.0 vllm==0.8.3 ray==2.43.0 tensorboard==2.20.0
  # Download the correct flash-attention wheel from https://github.com/Dao-AILab/flash-attention/releases/tag/v2.7.4.post1
  pip install path/to/flash_attn-xxx.whl
  cd veRL
  pip install -e .
  cd ..
  ```
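Optionally, a quick import check (not part of the official setup) to confirm the heavier RL dependencies installed cleanly before launching long jobs:

```bash
# Run inside the cogflow_rl environment; imports only, no GPU work is done.
conda activate cogflow_rl
python -c "import torch, vllm, ray, flash_attn; print(torch.__version__, vllm.__version__, ray.__version__)"
```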
## Dataset Preparation

You can either download our pre-generated dataset or run the cognitive flow simulation to create your own.

### Option 1: Download the Dataset

You can directly download our dataset from Huggingface and place the files at `dataset/train.json` and `dataset/test.json`. You can then proceed directly to the Training Pipeline.
### Option 2: Generate the Dataset via Simulation

The code for the simulation is located in the `data_generation` directory.

**Step 1: Prepare raw data.** Place your raw data file at `data_generation/dataset/reddit_raw.json`. This should be a JSON file containing a list of dictionaries, each with `Post Text` and `Comments` keys.
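For reference, a minimal illustrative entry; only the two key names are prescribed above, so the exact shape of `Comments` (here, a list of strings) and the sample text are assumptions:

```json
[
  {
    "Post Text": "My roommate keeps borrowing my things without asking. Am I overreacting?",
    "Comments": [
      "You're not overreacting, but a calm conversation will probably get you further than a confrontation.",
      "Maybe they don't realize it bothers you. Have you told them directly?"
    ]
  }
]
```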
**Step 2: Configure API access.** You need an API key from a supported LLM provider to generate the data.

- Set API Credentials: Add your API key in `data_generation/api_config.py` by modifying the `custom_client` arguments (see the sketch below).
- Select Platform: Set the `platform` variable in `data_generation/run_all.py`. We recommend using the default (`custom`).
- Verify Model Name: Ensure the model name in `data_generation/api_config.py` matches your chosen platform.
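As a rough sketch of the values involved: `custom_client` is named in the step above, but the actual layout of `api_config.py` may differ, an OpenAI-compatible endpoint is assumed for the `custom` platform, and the other identifiers and values here are placeholders.

```python
# data_generation/api_config.py (hypothetical excerpt; adapt to the file's real layout)
from openai import OpenAI

# Assumption: the "custom" platform talks to an OpenAI-compatible endpoint.
custom_client = OpenAI(
    api_key="YOUR_API_KEY",                       # credential from your LLM provider
    base_url="https://your-provider.example/v1",  # OpenAI-compatible endpoint
)

# The model name configured here must match the platform selected in run_all.py.
custom_model_name = "your-model-name"
```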
**Step 3: Run the simulation.** Navigate to the `data_generation` directory and run the script. This will generate the complete dataset and split it into training and testing sets.

```bash
cd data_generation
bash generate_and_collect.sh
```

The final datasets will be saved as `dataset/train.json` and `dataset/test.json`.
## Training Pipeline

The training process uses Llama-Factory for SFT and Reward Model training, and veRL for Reinforcement Learning. In our practice, we use `Qwen-2.5-7B-Instruct` and `Llama-3.1-8B-Instruct` as our base models.
### Supervised Fine-Tuning (SFT)

This step teaches the base model the fundamental capability of cognitive reasoning.
- Preprocess Training Data: Run the following script to preprocess the training data in `dataset/train.json`. It will convert and register the dataset to `Llama-Factory/data`.

  ```bash
  cd Llama-Factory/CogFlow_files
  bash prepare_data_sft.sh
  ```
- Prepare Config: Modify the SFT configuration file (`Llama-Factory/CogFlow_files/qwen2.5-7b_full_sft_cogflow.yaml` for the `Qwen-2.5-7B-Instruct` base model, `Llama-Factory/CogFlow_files/llama3.1-8b_full_sft_cogflow.yaml` for the `Llama-3.1-8B-Instruct` base model) and update the corresponding `model_name_or_path` to your base model's path (see the excerpt after this list).
- Run SFT Training: Execute the training command:

  ```bash
  cd Llama-Factory
  # for Qwen-2.5-7B-Instruct base model
  llamafactory-cli train CogFlow_files/qwen2.5-7b_full_sft_cogflow.yaml
  # for Llama-3.1-8B-Instruct base model
  llamafactory-cli train CogFlow_files/llama3.1-8b_full_sft_cogflow.yaml
  ```
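The config edit itself is small; a minimal excerpt of the assumed shape is shown below (all other fields in the shipped config stay as provided, and the path is a placeholder):

```yaml
# CogFlow_files/qwen2.5-7b_full_sft_cogflow.yaml (excerpt; placeholder path)
model_name_or_path: /path/to/Qwen2.5-7B-Instruct  # set to your local base model path or Hugging Face model ID
```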
### Reward Model Training

The reward model learns to predict human preferences, guiding the RL process.
- Prepare Training Data: You can prepare it in two ways:
  - Download Our Dataset (Recommended): Download our preprocessed reward model data from Huggingface and place it in the `dataset` folder, with the names `rm_train.json`, `rm_eval.json`, and `rm_test.json`. Then, run the following script to register the dataset to `Llama-Factory/data`.

    ```bash
    cd Llama-Factory/CogFlow_files
    bash prepare_data_rm_offtheshelf.sh
    ```

  - Generate Reward Model Data: Construct the reward model data from `dataset/train.json`. Before running, please configure the API key in `Llama-Factory/CogFlow_files/prompt_utils/api_config.py` (the same procedure as in Step 2 of the dataset generation). Then, run the following script to register the dataset to `Llama-Factory/data`:

    ```bash
    cd Llama-Factory/CogFlow_files
    bash prepare_data_rm.sh
    ```
- Prepare Config: Use `Llama-Factory/CogFlow_files/qwen2.5-7b_full_rm_class2.yaml` as a template and update `model_name_or_path` (the same edit as in the SFT config above).
- Run RM Training:

  ```bash
  cd Llama-Factory
  llamafactory-cli train CogFlow_files/qwen2.5-7b_full_rm_class2.yaml
  ```
### Reinforcement Learning (RL)

RL optimizes the SFT model's ability to generate high-quality and efficient cognitive flows.
- Prepare RL Data: Ensure `dataset/train.json` is in the root `dataset` folder. Run the preprocessing script to prepare the data for veRL.

  ```bash
  cd veRL/cogflow_utils
  bash prepare_all_data.sh
  ```
- Configure Training Script: In `veRL/cogflow_utils/rl_cogflow_full.sh`, set the paths for `TMPDIR`, `MODEL_PATH` (your SFT model), `REWARD_MODEL_PATH`, and `MODEL_BRIEF_NAME` (a nickname used in log and checkpoint file names); see the sketch after this list. Also, set `TOKENIZER_MODEL` in `veRL/cogflow_utils/custom_reward_full.py` (the tokenizer of your SFT model).
  - Scripts with the suffixes `_direct` or `_distillr1` are used to train the baselines `Direct-GRPO` and `Distilled-R1`.
- Run RL Training: Checkpoints will be saved in the `checkpoints` directory.

  ```bash
  cd veRL/cogflow_utils
  bash rl_cogflow_full.sh
  ```
- Convert Checkpoint: After training, convert the RL checkpoint into a standard checkpoint. Set the SFT model path and choose a checkpoint in `converter_full.sh`, then run the following script. The final model will be saved in the checkpoint's `model` subdirectory.

  ```bash
  bash converter_full.sh
  ```
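For orientation, the variables named in the configuration step take values like the following (an illustrative excerpt with placeholder paths; everything else in `rl_cogflow_full.sh` stays as shipped). `TOKENIZER_MODEL` in `custom_reward_full.py` should point to the same SFT checkpoint so the reward computation uses a consistent tokenizer.

```bash
# veRL/cogflow_utils/rl_cogflow_full.sh (illustrative excerpt; placeholder values)
TMPDIR=/path/to/large/scratch/dir          # temporary/scratch directory used during training
MODEL_PATH=/path/to/sft_checkpoint         # the SFT model trained in the previous step
REWARD_MODEL_PATH=/path/to/reward_model    # the reward model trained above
MODEL_BRIEF_NAME=qwen2.5-7b-cogflow        # nickname used in log and checkpoint file names
```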
## Analysis & Evaluation

### Automated Evaluation

The code in the `test` directory is used for automated evaluation.
- Environment: You can use the `cogflow_rl` environment.
- Configuration: Modify the parameters in `run_all.sh`, including `TOKENIZER_PATH`, `RM_PATH`, and `MODEL_NAME` (see the sketch after this list).
- Run Evaluation:

  ```bash
  cd test
  bash run_all.sh
  ```
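The parameter names come from the configuration step above; the values below are placeholders, and the remaining settings in `run_all.sh` are left as shipped:

```bash
# test/run_all.sh (illustrative excerpt; placeholder values)
TOKENIZER_PATH=/path/to/your/model     # placeholder: tokenizer path for the model being evaluated
RM_PATH=/path/to/reward_model          # placeholder: path to the trained reward model
MODEL_NAME=qwen2.5-7b-cogflow-rl       # placeholder: name of the model/checkpoint under evaluation
```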
### Attention Visualization

Reproduce the Sankey diagram attention visualizations from our paper. The code is in `attention_visualizer`.
- Calculate Attention:
  - Configure `config_cog.json` with your model path and the example input/output you wish to analyze.
  - Run the calculation script:

    ```bash
    cd attention_visualizer
    python attention_calculater.py --config config_cog.json
    ```

- Generate Visualization Data: Process the saved attention weights to create the visualization file.

  ```bash
  python attention_visualizer.py --agg_method topk_topk_mean --agg_inner_top_k 10 --agg_top_k 10 --norm_method none --viz_norm_method power --viz_power 0.2 --run_folder attention_map_cog attention_data.npz
  ```
- View and Edit: Open `sankey_visualizer.html` in a web browser. Load the generated `attention_sankey_*.json` file to view, interactively edit, and export the Sankey diagram as an SVG.
## Citation

If you use CogFlow in your research, please cite our paper:

```bibtex
@misc{cogflow,
  title={Think Socially via Cognitive Reasoning},
  author={Jinfeng Zhou and Zheyu Chen and Shuai Wang and Quanyu Dai and Zhenhua Dong and Hongning Wang and Minlie Huang},
  year={2025},
  eprint={2509.22546},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.22546},
}
```
## Acknowledgements

Our SFT and RL implementations are built upon the excellent LLaMA-Factory and veRL frameworks. We thank their developers for making their work public.