Kalman Filter Enhanced Group Relative Policy Optimization for Language Model Reasoning

This is the code repository for the paper "Kalman Filter Enhanced Group Relative Policy Optimization for Language Model Reasoning".

Environment Setup

Create conda env

conda create --name krpo python=3.12 -y
conda activate krpo

Install dependencies

conda install -c conda-forge cudatoolkit cudatoolkit-dev -y
pip install -r requirements.txt
pip install flash-attn --no-build-isolation

The base model is Llama-3.2-1B-Instruct. Follow the instructions on Hugging Face to download it: https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct
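
As one possible way to fetch the base model from Python, the sketch below uses snapshot_download from the huggingface_hub package. It is not part of this repository: the function name download_base_model and the default local_dir are hypothetical, and it assumes huggingface_hub is installed and that your Hugging Face account has been granted access to the gated Llama repository.

```python
from pathlib import Path

# Repository id taken from the model URL above.
MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"


def download_base_model(local_dir="models/Llama-3.2-1B-Instruct", token=None):
    """Download the base model snapshot and return the local path.

    `local_dir` is a hypothetical location; point it wherever run.sh
    expects the model. `token` is a Hugging Face access token; if None,
    the token stored by a previous login is used.
    """
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    return snapshot_download(repo_id=MODEL_ID, local_dir=str(target), token=token)


# Example call (requires network access and an approved access request):
# path = download_base_model()
```

The model is gated, so the download fails with an authorization error until the license has been accepted on the model page.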

Model training and evaluation

Model training

To train the model, run:

bash run.sh [GPU id]

For instance:

bash run.sh 0

Model evaluation

For model evaluation, set the resume path of the model under test in the eval.sh file, then run:

bash eval.sh [GPU id]

For example:

bash eval.sh 0

In both train.py and eval.py, the group_advantages_baseline() function computes the group advantages for the baseline model.
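
For readers unfamiliar with group advantages, here is a minimal sketch of the standard group-relative normalization used in GRPO-style training: each sampled completion's reward is centered on the group mean and scaled by the group standard deviation. This is an illustration only, not the code of group_advantages_baseline(); the name group_relative_advantages, the eps value, and the use of the population standard deviation are assumptions.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per completion sampled for the
    same prompt. `eps` (an assumed value) avoids division by zero when
    all rewards in the group are equal.
    """
    mean = fmean(rewards)
    std = pstdev(rewards, mu=mean)  # population standard deviation
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt, two of them correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct completions get a positive advantage and incorrect ones a negative advantage; a group with identical rewards yields all-zero advantages and so contributes no learning signal.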

Additional Base models and Data

We ran additional experiments with the base models Qwen2.5-0.5B-Instruct and Qwen2.5-1.5B-Instruct on Normal-difficulty Arithmetic questions.

We also ran experiments on two additional datasets, AMC and AIME.

Acknowledgement

If you use our code, please cite our paper.

Enjoy!!
