MASPRM

Multi-Agent System Process Reward Model

A lightweight process reward model that guides multi-agent reasoning at search time.

Paper · PDF · Project Page

MASPRM training pipeline (main paper figure).

Highlights

  • MASPRM adds a process reward model to guide multi-agent systems.
  • Plugs into MCTS and inference-time search for better trajectory selection.
  • Improves exact-match accuracy on challenging reasoning benchmarks.
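
To make "trajectory selection" concrete, here is a minimal sketch of how a process reward model could rank candidate multi-agent trajectories. It is a plain reranking illustration, not MCTS and not the repository's implementation: `select_best_trajectory`, `toy_score`, and the mean-of-prefix-scores aggregation are all hypothetical names and choices made for this example.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: a trajectory is the ordered list of agent outputs.
Trajectory = List[str]


def select_best_trajectory(
    candidates: Sequence[Trajectory],
    score_prefix: Callable[[Trajectory], float],
) -> Trajectory:
    """Return the candidate whose intermediate states score highest on average.

    Assumes every candidate contains at least one step.
    """

    def trajectory_score(traj: Trajectory) -> float:
        # Score each partial trajectory (a "process" signal), then average.
        prefix_scores = [score_prefix(traj[:i]) for i in range(1, len(traj) + 1)]
        return sum(prefix_scores) / len(prefix_scores)

    return max(candidates, key=trajectory_score)


def toy_score(prefix: Trajectory) -> float:
    # Toy stand-in for a learned reward model: rewards a verification step.
    return 1.0 if "check" in prefix[-1] else 0.0


candidates = [
    ["draft answer", "final: 41"],
    ["draft answer", "check arithmetic", "final: 42"],
]
best = select_best_trajectory(candidates, toy_score)
print(best)
```

In a real setup, `score_prefix` would wrap a trained reward model, and a search procedure such as MCTS would use the per-step scores to decide which partial trajectories to expand rather than only reranking finished ones.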

Quickstart

pip install -r requirements.txt
python src/run_mcts.py --dataset mmlu --split train --load_in_4bit --ray --gpus_per_actor 0.125 --actors 32
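
As a sizing note: if `--gpus_per_actor` is the fraction of a GPU reserved by each actor, as the flag name suggests (an assumption, since the script's internals are not shown here), the command above would need the GPU count computed below. The variable names are illustrative only.

```python
# Values taken from the Quickstart command above.
actors = 32
gpus_per_actor = 0.125  # assumed to mean "fraction of one GPU per actor"

# Total GPUs requested, and how many actors would share one GPU.
total_gpus = actors * gpus_per_actor
actors_per_gpu = int(1 / gpus_per_actor)

print(f"total GPUs: {total_gpus}, actors per GPU: {actors_per_gpu}")
```

Under that reading, scaling to a smaller machine means lowering `--actors` so the product stays within the GPUs available.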

Docker

docker build -t masprm .
docker run --rm -it -v "$PWD:/app" masprm python src/run_mcts.py --help

BibTeX

@article{yazdani2025masprm,
  title={{MASPRM}: Multi-Agent System Process Reward Model},
  author={Yazdani, Milad and Mostajabdaveh, Mahdi and Zhou, Zirui and Xiong, Ying},
  journal={arXiv preprint arXiv:2510.24803},
  year={2025}
}