This repository contains the official code for the NeurIPS 2025 Oral paper titled: Breaking the Performance Ceiling in Reinforcement Learning Requires Inference Strategies. The code is built as an extension to the Mava codebase.
Before running the experiments, please download the pre-trained model checkpoints from the following link: Download all_checkpoints.zip (~420 MB). Once downloaded, extract the contents into the root directory of the code and delete the zip file:
unzip all_checkpoints.zip
rm all_checkpoints.zip

After extraction, the code directory should look like this:
RL-INFERENCE-STRATEGIES/
├── all_checkpoints/
├── base_policy_hyperparameters/
├── inference_configurations/
├── ...

We strongly recommend using uv and Python 3.12, but any other virtual environment manager can be used in a similar way.
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create a virtual environment and install all dependencies
uv sync
# If you have a GPU, install the CUDA version of JAX
uv sync --extra cuda12
# Activate the virtual environment
source .venv/bin/activate

We provide launcher scripts to reproduce all the results from the paper.
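Before launching anything, it can help to confirm that the checkpoint archive was extracted into the repository root. The following is a minimal sketch and not part of the official code: the helper name `missing_dirs` is hypothetical, and the directory names are the three shown in the layout above (entries hidden behind `...` are not checked).

```python
from pathlib import Path

# Directory names taken from the layout shown earlier in this README.
# Anything elided by "..." in that listing is deliberately not checked.
EXPECTED_DIRS = (
    "all_checkpoints",
    "base_policy_hyperparameters",
    "inference_configurations",
)


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected directory names that are absent under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_dirs(Path.cwd())
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Repository layout looks complete.")
```

If `all_checkpoints` is reported as missing, the archive was most likely extracted somewhere other than the repository root.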
To retrain all base policies with the parameters used in the paper, please run:
python experiment_launch_scripts/train_base_policies.py

To retrain all COMPASS policies with the parameters used in the paper, please run:
python experiment_launch_scripts/train_compass_policies.py

To run all stochastic evaluation experiments, please run:
python experiment_launch_scripts/eval_stochastic.py

To run all COMPASS experiments, please run:
python experiment_launch_scripts/eval_compass_cmaes.py

To run all SGBS experiments, please run:
python experiment_launch_scripts/eval_sgbs.py

To run all online fine-tuning experiments, please run:
python experiment_launch_scripts/eval_finetuning.py

If you build on this work, please cite our paper:
@inproceedings{
chalumeau2025breaking,
title={Breaking the Performance Ceiling in Reinforcement Learning requires Inference Strategies},
author={Felix Chalumeau and Daniel Rajaonarivonivelomanantsoa and Ruan John de Kock and Juan Claude Formanek and Sasha Abramowitz and Omayma Mahjoub and Wiem Khlifi and Simon Verster Du Toit and Louay Ben Nessir and Refiloe Shabe and Arnol Manuel Fokam and Siddarth Singh and Ulrich Armel Mbou Sob and Arnu Pretorius},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=RxkCwOKVKa}
}

This work was supported with Cloud TPUs from Google's TPU Research Cloud (TRC) 🌤.