This repository contains the official JAX implementation for the NeurIPS 2025 paper: A Principle of Targeted Intervention for Multi-Agent Reinforcement Learning.
Our work introduces a principle for designing targeted interventions that guide multi-agent systems toward desirable outcomes, such as improved coordination and performance, without explicitly programming complex behaviors.
- Installation
- Running Experiments
- Visualization of Learned Behavior
- Citation
- License and Acknowledgements
## Installation

This project requires Python 3.10, and we recommend conda for environment management. The installation is a two-stage process: first, you install the hardware-specific JAX library, and second, you install this project's dependencies.
1. **Clone the repository:**

   ```bash
   git clone https://github.com/iamlilAJ/Pre-Strategy-Intervention.git
   cd Pre-Strategy-Intervention
   ```

2. **Create and activate the conda environment:**

   ```bash
   conda create -n intervention python=3.10 -y
   conda activate intervention
   ```

3. **Install JAX for your specific hardware:**

   - **For NVIDIA GPU users (recommended):** This command installs the exact versions of JAX, a CUDA-enabled jaxlib, and cuDNN that are compatible with this project. The `-f` flag is crucial, as it directs `pip` to the official JAX repository to find the GPU-specific packages.

     ```bash
     pip install jax==0.4.25 jaxlib==0.4.25 nvidia-cudnn-cu12==8.9.2.26 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
     ```

   - **For CPU-only users:** If you do not have an NVIDIA GPU, install the CPU-only version of JAX.

     ```bash
     pip install jax==0.4.25 jaxlib==0.4.25
     ```

4. **Install the project and dependencies:** Now that JAX is correctly installed, install the rest of the project's dependencies. This command uses the `[algs]` extra to include packages like `optax` and `wandb`.

   ```bash
   pip install -e .[algs]
   ```
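After installation, a quick sanity check (a minimal sketch, not part of the repository) can confirm that JAX imports and is running on the intended backend before you launch any experiments:

```python
# Optional sanity check: verify the JAX version and visible devices.
import jax
import jax.numpy as jnp

print(jax.__version__)   # expected: 0.4.25
print(jax.devices())     # e.g. a CUDA device on GPU installs, a CPU device otherwise

# Run a tiny jitted computation to confirm the backend actually executes.
x = jnp.arange(8.0)
print(jax.jit(lambda v: (v * 2).sum())(x))  # -> 56.0
```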
## Running Experiments

To reproduce the main results from our paper, run our Pre-Strategy Intervention method against the Standard MARL and Intrinsic Reward baselines described below. All experiments are managed via command-line arguments using Hydra.
Our experiments are organized around three main conditions, which can be applied to most algorithms. You select the condition by modifying the Hydra configuration name (`+alg=...`).
- **Pre-Strategy Intervention (our method):** Use the base algorithm name.
  - Example: `+alg=ippo`
- **Standard MARL Baseline:** Add the `base_marl_` prefix to the algorithm name.
  - Example: `+alg=base_marl_ippo`
- **Intrinsic Reward Baseline:** Add the `intrinsic_reward_` prefix to the algorithm name.
  - Example: `+alg=intrinsic_reward_ippo`
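The prefix convention above can be summarized with a small helper. This is purely illustrative and not part of the repository; condition selection in practice happens only through the Hydra config name.

```python
def alg_override(algorithm: str, condition: str = "ours") -> str:
    """Build the Hydra `+alg=...` override for a given experimental condition.

    `algorithm` is a base config name such as "ippo" or "mappo"; `condition`
    is one of "ours", "base_marl", or "intrinsic_reward". Hypothetical helper,
    shown only to make the naming convention explicit.
    """
    prefixes = {"ours": "", "base_marl": "base_marl_", "intrinsic_reward": "intrinsic_reward_"}
    return f"+alg={prefixes[condition]}{algorithm}"

# alg_override("ippo")                     -> "+alg=ippo"
# alg_override("ippo", "base_marl")        -> "+alg=base_marl_ippo"
# alg_override("ippo", "intrinsic_reward") -> "+alg=intrinsic_reward_ippo"
```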
Below are the base commands for each supported algorithm and environment. Simply apply the prefixes described above to run the desired baseline.
- **IPPO:** `python baselines/IPPO/ippo_pre.py +alg=ippo`
- **MAPPO:** `python baselines/MAPPO/mappo_pre.py +alg=mappo`
- **PQN-VDN:** `python baselines/QLearning/pqn_vdn_pre.py +alg=pqn`
- **PQN-IQL:** `python baselines/QLearning/pqn_iql_pre.py +alg=pqn`
You can change the number of players by overriding the `num_agents` parameter.

- To run PQN-VDN with 4 players:

  ```bash
  python baselines/QLearning/pqn_vdn_pre.py +alg=pqn alg.ENV_KWARGS.num_agents=4
  ```

You can change the intervention scope by overriding the `intervene_two_agents` parameter. For example:

```bash
python baselines/QLearning/pqn_vdn_pre.py +alg=pqn alg.ENV_KWARGS.intervene_two_agents=True
```

- **IQL:** `python baselines/QLearning/iql_pre.py +alg=iql`
  - To run the second IQL scenario: `python baselines/QLearning/iql_pre.py +alg=iql_scenario_2`
- **VDN:** `python baselines/QLearning/vdn_pre.py +alg=vdn`
- **QMIX:** `python baselines/QLearning/qmix_pre.py +alg=qmix`
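As a rough mental model of how the `alg.ENV_KWARGS.*` overrides above reach the environment, the sketch below assumes the JaxMARL-style convention of forwarding `ENV_KWARGS` to `jaxmarl.make`; the environment name and the exact wiring inside the training scripts are assumptions, not this repository's actual code.

```python
# Minimal sketch: Hydra overrides such as alg.ENV_KWARGS.num_agents=4 end up
# as keyword arguments when the environment is constructed.
# "MPE_simple_spread_v3" is a stand-in registered JaxMARL name, not necessarily
# the environment used in this project.
import jaxmarl

env_kwargs = {"num_agents": 4}  # what alg.ENV_KWARGS.num_agents=4 would set
env = jaxmarl.make("MPE_simple_spread_v3", **env_kwargs)
print(env.num_agents)  # -> 4
```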
This special setting tests our method with heterogeneous agents, where one agent is a significantly faster "sprinter."
- Intervening on the second agent:

  ```bash
  python baselines/QLearning/iql_pre.py +alg=heter_iql
  ```

- To change which agent is the sprinter (e.g., to make the targeted agent the sprinter), override the `accel` parameter on the command line:

  ```bash
  python baselines/QLearning/iql_pre.py +alg=heter_iql alg.ENV_KWARGS.accel='[25.0, 5.0, 5.0]'
  ```

- To run the baseline for this setting:

  ```bash
  python baselines/QLearning/iql_pre.py +alg=baseline_heter_iql
  ```
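For intuition, the `accel` override above presumably supplies one acceleration value per agent, so the agent given the largest value becomes the sprinter. The snippet below is purely illustrative; the names and the per-agent interpretation are assumptions, not the repository's API.

```python
# Illustrative only: interpret the accel list as per-agent accelerations
# and identify which agent would be the sprinter.
accel = [25.0, 5.0, 5.0]  # set via alg.ENV_KWARGS.accel='[25.0, 5.0, 5.0]'
sprinter = max(range(len(accel)), key=lambda i: accel[i])
print(f"agent_{sprinter} is the sprinter")  # -> agent_0
```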
## Visualization of Learned Behavior

*(Side-by-side rollout animations: "Our Method" vs. "Baseline".)*
In this visualization, the red agent (our intervened agent) has learned a preference for moving towards the yellow landmark. By learning this simple additional desired outcome, the agent team can achieve effective coordination and successfully solve the task.
## Citation

If you use this work in your research, please cite the following paper.
BibTeX:
```bibtex
@misc{liu2025principle,
      title={A Principle of Targeted Intervention for Multi-Agent Reinforcement Learning},
      author={Anjie Liu and Jianhong Wang and Samuel Kaski and Jun Wang and Mengyue Yang},
      year={2025},
      eprint={2510.17697},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
```

## License and Acknowledgements

This project is licensed under the Apache 2.0 License.
Our implementation is built upon the excellent JaxMARL library. We thank the original authors for their significant contributions to the community.



