IMPRL is a Python library for deep reinforcement learning for inspection and maintenance (I&M) planning of engineering systems. It provides a collection of environments and single-agent and multi-agent algorithm implementations.
The following reinforcement learning algorithms are implemented:
Single-agent algorithms:

| Algorithm | Formulation | Entry point |
|---|---|---|
| DDQN | off-policy, value-based | train_and_log.py |
| JAC | off-policy, actor-critic | train_and_log.py |
| PPO | on-policy, actor-critic | imprl/agents/PPO.py |
Multi-agent algorithms:

| Algorithm | Formulation | Entry point |
|---|---|---|
| DDMAC / DCMAC | off-policy, actor-critic | train_and_log.py |
| IACC / IACC_PS | off-policy, actor-critic | train_and_log.py |
| MAPPO_PS | on-policy, actor-critic | imprl/agents/MAPPO_PS.py |
| QMIX_PS | off-policy, value-based | train_and_log.py |
| VDN_PS | off-policy, value-based | train_and_log.py |
| IAC / IAC_PS | off-policy, actor-critic | train_and_log.py |
| IPPO_PS | on-policy, actor-critic | imprl/agents/IPPO_PS.py |
`_PS` denotes parameter sharing between agents.

The base off-policy actor-critic algorithm is ACER, from "Sample Efficient Actor-Critic with Experience Replay" by Wang et al., an off-policy algorithm that uses truncated importance-weighted sampling for experience replay.
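As a numerical illustration of that mechanism (a sketch, not the library's code), the truncated importance weights and bias-correction factors used by ACER can be computed as follows; the policy probabilities and the small truncation threshold are made up for demonstration:

```python
import numpy as np

# Hedged sketch: truncated importance weights with a bias-correction term,
# the core off-policy device in ACER. All numbers here are illustrative.
pi = np.array([0.7, 0.2, 0.1])   # current policy pi(a|s)
mu = np.array([0.3, 0.3, 0.4])   # behaviour policy mu(a|s) that filled the replay buffer
c = 1.0                          # truncation threshold (kept small to show the effect)

rho = pi / mu                    # raw importance weights
rho_bar = np.minimum(c, rho)     # truncated weights bound the variance of the update

# Bias correction: the probability mass clipped off by truncation is re-added
# as an expectation under the current policy, weighted by max(0, 1 - c/rho).
correction = np.maximum(0.0, 1.0 - c / rho)

print(rho_bar)      # only the first weight (pi/mu = 7/3) is truncated to 1.0
print(correction)   # nonzero only where truncation removed mass
```

Combining the truncated sampled term with the correction term keeps the policy-gradient estimate unbiased while bounding its variance.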
We provide heuristic baselines:

- `inspect_repair` finds optimal maintenance policies by optimizing inspection intervals and prioritizing component repairs.
- Simpler policies such as `do_nothing` and `failure_replace` are also provided. These require no optimization and are most suitable for sanity checks.
- For `k_out_of_n_infinite` environments, (near-)optimal policies can be computed using point-based value iteration algorithms for POMDPs (such as SARSOP). To enable visualization of SARSOP policies, we provide a wrapper for interfacing with SARSOP called `SARSOPAgent`.
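To illustrate the kind of rule a `failure_replace`-style baseline implements, here is a self-contained sketch on a toy 4-component system; the state and action encodings (0 = intact/do nothing, 3 = failed, 2 = replace) are assumptions for this example, not taken from the library:

```python
# Hedged sketch of a failure_replace-style heuristic: act only when a
# component is observed to have failed. Encodings below are illustrative.
FAILED = 3       # assumed terminal degradation state
DO_NOTHING = 0   # assumed "no maintenance" action
REPLACE = 2      # assumed "replace component" action

def failure_replace_policy(observed_states):
    """Replace a component only if it is observed to be failed."""
    return [REPLACE if s == FAILED else DO_NOTHING for s in observed_states]

# components 2 and 4 have failed, so only they are replaced
print(failure_replace_policy([0, 3, 1, 3]))  # -> [0, 2, 0, 2]
```

Because such rules need no training, they make useful lower-bound sanity checks when debugging learned policies.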
```
imprl/
├── agents/                   # RL algorithms, configs, and training entry points
│   ├── configs/              # default hyperparameter configs
│   ├── primitives/           # reusable networks, replay buffers, and schedulers
│   ├── __init__.py           # agent registry and factory
│   └── *.py                  # algorithm implementations
├── baselines/                # heuristic and reference policies
├── envs/                     # environment registry, wrappers, and environment cores
│   ├── game_envs/            # matrix-game environments, wrappers, and configs
│   ├── structural_envs/      # k-out-of-n environments, wrappers, and configs
│   └── __init__.py           # env registry and factory
├── post_process/             # policy visualisation and result post-processing
│   ├── policy_visualizer.py  # rollout and belief-space plots
│   └── stats.py              # summary statistics helpers
└── runners/                  # serial and parallel rollout utilities
    ├── parallel.py           # multiprocessing rollout helpers
    └── serial.py             # single-process rollout and training helpers
```
uv is a fast Python package and project manager. See the uv docs for more installation options.
```shell
# macOS (Homebrew)
brew install uv

# Or via script (Linux/macOS)
curl -LsSf https://astral.sh/uv/install.sh | sh
```

From the repository root:
```shell
uv venv --python 3.11
source .venv/bin/activate  # Windows: .venv\Scripts\activate
```

From this directory, install the base runtime dependencies defined in pyproject.toml. This install includes the core dependencies for running the library:
```shell
uv sync

# optionally install dev tools: pytest, black, ruff
uv sync --group dev
```

3.1) PyTorch GPU support (optional)
GPU notes (PyTorch): If you need a CUDA-enabled wheel, install a build matching your system from the official PyTorch index: https://pytorch.org/get-started/locally/
Example (Linux, CUDA 12.1):

```shell
uv pip install --index-url https://download.pytorch.org/whl/cu121 torch
```

Check whether PyTorch detects your GPU:

```shell
python -c "import torch; print(torch.cuda.is_available())"
```

3.2) Installing additional packages (optional)
Use `uv add <pkg>` to add packages to your project and lockfile. For example, to add Jupyter Notebook:

```shell
uv add "notebook>=7.1.2,<8.0.0"
```

If resolution fails, relax the version ranges and retry.
For logging, the library relies on wandb. You can log in to wandb using your private API key:

```shell
wandb login
# <enter wandb API key>
```

```python
import imprl.envs

# create an environment
env = imprl.envs.make(name="k_out_of_n_infinite", setting="hard-1-of-4_infinite")

# env uses the standard Gymnasium API
obs, info = env.reset()

# select an action for each agent
action = [0, 1, 0, 2]

# step the environment
next_obs, reward, termination, truncation, info = env.step(action)
```

Train a DDQN agent on the 1-out-of-4 infinite-horizon environment.
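A full episode loop over this Gymnasium-style five-tuple step API might look like the sketch below; `StubEnv` is a stand-in with the same interface, not an imprl environment, and its dynamics and rewards are invented for illustration:

```python
# Hedged sketch: an episode loop over a Gymnasium-style step API.
# StubEnv mimics the (obs, reward, termination, truncation, info) signature.
class StubEnv:
    def __init__(self, horizon=5, n_agents=4):
        self.horizon, self.n_agents, self.t = horizon, n_agents, 0

    def reset(self):
        self.t = 0
        return [0] * self.n_agents, {}

    def step(self, action):
        self.t += 1
        reward = -sum(action)                # maintenance actions cost money
        truncation = self.t >= self.horizon  # finite-horizon cut-off
        return [0] * self.n_agents, reward, False, truncation, {}

env = StubEnv()
obs, info = env.reset()
episode_return, done = 0.0, False
while not done:
    action = [0, 1, 0, 2]  # placeholder policy; a trained agent would act on obs
    obs, reward, termination, truncation, info = env.step(action)
    episode_return += reward
    done = termination or truncation
print(episode_return)  # -> -15.0 (cost of 3 per step over 5 steps)
```

Checking both `termination` and `truncation` matters: the first signals a genuine terminal state (e.g. system failure), the second a horizon cut-off.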
Default configs specifying the hyperparameters and environment settings are located at imprl/agents/configs/DDQN.yaml. We use Hydra for configuration management, and you can override any of the default config values via the command line. For example, to disable logging to wandb, set WANDB.mode=disabled:
```shell
cd imprl
python train_and_log.py --config-name DDQN WANDB.mode=disabled
```

```python
import imprl.envs
import imprl.agents
from omegaconf import OmegaConf
from imprl.post_process import PolicyVisualizer

# create an environment
env = imprl.envs.make(name="k_out_of_n_infinite", setting="hard-1-of-4_infinite")

# create an agent
device = "cpu"  # or "cuda" if a GPU is available
cfg = OmegaConf.load("imprl/agents/configs/IPPO_PS.yaml")
agent = imprl.agents.make(
    "IPPO_PS", env, OmegaConf.to_container(cfg, resolve=False), device
)

# (optional) load checkpoint
# agent.load_weights(checkpoint_dir, int(checkpoint))

plotter = PolicyVisualizer(env, agent)
plotter.plot()
```

You can find a detailed introduction in getting_started.ipynb.
This project builds on the abstractions in EPyMARL, and the author would like to acknowledge the insights shared in Reinforcement Learning Implementation Tips and Tricks, which informed the development of this library.
- IMP-MARL: a Suite of Environments for Large-scale Infrastructure Management Planning via MARL
  - Benchmarks the scalability of cooperative MARL methods on real-world infrastructure management planning problems.
  - Environments: (correlated and uncorrelated) k-out-of-n systems and offshore wind structural systems.
  - RL solvers: provides wrappers for interfacing with several (MA)RL libraries such as EPyMARL, RLlib, MARLlib, etc.
- IMP-act: Benchmarking MARL for Infrastructure Management Planning at Scale with JAX
  - Large-scale road networks with up to 178 agents, implemented in JAX for scalability.
  - IMP-act-JaxMARL interfaces IMP-act with multi-agent solvers in JaxMARL.
  - NumPy-based environments are also provided for compatibility with PyTorch in IMP-act-epymarl.
If you find this repository useful, please consider citing:
Assessing the Optimality of Decentralized Inspection and Maintenance Policies for Stochastically Degrading Engineering Systems (open access)
@inproceedings{bhustali_decentralization_2025,
title = "Assessing the Optimality of Decentralized Inspection and Maintenance Policies for Stochastically Degrading Engineering Systems",
author = "Prateek Bhustali and Andriotis, {Charalampos P.}",
year = "2025",
doi = "10.1007/978-3-031-74650-5_13",
isbn = "978-3-031-74649-9",
series = "Communications in Computer and Information Science",
publisher = "Springer",
pages = "236--254",
editor = "Oliehoek, {Frans A.} and Manon Kok and Sicco Verwer",
booktitle = "Artificial Intelligence and Machine Learning",
url = "https://link.springer.com/chapter/10.1007/978-3-031-74650-5_13",
}