A multi-agent reinforcement learning environment for studying wildfire suppression and inter-jurisdiction resource sharing. The simulation models fire spread across grid-based jurisdictions where autonomous agents (suppression units) must decide how to move and fight fires, and a higher-level sharing policy decides when to transfer units between jurisdictions.
Each jurisdiction is a 2D grid of cells. At each timestep, three things happen to the fire:
- Suppression -- Units standing on burning cells have a chance to extinguish them. More units on a cell means a higher chance of putting the fire out.
- Spread -- Fire spreads to neighboring cells (up/down/left/right). The more burning neighbors a cell has, the more likely it catches fire.
- Lightning -- Random new fires ignite via a stochastic process (log-normal rate, Poisson count), simulating exogenous ignitions.
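The sketch below illustrates these three stages for intuition only. The independent-trial probability forms (`1 - (1 - p)^k`) and the uniform placement of lightning strikes are assumptions, not the verified implementation; the real update lives in `jurisdiction_env.py`.

```python
# Illustrative per-step fire update. The 1 - (1-p)^k forms are assumptions for
# intuition only; see jurisdiction_env.py for the actual implementation.
import numpy as np

def illustrative_fire_step(burning, units_per_cell, p_spread, p_suppress,
                           mu_log, sigma_log, rng):
    # Suppression: each unit on a burning cell gets an independent chance.
    p_out = 1.0 - (1.0 - p_suppress) ** units_per_cell
    burning = burning & (rng.random(burning.shape) >= p_out)

    # Spread: ignition probability grows with the number of burning 4-neighbors.
    k = np.zeros(burning.shape, dtype=int)
    k[1:, :] += burning[:-1, :]; k[:-1, :] += burning[1:, :]
    k[:, 1:] += burning[:, :-1]; k[:, :-1] += burning[:, 1:]
    p_ignite = 1.0 - (1.0 - p_spread) ** k
    burning = burning | (rng.random(burning.shape) < p_ignite)

    # Lightning: draw a log-normal rate, then a Poisson count of new ignitions.
    rate = rng.lognormal(mean=mu_log, sigma=sigma_log)
    for _ in range(rng.poisson(rate)):
        burning[rng.integers(burning.shape[0]), rng.integers(burning.shape[1])] = True
    return burning
```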
Units live on the grid and move each step by a (dx, dy) offset, clamped to grid bounds and limited by movement_per_step (Manhattan distance). The suppression algorithm decides where each unit moves. The current implementation is a greedy heuristic: each unit targets the nearest burning cell, claims it so other units pick different targets, and moves toward it. Idle units drift back toward the grid center.
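A simplified sketch of that heuristic is shown below. It assumes row-major flat cell indices and that `(dx, dy)` is a (row, column) offset; tie-breaking and clamping details may differ from `greedy.py`.

```python
# Simplified sketch of the greedy nearest-fire heuristic described above.
# Assumes row-major flat cell indices and (dx, dy) = (row offset, col offset);
# details may differ from greedy.py.
import numpy as np

def greedy_actions(jenv, rng):
    fires = list(zip(*np.nonzero(jenv.burning_map)))  # burning (row, col) cells
    claimed = set()
    actions = np.zeros((jenv.num_units, 2), dtype=int)
    for i, cell in enumerate(jenv.unit_positions):
        r, c = divmod(int(cell), jenv.cols)
        # Target the nearest unclaimed fire; idle units drift toward the center.
        targets = [f for f in fires if f not in claimed] or \
                  [(jenv.center_cell_row, jenv.center_cell_col)]
        tr, tc = min(targets, key=lambda f: abs(f[0] - r) + abs(f[1] - c))
        if (tr, tc) in fires:
            claimed.add((tr, tc))
        # Move toward the target, limited to movement_per_step Manhattan distance.
        dr = int(np.clip(tr - r, -jenv.movement_per_step, jenv.movement_per_step))
        budget = jenv.movement_per_step - abs(dr)
        dc = int(np.clip(tc - c, -budget, budget))
        actions[i] = (dr, dc)
    return actions
```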
When multiple jurisdictions are composed together, a sharing algorithm can transfer units between them. Transfers work in three phases:
- Select -- The algorithm picks a source jurisdiction (least fire) and a destination (most fire), then selects the unit closest to the center in the source.
- Steer -- The algorithm overrides that unit's movement to walk it toward the center cell (the transfer departure point).
- Hop -- Once at center, the unit enters transit to an adjacent jurisdiction. Multi-hop routes repeat this for non-adjacent destinations.
Units in transit are removed from their source jurisdiction and cannot suppress fires. After juris_travel_time steps they arrive at the destination's center cell.
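For concreteness, here is a hedged sketch of the select phase. Attribute names follow the reference section further down; the actual `periodic_transfer.py` may differ in details.

```python
# Hedged sketch of the "Select" phase: the least-burning jurisdiction donates
# the unit nearest its center to the most-burning jurisdiction. Attribute names
# follow the reference section below; periodic_transfer.py may differ.
import numpy as np

def select_transfer(multi_env):
    counts = np.asarray(multi_env.burning_counts)
    src, dst = int(np.argmin(counts)), int(np.argmax(counts))
    if src == dst:
        return None  # nothing worth transferring

    jenv = multi_env.jurisdictions[src]
    positions = [divmod(int(cell), jenv.cols) for cell in jenv.unit_positions]
    if not positions:
        return None
    # Local index of the source unit closest (Manhattan) to the center cell.
    local = min(
        range(len(positions)),
        key=lambda i: abs(positions[i][0] - jenv.center_cell_row)
                    + abs(positions[i][1] - jenv.center_cell_col),
    )
    # Map the local index back to a global unit ID.
    for uid in range(len(multi_env.unit_jurisdiction)):
        if (multi_env.unit_jurisdiction[uid] == src
                and multi_env.unit_local_index[uid] == local):
            return uid, dst
    return None
```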
The system is split into independent layers so each can be studied separately:
`JurisdictionEnv` is the building block. It handles one fire grid with local units. It knows nothing about other jurisdictions or transfers. You can instantiate and step it alone for pure suppression research.

`MultiJurisdictionEnv` composes multiple `JurisdictionEnv` instances and manages the transit system. It provides `initiate_transfer()` to move units between jurisdictions and `advance_transit()` / `step()` to tick the simulation forward.
Suppression algorithms operate on a single JurisdictionEnv. Sharing algorithms operate on a MultiJurisdictionEnv. The orchestration loop in main.py connects them.
```
wildfire-marl-simulation/
├── environment/
│   ├── __init__.py                  # exports JurisdictionEnv, MultiJurisdictionEnv
│   ├── jurisdiction_env.py          # single-jurisdiction fire grid + units
│   └── multi_jurisdiction_env.py    # composes jurisdictions + transit system
├── algorithms/
│   ├── __init__.py                  # re-exports registries
│   ├── utils.py                     # shared helpers (manhattan_distance, step_toward)
│   ├── suppression_algorithms/
│   │   ├── __init__.py              # SUPPRESSION_ALGORITHM_REGISTRY
│   │   ├── algorithm_base.py        # SuppressionAlgorithm ABC
│   │   └── greedy.py                # greedy nearest-fire heuristic
│   └── sharing_algorithms/
│       ├── __init__.py              # SHARING_ALGORITHM_REGISTRY
│       ├── algorithm_base.py        # SharingAlgorithm ABC
│       ├── none.py                  # no-op (no transfers)
│       └── periodic_transfer.py     # periodic best-to-worst transfer
├── main.py                          # CLI entry point (single / multi modes)
└── fire_animator.py                 # renders snapshot .npz files to GIF/MP4
```
Requires Python 3.10+. Create a conda environment from the provided environment.yml:

```bash
conda env create -f environment.yml
conda activate sim-wildfire-marl
```

Or install dependencies manually:

```bash
pip install numpy matplotlib pillow
```

Run single mode with the greedy suppression heuristic:

```bash
python main.py --mode single --suppression-algorithm greedy --verbose --steps 200
```

Optional flags for single mode:

```bash
python main.py --mode single --rows 16 --cols 16 --num-units 8 --save-snapshots --output-dir results
```

Multi mode:

```bash
# No sharing (baseline):
python main.py --mode multi --sharing-algorithm none --suppression-algorithm greedy --verbose --steps 200

# Periodic transfer:
python main.py --mode multi --sharing-algorithm periodic_transfer --suppression-algorithm greedy --period-s 10 --verbose --steps 200

# Custom grid layout:
python main.py --mode multi --num-juris-rows 3 --num-juris-cols 3 --per-juris-rows 20 --per-juris-cols 20 --save-snapshots
```

To generate snapshots and render them as an animation:

```bash
python main.py --mode multi --sharing-algorithm periodic_transfer --save-snapshots --output-dir snapshots
python fire_animator.py --snapshots-dir snapshots --output-dir animations --fps 4
```

You can also drive a single jurisdiction directly from Python:

```python
from environment import JurisdictionEnv
import numpy as np
jenv = JurisdictionEnv(
    rows=16, cols=16, base_spread_prob=0.06,
    suppression_success_prob=0.8, movement_per_step=4,
    lightning_mu_log=-2.0, lightning_sigma_log=2.0, num_units=8,
)
rng_s = np.random.default_rng(0)
rng_l = np.random.default_rng(1)
actions = np.zeros((jenv.num_units, 2), dtype=int) # all stay
burning, positions, reward, count = jenv.step(actions, rng_s, rng_l)
```

This section provides the technical context needed to extend this codebase.
- Fire does not spread across jurisdiction boundaries. Jurisdictions are coupled only through unit transfers.
- `JurisdictionEnv` has no knowledge of multi-jurisdiction concepts. It must remain importable and usable without `MultiJurisdictionEnv`.
- `SuppressionAlgorithm.actions(jenv, rng)` receives a single `JurisdictionEnv` and returns a `(num_units, 2)` int array of `(dx, dy)`. No masks, no tags, no global indices.
- `SharingAlgorithm` has two methods: `decide_transfers(multi_env, rng) -> list[(unit_id, target_juris)]` and `get_steering_actions(multi_env, rng) -> dict[unit_id, (dx, dy)]`. Steering overrides are applied after suppression actions are computed.
- The main loop order is: `decide_transfers` -> `initiate_transfer` -> `advance_transit` -> `get_steering_actions` -> `get_actions` (per jurisdiction) -> apply steering overrides -> `step`. This order matters because `advance_transit` delivers arrived units before actions are computed, preventing shape mismatches.
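The sketch below illustrates that order for one step. It is a simplified illustration, not the actual `main.py` loop; per-stage RNG handling, snapshots, and logging are omitted.

```python
# Simplified illustration of the per-step call order above (not the actual
# main.py code; RNG handling and logging are simplified).
def run_one_step(multi_env, sharing_algo, suppression_algo, rng):
    # 1. Decide transfers, then initiate them for the selected units.
    for unit_id, target_juris in sharing_algo.decide_transfers(multi_env, rng):
        multi_env.initiate_transfer(unit_id, target_juris)

    # 2. Deliver arriving units *before* actions are computed, so the
    #    per-jurisdiction action arrays have the right shapes.
    multi_env.advance_transit()

    # 3. Steering overrides (e.g., walking a chosen unit toward the center).
    steering = sharing_algo.get_steering_actions(multi_env, rng)

    # 4. Suppression actions per jurisdiction, with steering applied on top.
    for j, jenv in enumerate(multi_env.jurisdictions):
        actions = suppression_algo.actions(jenv, rng)
        for unit_id, (dx, dy) in steering.items():
            if multi_env.unit_jurisdiction[unit_id] == j:
                actions[multi_env.unit_local_index[unit_id]] = (dx, dy)
        # 5. Step the fire grid (main.py presumably uses separate
        #    spread/lightning RNGs; a single generator is used here).
        jenv.step(actions, rng, rng)
```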
- `JurisdictionEnv.burning_map`: `(rows, cols)` bool, 2D.
- `JurisdictionEnv.unit_positions`: 1D int array of flat cell indices (length = current num_units, variable due to add/remove).
- `MultiJurisdictionEnv.unit_jurisdiction`: `(num_units_total,)` int, global ID -> jurisdiction index (-1 if in transit).
- `MultiJurisdictionEnv.unit_local_index`: `(num_units_total,)` int, global ID -> index within `jenv.unit_positions` (-1 if in transit).
- Snapshot format for animator: `burning_map` is `(steps+1, J, R, C)` bool, `unit_positions` is `(steps+1, N, 2)` int where col 0 = jurisdiction, col 1 = flat cell index (negative = in transit with remaining steps encoded as `-remaining`). Single mode uses J=1.
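If you need to post-process saved snapshots outside of `fire_animator.py`, something like the following works, assuming the `.npz` keys match the array names above (the file path is illustrative and the flat-index decoding assumes row-major order):

```python
# Hypothetical snapshot decoding; .npz key names and the file path are assumed.
import numpy as np

snap = np.load("snapshots/run_0.npz")   # illustrative path
burning = snap["burning_map"]           # (steps+1, J, R, C) bool
units = snap["unit_positions"]          # (steps+1, N, 2) int

t = 10
cols = burning.shape[3]
for uid, (juris, cell) in enumerate(units[t]):
    if cell < 0:
        print(f"unit {uid}: in transit to jurisdiction {juris}, {-cell} steps remaining")
    else:
        r, c = divmod(int(cell), cols)  # assuming row-major flat indexing
        print(f"unit {uid}: jurisdiction {juris}, cell ({r}, {c})")
```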
- Create `algorithms/suppression_algorithms/my_algo.py`.
- Subclass `SuppressionAlgorithm` and implement `actions(self, jenv, rng) -> np.ndarray` returning `(jenv.num_units, 2)`.
- Set a `name = "my_algo"` class attribute.
- Register it in `algorithms/suppression_algorithms/__init__.py` by importing it and adding it to `SUPPRESSION_ALGORITHM_REGISTRY`.
- The algorithm receives a `JurisdictionEnv` with these useful attributes: `burning_map`, `unit_positions`, `cell_row`, `cell_col`, `center_cell_row`, `center_cell_col`, `rows`, `cols`, `movement_per_step`, `num_units`. Use `jenv.units_per_cell()` and `jenv.spread_probabilities(fire_state)` for planning. A skeleton is sketched after this list.
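A minimal skeleton under those rules. The relative import path is an assumption based on the project layout, and the random-move body is only a placeholder policy.

```python
# Hypothetical skeleton for algorithms/suppression_algorithms/my_algo.py.
# The relative import follows the project layout; the action logic is a
# placeholder (random moves within movement_per_step), not a real policy.
import numpy as np

from .algorithm_base import SuppressionAlgorithm


class MyAlgo(SuppressionAlgorithm):
    name = "my_algo"

    def actions(self, jenv, rng) -> np.ndarray:
        acts = np.zeros((jenv.num_units, 2), dtype=int)
        for i in range(jenv.num_units):
            # Placeholder: random (dx, dy) within the per-step Manhattan budget.
            dx = rng.integers(-jenv.movement_per_step, jenv.movement_per_step + 1)
            dy = rng.integers(-jenv.movement_per_step + abs(dx),
                              jenv.movement_per_step - abs(dx) + 1)
            acts[i] = (dx, dy)
        return acts
```

Registration then amounts to importing `MyAlgo` in `algorithms/suppression_algorithms/__init__.py` and adding it to `SUPPRESSION_ALGORITHM_REGISTRY` under its `name`; whether the registry stores the class or an instance should follow the existing `greedy.py` entry.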
- Create `algorithms/sharing_algorithms/my_algo.py`.
- Subclass `SharingAlgorithm`, implement `decide_transfers(self, multi_env, rng)`, and optionally override `get_steering_actions(self, multi_env, rng)`.
- `decide_transfers` returns `[(unit_id, target_juris), ...]`. Only return transfers for units at their jurisdiction's center cell; `initiate_transfer` will validate this.
- `get_steering_actions` returns `{unit_id: (dx, dy)}` to override suppression actions for specific units (e.g., to walk them toward the center before a transfer).
- Register it in `algorithms/sharing_algorithms/__init__.py`.
- Useful `multi_env` attributes: `jurisdictions` (list of `JurisdictionEnv`), `unit_jurisdiction`, `unit_local_index`, `burning_counts`, `juris_row`, `juris_col`, `adj_matrix`, `num_juris_rows`, `num_juris_cols`, `transit_units`. A skeleton is sketched after this list.
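A minimal skeleton under those rules. The relative import path and the `name` class attribute (mirroring the suppression side) are assumptions; both method bodies are placeholders.

```python
# Hypothetical skeleton for algorithms/sharing_algorithms/my_algo.py.
# Import path and the name attribute are assumptions; bodies are placeholders.
from .algorithm_base import SharingAlgorithm


class MySharing(SharingAlgorithm):
    name = "my_sharing"

    def decide_transfers(self, multi_env, rng):
        # Placeholder: request no transfers. A real policy would inspect
        # multi_env.burning_counts and return [(unit_id, target_juris), ...]
        # only for units already at their jurisdiction's center cell.
        return []

    def get_steering_actions(self, multi_env, rng):
        # Placeholder: no steering overrides ({unit_id: (dx, dy)} when used).
        return {}
```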
`JurisdictionEnv.virtual_step(actions, rng_spread, rng_lightning, ...)` (with optional `burning_map` and `unit_positions` keyword overrides) is stateless: it returns `(next_burning, new_positions, reward, count)` without mutating the environment. Use it for lookahead / tree search in RL algorithms. Note: it still consumes RNG state, so fork the RNG if you need repeatable rollouts.
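A minimal one-step lookahead sketch using `virtual_step`. The candidate generation and scoring (counting burning cells) are placeholders; seeding fresh generators per candidate is one way to keep rollouts repeatable without touching the environment's own RNGs.

```python
# Hypothetical one-step lookahead with virtual_step. Scoring and candidate
# generation are placeholders; fresh per-candidate RNGs keep rollouts repeatable.
import numpy as np

def evaluate_candidates(jenv, candidate_actions, seed=0):
    best_score, best_actions = None, None
    for k, actions in enumerate(candidate_actions):
        # Deterministic forked RNGs per candidate, leaving the env RNGs untouched.
        rng_spread = np.random.default_rng(seed + 2 * k)
        rng_lightning = np.random.default_rng(seed + 2 * k + 1)
        next_burning, new_positions, reward, count = jenv.virtual_step(
            actions, rng_spread, rng_lightning
        )
        score = next_burning.sum()  # fewer burning cells is better
        if best_score is None or score < best_score:
            best_score, best_actions = score, actions
    return best_actions
```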
- `initiate_transfer(unit_id, target_juris)` removes the unit from its jurisdiction (`jenv.remove_units`), shifts `unit_local_index` for the remaining units in that jurisdiction, sets `unit_jurisdiction[uid] = -1`, and appends a `TransitUnit(unit_id, from_juris, to_juris, remaining_steps)`.
- `advance_transit()` decrements `remaining_steps` for all transit units. Those reaching 0 are delivered: `jenv.add_units([center_cell])` is called on the destination, and the global tracking arrays are updated.
- `get_snapshot()` encodes transit units as `(to_juris, -remaining_steps)` in the `unit_positions` array, matching the old format the animator expects.
| Parameter | Default | Description |
|---|---|---|
| `rows` / `cols` | 16 | Grid dimensions per jurisdiction |
| `base_spread_prob` | 0.06 | Per-neighbor fire spread probability |
| `suppression_success_prob` | 0.8 | Per-unit chance of extinguishing a fire cell |
| `movement_per_step` | 4 | Max Manhattan distance a unit can move per step |
| `lightning_mu_log` | -2.0 | Log-normal mean for lightning rate |
| `lightning_sigma_log` | 2.0 | Log-normal std for lightning rate |
| `juris_travel_time` | 4 | Steps to transit between adjacent jurisdictions |
| `num_juris_rows` / `num_juris_cols` | 2 | Grid layout of jurisdictions (multi mode) |