Maze-Flatland wraps and extends the flatland-rl environment in maze-rl, making it a powerful AI training ground:
- optimized for multi-agent decision-making
- provides built-in functionalities to improve sample efficiency
- lowers the exploration burden for agents
- provides insights into the behavior of the train agents
- implements extensive logging that can easily be extended.
Maze-Flatland - Environment architecture.
- Key Features
- Setup
- Running Rollouts and Training
- Offline Training
- References
- Contacts
- Acknowledgements
- Masking and skipping for better sample efficiency: prevents illegal actions and skips unnecessary decisions when only a single option is available.
- Designed for multi-agent reinforcement learning (MARL). Sequential multi-agent decision-making is supported.
- Customizable configurations for different training scenarios with YAML files.
- Built-in KPIs and events for comprehensive performance tracking and analysis.
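Each of these scenarios is described by a plain YAML experiment file. As a quick way to see what is available (assuming the repository layout implied by the experiment folder mentioned in the Rollout section below), you can list the bundled heuristic rollout configurations from the repository root:
ls maze_flatland/conf/experiment/multi_train/rollout/heuristic/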
- Navigate into the maze-flatland directory:
  cd maze-flatland
- Create a conda environment from the environment.yml:
  conda env create --file environment.yml
  Note: The default name for the environment is maze-flatland.
  Note: The default version of flatland-rl is the stable version 4.0.1. Modify the environment.yml if another version is needed and run the tests to ensure full compatibility.
- Activate the conda environment:
  conda activate maze-flatland
- Install maze-flatland locally:
  pip install -e .
  Note: the editable mode (-e) option is essential to facilitate the modification and creation of new experiment (yaml) files without re-installing.
- Install the maze-rl library:
  pip install git+https://github.com/enlite-ai/maze.git@dev
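As a quick sanity check (assuming the import names maze_flatland and flatland provided by the two packages), you can verify the installation from within the activated environment:
python -c "import maze_flatland, flatland; print('installation looks good')"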
Run a rollout with a greedy policy:
maze-run +experiment=multi_train/rollout/heuristic/simple_greedy
This performs 1 rollout with 3 trains and prints the resulting performance statistics to the command line.
The statistics are saved in a rollout directory using the format outputs/yyyy-mm-dd/hh-mm-ss/ relative to the working directory.
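For instance, the most recent rollout directory can be located with a plain shell one-liner (nothing maze-flatland specific is assumed here):
ls -dt outputs/*/* | head -n 1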
Every experiment is launched via the maze-run command. All the experiment files are stored at maze-flatland/maze_flatland/conf/experiment/.

To train, replace <path/to/yaml/exp> with your target experiment (e.g. offline/train/bc_train) and run:
maze-run -cn conf_train +experiment=<path/to/yaml/exp>

To run a rollout, replace <path/to/yaml/exp> with your target experiment (e.g. multi_train/rollout/heuristic/simple_greedy) and run:
maze-run +experiment=<path/to/yaml/exp>
You can override the configuration by modifying the yaml file or at the command line level:
maze-run +experiment=multi_train/rollout/heuristic/simple_greedy env._.n_trains=15 env._.map_height=40 env._.map_width=40 runner.n_episodes=10
Here we run 10 episodes (runner.n_episodes=10) on a 40x40 map (env._.map_height=40 and env._.map_width=40) with 15 trains (env._.n_trains=15).
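maze-run is a Hydra-based CLI (it uses Hydra's -cn and +experiment= syntax), so the standard Hydra inspection flags should also be available; assuming they are not disabled, you can print the composed configuration without launching a run:
maze-run +experiment=multi_train/rollout/heuristic/simple_greedy env._.n_trains=15 --cfg job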
You can also launch experiments with the wrappers defined at maze-flatland/maze_flatland/conf/wrappers/.
- Masking and skipping wrapper (sub_step_skipping_monitored): automatically skips decision-making when only one viable option exists.
  maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=sub_step_skipping_monitored
- Rendering wrapper (renderer): stores a step-by-step rendering of the ongoing episode in a dedicated folder named rendered_states in the output directory.
  maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=renderer
- Dataset creator wrapper (spaces_recording): stores the experience from the rolled-out episode(s) for offline training in a dedicated folder named spaces_records in the output directory.
  maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=spaces_recording
Additionally, you can stack multiple wrappers. Beware of the order of the wrappers: the first is the outermost wrapper. In the example below, if we reverse the order of the wrappers, the experience batch will also include the agent-environment interactions that were skipped.
maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=[renderer,spaces_recording,sub_step_skipping_monitored]
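For reference, the reversed stacking described above, with the skipping wrapper outermost so that the skipped interactions are also recorded, would be:
maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=[sub_step_skipping_monitored,spaces_recording,renderer]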
Below, we list the steps to train a policy.
To get started, we provide a ready-to-use dataset for training an agent. You can either download, extract and use this dataset or proceed to the data collection phase.
- Collect 500 trajectories using the spaces_recording wrapper:
  Tip: Using the parallel runner you can set the number of processes used to collect the data, e.g. runner.n_processes=20.
  maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=[sub_step_skipping_monitored,spaces_recording] runner=parallel runner.n_processes=5 runner.n_episodes=500
  Note: You can replace the greedy agent with any other existing agent, either heuristic or trained.
- We have now collected our dataset at /outputs/yyyy-mm-dd/hh-mm-ss/spaces_records/.
- Finally, we can train our torch policy with behavioral cloning:
  maze-run -cn conf_train +experiment=offline/train/bc_train trajectories_data=<replace/with/path/to/dataset>
  Note: <replace/with/path/to/dataset> can be a single .pkl file, a directory containing multiple .pkl files, or a list of the previous (see the example after these steps).
- Now we have our trained policy at the output directory:
  ./flatland_result/BC_offline-v2.2/decision_point_mask/multi_train/ma_reduced_action_space-masked_flatland-bc-local/yyyy-mm-dd_hh-mm-sssss
- Roll out the trained policy:
  maze-run +experiment=multi_train/rollout/torch_policy_masked input_dir=<replace/with/your/output/directory/>
  By default, state_dict.pt is used for the rollout, as it holds the weights of the best-performing policy. To use a policy from a specific checkpoint, specify the policy.state_dict_file to use. For example:
  maze-run +experiment=multi_train/rollout/torch_policy_masked input_dir=<replace/with/your/output/directory/> policy.state_dict_file=state_dict-epoch_80.pt
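As mentioned in the trajectories_data note above, a directory works as well; for example, to train directly on the dataset folder collected earlier (date and time placeholders left as-is), one could run:
maze-run -cn conf_train +experiment=offline/train/bc_train trajectories_data=outputs/yyyy-mm-dd/hh-mm-ss/spaces_records/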
XGBoost is a powerful gradient boosting algorithm leveraging optimized tree-based learning to map tabular data to predictions, in this case actions, excelling in speed and accuracy. Learn more about XGBoost.
As with behavioral cloning, you can either use the ready-to-use dataset or proceed with the data collection phase.
- First, if using a custom observation, we need to flatten the observation (obs_aggregator_flattening wrapper) to obtain tabular data compatible with XGBoost.
- Second, the first and last actions (DO_NOTHING and STOP_MOVING) need to appear at least once in the dataset, as XGBoost infers the labels from the dataset. This can be achieved by setting do_skipping_in_reset to false in the skipping wrapper at the maze_flatland/conf/wrappers/sub_step_skipping_monitored.yaml file (a one-line way to do this is sketched after these steps) and by replacing the greedy policy with a random one.
- Start data collection:
  maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=[sub_step_skipping_monitored,obs_aggregator_flattening,spaces_recording] runner=parallel runner.n_processes=5 runner.n_episodes=500 policy=masked_random_policy
- We have now collected our dataset at /outputs/yyyy-mm-dd/hh-mm-ss/spaces_records/.
- Finally, we can proceed with training:
  maze-run -cn conf_train +experiment=offline/train/xgboost_train trajectories_data=<replace/with/path/to/dataset/>
  Note: <replace/with/path/to/dataset> can be a single .pkl file, a directory containing multiple .pkl files, or a list of the previous.
- Now we have our trained model at the output directory:
  ./flatland_result/XGBoost-v2.2/decision_point_mask/multi_train/ma_reduced_action_space-masked_flatland-xgboost-local/yyyy-mm-dd_hh-mm-sssss
- Roll out the policy:
  maze-run +experiment=multi_train/rollout/xgboost input_dir=<replace/with/your/output/directory/>
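For convenience, the do_skipping_in_reset change mentioned in the data-collection prerequisites above can be sketched as a one-liner, assuming the flag currently appears in that file as do_skipping_in_reset: true:
sed -i 's/do_skipping_in_reset: true/do_skipping_in_reset: false/' maze_flatland/conf/wrappers/sub_step_skipping_monitored.yaml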
To conclude, we report a comparison between XGBoost and a neural network, trained on the same dataset, over the validation set for round 1, where malfunctions are enabled and all trains have the same speed profile.
Validation 1st Round - Performance comparison of Neural Network vs XGBoost.
Website: enlite.ai
This work is partially supported by the AI4REALNET project that has received funding from European Union’s Horizon Europe Research and Innovation program under the Grant Agreement No 101119527. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.