Maze-Flatland wraps and extends the flatland-rl environment in maze-rl, making it a powerful AI training ground:
- optimized for multi-agent decision-making
- provides built-in functionalities to improve sample efficiency
- lowers the exploration burden for agents
- provides insights into the behavior of the train agents
- implements extensive logging that can easily be extended.
Maze-Flatland - Environment architecture.
- Key Features
- Setup
- Running Rollouts and Training
- Offline Training
- References
- Contacts
- Acknowledgements
- Masking and skipping for better sample efficiency: prevents illegal actions and skips unnecessary decisions when only a single option is available.
- Designed for multi-agent reinforcement learning (MARL). Sequential multi-agent decision-making is supported.
- Customizable configurations for different training scenarios with YAML files.
- Built-in KPIs and events for comprehensive performance tracking and analysis.
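Each of these scenarios is described by a plain YAML experiment file. As a quick way to see what is available (assuming the repository layout implied by the experiment folder mentioned in the Rollout section below), you can list the bundled heuristic rollout configurations from the repository root:
ls maze_flatland/conf/experiment/multi_train/rollout/heuristic/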
- Navigate into the maze-flatland directory:
  cd maze-flatland
- Create a conda environment from the environment.yml:
  conda env create --file environment.yml
  Note: The default name for the environment is maze-flatland.
  Note: The default version of flatland-rl is the stable version 4.0.1. Modify the environment.yml if another version is needed and run the tests to ensure full compatibility.
- Activate the conda environment:
  conda activate maze-flatland
- Install maze-flatland locally:
  pip install -e .
  Note: the editable mode (-e) option is essential to facilitate the modification and creation of new experiment (yaml) files without re-installing.
- Install the maze-rl library:
  pip install git+https://github.com/enlite-ai/maze.git@dev
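As a quick sanity check (assuming the import names maze_flatland and flatland provided by the two packages), you can verify the installation from within the activated environment:
python -c "import maze_flatland, flatland; print('installation looks good')"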
Run a rollout with a greedy policy:
maze-run +experiment=multi_train/rollout/heuristic/simple_greedy
This performs 1 rollout with 3 trains and prints the resulting performance statistics to the command line.
The statistics are saved in a rollout directory using the format outputs/yyyy-mm-dd/hh-mm-ss/ relative to the working directory.
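For instance, the most recent rollout directory can be located with a plain shell one-liner (nothing maze-flatland specific is assumed here):
ls -dt outputs/*/* | head -n 1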
Every experiment is launched via the maze-run command. All the experiment files are stored at maze-flatland/maze_flatland/conf/experiment/.

To train, replace <path/to/yaml/exp> with your target experiment (e.g. offline/train/bc_train) and run:
maze-run -cn conf_train +experiment=<path/to/yaml/exp>

To run a rollout, replace <path/to/yaml/exp> with your target experiment (e.g. multi_train/rollout/heuristic/simple_greedy) and run:
maze-run +experiment=<path/to/yaml/exp>
You can override the configuration by modifying the yaml file or at the command line level:
maze-run +experiment=multi_train/rollout/heuristic/simple_greedy env._.n_trains=15 env._.map_height=40 env._.map_width=40 runner.n_episodes=10
Here we run 10 episodes (runner.n_episodes=10) on a 40x40 map (env._.map_height=40 and env._.map_width=40) with 15 trains (env._.n_trains=15).
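maze-run is a Hydra-based CLI (it uses Hydra's -cn and +experiment= syntax), so the standard Hydra inspection flags should also be available; assuming they are not disabled, you can print the composed configuration without launching a run:
maze-run +experiment=multi_train/rollout/heuristic/simple_greedy env._.n_trains=15 --cfg job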
You can also launch experiments with the wrappers defined at maze-flatland/maze_flatland/conf/wrappers/.
- Masking and skipping wrapper (sub_step_skipping_monitored): automatically skips decision-making when only one viable option exists.
  maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=sub_step_skipping_monitored
- Rendering wrapper (renderer): stores a step-by-step rendering of the ongoing episode in a dedicated folder named rendered_states in the output directory.
  maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=renderer
- Dataset creator wrapper (spaces_recording): stores the experience from the rolled-out episode(s) for offline training in a dedicated folder named spaces_records in the output directory.
  maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=spaces_recording
Additionally, you can stack multiple wrappers. Beware of the order of the wrappers: the first is the outermost wrapper. In the example below, if we reverse the order of the wrappers, the experience batch will also include the agent-environment interactions that were skipped.
maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=[renderer,spaces_recording,sub_step_skipping_monitored]
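For reference, the reversed stacking described above, with the skipping wrapper outermost so that the skipped interactions are also recorded, would be:
maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=[sub_step_skipping_monitored,spaces_recording,renderer]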
Below, we list the steps to train a policy.
To get started, we provide a ready-to-use dataset for training an agent. You can either download, extract and use this dataset or proceed to the data collection phase.
- Collect 500 trajectories using the spaces_recording wrapper:
  Tip: Using the parallel runner you can set the number of processes used to collect the data, e.g. runner.n_processes=20.
  maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=[sub_step_skipping_monitored,spaces_recording] runner=parallel runner.n_processes=5 runner.n_episodes=500
  Note: You can replace the greedy agent with any other existing agent, either heuristic or trained.
- We have now collected our dataset at /outputs/yyyy-mm-dd/hh-mm-ss/spaces_records/.
- Finally, we can train our torch policy with behavioral cloning:
  maze-run -cn conf_train +experiment=offline/train/bc_train trajectories_data=<replace/with/path/to/dataset>
  Note: <replace/with/path/to/dataset> can be a single .pkl file, a directory containing multiple .pkl files, or a list of the previous (see the example after these steps).
- Now we have our trained policy at the output directory:
  ./flatland_result/BC_offline-v2.2/decision_point_mask/multi_train/ma_reduced_action_space-masked_flatland-bc-local/yyyy-mm-dd_hh-mm-sssss
- Roll out the trained policy:
  maze-run +experiment=multi_train/rollout/torch_policy_masked input_dir=<replace/with/your/output/directory/>
  By default, state_dict.pt is used for the rollout, as it holds the weights of the best-performing policy. To use a policy from a specific checkpoint, specify the policy.state_dict_file to use. For example:
  maze-run +experiment=multi_train/rollout/torch_policy_masked input_dir=<replace/with/your/output/directory/> policy.state_dict_file=state_dict-epoch_80.pt
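As mentioned in the trajectories_data note above, a directory works as well; for example, to train directly on the dataset folder collected earlier (date and time placeholders left as-is), one could run:
maze-run -cn conf_train +experiment=offline/train/bc_train trajectories_data=outputs/yyyy-mm-dd/hh-mm-ss/spaces_records/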
XGBoost is a powerful gradient boosting algorithm leveraging optimized tree-based learning to map tabular data to predictions, in this case actions, excelling in speed and accuracy. Learn more about XGBoost.
As with behavioral cloning, you can either use the ready-to-use dataset or proceed with the data collection phase.
- First, if using a custom observation, we need to flatten the observation (obs_aggregator_flattening wrapper) to obtain tabular data compatible with XGBoost.
- Second, the first and last actions (DO_NOTHING and STOP_MOVING) need to appear at least once in the dataset, as XGBoost infers the labels from the dataset. This can be achieved by setting do_skipping_in_reset to false in the skipping wrapper at the maze_flatland/conf/wrappers/sub_step_skipping_monitored.yaml file (a one-line way to do this is sketched after these steps) and by replacing the greedy policy with a random one.
- Start data collection:
  maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=[sub_step_skipping_monitored,obs_aggregator_flattening,spaces_recording] runner=parallel runner.n_processes=5 runner.n_episodes=500 policy=masked_random_policy
- We have now collected our dataset at /outputs/yyyy-mm-dd/hh-mm-ss/spaces_records/.
- Finally, we can proceed with training:
  maze-run -cn conf_train +experiment=offline/train/xgboost_train trajectories_data=<replace/with/path/to/dataset/>
  Note: <replace/with/path/to/dataset> can be a single .pkl file, a directory containing multiple .pkl files, or a list of the previous.
- Now we have our trained model at the output directory:
  ./flatland_result/XGBoost-v2.2/decision_point_mask/multi_train/ma_reduced_action_space-masked_flatland-xgboost-local/yyyy-mm-dd_hh-mm-sssss
- Roll out the policy:
  maze-run +experiment=multi_train/rollout/xgboost input_dir=<replace/with/your/output/directory/>
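For convenience, the do_skipping_in_reset change mentioned in the data-collection prerequisites above can be sketched as a one-liner, assuming the flag currently appears in that file as do_skipping_in_reset: true:
sed -i 's/do_skipping_in_reset: true/do_skipping_in_reset: false/' maze_flatland/conf/wrappers/sub_step_skipping_monitored.yaml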
To conclude, we report a comparison between XGBoost and a neural network, trained on the same dataset, over the validation set for round 1, where malfunctions are enabled and all trains have the same speed profile.
Validation 1st Round - Performance comparison of Neural Network vs XGBoost.
Website: enlite.ai
This work is partially supported by the AI4REALNET project that has received funding from European Union’s Horizon Europe Research and Innovation program under the Grant Agreement No 101119527. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.