Maze-Flatland

Maze-Flatland wraps and extends the flatland-rl environment in maze-rl, making it a powerful AI training ground:

  • optimized for multi-agent decision-making
  • provides built-in functionalities to improve sample efficiency
  • lowers the exploration burden for agents
  • provides insights into the behavior of train agents
  • implements extensive logging that can easily be extended.
Figure: Maze-Flatland environment architecture.


Table of Contents

  1. Key Features
  2. Setup
  3. Running Rollouts and Training
  4. Offline Training
  5. References
  6. Contacts
  7. Acknowledgements

Key Features

  • Masking and Skipping for better sample efficiency. Prevents illegal actions and skips unnecessary decisions when only a single option is available.
  • Designed for multi-agent reinforcement learning (MARL). Sequential multi-agent decision-making is supported.
  • Customizable configurations for different training scenarios with YAML files.
  • Built-in KPIs and events for comprehensive performance tracking and analysis.

Setup

  • Navigate into the maze-flatland directory
cd maze-flatland
  • Create a conda environment from the environment.yml

Note: The default name for the environment is maze-flatland.

conda env create --file environment.yml

Note: The default version of flatland-rl is the stable version 4.0.1. Modify the environment.yml if another version is needed and run the tests to ensure full compatibility.

  • Activate the conda environment:
conda activate maze-flatland
  • Install maze-flatland locally:
pip install -e .

Note: the editable mode (-e) option is needed so that experiment (yaml) files can be modified or created without re-installing the package.

  • Install the maze-rl library:
pip install git+https://github.com/enlite-ai/maze.git@dev
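
To quickly check that the installation is importable before running a rollout, you can run the line below (a minimal sanity check; it assumes the installed packages expose the module names maze, flatland and maze_flatland):

python -c "import maze, flatland, maze_flatland"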

Test the installation

Run a rollout with a greedy policy:

maze-run +experiment=multi_train/rollout/heuristic/simple_greedy

This performs 1 rollout with 3 trains and prints the resulting performance statistics to the command line.


The statistics are saved in a rollout directory created under the working directory, using the format outputs/yyyy-mm-dd/hh-mm-ss/.
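
For example, to locate the most recent rollout directory (plain shell, assuming the default output location above):

ls -dt outputs/*/*/ | head -n 1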

Running Rollouts and Training

Every experiment is launched by the maze-run command.

Training

Replace <path/to/yaml/exp> with your target experiment. E.g. offline/train/bc_train.
All the experiment files are stored at maze-flatland/maze_flatland/conf/experiment/.

maze-run -cn conf_train +experiment=<path/to/yaml/exp>

Testing

Replace <path/to/yaml/exp> with your target experiment. E.g. multi_train/rollout/heuristic/simple_greedy.

maze-run +experiment=<path/to/yaml/exp>

You can override the configuration by modifying the yaml files or directly on the command line:

maze-run +experiment=multi_train/rollout/heuristic/simple_greedy env._.n_trains=15 env._.map_height=40 env._.map_width=40 runner.n_episodes=10

Here we run 10 episodes (runner.n_episodes=10) on a 40x40 map (env._.map_height=40 and env._.map_width=40) with 15 trains (env._.n_trains=15).
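
These overrides can also be combined with the wrapper selection described in the next section; for example (using only options shown elsewhere in this README):

maze-run +experiment=multi_train/rollout/heuristic/simple_greedy runner.n_episodes=2 env._.n_trains=5 wrappers=renderer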

Using wrappers

You can also launch experiments with wrappers defined at maze-flatland/maze_flatland/conf/wrappers.

  • masking and skipping wrapper (sub_step_skipping_monitored). Automatically skips decision-making when only one viable option exists.
maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=sub_step_skipping_monitored
  • rendering wrapper (renderer). It stores a step-by-step rendering of the ongoing episode in a dedicated folder named rendered_states in the output directory.
maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=renderer
  • dataset creator wrapper (spaces_recording). It stores the experience for offline training from the rolled out episode(s) in a dedicated folder named spaces_records in the output directory.
maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=spaces_recording

Additionally, you can stack multiple wrappers. Beware of the order of the wrappers: the first one is the outermost. In the example below, if we reversed the order of the wrappers, the experience batch would also include the agent-environment interactions that were skipped.

maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=[renderer,spaces_recording,sub_step_skipping_monitored]
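
For comparison, here is the same command with the wrapper list reversed: the skipping wrapper becomes the outermost one, so the recorded experience also contains the skipped interactions.

maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=[sub_step_skipping_monitored,spaces_recording,renderer]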

Offline training

Below, we list the steps to follow to train a policy offline.

Behavioral cloning

To get started, we provide a ready-to-use dataset for training an agent. You can either download, extract and use this dataset or proceed to the data collection phase.


Data collection [optional]
  1. Collect 500 trajectories using the spaces_recording wrapper:

Tip: with the parallel runner (runner=parallel) you can set the number of processes used to collect the data, e.g. runner.n_processes=20.

maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=[sub_step_skipping_monitored,spaces_recording] runner=parallel runner.n_processes=5 runner.n_episodes=500

Note: You can replace the greedy agent with any other existing agent, either heuristic or trained.

  2. We have now collected our dataset at /outputs/yyyy-mm-dd/hh-mm-ss/space_records/


Training phase
  1. Finally, we can train our torch policy with behavioral cloning:
maze-run -cn conf_train +experiment=offline/train/bc_train trajectories_data=<replace/with/path/to/dataset>

Note: <replace/with/path/to/dataset> can be a single .pkl file, a directory containing multiple .pkl files, or a list of the above (see the example at the end of this section).

  2. Now we have our trained policy at the output directory: ./flatland_result/BC_offline-v2.2/decision_point_mask/multi_train/ma_reduced_action_space-masked_flatland-bc-local/yyyy-mm-dd_hh-mm-sssss
  3. Roll out the trained policy:
maze-run +experiment=multi_train/rollout/torch_policy_masked input_dir=<replace/with/your/output/directory/>

By default, state_dict.pt is used to run the rollout, as it holds the weights of the best performing policy.
To use the policy from a specific checkpoint, specify the policy.state_dict_file to use. For example:

maze-run +experiment=multi_train/rollout/torch_policy_masked input_dir=<replace/with/your/output/directory/> policy.state_dict_file=state_dict-epoch_80.pt
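
As noted above, trajectories_data also accepts a list of datasets. A sketch using the same bracketed list syntax as the wrappers override, with hypothetical placeholder paths:

maze-run -cn conf_train +experiment=offline/train/bc_train trajectories_data=[<path/to/dataset_1>,<path/to/dataset_2>]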

XGBoost

XGBoost is a powerful gradient boosting algorithm that leverages optimized tree-based learning to map tabular data to predictions (the action space, in this case), excelling in speed and accuracy. Learn more about XGBoost in the References section.

As for behavioral cloning, you can use the ready-to-use dataset or proceed with the data collection phase.


Data collection [optional]
  • First, if using a custom observation, we need to flatten the observation (obs_aggregator_flattening wrapper) to obtain tabular data compatible with XGBoost.
  • Second, the first and last actions (DO_NOTHING and STOP_MOVING) each need to appear at least once in the dataset, as XGBoost infers the labels from the dataset. This can be achieved by setting do_skipping_in_reset to false in the skipping wrapper at the maze_flatland/conf/wrappers/sub_step_skipping_monitored.yaml file and by replacing the greedy policy with a random one.
  1. Start data collection:
maze-run +experiment=multi_train/rollout/heuristic/simple_greedy wrappers=[sub_step_skipping_monitored,obs_aggregator_flattening,spaces_recording] runner=parallel runner.n_processes=5 runner.n_episodes=500 policy=masked_random_policy
  2. We have now collected our dataset at /outputs/yyyy-mm-dd/hh-mm-ss/space_records/

Training phase
  1. Finally, we can proceed with training:
maze-run -cn conf_train +experiment=offline/train/xgboost_train trajectories_data=<replace/with/path/to/dataset/>

Note: <replace/with/path/to/dataset> can be a single .pkl file, a directory containing multiple .pkl files, or a list of the above.

  2. Now we have our trained model at the output directory: ./flatland_result/XGBoost-v2.2/decision_point_mask/multi_train/ma_reduced_action_space-masked_flatland-xgboost-local/yyyy-mm-dd_hh-mm-sssss
  3. Roll out the policy:
maze-run +experiment=multi_train/rollout/xgboost input_dir=<replace/with/your/output/directory/>




To conclude, we report a comparison between XGBoost and a neural network, trained on the same dataset, over the validation set for round 1, where malfunctions are enabled and all trains have the same speed profile.

Figure: Validation 1st round - performance comparison, neural network vs XGBoost.

References

maze-rl GitHub

maze-rl docs

flatland-rl GitHub

flatland Website

AI4RealNet Website

XGBoost Library

Contacts

Website: enlite.ai


Acknowledgements

This work is partially supported by the AI4REALNET project that has received funding from European Union’s Horizon Europe Research and Innovation program under the Grant Agreement No 101119527. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.
