For now, the code only supports PyTorch; JAX support is currently broken.
```
# run in torch
python run/train.py -a sync-ppo -e smac-8m_vs_9m -c smac_th -dl th
```

In this command:

- `-a sync-ppo`: specifies the algorithm `ppo` with the synchronous distributed architecture (`sync`).
- `-e smac-8m_vs_9m`: selects the environment `8m_vs_9m` from the SMAC suite (`smac`).
- `-c smac_th`: specifies the configuration (`smac_th`).
- `-dl th`: uses Torch (`th`) as the deep learning library.
A multi-agent reinforcement learning library.
This is a modular framework for distributed multi-agent reinforcement learning. It consists of three main modules: i) single/multi-agent algorithms, ii) a distributed training framework, and iii) games. This document first presents a usage guide and then describes the design of each of the three modules in turn.

- Easy to get started: even without a programming background, you can learn to run multi-machine tuning experiments within half an hour.
- Modular design that is easy to extend: new algorithms and environments only need to follow the predefined interfaces to be plugged in and used.
- The built-in baseline algorithms achieve SOTA results on multiple benchmarks, including classic multi-agent testbeds such as SMAC and GRF.
- A distributed training framework that supports self-play, asymmetric multi-population games, and evaluation.

The entry point for single/multi-agent algorithms is `algo/train.py`. An algorithm is defined by its `Agent`, and most of the interaction modules are defined in the `Runner` class.
Every `python run/train.py` command below can be replaced by `python main.py`, which automatically detects unexpected halts caused by simulator errors and restarts the whole system accordingly.
For stable simulators, `python run/train.py` is still the recommended way to go.
```
# two agents playing against each other
python run/train.py -a ppo -e template-temp -c template template
python run/train.py -a ppo -e template-temp -c template -kw uid2aid=0,0 uid2gid=0,0
# self-play
python run/train.py -a async-ppo -e template-temp -c template
# run in torch
python run/train.py -a sync-ppo -e smac-8m_vs_9m -c smac_th -dl th
```

where `sync` specifies the distributed architecture (dir: `distributed`), `ppo` specifies the algorithm (dir: `algo`), `template` denotes the environment suite, and `temp` is the environment name.
By default, all the checkpoints and logs are saved in `./logs/{env}/{algo}/{model_name}/`.
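For example, assuming the default layout, the torch command above (`-a sync-ppo -e smac-8m_vs_9m`) would write its checkpoints and logs to a directory of the following form, where the model name segment depends on the run (see the `-n` argument below):

```
./logs/smac-8m_vs_9m/sync-ppo/<model_name>/
```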
You can also make simple changes to `*.yaml` configurations from the command line:

```
# change learning rate to 0.0001, `lr` must appear in `*.yaml`
python run/train.py -a sync-hm -e unity-combat2d -kw lr=0.0001
```

This change will automatically be reflected in Tensorboard, making it a recommended way to do simple hyperparameter tuning. Alternatively, you can modify configurations in `*.yaml` and specify `model_name` manually using the command argument `-n your_model_name`.
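For reference, a minimal sketch of what such a `*.yaml` might contain is shown below; apart from `lr`, the keys and structure are purely illustrative and may differ from the actual files in `configs/`:

```yaml
# hypothetical config excerpt -- only `lr` relates to the example above
lr: 1.0e-3      # overridden to 0.0001 by `-kw lr=0.0001`
gamma: 0.99
n_envs: 64
```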
```
python run/eval.py magw-logs/n_envs=64-n_steps=20-n_epochs=1/seed=4/ -n 1 -ne 1 -nr 1 -r -i eval -s 256 256 --fps 1
```

The above command presents a way to evaluate a trained model, where

- `magw-logs/n_envs=64-n_steps=20-n_epochs=1/seed=4/` is the model path
- `-n` specifies the number of episodes to run
- `-ne` specifies the number of environments running in parallel
- `-nr` specifies the number of ray actors devoted to running
- `-r` renders the video and saves it as a `*.gif` file
- `-i` specifies the video name
- `-s` specifies the screen size of the video
- `--fps` specifies the fps of the saved `*.gif` file
In some multi-agent settings, we may prefer using different configurations for different agents. The following code demonstrates how to run a multi-agent algorithm with multiple configurations, one for each agent.
```
# make sure `unity.yaml` and `unity2.yaml` exist in the `configs/` directory
# the first agent is initialized with the configuration specified by `unity.yaml`,
# while the second agent is initialized with the configuration specified by `unity2.yaml`
python run/train.py -a sync-hm -e unity-combat2d -c unity unity2
```
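For illustration, the two configurations might differ only in a few agent-specific fields. The excerpts below are hypothetical and not taken from the actual `configs/` files:

```yaml
# configs/unity.yaml (hypothetical excerpt) -- used by the first agent
lr: 1.0e-4
gamma: 0.99
```

```yaml
# configs/unity2.yaml (hypothetical excerpt) -- used by the second agent
lr: 3.0e-4
gamma: 0.995
```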