Multi-Agent RL

This directory provides an example of running multi-agent reinforcement learning (RL) with slime.

Environment Setup

The environment setup is identical to the standard RL setup used in slime.

Running the Script

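You can either use the provided default multi-agent system or define your own. The default configuration, shown below, points custom_multi_agent_function_path at examples.multi_agent.agent_system.run_agent_system; to use your own system, set that path to your function.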

MULTI_AGENT_CONFIGS = {
    "custom_multi_agent_function_path": "examples.multi_agent.agent_system.run_agent_system",
    "num_parallel": 5,
    "incorrect_reward_weight": 0.8,
    "correct_reward_weight": 1.2,
}
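The value of custom_multi_agent_function_path is a dotted Python path ending in a function name. As a rough illustration of how such a string can be turned into a callable, here is a minimal sketch using the standard library's importlib. The helper name load_function is hypothetical, and slime's own loading logic may differ; the demo resolves a standard-library function so it runs without slime installed.

```python
import importlib


def load_function(dotted_path: str):
    """Resolve a 'package.module.attr' string to the attribute it names.

    Hypothetical helper for illustration only; not part of slime.
    """
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.attr', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Demonstrated with a standard-library path so the snippet runs anywhere.
# With slime on the import path, the same call could be given
# "examples.multi_agent.agent_system.run_agent_system" instead.
sqrt = load_function("math.sqrt")
print(sqrt(9.0))
```

A custom function referenced this way must be importable from the directory where the run is started, which is presumably why the launch command is run from slime/.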

To start a run, execute:

cd slime/
bash examples/multi_agent/run-qwen3-30B-A3B-multi-agent.sh

New Arguments

Specify the agent rollout function with the --custom-generate-function-path argument.
Set the --rollout-max-context-len argument according to your model’s context window.

ROLLOUT_ARGS=(
   --custom-generate-function-path examples.multi_agent.rollout_with_multi_agents.generate_with_multi_agents
   --prompt-data /root/dapo-math-17k/dapo-math-17k.jsonl
   --input-key prompt
   --label-key label
   --apply-chat-template
   --rollout-shuffle
   --rm-type deepscaler
   --num-rollout 3000
   --rollout-batch-size 32
   --n-samples-per-prompt 8
   --rollout-max-context-len 16384
   --rollout-max-response-len 8192
   --rollout-temperature 1

   --global-batch-size 256
   --balance-data
)
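As a quick sanity check on the numbers above, the sketch below works through the arithmetic implied by these arguments. It assumes that each prompt yields exactly n-samples-per-prompt training samples and that --rollout-max-context-len bounds the total tokens of a conversation while --rollout-max-response-len bounds a single response; a multi-agent rollout may produce a different number of samples per prompt, so treat the result as an estimate rather than a statement of slime's behavior. The dictionary keys and the summarize helper are hypothetical names chosen for this example.

```python
# Values copied from ROLLOUT_ARGS above; key names are illustrative only.
rollout_args = {
    "rollout_batch_size": 32,
    "n_samples_per_prompt": 8,
    "global_batch_size": 256,
    "rollout_max_context_len": 16384,
    "rollout_max_response_len": 8192,
}


def summarize(args):
    """Derive per-rollout sample counts and the remaining token budget."""
    # Assumption: every prompt contributes exactly n_samples_per_prompt samples.
    samples = args["rollout_batch_size"] * args["n_samples_per_prompt"]
    steps, leftover = divmod(samples, args["global_batch_size"])
    # Tokens left in the context window after one maximum-length response.
    remaining = args["rollout_max_context_len"] - args["rollout_max_response_len"]
    return {
        "samples_per_rollout": samples,
        "train_steps_per_rollout": steps,
        "leftover_samples": leftover,
        "context_minus_response": remaining,
    }


summary = summarize(rollout_args)
print(summary)
```

Under these assumptions, 32 prompts times 8 samples gives 256 samples per rollout, which matches --global-batch-size 256, and a 16384-token context leaves 8192 tokens beyond one maximum-length response for prompts and earlier agent turns.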
