
[Refactor] Merge rollout controller into rollout manager#304

Merged
zhuzilin merged 18 commits into THUDM:main from PopSoda2002:refactor/rollout
Sep 16, 2025

Conversation

Collaborator

@PopSoda2002 commented Sep 7, 2025

Motivation:

  • Follow the software principle of high cohesion, low coupling by merging the two functionally similar classes into one

Co-authored with @Williamren97 and @MortalHappiness

Solution:

  • Merge common functionality, such as generate and eval, into RolloutManager
  • Reorganize files and functions: move public functions to the front and private ones to the back, and remove buffer.py
  • Update the corresponding rollout manager startup code
  • Refactor some functions, e.g. split generate into several smaller functions
  • Pass Ray actor handles such as RolloutEngine between Ray actors such as RolloutManager, resolving results with ray.get()

Result

We used Qwen3-4B to verify that the refactor does not change the baseline behavior.

  • colocate
    script:
#!/bin/bash

# for rerun the task
pkill -9 sglang
sleep 3
ray stop --force
pkill -9 ray
pkill -9 python
sleep 3
pkill -9 ray
pkill -9 python

set -ex

# prevent ray from buffering stdout/stderr
export PYTHONUNBUFFERED=1

NVLINK_COUNT=$(nvidia-smi | grep -o "NVLink" | wc -l)
if [ "$NVLINK_COUNT" -gt 0 ]; then
    HAS_NVLINK=1
else
    HAS_NVLINK=0
fi
echo "HAS_NVLINK: $HAS_NVLINK (detected $NVLINK_COUNT NVLink references)"

SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
source "${SCRIPT_DIR}/models/qwen3-4B.sh"

CKPT_ARGS=(
   --hf-checkpoint /root/Qwen3-4B
   #--hf-checkpoint /root/Qwen3-4B-FP8
   --ref-load /root/Qwen3-4B_torch_dist
   --load /root/Qwen3-4B_slime/
   --save /root/Qwen3-4B_slime/
   --save-interval 20
)

ROLLOUT_ARGS=(
   --prompt-data /root/dapo-math-17k/dapo-math-17k.jsonl
   --input-key prompt
   --label-key label
   --apply-chat-template
   --rollout-shuffle
   --rm-type deepscaler
   --num-rollout 3000
   --rollout-batch-size 32
   --n-samples-per-prompt 4
   --rollout-max-response-len 8192
   --rollout-temperature 0.8

   --global-batch-size 128
   --balance-data
)

EVAL_ARGS=(
   --eval-interval 20
   --eval-prompt-data aime /root/aime-2024/aime-2024.jsonl
   --n-samples-per-eval-prompt 4
   --eval-max-response-len 16384
   --eval-top-p 0.7
)

PERF_ARGS=(
   --tensor-model-parallel-size 2
   --sequence-parallel
   --pipeline-model-parallel-size 1
   --context-parallel-size 1
   --expert-model-parallel-size 1
   --expert-tensor-parallel-size 1

   --recompute-granularity full
   --recompute-method uniform
   --recompute-num-layers 1

   # --micro-batch-size 1
   --use-dynamic-batch-size
   --max-tokens-per-gpu 9216
)

GRPO_ARGS=(
   --advantage-estimator grpo
   --use-kl-loss
   --kl-loss-coef 0.00
   --kl-loss-type low_var_kl
   --entropy-coef 0.00
   --eps-clip 0.2
   --eps-clip-high 0.28
)

OPTIMIZER_ARGS=(
   --optimizer adam
   --lr 1e-6
   --lr-decay-style constant
   --weight-decay 0.1
   --adam-beta1 0.9
   --adam-beta2 0.98
)

export WANDB_API_KEY="your-wandb-api-key"  # do not commit real keys
export CUDA_VISIBLE_DEVICES=5,6
WANDB_ARGS=(
   --use-wandb
   --wandb-project slime-dev
   --wandb-group qwen3-4B-test-huapeng
   --wandb-key ${WANDB_API_KEY}
)

SGLANG_ARGS=(
   --rollout-num-gpus-per-engine 1
   --sglang-mem-fraction-static 0.7
)

MISC_ARGS=(
   # default dropout in megatron is 0.1
   --attention-dropout 0.0
   --hidden-dropout 0.0
   # should be good for model performance
   --accumulate-allreduce-grads-in-fp32
   --attention-softmax-in-fp32
   # need to comment this when using model with MLA
   --attention-backend flash
)

# launch the master node of ray in container
export MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
ray start --head --node-ip-address ${MASTER_ADDR} --num-gpus 2 --disable-usage-stats --dashboard-host=0.0.0.0 --dashboard-port=8265

# Build the runtime environment JSON with proper variable substitution
RUNTIME_ENV_JSON="{
  \"env_vars\": {
    \"PYTHONPATH\": \"/root/Megatron-LM/\",
    \"CUDA_DEVICE_MAX_CONNECTIONS\": \"1\",
    \"NCCL_NVLS_ENABLE\": \"${HAS_NVLINK}\"
  }
}"

ray job submit --address="http://127.0.0.1:8265" \
   --runtime-env-json="${RUNTIME_ENV_JSON}" \
   -- python3 train.py \
   --actor-num-nodes 1 \
   --actor-num-gpus-per-node 2 \
   --colocate \
   ${MODEL_ARGS[@]} \
   ${CKPT_ARGS[@]} \
   ${ROLLOUT_ARGS[@]} \
   ${OPTIMIZER_ARGS[@]} \
   ${GRPO_ARGS[@]} \
   ${DISTRIBUTED_ARGS[@]} \
   ${WANDB_ARGS[@]} \
   ${PERF_ARGS[@]} \
   ${EVAL_ARGS[@]} \
   ${SGLANG_ARGS[@]} \
   ${MISC_ARGS[@]}

baseline(main branch) vs change(refactor branch):
(screenshots: training curves for baseline vs. refactor)
Within the jitter range, the two runs remain basically consistent, and the spike timings are all aligned (wandb link).

  • disaggregate

path = Path(path_template.format(rollout_id=self.rollout_id))
print(f"Save debug rollout data to {path}")
path.parent.mkdir(parents=True, exist_ok=True)
torch.save(data, path)
Contributor

Why not

            torch.save(
                dict(
                    rollout_id=self.rollout_id,
                    samples=[sample.to_dict() for sample in data],
                ),
                path,
            )

?

Collaborator Author

I just copied and pasted this part; I think we can do the replacement in the future.

Contributor

torch.save(
    dict(
        rollout_id=self.rollout_id,
        samples=[sample.to_dict() for sample in data],
    ),
    path,
)

Did you copy from here?

Collaborator Author

Sorry, I think I used an older version of this part; I've synced to the newest now. Thanks for your help!
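For reference, the dict-based debug dump discussed in this thread can be round-tripped like this. The path and sample fields are illustrative stand-ins, not values from the PR:

```python
from pathlib import Path
import torch

# Illustrative stand-ins for the rollout data (names assumed).
rollout_id = 0
samples = [{"prompt": "p", "response": "r"}]

path = Path(f"/tmp/debug_rollout_{rollout_id}.pt")
path.parent.mkdir(parents=True, exist_ok=True)

# Save a plain dict, as suggested in the review thread.
torch.save({"rollout_id": rollout_id, "samples": samples}, path)

# Load it back; plain containers load fine under the default settings.
loaded = torch.load(path)
print(loaded["rollout_id"], len(loaded["samples"]))  # 0 1
```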

@PopSoda2002 PopSoda2002 marked this pull request as ready for review September 14, 2025 17:46
Contributor

@Williamren97 left a comment

LGTM :)

PopSoda2002 and others added 5 commits September 15, 2025 17:31
Co-authored-by: William Ren <williamren97@gmail.com>
Co-authored-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Co-authored-by: Chengxing Xie <91449279+yitianlian@users.noreply.github.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Haoran Wang <70007833+UbeCc@users.noreply.github.com>
@zhaochenyang20
Collaborator

@PopSoda2002 fix the lint locally.

@zhuzilin zhuzilin merged commit 20d0679 into THUDM:main Sep 16, 2025
3 of 4 checks passed
@zhuzilin zhuzilin mentioned this pull request Sep 17, 2025
llltttwww pushed a commit to llltttwww/slime that referenced this pull request Nov 30, 2025
* Refactor rollout manager

* Polish

* experiment

* rebase

* Clean code

* add engine lock

* Clean code

* Clean

* Move position of router and engine

* remove logging

* clean code

* log

* pre commit

* resolve conflicts

* precommit

* Refactor rollout manager

Co-authored-by: William Ren <williamren97@gmail.com>
Co-authored-by: Chi-Sheng Liu <chishengliu@chishengliu.com>

* Refactor rollout manager

Co-authored-by: Chi-Sheng Liu <chishengliu@chishengliu.com>

* add coauthor

Co-authored-by: Chengxing Xie <91449279+yitianlian@users.noreply.github.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Haoran Wang <70007833+UbeCc@users.noreply.github.com>

---------

Co-authored-by: William Ren <williamren97@gmail.com>
Co-authored-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Co-authored-by: Chengxing Xie <91449279+yitianlian@users.noreply.github.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Haoran Wang <70007833+UbeCc@users.noreply.github.com>
yueming-yuan pushed a commit to yueming-yuan/slime that referenced this pull request Dec 29, 2025
Yangruipis pushed a commit to rednote-ai/slime that referenced this pull request Feb 28, 2026
5 participants