README.md
chord.sh
full_lmdeploy.sh
gspo.sh
moe_full.sh
moe_lora.sh
pt.sh
rloo.sh
vllm_72b_4gpu.sh
vllm_lora_qwenvl72b.sh
vllm_multi_turn.sh
vllm_vl7b.sh


README: GRPO Internal (Colocate) Mode Execution Scripts


Introduction

The GRPO (Group Relative Policy Optimization) training framework supports high-performance inference engines such as vLLM to accelerate the sampling process. Internal (colocate) mode deploys vLLM and runs training on the same GPU resources.

This folder contains scripts and instructions for running GRPO in Internal mode.

Training with Internal mode

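Add the following arguments to the GRPO training command, replacing [ut_ratio] with the fraction of each GPU's memory that vLLM may use:

--use_vllm true \
--vllm_mode colocate \
--vllm_gpu_memory_utilization [ut_ratio] \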
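
As a minimal sketch of how these flags might be filled in, the snippet below collects them in one variable so they can be appended to an existing launch command. The value 0.5 and the variable names VLLM_UTIL and COLOCATE_ARGS are assumptions for illustration only; this README does not recommend a specific ratio, and a suitable value depends on how much memory the training process itself needs on the same GPU.

```shell
# Hypothetical ratio: the README leaves [ut_ratio] unspecified.
# vLLM and the trainer share each GPU, so this should leave room for training.
VLLM_UTIL=0.5

# The three colocate flags from this README, gathered in one string
# (COLOCATE_ARGS is an illustrative name, not part of any tool).
COLOCATE_ARGS="--use_vllm true --vllm_mode colocate --vllm_gpu_memory_utilization ${VLLM_UTIL}"

# Print the flags instead of launching anything; append them to your own
# GRPO training command when you are ready to run it.
echo "$COLOCATE_ARGS"
```

If training runs out of GPU memory, lowering the ratio is a reasonable first thing to try, since both workloads compete for the same device.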

Multi-Node Training

On each node, run the original single-node training script with the environment variables NNODES and NODE_RANK set. Use the same configuration parameters on every node; only NODE_RANK should differ.
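
The per-node invocation described above can be sketched as follows. The helper launch_line is hypothetical and only prints the command each node would run; gspo.sh stands in for whichever single-node script from this folder you are using. A real cluster usually also needs a rendezvous address and port (commonly MASTER_ADDR and MASTER_PORT in torchrun-style launchers); those are assumptions and are not mentioned in this README.

```shell
# Hypothetical helper: print the launch line for one node.
#   $1 = total number of nodes (NNODES)
#   $2 = rank of this node, starting at 0 (NODE_RANK)
#   $3 = the unchanged single-node training script
launch_line() {
    echo "NNODES=$1 NODE_RANK=$2 bash $3"
}

# Two-node example: same script and same NNODES everywhere,
# only NODE_RANK differs between nodes.
launch_line 2 0 gspo.sh   # command for the first node
launch_line 2 1 gspo.sh   # command for the second node
```

Printing the commands first makes it easy to check that every node receives identical settings apart from its rank before anything is launched.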
