[Bug] DeepSpeed ZeRO-2 is unstable with GRPOTrainer using Qwen3-VL MoE #4631

@casper-hansen

Description

Reproduction

The exact same script, run with ZeRO-2 versus plain accelerate, leads to a measurably worse reward under ZeRO-2. Accelerate config used for the ZeRO-2 run:

compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_multinode_launcher: standard
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: 'bf16'
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
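For reference, the two runs differ only in how the launcher is configured. A sketch of the two invocations, assuming the config above is saved as `zero2.yaml` and the training script is `train_grpo.py` (both file names are hypothetical, the original script is not attached):

```shell
# Run with DeepSpeed ZeRO-2, using the accelerate config above
# (saved as zero2.yaml; file name hypothetical)
accelerate launch --config_file zero2.yaml train_grpo.py

# Same script with plain accelerate (DDP) on the same 8 GPUs and bf16
accelerate launch --num_processes 8 --mixed_precision bf16 train_grpo.py
```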
System Info

  • Platform: Linux-6.8.0-85-generic-x86_64-with-glibc2.35
  • Python version: 3.12.9
  • TRL version: 0.25.1
  • PyTorch version: 2.8.0
  • accelerator(s): 8× NVIDIA H200
  • Transformers version: 4.57.1
  • Accelerate version: 1.12.0
  • Accelerate config: not found
  • Datasets version: 4.4.1
  • HF Hub version: 0.36.0
  • bitsandbytes version: 0.48.2
  • DeepSpeed version: 0.18.2
  • Liger-Kernel version: 0.6.4
  • LLM-Blender version: not installed
  • OpenAI version: 2.8.1
  • PEFT version: 0.18.0
  • vLLM version: 0.11.0

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks (no screenshots; more on code blocks)
  • Any traceback provided is complete
