
Model degenerates after one training iteration during SWE agentic training with GYM Schedule, generating only "!!!!" #9246

@key2long

Description


Checklist

  • I have searched existing issues, and this is a new question or discussion topic.

Question Description

I am using GYM_Env in Swift (ms-swift) for multi-turn SWE agentic training. When running GRPO training with DeepSpeed ZeRO-3 + vLLM server rollout (Qwen3.5-9B), the model degenerates immediately after one step and outputs nothing but "!!!!!!" (the model parameters have presumably gone to NaN), even though the loss and gradients at the first step are in the normal range. With ZeRO-2, however, training is still normal after one step.
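One way to confirm the NaN hypothesis is to scan the parameters right after the optimizer step. Below is a minimal sketch using plain PyTorch; `find_nonfinite_params` is a hypothetical helper, and note that under ZeRO-3 each rank only holds a parameter shard, so in real DeepSpeed training the scan would need to run inside `deepspeed.zero.GatheredParameters` (or on the gathered 16-bit weights before they are pushed to the vLLM server):

```python
import torch

def find_nonfinite_params(model: torch.nn.Module) -> list[str]:
    """Return the names of parameters that contain NaN or Inf values.

    Caveat: with ZeRO-3, model.named_parameters() on a single rank only
    sees that rank's shard; gather the full parameters first (e.g. via
    deepspeed.zero.GatheredParameters) before running this check.
    """
    bad = []
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            bad.append(name)
    return bad

# Toy demonstration: deliberately corrupt one weight entry.
model = torch.nn.Linear(4, 4)
with torch.no_grad():
    model.weight[0, 0] = float("nan")
print(find_nonfinite_params(model))  # ['weight']
```

If the scan comes back clean after the optimizer step but the rollout still produces "!!!!", the corruption is more likely happening during the ZeRO-3 weight gather/sync into the vLLM server rather than in the optimizer update itself.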

Image

Relevant ZeRO-3 config:

{
  "bf16": {
    "enabled": true
  },
  "fp16": {
    "enabled": false
  },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_prefetch_bucket_size": 5e7,
    "stage3_param_persistence_threshold": 1e5,
    "reduce_bucket_size": 5e7,
    "sub_group_size": 1e8,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": 1.0,
  "steps_per_print": 20,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}

ZeRO-2 config:

{
  "bf16": {
    "enabled": true
  },
  "fp16": {
    "enabled": false
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 20000000,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 20000000,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": 1.0,
  "steps_per_print": 20,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
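Since ZeRO-2 is stable and ZeRO-3 is not, bisecting the config differences can help isolate the trigger: beyond the stage number itself, the two configs also differ in the `stage3_*` knobs and the bucket sizes. A small sketch (configs abbreviated to illustrative fragments, not the full files above) that enumerates every differing key path:

```python
def diff_configs(a: dict, b: dict, prefix: str = ""):
    """Yield (key_path, value_in_a, value_in_b) for every entry that differs.

    Recurses into nested dicts; a key missing from one side shows up
    with value None on that side.
    """
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}{key}"
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            yield from diff_configs(va, vb, path + ".")
        elif va != vb:
            yield path, va, vb

# Abbreviated fragments of the two configs, for illustration only:
zero3 = {"zero_optimization": {"stage": 3, "reduce_bucket_size": 5e7}}
zero2 = {"zero_optimization": {"stage": 2, "reduce_bucket_size": 2e7}}
for path, va, vb in diff_configs(zero3, zero2):
    print(path, va, vb)
```

Flipping the differing entries one at a time (e.g. trying ZeRO-3 without optimizer offload, or with smaller bucket sizes) would narrow down whether a specific stage-3 setting, rather than stage 3 itself, triggers the divergence.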
