Conversation

@mikequan0425 mikequan0425 commented Dec 29, 2025

What does this PR do?

Provide a script for DAPO training of GPT-OSS-20B on NPU.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: [recipe] feat: support qwen2.5-32B DAPO training script on ASCEND NPU #3146, [megatron] feat: support gpt-oss #4323
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate with experiment(s) and show results such as training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a new training script for DAPO with GPT-OSS-20B on Ascend NPUs, along with a documentation update. The changes look good overall. I found one potential high-severity issue in the new training script: an incorrect advantage estimator may be configured, which could affect the correctness of training. My detailed feedback is in the review comment.

#!/bin/bash
project_name='gptoss_verl_fsdp'
exp_name='32rank-gptoss-20B'
adv_estimator=grpo
Contributor
Severity: high

The script is configured for a DAPO training run, as indicated by the script name, configuration files, and reward manager. However, the advantage estimator is set to grpo. This appears to be inconsistent and likely incorrect for a DAPO recipe. Using a grpo estimator may not align with the DAPO algorithm, potentially leading to incorrect training behavior. It should be changed to the appropriate estimator for DAPO, which is presumably dapo.

Suggested change:
- adv_estimator=grpo
+ adv_estimator=dapo
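For context on what this setting controls: both GRPO and DAPO use a group-relative advantage, where each sampled response's reward is normalized against the other rollouts for the same prompt (which may be why the script sets adv_estimator=grpo in the first place). A minimal sketch of that normalization follows; the function name is hypothetical and this is not verl's actual implementation:

```python
import statistics

def group_relative_advantage(rewards, eps=1e-6):
    """Normalize each rollout's reward by the mean and std of its
    group, as in GRPO-style advantage estimation (hypothetical sketch)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps guards against a zero std when all rewards in the group are equal
    return [(r - mean) / (std + eps) for r in rewards]
```

Under this scheme the advantages within a group sum to zero, so responses are scored only relative to their siblings; DAPO layers additional changes (e.g., clip-higher and dynamic sampling) on top of this baseline.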
