[Feature] Support Decoupled Speculation for SGLang Rollout #5559

@sisyphus111

@Sparrow612

Description

As described in https://arxiv.org/pdf/2511.16193, the Decoupled Speculation method can significantly accelerate the rollout phase of LLM post-training with reinforcement learning (RL). This issue proposes integrating that decoupling mechanism into veRL's SGLang rollout.

Implementation Plan

Implement a Decoupled Speculation scheduler and runtime as outlined in the paper. The core change is to decouple the draft and verification steps:

  • allow the drafter and verifier to run asynchronously on separate GPU resources
  • enable the drafter to start drafting the next window of tokens without waiting for the verification result of the previous window
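
To make the intended control flow concrete, here is a minimal toy sketch of the decoupling idea. It is not the paper's scheduler and does not use any veRL or SGLang API: `target_next`, `draft_next`, `drafter`, `verifier`, and `decoupled_decode` are hypothetical names, the "models" are deterministic toy functions, and `asyncio` tasks stand in for a drafter and a verifier that would run on separate GPU resources. The drafter keeps producing windows on the optimistic assumption that earlier windows will be accepted; on a rejection the verifier bumps an epoch counter so that stale windows are discarded and drafting restarts from the verified prefix.

```python
import asyncio


def target_next(prefix):
    # Hypothetical verifier (target) model: a deterministic toy next-token rule.
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_next(prefix):
    # Hypothetical drafter model: agrees with the target except at every 5th position.
    tok = target_next(prefix)
    return tok if len(prefix) % 5 else (tok + 1) % 50


async def drafter(state, queue, window):
    # Runs until cancelled; never waits for verification results.
    while True:
        epoch = state["epoch"]
        ctx = list(state["verified"])  # restart from the verified prefix
        while state["epoch"] == epoch:
            draft = []
            for _ in range(window):
                draft.append(draft_next(ctx + draft))
            # Blocks only when the bounded queue is full (back-pressure).
            await queue.put((epoch, draft))
            ctx += draft  # optimistically assume the window will be accepted


async def verifier(state, queue, limit):
    verified = state["verified"]
    while len(verified) < limit:
        epoch, draft = await queue.get()
        if epoch != state["epoch"]:
            continue  # stale window drafted before the last rejection
        for tok in draft:
            if len(verified) >= limit:
                break
            expected = target_next(verified)
            verified.append(expected)  # accepted token, or the correction
            if tok != expected:
                state["epoch"] += 1  # invalidate all in-flight drafts
                break


async def decoupled_decode(prompt, max_new_tokens, window=4):
    state = {"verified": list(prompt), "epoch": 0}
    queue = asyncio.Queue(maxsize=2)  # caps how far the drafter runs ahead
    draft_task = asyncio.create_task(drafter(state, queue, window))
    await verifier(state, queue, len(prompt) + max_new_tokens)
    draft_task.cancel()
    try:
        await draft_task
    except asyncio.CancelledError:
        pass
    return state["verified"][len(prompt):]


if __name__ == "__main__":
    print(asyncio.run(decoupled_decode([1, 2, 3], 20)))
```

Because the verifier always appends the target model's own token, the output matches plain greedy decoding with the target model; the decoupling only changes when drafting work happens. The bounded queue is one possible way to limit wasted drafting after a rejection; an actual implementation would need its own policy for this.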

Version Info

  • veRL v0.7.0
  • SGLang (Rollout backend) v0.5.8
