@Sparrow612
Description
As described in https://arxiv.org/pdf/2511.16193, the Decoupled Speculation method can significantly accelerate the rollout phase of LLM post-training with reinforcement learning (RL). This issue proposes integrating this decoupling mechanism into the framework's rollout phase.
Implementation Plan
Implement a Decoupled Speculation scheduler and runtime as outlined in the paper. The core change is to decouple the draft and verification steps:
- allow the drafter and verifier to run asynchronously on separate GPU resources
- let the drafter proceed to draft the next window of tokens without waiting for the verification result of the previous window
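The control flow described above can be sketched as a minimal single-process toy, assuming greedy decoding and a single sequence. Every name here (`target_next`, `draft_next`, `drafter_loop`, `verifier_loop`, `decoupled_generate`, the epoch counter) is hypothetical and is not a veRL or SGLang API. Two Python threads stand in for the separate GPU workers. The rollback rule (discard every window drafted after the first rejected token) is an assumption and not necessarily the paper's exact scheme, and the usual bonus token on full acceptance is omitted for simplicity.

```python
import queue
import threading

# Hypothetical toy "models" (not veRL/SGLang APIs). Tokens are plain ints.
def target_next(prefix):
    """Greedy next token of the expensive target (verifier) model."""
    return (sum(prefix) * 31 + len(prefix)) % 50

def draft_next(prefix):
    """Cheaper drafter: agrees with the target except at every 7th position."""
    tok = target_next(prefix)
    return (tok + 1) % 50 if len(prefix) % 7 == 0 else tok

def drafter_loop(state, out_q, window, stop):
    """Drafts window after window on its own speculative prefix, never
    waiting for the verifier's verdict on earlier windows."""
    epoch, prefix = None, []
    while not stop.is_set():
        with state["lock"]:
            if state["epoch"] != epoch:
                # A rejection happened (or first run): resync to verified prefix.
                epoch, prefix = state["epoch"], list(state["verified"])
        draft = []
        for _ in range(window):
            draft.append(draft_next(prefix + draft))
        item = (epoch, len(prefix), draft)
        while not stop.is_set():
            try:
                out_q.put(item, timeout=0.01)  # bounded queue caps run-ahead
                break
            except queue.Full:
                pass
        prefix = prefix + draft  # speculate that this window is fully accepted

def verifier_loop(state, in_q, max_len):
    """Checks windows in arrival order. On the first mismatch it keeps the
    target's token, bumps the epoch, and later stale windows are dropped."""
    while True:
        epoch, start, draft = in_q.get()
        with state["lock"]:
            verified = state["verified"]
            if epoch != state["epoch"] or start != len(verified):
                continue  # drafted on top of a rejected prefix
            # A real verifier scores the whole window in one forward pass.
            for tok in draft:
                expected = target_next(verified)
                verified.append(expected)
                if tok != expected:
                    state["epoch"] += 1  # tell the drafter to roll back
                    break
            if len(verified) >= max_len:
                return verified[:max_len]

def decoupled_generate(prompt, max_len, window=4):
    state = {"lock": threading.Lock(), "epoch": 0, "verified": list(prompt)}
    q = queue.Queue(maxsize=2)  # arbitrary bound on drafted-but-unverified windows
    stop = threading.Event()
    drafter = threading.Thread(
        target=drafter_loop, args=(state, q, window, stop), daemon=True)
    drafter.start()
    try:
        return verifier_loop(state, q, max_len)
    finally:
        stop.set()
        drafter.join()

def target_only(prompt, max_len):
    """Reference: plain autoregressive decoding with the target model."""
    out = list(prompt)
    while len(out) < max_len:
        out.append(target_next(out))
    return out

print(decoupled_generate([1, 2, 3], 40) == target_only([1, 2, 3], 40))  # → True
```

Because the verifier only ever appends target tokens, the output matches plain target decoding no matter how often the drafter is wrong; the drafter's speed only affects throughput. The queue bound trades drafter run-ahead against the amount of drafting work wasted after a rejection.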
Version Info
- veRL v0.7.0
- SGLang (Rollout backend) v0.5.8