[Feature] Sequence/token level rejection sampling on async training #1052

@huaiyizhao

Description

Checklist

  • This feature will maintain backward compatibility with the current APIs in
    areal/api/. If not, please raise a refactor issue first.

Background

Enhancement for handling rollout staleness in async training.

Potential Solution

https://richardli.xyz/post/rl-collapse-part3/
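
To make the request concrete, here is a minimal sketch of what token-level versus sequence-level rejection could look like. It is not AReaL's or verl's API. The function name `rejection_masks`, the thresholds `low` and `high`, and the use of a geometric-mean sequence ratio are all assumptions made for illustration.

```python
import math


def rejection_masks(logp_train, logp_rollout, low=0.5, high=2.0):
    """Hypothetical helper: compute rejection decisions for one sequence.

    logp_train:   per-token log-probs under the current training policy
    logp_rollout: per-token log-probs under the (possibly stale) rollout policy
    low, high:    assumed acceptance bounds on the importance ratio
    Returns (token_mask, seq_keep).
    """
    if len(logp_train) != len(logp_rollout):
        raise ValueError("log-prob lists must have the same length")
    if not logp_train:
        return [], False

    # Per-token log importance ratio: log pi_train - log pi_rollout.
    log_ratios = [a - b for a, b in zip(logp_train, logp_rollout)]

    # Token level: keep a token only if its own ratio lies in [low, high].
    token_mask = [low <= math.exp(lr) <= high for lr in log_ratios]

    # Sequence level (assumed variant): geometric mean of the token ratios;
    # the whole sequence is kept or dropped together.
    seq_ratio = math.exp(sum(log_ratios) / len(log_ratios))
    seq_keep = low <= seq_ratio <= high

    return token_mask, seq_keep


token_mask, seq_keep = rejection_masks(
    logp_train=[-1.0, -2.0, -0.5],
    logp_rollout=[-1.0, -0.5, -0.6],
)
```

In this example the second token has a ratio far below `low`, so token-level rejection masks only that token, while the sequence-level check still accepts the sequence as a whole. Rejected tokens or sequences would then be excluded from the loss.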

Additional Information

verl implementation: https://verl.readthedocs.io/en/latest/algo/rollout_corr.html
