[Feature] Fine-tuning with GRPO #1919

@joshuayao

Priority

P1-Stopper

OS type

N/A

Hardware type

N/A

Running nodes

N/A

Description

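Leverage Group Relative Policy Optimization (GRPO) for more efficient and effective reinforcement-learning fine-tuning. GRPO estimates advantages from a group of responses sampled for the same prompt instead of training a separate value model, so it can produce higher-quality models with less overhead.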
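
For reference, a minimal sketch of the group-relative advantage step, assuming the request refers to GRPO as commonly described. The function name `group_relative_advantages`, the `eps` parameter, the use of the population standard deviation, and the sample rewards are all illustrative assumptions, not part of any existing API in this project.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to the other samples in its group.

    Hypothetical helper: subtracts the group mean and divides by the
    group standard deviation (population form, an assumption here).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four completions sampled from one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean get a positive advantage and those below get a negative one, so no learned critic is needed to provide a baseline.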

Metadata

Assignees

Labels

Backlog (features in backlog), feature (New feature or request)

Projects

Status

No status

Relationships

None yet

Development

No branches or pull requests
