GRPO dataset(s) #138
In the README, the example for launching an RL job uses a specific dataset: AI-MO/NuminaMath-TIR
Answered by
qgallouedec
Jan 31, 2025
Replies: 1 comment 2 replies
It's a nice dataset that has verifiable answers. But technically, you can use any verifiable data.
Indeed, we have a lot of issues opened that suggest other domains. There's a lot to explore here, for sure.
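To make "verifiable" concrete: the data only needs a reference answer that a program can check a completion against. Below is a minimal sketch in plain Python, assuming completions put their final answer in `\boxed{...}` and each sample carries a reference solution string. The names `extract_boxed` and `accuracy_reward` are hypothetical and are not the repository's actual reward function; a real setup would likely need a more tolerant comparison (for example, parsing math expressions rather than exact string matching).

```python
import re
from typing import List, Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None if absent.

    Assumption: the answer has no nested braces (e.g. no \\frac{1}{2}).
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def accuracy_reward(completions: List[str], solutions: List[str]) -> List[float]:
    """Hypothetical reward: 1.0 if the extracted answer equals the reference, else 0.0."""
    rewards = []
    for completion, solution in zip(completions, solutions):
        answer = extract_boxed(completion)
        is_correct = answer is not None and answer == solution.strip()
        rewards.append(1.0 if is_correct else 0.0)
    return rewards
```

Any dataset where such a check can be written (math with final answers, code with unit tests, and so on) fits the same pattern; only the checking function changes.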
Answer selected by
ocramz