llm-rlhf

Here are 2 public repositories matching this topic...

[NeurIPS 2025 Spotlight] ReasonFlux (long-CoT), ReasonFlux-PRM (process reward model) and ReasonFlux-Coder (code generation)

Implements reinforcement learning training for LLMs such as GPT-2, LLaMA, and BLOOM

Topics: lora, reward, trl, llm, rlhf, trlx, llm-rlhf
