What you would like to be added?
I would like to propose initial support for GRPO-style post-training in Kubeflow Trainer.
Concretely, a good first step seems to be:
- add one or more GRPO-focused ClusterTrainingRuntime examples/manifests
- support a GRPO trainer image + launcher contract
- extend the Torch plugin only where GRPO needs targeted validation or command mutation
- keep rollout backend configuration explicit through runtime/config inputs
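To make the first bullet concrete, here is a rough sketch of what a GRPO-focused ClusterTrainingRuntime could look like. Everything in it is illustrative: the runtime name, image reference, and the `ROLLOUT_BACKEND` env var are placeholder assumptions, and the field layout follows the `trainer.kubeflow.org/v1alpha1` API as I understand it, so it may need adjusting.

```yaml
apiVersion: trainer.kubeflow.org/v1alpha1
kind: ClusterTrainingRuntime
metadata:
  name: torch-grpo          # hypothetical runtime name
spec:
  mlPolicy:
    numNodes: 1
    torch:
      numProcPerNode: auto
  template:
    spec:
      replicatedJobs:
        - name: node
          template:
            spec:
              template:
                spec:
                  containers:
                    - name: node
                      image: <grpo-trainer-image>   # placeholder for the GRPO trainer image + launcher contract
                      env:
                        - name: ROLLOUT_BACKEND     # hypothetical: keeps the rollout backend an explicit input
                          value: <rollout-backend>
```

The intent is that GRPO support stays runtime-centric: users select this runtime from a TrainJob, and the Torch plugin only intervenes where GRPO needs targeted validation or command mutation.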
Why is this needed?
Kubeflow Trainer v2 already has a strong direction for distributed training, LLM fine-tuning, and runtime-centric extensibility.
At the same time, LLM post-training is moving beyond SFT-only workflows. Techniques like GRPO are becoming relevant for:
- RLHF-style post-training
- reward-driven alignment
- iterative improvement of open-weight models
- experimentation with smaller or more specialized models after supervised fine-tuning
This seems like a natural area for Trainer to support next because:
- Trainer already supports specialized LLM fine-tuning behavior through runtimes and targeted plugin logic
- the LLM Trainer design already leaves room for future fine-tuning techniques such as RLHF
- there does not appear to be a clear upstream path yet for GRPO specifically
- searching existing issues for GRPO did not turn up a dedicated issue tracking this
If there is alignment on the direction, I would be happy to follow up with:
- a narrow implementation PR, or
- a proposal/KEP first, if that is the preferred path
Love this feature?
Give it a 👍 We prioritize the features with most 👍