What you would like to be added?
I would like to propose initial support for GRPO-style post-training in Kubeflow Trainer.
Concretely, a good first step seems to be:
- add one or more GRPO-focused ClusterTrainingRuntime examples/manifests
- support a GRPO trainer image + launcher contract
- extend the Torch plugin only where GRPO needs targeted validation or command mutation
- keep rollout backend configuration explicit through runtime/config inputs
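To make the first bullet concrete, here is a rough sketch of what a GRPO-focused ClusterTrainingRuntime could look like. Everything in it is illustrative: the runtime name, image reference, and the `ROLLOUT_BACKEND` env var are placeholder assumptions, and the field layout follows the `trainer.kubeflow.org/v1alpha1` API as I understand it, so it may need adjusting.

```yaml
apiVersion: trainer.kubeflow.org/v1alpha1
kind: ClusterTrainingRuntime
metadata:
  name: torch-grpo          # hypothetical runtime name
spec:
  mlPolicy:
    numNodes: 1
    torch:
      numProcPerNode: auto
  template:
    spec:
      replicatedJobs:
        - name: node
          template:
            spec:
              template:
                spec:
                  containers:
                    - name: node
                      image: <grpo-trainer-image>   # placeholder for the GRPO trainer image + launcher contract
                      env:
                        - name: ROLLOUT_BACKEND     # hypothetical: keeps the rollout backend an explicit input
                          value: <rollout-backend>
```

The intent is that GRPO support stays runtime-centric: users select this runtime from a TrainJob, and the Torch plugin only intervenes where GRPO needs targeted validation or command mutation.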
Why is this needed?
Kubeflow Trainer v2 already has a strong direction for distributed training, LLM fine-tuning, and runtime-centric extensibility.
At the same time, LLM post-training is moving beyond SFT-only workflows. Techniques like GRPO are becoming relevant for:
- RLHF-style post-training
- reward-driven alignment
- iterative improvement of open-weight models
- experimentation with smaller or more specialized models after supervised fine-tuning
This seems like a natural area for Trainer to support next because:
- Trainer already supports specialized LLM fine-tuning behavior through runtimes and targeted plugin logic
- the LLM Trainer design already leaves room for future fine-tuning techniques such as RLHF
- there does not appear to be a clear upstream path yet for GRPO specifically
- searching existing issues for GRPO did not turn up a dedicated issue tracking this
If there is alignment on the direction, I would be happy to follow up with:
- a narrow implementation PR, or
- a proposal/KEP first, if that is the preferred path
Love this feature?
Give it a 👍 We prioritize the features with most 👍