Support GRPO-style post-training in Kubeflow Trainer via TrainingRuntime and targeted Torch plugin support #3508

@kapil27

What would you like to be added?

I would like to propose initial support for GRPO-style (Group Relative Policy Optimization) post-training in Kubeflow Trainer.
Concretely, a good first step would be to:

  1. add one or more GRPO-focused ClusterTrainingRuntime examples/manifests
  2. support a GRPO trainer image and launcher contract (see the sketch after this list)
  3. extend the Torch plugin only where GRPO needs targeted validation or command mutation
  4. keep rollout backend configuration explicit through runtime/config inputs
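
To make the trainer image and launcher contract in step 2 concrete, here is a minimal sketch of what a GRPO training entrypoint could look like. It assumes TRL's `GRPOTrainer` purely for illustration; the model name, dataset, reward function, and output path are placeholders, not a proposed default.

```python
# Minimal GRPO entrypoint sketch, assuming TRL as the backend.
# Everything named here (model, dataset, reward, output path) is a
# placeholder for illustration, not a committed design.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer


def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 64 characters.
    return [-abs(64 - len(completion)) for completion in completions]


def main():
    train_dataset = load_dataset("trl-lib/tldr", split="train")
    args = GRPOConfig(output_dir="/workspace/grpo-out")
    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-0.5B-Instruct",
        reward_funcs=reward_len,
        args=args,
        train_dataset=train_dataset,
    )
    trainer.train()


if __name__ == "__main__":
    main()
```

A GRPO-focused ClusterTrainingRuntime could then simply launch an entrypoint like this, with rollout backend settings (for example, a vLLM server address) surfaced as explicit runtime/config inputs rather than baked into the image.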

Why is this needed?

Kubeflow Trainer v2 already has a strong direction for distributed training, LLM fine-tuning, and runtime-centric extensibility.

At the same time, LLM post-training is moving beyond SFT-only workflows. Techniques like GRPO are becoming relevant for:

  • RLHF-style post-training
  • reward-driven alignment
  • iterative improvement of open-weight models
  • experimentation with smaller or more specialized models after supervised fine-tuning

This seems like a natural area for Trainer to support next because:

  • Trainer already supports specialized LLM fine-tuning behavior through runtimes and targeted plugin logic
  • the LLM Trainer design already leaves room for future fine-tuning techniques such as RLHF
  • there does not appear to be a clear upstream path yet for GRPO specifically
  • a search of existing issues for GRPO did not turn up a dedicated issue already tracking this

If there is alignment on the direction, I would be happy to follow up with:

  • a narrow implementation PR, or
  • a proposal/KEP first, if that is the preferred path

Love this feature?

Give it a 👍. We prioritize the features with the most 👍.
