feat: add update_runtime_status utility for progress reporting #368

Draft
abhijeet-dhumal wants to merge 3 commits into kubeflow:main from abhijeet-dhumal:training-progression

abhijeet-dhumal (Member) commented Mar 7, 2026

What this PR does / why we need it:
Adds an update_runtime_status() utility that training scripts can call to report progress to the Kubeflow Trainer controller. It includes throttling, token caching, and graceful error handling, and it works with the TrainJobProgress feature gate in kubeflow/trainer#3227.
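
For readers unfamiliar with the pattern, here is a minimal sketch of how throttling, token caching, and graceful error handling can fit together in a progress reporter. It is not the code in this PR: the class name `ProgressReporter`, its parameters, the default intervals, and the injected `send` and `read_token` callables are all hypothetical, chosen only for illustration. The real update_runtime_status() signature and transport may differ.

```python
import time
from typing import Any, Callable, Dict, Optional


class ProgressReporter:
    """Hypothetical sketch; NOT the API added by this PR."""

    def __init__(
        self,
        send: Callable[[Dict[str, Any], str], None],  # assumed transport hook
        read_token: Callable[[], str],                # assumed token source
        min_interval_s: float = 5.0,                  # assumed throttle window
        token_ttl_s: float = 300.0,                   # assumed token cache lifetime
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._read_token = read_token
        self._min_interval_s = min_interval_s
        self._token_ttl_s = token_ttl_s
        self._clock = clock
        self._last_sent: Optional[float] = None
        self._token: Optional[str] = None
        self._token_read_at = 0.0

    def _get_token(self) -> str:
        # Token caching: re-read the token only after the cached copy expires.
        now = self._clock()
        if self._token is None or now - self._token_read_at >= self._token_ttl_s:
            self._token = self._read_token()
            self._token_read_at = now
        return self._token

    def update(self, payload: Dict[str, Any], force: bool = False) -> bool:
        """Send a progress update; return True only if it was actually sent."""
        now = self._clock()
        # Throttling: drop updates that arrive inside the minimum interval.
        if (
            not force
            and self._last_sent is not None
            and now - self._last_sent < self._min_interval_s
        ):
            return False
        # Graceful error handling: a reporting failure must never crash training.
        try:
            self._send(payload, self._get_token())
        except Exception as exc:
            print(f"progress update skipped: {exc}")
            return False
        self._last_sent = now
        return True
```

In a training loop, a script would call something like `reporter.update({"step": step, "loss": loss})` on every step and rely on the throttle to limit how often the controller is contacted. A failed send does not advance the throttle window in this sketch, so the next call retries immediately.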

Which issue(s) this PR fixes:
Fixes #367

Checklist:

  • Docs included if any changes are user facing

Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
google-oss-prow (Contributor) commented

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kramaranya for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot requested review from kramaranya and szaher March 7, 2026 09:19
Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
abhijeet-dhumal (Member, Author) commented

Waiting for the Kubeflow Trainer feature implementation to be merged: kubeflow/trainer#3227 ...


Development

Successfully merging this pull request may close these issues.

Add utility for TrainJob progress reporting

1 participant