V2.1.0 odh rebase #16

Closed
abhijeet-dhumal wants to merge 121 commits into opendatahub-io:main from abhijeet-dhumal:v2.1.0-odh-rebase

Conversation

Member

@abhijeet-dhumal abhijeet-dhumal commented Nov 11, 2025

What this PR does / why we need it:
RHOAIENG-38137
The upstream Trainer v2 operator has published its latest release, v2.1 (kubeflow#2859).
This PR follows that upgrade and rebases midstream/downstream onto it.

Checklist:

  • Docs included if any changes are user facing

Summary by CodeRabbit

Release Notes

  • New Features

    • Added GPU-enabled end-to-end testing workflow
    • Added Rust testing support
    • Added Helm chart OCI-based installation and publishing
    • Introduced Volcano PodGroup policy support for pod scheduling
    • Added pod template override capabilities and container restart rule configurations to Python API
  • Documentation

    • Published security policy and vulnerability reporting guidelines
    • Expanded contributor guide with coding style standards
    • Added 2025 roadmap milestones
  • Updates

    • Updated Kubernetes test versions and tool dependencies
    • Bumped Python API and Helm chart versions to 2.1.0
    • Added new notebook-based training examples to E2E tests

kramaranya and others added 30 commits June 11, 2025 08:18
* fix(docs): convert commits to list in changelog.py for compatibility

Signed-off-by: kramaranya <kramaranya15@gmail.com>

* chore(docs): add Changelog for Trainer v2.0.0-rc.0

Signed-off-by: kramaranya <kramaranya15@gmail.com>

---------

Signed-off-by: kramaranya <kramaranya15@gmail.com>
…nShift (kubeflow#2682)

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>
Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>
…#2685)

* chore(runtime): Bump Torch to 2.7.1 and DeepSpeed to 0.17.1

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update cuda to 12.8

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
…w#2382)

* Add the manifests overlay for Kubeflow Training V2

Signed-off-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com>
Signed-off-by: Xinmin Du <2812493086@qq.com>

* Update manifest: adjust permissions, and format changes

Signed-off-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com>
Signed-off-by: Xinmin Du <2812493086@qq.com>

* Update manifest: rename overlay, adjust event permissions

Signed-off-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com>
Signed-off-by: Xinmin Du <2812493086@qq.com>

* Update manifest: make namespace configurable

Signed-off-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com>
Signed-off-by: Xinmin Du <2812493086@qq.com>

* Update manifest: move standalone, only-manager installation in namespace: kubeflow-system

Signed-off-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com>
Signed-off-by: Xinmin Du <2812493086@qq.com>

* Update manifest: add overlay for Kubeflow Platform installation

Signed-off-by: Xinmin Du <2812493086@qq.com>

* add permission for pods log read & rm persistentvolumeclaims

Signed-off-by: Xinmin Du <2812493086@qq.com>

* create the runtimes before the webhooks

Signed-off-by: Xinmin Du <2812493086@qq.com>

* Specify sorting order: fifo

Signed-off-by: Xinmin Du <2812493086@qq.com>

* Deploy jobset first

Signed-off-by: Xinmin Du <2812493086@qq.com>

* remove edit permissions to runtimes; install runtimes after crds

Signed-off-by: Xinmin Du <2812493086@qq.com>

* remove pretraining directory

Signed-off-by: Xinmin Du <2812493086@qq.com>

* patch runtimes images

Signed-off-by: Xinmin Du <2812493086@qq.com>

* fix: correct image

Signed-off-by: Xinmin Du <2812493086@qq.com>

* add image patch for more runtimes

Signed-off-by: Xinmin Du <2812493086@qq.com>

* Update manifests/overlays/kubeflow-platform/kubeflow-trainer-roles.yaml

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Du Xinmin <2812493086@qq.com>

* Update manifests/overlays/kubeflow-platform/kubeflow-trainer-roles.yaml

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Du Xinmin <2812493086@qq.com>

* Update manifests/overlays/kubeflow-platform/kubeflow-trainer-roles.yaml

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Du Xinmin <2812493086@qq.com>

* Update manifests/overlays/kubeflow-platform/kubeflow-trainer-roles.yaml

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Du Xinmin <2812493086@qq.com>

* Update manifests/overlays/kubeflow-platform/kubeflow-trainer-roles.yaml

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Du Xinmin <2812493086@qq.com>

* role_bind for notebook & profile

Signed-off-by: Xinmin Du <2812493086@qq.com>

* fix: reorder images

Signed-off-by: Xinmin Du <2812493086@qq.com>

* fix: reuse overlay/manager & runtimes

Signed-off-by: Xinmin Du <2812493086@qq.com>

* fix: remove namespace with patch

Signed-off-by: Xinmin Du <2812493086@qq.com>

---------

Signed-off-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com>
Signed-off-by: Xinmin Du <2812493086@qq.com>
Signed-off-by: Du Xinmin <2812493086@qq.com>
Co-authored-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
…ith CTR and TrainJob yaml files (kubeflow#2669)

* chore(mainfests): include torchtune runtimes.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(manifests): Update torchtune runtimes.:

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore(manifests): Update mounting path in CTRs.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(manifests): Update output_dir.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(manifests): Update numProcPerNode to auto.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

---------

Signed-off-by: Electronic-Waste <2690692950@qq.com>
…w#2675)

* fix(plugins): fix errors in trainer command mutation of torchtune.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(plugins): remove config file format suffix.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(test): update UTs.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(initializer): Update the workspace of dataset/model initializer.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(plugins): parse nproc_per_node from GPU resource.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(torchtune): Add bitsandbytes dependency in requirements.txt

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(lint): fix lint error.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(torchtune): Remove unnecessary num_proc_per_node calculation.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* test(torch): Update invalid parameters.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

---------

Signed-off-by: Electronic-Waste <2690692950@qq.com>
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* feat: Mutable PodSpecOverrides for suspended TrainJob

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

* Include @tenzen-y review

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

* Add unit tests

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

---------

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>
* feat(example): Add alpaca-trianjob-yaml.ipynb.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(example): Update the overview of the torchtune llama3_2 example.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(example): Update the pvc description.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* chore(example): Add the get the fine-tuned model section.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(example): Fix some errors.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(example): fix some errors.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(manifests): Fix debug tag.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(example): Change PVC creation method to Python SDK.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

* fix(example): Remove config load.

Signed-off-by: Electronic-Waste <2690692950@qq.com>

---------

Signed-off-by: Electronic-Waste <2690692950@qq.com>
Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>
* feat: Add schedulingGates to PodSpecOverrides

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

* Change desired job to target job in PodSpecOverrides comments

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

---------

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>
* fix(module): Change Go module name to v2

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Bump x/net to v0.38.0

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* chore(docs): Add Changelog for v2.0.0-rc.1

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Move example to misc

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Add Red Hat to ADOPTERS.md

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* Update ADOPTERS.md

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
…d to job (kubeflow#2719)

Signed-off-by: rudeigerc <rudeigerc@gmail.com>
Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* chore(ci): Add GitHub action to verify PR titles

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Use operator scope

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add examples scope

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add scripts to scope

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add exporter

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* add wip ignore label

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add PR title to the contrib guide

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Ignore dependencies label

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Fix text

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Use action only on master branch

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
…ue template (kubeflow#2732)

Signed-off-by: rudeigerc <rudeigerc@gmail.com>
… jobset (kubeflow#2734)

Signed-off-by: rudeigerc <rudeigerc@gmail.com>
Signed-off-by: Koray Oksay <koray.oksay@gmail.com>
* chore(docs): Add Changelog for Kubeflow Trainer v2.0.0

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add links for blog post and migration guide

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add links for blog post and website

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* feat(docs): Kubeflow Trainer ROADMAP 2025

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update roadmap

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add issue for Trainer UI

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add issues for MPI and plugin extension

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add issues for builtin trainers

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>
…ubeflow#2754)

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
google-oss-robot and others added 14 commits October 31, 2025 16:48
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
kubeflow#2911)

* feat(initializer): add s3 model and dataset initializers

Signed-off-by: rudeigerc <rudeigerc@gmail.com>

* chore: refactor with opendal

Signed-off-by: rudeigerc <rudeigerc@gmail.com>

* chore: support `role_arn` and add `ignore_patterns` field in the Initializers configs

Signed-off-by: rudeigerc <rudeigerc@gmail.com>

---------

Signed-off-by: rudeigerc <rudeigerc@gmail.com>
Co-authored-by: rudeigerc <rudeigerc@gmail.com>
…ubeflow#2912)

* chore(operator): Use SSA throughout runtime framework

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

* Fix lint error

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

* Update go.mod file

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

---------

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>
Co-authored-by: Antonin Stefanutti <antonin@stefanutti.fr>
…harts (kubeflow#2914)

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>
Co-authored-by: Antonin Stefanutti <antonin@stefanutti.fr>
…branch (kubeflow#2917)

* feat(manifests): Publish Trainer Helm Charts (kubeflow#2906)

* Solve Remaining Error and bugs

Signed-off-by: adity1raut <araut7798@gmail.com>

* Solve the confige

Signed-off-by: adity1raut <araut7798@gmail.com>

* Update The Suggest Change

Signed-off-by: adity1raut <araut7798@gmail.com>

* Update After REview

Signed-off-by: adity1raut <araut7798@gmail.com>

* Update the Helm publish action

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update release doc

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Use 0.0.0 version for master branch

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update release doc

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: adity1raut <araut7798@gmail.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* fix(manifests): Fix Helm charts image name (kubeflow#2915)

* fix(manifests): Fix Helm charts image name

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Always insert appVersion to the Chart.yaml file

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Fix comment

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Simplify action

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* fix(manifests): Remove the default tag from the controller image (kubeflow#2916)

* fix(manifests): Remove the default tag from the controller image

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Fix README template

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: adity1raut <araut7798@gmail.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Aditya Raut <159172287+adity1raut@users.noreply.github.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
…cache nodes (kubeflow#2920)

Signed-off-by: Akshay Chitneni <achitneni@apple.com>
Co-authored-by: Akshay Chitneni <achitneni@apple.com>
…#2924)

* add local docker training example

Signed-off-by: Brian Gallagher <briangal@gmail.com>

* feat: Adding local execution example notebook Co-authored-by Brian Gallagher <bgallagh@redhat.com>

Signed-off-by: Fiona Waters <fiwaters6@gmail.com>

---------

Signed-off-by: Brian Gallagher <briangal@gmail.com>
Signed-off-by: Fiona Waters <fiwaters6@gmail.com>
Co-authored-by: Brian Gallagher <briangal@gmail.com>
Co-authored-by: Fiona Waters <fiwaters6@gmail.com>
…ubeflow#2927)

* fix(ci): Fix the Kubeflow SDK installation with Docker

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Uncomment delete job in local Notebooks

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update .github/workflows/test-e2e.yaml

Co-authored-by: Anya Kramar <akramar@redhat.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Anya Kramar <akramar@redhat.com>
…e and example (kubeflow#2928)

Signed-off-by: Akshay Chitneni <achitneni@apple.com>
Co-authored-by: Akshay Chitneni <achitneni@apple.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* created github workflow for trainer

* added workflow dispatcher

* updating temp quay token in github

* Remove odh-kfto-sdk-notebooks-sync workflow

* updated build pipeline to use rhoai docker file

* removed pre-build commands from build and publish

* added multiarch docker file

* fixed typo for multiarch

* fixed multiarch file

* temporary quay push

* reverted local build image testing creds

* Update Dockerfile.rhoai

* update dockerfile.rhoai to dockerfile.odh

* fixed nitpick comments

* removed odh-release.yaml
- Add RHOAI specific Dockerfile for Trainer V2 controller image
- Add RHOAI overlay manifests for Trainer V2
- Add custom training runtimes in rhoai overlay

coderabbitai Bot commented Nov 11, 2025

Walkthrough

This pull request introduces version 2.1.0 of Kubeflow Trainer with expanded CI/CD workflows, new model definitions in the Python API, Helm chart updates, configuration enhancements, and documentation improvements. Key additions include GPU E2E testing, workflow approval automation, Rust support, Volcano CRD integration, and extensive Kubernetes API model extensions for container restart rules, pod certificates, and extended resource handling.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Configuration & Linting | .flake8, .gitattributes, .gitignore, .pre-commit-config.yaml, .golangci-kal.yml | Extended flake8 ignore rules (E203), added linguist-generated markers for Go generated code and API client paths, expanded gitignore patterns for Helm and data_cache artifacts, configured pre-commit hooks for Rust formatting/checking, introduced kubeapilinter configuration with customized KubeAPI linting rules |
| GitHub Workflows – CI/CD Pipelines | .github/workflows/build-and-push-images.yaml, .github/workflows/test-*.yaml, .github/workflows/publish-*.yaml, .github/workflows/template-publish-image/action.yaml, .github/workflows/trivy-scan.yaml | Extended build matrix with data-cache component (linux/amd64, linux/arm64), restricted mlx-runtime to linux/amd64 only, updated Kubernetes test versions (1.31.0 → 1.34.0), added Kubeflow SDK installation with docker extras, added new notebooks to E2E tests (local-training-mnist, local-container-mnist), added repository-scoped gating for Go jobs |
| GitHub Workflows – New Features | .github/workflows/check-pr-title.yaml, .github/workflows/gh-workflow-approve.yaml, .github/workflows/test-e2e-gpu.yaml, .github/workflows/test-rust.yaml, .github/workflows/trivy-scan.yaml, .github/workflows/publish-helm-charts.yaml | Introduced semantic PR title validation (conventional commits), implemented workflow approval automation for Kubeflow members and labeled PRs, added GPU-enabled E2E test workflow with label gating and optional cluster cleanup, added Rust unit/integration test support, introduced Trivy vulnerability scanning, added Helm chart publication to GHCR (OCI registry) |
| Documentation & Governance | README.md, ROADMAP.md, SECURITY.md, CONTRIBUTING.md, CHANGELOG.md, ADOPTERS.md, OWNERS | Updated README with Latest News section, replaced badges, expanded overview descriptions, added Kubeflow Trainer-specific terminology; added 2025 roadmap section; introduced security policy with versioning and vulnerability handling; restructured CONTRIBUTING guide with coding style section and conventional commit guidance; added v2.0.1 release notes; added Red Hat adopter entry; promoted astefanutti from reviewer to approver |
| Build Automation | Makefile | Bumped Kubernetes/controller toolchain versions (K8S_VERSION 1.32.0→1.34.0, ENVTEST_VERSION release-0.20→release-0.22, CONTROLLER_GEN_VERSION v0.17.2→v0.18.0), added golangci-lint-kal target with KubeAPI linting, introduced Volcano CRD support and targets, expanded manifests generation for config/v1alpha1, added test-rust and test-e2e-setup-gpu-cluster targets, broadened test exclusion filters |
| Python API – Version & Models Index | api/python_api/kubeflow_trainer_api/__init__.py, api/python_api/kubeflow_trainer_api/models/__init__.py | Bumped package version 2.0.0→2.1.0, added 15+ new model imports (container restart rules, extended resource handling, pod certificates, file key selector, volcano pod group policy, pod template overrides, network topology) |
| Python API – Kubernetes Core Models | api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_*.py | Added container restart policy support (ContainerRestartRule, OnExitCodes), introduced extended resource claim handling (ContainerExtendedResourceRequest, PodExtendedResourceClaimStatus), added file-based environment variable sourcing (FileKeySelector), added pod certificate projection support, added node swap status tracking, updated pod spec with hostnameOverride field, added observed_generation and extended resource claim status to pod status, updated field descriptions to remove beta feature gate references and clarify semantics |
| Python API – Batch & Scheduling Models | api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_*.py, api/python_api/kubeflow_trainer_api/models/scheduling_v1beta1_*.py, api/python_api/kubeflow_trainer_api/models/jobset_v1alpha2_*.py | Shortened descriptions for backoffLimit, backoffLimitPerIndex, maxFailedIndexes to remove feature gate references, added message pattern matching to failure policy rules, changed JobSet restarts default from None→0, added groupName field to ReplicatedJob, introduced NetworkTopologySpec for network topology constraints |
| Python API – Trainer Models | api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_*.py | Replaced PodSpecOverride with PodTemplateOverride (adds metadata handling), added Volcano pod group policy support (VolcanoPodGroupPolicySource), added pod template metadata overrides, extended pod spec overrides with affinity/annotations/imagePullSecrets/labels, updated field descriptions across all trainer models for clarity and consistency, made TrainingRuntimeSpec template field optional |
| Python API – Configuration | api/python_api/pyproject.toml | Simplified license declaration from table format to string, removed wheel build targets configuration |
| Helm Chart | charts/kubeflow-trainer/Chart.yaml, charts/kubeflow-trainer/README.md | Bumped chart version 2.0.0→2.1.0, updated jobset dependency (oci://registry.k8s.io/jobset/charts/jobset v0.8.2→oci://registry.k8s.io/jobset/charts 0.10.1), updated installation instructions to use OCI-based Helm chart from GHCR, added manager security context and controller configuration defaults, expanded documented values with new manager fields |
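
For readers unfamiliar with the semantic PR title validation mentioned in the "GitHub Workflows – New Features" row, the idea can be sketched as a small pattern check. This is only an illustration: the contents of check-pr-title.yaml are not shown in this PR conversation, so the allowed type list and the regular expression below are assumptions, chosen to match the `type(scope): subject` shape of the commit titles listed above.

```python
import re

# Assumed allow-list for illustration only; the real list lives in
# check-pr-title.yaml, which is not reproduced in this conversation.
ALLOWED_TYPES = ("feat", "fix", "chore", "test")

# type, optional (scope), optional "!", then ": " and a non-empty subject.
TITLE_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[a-z0-9_-]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<subject>\S.*)$"
)


def is_conventional_title(title: str) -> bool:
    """Return True if the title looks like a conventional-commit title."""
    match = TITLE_PATTERN.match(title)
    return bool(match) and match.group("type") in ALLOWED_TYPES


if __name__ == "__main__":
    for title in (
        "feat(initializer): add s3 model and dataset initializers",
        "chore(operator): Use SSA throughout runtime framework",
    ):
        print(title, "->", is_conventional_title(title))
```

A check like this is typically applied to the PR title rather than to individual commits, since squash merges turn the PR title into the final commit subject.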

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • Python API Model Updates (api/python_api/kubeflow_trainer_api/models/): Review the consistency of serialization/deserialization logic (to_dict, from_dict) across ~50 model files; verify alias handling and None-field exclusion patterns are applied correctly
  • Kubernetes Version & Tooling Bumps (Makefile, workflows): Confirm compatibility of controller-gen v0.18.0, envtest release-0.22, and K8S 1.34.0 with existing code generation and manifests
  • Pod Template Override Refactoring (trainer_v1alpha1_pod_template_override.py, trainer_v1alpha1_train_job_spec.py): Ensure the transition from PodSpecOverride to PodTemplateOverride is backward compatible and properly handles metadata merging
  • Workflow Approval Automation (.github/workflows/gh-workflow-approve.yaml): Verify membership check logic and workflow run approval API interactions are sound and handle edge cases (404s, non-members)
  • GPU E2E Workflow (.github/workflows/test-e2e-gpu.yaml): Confirm label-gating, cluster setup/teardown, and artifact handling work correctly with Kind GPU setup
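
As an aid for reviewing the workflow approval item above, the decision logic can be sketched as a pure function over the membership response. This is a hedged sketch, not the logic in gh-workflow-approve.yaml: it assumes the check uses GitHub's "check organization membership for a user" REST endpoint (`GET /orgs/{org}/members/{username}`), which answers 204 for a member and 404 for a non-member, and the label name `ok-to-test` is hypothetical.

```python
from typing import Iterable

# Hypothetical label name; the label actually used by the workflow is not
# stated in this conversation.
APPROVAL_LABEL = "ok-to-test"


def should_approve(membership_status: int, labels: Iterable[str]) -> bool:
    """Decide whether pending workflow runs for a PR may be approved.

    membership_status is the HTTP status of the organization membership
    lookup for the PR author (assumed: 204 = member, 404 = not a member).
    """
    if membership_status == 204:
        # Organization members are approved without needing a label.
        return True
    if membership_status == 404:
        # Non-members are approved only if a maintainer added the label.
        return APPROVAL_LABEL in set(labels)
    # Anything else (rate limit, redirect, server error) is surfaced
    # instead of being silently treated as approve or reject.
    raise RuntimeError(f"unexpected membership status: {membership_status}")
```

Raising on unexpected statuses keeps the automation from approving or rejecting runs on the basis of a failed lookup.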

🐰 With whiskers twitching in delight,
New models hop and schemas take flight,
Volcano rumbles, GPU clusters gleam,
Version 2.1 fulfills the training dream!
Workflows approve, the tests run so bright—
Kubeflow Trainer bounds forward tonight! 🚀

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'V2.1.0 odh rebase' refers to rebasing the repository to Trainer v2.1.0, which is the main objective described in the PR. It is specific and directly related to the core purpose of the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 23

🧹 Nitpick comments (6)
api/python_api/pyproject.toml (1)

35-36: Consider pinning hatchling version for reproducible builds.

The build-system requirement specifies hatchling without a version constraint, which could lead to non-deterministic builds if breaking changes are introduced in future releases. For a reproducible build pipeline, consider pinning to a specific version or a narrow version range.

Example: requires = ["hatchling>=1.0,<2.0"] or similar.
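
To catch this kind of drift automatically, a small check over the `[build-system].requires` entries could flag requirements without a version constraint. The sketch below is an assumption about how such a check might look, not part of this PR. It takes the requirement strings as a plain list, and the helper name is hypothetical.

```python
import re
from typing import List

# Version comparison operators that may follow a package name in a requirement.
_VERSION_OPERATOR = re.compile(r"(===|==|~=|!=|<=|>=|<|>)")


def unconstrained_requirements(requires: List[str]) -> List[str]:
    """Return the requirement strings that carry no version constraint."""
    loose = []
    for requirement in requires:
        # Drop environment markers such as '; python_version < "3.12"' so
        # their comparison operators are not mistaken for a version pin.
        spec = requirement.split(";", 1)[0]
        if "@" in spec:
            # Direct URL references are out of scope for this sketch.
            continue
        if not _VERSION_OPERATOR.search(spec):
            loose.append(requirement)
    return loose


if __name__ == "__main__":
    print(unconstrained_requirements(["hatchling"]))
    print(unconstrained_requirements(["hatchling>=1.0,<2.0"]))
```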

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_cluster_training_runtime.py (1)

33-34: Documentation descriptions are less informative after the upstream rebase.

The field descriptions have been simplified:

  • metadata: Previously "Standard object's metadata.", now "metadata of the ClusterTrainingRuntime."
  • spec: Previously "Specification of the desired ClusterTrainingRuntime.", now "spec of the ClusterTrainingRuntime."

Since this is auto-generated code from the OpenAPI specification, the less descriptive text is coming from the upstream v2.1.0 spec. If more detailed descriptions would benefit API consumers, consider proposing improvements to the upstream OpenAPI definition in the Kubeflow Trainer repository.

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_model_initializer.py (1)

30-30: Remove unused noqa directive.

Static analysis flagged the # noqa: E501 on this line as unused. Since this is generated code, you may want to verify with your code generator configuration or clean up manually if applicable.

Apply this diff to remove the unused directive:

-    ModelInitializer represents the desired configuration to initialize pre-trained model. The ModelInitializer spec will override the runtime Job template which contains this label: `trainer.kubeflow.org/trainjob-ancestor-step: dataset-initializer`
-    """ # noqa: E501
+    ModelInitializer represents the desired configuration to initialize pre-trained model. The ModelInitializer spec will override the runtime Job template which contains this label: `trainer.kubeflow.org/trainjob-ancestor-step: dataset-initializer`
+    """
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_ml_policy.py (1)

30-30: Remove unused noqa directive.

The # noqa: E501 directive is unnecessary here as the line length doesn't trigger E501.

Apply this diff:

-    """ # noqa: E501
+    """

Note: Since this is an auto-generated file, consider whether the fix should be applied to the OpenAPI generator template instead.

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py (2)

25-31: Consider adding field validation constraints.

The field descriptions mention specific constraints that aren't enforced in the model:

  • operator: Should be "In" or "NotIn"
  • values: Maximum 255 elements allowed

While server-side validation may handle these constraints, adding client-side validation would provide earlier error detection.

Example validation with Pydantic v2 features:

+from typing import Literal
+from pydantic import field_validator

 class IoK8sApiCoreV1ContainerRestartRuleOnExitCodes(BaseModel):
     """
     ContainerRestartRuleOnExitCodes describes the condition for handling an exited container based on its exit codes.
     """ # noqa: E501
-    operator: StrictStr = Field(description="Represents the relationship between the container exit code(s) and the specified values. Possible values are: - In: the requirement is satisfied if the container exit code is in the   set of specified values. - NotIn: the requirement is satisfied if the container exit code is   not in the set of specified values.")
+    operator: Literal["In", "NotIn"] = Field(description="Represents the relationship between the container exit code(s) and the specified values. Possible values are: - In: the requirement is satisfied if the container exit code is in the   set of specified values. - NotIn: the requirement is satisfied if the container exit code is   not in the set of specified values.")
     values: Optional[List[StrictInt]] = Field(default=None, description="Specifies the set of values to check for container exit codes. At most 255 elements are allowed.")
     __properties: ClassVar[List[str]] = ["operator", "values"]
+    
+    @field_validator('values')
+    @classmethod
+    def validate_values_length(cls, v):
+        if v is not None and len(v) > 255:
+            raise ValueError('values list cannot exceed 255 elements')
+        return v
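
For context on what the two constraints protect, the In/NotIn semantics quoted in the field description can be sketched in plain Python. This is an illustration of the description only, not Kubernetes or generated-client code. Treating a missing `values` list as an empty set is an assumption, and the function name is hypothetical.

```python
from typing import Optional, Sequence

MAX_VALUES = 255  # limit quoted in the `values` field description


def exit_code_matches(
    operator: str, values: Optional[Sequence[int]], exit_code: int
) -> bool:
    """Return True if exit_code satisfies the rule described by operator/values."""
    if operator not in ("In", "NotIn"):
        raise ValueError(f"operator must be 'In' or 'NotIn', got {operator!r}")
    codes = list(values or [])  # assumption: missing values behaves as empty
    if len(codes) > MAX_VALUES:
        raise ValueError("values list cannot exceed 255 elements")
    if operator == "In":
        return exit_code in codes
    return exit_code not in codes
```

Without client-side validation, a typo such as `operator="in"` would only be rejected once the object reaches the API server.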

54-72: Simplify to_dict() by removing unused excluded_fields.

The excluded_fields set is defined but remains empty and unused. This appears to be generated template code.

Apply this diff to simplify:

     def to_dict(self) -> Dict[str, Any]:
         """Return the dictionary representation of the model using alias.

         This has the following differences from calling pydantic's
         `self.model_dump(by_alias=True)`:

         * `None` is only added to the output dict for nullable fields that
           were set at model initialization. Other fields with value `None`
           are ignored.
         """
-        excluded_fields: Set[str] = set([
-        ])
-
-        _dict = self.model_dump(
+        return self.model_dump(
             by_alias=True,
-            exclude=excluded_fields,
             exclude_none=True,
         )
-        return _dict

Note: If this pattern is needed for consistency with other generated models in the codebase, it's acceptable to leave as-is.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0c78f5 and c156297.

⛔ Files ignored due to path filters (6)
  • docs/images/trainer-tech-stack.drawio.svg is excluded by !**/*.svg
  • docs/proposals/2437-volcano-scheduler/user-roles-scheduler.drawio.svg is excluded by !**/*.svg
  • docs/proposals/2442-jax-runtime-trainer-v2/drawing.drawio.svg is excluded by !**/*.svg
  • go.sum is excluded by !**/*.sum
  • pkg/data_cache/Cargo.lock is excluded by !**/*.lock
  • pkg/data_cache/test/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (107)
  • .flake8 (1 hunks)
  • .gitattributes (1 hunks)
  • .github/ISSUE_TEMPLATE/bug_report.yaml (1 hunks)
  • .github/workflows/build-and-push-images.yaml (2 hunks)
  • .github/workflows/check-pr-title.yaml (1 hunks)
  • .github/workflows/gh-workflow-approve.yaml (1 hunks)
  • .github/workflows/publish-helm-charts.yaml (1 hunks)
  • .github/workflows/template-publish-image/action.yaml (1 hunks)
  • .github/workflows/test-e2e-gpu.yaml (1 hunks)
  • .github/workflows/test-e2e.yaml (3 hunks)
  • .github/workflows/test-go.yaml (2 hunks)
  • .github/workflows/test-rust.yaml (1 hunks)
  • .github/workflows/trivy-scan.yaml (1 hunks)
  • .gitignore (1 hunks)
  • .golangci-kal.yml (1 hunks)
  • .pre-commit-config.yaml (1 hunks)
  • ADOPTERS.md (1 hunks)
  • CHANGELOG.md (4 hunks)
  • CONTRIBUTING.md (1 hunks)
  • Makefile (7 hunks)
  • OWNERS (1 hunks)
  • README.md (3 hunks)
  • ROADMAP.md (1 hunks)
  • SECURITY.md (1 hunks)
  • api/python_api/kubeflow_trainer_api/__init__.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/__init__.py (7 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_autoscaling_v2_hpa_scaling_rules.py (3 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_job_spec.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_job_status.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_pod_failure_policy_rule.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_success_policy.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container.py (6 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_extended_resource_request.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_status.py (3 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_endpoint_address.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_endpoint_port.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_endpoint_subset.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_endpoints.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_endpoints_list.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_from_source.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var_source.py (4 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_ephemeral_container.py (6 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_ephemeral_container_common.py (6 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_file_key_selector.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_glusterfs_volume_source.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_lifecycle.py (3 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_node_swap_status.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_node_system_info.py (4 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_persistent_volume_claim_spec.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_persistent_volume_claim_status.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_persistent_volume_spec.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_affinity_term.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_anti_affinity.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_certificate_projection.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_condition.py (3 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_extended_resource_claim_status.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_spec.py (3 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_status.py (4 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_replication_controller_spec.py (2 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_resource_quota_spec.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_resource_requirements.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_scoped_resource_selector_requirement.py (2 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_service_spec.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_taint.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_topology_spread_constraint.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_volume.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_volume_projection.py (4 hunks)
  • api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_volume_source.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/jobset_v1alpha2_failure_policy_rule.py (2 hunks)
  • api/python_api/kubeflow_trainer_api/models/jobset_v1alpha2_job_set_status.py (2 hunks)
  • api/python_api/kubeflow_trainer_api/models/jobset_v1alpha2_replicated_job.py (2 hunks)
  • api/python_api/kubeflow_trainer_api/models/scheduling_v1beta1_network_topology_spec.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_cluster_training_runtime.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_container_override.py (2 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_coscheduling_pod_group_policy_source.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_dataset_initializer.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_initializer.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_job_set_template_spec.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_job_status.py (2 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_ml_policy.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_ml_policy_source.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_model_initializer.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_mpiml_policy_source.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy.py (3 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy_source.py (3 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_spec_override.py (4 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_spec_override_target_job.py (2 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_metadata_override.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override_target_job.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_spec_override.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_runtime_ref.py (2 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_torch_elastic_policy.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_torch_ml_policy_source.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job_spec.py (4 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job_status.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_trainer.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_training_runtime.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_training_runtime_spec.py (1 hunks)
  • api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py (1 hunks)
  • api/python_api/pyproject.toml (1 hunks)
  • charts/kubeflow-trainer/Chart.yaml (1 hunks)
  • charts/kubeflow-trainer/README.md (3 hunks)
⛔ Files not processed due to max files limit (51)
  • charts/kubeflow-trainer/README.md.gotmpl
  • charts/kubeflow-trainer/crds/trainer.kubeflow.org_clustertrainingruntimes.yaml
  • charts/kubeflow-trainer/crds/trainer.kubeflow.org_trainingruntimes.yaml
  • charts/kubeflow-trainer/templates/_helpers.tpl
  • charts/kubeflow-trainer/templates/manager/_helpers.tpl
  • charts/kubeflow-trainer/templates/manager/configmap.yaml
  • charts/kubeflow-trainer/templates/manager/deployment.yaml
  • charts/kubeflow-trainer/templates/rbac/clusterrole.yaml
  • charts/kubeflow-trainer/tests/manager/configmap_test.yaml
  • charts/kubeflow-trainer/values.yaml
  • cmd/data_cache/Dockerfile
  • cmd/initializers/dataset/Dockerfile
  • cmd/initializers/dataset/requirements.txt
  • cmd/initializers/model/Dockerfile
  • cmd/initializers/model/requirements.txt
  • cmd/runtimes/deepspeed/Dockerfile
  • cmd/runtimes/deepspeed/requirements.txt
  • cmd/runtimes/mlx/Dockerfile
  • cmd/runtimes/mlx/requirements.txt
  • cmd/trainer-controller-manager/main.go
  • cmd/trainers/torchtune/Dockerfile
  • cmd/trainers/torchtune/requirements.txt
  • docs/proposals/2170-kubeflow-trainer-v2/README.md
  • docs/proposals/2401-llm-trainer-v2/README.md
  • docs/proposals/2432-gpu-testing-on-llm-blueprints/OCI VM/bootstrap.sh
  • docs/proposals/2432-gpu-testing-on-llm-blueprints/OCI VM/main.tf
  • docs/proposals/2432-gpu-testing-on-llm-blueprints/OCI VM/terraform.tfvars
  • docs/proposals/2432-gpu-testing-on-llm-blueprints/README.md
  • docs/proposals/2437-volcano-scheduler/README.md
  • docs/proposals/2442-jax-runtime-trainer-v2/README.md
  • docs/release/README.md
  • examples/local/local-container-mnist.ipynb
  • examples/local/local-training-mnist.ipynb
  • examples/mlx/language-modeling/fine-tune-llama.ipynb
  • examples/pytorch/data-cache/fine-tune-with-cache.ipynb
  • examples/pytorch/image-classification/mnist.ipynb
  • examples/pytorch/question-answering/fine-tune-distilbert.ipynb
  • examples/torchtune/llama3_2/alpaca-trainjob-yaml.ipynb
  • examples/torchtune/qwen2_5/qwen2.5-1.5B-with-alpaca.ipynb
  • go.mod
  • hack/.custom-gcl.yaml
  • hack/data_cache/run_with_remote_table.sh
  • hack/e2e-run-notebook.sh
  • hack/e2e-setup-cluster.sh
  • hack/e2e-setup-gpu-cluster.sh
  • hack/python-api/gen-api.sh
  • hack/scripts/load-image-to-kind.sh
  • hack/swagger/main.go
  • hack/update-codegen.sh
  • hack/violation_exception_v1alpha1.list
  • manifests/base/crds/trainer.kubeflow.org_clustertrainingruntimes.yaml
🧰 Additional context used
🧬 Code graph analysis (44)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_volume_projection.py (1)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_certificate_projection.py (1)
  • IoK8sApiCoreV1PodCertificateProjection (25-95)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py (1)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py (2)
  • to_dict (55-76)
  • from_dict (79-91)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_ephemeral_container.py (3)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py (3)
  • IoK8sApiCoreV1ContainerRestartRule (26-91)
  • to_dict (55-76)
  • from_dict (79-91)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container.py (2)
  • to_dict (108-193)
  • from_dict (196-231)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py (2)
  • to_dict (54-72)
  • from_dict (75-87)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_initializer.py (2)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_dataset_initializer.py (1)
  • TrainerV1alpha1DatasetInitializer (27-101)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_model_initializer.py (1)
  • TrainerV1alpha1ModelInitializer (27-101)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override_target_job.py (1)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override.py (2)
  • to_dict (58-89)
  • from_dict (92-105)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_cluster_training_runtime.py (1)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_training_runtime_spec.py (1)
  • TrainerV1alpha1TrainingRuntimeSpec (28-101)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var_source.py (1)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_file_key_selector.py (3)
  • IoK8sApiCoreV1FileKeySelector (25-91)
  • to_dict (56-74)
  • from_dict (77-91)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_model_initializer.py (1)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var.py (1)
  • IoK8sApiCoreV1EnvVar (26-93)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_training_runtime.py (1)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_training_runtime_spec.py (1)
  • TrainerV1alpha1TrainingRuntimeSpec (28-101)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_ml_policy.py (2)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_mpiml_policy_source.py (1)
  • TrainerV1alpha1MPIMLPolicySource (25-91)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_torch_ml_policy_source.py (1)
  • TrainerV1alpha1TorchMLPolicySource (27-95)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_ml_policy_source.py (2)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_mpiml_policy_source.py (1)
  • TrainerV1alpha1MPIMLPolicySource (25-91)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_torch_ml_policy_source.py (1)
  • TrainerV1alpha1TorchMLPolicySource (27-95)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_status.py (1)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_lifecycle.py (1)
  • stop_signal_validate_enum (36-43)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_volume.py (1)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_glusterfs_volume_source.py (1)
  • IoK8sApiCoreV1GlusterfsVolumeSource (25-89)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_ephemeral_container_common.py (4)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py (3)
  • IoK8sApiCoreV1ContainerRestartRule (26-91)
  • to_dict (55-76)
  • from_dict (79-91)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_from_source.py (3)
  • IoK8sApiCoreV1EnvFromSource (27-97)
  • to_dict (57-81)
  • from_dict (84-97)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container.py (2)
  • to_dict (108-193)
  • from_dict (196-231)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_ephemeral_container.py (2)
  • to_dict (109-194)
  • from_dict (197-233)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_extended_resource_request.py (2)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var.py (1)
  • from_dict (80-93)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_file_key_selector.py (1)
  • from_dict (77-91)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override.py (2)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override_target_job.py (6)
  • TrainerV1alpha1PodTemplateOverrideTargetJob (25-85)
  • to_str (39-41)
  • to_json (43-46)
  • to_dict (53-71)
  • from_json (49-51)
  • from_dict (74-85)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_spec_override.py (6)
  • TrainerV1alpha1PodTemplateSpecOverride (31-152)
  • to_str (53-55)
  • to_json (57-60)
  • to_dict (67-130)
  • from_json (63-65)
  • from_dict (133-152)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy.py (2)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py (3)
  • TrainerV1alpha1VolcanoPodGroupPolicySource (26-89)
  • to_dict (54-75)
  • from_dict (78-89)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy_source.py (2)
  • to_dict (56-80)
  • from_dict (83-95)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_torch_ml_policy_source.py (1)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_torch_elastic_policy.py (1)
  • TrainerV1alpha1TorchElasticPolicy (26-99)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py (1)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py (6)
  • IoK8sApiCoreV1ContainerRestartRuleOnExitCodes (25-87)
  • to_str (40-42)
  • to_json (44-47)
  • to_dict (54-72)
  • from_json (50-52)
  • from_dict (75-87)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py (2)
api/python_api/kubeflow_trainer_api/models/scheduling_v1beta1_network_topology_spec.py (6)
  • SchedulingV1beta1NetworkTopologySpec (25-87)
  • to_str (40-42)
  • to_json (44-47)
  • to_dict (54-72)
  • from_json (50-52)
  • from_dict (75-87)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy.py (5)
  • to_str (42-44)
  • to_json (46-49)
  • to_dict (56-80)
  • from_json (52-54)
  • from_dict (83-95)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job_status.py (1)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_job_status.py (1)
  • TrainerV1alpha1JobStatus (25-95)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_autoscaling_v2_hpa_scaling_rules.py (2)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_job_spec.py (2)
  • to_dict (92-122)
  • from_dict (125-151)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_status.py (2)
  • to_dict (83-134)
  • from_dict (137-167)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_spec_override.py (2)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_container_override.py (3)
  • TrainerV1alpha1ContainerOverride (27-105)
  • to_dict (57-89)
  • from_dict (92-105)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_spec_override.py (2)
  • to_dict (67-130)
  • from_dict (133-152)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container.py (3)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py (3)
  • IoK8sApiCoreV1ContainerRestartRule (26-91)
  • to_dict (55-76)
  • from_dict (79-91)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_from_source.py (3)
  • IoK8sApiCoreV1EnvFromSource (27-97)
  • to_dict (57-81)
  • from_dict (84-97)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_ephemeral_container.py (2)
  • to_dict (109-194)
  • from_dict (197-233)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_status.py (2)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_condition.py (1)
  • IoK8sApiCoreV1PodCondition (26-98)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_extended_resource_claim_status.py (1)
  • IoK8sApiCoreV1PodExtendedResourceClaimStatus (26-95)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_node_system_info.py (1)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_node_swap_status.py (1)
  • IoK8sApiCoreV1NodeSwapStatus (25-85)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_extended_resource_claim_status.py (1)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_extended_resource_request.py (6)
  • IoK8sApiCoreV1ContainerExtendedResourceRequest (25-89)
  • to_str (41-43)
  • to_json (45-48)
  • to_dict (55-73)
  • from_json (51-53)
  • from_dict (76-89)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_metadata_override.py (2)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override_target_job.py (3)
  • to_dict (53-71)
  • from_json (49-51)
  • from_dict (74-85)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job_spec.py (3)
  • to_dict (64-98)
  • from_json (60-62)
  • from_dict (101-119)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy_source.py (2)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py (3)
  • TrainerV1alpha1VolcanoPodGroupPolicySource (26-89)
  • to_dict (54-75)
  • from_dict (78-89)
api/python_api/kubeflow_trainer_api/models/scheduling_v1beta1_network_topology_spec.py (2)
  • to_dict (54-72)
  • from_dict (75-87)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_file_key_selector.py (1)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var_source.py (1)
  • from_dict (98-113)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_volume_source.py (1)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_glusterfs_volume_source.py (1)
  • IoK8sApiCoreV1GlusterfsVolumeSource (25-89)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_spec_override.py (2)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_volume.py (1)
  • IoK8sApiCoreV1Volume (55-265)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_container_override.py (6)
  • TrainerV1alpha1ContainerOverride (27-105)
  • to_str (43-45)
  • to_json (47-50)
  • to_dict (57-89)
  • from_json (53-55)
  • from_dict (92-105)
api/python_api/kubeflow_trainer_api/models/scheduling_v1beta1_network_topology_spec.py (1)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py (2)
  • to_dict (54-75)
  • from_dict (78-89)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_job_spec.py (1)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_success_policy.py (1)
  • IoK8sApiBatchV1SuccessPolicy (26-93)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_lifecycle.py (1)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_status.py (2)
  • stop_signal_validate_enum (53-60)
  • from_dict (137-167)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job.py (2)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job_spec.py (1)
  • TrainerV1alpha1TrainJobSpec (29-119)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job_status.py (1)
  • TrainerV1alpha1TrainJobStatus (27-103)
api/python_api/kubeflow_trainer_api/models/__init__.py (12)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_extended_resource_request.py (1)
  • IoK8sApiCoreV1ContainerExtendedResourceRequest (25-89)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py (1)
  • IoK8sApiCoreV1ContainerRestartRule (26-91)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py (1)
  • IoK8sApiCoreV1ContainerRestartRuleOnExitCodes (25-87)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_file_key_selector.py (1)
  • IoK8sApiCoreV1FileKeySelector (25-91)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_node_swap_status.py (1)
  • IoK8sApiCoreV1NodeSwapStatus (25-85)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_certificate_projection.py (1)
  • IoK8sApiCoreV1PodCertificateProjection (25-95)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_extended_resource_claim_status.py (1)
  • IoK8sApiCoreV1PodExtendedResourceClaimStatus (26-95)
api/python_api/kubeflow_trainer_api/models/scheduling_v1beta1_network_topology_spec.py (1)
  • SchedulingV1beta1NetworkTopologySpec (25-87)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override.py (1)
  • TrainerV1alpha1PodTemplateOverride (28-105)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override_target_job.py (1)
  • TrainerV1alpha1PodTemplateOverrideTargetJob (25-85)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_spec_override.py (1)
  • TrainerV1alpha1PodTemplateSpecOverride (31-152)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py (1)
  • TrainerV1alpha1VolcanoPodGroupPolicySource (26-89)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_trainer.py (2)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var.py (1)
  • IoK8sApiCoreV1EnvVar (26-93)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_resource_requirements.py (1)
  • IoK8sApiCoreV1ResourceRequirements (27-122)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_spec.py (2)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container.py (1)
  • IoK8sApiCoreV1Container (36-231)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_resource_requirements.py (1)
  • IoK8sApiCoreV1ResourceRequirements (27-122)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_dataset_initializer.py (1)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var.py (1)
  • IoK8sApiCoreV1EnvVar (26-93)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_training_runtime_spec.py (3)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_ml_policy.py (1)
  • TrainerV1alpha1MLPolicy (27-97)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy.py (1)
  • TrainerV1alpha1PodGroupPolicy (27-95)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_job_set_template_spec.py (1)
  • TrainerV1alpha1JobSetTemplateSpec (27-95)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_container_override.py (1)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var.py (1)
  • IoK8sApiCoreV1EnvVar (26-93)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job_spec.py (4)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override.py (3)
  • TrainerV1alpha1PodTemplateOverride (28-105)
  • to_dict (58-89)
  • from_dict (92-105)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override_target_job.py (2)
  • to_dict (53-71)
  • from_dict (74-85)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_spec_override.py (2)
  • to_dict (67-130)
  • from_dict (133-152)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_metadata_override.py (2)
  • to_dict (54-72)
  • from_dict (75-87)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_certificate_projection.py (2)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_extended_resource_request.py (5)
  • to_str (41-43)
  • to_json (45-48)
  • to_dict (55-73)
  • from_json (51-53)
  • from_dict (76-89)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_file_key_selector.py (5)
  • to_str (42-44)
  • to_json (46-49)
  • to_dict (56-74)
  • from_json (52-54)
  • from_dict (77-91)
🪛 actionlint (1.7.8)
.github/workflows/test-e2e-gpu.yaml

14-14: label "oracle-vm-16cpu-a10gpu-240gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


87-87: label "oracle-vm-16cpu-a10gpu-240gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 Flake8 (7.3.0)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py

[error] 22-22: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_node_swap_status.py

[error] 22-22: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override.py

[error] 25-25: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy.py

[error] 24-24: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py

[error] 23-23: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py

[error] 23-23: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_autoscaling_v2_hpa_scaling_rules.py

[error] 24-24: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_metadata_override.py

[error] 22-22: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy_source.py

[error] 24-24: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_file_key_selector.py

[error] 22-22: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_spec_override.py

[error] 28-28: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/scheduling_v1beta1_network_topology_spec.py

[error] 22-22: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_certificate_projection.py

[error] 22-22: redefinition of unused 'Optional' from line 21

(F811)

🪛 LanguageTool
CONTRIBUTING.md

[uncategorized] ~114-~114: Possible missing comma found.
Context: ...res or enhancement for Kubeflow Trainer project we follow the [Kubeflow Enhancement Pro...

(AI_HYDRA_LEO_MISSING_COMMA)

SECURITY.md

[uncategorized] ~19-~19: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... report vulnerabilities to the Kubeflow Open Source Community. All reports are thoroughly i...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[typographical] ~58-~58: Consider adding a comma here.
Context: ...tion Channels For the general questions please join the following resources: - Kubefl...

(PLEASE_COMMA)

README.md

[uncategorized] ~25-~25: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: .../releases/tag/v2.0.0). - [2025/04] From High Performance Computing To AI Workloads on Kubernetes...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

charts/kubeflow-trainer/README.md

[style] ~51-~51: Try moving the adverb to make the sentence clearer.
Context: ...ullnameOverride | string | "jobset" | String to fully override jobset release name. | | commonLabels | object | {} | Com...

(SPLIT_INFINITIVE)

🪛 markdownlint-cli2 (0.18.1)
ROADMAP.md

24-24: Bare URL used

(MD034, no-bare-urls)

CONTRIBUTING.md

136-136: Bare URL used

(MD034, no-bare-urls)


138-138: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.14.4)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


22-22: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)


28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_resource_quota_spec.py

44-44: Avoid specifying long messages outside the exception class

(TRY003)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override_target_job.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_scoped_resource_selector_requirement.py

45-45: Avoid specifying long messages outside the exception class

(TRY003)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_node_swap_status.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


22-22: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)


28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_model_initializer.py

30-30: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_ml_policy.py

30-30: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_status.py

59-59: Avoid specifying long messages outside the exception class

(TRY003)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_extended_resource_request.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


25-25: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)


31-31: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy.py

24-24: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)


30-30: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


23-23: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)


29-29: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


23-23: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)


29-29: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_autoscaling_v2_hpa_scaling_rules.py

24-24: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)


30-30: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_extended_resource_claim_status.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


29-29: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_metadata_override.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


22-22: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)


28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy_source.py

24-24: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)


30-30: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_file_key_selector.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


22-22: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)


28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_spec_override.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


28-28: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)


34-34: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/scheduling_v1beta1_network_topology_spec.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


22-22: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)


28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_lifecycle.py

42-42: Avoid specifying long messages outside the exception class

(TRY003)

api/python_api/kubeflow_trainer_api/models/jobset_v1alpha2_failure_policy_rule.py

28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_from_source.py

30-30: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_certificate_projection.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


22-22: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)


28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: pre-commit

Comment thread .golangci-kal.yml
Comment on lines +63 to +69
exclusions:
generated: strict
rules:
## KAL should only run on API folders.
- path-except: "pkg/apis/*"
linters:
- kubeapilinter

⚠️ Potential issue | 🔴 Critical

Exclusion rule is inverted—KAL won't run on API folders.

The exclusion rule contradicts the stated intent. Line 66 says "KAL should only run on API folders," but path-except: "pkg/apis/*" at line 67 excludes kubeapilinter from paths matching pkg/apis/*—the opposite of the goal.

To fix, the exclusion should target paths outside the API folder. Apply this diff:

  issues:
    max-same-issues: 0
+  exclude-rules:
+    - path-except: "pkg/apis/*"
+      linters:
+        - kubeapilinter
-  exclusions:
-    generated: strict
-    rules:
-      ## KAL should only run on API folders.
-      - path-except: "pkg/apis/*"
-        linters:
-          - kubeapilinter

Alternatively, if path-except is not the intended syntax, use path: to explicitly match only API folders:

-  exclusions:
-    generated: strict
-    rules:
-      ## KAL should only run on API folders.
-      - path-except: "pkg/apis/*"
+  exclude-rules:
+    ## KAL should only run on API folders.
+    - path: "^(?!pkg/apis/)"
         linters:
           - kubeapilinter

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
.golangci-kal.yml around lines 63 to 69: the exclusion currently uses
path-except: "pkg/apis/*", which excludes API folders instead of limiting KAL to
them; change the rule to explicitly target only API folders by replacing
path-except with path: "pkg/apis/*" (or invert the pattern so excluded paths are
everything except pkg/apis/*), and keep the linters block the same so
kubeapilinter runs only on pkg/apis/*.

from typing import Any, ClassVar, Dict, List, Optional
from kubeflow_trainer_api.models.io_k8s_api_autoscaling_v2_hpa_scaling_policy import IoK8sApiAutoscalingV2HPAScalingPolicy
from kubeflow_trainer_api.models.io_k8s_apimachinery_pkg_api_resource_quantity import IoK8sApimachineryPkgApiResourceQuantity
from typing import Optional, Set

⚠️ Potential issue | 🟡 Minor

Remove duplicate Optional import.

Optional is already imported on line 21. Line 24 should only import Set.

Apply this diff to fix the duplicate import:

-from typing import Optional, Set
+from typing import Set

Note: Since this file is auto-generated (line 11), consider whether this issue originates from the OpenAPI Generator templates and should be addressed there if it's a recurring pattern across generated files.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-from typing import Optional, Set
+from typing import Set
🧰 Tools
🪛 Flake8 (7.3.0)

[error] 24-24: redefinition of unused 'Optional' from line 21

(F811)

🪛 Ruff (0.14.4)

24-24: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

🤖 Prompt for AI Agents
In
api/python_api/kubeflow_trainer_api/models/io_k8s_api_autoscaling_v2_hpa_scaling_rules.py
around line 24, there is a duplicate import of Optional (it’s already imported
on line 21); remove Optional from the line and leave only Set so the import
reads only Set, and if this duplication appears across generated files consider
fixing the OpenAPI Generator template that emits duplicate imports.
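
Because the same F811 finding recurs across many generated models, it may help to scan for it mechanically rather than file by file. The following is a minimal sketch using only the standard library; the function name `find_duplicate_from_imports` and the `SAMPLE` string are illustrative assumptions, not part of this repository or of the OpenAPI Generator.

```python
from __future__ import annotations

import ast
from collections import defaultdict


def find_duplicate_from_imports(source: str) -> dict[tuple[str, str], list[int]]:
    """Return {(module, name): [line numbers]} for names imported more than once."""
    seen: dict[tuple[str, str], list[int]] = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        # Only `from X import a, b` statements are considered in this sketch.
        if isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                seen[(node.module, alias.name)].append(node.lineno)
    return {key: sorted(lines) for key, lines in seen.items() if len(lines) > 1}


# Hypothetical two-line excerpt mirroring the import pattern flagged above.
SAMPLE = (
    "from typing import Any, ClassVar, Dict, List, Optional\n"
    "from typing import Optional, Set\n"
)

if __name__ == "__main__":
    for (module, name), lines in find_duplicate_from_imports(SAMPLE).items():
        print(f"{module}.{name} imported on lines {lines}")
```

Running such a check over the generated `models/` directory would show whether the duplication comes from one template or several, which is the question the note above raises.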

Comment on lines +34 to +45
backoff_limit: Optional[StrictInt] = Field(default=None, description="Specifies the number of retries before marking this job failed. Defaults to 6, unless backoffLimitPerIndex (only Indexed Job) is specified. When backoffLimitPerIndex is specified, backoffLimit defaults to 2147483647.", alias="backoffLimit")
backoff_limit_per_index: Optional[StrictInt] = Field(default=None, description="Specifies the limit for the number of retries within an index before marking this index as failed. When enabled the number of failures per index is kept in the pod's batch.kubernetes.io/job-index-failure-count annotation. It can only be set when Job's completionMode=Indexed, and the Pod's restart policy is Never. The field is immutable.", alias="backoffLimitPerIndex")
completion_mode: Optional[StrictStr] = Field(default=None, description="completionMode specifies how Pod completions are tracked. It can be `NonIndexed` (default) or `Indexed`. `NonIndexed` means that the Job is considered complete when there have been .spec.completions successfully completed Pods. Each Pod completion is homologous to each other. `Indexed` means that the Pods of a Job get an associated completion index from 0 to (.spec.completions - 1), available in the annotation batch.kubernetes.io/job-completion-index. The Job is considered complete when there is one successfully completed Pod for each index. When value is `Indexed`, .spec.completions must be specified and `.spec.parallelism` must be less than or equal to 10^5. In addition, The Pod name takes the form `$(job-name)-$(index)-$(random-string)`, the Pod hostname takes the form `$(job-name)-$(index)`. More completion modes can be added in the future. If the Job controller observes a mode that it doesn't recognize, which is possible during upgrades due to version skew, the controller skips updates for the Job. Possible enum values: - `\"Indexed\"` is a Job completion mode. In this mode, the Pods of a Job get an associated completion index from 0 to (.spec.completions - 1). The Job is considered complete when a Pod completes for each completion index. - `\"NonIndexed\"` is a Job completion mode. In this mode, the Job is considered complete when there have been .spec.completions successfully completed Pods. Pod completions are homologous to each other.", alias="completionMode")
completions: Optional[StrictInt] = Field(default=None, description="Specifies the desired number of successfully finished pods the job should be run with. Setting to null means that the success of any pod signals the success of all pods, and allows parallelism to have any positive value. Setting to 1 means that parallelism is limited to 1 and the success of that pod signals the success of the job. More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/")
managed_by: Optional[StrictStr] = Field(default=None, description="ManagedBy field indicates the controller that manages a Job. The k8s Job controller reconciles jobs which don't have this field at all or the field value is the reserved string `kubernetes.io/job-controller`, but skips reconciling Jobs with a custom value for this field. The value must be a valid domain-prefixed path (e.g. acme.io/foo) - all characters before the first \"/\" must be a valid subdomain as defined by RFC 1123. All characters trailing the first \"/\" must be valid HTTP Path characters as defined by RFC 3986. The value cannot exceed 63 characters. This field is immutable. This field is beta-level. The job controller accepts setting the field when the feature gate JobManagedBy is enabled (enabled by default).", alias="managedBy")
manual_selector: Optional[StrictBool] = Field(default=None, description="manualSelector controls generation of pod labels and pod selectors. Leave `manualSelector` unset unless you are certain what you are doing. When false or unset, the system pick labels unique to this job and appends those labels to the pod template. When true, the user is responsible for picking unique labels and specifying the selector. Failure to pick a unique label may cause this and other jobs to not function correctly. However, You may see `manualSelector=true` in jobs that were created with the old `extensions/v1beta1` API. More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#specifying-your-own-pod-selector", alias="manualSelector")
max_failed_indexes: Optional[StrictInt] = Field(default=None, description="Specifies the maximal number of failed indexes before marking the Job as failed, when backoffLimitPerIndex is set. Once the number of failed indexes exceeds this number the entire Job is marked as Failed and its execution is terminated. When left as null the job continues execution of all of its indexes and is marked with the `Complete` Job condition. It can only be specified when backoffLimitPerIndex is set. It can be null or up to completions. It is required and must be less than or equal to 10^4 when is completions greater than 10^5. This field is beta-level. It can be used when the `JobBackoffLimitPerIndex` feature gate is enabled (enabled by default).", alias="maxFailedIndexes")
max_failed_indexes: Optional[StrictInt] = Field(default=None, description="Specifies the maximal number of failed indexes before marking the Job as failed, when backoffLimitPerIndex is set. Once the number of failed indexes exceeds this number the entire Job is marked as Failed and its execution is terminated. When left as null the job continues execution of all of its indexes and is marked with the `Complete` Job condition. It can only be specified when backoffLimitPerIndex is set. It can be null or up to completions. It is required and must be less than or equal to 10^4 when is completions greater than 10^5.", alias="maxFailedIndexes")
parallelism: Optional[StrictInt] = Field(default=None, description="Specifies the maximum desired number of pods the job should run at any given time. The actual number of pods running in steady state will be less than this number when ((.spec.completions - .status.successful) < .spec.parallelism), i.e. when the work left to do is less than max parallelism. More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/")
pod_failure_policy: Optional[IoK8sApiBatchV1PodFailurePolicy] = Field(default=None, description="Specifies the policy of handling failed pods. In particular, it allows to specify the set of actions and conditions which need to be satisfied to take the associated action. If empty, the default behaviour applies - the counter of failed pods, represented by the jobs's .status.failed field, is incremented and it is checked against the backoffLimit. This field cannot be used in combination with restartPolicy=OnFailure.", alias="podFailurePolicy")
pod_replacement_policy: Optional[StrictStr] = Field(default=None, description="podReplacementPolicy specifies when to create replacement Pods. Possible values are: - TerminatingOrFailed means that we recreate pods when they are terminating (has a metadata.deletionTimestamp) or failed. - Failed means to wait until a previously created Pod is fully terminated (has phase Failed or Succeeded) before creating a replacement Pod. When using podFailurePolicy, Failed is the the only allowed value. TerminatingOrFailed and Failed are allowed values when podFailurePolicy is not in use. This is an beta field. To use this, enable the JobPodReplacementPolicy feature toggle. This is on by default. Possible enum values: - `\"Failed\"` means to wait until a previously created Pod is fully terminated (has phase Failed or Succeeded) before creating a replacement Pod. - `\"TerminatingOrFailed\"` means that we recreate pods when they are terminating (has a metadata.deletionTimestamp) or failed.", alias="podReplacementPolicy")
pod_replacement_policy: Optional[StrictStr] = Field(default=None, description="podReplacementPolicy specifies when to create replacement Pods. Possible values are: - TerminatingOrFailed means that we recreate pods when they are terminating (has a metadata.deletionTimestamp) or failed. - Failed means to wait until a previously created Pod is fully terminated (has phase Failed or Succeeded) before creating a replacement Pod. When using podFailurePolicy, Failed is the the only allowed value. TerminatingOrFailed and Failed are allowed values when podFailurePolicy is not in use. Possible enum values: - `\"Failed\"` means to wait until a previously created Pod is fully terminated (has phase Failed or Succeeded) before creating a replacement Pod. - `\"TerminatingOrFailed\"` means that we recreate pods when they are terminating (has a metadata.deletionTimestamp) or failed.", alias="podReplacementPolicy")
selector: Optional[IoK8sApimachineryPkgApisMetaV1LabelSelector] = Field(default=None, description="A label query over pods that should match the pod count. Normally, the system sets this field for you. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors")
success_policy: Optional[IoK8sApiBatchV1SuccessPolicy] = Field(default=None, description="successPolicy specifies the policy when the Job can be declared as succeeded. If empty, the default behavior applies - the Job is declared as succeeded only when the number of succeeded pods equals to the completions. When the field is specified, it must be immutable and works only for the Indexed Jobs. Once the Job meets the SuccessPolicy, the lingering pods are terminated. This field is beta-level. To use this field, you must enable the `JobSuccessPolicy` feature gate (enabled by default).", alias="successPolicy")
success_policy: Optional[IoK8sApiBatchV1SuccessPolicy] = Field(default=None, description="successPolicy specifies the policy when the Job can be declared as succeeded. If empty, the default behavior applies - the Job is declared as succeeded only when the number of succeeded pods equals to the completions. When the field is specified, it must be immutable and works only for the Indexed Jobs. Once the Job meets the SuccessPolicy, the lingering pods are terminated.", alias="successPolicy")

⚠️ Potential issue | 🟡 Minor

Documentation updates align with API stabilization; minor typo detected.

Lines 34–45 update field descriptions to remove feature-gate and beta references, clarifying default behaviors for interdependent fields (e.g., backoffLimit/backoffLimitPerIndex interaction). These changes are consistent with the upstream Kubernetes feature stabilization picked up by the Trainer v2.1.0 rebase and are appropriate for an auto-generated model file.

However, Line 43 contains a typo: "the the only allowed value" should be "the only allowed value".

Please correct the typo on line 43:

-    pod_replacement_policy: Optional[StrictStr] = Field(default=None, description="podReplacementPolicy specifies when to create replacement Pods. Possible values are: - TerminatingOrFailed means that we recreate pods   when they are terminating (has a metadata.deletionTimestamp) or failed. - Failed means to wait until a previously created Pod is fully terminated (has phase   Failed or Succeeded) before creating a replacement Pod.  When using podFailurePolicy, Failed is the the only allowed value. TerminatingOrFailed and Failed are allowed values when podFailurePolicy is not in use.  Possible enum values:  - `\"Failed\"` means to wait until a previously created Pod is fully terminated (has phase Failed or Succeeded) before creating a replacement Pod.  - `\"TerminatingOrFailed\"` means that we recreate pods when they are terminating (has a metadata.deletionTimestamp) or failed.", alias="podReplacementPolicy")
+    pod_replacement_policy: Optional[StrictStr] = Field(default=None, description="podReplacementPolicy specifies when to create replacement Pods. Possible values are: - TerminatingOrFailed means that we recreate pods   when they are terminating (has a metadata.deletionTimestamp) or failed. - Failed means to wait until a previously created Pod is fully terminated (has phase   Failed or Succeeded) before creating a replacement Pod.  When using podFailurePolicy, Failed is the only allowed value. TerminatingOrFailed and Failed are allowed values when podFailurePolicy is not in use.  Possible enum values:  - `\"Failed\"` means to wait until a previously created Pod is fully terminated (has phase Failed or Succeeded) before creating a replacement Pod.  - `\"TerminatingOrFailed\"` means that we recreate pods when they are terminating (has a metadata.deletionTimestamp) or failed.", alias="podReplacementPolicy")
🤖 Prompt for AI Agents
In api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_job_spec.py
around lines 34 to 45, fix the typo in the pod_replacement_policy Field
description where it currently reads "the the only allowed value" by removing
the duplicate "the" so it reads "the only allowed value"; make this single-word
fix inside the description string and leave all other text unchanged.
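
Since these description strings are copied from upstream API docs, a doubled-word typo like this one can reappear on the next regeneration. A small check along the following lines could catch it; this is only a sketch, and `find_doubled_words` and the sample sentence are illustrative assumptions rather than existing tooling in the repository.

```python
from __future__ import annotations

import re

# A whole word, whitespace, then the same whole word again (case-insensitive).
DOUBLED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def find_doubled_words(text: str) -> list[str]:
    """Return every immediately repeated word pair found in the text."""
    return [match.group(0) for match in DOUBLED_WORD.finditer(text)]


# Fragment of the description flagged in this comment.
sample = "When using podFailurePolicy, Failed is the the only allowed value."

if __name__ == "__main__":
    print(find_doubled_words(sample))
```

A check like this will also flag legitimate repetitions (for example "that that"), so its output is better treated as a list of candidates for review than as a hard failure.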

Comment on lines +15 to +23
from __future__ import annotations
import pprint
import re # noqa: F401
import json

from pydantic import BaseModel, ConfigDict, Field, StrictInt, StrictStr
from typing import Any, ClassVar, Dict, List, Optional
from typing import Optional, Set
from typing_extensions import Self

⚠️ Potential issue | 🟡 Minor

Clean up import issues in the code generator.

The import section has several issues flagged by static analysis:

  • Line 17: re is imported but never used
  • Line 22: Optional is imported twice (duplicate from line 21)
  • Lines 12, 17, 28: Unused noqa directives

Since this is auto-generated code (line 11 states "Do not edit the class manually"), these issues should be addressed in the OpenAPI code generator configuration or templates.

Apply this diff to manually fix the imports:

 from __future__ import annotations
 import pprint
-import re  # noqa: F401
 import json

 from pydantic import BaseModel, ConfigDict, Field, StrictInt, StrictStr
 from typing import Any, ClassVar, Dict, List, Optional
-from typing import Optional, Set
+from typing import Set
 from typing_extensions import Self
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 from __future__ import annotations
 import pprint
-import re # noqa: F401
 import json
 from pydantic import BaseModel, ConfigDict, Field, StrictInt, StrictStr
 from typing import Any, ClassVar, Dict, List, Optional
-from typing import Optional, Set
+from typing import Set
 from typing_extensions import Self
🧰 Tools
🪛 Flake8 (7.3.0)

[error] 22-22: redefinition of unused 'Optional' from line 21

(F811)

🪛 Ruff (0.14.4)

17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


22-22: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

🤖 Prompt for AI Agents
In
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py
around lines 15 to 23, the import block contains unused/duplicate imports and
unnecessary noqa directives; remove the unused "re" import, delete the duplicate
"Optional" import, and remove the redundant "# noqa: F401" and any other unused
noqa comments so only required imports remain, and update the OpenAPI generator
templates/config to prevent reintroducing these unused imports in generated
code.

Comment on lines +44 to +47
def to_json(self) -> str:
"""Returns the JSON representation of the model using alias"""
# TODO: pydantic v2: use .model_dump_json(by_alias=True, exclude_unset=True) instead
return json.dumps(self.to_dict())

🛠️ Refactor suggestion | 🟠 Major

Address the TODO: Use Pydantic v2's model_dump_json().

The TODO comment suggests using .model_dump_json(), and since Pydantic v2.10.0 is already in use, this should be implemented. The current approach of calling json.dumps(self.to_dict()) is less efficient than the native Pydantic v2 method.

Apply this diff to use the Pydantic v2 method:

     def to_json(self) -> str:
         """Returns the JSON representation of the model using alias"""
-        # TODO: pydantic v2: use .model_dump_json(by_alias=True, exclude_unset=True) instead
-        return json.dumps(self.to_dict())
+        return self.model_dump_json(by_alias=True, exclude_none=True)

Note: Using exclude_none=True to match the behavior of to_dict() (line 70).

🤖 Prompt for AI Agents
In
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py
around lines 44 to 47, replace the manual json.dumps(self.to_dict()) with
Pydantic v2's native JSON exporter: call self.model_dump_json(by_alias=True,
exclude_none=True) (include any additional flags like exclude_unset=True if you
want to preserve previous unset-exclusion behavior), so the method returns the
model_dump_json result instead of json.dumps(self.to_dict()).
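
Before swapping the implementation, it may be worth confirming how the two paths differ. The sketch below assumes Pydantic v2 is installed and that `to_dict()` boils down to `model_dump(by_alias=True, exclude_none=True)`; `RestartRuleSketch` and its fields are hypothetical stand-ins, not the generated class.

```python
from __future__ import annotations

import json
from typing import List, Optional

from pydantic import BaseModel, Field


class RestartRuleSketch(BaseModel):
    """Hypothetical stand-in for a generated model (assumption, not the real class)."""

    operator: Optional[str] = None
    exit_values: Optional[List[int]] = Field(default=None, alias="exitValues")

    def to_dict(self) -> dict:
        # Assumed to mirror the generated to_dict(): aliases on, None values dropped.
        return self.model_dump(by_alias=True, exclude_none=True)


# Fields are populated through their aliases, as they would be from API JSON.
rule = RestartRuleSketch(operator="In", exitValues=[1, 42])

via_dict = json.dumps(rule.to_dict())
native = rule.model_dump_json(by_alias=True, exclude_none=True)

# A field explicitly set to None counts as "set": exclude_unset keeps it,
# while exclude_none drops it.
explicit_none = RestartRuleSketch(operator=None, exitValues=[1])
unset_style = explicit_none.model_dump_json(by_alias=True, exclude_unset=True)
none_style = explicit_none.model_dump_json(by_alias=True, exclude_none=True)

if __name__ == "__main__":
    print(via_dict)
    print(native)
    print(unset_style)
    print(none_style)
```

Under these assumptions the two approaches carry the same data but are not byte-identical: `json.dumps` uses `", "` and `": "` separators by default, while `model_dump_json` emits compact JSON. Callers that compare or hash the serialized string would notice the change, and the choice between `exclude_none` and `exclude_unset` decides whether explicit nulls survive.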

Comment on lines +22 to +24
from kubeflow_trainer_api.models.scheduling_v1beta1_network_topology_spec import SchedulingV1beta1NetworkTopologySpec
from typing import Optional, Set
from typing_extensions import Self

⚠️ Potential issue | 🟡 Minor

Fix the duplicate Optional import.

Line 23 redefines Optional, which was already imported from typing on line 21.

Apply this diff:

 from kubeflow_trainer_api.models.scheduling_v1beta1_network_topology_spec import SchedulingV1beta1NetworkTopologySpec
-from typing import Optional, Set
+from typing import Set
 from typing_extensions import Self
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 from kubeflow_trainer_api.models.scheduling_v1beta1_network_topology_spec import SchedulingV1beta1NetworkTopologySpec
-from typing import Optional, Set
+from typing import Set
 from typing_extensions import Self
🧰 Tools
🪛 Flake8 (7.3.0)

[error] 23-23: redefinition of unused 'Optional' from line 21

(F811)

🪛 Ruff (0.14.4)

23-23: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

🤖 Prompt for AI Agents
In
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py
around lines 22 to 24, the second import reintroduces Optional which is already
imported earlier; remove the duplicate by changing the line to only import the
remaining needed symbol(s) (e.g., replace "from typing import Optional, Set"
with "from typing import Set") or delete the redundant Optional so Optional is
imported only once.
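
For a quick local check of this kind of duplicate, a rough sketch using only the standard library `ast` module is shown below. It is a simplified approximation and not how Flake8 or Ruff implement F811 (it ignores whether the name was used between the two imports). The helper `find_reimported_names` is hypothetical, and the first import line in `SOURCE` is an assumed form of the file's line 21, which is not shown in this comment.

```python
import ast
from collections import defaultdict
from typing import Dict, List

# Assumed shape of the imports under review; the first line stands in for "line 21".
SOURCE = '''
from typing import Optional
from kubeflow_trainer_api.models.scheduling_v1beta1_network_topology_spec import SchedulingV1beta1NetworkTopologySpec
from typing import Optional, Set
from typing_extensions import Self
'''


def find_reimported_names(source: str) -> Dict[str, List[int]]:
    """Return names bound by more than one import statement, with their line numbers."""
    seen = defaultdict(list)
    # ast.parse only parses the text; nothing is actually imported.
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # The bound name is the alias if present, else the top-level module or symbol.
                bound = alias.asname or alias.name.split(".")[0]
                seen[bound].append(node.lineno)
    return {name: lines for name, lines in seen.items() if len(lines) > 1}


duplicates = find_reimported_names(SOURCE)
print(duplicates)
```

Line numbers are relative to `SOURCE`, which starts with an empty line, so they will not match the numbers in the linter output above.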

Comment thread CONTRIBUTING.md
### Kubeflow Enhancement Proposal (KEP)

Apply the job:
For any significant features or enhancement for Kubeflow Trainer project we follow the

⚠️ Potential issue | 🟡 Minor

Add missing comma in sentence.

Line 114 reads: "...enhancement for Kubeflow Trainer project we follow the..." which needs a comma after "project".

Apply this diff:

-For any significant features or enhancement for Kubeflow Trainer project we follow the
+For any significant features or enhancement for Kubeflow Trainer project, we follow the
🧰 Tools
🪛 LanguageTool

[uncategorized] ~114-~114: Possible missing comma found.
Context: ...res or enhancement for Kubeflow Trainer project we follow the [Kubeflow Enhancement Pro...

(AI_HYDRA_LEO_MISSING_COMMA)

🤖 Prompt for AI Agents
CONTRIBUTING.md around line 114: the sentence "For any significant features or
enhancement for Kubeflow Trainer project we follow the" is missing a comma after
"project"; insert a comma immediately after "project" so it reads "...Kubeflow
Trainer project, we follow the" to correct the punctuation.

Comment thread CONTRIBUTING.md
Comment on lines 136 to +138
Changes to the Kubeflow Trainer Python SDK can be made in the https://github.com/kubeflow/sdk repo.

-The Trainer SDK can be found at https://github.com/kubeflow/sdk/tree/main/python/kubeflow/trainer.
+The Trainer SDK can be found at https://github.com/kubeflow/sdk/tree/main/kubeflow/trainer.

⚠️ Potential issue | 🟡 Minor

Convert bare URLs to markdown links.

Lines 136 and 138 contain bare URLs that should be formatted as markdown links for consistency and improved readability.

Apply this diff:

-The Trainer SDK can be found at https://github.com/kubeflow/sdk/tree/main/kubeflow/trainer.
+The Trainer SDK can be found at [https://github.com/kubeflow/sdk/tree/main/kubeflow/trainer](https://github.com/kubeflow/sdk/tree/main/kubeflow/trainer).

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

136-136: Bare URL used

(MD034, no-bare-urls)


138-138: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents
In CONTRIBUTING.md around lines 136 to 138, the two bare URLs should be
converted to markdown links; replace the plain URLs with descriptive
markdown-style links (for example, link the repo URL using text like "Kubeflow
SDK repo" and link the trainer path using text like "Trainer SDK") so both lines
read with [Link Text](URL) syntax for consistency and readability.

Comment thread SECURITY.md
## Reporting a Vulnerability

We're extremely grateful for security researchers and users that report vulnerabilities to the
Kubeflow Open Source Community. All reports are thoroughly investigated by Kubeflow projects owners.

⚠️ Potential issue | 🟡 Minor

Use hyphen for compound adjective.

"Kubeflow Open Source Community" should be "Kubeflow Open-Source Community" when used as a compound adjective before a noun.

Apply this diff:

-We're extremely grateful for security researchers and users that report vulnerabilities to the Kubeflow Open Source Community. All reports are thoroughly investigated by Kubeflow projects owners.
+We're extremely grateful for security researchers and users that report vulnerabilities to the Kubeflow Open-Source Community. All reports are thoroughly investigated by Kubeflow projects owners.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~19-~19: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... report vulnerabilities to the Kubeflow Open Source Community. All reports are thoroughly i...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🤖 Prompt for AI Agents
In SECURITY.md around line 19, the phrase "Kubeflow Open Source Community" used
as a compound adjective should be hyphenated; update the text on line 19 to read
"Kubeflow Open-Source Community" (preserve capitalization and surrounding
punctuation) so the compound adjective is correct.

Comment thread SECURITY.md

## Communication Channels

For the general questions please join the following resources:

⚠️ Potential issue | 🟡 Minor

Add comma for readability.

Line 58 needs a comma after the introductory phrase, and the article "the" in "For the general questions" can be dropped.

Apply this diff:

-For the general questions please join the following resources:
+For general questions, please join the following resources:
🧰 Tools
🪛 LanguageTool

[typographical] ~58-~58: Consider adding a comma here.
Context: ...tion Channels For the general questions please join the following resources: - Kubefl...

(PLEASE_COMMA)

🤖 Prompt for AI Agents
In SECURITY.md around line 58, the sentence "For the general questions please
join the following resources:" is missing a comma; drop "the" and add a comma
after the introductory phrase so it reads "For general questions, please join
the following resources:" to improve readability.

efazal commented Nov 11, 2025

/lgtm

@kapil27 kapil27 deleted the branch opendatahub-io:main November 12, 2025 17:14
@kapil27 kapil27 closed this Nov 12, 2025