V2.1.0 odh rebase by abhijeet-dhumal · Pull Request #16 · opendatahub-io/trainer

abhijeet-dhumal · 2025-11-11T05:21:29Z

What this PR does / why we need it:
RHOAIENG-38137
The upstream Trainer v2 operator has a new release, v2.1 (kubeflow#2859).
This PR follows that upgrade and rebases midstream/downstream onto it.

Checklist:

Docs included if any changes are user facing

Summary by CodeRabbit

Release Notes

New Features
- Added GPU-enabled end-to-end testing workflow
- Added Rust testing support
- Added Helm chart OCI-based installation and publishing
- Introduced Volcano PodGroup policy support for pod scheduling
- Added pod template override capabilities and container restart rule configurations to Python API
Documentation
- Published security policy and vulnerability reporting guidelines
- Expanded contributor guide with coding style standards
- Added 2025 roadmap milestones
Updates
- Updated Kubernetes test versions and tool dependencies
- Bumped Python API and Helm chart versions to 2.1.0
- Added new notebook-based training examples to E2E tests

* fix(docs): convert commits to list in changelog.py for compatibility Signed-off-by: kramaranya <kramaranya15@gmail.com> * chore(docs): add Changelog for Trainer v2.0.0-rc.0 Signed-off-by: kramaranya <kramaranya15@gmail.com> --------- Signed-off-by: kramaranya <kramaranya15@gmail.com>

…nShift (kubeflow#2682) Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

…#2685) * chore(runtime): Bump Torch to 2.7.1 and DeepSpeed to 0.17.1 Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Update cuda to 12.8 Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> --------- Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

…w#2382) * Add the manifests overlay for Kubeflow Training V2 Signed-off-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com> Signed-off-by: Xinmin Du <2812493086@qq.com> * Update manifest: adjust permissions, and format changes Signed-off-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com> Signed-off-by: Xinmin Du <2812493086@qq.com> * Update manifest: rename overlay, adjust event permissions Signed-off-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com> Signed-off-by: Xinmin Du <2812493086@qq.com> * Update manifest: make namespace configurable Signed-off-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com> Signed-off-by: Xinmin Du <2812493086@qq.com> * Update manifest: move standalone, only-manager installation in namespace: kubeflow-system Signed-off-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com> Signed-off-by: Xinmin Du <2812493086@qq.com> * Update manifest: add overlay for Kubeflow Platform installation Signed-off-by: Xinmin Du <2812493086@qq.com> * add permission for pods log read & rm persistentvolumeclaims Signed-off-by: Xinmin Du <2812493086@qq.com> * create the runtimes before the webhooks Signed-off-by: Xinmin Du <2812493086@qq.com> * Specify sorting order: fifo Signed-off-by: Xinmin Du <2812493086@qq.com> * Deploy jobset first Signed-off-by: Xinmin Du <2812493086@qq.com> * remove edit permissions to runtimes; install runtimes after crds Signed-off-by: Xinmin Du <2812493086@qq.com> * remove pretraining directory Signed-off-by: Xinmin Du <2812493086@qq.com> * patch runtimes images Signed-off-by: Xinmin Du <2812493086@qq.com> * fix: correct image Signed-off-by: Xinmin Du <2812493086@qq.com> * add image patch for more runtimes Signed-off-by: Xinmin Du <2812493086@qq.com> * Update manifests/overlays/kubeflow-platform/kubeflow-trainer-roles.yaml Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Signed-off-by: Du Xinmin <2812493086@qq.com> * Update manifests/overlays/kubeflow-platform/kubeflow-trainer-roles.yaml 
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Signed-off-by: Du Xinmin <2812493086@qq.com> * Update manifests/overlays/kubeflow-platform/kubeflow-trainer-roles.yaml Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Signed-off-by: Du Xinmin <2812493086@qq.com> * Update manifests/overlays/kubeflow-platform/kubeflow-trainer-roles.yaml Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Signed-off-by: Du Xinmin <2812493086@qq.com> * Update manifests/overlays/kubeflow-platform/kubeflow-trainer-roles.yaml Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Signed-off-by: Du Xinmin <2812493086@qq.com> * role_bind for notebook & profile Signed-off-by: Xinmin Du <2812493086@qq.com> * fix: reorder images Signed-off-by: Xinmin Du <2812493086@qq.com> * fix: reuse overlay/manager & runtimes Signed-off-by: Xinmin Du <2812493086@qq.com> * fix: remove namespace with patch Signed-off-by: Xinmin Du <2812493086@qq.com> --------- Signed-off-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com> Signed-off-by: Xinmin Du <2812493086@qq.com> Signed-off-by: Du Xinmin <2812493086@qq.com> Co-authored-by: Xinmin Du <10803082+doris-xm@user.noreply.gitee.com> Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

…ith CTR and TrainJob yaml files (kubeflow#2669) * chore(mainfests): include torchtune runtimes. Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(manifests): Update torchtune runtimes.: Signed-off-by: Electronic-Waste <2690692950@qq.com> * chore(manifests): Update mounting path in CTRs. Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(manifests): Update output_dir. Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(manifests): Update numProcPerNode to auto. Signed-off-by: Electronic-Waste <2690692950@qq.com> --------- Signed-off-by: Electronic-Waste <2690692950@qq.com>

…w#2675) * fix(plugins): fix errors in trainer command mutation of torchtune. Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(plugins): remove config file format suffix. Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(test): update UTs. Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(initializer): Update the workspace of dataset/model initializer. Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(plugins): parse nproc_per_node from GPU resource. Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(torchtune): Add bitsandbytes dependency in requirements.txt Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(lint): fix lint error. Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(torchtune): Remove unnecessary num_proc_per_node calculation. Signed-off-by: Electronic-Waste <2690692950@qq.com> * test(torch): Update invalid parameters. Signed-off-by: Electronic-Waste <2690692950@qq.com> --------- Signed-off-by: Electronic-Waste <2690692950@qq.com>

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

…ubeflow#2695) Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

@tenzen-y

* feat: Mutable PodSpecOverrides for suspended TrainJob Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr> * Include @tenzen-y review Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr> * Add unit tests Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr> --------- Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

* feat(example): Add alpaca-trianjob-yaml.ipynb. Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(example): Update the overview of the torchtune llama3_2 example. Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(example): Update the pvc description. Signed-off-by: Electronic-Waste <2690692950@qq.com> * chore(example): Add the get the fine-tuned model section. Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(example): Fix some errors. Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(example): fix some errors. Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(manifests): Fix debug tag. Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(example): Change PVC creation method to Python SDK. Signed-off-by: Electronic-Waste <2690692950@qq.com> * fix(example): Remove config load. Signed-off-by: Electronic-Waste <2690692950@qq.com> --------- Signed-off-by: Electronic-Waste <2690692950@qq.com>

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

* feat: Add schedulingGates to PodSpecOverrides Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr> * Change desired job to target job in PodSpecOverrides comments Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr> --------- Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

* fix(module): Change Go module name to v2 Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Bump x/net to v0.38.0 Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> --------- Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* chore(docs): Add Changelog for v2.0.0-rc.1 Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Move example to misc Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> --------- Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add Red Hat to ADOPTERS.md Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * Update ADOPTERS.md Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

…d to job (kubeflow#2719) Signed-off-by: rudeigerc <rudeigerc@gmail.com>

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

…ow#2731) Signed-off-by: rudeigerc <rudeigerc@gmail.com>

* chore(ci): Add GitHub action to verify PR titles Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Use operator scope Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Add examples scope Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Add scripts to scope Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Add exporter Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * add wip ignore label Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Add PR title to the contrib guide Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Ignore dependencies label Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Fix text Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Use action only on master branch Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> --------- Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

…ue template (kubeflow#2732) Signed-off-by: rudeigerc <rudeigerc@gmail.com>

… jobset (kubeflow#2734) Signed-off-by: rudeigerc <rudeigerc@gmail.com>

Signed-off-by: Koray Oksay <koray.oksay@gmail.com>

* chore(docs): Add Changelog for Kubeflow Trainer v2.0.0 Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Add links for blog post and migration guide Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Add links for blog post and website Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> --------- Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* feat(docs): Kubeflow Trainer ROADMAP 2025 Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Update roadmap Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Add issue for Trainer UI Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Add issues for MPI and plugin extension Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Add issues for builtin trainers Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> --------- Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr> Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

…ubeflow#2754) Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

kubeflow#2911) * feat(initializer): add s3 model and dataset initializers Signed-off-by: rudeigerc <rudeigerc@gmail.com> * chore: refactor with opendal Signed-off-by: rudeigerc <rudeigerc@gmail.com> * chore: support `role_arn` and add `ignore_patterns` field in the Initializers configs Signed-off-by: rudeigerc <rudeigerc@gmail.com> --------- Signed-off-by: rudeigerc <rudeigerc@gmail.com> Co-authored-by: rudeigerc <rudeigerc@gmail.com>

…ubeflow#2912) * chore(operator): Use SSA throughout runtime framework Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr> * Fix lint error Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr> * Update go.mod file Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr> --------- Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr> Co-authored-by: Antonin Stefanutti <antonin@stefanutti.fr>

…harts (kubeflow#2914) Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr> Co-authored-by: Antonin Stefanutti <antonin@stefanutti.fr>

…branch (kubeflow#2917) * feat(manifests): Publish Trainer Helm Charts (kubeflow#2906) * Solve Remaining Error and bugs Signed-off-by: adity1raut <araut7798@gmail.com> * Solve the confige Signed-off-by: adity1raut <araut7798@gmail.com> * Update The Suggest Change Signed-off-by: adity1raut <araut7798@gmail.com> * Update After REview Signed-off-by: adity1raut <araut7798@gmail.com> * Update the Helm publish action Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Update release doc Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Use 0.0.0 version for master branch Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Update release doc Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> --------- Signed-off-by: adity1raut <araut7798@gmail.com> Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * fix(manifests): Fix Helm charts image name (kubeflow#2915) * fix(manifests): Fix Helm charts image name Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Always insert appVersion to the Chart.yaml file Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Fix comment Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Simplify action Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> --------- Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * fix(manifests): Remove the default tag from the controller image (kubeflow#2916) * fix(manifests): Remove the default tag from the controller image Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Fix README template Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> --------- Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> --------- Signed-off-by: adity1raut <araut7798@gmail.com> Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> 
Co-authored-by: Aditya Raut <159172287+adity1raut@users.noreply.github.com>

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

…cache nodes (kubeflow#2920) Signed-off-by: Akshay Chitneni <achitneni@apple.com> Co-authored-by: Akshay Chitneni <achitneni@apple.com>

…#2924) * add local docker training example Signed-off-by: Brian Gallagher <briangal@gmail.com> * feat: Adding local execution example notebook Co-authored-by Brian Gallagher <bgallagh@redhat.com> Signed-off-by: Fiona Waters <fiwaters6@gmail.com> --------- Signed-off-by: Brian Gallagher <briangal@gmail.com> Signed-off-by: Fiona Waters <fiwaters6@gmail.com> Co-authored-by: Brian Gallagher <briangal@gmail.com> Co-authored-by: Fiona Waters <fiwaters6@gmail.com>

…ubeflow#2927) * fix(ci): Fix the Kubeflow SDK installation with Docker Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Uncomment delete job in local Notebooks Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> * Update .github/workflows/test-e2e.yaml Co-authored-by: Anya Kramar <akramar@redhat.com> Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> --------- Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Co-authored-by: Anya Kramar <akramar@redhat.com>

…e and example (kubeflow#2928) Signed-off-by: Akshay Chitneni <achitneni@apple.com> Co-authored-by: Akshay Chitneni <achitneni@apple.com>

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* created github workflow for trainer * added workflow dispatcher * updating temp quay token in github * Remove odh-kfto-sdk-notebooks-sync workflow * updated build pipeline to use rhoai docker file * removed pre-build commands from build and publish * added multiarch docker file * fixed typo for multiarch * fixed multiarch file * temporary quay push * reverted local build image testing creds * Update Dockerfile.rhoai * update dockerfile.rhoai to dockerfile.odh * fixed nitpick comments * removed odh-release.yaml

- Add RHOAI specific Dockerfile for Trainer V2 controller image - Add RHOAI overlay manifests for Trainer V2 - Add custom training runtimes in rhoai overlay

coderabbitai · 2025-11-11T05:21:49Z

Walkthrough

This pull request introduces version 2.1.0 of Kubeflow Trainer with expanded CI/CD workflows, new model definitions in the Python API, Helm chart updates, configuration enhancements, and documentation improvements. Key additions include GPU E2E testing, workflow approval automation, Rust support, Volcano CRD integration, and extensive Kubernetes API model extensions for container restart rules, pod certificates, and extended resource handling.

Changes

Cohort / File(s)	Summary
Configuration & Linting `.flake8`, `.gitattributes`, `.gitignore`, `.pre-commit-config.yaml`, `.golangci-kal.yml`	Extended flake8 ignore rules (E203), added linguist-generated markers for Go generated code and API client paths, expanded gitignore patterns for Helm and data_cache artifacts, configured pre-commit hooks for Rust formatting/checking, introduced kubeapilinter configuration with customized KubeAPI linting rules
GitHub Workflows – CI/CD Pipelines `.github/workflows/build-and-push-images.yaml`, `.github/workflows/test-*.yaml`, `.github/workflows/publish-*.yaml`, `.github/workflows/template-publish-image/action.yaml`, `.github/workflows/trivy-scan.yaml`	Extended build matrix with data-cache component (linux/amd64, linux/arm64), restricted mlx-runtime to linux/amd64 only, updated Kubernetes test versions (1.31.0 → 1.34.0), added Kubeflow SDK installation with docker extras, added new notebooks to E2E tests (local-training-mnist, local-container-mnist), added repository-scoped gating for Go jobs
GitHub Workflows – New Features `.github/workflows/check-pr-title.yaml`, `.github/workflows/gh-workflow-approve.yaml`, `.github/workflows/test-e2e-gpu.yaml`, `.github/workflows/test-rust.yaml`, `.github/workflows/trivy-scan.yaml`, `.github/workflows/publish-helm-charts.yaml`	Introduced semantic PR title validation (conventional commits), implemented workflow approval automation for Kubeflow members and labeled PRs, added GPU-enabled E2E test workflow with label gating and optional cluster cleanup, added Rust unit/integration test support, introduced Trivy vulnerability scanning, added Helm chart publication to GHCR (OCI registry)
Documentation & Governance `README.md`, `ROADMAP.md`, `SECURITY.md`, `CONTRIBUTING.md`, `CHANGELOG.md`, `ADOPTERS.md`, `OWNERS`	Updated README with Latest News section, replaced badges, expanded overview descriptions, added Kubeflow Trainer-specific terminology; added 2025 roadmap section; introduced security policy with versioning and vulnerability handling; restructured CONTRIBUTING guide with coding style section and conventional commit guidance; added v2.0.1 release notes; added Red Hat adopter entry; promoted astefanutti from reviewer to approver
Build Automation `Makefile`	Bumped Kubernetes/controller toolchain versions (K8S_VERSION 1.32.0→1.34.0, ENVTEST_VERSION release-0.20→release-0.22, CONTROLLER_GEN_VERSION v0.17.2→v0.18.0), added golangci-lint-kal target with KubeAPI linting, introduced Volcano CRD support and targets, expanded manifests generation for config/v1alpha1, added test-rust and test-e2e-setup-gpu-cluster targets, broadened test exclusion filters
Python API – Version & Models Index `api/python_api/kubeflow_trainer_api/__init__.py`, `api/python_api/kubeflow_trainer_api/models/__init__.py`	Bumped package version 2.0.0→2.1.0, added 15+ new model imports (container restart rules, extended resource handling, pod certificates, file key selector, volcano pod group policy, pod template overrides, network topology)
Python API – Kubernetes Core Models `api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_*.py`	Added container restart policy support (ContainerRestartRule, OnExitCodes), introduced extended resource claim handling (ContainerExtendedResourceRequest, PodExtendedResourceClaimStatus), added file-based environment variable sourcing (FileKeySelector), added pod certificate projection support, added node swap status tracking, updated pod spec with hostnameOverride field, added observed_generation and extended resource claim status to pod status, updated field descriptions to remove beta feature gate references and clarify semantics
Python API – Batch & Scheduling Models `api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_*.py`, `api/python_api/kubeflow_trainer_api/models/scheduling_v1beta1_*.py`, `api/python_api/kubeflow_trainer_api/models/jobset_v1alpha2_*.py`	Shortened descriptions for backoffLimit, backoffLimitPerIndex, maxFailedIndexes to remove feature gate references, added message pattern matching to failure policy rules, changed JobSet restarts default from None→0, added groupName field to ReplicatedJob, introduced NetworkTopologySpec for network topology constraints
Python API – Trainer Models `api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_*.py`	Replaced PodSpecOverride with PodTemplateOverride (adds metadata handling), added Volcano pod group policy support (VolcanoPodGroupPolicySource), added pod template metadata overrides, extended pod spec overrides with affinity/annotations/imagePullSecrets/labels, updated field descriptions across all trainer models for clarity and consistency, made TrainingRuntimeSpec template field optional
Python API – Configuration `api/python_api/pyproject.toml`	Simplified license declaration from table format to string, removed wheel build targets configuration
Helm Chart `charts/kubeflow-trainer/Chart.yaml`, `charts/kubeflow-trainer/README.md`	Bumped chart version 2.0.0→2.1.0, updated jobset dependency (oci://registry.k8s.io/jobset/charts/jobset v0.8.2→oci://registry.k8s.io/jobset/charts 0.10.1), updated installation instructions to use OCI-based Helm chart from GHCR, added manager security context and controller configuration defaults, expanded documented values with new manager fields

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

Python API Model Updates (api/python_api/kubeflow_trainer_api/models/): Review the consistency of serialization/deserialization logic (to_dict, from_dict) across ~50 model files; verify alias handling and None-field exclusion patterns are applied correctly
Kubernetes Version & Tooling Bumps (Makefile, workflows): Confirm compatibility of controller-gen v0.18.0, envtest release-0.22, and K8S 1.34.0 with existing code generation and manifests
Pod Template Override Refactoring (trainer_v1alpha1_pod_template_override.py, trainer_v1alpha1_train_job_spec.py): Ensure the transition from PodSpecOverride to PodTemplateOverride is backward compatible and properly handles metadata merging
Workflow Approval Automation (.github/workflows/gh-workflow-approve.yaml): Verify membership check logic and workflow run approval API interactions are sound and handle edge cases (404s, non-members)
GPU E2E Workflow (.github/workflows/test-e2e-gpu.yaml): Confirm label-gating, cluster setup/teardown, and artifact handling work correctly with Kind GPU setup
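
The serialization concern in the first item above could be spot-checked with a round-trip test. The following is a minimal sketch only: `ExampleModel` and `round_trips` are hypothetical names, and the class is a standard-library stand-in that merely mimics the `to_dict`/`from_dict` shape (camelCase aliases, `None` fields dropped) described for the generated models. It is not the generated code itself.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ExampleModel:
    """Hypothetical stand-in mimicking the to_dict/from_dict shape of a generated model."""

    name: str
    hostname_override: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # Emit alias (camelCase) keys and drop fields whose value is None.
        raw = {"name": self.name, "hostnameOverride": self.hostname_override}
        return {key: value for key, value in raw.items() if value is not None}

    @classmethod
    def from_dict(cls, obj: Dict[str, Any]) -> "ExampleModel":
        # Read alias keys; a missing optional key becomes None.
        return cls(name=obj["name"], hostname_override=obj.get("hostnameOverride"))


def round_trips(model_cls: Any, payload: Dict[str, Any]) -> bool:
    """Return True if from_dict followed by to_dict reproduces the payload."""
    return model_cls.from_dict(payload).to_dict() == payload


if __name__ == "__main__":
    print(round_trips(ExampleModel, {"name": "trainer"}))
    print(round_trips(ExampleModel, {"name": "trainer", "hostnameOverride": "node-0"}))
```

Running a helper like `round_trips` over one representative payload per model file would surface alias or `None`-exclusion mismatches without reading all ~50 files by hand.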

🐰 With whiskers twitching in delight,
New models hop and schemas take flight,
Volcano rumbles, GPU clusters gleam,
Version 2.1 fulfills the training dream!
Workflows approve, the tests run so bright—
Kubeflow Trainer bounds forward tonight! 🚀

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'V2.1.0 odh rebase' refers to rebasing the repository to Trainer v2.1.0, which is the main objective described in the PR. It is specific and directly related to the core purpose of the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


coderabbitai

Actionable comments posted: 23

🧹 Nitpick comments (6)

api/python_api/pyproject.toml (1)

35-36: Consider pinning hatchling version for reproducible builds.

The build-system requirement specifies hatchling without a version constraint, which could lead to non-deterministic builds if breaking changes are introduced in future releases. For a reproducible build pipeline, consider pinning to a specific version or a narrow version range.

Example: requires = ["hatchling>=1.0,<2.0"] or similar.
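
A check for unpinned build requirements could be scripted roughly as follows. This is a sketch under stated assumptions: the helper name `unpinned_requirements` is hypothetical, the requirement strings are passed in directly rather than read from `pyproject.toml`, and the specifier test is a rough heuristic, not a full PEP 508 parser.

```python
from typing import List

# Characters that start a version specifier in a requirement string.
_SPECIFIER_CHARS = "<>=!~"


def unpinned_requirements(requires: List[str]) -> List[str]:
    """Return the requirement strings that carry no version specifier.

    Heuristic only: anything after ';' (an environment marker) is ignored,
    and direct-URL requirements are not handled.
    """
    unpinned = []
    for requirement in requires:
        spec = requirement.split(";", 1)[0]
        if not any(char in spec for char in _SPECIFIER_CHARS):
            unpinned.append(requirement)
    return unpinned


if __name__ == "__main__":
    print(unpinned_requirements(["hatchling"]))
    print(unpinned_requirements(["hatchling>=1.0,<2.0"]))
```

Whether to pin at all is a judgment call for generated packaging metadata. A narrow range trades fewer surprise breakages for an occasional manual bump.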

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_cluster_training_runtime.py (1)

33-34: Documentation descriptions are less informative after the upstream rebase.

The field descriptions have been simplified:

metadata: Previously "Standard object's metadata.", now "metadata of the ClusterTrainingRuntime."

spec: Previously "Specification of the desired ClusterTrainingRuntime.", now "spec of the ClusterTrainingRuntime."

Since this is auto-generated code from the OpenAPI specification, the less descriptive text is coming from the upstream v2.1.0 spec. If more detailed descriptions would benefit API consumers, consider proposing improvements to the upstream OpenAPI definition in the Kubeflow Trainer repository.

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_model_initializer.py (1)

30-30: Remove unused noqa directive.

Static analysis flagged the # noqa: E501 on this line as unused. Since this is generated code, you may want to verify with your code generator configuration or clean up manually if applicable.

Apply this diff to remove the unused directive:
-    ModelInitializer represents the desired configuration to initialize pre-trained model. The ModelInitializer spec will override the runtime Job template which contains this label: `trainer.kubeflow.org/trainjob-ancestor-step: dataset-initializer`
-    """ # noqa: E501
+    ModelInitializer represents the desired configuration to initialize pre-trained model. The ModelInitializer spec will override the runtime Job template which contains this label: `trainer.kubeflow.org/trainjob-ancestor-step: dataset-initializer`
+    """

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_ml_policy.py (1)

30-30: Remove unused noqa directive.

The # noqa: E501 directive is unnecessary here as the line length doesn't trigger E501.

Apply this diff:
-    """ # noqa: E501
+    """
Note: Since this is an auto-generated file, consider whether the fix should be applied to the OpenAPI generator template instead.

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py (2)

25-31: Consider adding field validation constraints.

The field descriptions mention specific constraints that aren't enforced in the model:

operator: Should be "In" or "NotIn"

values: Maximum 255 elements allowed

While server-side validation may handle these constraints, adding client-side validation would provide earlier error detection.

Example validation with Pydantic v2 features:
+from typing import Literal
+from pydantic import field_validator

 class IoK8sApiCoreV1ContainerRestartRuleOnExitCodes(BaseModel):
     """
     ContainerRestartRuleOnExitCodes describes the condition for handling an exited container based on its exit codes.
     """ # noqa: E501
-    operator: StrictStr = Field(description="Represents the relationship between the container exit code(s) and the specified values. Possible values are: - In: the requirement is satisfied if the container exit code is in the   set of specified values. - NotIn: the requirement is satisfied if the container exit code is   not in the set of specified values.")
+    operator: Literal["In", "NotIn"] = Field(description="Represents the relationship between the container exit code(s) and the specified values. Possible values are: - In: the requirement is satisfied if the container exit code is in the   set of specified values. - NotIn: the requirement is satisfied if the container exit code is   not in the set of specified values.")
     values: Optional[List[StrictInt]] = Field(default=None, description="Specifies the set of values to check for container exit codes. At most 255 elements are allowed.")
     __properties: ClassVar[List[str]] = ["operator", "values"]
+    
+    @field_validator('values')
+    @classmethod
+    def validate_values_length(cls, v):
+        if v is not None and len(v) > 255:
+            raise ValueError('values list cannot exceed 255 elements')
+        return v
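
The effect of the two constraints in the diff above can be seen without Pydantic in a standard-library stand-in. This is only a sketch: `RestartRuleOnExitCodes` and `is_valid` are hypothetical names, and the dataclass replaces the generated Pydantic model purely for illustration. The allowed operators and the 255-element limit are taken from the field descriptions quoted above.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_OPERATORS = ("In", "NotIn")
MAX_VALUES = 255  # "At most 255 elements are allowed."


@dataclass
class RestartRuleOnExitCodes:
    """Hypothetical stand-in for the generated model, with client-side checks."""

    operator: str
    values: Optional[List[int]] = None

    def __post_init__(self) -> None:
        # Reject operators other than the two documented ones.
        if self.operator not in ALLOWED_OPERATORS:
            raise ValueError(
                f"operator must be one of {ALLOWED_OPERATORS}, got {self.operator!r}"
            )
        # Enforce the documented upper bound on the number of exit codes.
        if self.values is not None and len(self.values) > MAX_VALUES:
            raise ValueError(f"values list cannot exceed {MAX_VALUES} elements")


def is_valid(operator: str, values: Optional[List[int]] = None) -> bool:
    """Return True if the rule passes the client-side checks."""
    try:
        RestartRuleOnExitCodes(operator, values)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid("In", [1, 2]))
    print(is_valid("Equals", [1]))
    print(is_valid("NotIn", list(range(256))))
```

In the generated code the same checks would live in the OpenAPI generator template rather than in hand edits, since regenerating the models would overwrite manual changes.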
54-72: Simplify to_dict() by removing unused excluded_fields.

The excluded_fields set is defined but remains empty and unused. This appears to be generated template code.

Apply this diff to simplify:
     def to_dict(self) -> Dict[str, Any]:
         """Return the dictionary representation of the model using alias.

         This has the following differences from calling pydantic's
         `self.model_dump(by_alias=True)`:

         * `None` is only added to the output dict for nullable fields that
           were set at model initialization. Other fields with value `None`
           are ignored.
         """
-        excluded_fields: Set[str] = set([
-        ])
-
-        _dict = self.model_dump(
+        return self.model_dump(
             by_alias=True,
-            exclude=excluded_fields,
             exclude_none=True,
         )
-        return _dict
Note: If this pattern is needed for consistency with other generated models in the codebase, it's acceptable to leave as-is.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0c78f5 and c156297.

⛔ Files ignored due to path filters (6)

docs/images/trainer-tech-stack.drawio.svg is excluded by !**/*.svg
docs/proposals/2437-volcano-scheduler/user-roles-scheduler.drawio.svg is excluded by !**/*.svg
docs/proposals/2442-jax-runtime-trainer-v2/drawing.drawio.svg is excluded by !**/*.svg
go.sum is excluded by !**/*.sum
pkg/data_cache/Cargo.lock is excluded by !**/*.lock
pkg/data_cache/test/Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (107)

.flake8 (1 hunks)
.gitattributes (1 hunks)
.github/ISSUE_TEMPLATE/bug_report.yaml (1 hunks)
.github/workflows/build-and-push-images.yaml (2 hunks)
.github/workflows/check-pr-title.yaml (1 hunks)
.github/workflows/gh-workflow-approve.yaml (1 hunks)
.github/workflows/publish-helm-charts.yaml (1 hunks)
.github/workflows/template-publish-image/action.yaml (1 hunks)
.github/workflows/test-e2e-gpu.yaml (1 hunks)
.github/workflows/test-e2e.yaml (3 hunks)
.github/workflows/test-go.yaml (2 hunks)
.github/workflows/test-rust.yaml (1 hunks)
.github/workflows/trivy-scan.yaml (1 hunks)
.gitignore (1 hunks)
.golangci-kal.yml (1 hunks)
.pre-commit-config.yaml (1 hunks)
ADOPTERS.md (1 hunks)
CHANGELOG.md (4 hunks)
CONTRIBUTING.md (1 hunks)
Makefile (7 hunks)
OWNERS (1 hunks)
README.md (3 hunks)
ROADMAP.md (1 hunks)
SECURITY.md (1 hunks)
api/python_api/kubeflow_trainer_api/__init__.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/__init__.py (7 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_autoscaling_v2_hpa_scaling_rules.py (3 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_job_spec.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_job_status.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_pod_failure_policy_rule.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_success_policy.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container.py (6 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_extended_resource_request.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_status.py (3 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_endpoint_address.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_endpoint_port.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_endpoint_subset.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_endpoints.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_endpoints_list.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_from_source.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var_source.py (4 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_ephemeral_container.py (6 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_ephemeral_container_common.py (6 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_file_key_selector.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_glusterfs_volume_source.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_lifecycle.py (3 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_node_swap_status.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_node_system_info.py (4 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_persistent_volume_claim_spec.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_persistent_volume_claim_status.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_persistent_volume_spec.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_affinity_term.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_anti_affinity.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_certificate_projection.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_condition.py (3 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_extended_resource_claim_status.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_spec.py (3 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_status.py (4 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_replication_controller_spec.py (2 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_resource_quota_spec.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_resource_requirements.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_scoped_resource_selector_requirement.py (2 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_service_spec.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_taint.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_topology_spread_constraint.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_volume.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_volume_projection.py (4 hunks)
api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_volume_source.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/jobset_v1alpha2_failure_policy_rule.py (2 hunks)
api/python_api/kubeflow_trainer_api/models/jobset_v1alpha2_job_set_status.py (2 hunks)
api/python_api/kubeflow_trainer_api/models/jobset_v1alpha2_replicated_job.py (2 hunks)
api/python_api/kubeflow_trainer_api/models/scheduling_v1beta1_network_topology_spec.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_cluster_training_runtime.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_container_override.py (2 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_coscheduling_pod_group_policy_source.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_dataset_initializer.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_initializer.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_job_set_template_spec.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_job_status.py (2 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_ml_policy.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_ml_policy_source.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_model_initializer.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_mpiml_policy_source.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy.py (3 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy_source.py (3 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_spec_override.py (4 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_spec_override_target_job.py (2 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_metadata_override.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override_target_job.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_spec_override.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_runtime_ref.py (2 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_torch_elastic_policy.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_torch_ml_policy_source.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job_spec.py (4 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job_status.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_trainer.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_training_runtime.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_training_runtime_spec.py (1 hunks)
api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py (1 hunks)
api/python_api/pyproject.toml (1 hunks)
charts/kubeflow-trainer/Chart.yaml (1 hunks)
charts/kubeflow-trainer/README.md (3 hunks)

⛔ Files not processed due to max files limit (51)

charts/kubeflow-trainer/README.md.gotmpl
charts/kubeflow-trainer/crds/trainer.kubeflow.org_clustertrainingruntimes.yaml
charts/kubeflow-trainer/crds/trainer.kubeflow.org_trainingruntimes.yaml
charts/kubeflow-trainer/templates/_helpers.tpl
charts/kubeflow-trainer/templates/manager/_helpers.tpl
charts/kubeflow-trainer/templates/manager/configmap.yaml
charts/kubeflow-trainer/templates/manager/deployment.yaml
charts/kubeflow-trainer/templates/rbac/clusterrole.yaml
charts/kubeflow-trainer/tests/manager/configmap_test.yaml
charts/kubeflow-trainer/values.yaml
cmd/data_cache/Dockerfile
cmd/initializers/dataset/Dockerfile
cmd/initializers/dataset/requirements.txt
cmd/initializers/model/Dockerfile
cmd/initializers/model/requirements.txt
cmd/runtimes/deepspeed/Dockerfile
cmd/runtimes/deepspeed/requirements.txt
cmd/runtimes/mlx/Dockerfile
cmd/runtimes/mlx/requirements.txt
cmd/trainer-controller-manager/main.go
cmd/trainers/torchtune/Dockerfile
cmd/trainers/torchtune/requirements.txt
docs/proposals/2170-kubeflow-trainer-v2/README.md
docs/proposals/2401-llm-trainer-v2/README.md
docs/proposals/2432-gpu-testing-on-llm-blueprints/OCI VM/bootstrap.sh
docs/proposals/2432-gpu-testing-on-llm-blueprints/OCI VM/main.tf
docs/proposals/2432-gpu-testing-on-llm-blueprints/OCI VM/terraform.tfvars
docs/proposals/2432-gpu-testing-on-llm-blueprints/README.md
docs/proposals/2437-volcano-scheduler/README.md
docs/proposals/2442-jax-runtime-trainer-v2/README.md
docs/release/README.md
examples/local/local-container-mnist.ipynb
examples/local/local-training-mnist.ipynb
examples/mlx/language-modeling/fine-tune-llama.ipynb
examples/pytorch/data-cache/fine-tune-with-cache.ipynb
examples/pytorch/image-classification/mnist.ipynb
examples/pytorch/question-answering/fine-tune-distilbert.ipynb
examples/torchtune/llama3_2/alpaca-trainjob-yaml.ipynb
examples/torchtune/qwen2_5/qwen2.5-1.5B-with-alpaca.ipynb
go.mod
hack/.custom-gcl.yaml
hack/data_cache/run_with_remote_table.sh
hack/e2e-run-notebook.sh
hack/e2e-setup-cluster.sh
hack/e2e-setup-gpu-cluster.sh
hack/python-api/gen-api.sh
hack/scripts/load-image-to-kind.sh
hack/swagger/main.go
hack/update-codegen.sh
hack/violation_exception_v1alpha1.list
manifests/base/crds/trainer.kubeflow.org_clustertrainingruntimes.yaml

🧰 Additional context used

🧬 Code graph analysis (44)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_volume_projection.py (1)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_certificate_projection.py (1)

IoK8sApiCoreV1PodCertificateProjection (25-95)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py (1)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py (2)

to_dict (55-76)

from_dict (79-91)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_ephemeral_container.py (3)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py (3)

IoK8sApiCoreV1ContainerRestartRule (26-91)

to_dict (55-76)

from_dict (79-91)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container.py (2)

to_dict (108-193)

from_dict (196-231)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py (2)

to_dict (54-72)

from_dict (75-87)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_initializer.py (2)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_dataset_initializer.py (1)

TrainerV1alpha1DatasetInitializer (27-101)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_model_initializer.py (1)

TrainerV1alpha1ModelInitializer (27-101)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override_target_job.py (1)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override.py (2)

to_dict (58-89)

from_dict (92-105)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_cluster_training_runtime.py (1)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_training_runtime_spec.py (1)

TrainerV1alpha1TrainingRuntimeSpec (28-101)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var_source.py (1)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_file_key_selector.py (3)

IoK8sApiCoreV1FileKeySelector (25-91)

to_dict (56-74)

from_dict (77-91)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_model_initializer.py (1)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var.py (1)

IoK8sApiCoreV1EnvVar (26-93)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_training_runtime.py (1)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_training_runtime_spec.py (1)

TrainerV1alpha1TrainingRuntimeSpec (28-101)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_ml_policy.py (2)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_mpiml_policy_source.py (1)

TrainerV1alpha1MPIMLPolicySource (25-91)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_torch_ml_policy_source.py (1)

TrainerV1alpha1TorchMLPolicySource (27-95)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_ml_policy_source.py (2)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_mpiml_policy_source.py (1)

TrainerV1alpha1MPIMLPolicySource (25-91)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_torch_ml_policy_source.py (1)

TrainerV1alpha1TorchMLPolicySource (27-95)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_status.py (1)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_lifecycle.py (1)

stop_signal_validate_enum (36-43)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_volume.py (1)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_glusterfs_volume_source.py (1)

IoK8sApiCoreV1GlusterfsVolumeSource (25-89)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_ephemeral_container_common.py (4)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py (3)

IoK8sApiCoreV1ContainerRestartRule (26-91)

to_dict (55-76)

from_dict (79-91)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_from_source.py (3)

IoK8sApiCoreV1EnvFromSource (27-97)

to_dict (57-81)

from_dict (84-97)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container.py (2)

to_dict (108-193)

from_dict (196-231)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_ephemeral_container.py (2)

to_dict (109-194)

from_dict (197-233)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_extended_resource_request.py (2)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var.py (1)

from_dict (80-93)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_file_key_selector.py (1)

from_dict (77-91)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override.py (2)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override_target_job.py (6)

TrainerV1alpha1PodTemplateOverrideTargetJob (25-85)

to_str (39-41)

to_json (43-46)

to_dict (53-71)

from_json (49-51)

from_dict (74-85)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_spec_override.py (6)

TrainerV1alpha1PodTemplateSpecOverride (31-152)

to_str (53-55)

to_json (57-60)

to_dict (67-130)

from_json (63-65)

from_dict (133-152)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy.py (2)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py (3)

TrainerV1alpha1VolcanoPodGroupPolicySource (26-89)

to_dict (54-75)

from_dict (78-89)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy_source.py (2)

to_dict (56-80)

from_dict (83-95)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_torch_ml_policy_source.py (1)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_torch_elastic_policy.py (1)

TrainerV1alpha1TorchElasticPolicy (26-99)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py (1)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py (6)

IoK8sApiCoreV1ContainerRestartRuleOnExitCodes (25-87)

to_str (40-42)

to_json (44-47)

to_dict (54-72)

from_json (50-52)

from_dict (75-87)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py (2)

api/python_api/kubeflow_trainer_api/models/scheduling_v1beta1_network_topology_spec.py (6)

SchedulingV1beta1NetworkTopologySpec (25-87)

to_str (40-42)

to_json (44-47)

to_dict (54-72)

from_json (50-52)

from_dict (75-87)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy.py (5)

to_str (42-44)

to_json (46-49)

to_dict (56-80)

from_json (52-54)

from_dict (83-95)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job_status.py (1)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_job_status.py (1)

TrainerV1alpha1JobStatus (25-95)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_autoscaling_v2_hpa_scaling_rules.py (2)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_job_spec.py (2)

to_dict (92-122)

from_dict (125-151)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_status.py (2)

to_dict (83-134)

from_dict (137-167)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_spec_override.py (2)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_container_override.py (3)

TrainerV1alpha1ContainerOverride (27-105)

to_dict (57-89)

from_dict (92-105)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_spec_override.py (2)

to_dict (67-130)

from_dict (133-152)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container.py (3)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py (3)

IoK8sApiCoreV1ContainerRestartRule (26-91)

to_dict (55-76)

from_dict (79-91)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_from_source.py (3)

IoK8sApiCoreV1EnvFromSource (27-97)

to_dict (57-81)

from_dict (84-97)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_ephemeral_container.py (2)

to_dict (109-194)

from_dict (197-233)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_status.py (2)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_condition.py (1)

IoK8sApiCoreV1PodCondition (26-98)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_extended_resource_claim_status.py (1)

IoK8sApiCoreV1PodExtendedResourceClaimStatus (26-95)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_node_system_info.py (1)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_node_swap_status.py (1)

IoK8sApiCoreV1NodeSwapStatus (25-85)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_extended_resource_claim_status.py (1)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_extended_resource_request.py (6)

IoK8sApiCoreV1ContainerExtendedResourceRequest (25-89)

to_str (41-43)

to_json (45-48)

to_dict (55-73)

from_json (51-53)

from_dict (76-89)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_metadata_override.py (2)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override_target_job.py (3)

to_dict (53-71)

from_json (49-51)

from_dict (74-85)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job_spec.py (3)

to_dict (64-98)

from_json (60-62)

from_dict (101-119)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy_source.py (2)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py (3)

TrainerV1alpha1VolcanoPodGroupPolicySource (26-89)

to_dict (54-75)

from_dict (78-89)

api/python_api/kubeflow_trainer_api/models/scheduling_v1beta1_network_topology_spec.py (2)

to_dict (54-72)

from_dict (75-87)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_file_key_selector.py (1)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var_source.py (1)

from_dict (98-113)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_volume_source.py (1)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_glusterfs_volume_source.py (1)

IoK8sApiCoreV1GlusterfsVolumeSource (25-89)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_spec_override.py (2)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_volume.py (1)

IoK8sApiCoreV1Volume (55-265)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_container_override.py (6)

TrainerV1alpha1ContainerOverride (27-105)

to_str (43-45)

to_json (47-50)

to_dict (57-89)

from_json (53-55)

from_dict (92-105)

api/python_api/kubeflow_trainer_api/models/scheduling_v1beta1_network_topology_spec.py (1)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py (2)

to_dict (54-75)

from_dict (78-89)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_job_spec.py (1)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_success_policy.py (1)

IoK8sApiBatchV1SuccessPolicy (26-93)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_lifecycle.py (1)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_status.py (2)

stop_signal_validate_enum (53-60)

from_dict (137-167)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job.py (2)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job_spec.py (1)

TrainerV1alpha1TrainJobSpec (29-119)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job_status.py (1)

TrainerV1alpha1TrainJobStatus (27-103)

api/python_api/kubeflow_trainer_api/models/__init__.py (12)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_extended_resource_request.py (1)

IoK8sApiCoreV1ContainerExtendedResourceRequest (25-89)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py (1)

IoK8sApiCoreV1ContainerRestartRule (26-91)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py (1)

IoK8sApiCoreV1ContainerRestartRuleOnExitCodes (25-87)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_file_key_selector.py (1)

IoK8sApiCoreV1FileKeySelector (25-91)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_node_swap_status.py (1)

IoK8sApiCoreV1NodeSwapStatus (25-85)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_certificate_projection.py (1)

IoK8sApiCoreV1PodCertificateProjection (25-95)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_extended_resource_claim_status.py (1)

IoK8sApiCoreV1PodExtendedResourceClaimStatus (26-95)

api/python_api/kubeflow_trainer_api/models/scheduling_v1beta1_network_topology_spec.py (1)

SchedulingV1beta1NetworkTopologySpec (25-87)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override.py (1)

TrainerV1alpha1PodTemplateOverride (28-105)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override_target_job.py (1)

TrainerV1alpha1PodTemplateOverrideTargetJob (25-85)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_spec_override.py (1)

TrainerV1alpha1PodTemplateSpecOverride (31-152)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py (1)

TrainerV1alpha1VolcanoPodGroupPolicySource (26-89)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_trainer.py (2)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var.py (1)

IoK8sApiCoreV1EnvVar (26-93)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_resource_requirements.py (1)

IoK8sApiCoreV1ResourceRequirements (27-122)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_spec.py (2)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container.py (1)

IoK8sApiCoreV1Container (36-231)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_resource_requirements.py (1)

IoK8sApiCoreV1ResourceRequirements (27-122)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_dataset_initializer.py (1)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var.py (1)

IoK8sApiCoreV1EnvVar (26-93)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_training_runtime_spec.py (3)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_ml_policy.py (1)

TrainerV1alpha1MLPolicy (27-97)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy.py (1)

TrainerV1alpha1PodGroupPolicy (27-95)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_job_set_template_spec.py (1)

TrainerV1alpha1JobSetTemplateSpec (27-95)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_container_override.py (1)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_var.py (1)

IoK8sApiCoreV1EnvVar (26-93)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job_spec.py (4)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override.py (3)

TrainerV1alpha1PodTemplateOverride (28-105)

to_dict (58-89)

from_dict (92-105)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override_target_job.py (2)

to_dict (53-71)

from_dict (74-85)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_spec_override.py (2)

to_dict (67-130)

from_dict (133-152)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_metadata_override.py (2)

to_dict (54-72)

from_dict (75-87)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_certificate_projection.py (2)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_extended_resource_request.py (5)

to_str (41-43)

to_json (45-48)

to_dict (55-73)

from_json (51-53)

from_dict (76-89)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_file_key_selector.py (5)

to_str (42-44)

to_json (46-49)

to_dict (56-74)

from_json (52-54)

from_dict (77-91)

🪛 actionlint (1.7.8)

.github/workflows/test-e2e-gpu.yaml

14-14, 87-87: label "oracle-vm-16cpu-a10gpu-240gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 Flake8 (7.3.0)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py

[error] 22-22: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_node_swap_status.py

[error] 22-22: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override.py

[error] 25-25: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy.py

[error] 24-24: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py

[error] 23-23: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py

[error] 23-23: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_autoscaling_v2_hpa_scaling_rules.py

[error] 24-24: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_metadata_override.py

[error] 22-22: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy_source.py

[error] 24-24: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_file_key_selector.py

[error] 22-22: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_spec_override.py

[error] 28-28: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/scheduling_v1beta1_network_topology_spec.py

[error] 22-22: redefinition of unused 'Optional' from line 21

(F811)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_certificate_projection.py

[error] 22-22: redefinition of unused 'Optional' from line 21

(F811)
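
The F811 findings above all have the same shape: `Optional` is imported on line 21 and imported again a few lines later. The review does not show the generated import lines, so the `SOURCE` string below is a hypothetical reconstruction, and `duplicate_imports` is an illustrative helper rather than part of Flake8. It is also a simpler check than pyflakes' own F811 logic, which additionally requires that the first binding is unused before the redefinition. A minimal sketch using only the standard library:

```python
import ast
from collections import defaultdict

# Hypothetical reconstruction of the duplicated imports in a generated model file.
SOURCE = (
    "from typing import Any, ClassVar, Dict, List, Optional\n"
    "from typing import Optional, Set\n"
)


def duplicate_imports(source: str) -> dict:
    """Map each name bound by more than one import statement to its line numbers."""
    seen = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import x as y` binds `y`.
                bound = alias.asname or alias.name.split(".")[0]
                seen[bound].append(node.lineno)
    return {name: sorted(lines) for name, lines in seen.items() if len(lines) > 1}


if __name__ == "__main__":
    for name, lines in duplicate_imports(SOURCE).items():
        print(f"{name} imported on lines {lines}")
```

Because these files are generated, a fix in the generator template would likely be more durable than editing each model by hand.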

🪛 LanguageTool

CONTRIBUTING.md

[uncategorized] ~114-~114: Possible missing comma found.
Context: ...res or enhancement for Kubeflow Trainer project we follow the [Kubeflow Enhancement Pro...

(AI_HYDRA_LEO_MISSING_COMMA)

SECURITY.md

[uncategorized] ~19-~19: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... report vulnerabilities to the Kubeflow Open Source Community. All reports are thoroughly i...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

[typographical] ~58-~58: Consider adding a comma here.
Context: ...tion Channels For the general questions please join the following resources: - Kubefl...

(PLEASE_COMMA)

README.md

[uncategorized] ~25-~25: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: .../releases/tag/v2.0.0). - [2025/04] From High Performance Computing To AI Workloads on Kubernetes...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

charts/kubeflow-trainer/README.md

[style] ~51-~51: Try moving the adverb to make the sentence clearer.
Context: ...ullnameOverride | string | "jobset" | String to fully override jobset release name. | | commonLabels | object | {} | Com...

(SPLIT_INFINITIVE)

🪛 markdownlint-cli2 (0.18.1)

ROADMAP.md

24-24: Bare URL used

(MD034, no-bare-urls)

CONTRIBUTING.md

136-136: Bare URL used

(MD034, no-bare-urls)

138-138: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.14.4)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

22-22: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_resource_quota_spec.py

44-44: Avoid specifying long messages outside the exception class

(TRY003)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override_target_job.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_scoped_resource_selector_requirement.py

45-45: Avoid specifying long messages outside the exception class

(TRY003)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_node_swap_status.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

22-22: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_model_initializer.py

30-30: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_ml_policy.py

30-30: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_status.py

59-59: Avoid specifying long messages outside the exception class

(TRY003)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_extended_resource_request.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_override.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

25-25: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

31-31: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy.py

24-24: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

30-30: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

23-23: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

29-29: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

23-23: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

29-29: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_autoscaling_v2_hpa_scaling_rules.py

24-24: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

30-30: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_extended_resource_claim_status.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

29-29: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_metadata_override.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

22-22: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_group_policy_source.py

24-24: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

30-30: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_file_key_selector.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

22-22: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_pod_template_spec_override.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

28-28: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

34-34: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/scheduling_v1beta1_network_topology_spec.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

22-22: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_lifecycle.py

42-42: Avoid specifying long messages outside the exception class

(TRY003)

api/python_api/kubeflow_trainer_api/models/jobset_v1alpha2_failure_policy_rule.py

28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_env_from_source.py

30-30: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_pod_certificate_projection.py

12-12: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

22-22: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

28-28: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: pre-commit

coderabbitai · 2025-11-11T06:21:17Z

+  exclusions:
+    generated: strict
+    rules:
+      ## KAL should only run on API folders.
+      - path-except: "pkg/apis/*"
+        linters:
+          - kubeapilinter


⚠️ Potential issue | 🔴 Critical

Exclusion rule is inverted—KAL won't run on API folders.

The exclusion rule contradicts the stated intent. Line 66 says "KAL should only run on API folders," but path-except: "pkg/apis/*" at line 67 excludes kubeapilinter from paths matching pkg/apis/*—the opposite of the goal.

To fix, the exclusion should target paths outside the API folder. Apply this diff:

 issues:
   max-same-issues: 0
+  exclude-rules:
+    - path-except: "pkg/apis/*"
+      linters:
+        - kubeapilinter
-  exclusions:
-    generated: strict
-    rules:
-      ## KAL should only run on API folders.
-      - path-except: "pkg/apis/*"
-        linters:
-          - kubeapilinter

Alternatively, if path-except is not the intended syntax, use path: to explicitly match only API folders:

-  exclusions:
-    generated: strict
-    rules:
-      ## KAL should only run on API folders.
-      - path-except: "pkg/apis/*"
+  exclude-rules:
+    ## KAL should only run on API folders.
+    - path: "^(?!pkg/apis/)"
       linters:
         - kubeapilinter

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

.golangci-kal.yml around lines 63 to 69: the exclusion currently uses path-except: "pkg/apis/*", which excludes API folders instead of limiting KAL to them; change the rule to explicitly target only API folders by replacing path-except with path: "pkg/apis/*" (or invert the pattern so excluded paths are everything except pkg/apis/*), and keep the linters block the same so kubeapilinter runs only on pkg/apis/*.

coderabbitai · 2025-11-11T06:21:17Z

 from typing import Any, ClassVar, Dict, List, Optional
 from kubeflow_trainer_api.models.io_k8s_api_autoscaling_v2_hpa_scaling_policy import IoK8sApiAutoscalingV2HPAScalingPolicy
+from kubeflow_trainer_api.models.io_k8s_apimachinery_pkg_api_resource_quantity import IoK8sApimachineryPkgApiResourceQuantity
 from typing import Optional, Set


⚠️ Potential issue | 🟡 Minor

Remove duplicate Optional import.

Optional is already imported on line 21. Line 24 should only import Set.

Apply this diff to fix the duplicate import:

-from typing import Optional, Set
+from typing import Set

Note: Since this file is auto-generated (line 11), consider whether this issue originates from the OpenAPI Generator templates and should be addressed there if it's a recurring pattern across generated files.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

from typing import Optional, Set

from typing import Set

🧰 Tools

🪛 Flake8 (7.3.0)

[error] 24-24: redefinition of unused 'Optional' from line 21

(F811)

🪛 Ruff (0.14.4)

24-24: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

🤖 Prompt for AI Agents

In api/python_api/kubeflow_trainer_api/models/io_k8s_api_autoscaling_v2_hpa_scaling_rules.py around line 24, there is a duplicate import of Optional (it’s already imported on line 21); remove Optional from the line and leave only Set so the import reads only Set, and if this duplication appears across generated files consider fixing the OpenAPI Generator template that emits duplicate imports.
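
Because the same F811 finding recurs across many generated models, it may help to scan for it in bulk before touching the generator templates. The following is a minimal sketch using only the standard library; the helper name `duplicate_imports` and the inline sample are illustrative assumptions, not part of this repository or of the OpenAPI Generator.

```python
import ast


def duplicate_imports(source: str) -> dict[str, list[int]]:
    """Map each name bound by more than one module-level import to its line numbers."""
    seen: dict[str, list[int]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # The bound name is the alias if given, else the first dotted component.
                bound = alias.asname or alias.name.split(".")[0]
                seen.setdefault(bound, []).append(node.lineno)
    return {name: lines for name, lines in seen.items() if len(lines) > 1}


# Hypothetical two-line excerpt mirroring the import pattern flagged above.
sample = (
    "from typing import Any, ClassVar, Dict, List, Optional\n"
    "from typing import Optional, Set\n"
)
print(duplicate_imports(sample))  # {'Optional': [1, 2]}
```

Running the helper over every file under the models directory would show whether the duplicate comes from a single template line, which is the case the note above suggests fixing at the generator level.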

coderabbitai · 2025-11-11T06:21:18Z

+    backoff_limit: Optional[StrictInt] = Field(default=None, description="Specifies the number of retries before marking this job failed. Defaults to 6, unless backoffLimitPerIndex (only Indexed Job) is specified. When backoffLimitPerIndex is specified, backoffLimit defaults to 2147483647.", alias="backoffLimit")
+    backoff_limit_per_index: Optional[StrictInt] = Field(default=None, description="Specifies the limit for the number of retries within an index before marking this index as failed. When enabled the number of failures per index is kept in the pod's batch.kubernetes.io/job-index-failure-count annotation. It can only be set when Job's completionMode=Indexed, and the Pod's restart policy is Never. The field is immutable.", alias="backoffLimitPerIndex")
    completion_mode: Optional[StrictStr] = Field(default=None, description="completionMode specifies how Pod completions are tracked. It can be `NonIndexed` (default) or `Indexed`.  `NonIndexed` means that the Job is considered complete when there have been .spec.completions successfully completed Pods. Each Pod completion is homologous to each other.  `Indexed` means that the Pods of a Job get an associated completion index from 0 to (.spec.completions - 1), available in the annotation batch.kubernetes.io/job-completion-index. The Job is considered complete when there is one successfully completed Pod for each index. When value is `Indexed`, .spec.completions must be specified and `.spec.parallelism` must be less than or equal to 10^5. In addition, The Pod name takes the form `$(job-name)-$(index)-$(random-string)`, the Pod hostname takes the form `$(job-name)-$(index)`.  More completion modes can be added in the future. If the Job controller observes a mode that it doesn't recognize, which is possible during upgrades due to version skew, the controller skips updates for the Job.  Possible enum values:  - `\"Indexed\"` is a Job completion mode. In this mode, the Pods of a Job get an associated completion index from 0 to (.spec.completions - 1). The Job is considered complete when a Pod completes for each completion index.  - `\"NonIndexed\"` is a Job completion mode. In this mode, the Job is considered complete when there have been .spec.completions successfully completed Pods. Pod completions are homologous to each other.", alias="completionMode")
    completions: Optional[StrictInt] = Field(default=None, description="Specifies the desired number of successfully finished pods the job should be run with.  Setting to null means that the success of any pod signals the success of all pods, and allows parallelism to have any positive value.  Setting to 1 means that parallelism is limited to 1 and the success of that pod signals the success of the job. More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/")
    managed_by: Optional[StrictStr] = Field(default=None, description="ManagedBy field indicates the controller that manages a Job. The k8s Job controller reconciles jobs which don't have this field at all or the field value is the reserved string `kubernetes.io/job-controller`, but skips reconciling Jobs with a custom value for this field. The value must be a valid domain-prefixed path (e.g. acme.io/foo) - all characters before the first \"/\" must be a valid subdomain as defined by RFC 1123. All characters trailing the first \"/\" must be valid HTTP Path characters as defined by RFC 3986. The value cannot exceed 63 characters. This field is immutable.  This field is beta-level. The job controller accepts setting the field when the feature gate JobManagedBy is enabled (enabled by default).", alias="managedBy")
    manual_selector: Optional[StrictBool] = Field(default=None, description="manualSelector controls generation of pod labels and pod selectors. Leave `manualSelector` unset unless you are certain what you are doing. When false or unset, the system pick labels unique to this job and appends those labels to the pod template.  When true, the user is responsible for picking unique labels and specifying the selector.  Failure to pick a unique label may cause this and other jobs to not function correctly.  However, You may see `manualSelector=true` in jobs that were created with the old `extensions/v1beta1` API. More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#specifying-your-own-pod-selector", alias="manualSelector")
-    max_failed_indexes: Optional[StrictInt] = Field(default=None, description="Specifies the maximal number of failed indexes before marking the Job as failed, when backoffLimitPerIndex is set. Once the number of failed indexes exceeds this number the entire Job is marked as Failed and its execution is terminated. When left as null the job continues execution of all of its indexes and is marked with the `Complete` Job condition. It can only be specified when backoffLimitPerIndex is set. It can be null or up to completions. It is required and must be less than or equal to 10^4 when is completions greater than 10^5. This field is beta-level. It can be used when the `JobBackoffLimitPerIndex` feature gate is enabled (enabled by default).", alias="maxFailedIndexes")
+    max_failed_indexes: Optional[StrictInt] = Field(default=None, description="Specifies the maximal number of failed indexes before marking the Job as failed, when backoffLimitPerIndex is set. Once the number of failed indexes exceeds this number the entire Job is marked as Failed and its execution is terminated. When left as null the job continues execution of all of its indexes and is marked with the `Complete` Job condition. It can only be specified when backoffLimitPerIndex is set. It can be null or up to completions. It is required and must be less than or equal to 10^4 when is completions greater than 10^5.", alias="maxFailedIndexes")
    parallelism: Optional[StrictInt] = Field(default=None, description="Specifies the maximum desired number of pods the job should run at any given time. The actual number of pods running in steady state will be less than this number when ((.spec.completions - .status.successful) < .spec.parallelism), i.e. when the work left to do is less than max parallelism. More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/")
    pod_failure_policy: Optional[IoK8sApiBatchV1PodFailurePolicy] = Field(default=None, description="Specifies the policy of handling failed pods. In particular, it allows to specify the set of actions and conditions which need to be satisfied to take the associated action. If empty, the default behaviour applies - the counter of failed pods, represented by the jobs's .status.failed field, is incremented and it is checked against the backoffLimit. This field cannot be used in combination with restartPolicy=OnFailure.", alias="podFailurePolicy")
-    pod_replacement_policy: Optional[StrictStr] = Field(default=None, description="podReplacementPolicy specifies when to create replacement Pods. Possible values are: - TerminatingOrFailed means that we recreate pods   when they are terminating (has a metadata.deletionTimestamp) or failed. - Failed means to wait until a previously created Pod is fully terminated (has phase   Failed or Succeeded) before creating a replacement Pod.  When using podFailurePolicy, Failed is the the only allowed value. TerminatingOrFailed and Failed are allowed values when podFailurePolicy is not in use. This is an beta field. To use this, enable the JobPodReplacementPolicy feature toggle. This is on by default.  Possible enum values:  - `\"Failed\"` means to wait until a previously created Pod is fully terminated (has phase Failed or Succeeded) before creating a replacement Pod.  - `\"TerminatingOrFailed\"` means that we recreate pods when they are terminating (has a metadata.deletionTimestamp) or failed.", alias="podReplacementPolicy")
+    pod_replacement_policy: Optional[StrictStr] = Field(default=None, description="podReplacementPolicy specifies when to create replacement Pods. Possible values are: - TerminatingOrFailed means that we recreate pods   when they are terminating (has a metadata.deletionTimestamp) or failed. - Failed means to wait until a previously created Pod is fully terminated (has phase   Failed or Succeeded) before creating a replacement Pod.  When using podFailurePolicy, Failed is the the only allowed value. TerminatingOrFailed and Failed are allowed values when podFailurePolicy is not in use.  Possible enum values:  - `\"Failed\"` means to wait until a previously created Pod is fully terminated (has phase Failed or Succeeded) before creating a replacement Pod.  - `\"TerminatingOrFailed\"` means that we recreate pods when they are terminating (has a metadata.deletionTimestamp) or failed.", alias="podReplacementPolicy")
    selector: Optional[IoK8sApimachineryPkgApisMetaV1LabelSelector] = Field(default=None, description="A label query over pods that should match the pod count. Normally, the system sets this field for you. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors")
-    success_policy: Optional[IoK8sApiBatchV1SuccessPolicy] = Field(default=None, description="successPolicy specifies the policy when the Job can be declared as succeeded. If empty, the default behavior applies - the Job is declared as succeeded only when the number of succeeded pods equals to the completions. When the field is specified, it must be immutable and works only for the Indexed Jobs. Once the Job meets the SuccessPolicy, the lingering pods are terminated.  This field is beta-level. To use this field, you must enable the `JobSuccessPolicy` feature gate (enabled by default).", alias="successPolicy")
+    success_policy: Optional[IoK8sApiBatchV1SuccessPolicy] = Field(default=None, description="successPolicy specifies the policy when the Job can be declared as succeeded. If empty, the default behavior applies - the Job is declared as succeeded only when the number of succeeded pods equals to the completions. When the field is specified, it must be immutable and works only for the Indexed Jobs. Once the Job meets the SuccessPolicy, the lingering pods are terminated.", alias="successPolicy")


⚠️ Potential issue | 🟡 Minor

Documentation updates align with API stabilization; minor typo detected.

Lines 34–45 update field descriptions to remove feature-gate and beta references, clarifying default behaviors for interdependent fields (e.g., backoffLimit/backoffLimitPerIndex interaction). These changes are consistent with feature stabilization in the upstream Kubernetes API picked up by the Trainer v2.1.0 rebase, and they are appropriate for an auto-generated model file.

However, Line 43 contains a typo: "the the only allowed value" should be "the only allowed value".

Please correct the typo on line 43:

-    pod_replacement_policy: Optional[StrictStr] = Field(default=None, description="podReplacementPolicy specifies when to create replacement Pods. Possible values are: - TerminatingOrFailed means that we recreate pods when they are terminating (has a metadata.deletionTimestamp) or failed. - Failed means to wait until a previously created Pod is fully terminated (has phase Failed or Succeeded) before creating a replacement Pod. When using podFailurePolicy, Failed is the the only allowed value. TerminatingOrFailed and Failed are allowed values when podFailurePolicy is not in use. Possible enum values: - `\"Failed\"` means to wait until a previously created Pod is fully terminated (has phase Failed or Succeeded) before creating a replacement Pod. - `\"TerminatingOrFailed\"` means that we recreate pods when they are terminating (has a metadata.deletionTimestamp) or failed.", alias="podReplacementPolicy")
+    pod_replacement_policy: Optional[StrictStr] = Field(default=None, description="podReplacementPolicy specifies when to create replacement Pods. Possible values are: - TerminatingOrFailed means that we recreate pods when they are terminating (has a metadata.deletionTimestamp) or failed. - Failed means to wait until a previously created Pod is fully terminated (has phase Failed or Succeeded) before creating a replacement Pod. When using podFailurePolicy, Failed is the only allowed value. TerminatingOrFailed and Failed are allowed values when podFailurePolicy is not in use. Possible enum values: - `\"Failed\"` means to wait until a previously created Pod is fully terminated (has phase Failed or Succeeded) before creating a replacement Pod. - `\"TerminatingOrFailed\"` means that we recreate pods when they are terminating (has a metadata.deletionTimestamp) or failed.", alias="podReplacementPolicy")

🤖 Prompt for AI Agents

In api/python_api/kubeflow_trainer_api/models/io_k8s_api_batch_v1_job_spec.py around lines 34 to 45, fix the typo in the pod_replacement_policy Field description where it currently reads "the the only allowed value" by removing the duplicate "the" so it reads "the only allowed value"; make this single-word fix inside the description string and leave all other text unchanged.
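
Since the description string is generated, a hand edit may be overwritten on the next regeneration, so it can be useful to detect such typos mechanically instead. Below is a rough sketch of a doubled-word check; the regex and the `doubled_words` helper are assumptions for illustration and will also flag legitimate repeats such as "that that".

```python
import re

# A word, whitespace, then the same word again (case-insensitive).
DOUBLED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def doubled_words(text: str) -> list[str]:
    """Return each word that is immediately repeated in the text."""
    return [match.group(1) for match in DOUBLED_WORD.finditer(text)]


description = "When using podFailurePolicy, Failed is the the only allowed value."
print(doubled_words(description))  # ['the']
```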

coderabbitai · 2025-11-11T06:21:18Z

+from __future__ import annotations
+import pprint
+import re  # noqa: F401
+import json
+
+from pydantic import BaseModel, ConfigDict, Field, StrictInt, StrictStr
+from typing import Any, ClassVar, Dict, List, Optional
+from typing import Optional, Set
+from typing_extensions import Self


⚠️ Potential issue | 🟡 Minor

Clean up import issues in the code generator.

The import section has several issues flagged by static analysis:

Line 17: re is imported but never used

Line 22: Optional is imported twice (duplicate from line 21)

Lines 12, 17, 28: Unused noqa directives

Since this is auto-generated code (line 11 states "Do not edit the class manually"), these issues should be addressed in the OpenAPI code generator configuration or templates.

Apply this diff to manually fix the imports:

 from __future__ import annotations
 import pprint
-import re  # noqa: F401
 import json
 from pydantic import BaseModel, ConfigDict, Field, StrictInt, StrictStr
 from typing import Any, ClassVar, Dict, List, Optional
-from typing import Optional, Set
+from typing import Set
 from typing_extensions import Self

📝 Committable suggestion

Suggested change

from __future__ import annotations

import pprint

import re # noqa: F401

import json

from pydantic import BaseModel, ConfigDict, Field, StrictInt, StrictStr

from typing import Any, ClassVar, Dict, List, Optional

from typing import Optional, Set

from typing_extensions import Self

from __future__ import annotations

import pprint

import json

from pydantic import BaseModel, ConfigDict, Field, StrictInt, StrictStr

from typing import Any, ClassVar, Dict, List, Optional

from typing import Set

from typing_extensions import Self

🧰 Tools

🪛 Flake8 (7.3.0)

[error] 22-22: redefinition of unused 'Optional' from line 21

(F811)

🪛 Ruff (0.14.4)

17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

22-22: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

🤖 Prompt for AI Agents

In api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py around lines 15 to 23, the import block contains unused/duplicate imports and unnecessary noqa directives; remove the unused "re" import, delete the duplicate "Optional" import, and remove the redundant "# noqa: F401" and any other unused noqa comments so only required imports remain, and update the OpenAPI generator templates/config to prevent reintroducing these unused imports in generated code.

coderabbitai · 2025-11-11T06:21:18Z

+    def to_json(self) -> str:
+        """Returns the JSON representation of the model using alias"""
+        # TODO: pydantic v2: use .model_dump_json(by_alias=True, exclude_unset=True) instead
+        return json.dumps(self.to_dict())


🛠️ Refactor suggestion | 🟠 Major

Address the TODO: Use Pydantic v2's model_dump_json().

The TODO comment suggests using .model_dump_json(), and since Pydantic v2.10.0 is already in use, this should be implemented. The current approach of calling json.dumps(self.to_dict()) is less efficient than the native Pydantic v2 method.

Apply this diff to use the Pydantic v2 method:

     def to_json(self) -> str:
         """Returns the JSON representation of the model using alias"""
-        # TODO: pydantic v2: use .model_dump_json(by_alias=True, exclude_unset=True) instead
-        return json.dumps(self.to_dict())
+        return self.model_dump_json(by_alias=True, exclude_none=True)

Note: Using exclude_none=True to match the behavior of to_dict() (line 70).

🤖 Prompt for AI Agents

In api/python_api/kubeflow_trainer_api/models/io_k8s_api_core_v1_container_restart_rule_on_exit_codes.py around lines 44 to 47, replace the manual json.dumps(self.to_dict()) with Pydantic v2's native JSON exporter: call self.model_dump_json(by_alias=True, exclude_none=True) (include any additional flags like exclude_unset=True if you want to preserve previous unset-exclusion behavior), so the method returns the model_dump_json result instead of json.dumps(self.to_dict()).
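
One detail to check before swapping the implementation: the two approaches should carry the same data but need not produce byte-identical strings, because `json.dumps` inserts spaces after separators by default while Pydantic v2's serializer emits compact JSON. The sketch below illustrates this with a hypothetical stand-in model (`RestartRuleExample` is not the generated class, and its `to_dict` is simplified to a plain `model_dump` call); it assumes Pydantic v2 is installed.

```python
import json
from typing import List, Optional

from pydantic import BaseModel, Field, StrictInt, StrictStr


class RestartRuleExample(BaseModel):
    """Hypothetical stand-in for a generated model; not the real class."""

    operator: Optional[StrictStr] = None
    exit_values: Optional[List[StrictInt]] = Field(default=None, alias="exitValues")

    def to_dict(self) -> dict:
        # Simplified assumption: the generated to_dict() drops None fields and uses aliases.
        return self.model_dump(by_alias=True, exclude_none=True)


rule = RestartRuleExample(operator="In")

via_dict = json.dumps(rule.to_dict())
via_pydantic = rule.model_dump_json(by_alias=True, exclude_none=True)

# Same data after parsing, even though the raw strings differ in whitespace.
print(json.loads(via_dict) == json.loads(via_pydantic))
print(via_dict == via_pydantic)
```

If any caller compares or hashes the raw JSON string, that whitespace difference would be a behavior change worth calling out in the generator update.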

coderabbitai · 2025-11-11T06:21:20Z

+from kubeflow_trainer_api.models.scheduling_v1beta1_network_topology_spec import SchedulingV1beta1NetworkTopologySpec
+from typing import Optional, Set
+from typing_extensions import Self


⚠️ Potential issue | 🟡 Minor

Fix the duplicate Optional import.

Line 23 redefines Optional, which was already imported from typing on line 21.

Apply this diff:

 from kubeflow_trainer_api.models.scheduling_v1beta1_network_topology_spec import SchedulingV1beta1NetworkTopologySpec
-from typing import Optional, Set
+from typing import Set
 from typing_extensions import Self

📝 Committable suggestion

Suggested change

from kubeflow_trainer_api.models.scheduling_v1beta1_network_topology_spec import SchedulingV1beta1NetworkTopologySpec

from typing import Optional, Set

from typing_extensions import Self

from kubeflow_trainer_api.models.scheduling_v1beta1_network_topology_spec import SchedulingV1beta1NetworkTopologySpec

from typing import Set

from typing_extensions import Self

🧰 Tools

🪛 Flake8 (7.3.0)

[error] 23-23: redefinition of unused 'Optional' from line 21

(F811)

🪛 Ruff (0.14.4)

23-23: Redefinition of unused Optional from line 21

Remove definition: Optional

(F811)

🤖 Prompt for AI Agents

In api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_volcano_pod_group_policy_source.py around lines 22 to 24, the second import reintroduces Optional which is already imported earlier; remove the duplicate by changing the line to only import the remaining needed symbol(s) (e.g., replace "from typing import Optional, Set" with "from typing import Set") or delete the redundant Optional so Optional is imported only once.

coderabbitai · 2025-11-11T06:21:21Z

+### Kubeflow Enhancement Proposal (KEP)

-Apply the job:
+For any significant features or enhancement for Kubeflow Trainer project we follow the


⚠️ Potential issue | 🟡 Minor

Add missing comma in sentence.

Line 114 reads: "...enhancement for Kubeflow Trainer project we follow the..." which needs a comma after "project".

Apply this diff:

-For any significant features or enhancement for Kubeflow Trainer project we follow the
+For any significant features or enhancement for Kubeflow Trainer project, we follow the

📝 Committable suggestion

Suggested change

For any significant features or enhancement for Kubeflow Trainer project we follow the

For any significant features or enhancement for Kubeflow Trainer project, we follow the

🧰 Tools

🪛 LanguageTool

[uncategorized] ~114-~114: Possible missing comma found.
Context: ...res or enhancement for Kubeflow Trainer project we follow the [Kubeflow Enhancement Pro...

(AI_HYDRA_LEO_MISSING_COMMA)

🤖 Prompt for AI Agents

CONTRIBUTING.md around line 114: the sentence "For any significant features or enhancement for Kubeflow Trainer project we follow the" is missing a comma after "project"; insert a comma immediately after "project" so it reads "...Kubeflow Trainer project, we follow the" to correct the punctuation.

coderabbitai · 2025-11-11T06:21:21Z

 Changes to the Kubeflow Trainer Python SDK can be made in the https://github.com/kubeflow/sdk repo.

-The Trainer SDK can be found at https://github.com/kubeflow/sdk/tree/main/python/kubeflow/trainer.
+The Trainer SDK can be found at https://github.com/kubeflow/sdk/tree/main/kubeflow/trainer.


⚠️ Potential issue | 🟡 Minor

Convert bare URLs to markdown links.

Lines 136 and 138 contain bare URLs that should be formatted as markdown links for consistency and improved readability.

Apply this diff:

-The Trainer SDK can be found at https://github.com/kubeflow/sdk/tree/main/kubeflow/trainer.
+The Trainer SDK can be found at [https://github.com/kubeflow/sdk/tree/main/kubeflow/trainer](https://github.com/kubeflow/sdk/tree/main/kubeflow/trainer).

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools

🪛 markdownlint-cli2 (0.18.1)

136-136: Bare URL used

(MD034, no-bare-urls)

138-138: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents

In CONTRIBUTING.md around lines 136 to 138, the two bare URLs should be converted to markdown links; replace the plain URLs with descriptive markdown-style links (for example, link the repo URL using text like "Kubeflow SDK repo" and link the trainer path using text like "Trainer SDK") so both lines read with [Link Text](URL) syntax for consistency and readability.
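
As an illustration of the requested change, here is a minimal sketch that rewrites one bare URL as a markdown link. The helper `link_bare_url` and the link text `kubeflow/sdk` are assumptions chosen for the example; the review does not prescribe specific link text.

```python
def link_bare_url(line: str, url: str, text: str) -> str:
    """Replace the first bare occurrence of `url` in `line` with a [text](url) link."""
    if f"]({url})" in line:
        return line  # already a markdown link; leave the line unchanged
    return line.replace(url, f"[{text}]({url})", 1)


line = "Changes to the Kubeflow Trainer Python SDK can be made in the https://github.com/kubeflow/sdk repo."
url = "https://github.com/kubeflow/sdk"
linked = link_bare_url(line, url, "kubeflow/sdk")
print(linked)
```

Running the helper a second time on its own output leaves the line unchanged, which makes it safe to apply repeatedly.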

coderabbitai · 2025-11-11T06:21:21Z

+## Reporting a Vulnerability
+
+We're extremely grateful for security researchers and users that report vulnerabilities to the
+Kubeflow Open Source Community. All reports are thoroughly investigated by Kubeflow projects owners.


⚠️ Potential issue | 🟡 Minor

Use hyphen for compound adjective.

"Kubeflow Open Source Community" should be "Kubeflow Open-Source Community" when used as a compound adjective before a noun.

Apply this diff:

-We're extremely grateful for security researchers and users that report vulnerabilities to the
-Kubeflow Open Source Community. All reports are thoroughly investigated by Kubeflow projects owners.
+We're extremely grateful for security researchers and users that report vulnerabilities to the
+Kubeflow Open-Source Community. All reports are thoroughly investigated by Kubeflow projects owners.

📝 Committable suggestion

Suggested change

Kubeflow Open Source Community. All reports are thoroughly investigated by Kubeflow projects owners.

Kubeflow Open-Source Community. All reports are thoroughly investigated by Kubeflow projects owners.

🧰 Tools

🪛 LanguageTool

[uncategorized] ~19-~19: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... report vulnerabilities to the Kubeflow Open Source Community. All reports are thoroughly i...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🤖 Prompt for AI Agents

In SECURITY.md around line 19, the phrase "Kubeflow Open Source Community" used as a compound adjective should be hyphenated; update the text on line 19 to read "Kubeflow Open-Source Community" (preserve capitalization and surrounding punctuation) so the compound adjective is correct.

coderabbitai · 2025-11-11T06:21:21Z

+
+## Communication Channels
+
+For the general questions please join the following resources:


⚠️ Potential issue | 🟡 Minor

Add comma for readability.

Line 58 needs a comma after "For the general questions".

Apply this diff:

-For the general questions please join the following resources:
+For general questions, please join the following resources:

📝 Committable suggestion

Suggested change

For the general questions please join the following resources:

For general questions, please join the following resources:

🧰 Tools

🪛 LanguageTool

[typographical] ~58-~58: Consider adding a comma here.
Context: ...tion Channels For the general questions please join the following resources: - Kubefl...

(PLEASE_COMMA)

🤖 Prompt for AI Agents

In SECURITY.md around line 58, the sentence "For the general questions please join the following resources:" is missing a comma; add a comma after "For the general questions" so it reads "For the general questions, please join the following resources:" to improve readability.

efazal · 2025-11-11T12:11:45Z

/lgtm

kramaranya and others added 30 commits June 11, 2025 08:18

fix(rbac): Add required RBAC to update ClusterTrainingRuntimes on OpenShift (kubeflow#2682)

29ffc84

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

chore(helm): Sync ClusterRule in Helm chart (kubeflow#2686)

582025b

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

chore: Remove the vendor specific parameters (kubeflow#2691)

c17a162

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

chore: Replace the deprecated intstr.FromInt with intstr.FromInt32 (kubeflow#2695)

4d47314

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

chore: Copy generated CRDs into Helm charts (kubeflow#2703)

b9c6635

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

chore(docs): Update Release Guide (kubeflow#2710)

f0d7ea3

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Add Red Hat to ADOPTERS.md (kubeflow#2714)

bc9365b

* Add Red Hat to ADOPTERS.md Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* Update ADOPTERS.md Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

fix(manifests): fix position of labels of dataset-initializer from pod to job (kubeflow#2719)

ed5f859

Signed-off-by: rudeigerc <rudeigerc@gmail.com>

chore: Upgrade JobSet to version 0.8.2 (kubeflow#2726)

cf3b420

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

feat(docs): Guide to report security vulnerability (kubeflow#2718)

2ff9aa6

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

fix(manifests): add rbac config of events for event recorders (kubeflow#2731)

857e7cb

Signed-off-by: rudeigerc <rudeigerc@gmail.com>

fix: fix the command for fetching Kubeflow Trainer version in the issue template (kubeflow#2732)

853226a

Signed-off-by: rudeigerc <rudeigerc@gmail.com>

feat(operator): force trainjob name to be compliant with RFC 1035 for jobset (kubeflow#2734)

705cdbf

Signed-off-by: rudeigerc <rudeigerc@gmail.com>

chore: update github runners to oci gh arc runners (kubeflow#2739)

f8d71ec

Signed-off-by: Koray Oksay <koray.oksay@gmail.com>

Upgrade Kubernetes to v1.33 (kubeflow#2756)

45ce64f

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr> Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

fix(test): Fix Ginkgo command for integration tests (kubeflow#2758)

8251450

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>

feat(runtimes): Remove command from the Runtimes with CustomTrainer (kubeflow#2754)

66da5ea

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

google-oss-robot and others added 14 commits October 31, 2025 16:48

fix(runtimes): Update pip version in the MLX runtime (kubeflow#2910)

a4b50ba

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

[release-2.1] fix(manifests): Fix boolean values defaulting in Helm charts (kubeflow#2914)

cba6e30

Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr> Co-authored-by: Antonin Stefanutti <antonin@stefanutti.fr>

Kubeflow Trainer Official Release v2.1.0-rc.1

3a71dd0

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

[release-2.1] feat(cache): KEP-2655 - Supporting readiness probes on cache nodes (kubeflow#2920)

3be3331

Signed-off-by: Akshay Chitneni <achitneni@apple.com> Co-authored-by: Akshay Chitneni <achitneni@apple.com>

[release-2.1] feat(cache): KEP-2655: Adding default runtime with cache and example (kubeflow#2928)

fa39664

Signed-off-by: Akshay Chitneni <achitneni@apple.com> Co-authored-by: Akshay Chitneni <achitneni@apple.com>

Kubeflow Trainer Official Release v2.1.0

73c9bec

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

CARRY: Fix pre-commit PR check failures

e78979d

Add RHOAI manifests overlay for Trainer V2

827368b

- Add RHOAI specific Dockerfile for Trainer V2 controller image
- Add RHOAI overlay manifests for Trainer V2
- Add custom training runtimes in rhoai overlay

abhijeet-dhumal requested review from ChughShilpa, kapil27 and sutaakar November 11, 2025 05:21

Update ODH Trainer operator image

c156297

abhijeet-dhumal force-pushed the v2.1.0-odh-rebase branch from 2708e8f to c156297 November 11, 2025 06:03

abhijeet-dhumal marked this pull request as ready for review November 11, 2025 06:03

abhijeet-dhumal requested review from efazal and robert-bell November 11, 2025 06:04

abhijeet-dhumal mentioned this pull request Nov 11, 2025

[WIP] feat: Implement controller support for polling and tracking training job progression #17

Closed

1 task

coderabbitai Bot reviewed Nov 11, 2025

kapil27 approved these changes Nov 11, 2025

sutaakar approved these changes Nov 11, 2025

kapil27 deleted the branch opendatahub-io:main November 12, 2025 17:14

kapil27 closed this Nov 12, 2025

20 participants
