feat: move OpenShift Route and CRB management into kserve behind distro build tags#1404
Conversation
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: VedantMahabaleshwarkar The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds distro-gated OpenShift support for InferenceService: new RawOCPRouteReconciler (distro) to manage Route lifecycle (exposed vs cluster-local, auth-driven TLS/port, admission handling) and a distro resolver to select that reconciler. Introduces build-tagged hooks to register Route scheme and Owns Routes, and platform-permissions reconciliation that creates/updates/deletes system:auth-delegator ClusterRoleBindings per InferenceService auth annotation. Removes shared Route URL utility, shifts ingress URL generation to ingress-config-driven logic, adds kubebuilder RBAC markers and a generated ClusterRole for distro, and introduces many distro/non-distro test helpers and Makefile RBAC generation changes. Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 11
🧹 Nitpick comments (9)
config/overlays/odh/rbac/inferenceservice/role.yaml (1)
6-18: Broadclusterrolebindingspermissions warrant documentation.Full CRUD on
clusterrolebindingswithoutresourceNamesrestriction grants broad cluster-wide RBAC manipulation capability. While this is necessary for dynamic CRB management per-ISVC (auth-delegator pattern), consider:
- Documenting the security rationale in the generated role or an accompanying ADR
- The controller SA with these permissions becomes a high-value target; ensure it runs with minimal attack surface
No
bindorescalateverbs present, so direct privilege escalation to arbitrary roles isn't possible, but a compromised controller could still manipulate existing CRBs.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/overlays/odh/rbac/inferenceservice/role.yaml` around lines 6 - 18, The role grants full CRUD on clusterrolebindings (rules -> resources: clusterrolebindings with verbs: create, delete, get, list, patch, update, watch), which is broad and should be documented and risk-mitigated; update the role manifest by adding a clear security rationale (e.g., as an annotation on the Role/ClusterRole or a linked ADR) explaining why dynamic CRB management is required for the inferenceservice controller, and harden the controller service account by reducing scope where possible (restrict verbs and/or add resourceNames if you can target specific CRBs, ensure the controller runs with minimal RBAC/Node capabilities, and note these mitigations in the same annotation or ADR).pkg/controller/v1beta1/inferenceservice/rawkube_platform_ocp_test.go (2)
199-405: Large inline expected Deployment spec reduces test maintainability.This 200+ line struct literal is difficult to review and update. Consider extracting a helper function or using a fixture file. Changes to deployment defaults will require updating this block.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/v1beta1/inferenceservice/rawkube_platform_ocp_test.go` around lines 199 - 405, The test embeds a huge inline Deployment struct in expectedDeployment which harms maintainability; extract that literal into a helper (e.g., makeExpectedDeployment or loadExpectedDeploymentFixture) that accepts the dynamic inputs used in the test (predictorDeploymentKey, serviceKey, isvc, replicas, defaultResource, gracePeriod, revisionHistory, progressDeadlineSeconds) and returns *appsv1.Deployment, then replace the inline expectedDeployment with a call to that helper or load from a small YAML/JSON fixture; ensure the helper constructs the same fields referenced in assertions (metadata name/namespace, pod template labels/annotations, containers including InferenceServiceContainerName and KubeRbacContainerName, volumes, probes, resources, and Strategy) so future changes to defaults can be updated in one place.
104-105: Inconsistent context usage:context.TODO()vscontext.Background().The test defines
ctx := context.Background()at line 145, but usescontext.TODO()at lines 104, 139, 193, 417, 461, 502, 526, 562, 617, 636, 656. Pick one for consistency—ctxis already scoped correctly.Also applies to: 139-140, 193-194
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/v1beta1/inferenceservice/rawkube_platform_ocp_test.go` around lines 104 - 105, Replace all uses of context.TODO() in this test with the existing ctx variable (context.Background()) to ensure consistent context usage; for example change k8sClient.Create(context.TODO(), configMap) and k8sClient.Delete(context.TODO(), configMap) as well as other k8sClient calls that pass context.TODO() to use ctx instead, and update any helper calls within the test that currently pass context.TODO() to accept ctx so the same context is used throughout.pkg/controller/v1beta1/inferenceservice/platform_permissions_ocp_test.go (2)
36-41: Scheme registration errors silently ignored (same pattern as route reconciler tests).Handle or panic on scheme errors
func permTestScheme() *runtime.Scheme { s := runtime.NewScheme() - _ = v1beta1.AddToScheme(s) - _ = rbacv1.AddToScheme(s) + if err := v1beta1.AddToScheme(s); err != nil { + panic(fmt.Sprintf("failed to add v1beta1 to scheme: %v", err)) + } + if err := rbacv1.AddToScheme(s); err != nil { + panic(fmt.Sprintf("failed to add rbacv1 to scheme: %v", err)) + } return s }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/v1beta1/inferenceservice/platform_permissions_ocp_test.go` around lines 36 - 41, permTestScheme currently ignores errors from runtime.NewScheme() registrations; update it to handle or fail fast on registration errors by checking the returned errors from v1beta1.AddToScheme(s) and rbacv1.AddToScheme(s) inside permTestScheme(), and either return the error to the caller or panic/log.Fatalf with a clear message including the failing function name (e.g., v1beta1.AddToScheme, rbacv1.AddToScheme) so scheme registration failures are not silently suppressed.
108-134: Missing test for RoleRef change scenario.
TestReconcilePlatformPermissions_AuthEnabled_CRBDiffersonly tests subject drift. The production code atplatform_permissions_ocp.go:99-107has a distinct delete+recreate path whenRoleRefdiffers. Add a test case where the existing CRB has a different RoleRef to cover that branch.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/v1beta1/inferenceservice/platform_permissions_ocp_test.go` around lines 108 - 134, Add a new case to TestReconcilePlatformPermissions_AuthEnabled_CRBDiffers that simulates a CRB with a differing RoleRef: create existingCRB with the same Name ("ns1-default-auth-delegator") but a RoleRef.Name that is NOT "system:auth-delegator" (and any Subjects), call reconcilePlatformPermissions, then fetch the CRB and assert the RoleRef was replaced to the expected RoleRef (APIGroup "rbac.authorization.k8s.io", Kind "ClusterRole", Name "system:auth-delegator") and that Subjects/other expected fields were updated; this will exercise the delete+recreate path in platform_permissions_ocp.go (around the RoleRef diff logic).pkg/controller/v1beta1/inferenceservice/platform_permissions_ocp.go (2)
161-173: Subject comparison is order-sensitive; may cause unnecessary updates.
equalCRBcompares subjects positionally. If an external actor reorders theSubjectsslice (same content, different order), this returnsfalseand triggers an update. While currently only one subject is added, this could cause reconciliation churn if the pattern changes.Consider sorting or using a set-based comparison
func equalCRB(a, b *rbacv1.ClusterRoleBinding) bool { if a.RoleRef != b.RoleRef { return false } if len(a.Subjects) != len(b.Subjects) { return false } + // For single-subject case this is fine, but for future-proofing: for i := range a.Subjects { if a.Subjects[i] != b.Subjects[i] { return false } } return true }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/v1beta1/inferenceservice/platform_permissions_ocp.go` around lines 161 - 173, equalCRB currently does an order-sensitive, positional comparison of ClusterRoleBinding Subjects which can trigger needless updates if the slice is reordered; modify equalCRB to compare Subjects in an order-insensitive way by either sorting the Subjects before comparison or building a set/map keyed by stable identity (e.g., Subject.Kind + Subject.APIGroup + Subject.Namespace + Subject.Name) and comparing entries, while still comparing RoleRef as before; update equalCRB to use that set-based/sorted comparison so identical subject lists in different orders are treated as equal.
99-107: Delete+Create race window for RoleRef changes.Between
DeleteandCreate, the CRB doesn't exist. If the controller restarts or another reconciliation runs concurrently, the auth-delegator permission is temporarily missing. Consider using a finalizer or ensuring idempotency through naming.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/v1beta1/inferenceservice/platform_permissions_ocp.go` around lines 99 - 107, The current logic deletes and recreates the ClusterRoleBinding when existing.RoleRef != desired.RoleRef which creates a race window (crbName, cl.Delete, cl.Create, existing.RoleRef, desired.RoleRef); instead, modify the reconcile to perform an in-place update (fetch existing, deep-copy, set existing.RoleRef = desired.RoleRef and call cl.Update or use a server-side apply/patch) wrapped in a retry-on-conflict loop (e.g., RetryOnConflict) so concurrent reconciles safely reconcile RoleRef without removing the binding; handle update errors and fall back to recreate only if update consistently fails after retries.pkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.go (1)
53-54: Unused parameter_ *v1beta1.InferenceServicesuggests potential signature drift.The OCP counterpart at
rawkube_platform_ocp_test.go:54also has this unused parameter. If neither build tag variant uses it, consider removing it from both to keep the interface minimal. If the OCP variant needs it in the future, the signature divergence will cause build failures in the other mode.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.go` around lines 53 - 54, The test helper assertPlatformIngressSpec currently declares an unused parameter `_ *v1beta1.InferenceService`; remove that unused parameter from the function signature of assertPlatformIngressSpec and from its counterpart in the OCP variant so both build-tag variants stay in sync, then update any call sites to stop passing the extra argument (or alternatively, if the InferenceService will be needed, use the parameter inside assertPlatformIngressSpec in both variants); reference the symbol assertPlatformIngressSpec and the unused parameter `_ *v1beta1.InferenceService` when making the change.pkg/controller/v1beta1/inferenceservice/reconcilers/ingress/ocp_route_reconciler_test.go (1)
173-199: TestTestRawOCPRoute_ExposedWithAuthpre-creates route with wrong TLS termination.Line 180 sets
route.Spec.TLS.Termination = routev1.TLSTerminationReencrypt, butadmittedRoutehelper already setsTLSTerminationEdgeat line 126. The test doesn't verify reconciler corrects the TLS termination—it only checks ISVC status. Consider asserting the route spec is updated to match auth-enabled expectations.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/v1beta1/inferenceservice/reconcilers/ingress/ocp_route_reconciler_test.go` around lines 173 - 199, The test TestRawOCPRoute_ExposedWithAuth pre-sets the route TLS termination to routev1.TLSTerminationReencrypt even though admittedRoute already sets TLSTerminationEdge, and the test never asserts that the reconciler updates the Route resource to the expected TLS when auth is enabled; update the test to (1) ensure the initial route state is clear (use admittedRoute as-is or explicitly set TLS to the scenario you want), (2) after calling newOCPReconciler(...).Reconcile(...), fetch or inspect the reconciler-updated Route object and assert route.Spec.TLS.Termination equals the expected value for auth-enabled flows, and (3) keep the existing ISVC status assertions (isvc.Status.URL and isvc.Status.Address checks) while adding the new assertion referencing admittedRoute, TestRawOCPRoute_ExposedWithAuth, route.Spec.TLS.Termination, newOCPReconciler, and constants.OauthProxyPort so the test verifies the Route is corrected by the reconciler.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@charts/kserve-resources/files/kserve/resources.yaml`:
- Around line 234-245: The kserve-manager-role in files/kserve/resources.yaml
grants unconstrained CRUD on clusterrolebindings to all deployments even though
reconcilePlatformPermissions (implemented as a no-op in
platform_permissions_default.go for non-distro builds) never uses it; remove or
constrain that rule: either delete the clusterrolebindings rule from
files/kserve/resources.yaml and let distro overlays add it, or modify the rule
to restrict resourceNames to kserve-inferencegraph-auth-verifiers (matching the
existing constrained rule at lines 246–256), or add a conditional patch to strip
the rule for non-distro builds so only distro deployments retain the elevated
RBAC.
In `@config/rbac/role.yaml`:
- Around line 176-187: The ClusterRole currently grants broad write verbs on the
cluster-wide resource "clusterrolebindings" (see the resources: -
clusterrolebindings and verbs: - create - patch - update - delete), which allows
ServiceAccount compromise to escalate to cluster-admin; restrict this by
removing write verbs for "clusterrolebindings" from this base ClusterRole,
replace them with either (a) no access here and move any needed creation/update
of ClusterRoleBindings into a distro-specific overlay
ClusterRole/ClusterRoleBinding, or (b) if absolutely required, tighten to
resourceNames to only specific binding names and/or change to a namespaced
Role/RoleBinding where possible; update the resources/verbs block for the
functions that reference clusterrolebindings and ensure tests/overlays are
updated accordingly.
In `@hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh`:
- Around line 37922-37933: The RBAC rule currently grants broad write access to
ClusterRoleBinding objects (resource "clusterrolebindings" with verbs
create/update/delete), enabling privilege escalation; restrict it by removing
create/update/delete and instead allow only the minimal verbs (e.g.,
get/list/watch and only patch if absolutely required) or restrict write
operations to specific resourceNames (explicit ClusterRoleBinding names) or
namespace-scoped RoleBinding resources used by the controller. Locate the RBAC
rule referencing "clusterrolebindings" in the manifest and modify it to (a) drop
create/delete/update verbs, or (b) add a resourceNames list limiting which
ClusterRoleBinding(s) can be modified, or (c) change the resource to
"rolebindings" scoped to the controller's namespace if cluster-wide binding is
not needed; ensure any remaining patch/write permission is justified and
documented.
In
`@hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh`:
- Around line 37508-37519: The current RBAC rule grants broad mutation rights on
the "clusterrolebindings" resource (verbs: create, delete, patch, update) which
allows a compromised installer/service account to tamper with cluster-wide
bindings; restrict this by removing destructive verbs (delete, patch, update) or
scoping resourceNames to only the KServe-managed ClusterRoleBindings (e.g.,
names prefixed with your controller's name), or replace the ClusterRole rule
with a Role that targets only the namespace and RoleBindings if possible; locate
the RBAC block that lists "apiGroups: - rbac.authorization.k8s.io" and
"resources: - clusterrolebindings" and change the verbs/resourceNames
accordingly to limit mutation to only the specific KServe clusterrolebinding
names or drop mutation entirely.
In `@pkg/controller/v1beta1/inferenceservice/controller.go`:
- Around line 393-397: When reconcilePlatformPermissions fails inside the
deploymentMode branch (when deploymentMode == constants.Standard ||
deploymentMode == constants.LegacyRawDeployment), call r.updateStatus(ctx, isvc,
deploymentMode) to persist the latest status before returning the wrapped error;
i.e., update the status in the same way other error paths do (see ingress and
modelconfig error handling) and then return reconcile.Result{},
errors.Wrapf(err, "fails to reconcile platform permissions") so transient
CRB/permission failures don't leave stale status on the InferenceService.
In `@pkg/controller/v1beta1/inferenceservice/platform_permissions_ocp.go`:
- Around line 52-54: The crbName built as isvc.Namespace + "-" + saName +
"-auth-delegator can collide (e.g. "a"+"b-c" vs "a-b"+"c"); replace this plain
concatenation in the CRB name construction (the crbName variable that uses
isvc.Namespace and saName) with a deterministic, safe name: keep a short
readable prefix (e.g., isvc.Name or "isvc-auth-delegator") and append a hash
(e.g., SHA256/MD5 truncated hex) computed over the tuple isvc.Namespace + "|" +
saName (the pipe is only for hashing, not part of the final name) so the final
crbName is deterministic, unique, and avoids separator ambiguity while
respecting k8s name length limits.
In `@pkg/controller/v1beta1/inferenceservice/rawkube_controller_test.go`:
- Line 1753: Tests now skip validating Route creation/spec/deletion and external
status URL because OCP helpers (used by assertPlatformIngressExists and related
helpers) are no-ops; update the OCP-specific helpers in
pkg/controller/v1beta1/inferenceservice/rawkube_platform_ocp_test.go so that
assertPlatformIngressExists, assertPlatformIngressDeleted, and
expectedIsvcStatusURL actually verify routev1.Route existence, its spec
(host/target), deletion, and return the external Route host instead of
cluster-local host, or alternatively add distro-only assertions in the
rawkube_controller_test helpers to explicitly assert route
creation/spec/deletion and the Route-backed status URL for RawOCPRouteReconciler
scenarios (reference the helper names assertPlatformIngressExists,
assertPlatformIngressDeleted, expectedIsvcStatusURL and the
RawOCPRouteReconciler behavior).
In
`@pkg/controller/v1beta1/inferenceservice/reconcilers/ingress/ocp_route_reconciler.go`:
- Around line 86-102: When forceStopRuntime is true we delete the Route and set
IngressReady false but never clear isvc.Status.URL, leaving a stale host; inside
the forceStopRuntime block (after r.client.Delete and before returning) clear
the URL on the InferenceService status (set isvc.Status.URL to empty/nil) and
persist the change via the status client (r.client.Status().Update(ctx, isvc) or
the project’s status update helper) so callers that only inspect isvc.Status.URL
don’t see the old Route host; ensure this happens even when the Route was
already not found and keep the existing condition update (IngressReady /
StoppedISVCReason).
- Around line 317-334: The setRouteTargetPort function silently falls back to
the first port when no port is named "http"/"https", which can misroute traffic;
update setRouteTargetPort to (1) if svc.Spec.Ports is empty return the existing
error, (2) if there is exactly one port return it but emit a warning log
mentioning svc.Namespace/svc.Name and the chosen port, and (3) if there are
multiple ports and none match desiredName return an explicit error instead of
silently picking the first. Use the reconciler's logger (or existing logging
facility) to record the warning and reference setRouteTargetPort,
svc.Spec.Ports, and desiredName when implementing the change.
- Around line 202-228: In setAddress, the ClusterIPNone port append is lost when
authEnabled is true because host is recomputed and overwrites the earlier
change; fix by using a single baseHost (getRawServiceHost(isvc)) and conditional
logic: detect entryPointSvc.Spec.ClusterIP == corev1.ClusterIPNone first, then
set host based on authEnabled — if authEnabled set host = baseHost + ":" +
strconv.Itoa(constants.OauthProxyPort) and addrScheme = "https", else if
ClusterIPNone set host = baseHost + ":" +
constants.InferenceServiceDefaultHttpPort and addrScheme = "http"; update
isvc.Status.Address accordingly in setAddress so the ClusterIPNone handling
(constants.InferenceServiceDefaultHttpPort) is not overridden by the authEnabled
branch.
- Around line 362-366: semanticRouteEquals currently DeepEqual's full
Annotations which causes reconciliation loops because OpenShift's router adds
unmanaged annotations; change equality logic to compare only managed annotation
keys (the ones set by buildDesiredRoute such as
"haproxy.router.openshift.io/timeout") by filtering both desired.Annotations and
existing.Annotations to that managed-key subset before comparing. Update
semanticRouteEquals to use a helper that extracts only managed keys from the
annotation maps and compare those, and ensure code that applies annotations
(where existingRoute.Annotations is overwritten) only sets the managed keys
instead of replacing the entire map.
---
Nitpick comments:
In `@config/overlays/odh/rbac/inferenceservice/role.yaml`:
- Around line 6-18: The role grants full CRUD on clusterrolebindings (rules ->
resources: clusterrolebindings with verbs: create, delete, get, list, patch,
update, watch), which is broad and should be documented and risk-mitigated;
update the role manifest by adding a clear security rationale (e.g., as an
annotation on the Role/ClusterRole or a linked ADR) explaining why dynamic CRB
management is required for the inferenceservice controller, and harden the
controller service account by reducing scope where possible (restrict verbs
and/or add resourceNames if you can target specific CRBs, ensure the controller
runs with minimal RBAC/Node capabilities, and note these mitigations in the same
annotation or ADR).
In `@pkg/controller/v1beta1/inferenceservice/platform_permissions_ocp_test.go`:
- Around line 36-41: permTestScheme currently ignores errors from
runtime.NewScheme() registrations; update it to handle or fail fast on
registration errors by checking the returned errors from v1beta1.AddToScheme(s)
and rbacv1.AddToScheme(s) inside permTestScheme(), and either return the error
to the caller or panic/log.Fatalf with a clear message including the failing
function name (e.g., v1beta1.AddToScheme, rbacv1.AddToScheme) so scheme
registration failures are not silently suppressed.
- Around line 108-134: Add a new case to
TestReconcilePlatformPermissions_AuthEnabled_CRBDiffers that simulates a CRB
with a differing RoleRef: create existingCRB with the same Name
("ns1-default-auth-delegator") but a RoleRef.Name that is NOT
"system:auth-delegator" (and any Subjects), call reconcilePlatformPermissions,
then fetch the CRB and assert the RoleRef was replaced to the expected RoleRef
(APIGroup "rbac.authorization.k8s.io", Kind "ClusterRole", Name
"system:auth-delegator") and that Subjects/other expected fields were updated;
this will exercise the delete+recreate path in platform_permissions_ocp.go
(around the RoleRef diff logic).
In `@pkg/controller/v1beta1/inferenceservice/platform_permissions_ocp.go`:
- Around line 161-173: equalCRB currently does an order-sensitive, positional
comparison of ClusterRoleBinding Subjects which can trigger needless updates if
the slice is reordered; modify equalCRB to compare Subjects in an
order-insensitive way by either sorting the Subjects before comparison or
building a set/map keyed by stable identity (e.g., Subject.Kind +
Subject.APIGroup + Subject.Namespace + Subject.Name) and comparing entries,
while still comparing RoleRef as before; update equalCRB to use that
set-based/sorted comparison so identical subject lists in different orders are
treated as equal.
- Around line 99-107: The current logic deletes and recreates the
ClusterRoleBinding when existing.RoleRef != desired.RoleRef which creates a race
window (crbName, cl.Delete, cl.Create, existing.RoleRef, desired.RoleRef);
instead, modify the reconcile to perform an in-place update (fetch existing,
deep-copy, set existing.RoleRef = desired.RoleRef and call cl.Update or use a
server-side apply/patch) wrapped in a retry-on-conflict loop (e.g.,
RetryOnConflict) so concurrent reconciles safely reconcile RoleRef without
removing the binding; handle update errors and fall back to recreate only if
update consistently fails after retries.
In `@pkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.go`:
- Around line 53-54: The test helper assertPlatformIngressSpec currently
declares an unused parameter `_ *v1beta1.InferenceService`; remove that unused
parameter from the function signature of assertPlatformIngressSpec and from its
counterpart in the OCP variant so both build-tag variants stay in sync, then
update any call sites to stop passing the extra argument (or alternatively, if
the InferenceService will be needed, use the parameter inside
assertPlatformIngressSpec in both variants); reference the symbol
assertPlatformIngressSpec and the unused parameter `_ *v1beta1.InferenceService`
when making the change.
In `@pkg/controller/v1beta1/inferenceservice/rawkube_platform_ocp_test.go`:
- Around line 199-405: The test embeds a huge inline Deployment struct in
expectedDeployment which harms maintainability; extract that literal into a
helper (e.g., makeExpectedDeployment or loadExpectedDeploymentFixture) that
accepts the dynamic inputs used in the test (predictorDeploymentKey, serviceKey,
isvc, replicas, defaultResource, gracePeriod, revisionHistory,
progressDeadlineSeconds) and returns *appsv1.Deployment, then replace the inline
expectedDeployment with a call to that helper or load from a small YAML/JSON
fixture; ensure the helper constructs the same fields referenced in assertions
(metadata name/namespace, pod template labels/annotations, containers including
InferenceServiceContainerName and KubeRbacContainerName, volumes, probes,
resources, and Strategy) so future changes to defaults can be updated in one
place.
- Around line 104-105: Replace all uses of context.TODO() in this test with the
existing ctx variable (context.Background()) to ensure consistent context usage;
for example change k8sClient.Create(context.TODO(), configMap) and
k8sClient.Delete(context.TODO(), configMap) as well as other k8sClient calls
that pass context.TODO() to use ctx instead, and update any helper calls within
the test that currently pass context.TODO() to accept ctx so the same context is
used throughout.
In
`@pkg/controller/v1beta1/inferenceservice/reconcilers/ingress/ocp_route_reconciler_test.go`:
- Around line 173-199: The test TestRawOCPRoute_ExposedWithAuth pre-sets the
route TLS termination to routev1.TLSTerminationReencrypt even though
admittedRoute already sets TLSTerminationEdge, and the test never asserts that
the reconciler updates the Route resource to the expected TLS when auth is
enabled; update the test to (1) ensure the initial route state is clear (use
admittedRoute as-is or explicitly set TLS to the scenario you want), (2) after
calling newOCPReconciler(...).Reconcile(...), fetch or inspect the
reconciler-updated Route object and assert route.Spec.TLS.Termination equals the
expected value for auth-enabled flows, and (3) keep the existing ISVC status
assertions (isvc.Status.URL and isvc.Status.Address checks) while adding the new
assertion referencing admittedRoute, TestRawOCPRoute_ExposedWithAuth,
route.Spec.TLS.Termination, newOCPReconciler, and constants.OauthProxyPort so
the test verifies the Route is corrected by the reconciler.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 66777641-ee1a-43bc-a102-9ba618bce980
📒 Files selected for processing (24)
Makefile.overrides.mkcharts/kserve-resources/files/kserve/resources.yamlconfig/overlays/odh/rbac/inferenceservice/role.yamlconfig/rbac/role.yamldocs/superpowers/plans/2026-04-13-route-crb-distro-hooks-progress.mdhack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.shhack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.shpkg/controller/v1beta1/inferenceservice/controller.gopkg/controller/v1beta1/inferenceservice/controller_setup_default.gopkg/controller/v1beta1/inferenceservice/controller_setup_ocp.gopkg/controller/v1beta1/inferenceservice/distro/controller_rbac_ocp.gopkg/controller/v1beta1/inferenceservice/platform_permissions_default.gopkg/controller/v1beta1/inferenceservice/platform_permissions_ocp.gopkg/controller/v1beta1/inferenceservice/platform_permissions_ocp_test.gopkg/controller/v1beta1/inferenceservice/rawkube_controller_test.gopkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.gopkg/controller/v1beta1/inferenceservice/rawkube_platform_ocp_test.gopkg/controller/v1beta1/inferenceservice/reconcilers/factory.gopkg/controller/v1beta1/inferenceservice/reconcilers/factory_default.gopkg/controller/v1beta1/inferenceservice/reconcilers/factory_ocp.gopkg/controller/v1beta1/inferenceservice/reconcilers/ingress/kube_ingress_reconciler.gopkg/controller/v1beta1/inferenceservice/reconcilers/ingress/ocp_route_reconciler.gopkg/controller/v1beta1/inferenceservice/reconcilers/ingress/ocp_route_reconciler_test.gopkg/controller/v1beta1/inferenceservice/utils/utils.go
💤 Files with no reviewable changes (1)
- pkg/controller/v1beta1/inferenceservice/utils/utils.go
|
/group-test |
|
Caution Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. Error details |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
pkg/controller/v1beta1/inferenceservice/controller.go (1)
746-748: Minor inconsistency:fmt.Errorfvserrors.Wrapf.This segment uses
fmt.Errorf("...: %w", err)while the rest of the file useserrors.Wrapf(err, "...")fromgithub.com/pkg/errors(e.g., lines 141, 374, 383). Both work, but mixing styles reduces grep-ability and stack trace consistency.Proposed alignment with existing style
if err := extendControllerSetup(mgr, ctrlBuilder); err != nil { - return fmt.Errorf("failed to extend controller setup: %w", err) + return errors.Wrap(err, "failed to extend controller setup") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/v1beta1/inferenceservice/controller.go` around lines 746 - 748, Replace the fmt.Errorf usage with the project's standard errors.Wrapf style to keep error-wrapping and stack traces consistent; specifically update the return in the extendControllerSetup call to use errors.Wrapf(err, "failed to extend controller setup") (the call referencing extendControllerSetup, mgr and ctrlBuilder) and ensure github.com/pkg/errors is imported if not already.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@pkg/controller/v1beta1/inferenceservice/reconcilers/ingress/ocp_route_reconciler.go`:
- Around line 79-84: When reconciling Routes (existingRoute retrieved via
r.client.Get for the InferenceService name), if a Route already exists but is
not owned by the current InferenceService (check owner references/ControllerRef
on existingRoute versus isvc UID), do not update or adopt it; instead reject the
reconcile by returning an error or requeue (e.g., fmt.Errorf("route %s/%s exists
and is not owned by InferenceService", existingRoute.Namespace,
existingRoute.Name)). Apply the same ownership guard wherever you create/update
Routes in this reconciler (the blocks around the initial Get and the
create/update logic referenced later in the file) so you only set
OwnerReferences or mutate Routes that are already controlled by the
InferenceService.
- Around line 126-136: When a Route is created or observed as not admitted (in
the branch handling routeNotFound and the similar branch around lines 154-161),
clear the stale external endpoint by resetting isvc.Status.URL to nil/empty
before setting the IngressReady condition and requeuing; update the code paths
around the creation logic that reference desiredRoute and the
isvc.Status.SetCondition(v1beta1.IngressReady, ...) call to explicitly remove
any previous isvc.Status.URL so the service stops publishing an out-of-date
ingress URL while the Route is not admitted.
- Around line 139-147: The code currently only merges desiredRoute.Annotations
into existingRoute.Annotations which means removed keys (e.g.
haproxy.router.openshift.io/timeout) never get deleted; update the
reconciliation in the block guarded by semanticRouteEquals (where
existingRoute.Spec/Labels are assigned) to fully reconcile annotations instead
of only merging: ensure existingRoute.Annotations contains exactly the
keys/values present in desiredRoute.Annotations for managed keys by removing any
keys present on existingRoute but absent from desiredRoute (or replace
existingRoute.Annotations with a shallow copy of desiredRoute.Annotations if no
other non-managed annotations must be preserved); apply the same change to the
other analogous reconciliation sites that use the same merge pattern.
---
Nitpick comments:
In `@pkg/controller/v1beta1/inferenceservice/controller.go`:
- Around line 746-748: Replace the fmt.Errorf usage with the project's standard
errors.Wrapf style to keep error-wrapping and stack traces consistent;
specifically update the return in the extendControllerSetup call to use
errors.Wrapf(err, "failed to extend controller setup") (the call referencing
extendControllerSetup, mgr and ctrlBuilder) and ensure github.com/pkg/errors is
imported if not already.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: e117d6be-d67b-425a-88a0-6f900926b51c
📒 Files selected for processing (3)
Makefilepkg/controller/v1beta1/inferenceservice/controller.gopkg/controller/v1beta1/inferenceservice/reconcilers/ingress/ocp_route_reconciler.go
| existingRoute := &routev1.Route{} | ||
| getErr := r.client.Get(ctx, types.NamespacedName{Name: isvc.Name, Namespace: isvc.Namespace}, existingRoute) | ||
| routeNotFound := apierr.IsNotFound(getErr) | ||
| if getErr != nil && !routeNotFound { | ||
| return ctrl.Result{}, fmt.Errorf("failed to get existing Route: %w", getErr) | ||
| } |
There was a problem hiding this comment.
Guard against same-name Route takeover.
If a Route named isvc.Name already exists but is not controlled by this InferenceService, this code still updates it and can mark the ISVC ready from that foreign object's admission state. Reject the reconcile instead of mutating or trusting an unowned Route.
Suggested fix
existingRoute := &routev1.Route{}
getErr := r.client.Get(ctx, types.NamespacedName{Name: isvc.Name, Namespace: isvc.Namespace}, existingRoute)
routeNotFound := apierr.IsNotFound(getErr)
if getErr != nil && !routeNotFound {
return ctrl.Result{}, fmt.Errorf("failed to get existing Route: %w", getErr)
}
+if !routeNotFound {
+ controller := metav1.GetControllerOf(existingRoute)
+ if controller == nil || controller.UID != isvc.UID {
+ return ctrl.Result{}, fmt.Errorf(
+ "route %s/%s already exists and is not controlled by InferenceService %s/%s",
+ existingRoute.Namespace, existingRoute.Name, isvc.Namespace, isvc.Name,
+ )
+ }
+}As per coding guidelines, "Set OwnerReferences on all child resources".
Also applies to: 126-175
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@pkg/controller/v1beta1/inferenceservice/reconcilers/ingress/ocp_route_reconciler.go`
around lines 79 - 84, When reconciling Routes (existingRoute retrieved via
r.client.Get for the InferenceService name), if a Route already exists but is
not owned by the current InferenceService (check owner references/ControllerRef
on existingRoute versus isvc UID), do not update or adopt it; instead reject the
reconcile by returning an error or requeue (e.g., fmt.Errorf("route %s/%s exists
and is not owned by InferenceService", existingRoute.Namespace,
existingRoute.Name)). Apply the same ownership guard wherever you create/update
Routes in this reconciler (the blocks around the initial Get and the
create/update logic referenced later in the file) so you only set
OwnerReferences or mutate Routes that are already controlled by the
InferenceService.
|
/retest-required |
|
/group-test |
KServe PR opendatahub-io/kserve#1404 moves OpenShift Route creation and system:auth-delegator ClusterRoleBinding management for raw InferenceServices into kserve behind distro build tags. This removes the now-redundant reconcilers from odh-model-controller to avoid dual-controller conflicts (infinite reconciliation loops from label fights) once both ship in the same ODH release. Removed: - KserveRawRouteReconciler and KserveRawClusterRoleBindingReconciler - Supporting handlers (RouteHandler, ClusterRoleBindingHandler) - Comparators (route, clusterrolebinding) - SetOpenshiftRouteTimeoutForIsvc utility - Integration tests for Route/CRB lifecycle - Owns watches for Route and ClusterRoleBinding - Orphaned constants (KserveNetworkVisibility, LabelEnableKserveRawRoute, KserveServiceAccountName, DefaultOpenshiftRouteTimeout, RouteTimeoutAnnotationKey) Retained: - RBAC markers for routes and clusterrolebindings (cleanup deferred) - Metrics, ServiceMonitor, Dashboard, and KEDA reconcilers - Finalizer and namespace cleanup orchestration - Scheme registration for routev1 Ref: RHOAIENG-56270 Signed-off-by: Vedant Mahabaleshwarkar <vmahabal@redhat.com>
7fff852 to
9e2dc5b
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
pkg/controller/v1beta1/inferenceservice/reconcilers/ingress/ocp_route_reconciler.go (1)
79-84:⚠️ Potential issue | 🟠 MajorRoute ownership not validated before update path (CWE-284: Improper Access Control).
When a Route already exists (line 139), the code updates it without verifying ownership. If a Route with the same name exists but is owned by another controller/resource, this reconciler will overwrite it, potentially disrupting other workloads.
Past review flagged this — still unaddressed.
Add ownership check before update
existingRoute := &routev1.Route{} getErr := r.client.Get(ctx, types.NamespacedName{Name: isvc.Name, Namespace: isvc.Namespace}, existingRoute) routeNotFound := apierr.IsNotFound(getErr) if getErr != nil && !routeNotFound { return ctrl.Result{}, fmt.Errorf("failed to get existing Route: %w", getErr) } +if !routeNotFound { + if controller := metav1.GetControllerOf(existingRoute); controller == nil || controller.UID != isvc.UID { + return ctrl.Result{}, fmt.Errorf("route %s/%s exists but is not owned by InferenceService %s", + existingRoute.Namespace, existingRoute.Name, isvc.Name) + } +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/v1beta1/inferenceservice/reconcilers/ingress/ocp_route_reconciler.go` around lines 79 - 84, The reconciler updates an existing Route (existingRoute) without verifying ownership, so before performing any update/patch (the code path after r.client.Get where existingRoute is modified) check that the Route is owned by the InferenceService (isvc) by inspecting existingRoute.OwnerReferences (or using metav1.IsControlledBy / controllerutil.ReferencesEqual) and matching the owner UID/Kind/Name to isvc; if the Route is not controlled by isvc, do not update it — instead return a clear error or requeue with no-op to avoid overwriting someone else’s resource. Ensure this ownership check occurs immediately after fetching existingRoute and before any r.client.Update / r.client.Patch calls.
🧹 Nitpick comments (4)
pkg/controller/v1beta1/inferenceservice/rawkube_platform_ocp_test.go (2)
407-409: Assertion message will show raw diff on failure.
cmp.Diffoutput is passed toEqual()matcher's diff message, but on equality the diff is empty string. Consider using a custom matcher or separate assertion for clearer failure output.Clearer assertion pattern
- Expect(actualDeployment.Spec).To(Equal(expectedDeployment.Spec), - cmp.Diff(expectedDeployment.Spec, actualDeployment.Spec, cmpopts.SortMaps(func(a, b string) bool { return a < b }))) + diff := cmp.Diff(expectedDeployment.Spec, actualDeployment.Spec, cmpopts.SortMaps(func(a, b string) bool { return a < b })) + Expect(diff).To(BeEmpty(), "Deployment spec mismatch")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/v1beta1/inferenceservice/rawkube_platform_ocp_test.go` around lines 407 - 409, The test currently calls Expect(actualDeployment.Spec).To(Equal(expectedDeployment.Spec), cmp.Diff(...)) which hides the diff on success; replace this with an assertion that explicitly checks the cmp.Diff result so failures show a meaningful diff: compute diff := cmp.Diff(expectedDeployment.Spec, actualDeployment.Spec, cmpopts.SortMaps(...)) and assert Expect(diff).To(BeEmpty(), "deployment spec mismatch:\n%s", diff) (or alternatively assert Expect(diff).To(Equal(""), ...)). Update references to actualDeployment.Spec, expectedDeployment.Spec, cmp.Diff, and cmpopts.SortMaps accordingly.
104-105: Inconsistent context usage.Line 104 uses
context.TODO()while the surrounding test usesctx := context.Background()(line 145). Use consistent context throughout for clearer test debugging.Use consistent context
- Expect(k8sClient.Create(context.TODO(), configMap)).NotTo(HaveOccurred()) - defer k8sClient.Delete(context.TODO(), configMap) + Expect(k8sClient.Create(ctx, configMap)).NotTo(HaveOccurred()) + defer k8sClient.Delete(ctx, configMap)Note:
ctxis declared later at line 145. Either move thectxdeclaration earlier or usecontext.Background()consistently.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/v1beta1/inferenceservice/rawkube_platform_ocp_test.go` around lines 104 - 105, The test uses context.TODO() for k8sClient.Create/Delete but the rest of the test uses ctx := context.Background(); make this consistent by using the same ctx variable for those calls: either move the ctx := context.Background() declaration earlier so k8sClient.Create(context.TODO(), configMap) and defer k8sClient.Delete(...) become k8sClient.Create(ctx, configMap) and defer k8sClient.Delete(ctx, configMap), or replace the later ctx declaration with context.Background() and keep the original calls; update the k8sClient.Create and defer k8sClient.Delete invocations accordingly and ensure the unique identifiers k8sClient.Create, k8sClient.Delete and ctx are used consistently.pkg/controller/v1beta1/inferenceservice/platform_permissions_ocp.go (2)
161-173: Subject comparison is order-sensitive.
equalCRBcompares subjects by index position. Currently safe since each CRB has exactly one subject, but if multi-subject scenarios are introduced, this will cause unnecessary updates when subjects are reordered.Optional: Use set-based comparison for future-proofing
func equalCRB(a, b *rbacv1.ClusterRoleBinding) bool { if a.RoleRef != b.RoleRef { return false } if len(a.Subjects) != len(b.Subjects) { return false } - for i := range a.Subjects { - if a.Subjects[i] != b.Subjects[i] { + subjectSet := make(map[rbacv1.Subject]struct{}, len(a.Subjects)) + for _, s := range a.Subjects { + subjectSet[s] = struct{}{} + } + for _, s := range b.Subjects { + if _, ok := subjectSet[s]; !ok { return false } } return true }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/v1beta1/inferenceservice/platform_permissions_ocp.go` around lines 161 - 173, The equalCRB function currently compares rbacv1.ClusterRoleBinding.RoleRef by value and compares a.Subjects to b.Subjects by index, which is order-sensitive; change equalCRB to compare subjects as an order-independent set (e.g., build a map keyed by each Subject's canonical identity such as Kind/Name/Namespace/APIGroup, verify both maps have same size and same keys) so reorders don't trigger updates; keep the existing RoleRef comparison but replace the indexed loop over a.Subjects/b.Subjects with the set/map-based membership checks.
67-83: ClusterRoleBinding lacks OwnerReference — manual cleanup required on ISVC deletion.ClusterRoleBindings are cluster-scoped and cannot have namespace-scoped OwnerReferences. The current implementation relies on
otherISVCsNeedAuthCRBduring reconciliation, but if all ISVCs in a namespace are deleted simultaneously or the controller is unavailable, orphaned CRBs may persist.Consider adding a finalizer to the
InferenceServiceto ensure CRB cleanup during deletion, or document the orphan risk. As per coding guidelines, "Set OwnerReferences on all child resources" — this is a known limitation for cluster-scoped resources.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/v1beta1/inferenceservice/platform_permissions_ocp.go` around lines 67 - 83, The ClusterRoleBinding created in platform_permissions_ocp.go (constructed as desired with crbName, Subjects using saName and isvc.Namespace, RoleRef to system:auth-delegator) is cluster-scoped and cannot use namespace OwnerReferences, so add a deletion safeguard: implement a finalizer on the InferenceService (e.g., add/remove a finalizer string in the Reconcile flow) and update the deletion handling to explicitly delete the ClusterRoleBinding when the InferenceService is being deleted (use the same crbName lookup and ensure otherISVCsNeedAuthCRB is consulted before finalizer removal), or alternatively document the orphan risk; ensure you set/clear the finalizer in create/update paths and only remove it after successfully deleting the ClusterRoleBinding.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@pkg/controller/v1beta1/inferenceservice/reconcilers/ingress/ocp_route_reconciler.go`:
- Around line 139-147: The update path currently only merges
desiredRoute.Annotations into existingRoute.Annotations which causes removed
annotations to persist; change the logic so existingRoute.Annotations are
synchronized bidirectionally with desiredRoute.Annotations (i.e., replace or
prune existing keys not present in desired and handle nil desired annotations by
clearing existing annotations), and update semanticRouteEquals to check for
exact equality of annotation sets (both presence and values) rather than only
ensuring desired keys exist in existing; use the existingRoute and desiredRoute
variables and the semanticRouteEquals function to locate and apply this fix.
- Around line 126-136: When the Route is created but not yet admitted
(routeNotFound branch creating desiredRoute) and when admission fails (the block
setting IngressReady to False with Reason "RouteNotAdmitted"), clear the stale
isvc.Status.URL before returning; update the same places where you set
isvc.Status.SetCondition(..., Reason: "RouteNotAdmitted") to also set
isvc.Status.URL = nil or zero-value so previously-admitted URLs aren't left
visible. Locate these changes around the route creation branch (references:
routeNotFound, desiredRoute, routeAdmissionRequeueDelay) and the
admission-failure branch (references: IngressReady condition, Reason
"RouteNotAdmitted") and ensure the status URL is cleared prior to returning with
the requeue result.
---
Duplicate comments:
In
`@pkg/controller/v1beta1/inferenceservice/reconcilers/ingress/ocp_route_reconciler.go`:
- Around line 79-84: The reconciler updates an existing Route (existingRoute)
without verifying ownership, so before performing any update/patch (the code
path after r.client.Get where existingRoute is modified) check that the Route is
owned by the InferenceService (isvc) by inspecting existingRoute.OwnerReferences
(or using metav1.IsControlledBy / controllerutil.ReferencesEqual) and matching
the owner UID/Kind/Name to isvc; if the Route is not controlled by isvc, do not
update it — instead return a clear error or requeue with no-op to avoid
overwriting someone else’s resource. Ensure this ownership check occurs
immediately after fetching existingRoute and before any r.client.Update /
r.client.Patch calls.
---
Nitpick comments:
In `@pkg/controller/v1beta1/inferenceservice/platform_permissions_ocp.go`:
- Around line 161-173: The equalCRB function currently compares
rbacv1.ClusterRoleBinding.RoleRef by value and compares a.Subjects to b.Subjects
by index, which is order-sensitive; change equalCRB to compare subjects as an
order-independent set (e.g., build a map keyed by each Subject's canonical
identity such as Kind/Name/Namespace/APIGroup, verify both maps have same size
and same keys) so reorders don't trigger updates; keep the existing RoleRef
comparison but replace the indexed loop over a.Subjects/b.Subjects with the
set/map-based membership checks.
- Around line 67-83: The ClusterRoleBinding created in
platform_permissions_ocp.go (constructed as desired with crbName, Subjects using
saName and isvc.Namespace, RoleRef to system:auth-delegator) is cluster-scoped
and cannot use namespace OwnerReferences, so add a deletion safeguard: implement
a finalizer on the InferenceService (e.g., add/remove a finalizer string in the
Reconcile flow) and update the deletion handling to explicitly delete the
ClusterRoleBinding when the InferenceService is being deleted (use the same
crbName lookup and ensure otherISVCsNeedAuthCRB is consulted before finalizer
removal), or alternatively document the orphan risk; ensure you set/clear the
finalizer in create/update paths and only remove it after successfully deleting
the ClusterRoleBinding.
In `@pkg/controller/v1beta1/inferenceservice/rawkube_platform_ocp_test.go`:
- Around line 407-409: The test currently calls
Expect(actualDeployment.Spec).To(Equal(expectedDeployment.Spec), cmp.Diff(...))
which hides the diff on success; replace this with an assertion that explicitly
checks the cmp.Diff result so failures show a meaningful diff: compute diff :=
cmp.Diff(expectedDeployment.Spec, actualDeployment.Spec, cmpopts.SortMaps(...))
and assert Expect(diff).To(BeEmpty(), "deployment spec mismatch:\n%s", diff) (or
alternatively assert Expect(diff).To(Equal(""), ...)). Update references to
actualDeployment.Spec, expectedDeployment.Spec, cmp.Diff, and cmpopts.SortMaps
accordingly.
- Around line 104-105: The test uses context.TODO() for k8sClient.Create/Delete
but the rest of the test uses ctx := context.Background(); make this consistent
by using the same ctx variable for those calls: either move the ctx :=
context.Background() declaration earlier so k8sClient.Create(context.TODO(),
configMap) and defer k8sClient.Delete(...) become k8sClient.Create(ctx,
configMap) and defer k8sClient.Delete(ctx, configMap), or replace the later ctx
declaration with context.Background() and keep the original calls; update the
k8sClient.Create and defer k8sClient.Delete invocations accordingly and ensure
the unique identifiers k8sClient.Create, k8sClient.Delete and ctx are used
consistently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 1f512a26-c5ad-4c82-a070-975ebcfffaab
📒 Files selected for processing (21)
MakefileMakefile.overrides.mkconfig/overlays/odh/rbac/inferenceservice/role.yamldocs/superpowers/plans/2026-04-13-route-crb-distro-hooks-progress.mdpkg/controller/v1beta1/inferenceservice/controller.gopkg/controller/v1beta1/inferenceservice/controller_setup_default.gopkg/controller/v1beta1/inferenceservice/controller_setup_ocp.gopkg/controller/v1beta1/inferenceservice/distro/controller_rbac_ocp.gopkg/controller/v1beta1/inferenceservice/platform_permissions_default.gopkg/controller/v1beta1/inferenceservice/platform_permissions_ocp.gopkg/controller/v1beta1/inferenceservice/platform_permissions_ocp_test.gopkg/controller/v1beta1/inferenceservice/rawkube_controller_test.gopkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.gopkg/controller/v1beta1/inferenceservice/rawkube_platform_ocp_test.gopkg/controller/v1beta1/inferenceservice/reconcilers/factory.gopkg/controller/v1beta1/inferenceservice/reconcilers/factory_default.gopkg/controller/v1beta1/inferenceservice/reconcilers/factory_ocp.gopkg/controller/v1beta1/inferenceservice/reconcilers/ingress/kube_ingress_reconciler.gopkg/controller/v1beta1/inferenceservice/reconcilers/ingress/ocp_route_reconciler.gopkg/controller/v1beta1/inferenceservice/reconcilers/ingress/ocp_route_reconciler_test.gopkg/controller/v1beta1/inferenceservice/utils/utils.go
💤 Files with no reviewable changes (1)
- pkg/controller/v1beta1/inferenceservice/utils/utils.go
✅ Files skipped from review due to trivial changes (9)
- pkg/controller/v1beta1/inferenceservice/platform_permissions_default.go
- pkg/controller/v1beta1/inferenceservice/controller_setup_default.go
- pkg/controller/v1beta1/inferenceservice/reconcilers/factory_default.go
- pkg/controller/v1beta1/inferenceservice/controller.go
- config/overlays/odh/rbac/inferenceservice/role.yaml
- pkg/controller/v1beta1/inferenceservice/distro/controller_rbac_ocp.go
- pkg/controller/v1beta1/inferenceservice/platform_permissions_ocp_test.go
- pkg/controller/v1beta1/inferenceservice/reconcilers/ingress/ocp_route_reconciler_test.go
- pkg/controller/v1beta1/inferenceservice/rawkube_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
- Makefile
- pkg/controller/v1beta1/inferenceservice/reconcilers/factory_ocp.go
- pkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.go
- pkg/controller/v1beta1/inferenceservice/reconcilers/factory.go
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.go (1)
56-58: Avoid single-shot spec comparison after fetch; assert spec convergence with pollingLine 108 performs a one-time equality check after only waiting for existence. In async reconciliation flows, this can flake when object creation precedes final spec update. Poll the spec equality itself.
Proposed refactor
func assertPlatformIngressSpec(ctx context.Context, serviceKey types.NamespacedName, serviceName string, _ *v1beta1.InferenceService) { pathType := netv1.PathTypePrefix - actualIngress := &netv1.Ingress{} - Eventually(func() error { - return k8sClient.Get(ctx, serviceKey, actualIngress) - }, timeout, interval).Should(Succeed()) - expectedIngress := netv1.Ingress{ Spec: netv1.IngressSpec{ Rules: []netv1.IngressRule{ @@ }, }, } - Expect(actualIngress.Spec).To(Equal(expectedIngress.Spec)) + Eventually(func() netv1.IngressSpec { + actualIngress := &netv1.Ingress{} + Expect(k8sClient.Get(ctx, serviceKey, actualIngress)).To(Succeed()) + return actualIngress.Spec + }, timeout, interval).Should(Equal(expectedIngress.Spec)) }As per coding guidelines, "REVIEW PRIORITIES: ... 3. Bug-prone patterns and error handling gaps".
Also applies to: 108-108
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.go` around lines 56 - 58, The test currently only polls for the resource to exist then performs a single-shot spec equality which can flake; change the one-time equality to poll until the spec converges by wrapping the Get+comparison inside an Eventually that retries with the same timeout and interval. Specifically, replace the single equality/assert after the initial Eventually that fetches actualIngress with an Eventually(func() bool { if err := k8sClient.Get(ctx, serviceKey, actualIngress); err != nil { return false }; return reflect.DeepEqual(actualIngress.Spec, expectedIngress.Spec) }, timeout, interval).Should(BeTrue()) (or the Gomega equivalent you use) so the test keeps fetching actualIngress and asserts actualIngress.Spec equals expectedIngress.Spec until it converges. Ensure you reference actualIngress, serviceKey, k8sClient, timeout and interval in the new poll.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.go`:
- Around line 40-43: Change the Consistently checks to return errors instead of
bools so unexpected API/client errors surface to Gomega: replace the bool lambda
that calls k8sClient.Get and uses apierr.IsNotFound with a func() error that
returns nil when apierr.IsNotFound(err) (success) and returns the actual err for
any other case, then assert with .Should(Succeed(), "Ingress %s should not
exist", serviceKey.Name); apply the same change to the other Consistently block
that checks Ingress existence so k8sClient.Get, apierr.IsNotFound, and the
Consistently matcher expose real errors rather than collapsing to false.
---
Nitpick comments:
In `@pkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.go`:
- Around line 56-58: The test currently only polls for the resource to exist
then performs a single-shot spec equality which can flake; change the one-time
equality to poll until the spec converges by wrapping the Get+comparison inside
an Eventually that retries with the same timeout and interval. Specifically,
replace the single equality/assert after the initial Eventually that fetches
actualIngress with an Eventually(func() bool { if err := k8sClient.Get(ctx,
serviceKey, actualIngress); err != nil { return false }; return
reflect.DeepEqual(actualIngress.Spec, expectedIngress.Spec) }, timeout,
interval).Should(BeTrue()) (or the Gomega equivalent you use) so the test keeps
fetching actualIngress and asserts actualIngress.Spec equals
expectedIngress.Spec until it converges. Ensure you reference actualIngress,
serviceKey, k8sClient, timeout and interval in the new poll.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 8ab0dfa1-475c-4e96-8ff6-2ac5050175ff
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum,!**/*.sum
📒 Files selected for processing (5)
pkg/controller/v1beta1/inferenceservice/controller_setup_default.gopkg/controller/v1beta1/inferenceservice/platform_permissions_default.gopkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.gopkg/controller/v1beta1/inferenceservice/rawkube_platform_ocp_test.gopkg/controller/v1beta1/inferenceservice/reconcilers/factory_default.go
✅ Files skipped from review due to trivial changes (4)
- pkg/controller/v1beta1/inferenceservice/controller_setup_default.go
- pkg/controller/v1beta1/inferenceservice/platform_permissions_default.go
- pkg/controller/v1beta1/inferenceservice/reconcilers/factory_default.go
- pkg/controller/v1beta1/inferenceservice/rawkube_platform_ocp_test.go
| Consistently(func() bool { | ||
| err := k8sClient.Get(ctx, serviceKey, &netv1.Ingress{}) | ||
| return apierr.IsNotFound(err) | ||
| }, fastTimeout, interval).Should(BeTrue(), "Ingress %s should not exist", serviceKey.Name) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "rawkube_platform_default_test.go" | head -20Repository: opendatahub-io/kserve
Length of output: 139
🏁 Script executed:
cat -n ./pkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.go | sed -n '35,55p'Repository: opendatahub-io/kserve
Length of output: 1124
🏁 Script executed:
cat -n ./pkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.go | sed -n '1,20p'Repository: opendatahub-io/kserve
Length of output: 817
🏁 Script executed:
cat -n ./pkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.go | sed -n '20,32p'Repository: opendatahub-io/kserve
Length of output: 381
🏁 Script executed:
wc -l ./pkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.goRepository: opendatahub-io/kserve
Length of output: 143
🏁 Script executed:
cat -n ./pkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.go | sed -n '53,90p'Repository: opendatahub-io/kserve
Length of output: 1569
🏁 Script executed:
cat -n ./pkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.goRepository: opendatahub-io/kserve
Length of output: 4400
Return errors from assertion helpers instead of collapsing to boolean
Lines 40–43 and 47–50 mask API failures (client errors, RBAC denials, network issues) as a simple false return. This makes test failures ambiguous and hides real regressions. Keep NotFound as success, but surface unexpected errors directly to the Gomega matcher.
Proposed fix
func assertPlatformIngressDoesNotExist(ctx context.Context, serviceKey types.NamespacedName) {
- Consistently(func() bool {
- err := k8sClient.Get(ctx, serviceKey, &netv1.Ingress{})
- return apierr.IsNotFound(err)
- }, fastTimeout, interval).Should(BeTrue(), "Ingress %s should not exist", serviceKey.Name)
+ Consistently(func() error {
+ err := k8sClient.Get(ctx, serviceKey, &netv1.Ingress{})
+ if apierr.IsNotFound(err) {
+ return nil
+ }
+ if err != nil {
+ return err
+ }
+ return fmt.Errorf("ingress %s/%s unexpectedly exists", serviceKey.Namespace, serviceKey.Name)
+ }, fastTimeout, interval).Should(Succeed())
}
func assertPlatformIngressDeleted(ctx context.Context, serviceKey types.NamespacedName) {
- Eventually(func() bool {
- err := k8sClient.Get(ctx, serviceKey, &netv1.Ingress{})
- return apierr.IsNotFound(err)
- }, timeout, interval).Should(BeTrue(), "Ingress %s should be deleted", serviceKey.Name)
+ Eventually(func() error {
+ err := k8sClient.Get(ctx, serviceKey, &netv1.Ingress{})
+ if apierr.IsNotFound(err) {
+ return nil
+ }
+ if err != nil {
+ return err
+ }
+ return fmt.Errorf("ingress %s/%s still exists", serviceKey.Namespace, serviceKey.Name)
+ }, timeout, interval).Should(Succeed())
}Also applies to lines 47–50.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/controller/v1beta1/inferenceservice/rawkube_platform_default_test.go`
around lines 40 - 43, Change the Consistently checks to return errors instead of
bools so unexpected API/client errors surface to Gomega: replace the bool lambda
that calls k8sClient.Get and uses apierr.IsNotFound with a func() error that
returns nil when apierr.IsNotFound(err) (success) and returns the actual err for
any other case, then assert with .Should(Succeed(), "Ingress %s should not
exist", serviceKey.Name); apply the same change to the other Consistently block
that checks Ingress existence so k8sClient.Get, apierr.IsNotFound, and the
Consistently matcher expose real errors rather than collapsing to false.
7331b8b to
6bf2415
Compare
|
/retest |
…ro build tags Moves OpenShift Route creation and auth-delegation ClusterRoleBinding management for raw InferenceServices from odh-model-controller into kserve behind `//go:build distro` tags using three hook pairs (_default.go / _ocp.go): 1. Factory hook — selects RawOCPRouteReconciler instead of RawIngressReconciler in distro builds (when Gateway API is disabled) 2. Controller setup hook — registers Route API scheme and Owns watch 3. Platform permissions hook — manages system:auth-delegator CRBs for auth-enabled ISVCs Reverts shared files (kube_ingress_reconciler.go, utils.go) to match upstream kserve/master, eliminating ODH-specific modifications that caused rebase friction. Adapts rawkube_controller_test.go for the distro pattern by splitting platform-specific ingress assertions into build-tagged test helper files. Moves ODH-only auth tests to the distro test file. All shared tests use higher-level helpers that are no-ops in distro mode for cluster-local ISVCs (which don't create Routes). Includes 18 unit tests (10 Route reconciler, 8 CRB) and fixes pre-existing unparam lint warnings in distro test files. Ref: RHOAIENG-56270
- Fix annotation reconciliation loop: compare only managed annotations in semanticRouteEquals and merge (not replace) annotations on update, preventing infinite loop with OpenShift router controller - Remove unconstrained clusterrolebindings RBAC from base role by fixing Makefile controller-gen path to not recurse into distro/ subpackage - Clear isvc.Status.URL and Address on force-stop to avoid stale data - Add status update on reconcilePlatformPermissions failure for consistency - Add warning log when setRouteTargetPort falls back to first port
…otations - Change route timeout from max(predictor, transformer) to sum(predictor, transformer, explainer) to account for full request chain latency - Clear isvc.Status.URL when Route is not yet admitted to prevent stale URLs from being visible to clients - Reconcile managed Route annotations bi-directionally so removed annotations (e.g. haproxy timeout) are pruned from existing Routes instead of persisting forever
25c5e8d to
4d398c2
Compare
Wire the inferenceservice distro RBAC into overlays/odh so kustomize output includes the missing clusterrolebinding permissions required by the controller; include precommit-generated go.sum updates. Signed-off-by: Vedant Mahabaleshwarkar <vmahabal@redhat.com>
4d398c2 to
b08f02d
Compare
|
/retest |
|
@VedantMahabaleshwarkar: The following test has Failed: OCI Artifact Browser URLInspecting Test Artifacts ManuallyTo inspect your test artifacts manually, follow these steps:
mkdir -p oras-artifacts
cd oras-artifacts
oras pull quay.io/opendatahub/odh-ci-artifacts:kserve-group-test-78f86 |
|
@VedantMahabaleshwarkar: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What this PR does / why we need it:
Moves OpenShift Route creation and auth-delegation ClusterRoleBinding management for raw InferenceServices from odh-model-controller into kserve behind
//go:build distrotags using three hook pairs (_default.go/_ocp.go):RawOCPRouteReconcilerinstead ofRawIngressReconcilerin distro builds (when Gateway API is disabled)Owns(&routev1.Route{})watchsystem:auth-delegatorClusterRoleBindings for auth-enabled ISVCsReverts shared files (
kube_ingress_reconciler.go,utils.go) to match upstream kserve/master, eliminating ODH-specific modifications that caused rebase friction.Adapts
rawkube_controller_test.gofor the distro pattern by splitting platform-specific ingress assertions into build-tagged test helper files. Moves ODH-only auth tests to the distro test file. All shared tests use higher-level helpers that are no-ops in distro mode for cluster-local ISVCs (which don't create Routes).Which issue(s) this PR fixes:
Ref: RHOAIENG-56270
Feature/Issue validation/testing:
RawOCPRouteReconciler(exposed/cluster-local, auth/no-auth, stopped ISVC, label removal, transformer routing, timeout, route update, not-admitted requeue)reconcilePlatformPermissions(CRB create/update/delete lifecycle, SA variants, deleting-ISVC exclusion)make test GOTAGS="")make test GOTAGS="distro")make test GOTAGS=""make test GOTAGS="distro"Special notes for your reviewer:
assertPlatformIngressExists, etc.) inrawkube_platform_ocp_test.goare currently no-ops for shared tests that create cluster-local ISVCs (which don't produce Routes in distro mode). Full Route assertion coverage for non-auth exposed ISVCs is planned as follow-up work.clusterrolebindingsCRUD) are only in the distro overlay RBAC, not in the upstreamconfig/rbac/role.yaml.Companion PR (odh-model-controller):
The corresponding removal of the now-redundant
KserveRawRouteReconcilerandKserveRawClusterRoleBindingReconcilerfrom odh-model-controller is in opendatahub-io/odh-model-controller#803. Both PRs must ship in the same ODH release to avoid either:isvc.Labelsvs{"inferenceservice-name": isvc.Name})Upgrade safety: Existing Routes (owned by the ISVC via
SetControllerReference) and CRBs (cluster-scoped, identical naming<ns>-<sa>-auth-delegator) survive the transition and are adopted by the new kserve distro reconciler.Checklist:
Release note:
Summary by CodeRabbit
New Features
Behavior Changes
Tests
Documentation