fix: skip INFERENCE_SERVICE_NAME injection on pre-existing deployments to avoid pod restart on upgrade (RHOAIENG-59268)#1435
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 28 minutes and 2 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Central YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThe reconcile logic for predictor and transformer pod specs now conditionally injects the Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Security & Code Quality Issues
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/controller/v1beta1/inferenceservice/components/predictor.go`:
- Around line 160-162: The transformer currently unconditionally injects
INFERENCE_SERVICE_NAME causing unnecessary rolling restarts; update the
transformer logic to mirror the predictor behavior by checking the existing
Deployment before calling AddEnvVarToPodSpec: if there's no existing Deployment
(new ISVC) allow injecting, but if an existing Deployment exists only call
AddEnvVarToPodSpec when that Deployment's PodSpec already contains the
INFERENCE_SERVICE_NAME env var. Locate the transformer component code that
invokes AddEnvVarToPodSpec and add the same existence-and-env-presence guard
used in the predictor (check existing Deployment's containers for
INFERENCE_SERVICE_NAME) so injection only happens when safe.
- Around line 163-177: The current logic treats any Get error as "no deployment"
and only checks a single container name, causing incorrect injection decisions;
update p.client.Get(...) error handling to return the error when errGet != nil
and !apierrors.IsNotFound(errGet) (fail closed), and when the Deployment exists
(errGet == nil) compute injectEnvVar by iterating the new pod spec containers
(e.g., podSpec.Containers or the variable used to build the Deployment) and for
each new container find the matching container in
existingDeployment.Spec.Template.Spec.Containers by name and verify
constants.InferenceServiceNameEnvVarKey is present; set injectEnvVar = true only
if every new container has that env var in the existing deployment, otherwise
set injectEnvVar = false and log the skip using p.Log.Info; refer to
constants.PredictorServiceName and constants.InferenceServiceContainerName where
appropriate.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: da29cca9-e627-473b-9bd0-722523d798c6
📒 Files selected for processing (1)
pkg/controller/v1beta1/inferenceservice/components/predictor.go
|
/retest |
spolti
left a comment
There was a problem hiding this comment.
Please add some tests as well.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/controller/v1beta1/inferenceservice/components/predictor.go (1)
161-189:⚠️ Potential issue | 🟠 MajorCheck scope (single container) and inject scope (all containers) still disagree.
The existing-deployment probe only inspects
constants.InferenceServiceContainerName(line 173), but the injection on lines 183-188 walks every entry inpodSpec.Containers. In collocation mode (predictor + transformer sidecar) the decision is therefore made on the kserve-container alone and then applied to sidecars whose existing state was never examined. Concretely, if the sidecar in the current Deployment doesn't carryINFERENCE_SERVICE_NAMEbut the kserve-container does, this path setsinjectEnvVar = trueand mutates the sidecar's env — changing the pod template hash and causing the very upgrade restart this PR is trying to avoid.Either narrow the injection to
constants.InferenceServiceContainerNameto match the check, or (preferred) build a per-container map from the existing Deployment and gate eachAddEnvVarToPodSpeccall against its own container's prior state. A concrete diff was proposed in an earlier review round and still applies.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/v1beta1/inferenceservice/components/predictor.go` around lines 161 - 189, The code decides whether to inject INFERENCE_SERVICE_NAME by inspecting only constants.InferenceServiceContainerName in existingDeployment but then applies injection to every container in podSpec.Containers, which can wrongly mutate sidecars and trigger restarts; fix by building a per-container map from existingDeployment (map container name -> hasEnv) using utils.GetEnvVarValue, then in the injection loop call isvcutils.AddEnvVarToPodSpec only for containers where the map shows the env is missing (or alternatively restrict injection to constants.InferenceServiceContainerName to match the check); update references to existingDeployment, constants.InferenceServiceContainerName, injectEnvVar, podSpec.Containers, and isvcutils.AddEnvVarToPodSpec accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/controller/v1beta1/inferenceservice/components/predictor.go`:
- Around line 161-189: The code decides whether to inject INFERENCE_SERVICE_NAME
by inspecting only constants.InferenceServiceContainerName in existingDeployment
but then applies injection to every container in podSpec.Containers, which can
wrongly mutate sidecars and trigger restarts; fix by building a per-container
map from existingDeployment (map container name -> hasEnv) using
utils.GetEnvVarValue, then in the injection loop call
isvcutils.AddEnvVarToPodSpec only for containers where the map shows the env is
missing (or alternatively restrict injection to
constants.InferenceServiceContainerName to match the check); update references
to existingDeployment, constants.InferenceServiceContainerName, injectEnvVar,
podSpec.Containers, and isvcutils.AddEnvVarToPodSpec accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: f6d1ccc8-a343-488b-b79b-901cb3a52fbb
📒 Files selected for processing (2)
pkg/controller/v1beta1/inferenceservice/components/predictor.gopkg/controller/v1beta1/inferenceservice/components/transformer.go
|
/retest-required |
|
/retest |
|
/retest-required |
…s to avoid pod restart on upgrade (RHOAIENG-59268) When upgrading from RHOAI 3.3.2 to 3.4, the new controller unconditionally injected INFERENCE_SERVICE_NAME into all RawDeployment pod templates, causing a rolling restart of every InferenceService in the cluster simultaneously. The fix checks whether the existing Deployment already has the env var. If it does not (pre-upgrade state), injection is skipped to preserve the pod template hash and avoid the restart. New ISVCs always get the env var on creation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Andres Llausas <allausas@redhat.com>
…n (RHOAIENG-59268) - Use apierrors.IsNotFound to distinguish missing deployment from real API errors; return error on non-NotFound failures instead of silently injecting - Apply the same INFERENCE_SERVICE_NAME injection guard to the transformer component to prevent rolling restarts on upgrade Signed-off-by: Andres Llausas <allausas@redhat.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lper (RHOAIENG-59268) Move the upgrade-safety check into shouldInjectInferenceServiceName in component.go so predictor and transformer share the same logic without duplication. Signed-off-by: Andres Llausas <allausas@redhat.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…9268) Covers: new ISVC (no deployment), pre-upgrade (env var absent → skip), post-upgrade (env var present → inject), and unmatched container name. Signed-off-by: Andres Llausas <allausas@redhat.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Andres Llausas <allausas@redhat.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Andres Llausas <allausas@redhat.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
f1f4c8f to
89a77a1
Compare
|
/retest-required |
|
/retest |
1 similar comment
|
/retest |
|
/retest-required |
|
/retest |
3 similar comments
|
/retest |
|
/retest |
|
/retest |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andresllh, VedantMahabaleshwarkar The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/group-test |
3 similar comments
|
/group-test |
|
/group-test |
|
/group-test |
|
@andresllh: The following test has Failed: OCI Artifact Browser URLInspecting Test Artifacts ManuallyTo inspect your test artifacts manually, follow these steps:
mkdir -p oras-artifacts
cd oras-artifacts
oras pull quay.io/opendatahub/odh-ci-artifacts:kserve-group-test-gw2vg |
Summary
be2cb412f(feat: new env var INFERENCE_SERVICE_NAME feat: new env var INFERENCE_SERVICE_NAME kserve/kserve#5013) unconditionally injectsINFERENCE_SERVICE_NAMEinto every pod template on reconcile, changing the pod template hash and triggering a rolling restart for every existing deploymentTest plan
INFERENCE_SERVICE_NAMEINFERENCE_SERVICE_NAMEon creationRelated
🤖 Generated with Claude Code
Summary by CodeRabbit