[sync] upstream llm-d/llm-d-router f7e88524 [2026-06-22]#261
Conversation
…m-d#1667) Signed-off-by: Guangya Liu <gyliu513@gmail.com>
…-d#1616) add recommendation for agentic inference requests for CPU and memory. add recommendation for resource consumption for prefix block size add section on envoy proxy resource usage. Signed-off-by: Jacob Murry <jacobmurry@google.com>
Update github.com/llm-d/llm-d-kv-cache from v0.9.0-rc.1 to the released v0.9.0. Signed-off-by: Maroon Ayoub <mayoub@redhat.com>
…m-d#1696) The GA InferencePool API (inference.networking.k8s.io/v1) nests pod labels under spec.selector.matchLabels, while the deprecated alpha API (inference.networking.x-k8s.io/v1alpha2) uses a flat spec.selector map. The allowlist validator read spec.selector unconditionally and fmt.Sprintf'd nested map values into label strings, producing invalid selectors like "map[app.kubernetes.io/name:...]" that Kubernetes rejected for exceeding the 63-character label value limit. Branch the selector path based on the detected API group and use NestedStringMap for type-safe extraction. Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
…is disabled (llm-d#1695) When router.inferencePool.create=false the EPP --endpoint-selector is rendered solely from router.modelServers.matchLabels. If matchLabels is omitted the selector renders empty and EPP rejects it at startup, because standalone mode requires either --pool-name or a non-empty --endpoint-selector. The existing matchLabels validation lives in _inferencepool.yaml, which only renders when an InferencePool is created, so the create=false path had no guard and produced a broken EPP deployment silently. Add the check to the standalone validations so helm template fails fast with an actionable message, and add a negative test case to hack/verify-helm.sh covering the reproducer from the issue. Fixes llm-d#1665 Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>
* test: consolidate and rename e2e runner scripts Signed-off-by: roytman <roytman@il.ibm.com> * fix comments Signed-off-by: roytman <roytman@il.ibm.com> * Update test/scripts/test-e2e-router.sh Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Alexey Roytman <roytman@il.ibm.com> * Update test/scripts/e2e-common.sh Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Alexey Roytman <roytman@il.ibm.com> --------- Signed-off-by: roytman <roytman@il.ibm.com> Signed-off-by: Alexey Roytman <roytman@il.ibm.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
… PR 2905 (llm-d#1600) * fix(standalone): rework agentgateway configuration to use derived pseudo-service Signed-off-by: abhay1999 <abhaychaurasiya19@gmail.com> * chore: trigger ci re-run Signed-off-by: abhay1999 <abhaychaurasiya19@gmail.com> * fix(standalone): update agentgateway validation error and add verification assertions Signed-off-by: abhay1999 <abhaychaurasiya19@gmail.com> --------- Signed-off-by: abhay1999 <abhaychaurasiya19@gmail.com>
Signed-off-by: weizhoublue <weizhou.lan@daocloud.io>
…ting (llm-d#1683) Signed-off-by: weizhoublue <weizhou.lan@daocloud.io>
📝 WalkthroughWalkthroughThis PR replaces the scheduler-based e2e test infrastructure with a router-based one: a shared Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 8 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (8 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/sidecar/proxy/proxy.go (1)
404-433:⚠️ Potential issue | 🔴 CriticalHarden TLS verification in decoder proxy transport (CWE-295).
Line 421 directly passes user-configurable
insecureSkipVerifytotls.Config.InsecureSkipVerify. Operators can enable this via--tls-insecure-skip-verify=decoderor YAML config, disabling certificate validation on decoder (vLLM) HTTPS traffic. Enables MITM attacks: prompt tampering, response injection, and data theft.Remediation
func (s *Server) newProxyTransport(scheme string, insecureSkipVerify bool) http.RoundTripper { if scheme == schemeHTTPS { + // Certificate verification is mandatory. Operators must use proper CA roots. t.TLSClientConfig = &tls.Config{ - InsecureSkipVerify: insecureSkipVerify, //nolint:gosec + InsecureSkipVerify: false, MinVersion: tls.VersionTLS12,Per
**/*.gosecurity guideline: "No InsecureSkipVerify in TLS configs (enables MITM attacks)."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/sidecar/proxy/proxy.go` around lines 404 - 433, The newProxyTransport method directly passes the user-configurable insecureSkipVerify parameter to tls.Config.InsecureSkipVerify, which violates security guidelines and enables MITM attacks. Remove the insecureSkipVerify parameter from the newProxyTransport function signature and ensure that InsecureSkipVerify in the tls.Config is always set to false (or omitted entirely, since false is the default). This prevents operators from disabling certificate validation on decoder HTTPS traffic regardless of configuration options like --tls-insecure-skip-verify.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@config/charts/llm-d-router-standalone/templates/_validations.tpl`:
- Around line 21-27: The current validation in the _validations.tpl template
only checks if matchLabels exists, but does not validate the specific
matchLabels.app field which is required for agentgateway backend naming. When
router.proxy.proxyType is set to agentgateway and router.inferencePool.create is
false (standalone mode), add an additional validation check after the existing
matchLabels check to ensure that matchLabels.app is present and valid. This
validation should fail with a clear error message if matchLabels.app is missing
or empty when using agentgateway in standalone mode, preventing silent routing
errors at runtime.
In `@docs/operations.md`:
- Around line 111-136: The Helm configuration example in the documentation
includes a router.epp.resources section that does not exist in the actual chart
schema. Remove the entire EPP container resources block (lines showing epp: with
its nested resources, requests for cpu and memory, and limits) and retain only
the valid router.proxy.resources configuration block. This will ensure users
copying the example get valid YAML that matches the actual chart structure which
only supports router.proxy resources configuration.
In `@hack/verify-helm.sh`:
- Around line 151-186: The code uses `eval` to execute dynamically constructed
command strings, creating a command injection vulnerability. Replace each
command variable (missing_endpoint_selector_command, invalid_proxy_command,
deprecated_agentgateway_service_command,
unsupported_agentgateway_llm_d_router_gateway_command,
unsupported_agentgateway_listener_port_command,
mismatched_agentgateway_listener_target_port_command) with bash arrays instead
of strings, and execute them using the array syntax `if "${cmd_array[@]}"
>/dev/null; then` instead of `if eval "${cmd_string}"; then`. This eliminates
the security risk while preserving the same command execution behavior.
In `@pkg/sidecar/proxy/allowlist_test.go`:
- Around line 46-177: The poolSelector Context is missing regression tests for
empty selector fail-closed behavior. Add two new test cases within the
poolSelector Context: one that verifies the poolSelector method on an
AllowlistValidator returns an error when the GA InferencePool has an empty
matchLabels map in its spec.selector field, and another that verifies it returns
an error when the alpha InferencePool has an empty selector map in its spec
field. Both test cases should follow the existing error case pattern (using the
same pool setup structure and Expect assertions for HaveOccurred and error
message validation) to ensure match-all selector regressions are prevented.
In `@pkg/sidecar/proxy/allowlist.go`:
- Around line 253-257: The early return at lines 253-257 when av.poolSelector
fails bypasses the informer replacement and stop path that occurs at lines
288-295, creating a security vulnerability where a malformed pool update can
leave the previous selector active and keep stale targets authorized. Instead of
returning immediately on selector extraction error, execute the same informer
teardown logic used in the informer replacement path (including stopping and
clearing the informer) plus recompute the target state to ensure the system
fails closed rather than leaving stale authorization rules in place.
- Around line 275-280: The code currently accepts empty selectorData which
results in an empty label selector that matches all resources in a namespace,
creating a security vulnerability. After successfully retrieving selectorData
from the spec using unstructured.NestedStringMap (when the found check passes),
add a validation to reject empty selectorData maps by checking if
len(selectorData) == 0 and returning an appropriate error message before the
return statement that calls labels.Set(selectorData).AsSelector(). This ensures
the allowlist scope is properly enforced and prevents namespace-wide resource
access.
In `@pkg/sidecar/proxy/transport_test.go`:
- Around line 34-35: The test mutates global OTel propagator state via
otel.SetTextMapPropagator without restoring it, causing order-dependent failures
in other tests. Save the original propagator returned by
otel.GetTextMapPropagator before calling otel.SetTextMapPropagator, then restore
it in t.Cleanup() following the pattern already established in
pkg/common/observability/tracing/telemetry_test.go. This ensures the global
state is properly reset after the test completes.
In `@test/scripts/test-e2e-gaie.sh`:
- Around line 24-26: The SIM_IMAGE variable assignment uses the wrong
environment variable reference. On the line where SIM_IMAGE is being set with a
default value, it currently references ${VLLM_IMAGE:-...} as the fallback, but
it should reference ${SIM_IMAGE:-...} instead. Change the variable name in the
parameter expansion from VLLM_IMAGE to SIM_IMAGE so that environment variable
overrides for SIM_IMAGE are properly honored.
In `@test/scripts/test-e2e-router.sh`:
- Around line 24-30: The trap handler for INT and TERM signals hard-codes the
cluster name "e2e-tests" and unconditionally calls e2e_handle_interrupt, which
can delete an unrelated cluster with the same name if the script is running
against an existing K8S_CONTEXT. Gate the cleanup to only occur when the script
actually created the cluster by conditionally setting the trap or passing a flag
to e2e_handle_interrupt that indicates ownership. The cleanup should only be
active when K8S_CONTEXT is unset (meaning the script created the cluster), not
when using an existing context.
---
Outside diff comments:
In `@pkg/sidecar/proxy/proxy.go`:
- Around line 404-433: The newProxyTransport method directly passes the
user-configurable insecureSkipVerify parameter to tls.Config.InsecureSkipVerify,
which violates security guidelines and enables MITM attacks. Remove the
insecureSkipVerify parameter from the newProxyTransport function signature and
ensure that InsecureSkipVerify in the tls.Config is always set to false (or
omitted entirely, since false is the default). This prevents operators from
disabling certificate validation on decoder HTTPS traffic regardless of
configuration options like --tls-insecure-skip-verify.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 1241f0e4-d5b4-4890-a93b-743af2c3acdf
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum,!**/*.sum
📒 Files selected for processing (22)
.github/workflows/ci-pr-checks.yamlMakefileREADME.mdconfig/charts/README.mdconfig/charts/llm-d-router-standalone/templates/_validations.tplconfig/charts/llm-d-router-standalone/templates/agentgateway-service.yamlconfig/charts/llm-d-router-standalone/values.yamlconfig/charts/routerlib/templates/_helpers.tpldocs/architecture.mddocs/operations.mdgo.modhack/push-chart.shhack/verify-helm.shpkg/sidecar/proxy/allowlist.gopkg/sidecar/proxy/allowlist_test.gopkg/sidecar/proxy/proxy.gopkg/sidecar/proxy/transport_test.goscripts/check-latest-tags.shtest/scripts/e2e-common.shtest/scripts/run_e2e.shtest/scripts/test-e2e-gaie.shtest/scripts/test-e2e-router.sh
💤 Files with no reviewable changes (2)
- config/charts/llm-d-router-standalone/templates/agentgateway-service.yaml
- test/scripts/run_e2e.sh
| {{- /* Without an InferencePool the EPP --endpoint-selector is rendered from modelServers.matchLabels; an empty selector is rejected by EPP at startup, so require it here. */ -}} | ||
| {{- $useInferencePool := ne .Values.router.inferencePool.create false -}} | ||
| {{- if not $useInferencePool -}} | ||
| {{- if or (empty .Values.router.modelServers) (not .Values.router.modelServers.matchLabels) -}} | ||
| {{- fail ".Values.router.modelServers.matchLabels is required when .Values.router.inferencePool.create=false: standalone mode renders the EPP --endpoint-selector from matchLabels and cannot start with an empty selector" -}} | ||
| {{- end -}} | ||
| {{- end -}} |
There was a problem hiding this comment.
Require and validate modelServers.matchLabels.app for agentgateway backend naming (CWE-20).
Line 21-Line 27 only enforces that matchLabels exists. In this PR, agentgateway backend identity is derived from matchLabels.app; if app is missing/invalid, routing can silently target the wrong backend name at runtime. Add a hard fail for missing/invalid matchLabels.app when router.proxy.proxyType=agentgateway and router.inferencePool.create=false.
Proposed validation hardening
{{- if not $useInferencePool -}}
{{- if or (empty .Values.router.modelServers) (not .Values.router.modelServers.matchLabels) -}}
{{- fail ".Values.router.modelServers.matchLabels is required when .Values.router.inferencePool.create=false: standalone mode renders the EPP --endpoint-selector from matchLabels and cannot start with an empty selector" -}}
{{- end -}}
+ {{- if eq ($proxy.proxyType | default "envoy") "agentgateway" -}}
+ {{- $appLabel := index (.Values.router.modelServers.matchLabels | default dict) "app" | default "" -}}
+ {{- if empty $appLabel -}}
+ {{- fail ".Values.router.modelServers.matchLabels.app is required when proxyType=agentgateway and inferencePool.create=false" -}}
+ {{- end -}}
+ {{- if or (gt (len $appLabel) 63) (not (regexMatch "^[a-z0-9]([-a-z0-9]*[a-z0-9])?$" $appLabel)) -}}
+ {{- fail ".Values.router.modelServers.matchLabels.app must be a DNS-1123 label for agentgateway backend naming" -}}
+ {{- end -}}
+ {{- end -}}
{{- end -}}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@config/charts/llm-d-router-standalone/templates/_validations.tpl` around
lines 21 - 27, The current validation in the _validations.tpl template only
checks if matchLabels exists, but does not validate the specific matchLabels.app
field which is required for agentgateway backend naming. When
router.proxy.proxyType is set to agentgateway and router.inferencePool.create is
false (standalone mode), add an additional validation check after the existing
matchLabels check to ensure that matchLabels.app is present and valid. This
validation should fail with a clear error message if matchLabels.app is missing
or empty when using agentgateway in standalone mode, preventing silent routing
errors at runtime.
| ```yaml | ||
| router: | ||
| # Endpoint Picker (EPP) Container Resources | ||
| epp: | ||
| resources: | ||
| requests: | ||
| cpu: "32" | ||
| memory: "64Gi" | ||
| limits: | ||
| memory: "128Gi" | ||
|
|
||
| # Envoy Proxy Container Resources | ||
| proxy: | ||
| resources: | ||
| requests: | ||
| cpu: "8" | ||
| memory: "2Gi" | ||
| limits: | ||
| memory: "4Gi" | ||
| ``` | ||
|
|
||
| To apply these values during deployment, run the Helm install or upgrade command with your custom values file: | ||
|
|
||
| ```bash | ||
| helm install optimize-baseline ./config/charts/llm-d-router-standalone -f resource_overrides.yaml | ||
| ``` |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify Helm values.yaml schema contains router.epp.resources and router.proxy.resources keys.
# Check for epp resources path
rg -n 'router:' config/charts/llm-d-router-standalone/values.yaml -A 30 | grep -E '(epp:|proxy:|resources:)'
# Confirm exact nesting structure
yq '.router.epp.resources' config/charts/llm-d-router-standalone/values.yaml 2>/dev/null || echo "Path does not exist"
yq '.router.proxy.resources' config/charts/llm-d-router-standalone/values.yaml 2>/dev/null || echo "Path does not exist"Repository: opendatahub-io/llm-d-router
Length of output: 181
🏁 Script executed:
cat -n config/charts/llm-d-router-standalone/values.yaml | head -60Repository: opendatahub-io/llm-d-router
Length of output: 2190
🏁 Script executed:
# Search for 'epp:' in the values.yaml file
rg -n 'epp:' config/charts/llm-d-router-standalone/values.yamlRepository: opendatahub-io/llm-d-router
Length of output: 53
🏁 Script executed:
# Get the complete router section structure
rg -n '^ [a-z]' config/charts/llm-d-router-standalone/values.yaml | head -20Repository: opendatahub-io/llm-d-router
Length of output: 124
🏁 Script executed:
# Get full values.yaml to understand all top-level keys under router
cat config/charts/llm-d-router-standalone/values.yaml | grep -E '^ [a-z]' | sort -uRepository: opendatahub-io/llm-d-router
Length of output: 117
Remove invalid router.epp.resources configuration from Helm example; retain only valid router.proxy.resources.
The chart values.yaml does not contain a router.epp key. The documented configuration at lines 111–136 references router.epp.resources which does not exist in the chart schema. Users copying this example will have invalid YAML that fails deployment. The router.proxy.resources configuration is valid; remove the non-existent EPP block.
Actual chart structure
router:
modelServers:
extraServicePorts:
proxy:
resources: # <-- only this path exists
requests:
cpu: "4"
memory: 8Gi
limits:
memory: 16Gi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/operations.md` around lines 111 - 136, The Helm configuration example in
the documentation includes a router.epp.resources section that does not exist in
the actual chart schema. Remove the entire EPP container resources block (lines
showing epp: with its nested resources, requests for cpu and memory, and limits)
and retain only the valid router.proxy.resources configuration block. This will
ensure users copying the example get valid YAML that matches the actual chart
structure which only supports router.proxy resources configuration.
| missing_endpoint_selector_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.inferencePool.create=false --set router.modelServers.type=vllm --set 'router.modelServers.targetPorts[0].number=8000' >/dev/null" | ||
| echo "Executing: ${missing_endpoint_selector_command}" | ||
| if eval "${missing_endpoint_selector_command}"; then | ||
| echo "Helm template unexpectedly succeeded for inferencePool.create=false without modelServers.matchLabels" | ||
| exit 1 | ||
| fi | ||
|
|
||
| invalid_proxy_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.modelServers.matchLabels.app=llm-instance-gateway --set router.inferencePool.create=false --set router.proxy.proxyType=bogus >/dev/null" | ||
| echo "Executing: ${invalid_proxy_command}" | ||
| if eval "${invalid_proxy_command}"; then | ||
| echo "Helm template unexpectedly succeeded for invalid proxyType" | ||
| exit 1 | ||
| fi | ||
|
|
||
| missing_agentgateway_service_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.modelServers.matchLabels.app=llm-instance-gateway --set router.inferencePool.create=false --set router.proxy.proxyType=agentgateway >/dev/null" | ||
| echo "Executing: ${missing_agentgateway_service_command}" | ||
| if eval "${missing_agentgateway_service_command}"; then | ||
| echo "Helm template unexpectedly succeeded for missing agentgateway service.name" | ||
| deprecated_agentgateway_service_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.modelServers.matchLabels.app=llm-instance-gateway --set router.inferencePool.create=false --set router.proxy.proxyType=agentgateway --set router.proxy.agentgateway.service.name=foo >/dev/null" | ||
| echo "Executing: ${deprecated_agentgateway_service_command}" | ||
| if eval "${deprecated_agentgateway_service_command}"; then | ||
| echo "Helm template unexpectedly succeeded for deprecated agentgateway.service configuration" | ||
| exit 1 | ||
| fi | ||
|
|
||
| unsupported_agentgateway_llm_d_router_gateway_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.proxy.proxyType=agentgateway --set router.proxy.agentgateway.service.name=llm-instance-gateway --set 'router.proxy.agentgateway.service.ports[0]=8000' --set router.inferencePool.create=true --set router.modelServers.matchLabels.app=llm-instance-gateway >/dev/null" | ||
| unsupported_agentgateway_llm_d_router_gateway_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.proxy.proxyType=agentgateway --set router.inferencePool.create=true --set router.modelServers.matchLabels.app=llm-instance-gateway >/dev/null" | ||
| echo "Executing: ${unsupported_agentgateway_llm_d_router_gateway_command}" | ||
| if eval "${unsupported_agentgateway_llm_d_router_gateway_command}"; then | ||
| echo "Helm template unexpectedly succeeded for unsupported agentgateway createInferencePool=true configuration" | ||
| exit 1 | ||
| fi | ||
|
|
||
| mismatched_agentgateway_ports_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.proxy.proxyType=agentgateway --set router.proxy.agentgateway.service.name=llm-instance-gateway --set 'router.proxy.agentgateway.service.ports[0]=8001' --set router.modelServers.matchLabels.app=llm-instance-gateway --set router.inferencePool.create=false --set 'router.modelServers.targetPorts[0].number=8000' >/dev/null" | ||
| echo "Executing: ${mismatched_agentgateway_ports_command}" | ||
| if eval "${mismatched_agentgateway_ports_command}"; then | ||
| echo "Helm template unexpectedly succeeded for mismatched agentgateway service.ports" | ||
| exit 1 | ||
| fi | ||
|
|
||
| unsupported_agentgateway_listener_port_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.proxy.proxyType=agentgateway --set router.proxy.agentgateway.service.name=llm-instance-gateway --set 'router.proxy.agentgateway.service.ports[0]=8000' --set router.modelServers.matchLabels.app=llm-instance-gateway --set router.inferencePool.create=false --set 'router.modelServers.targetPorts[0].number=8000' --set 'router.extraServicePorts[0].name=proxy' --set 'router.extraServicePorts[0].port=9000' --set 'router.extraServicePorts[0].protocol=TCP' --set 'router.extraServicePorts[0].targetPort=9000' >/dev/null" | ||
| unsupported_agentgateway_listener_port_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.proxy.proxyType=agentgateway --set router.modelServers.matchLabels.app=llm-instance-gateway --set router.inferencePool.create=false --set 'router.modelServers.targetPorts[0].number=8000' --set 'router.extraServicePorts[0].name=proxy' --set 'router.extraServicePorts[0].port=9000' --set 'router.extraServicePorts[0].protocol=TCP' --set 'router.extraServicePorts[0].targetPort=9000' >/dev/null" | ||
| echo "Executing: ${unsupported_agentgateway_listener_port_command}" | ||
| if eval "${unsupported_agentgateway_listener_port_command}"; then | ||
| echo "Helm template unexpectedly succeeded without an agentgateway listener Service port named http" | ||
| exit 1 | ||
| fi | ||
|
|
||
| mismatched_agentgateway_listener_target_port_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.proxy.proxyType=agentgateway --set router.proxy.agentgateway.service.name=llm-instance-gateway --set 'router.proxy.agentgateway.service.ports[0]=8000' --set router.modelServers.matchLabels.app=llm-instance-gateway --set router.inferencePool.create=false --set 'router.modelServers.targetPorts[0].number=8000' --set 'router.extraServicePorts[0].name=http' --set 'router.extraServicePorts[0].port=9000' --set 'router.extraServicePorts[0].protocol=TCP' --set 'router.extraServicePorts[0].targetPort=9001' >/dev/null" | ||
| mismatched_agentgateway_listener_target_port_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.proxy.proxyType=agentgateway --set router.modelServers.matchLabels.app=llm-instance-gateway --set router.inferencePool.create=false --set 'router.modelServers.targetPorts[0].number=8000' --set 'router.extraServicePorts[0].name=http' --set 'router.extraServicePorts[0].port=9000' --set 'router.extraServicePorts[0].protocol=TCP' --set 'router.extraServicePorts[0].targetPort=9001' >/dev/null" |
There was a problem hiding this comment.
Remove eval command execution in test invocations (CWE-78).
Line 153, Line 160, Line 167, Line 174, and Line 181 execute dynamically built strings with eval. That creates avoidable command-injection surface and quoting fragility in CI scripts. Execute commands via arrays and "$\{cmd[@]\}" instead.
Safer execution pattern (apply to each command block)
-missing_endpoint_selector_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.inferencePool.create=false --set router.modelServers.type=vllm --set 'router.modelServers.targetPorts[0].number=8000' >/dev/null"
-echo "Executing: ${missing_endpoint_selector_command}"
-if eval "${missing_endpoint_selector_command}"; then
+missing_endpoint_selector_cmd=(
+ "${HELM}" template "${SCRIPT_ROOT}/config/charts/llm-d-router-standalone"
+ --set router.inferencePool.create=false
+ --set router.modelServers.type=vllm
+ --set 'router.modelServers.targetPorts[0].number=8000'
+)
+echo "Executing: ${missing_endpoint_selector_cmd[*]}"
+if "${missing_endpoint_selector_cmd[@]}" >/dev/null; then
echo "Helm template unexpectedly succeeded for inferencePool.create=false without modelServers.matchLabels"
exit 1
fiAs per coding guidelines, **/*.sh: “Avoid eval and dynamic command execution” and “Quote all variables to prevent injection.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@hack/verify-helm.sh` around lines 151 - 186, The code uses `eval` to execute
dynamically constructed command strings, creating a command injection
vulnerability. Replace each command variable (missing_endpoint_selector_command,
invalid_proxy_command, deprecated_agentgateway_service_command,
unsupported_agentgateway_llm_d_router_gateway_command,
unsupported_agentgateway_listener_port_command,
mismatched_agentgateway_listener_target_port_command) with bash arrays instead
of strings, and execute them using the array syntax `if "${cmd_array[@]}"
>/dev/null; then` instead of `if eval "${cmd_string}"; then`. This eliminates
the security risk while preserving the same command execution behavior.
Source: Coding guidelines
| Context("poolSelector", func() { | ||
| It("should extract selector from GA InferencePool (matchLabels)", func() { | ||
| av := &AllowlistValidator{ | ||
| gvr: schema.GroupVersionResource{ | ||
| Group: routing.InferencePoolAPIGroup, | ||
| Version: "v1", | ||
| Resource: "inferencepools", | ||
| }, | ||
| } | ||
| pool := &unstructured.Unstructured{ | ||
| Object: map[string]interface{}{ | ||
| "apiVersion": "inference.networking.k8s.io/v1", | ||
| "kind": "InferencePool", | ||
| "metadata": map[string]interface{}{"name": "test-pool"}, | ||
| "spec": map[string]interface{}{ | ||
| "selector": map[string]interface{}{ | ||
| "matchLabels": map[string]interface{}{ | ||
| "app.kubernetes.io/name": "my-model", | ||
| "component": "serving", | ||
| }, | ||
| }, | ||
| }, | ||
| }, | ||
| } | ||
|
|
||
| selector, err := av.poolSelector(pool) | ||
| Expect(err).ToNot(HaveOccurred()) | ||
| Expect(selector.String()).To(SatisfyAll( | ||
| ContainSubstring("app.kubernetes.io/name=my-model"), | ||
| ContainSubstring("component=serving"), | ||
| )) | ||
| }) | ||
|
|
||
| It("should extract selector from deprecated alpha InferencePool (flat map)", func() { | ||
| av := &AllowlistValidator{ | ||
| gvr: schema.GroupVersionResource{ | ||
| Group: "inference.networking.x-k8s.io", | ||
| Version: "v1alpha2", | ||
| Resource: "inferencepools", | ||
| }, | ||
| } | ||
| pool := &unstructured.Unstructured{ | ||
| Object: map[string]interface{}{ | ||
| "apiVersion": "inference.networking.x-k8s.io/v1alpha2", | ||
| "kind": "InferencePool", | ||
| "metadata": map[string]interface{}{"name": "test-pool"}, | ||
| "spec": map[string]interface{}{ | ||
| "selector": map[string]interface{}{ | ||
| "app.kubernetes.io/name": "my-model", | ||
| "component": "serving", | ||
| }, | ||
| }, | ||
| }, | ||
| } | ||
|
|
||
| selector, err := av.poolSelector(pool) | ||
| Expect(err).ToNot(HaveOccurred()) | ||
| Expect(selector.String()).To(SatisfyAll( | ||
| ContainSubstring("app.kubernetes.io/name=my-model"), | ||
| ContainSubstring("component=serving"), | ||
| )) | ||
| }) | ||
|
|
||
| It("should fail for GA pool with flat selector (no matchLabels)", func() { | ||
| av := &AllowlistValidator{ | ||
| gvr: schema.GroupVersionResource{ | ||
| Group: routing.InferencePoolAPIGroup, | ||
| Version: "v1", | ||
| Resource: "inferencepools", | ||
| }, | ||
| } | ||
| pool := &unstructured.Unstructured{ | ||
| Object: map[string]interface{}{ | ||
| "apiVersion": "inference.networking.k8s.io/v1", | ||
| "kind": "InferencePool", | ||
| "metadata": map[string]interface{}{"name": "test-pool"}, | ||
| "spec": map[string]interface{}{ | ||
| "selector": map[string]interface{}{ | ||
| "app": "my-model", | ||
| }, | ||
| }, | ||
| }, | ||
| } | ||
|
|
||
| _, err := av.poolSelector(pool) | ||
| Expect(err).To(HaveOccurred()) | ||
| Expect(err.Error()).To(ContainSubstring("matchLabels")) | ||
| }) | ||
|
|
||
| It("should fail when spec is missing", func() { | ||
| av := &AllowlistValidator{ | ||
| gvr: schema.GroupVersionResource{ | ||
| Group: routing.InferencePoolAPIGroup, | ||
| Version: "v1", | ||
| Resource: "inferencepools", | ||
| }, | ||
| } | ||
| pool := &unstructured.Unstructured{ | ||
| Object: map[string]interface{}{ | ||
| "apiVersion": "inference.networking.k8s.io/v1", | ||
| "kind": "InferencePool", | ||
| "metadata": map[string]interface{}{"name": "test-pool"}, | ||
| }, | ||
| } | ||
|
|
||
| _, err := av.poolSelector(pool) | ||
| Expect(err).To(HaveOccurred()) | ||
| Expect(err.Error()).To(ContainSubstring("spec")) | ||
| }) | ||
|
|
||
| It("should fail when selector is missing", func() { | ||
| av := &AllowlistValidator{ | ||
| gvr: schema.GroupVersionResource{ | ||
| Group: "inference.networking.x-k8s.io", | ||
| Version: "v1alpha2", | ||
| Resource: "inferencepools", | ||
| }, | ||
| } | ||
| pool := &unstructured.Unstructured{ | ||
| Object: map[string]interface{}{ | ||
| "apiVersion": "inference.networking.x-k8s.io/v1alpha2", | ||
| "kind": "InferencePool", | ||
| "metadata": map[string]interface{}{"name": "test-pool"}, | ||
| "spec": map[string]interface{}{}, | ||
| }, | ||
| } | ||
|
|
||
| _, err := av.poolSelector(pool) | ||
| Expect(err).To(HaveOccurred()) | ||
| Expect(err.Error()).To(ContainSubstring("selector")) | ||
| }) | ||
| }) |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add regression tests for empty selector fail-closed behavior (CWE-284).
Current cases validate missing fields but not matchLabels: {} (GA) or selector: {} (alpha). Add explicit assertions that these return errors, so match-all selector regressions are blocked.
Suggested test additions
Context("poolSelector", func() {
+ It("should fail for GA pool with empty matchLabels", func() {
+ av := &AllowlistValidator{
+ gvr: schema.GroupVersionResource{
+ Group: routing.InferencePoolAPIGroup,
+ Version: "v1",
+ Resource: "inferencepools",
+ },
+ }
+ pool := &unstructured.Unstructured{
+ Object: map[string]interface{}{
+ "apiVersion": "inference.networking.k8s.io/v1",
+ "kind": "InferencePool",
+ "metadata": map[string]interface{}{"name": "test-pool"},
+ "spec": map[string]interface{}{
+ "selector": map[string]interface{}{
+ "matchLabels": map[string]interface{}{},
+ },
+ },
+ },
+ }
+ _, err := av.poolSelector(pool)
+ Expect(err).To(HaveOccurred())
+ })
+
+ It("should fail for alpha pool with empty selector", func() {
+ av := &AllowlistValidator{
+ gvr: schema.GroupVersionResource{
+ Group: "inference.networking.x-k8s.io",
+ Version: "v1alpha2",
+ Resource: "inferencepools",
+ },
+ }
+ pool := &unstructured.Unstructured{
+ Object: map[string]interface{}{
+ "apiVersion": "inference.networking.x-k8s.io/v1alpha2",
+ "kind": "InferencePool",
+ "metadata": map[string]interface{}{"name": "test-pool"},
+ "spec": map[string]interface{}{
+ "selector": map[string]interface{}{},
+ },
+ },
+ }
+ _, err := av.poolSelector(pool)
+ Expect(err).To(HaveOccurred())
+ })As per coding guidelines, "**: REVIEW PRIORITIES: Security vulnerabilities (provide severity, exploit scenario, and remediation code)."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/sidecar/proxy/allowlist_test.go` around lines 46 - 177, The poolSelector
Context is missing regression tests for empty selector fail-closed behavior. Add
two new test cases within the poolSelector Context: one that verifies the
poolSelector method on an AllowlistValidator returns an error when the GA
InferencePool has an empty matchLabels map in its spec.selector field, and
another that verifies it returns an error when the alpha InferencePool has an
empty selector map in its spec field. Both test cases should follow the existing
error case pattern (using the same pool setup structure and Expect assertions
for HaveOccurred and error message validation) to ensure match-all selector
regressions are prevented.
Source: Coding guidelines
| selector, err := av.poolSelector(poolObj) | ||
| if err != nil { | ||
| av.logger.Error(err, "failed to extract selector from InferencePool", "name", poolName) | ||
| return | ||
| } |
There was a problem hiding this comment.
Fail-open informer state on selector parse errors (CWE-284).
Line 253-Line 257 returns before the informer replacement/stop path (Line 288-Line 295), so a malformed pool update can leave the previous selector active and keep stale targets authorized by SSRF checks.
Use the same teardown path used for informer replacement (plus target-state recomputation) before returning, so this path fails closed.
As per coding guidelines, "**: REVIEW PRIORITIES: Security vulnerabilities (provide severity, exploit scenario, and remediation code)."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/sidecar/proxy/allowlist.go` around lines 253 - 257, The early return at
lines 253-257 when av.poolSelector fails bypasses the informer replacement and
stop path that occurs at lines 288-295, creating a security vulnerability where
a malformed pool update can leave the previous selector active and keep stale
targets authorized. Instead of returning immediately on selector extraction
error, execute the same informer teardown logic used in the informer replacement
path (including stopping and clearing the informer) plus recompute the target
state to ensure the system fails closed rather than leaving stale authorization
rules in place.
Source: Coding guidelines
| selectorData, found, err := unstructured.NestedStringMap(spec, selectorPath...) | ||
| if err != nil || !found { | ||
| return nil, fmt.Errorf("missing or invalid selector field at %v (found=%t): %w", selectorPath, found, err) | ||
| } | ||
|
|
||
| // Create or update pod informer for this selector | ||
| av.createPodInformer(poolName, labelSelector.AsSelector()) | ||
| return labels.Set(selectorData).AsSelector(), nil |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In Kubernetes, does an empty ListOptions.labelSelector match all resources? Please cite official Kubernetes API docs.
💡 Result:
Yes, in the Kubernetes API, an empty label selector effectively matches all resources [1][2][3]. When performing a list or watch operation, the label selector in ListOptions is used to filter resources. If the label selector is not specified or is left empty, the filter is effectively a no-op, meaning it does not restrict the returned set of objects, and all resources are returned [4][5][6]. Official Kubernetes API documentation defines a LabelSelector as a query over a set of resources; specifically, it states that "an empty label selector matches all objects" [1][2][3]. Sources: [1] https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/label-selector/ [2] https://kubernetes.io/docs/reference/kubernetes-api/definitions/label-selector-v1-meta/ [4] https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/ [3] https://kubernetes.io/docs/reference/kubernetes-api/definitions/ [5] kubernetes/kubectl#545 [6] https://kubernetes.website.cncfstack.com/docs/reference/kubernetes-api/common-parameters/common-parameters/
Citations:
- 1: https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/label-selector/
- 2: https://kubernetes.io/docs/reference/kubernetes-api/definitions/label-selector-v1-meta/
- 3: https://kubernetes.io/docs/reference/kubernetes-api/definitions/
- 4: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
- 5: kubectl get pod -n default -l " " output behaviour kubernetes/kubectl#545
- 6: https://kubernetes.website.cncfstack.com/docs/reference/kubernetes-api/common-parameters/common-parameters/
🏁 Script executed:
cd pkg/sidecar/proxy && wc -l allowlist.goRepository: opendatahub-io/llm-d-router
Length of output: 88
🏁 Script executed:
cd pkg/sidecar/proxy && sed -n '275,310p' allowlist.goRepository: opendatahub-io/llm-d-router
Length of output: 1422
🏁 Script executed:
cd pkg/sidecar/proxy && sed -n '282,320p' allowlist.goRepository: opendatahub-io/llm-d-router
Length of output: 1444
🏁 Script executed:
cd pkg/sidecar/proxy && rg -n "createPodInformer|fromRawExtension" allowlist.go -B 3 -A 3Repository: opendatahub-io/llm-d-router
Length of output: 577
Reject empty selector maps to prevent namespace-wide allowlisting (CWE-284, CWE-918).
Line 280 accepts empty selectorData. An empty selector.String() results in an empty LabelSelector passed to List/Watch operations at lines 300 and 308. Per Kubernetes API documentation, an empty label selector matches all resources in the namespace, bypassing the intended SSRF allowlist scope.
Suggested fix
selectorData, found, err := unstructured.NestedStringMap(spec, selectorPath...)
if err != nil || !found {
return nil, fmt.Errorf("missing or invalid selector field at %v (found=%t): %w", selectorPath, found, err)
}
+if len(selectorData) == 0 {
+ return nil, fmt.Errorf("empty selector field at %v", selectorPath)
+}
return labels.Set(selectorData).AsSelector(), nil🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/sidecar/proxy/allowlist.go` around lines 275 - 280, The code currently
accepts empty selectorData which results in an empty label selector that matches
all resources in a namespace, creating a security vulnerability. After
successfully retrieving selectorData from the spec using
unstructured.NestedStringMap (when the found check passes), add a validation to
reject empty selectorData maps by checking if len(selectorData) == 0 and
returning an appropriate error message before the return statement that calls
labels.Set(selectorData).AsSelector(). This ensures the allowlist scope is
properly enforced and prevents namespace-wide resource access.
Source: Coding guidelines
| otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{})) | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find global propagator mutations and check whether each has cleanup/restore.
rg -n -C3 --type=go 'SetTextMapPropagator\s*\('Repository: opendatahub-io/llm-d-router
Length of output: 1840
🏁 Script executed:
#!/bin/bash
# Check if request_test.go has cleanup for its SetTextMapPropagator call
rg -n -A 20 'SetTextMapPropagator' pkg/epp/handlers/request_test.go | head -40Repository: opendatahub-io/llm-d-router
Length of output: 640
🏁 Script executed:
#!/bin/bash
# Look for t.Cleanup patterns in test files to understand cleanup practices
rg -n 't\.Cleanup' --type=go pkg/sidecar/proxy/transport_test.go pkg/epp/handlers/request_test.goRepository: opendatahub-io/llm-d-router
Length of output: 53
🏁 Script executed:
#!/bin/bash
# View the full test function in request_test.go to check for cleanup
sed -n '35,80p' pkg/epp/handlers/request_test.goRepository: opendatahub-io/llm-d-router
Length of output: 1235
🏁 Script executed:
#!/bin/bash
# View the full test function in transport_test.go to confirm the issue
sed -n '31,60p' pkg/sidecar/proxy/transport_test.goRepository: opendatahub-io/llm-d-router
Length of output: 1181
🏁 Script executed:
#!/bin/bash
# Check if there are ANY tests in the codebase using proper cleanup for global state
rg -B5 -A5 'prev.*otel|t\.Cleanup.*otel' --type=goRepository: opendatahub-io/llm-d-router
Length of output: 945
Restore the global OTel propagator after this test.
Line 34 mutates process-global state and does not reset it, creating order-dependent failures in other tests. The same issue exists in pkg/epp/handlers/request_test.go:36. The codebase already establishes the correct pattern in pkg/common/observability/tracing/telemetry_test.go: save the original state and restore it in t.Cleanup().
Suggested remediation
func TestNewProxyTransportInjectsTraceContext(t *testing.T) {
+ prev := otel.GetTextMapPropagator()
+ t.Cleanup(func() { otel.SetTextMapPropagator(prev) })
otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{})) | |
| func TestNewProxyTransportInjectsTraceContext(t *testing.T) { | |
| prev := otel.GetTextMapPropagator() | |
| t.Cleanup(func() { otel.SetTextMapPropagator(prev) }) | |
| otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{})) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/sidecar/proxy/transport_test.go` around lines 34 - 35, The test mutates
global OTel propagator state via otel.SetTextMapPropagator without restoring it,
causing order-dependent failures in other tests. Save the original propagator
returned by otel.GetTextMapPropagator before calling otel.SetTextMapPropagator,
then restore it in t.Cleanup() following the pattern already established in
pkg/common/observability/tracing/telemetry_test.go. This ensures the global
state is properly reset after the test completes.
| EPP_IMAGE="${EPP_IMAGE:-ghcr.io/llm-d/llm-d-router-endpoint-picker:dev}" | ||
| SIM_IMAGE="${VLLM_IMAGE:-ghcr.io/llm-d/llm-d-inference-sim:v0.9.2}" | ||
| MANIFEST_PATH="${MANIFEST_PATH:-${DIR}/../test/testdata/sim-deployment.yaml}" | ||
| MANIFEST_PATH="${MANIFEST_PATH:-${DIR}/../testdata/sim-deployment.yaml}" |
There was a problem hiding this comment.
Fix simulator image override key mismatch.
Line 25 assigns SIM_IMAGE from VLLM_IMAGE, so SIM_IMAGE=... overrides are ignored.
Proposed fix
-SIM_IMAGE="${VLLM_IMAGE:-ghcr.io/llm-d/llm-d-inference-sim:v0.9.2}"
+SIM_IMAGE="${SIM_IMAGE:-${VLLM_IMAGE:-ghcr.io/llm-d/llm-d-inference-sim:v0.9.2}}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| EPP_IMAGE="${EPP_IMAGE:-ghcr.io/llm-d/llm-d-router-endpoint-picker:dev}" | |
| SIM_IMAGE="${VLLM_IMAGE:-ghcr.io/llm-d/llm-d-inference-sim:v0.9.2}" | |
| MANIFEST_PATH="${MANIFEST_PATH:-${DIR}/../test/testdata/sim-deployment.yaml}" | |
| MANIFEST_PATH="${MANIFEST_PATH:-${DIR}/../testdata/sim-deployment.yaml}" | |
| EPP_IMAGE="${EPP_IMAGE:-ghcr.io/llm-d/llm-d-router-endpoint-picker:dev}" | |
| SIM_IMAGE="${SIM_IMAGE:-${VLLM_IMAGE:-ghcr.io/llm-d/llm-d-inference-sim:v0.9.2}}" | |
| MANIFEST_PATH="${MANIFEST_PATH:-${DIR}/../testdata/sim-deployment.yaml}" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/scripts/test-e2e-gaie.sh` around lines 24 - 26, The SIM_IMAGE variable
assignment uses the wrong environment variable reference. On the line where
SIM_IMAGE is being set with a default value, it currently references
${VLLM_IMAGE:-...} as the fallback, but it should reference ${SIM_IMAGE:-...}
instead. Change the variable name in the parameter expansion from VLLM_IMAGE to
SIM_IMAGE so that environment variable overrides for SIM_IMAGE are properly
honored.
| # Set trap only for interruption signals. | ||
| # Normally kind cluster cleanup is done by AfterSuite; this trap deletes the | ||
| # e2e-tests cluster on Ctrl-C. The delete is unconditional and only meaningful | ||
| # when the suite created that cluster (K8S_CONTEXT unset); when running against | ||
| # an existing context it is a no-op unless a cluster named e2e-tests happens to | ||
| # exist. | ||
| trap 'e2e_handle_interrupt "e2e-tests"' INT TERM |
There was a problem hiding this comment.
Gate interrupt cleanup to owned clusters only (CWE-706).
Line 30 hard-codes e2e-tests in the trap. On SIGINT/SIGTERM, this can delete an unrelated local kind cluster with the same name.
Proposed fix
-# Set trap only for interruption signals.
-# Normally kind cluster cleanup is done by AfterSuite; this trap deletes the
-# e2e-tests cluster on Ctrl-C. The delete is unconditional and only meaningful
-# when the suite created that cluster (K8S_CONTEXT unset); when running against
-# an existing context it is a no-op unless a cluster named e2e-tests happens to
-# exist.
-trap 'e2e_handle_interrupt "e2e-tests"' INT TERM
+# Set trap only for interruption signals.
+# Only mark e2e-tests as owned when not targeting an existing context.
+CREATED_CLUSTER=""
+if [ -z "${K8S_CONTEXT:-}" ]; then
+ CREATED_CLUSTER="e2e-tests"
+fi
+trap 'e2e_handle_interrupt "${CREATED_CLUSTER}"' INT TERM📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Set trap only for interruption signals. | |
| # Normally kind cluster cleanup is done by AfterSuite; this trap deletes the | |
| # e2e-tests cluster on Ctrl-C. The delete is unconditional and only meaningful | |
| # when the suite created that cluster (K8S_CONTEXT unset); when running against | |
| # an existing context it is a no-op unless a cluster named e2e-tests happens to | |
| # exist. | |
| trap 'e2e_handle_interrupt "e2e-tests"' INT TERM | |
| # Set trap only for interruption signals. | |
| # Only mark e2e-tests as owned when not targeting an existing context. | |
| CREATED_CLUSTER="" | |
| if [ -z "${K8S_CONTEXT:-}" ]; then | |
| CREATED_CLUSTER="e2e-tests" | |
| fi | |
| trap 'e2e_handle_interrupt "${CREATED_CLUSTER}"' INT TERM |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/scripts/test-e2e-router.sh` around lines 24 - 30, The trap handler for
INT and TERM signals hard-codes the cluster name "e2e-tests" and unconditionally
calls e2e_handle_interrupt, which can delete an unrelated cluster with the same
name if the script is running against an existing K8S_CONTEXT. Gate the cleanup
to only occur when the script actually created the cluster by conditionally
setting the trap or passing a flag to e2e_handle_interrupt that indicates
ownership. The cleanup should only be active when K8S_CONTEXT is unset (meaning
the script created the cluster), not when using an existing context.
Syncs llm-d/llm-d-router main into opendatahub-io/llm-d-router main.
Upstream commit: llm-d@f7e8852
Summary by CodeRabbit
New Features
Documentation
Chores