[sync] upstream llm-d/llm-d-router f7e88524 [2026-06-22] by zdtsw · Pull Request #261 · opendatahub-io/llm-d-router

zdtsw · 2026-06-22T06:24:49Z

Syncs llm-d/llm-d-router main into opendatahub-io/llm-d-router main.

Upstream commit: llm-d@f7e8852

Summary by CodeRabbit

New Features
- Added OpenTelemetry tracing instrumentation to outbound proxy HTTP requests for improved observability.
Documentation
- New container sizing guide for router deployment with CPU/memory scaling recommendations.
- Updated architecture documentation with additional resource references.
Chores
- Updated llm-d-kv-cache dependency to stable release.
- Refactored E2E testing infrastructure and router validation test suite.
- Simplified Helm chart configuration for standalone deployments.

…m-d#1667) Signed-off-by: Guangya Liu <gyliu513@gmail.com>

…-d#1616) add recommendation for agentic inference requests for CPU and memory. add recommendation for resource consumption for prefix block size add section on envoy proxy resource usage. Signed-off-by: Jacob Murry <jacobmurry@google.com>

Update github.com/llm-d/llm-d-kv-cache from v0.9.0-rc.1 to the released v0.9.0. Signed-off-by: Maroon Ayoub <mayoub@redhat.com>

…m-d#1696) The GA InferencePool API (inference.networking.k8s.io/v1) nests pod labels under spec.selector.matchLabels, while the deprecated alpha API (inference.networking.x-k8s.io/v1alpha2) uses a flat spec.selector map. The allowlist validator read spec.selector unconditionally and fmt.Sprintf'd nested map values into label strings, producing invalid selectors like "map[app.kubernetes.io/name:...]" that Kubernetes rejected for exceeding the 63-character label value limit. Branch the selector path based on the detected API group and use NestedStringMap for type-safe extraction. Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

…is disabled (llm-d#1695) When router.inferencePool.create=false the EPP --endpoint-selector is rendered solely from router.modelServers.matchLabels. If matchLabels is omitted the selector renders empty and EPP rejects it at startup, because standalone mode requires either --pool-name or a non-empty --endpoint-selector. The existing matchLabels validation lives in _inferencepool.yaml, which only renders when an InferencePool is created, so the create=false path had no guard and produced a broken EPP deployment silently. Add the check to the standalone validations so helm template fails fast with an actionable message, and add a negative test case to hack/verify-helm.sh covering the reproducer from the issue. Fixes llm-d#1665 Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>

* test: consolidate and rename e2e runner scripts Signed-off-by: roytman <roytman@il.ibm.com> * fix comments Signed-off-by: roytman <roytman@il.ibm.com> * Update test/scripts/test-e2e-router.sh Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Alexey Roytman <roytman@il.ibm.com> * Update test/scripts/e2e-common.sh Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Alexey Roytman <roytman@il.ibm.com> --------- Signed-off-by: roytman <roytman@il.ibm.com> Signed-off-by: Alexey Roytman <roytman@il.ibm.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>

… PR 2905 (llm-d#1600) * fix(standalone): rework agentgateway configuration to use derived pseudo-service Signed-off-by: abhay1999 <abhaychaurasiya19@gmail.com> * chore: trigger ci re-run Signed-off-by: abhay1999 <abhaychaurasiya19@gmail.com> * fix(standalone): update agentgateway validation error and add verification assertions Signed-off-by: abhay1999 <abhaychaurasiya19@gmail.com> --------- Signed-off-by: abhay1999 <abhaychaurasiya19@gmail.com>

Signed-off-by: weizhoublue <weizhou.lan@daocloud.io>

…ting (llm-d#1683) Signed-off-by: weizhoublue <weizhou.lan@daocloud.io>

coderabbitai · 2026-06-22T06:25:06Z

📝 Walkthrough

Walkthrough

This PR replaces the scheduler-based e2e test infrastructure with a router-based one: a shared e2e-common.sh helper is introduced, a new test-e2e-router.sh script is added, the test-e2e-gaie.sh script is refactored to use the shared helpers, and Makefile targets plus CI workflow are updated accordingly. The agentgateway Helm configuration is refactored to derive backend service names and ports from modelServers labels rather than explicit agentgateway.service.* values, removing the agentgateway Service template and tightening validations. The sidecar proxy's outbound HTTP transport is wrapped with otelhttp.NewTransport for W3C trace context propagation. AllowlistValidator.poolSelector is introduced to handle both GA (spec.selector.matchLabels) and deprecated alpha (spec.selector flat map) InferencePool API shapes. Supporting changes include a new docs/operations.md sizing guide, a go.mod bump to stable v0.9.0, a yq path fix in push-chart.sh, and an OWNED_IMAGE_RE regex fix.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 8 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
No Injection Vectors	⚠️ Warning	CWE-78 shell command injection in hack/verify-helm.sh: Multiple eval statements execute dynamically constructed commands with the HELM environment variable, which can contain arbitrary shell code i...	Replace eval statements with bash arrays and direct execution: cmd=("${HELM}" template ...); "${cmd[@]}" to eliminate shell metacharacter expansion.
Title check	❓ Inconclusive	The PR title uses a generic sync/upstream marker format that does not convey the substantive changes included in this pull request.	Replace with a title describing the main functional changes, such as 'Add router e2e tests, fix InferencePool selector parsing, and inject W3C trace context' or similar.

✅ Passed checks (8 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Contribution Quality And Spam Detection	✅ Passed	Legitimate upstream sync: coherent refactorings across config/code/tests; real security fixes (CWE-284, CWE-78, CWE-706, CWE-20/918) with test coverage; zero spam/AI signals detected.
No Hardcoded Secrets	✅ Passed	No hardcoded secrets, API keys, tokens, passwords, or embedded credentials detected across all modified files; Kubernetes secrets properly referenced.
No Weak Cryptography	✅ Passed	No banned cryptographic primitives (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret comparisons found. TLS uses strong ECDHE ciphers with TLS 1...
No Privileged Containers	✅ Passed	PR contains only justified privileged configurations: Istio control plane sidecar proxy debugged conditionally on IPTABLES_TRACE_LOGGING flag, and production containers run as non-root (USER 65532).
No Sensitive Data In Logs	✅ Passed	No logging statements expose passwords, tokens, API keys, PII, session IDs, bearer tokens, or customer data. Structured logging of pod names/IPs and infrastructure identifiers is acceptable for int...

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

pkg/sidecar/proxy/proxy.go (1)
404-433: ⚠️ Potential issue | 🔴 Critical

Harden TLS verification in decoder proxy transport (CWE-295).

Line 421 directly passes user-configurable insecureSkipVerify to tls.Config.InsecureSkipVerify. Operators can enable this via --tls-insecure-skip-verify=decoder or YAML config, disabling certificate validation on decoder (vLLM) HTTPS traffic. Enables MITM attacks: prompt tampering, response injection, and data theft.
Remediation
 func (s *Server) newProxyTransport(scheme string, insecureSkipVerify bool) http.RoundTripper {
 	if scheme == schemeHTTPS {
+		// Certificate verification is mandatory. Operators must use proper CA roots.
 		t.TLSClientConfig = &tls.Config{
-			InsecureSkipVerify: insecureSkipVerify, //nolint:gosec
+			InsecureSkipVerify: false,
 			MinVersion:         tls.VersionTLS12,
Per **/*.go security guideline: "No InsecureSkipVerify in TLS configs (enables MITM attacks)."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/sidecar/proxy/proxy.go` around lines 404 - 433, The newProxyTransport
method directly passes the user-configurable insecureSkipVerify parameter to
tls.Config.InsecureSkipVerify, which violates security guidelines and enables
MITM attacks. Remove the insecureSkipVerify parameter from the newProxyTransport
function signature and ensure that InsecureSkipVerify in the tls.Config is
always set to false (or omitted entirely, since false is the default). This
prevents operators from disabling certificate validation on decoder HTTPS
traffic regardless of configuration options like --tls-insecure-skip-verify.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@config/charts/llm-d-router-standalone/templates/_validations.tpl`:
- Around line 21-27: The current validation in the _validations.tpl template
only checks if matchLabels exists, but does not validate the specific
matchLabels.app field which is required for agentgateway backend naming. When
router.proxy.proxyType is set to agentgateway and router.inferencePool.create is
false (standalone mode), add an additional validation check after the existing
matchLabels check to ensure that matchLabels.app is present and valid. This
validation should fail with a clear error message if matchLabels.app is missing
or empty when using agentgateway in standalone mode, preventing silent routing
errors at runtime.

In `@docs/operations.md`:
- Around line 111-136: The Helm configuration example in the documentation
includes a router.epp.resources section that does not exist in the actual chart
schema. Remove the entire EPP container resources block (lines showing epp: with
its nested resources, requests for cpu and memory, and limits) and retain only
the valid router.proxy.resources configuration block. This will ensure users
copying the example get valid YAML that matches the actual chart structure which
only supports router.proxy resources configuration.

In `@hack/verify-helm.sh`:
- Around line 151-186: The code uses `eval` to execute dynamically constructed
command strings, creating a command injection vulnerability. Replace each
command variable (missing_endpoint_selector_command, invalid_proxy_command,
deprecated_agentgateway_service_command,
unsupported_agentgateway_llm_d_router_gateway_command,
unsupported_agentgateway_listener_port_command,
mismatched_agentgateway_listener_target_port_command) with bash arrays instead
of strings, and execute them using the array syntax `if "${cmd_array[@]}"
>/dev/null; then` instead of `if eval "${cmd_string}"; then`. This eliminates
the security risk while preserving the same command execution behavior.

In `@pkg/sidecar/proxy/allowlist_test.go`:
- Around line 46-177: The poolSelector Context is missing regression tests for
empty selector fail-closed behavior. Add two new test cases within the
poolSelector Context: one that verifies the poolSelector method on an
AllowlistValidator returns an error when the GA InferencePool has an empty
matchLabels map in its spec.selector field, and another that verifies it returns
an error when the alpha InferencePool has an empty selector map in its spec
field. Both test cases should follow the existing error case pattern (using the
same pool setup structure and Expect assertions for HaveOccurred and error
message validation) to ensure match-all selector regressions are prevented.

In `@pkg/sidecar/proxy/allowlist.go`:
- Around line 253-257: The early return at lines 253-257 when av.poolSelector
fails bypasses the informer replacement and stop path that occurs at lines
288-295, creating a security vulnerability where a malformed pool update can
leave the previous selector active and keep stale targets authorized. Instead of
returning immediately on selector extraction error, execute the same informer
teardown logic used in the informer replacement path (including stopping and
clearing the informer) plus recompute the target state to ensure the system
fails closed rather than leaving stale authorization rules in place.
- Around line 275-280: The code currently accepts empty selectorData which
results in an empty label selector that matches all resources in a namespace,
creating a security vulnerability. After successfully retrieving selectorData
from the spec using unstructured.NestedStringMap (when the found check passes),
add a validation to reject empty selectorData maps by checking if
len(selectorData) == 0 and returning an appropriate error message before the
return statement that calls labels.Set(selectorData).AsSelector(). This ensures
the allowlist scope is properly enforced and prevents namespace-wide resource
access.

In `@pkg/sidecar/proxy/transport_test.go`:
- Around line 34-35: The test mutates global OTel propagator state via
otel.SetTextMapPropagator without restoring it, causing order-dependent failures
in other tests. Save the original propagator returned by
otel.GetTextMapPropagator before calling otel.SetTextMapPropagator, then restore
it in t.Cleanup() following the pattern already established in
pkg/common/observability/tracing/telemetry_test.go. This ensures the global
state is properly reset after the test completes.

In `@test/scripts/test-e2e-gaie.sh`:
- Around line 24-26: The SIM_IMAGE variable assignment uses the wrong
environment variable reference. On the line where SIM_IMAGE is being set with a
default value, it currently references ${VLLM_IMAGE:-...} as the fallback, but
it should reference ${SIM_IMAGE:-...} instead. Change the variable name in the
parameter expansion from VLLM_IMAGE to SIM_IMAGE so that environment variable
overrides for SIM_IMAGE are properly honored.

In `@test/scripts/test-e2e-router.sh`:
- Around line 24-30: The trap handler for INT and TERM signals hard-codes the
cluster name "e2e-tests" and unconditionally calls e2e_handle_interrupt, which
can delete an unrelated cluster with the same name if the script is running
against an existing K8S_CONTEXT. Gate the cleanup to only occur when the script
actually created the cluster by conditionally setting the trap or passing a flag
to e2e_handle_interrupt that indicates ownership. The cleanup should only be
active when K8S_CONTEXT is unset (meaning the script created the cluster), not
when using an existing context.

---

Outside diff comments:
In `@pkg/sidecar/proxy/proxy.go`:
- Around line 404-433: The newProxyTransport method directly passes the
user-configurable insecureSkipVerify parameter to tls.Config.InsecureSkipVerify,
which violates security guidelines and enables MITM attacks. Remove the
insecureSkipVerify parameter from the newProxyTransport function signature and
ensure that InsecureSkipVerify in the tls.Config is always set to false (or
omitted entirely, since false is the default). This prevents operators from
disabling certificate validation on decoder HTTPS traffic regardless of
configuration options like --tls-insecure-skip-verify.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1241f0e4-d5b4-4890-a93b-743af2c3acdf

📥 Commits

Reviewing files that changed from the base of the PR and between bb5f430 and c085bd7.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum, !**/*.sum

📒 Files selected for processing (22)

.github/workflows/ci-pr-checks.yaml
Makefile
README.md
config/charts/README.md
config/charts/llm-d-router-standalone/templates/_validations.tpl
config/charts/llm-d-router-standalone/templates/agentgateway-service.yaml
config/charts/llm-d-router-standalone/values.yaml
config/charts/routerlib/templates/_helpers.tpl
docs/architecture.md
docs/operations.md
go.mod
hack/push-chart.sh
hack/verify-helm.sh
pkg/sidecar/proxy/allowlist.go
pkg/sidecar/proxy/allowlist_test.go
pkg/sidecar/proxy/proxy.go
pkg/sidecar/proxy/transport_test.go
scripts/check-latest-tags.sh
test/scripts/e2e-common.sh
test/scripts/run_e2e.sh
test/scripts/test-e2e-gaie.sh
test/scripts/test-e2e-router.sh

💤 Files with no reviewable changes (2)

config/charts/llm-d-router-standalone/templates/agentgateway-service.yaml
test/scripts/run_e2e.sh

coderabbitai · 2026-06-22T06:37:18Z

+{{- /* Without an InferencePool the EPP --endpoint-selector is rendered from modelServers.matchLabels; an empty selector is rejected by EPP at startup, so require it here. */ -}}
+{{- $useInferencePool := ne .Values.router.inferencePool.create false -}}
+{{- if not $useInferencePool -}}
+  {{- if or (empty .Values.router.modelServers) (not .Values.router.modelServers.matchLabels) -}}
+    {{- fail ".Values.router.modelServers.matchLabels is required when .Values.router.inferencePool.create=false: standalone mode renders the EPP --endpoint-selector from matchLabels and cannot start with an empty selector" -}}
+  {{- end -}}
+{{- end -}}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Require and validate modelServers.matchLabels.app for agentgateway backend naming (CWE-20).

Line 21-Line 27 only enforces that matchLabels exists. In this PR, agentgateway backend identity is derived from matchLabels.app; if app is missing/invalid, routing can silently target the wrong backend name at runtime. Add a hard fail for missing/invalid matchLabels.app when router.proxy.proxyType=agentgateway and router.inferencePool.create=false.

Proposed validation hardening

{{- if not $useInferencePool -}} {{- if or (empty .Values.router.modelServers) (not .Values.router.modelServers.matchLabels) -}} {{- fail ".Values.router.modelServers.matchLabels is required when .Values.router.inferencePool.create=false: standalone mode renders the EPP --endpoint-selector from matchLabels and cannot start with an empty selector" -}} {{- end -}} + {{- if eq ($proxy.proxyType | default "envoy") "agentgateway" -}} + {{- $appLabel := index (.Values.router.modelServers.matchLabels | default dict) "app" | default "" -}} + {{- if empty $appLabel -}} + {{- fail ".Values.router.modelServers.matchLabels.app is required when proxyType=agentgateway and inferencePool.create=false" -}} + {{- end -}} + {{- if or (gt (len $appLabel) 63) (not (regexMatch "^[a-z0-9]([-a-z0-9]*[a-z0-9])?$" $appLabel)) -}} + {{- fail ".Values.router.modelServers.matchLabels.app must be a DNS-1123 label for agentgateway backend naming" -}} + {{- end -}} + {{- end -}} {{- end -}}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@config/charts/llm-d-router-standalone/templates/_validations.tpl` around lines 21 - 27, The current validation in the _validations.tpl template only checks if matchLabels exists, but does not validate the specific matchLabels.app field which is required for agentgateway backend naming. When router.proxy.proxyType is set to agentgateway and router.inferencePool.create is false (standalone mode), add an additional validation check after the existing matchLabels check to ensure that matchLabels.app is present and valid. This validation should fail with a clear error message if matchLabels.app is missing or empty when using agentgateway in standalone mode, preventing silent routing errors at runtime.

coderabbitai · 2026-06-22T06:37:18Z

+```yaml
+router:
+  # Endpoint Picker (EPP) Container Resources
+  epp:
+    resources:
+      requests:
+        cpu: "32"
+        memory: "64Gi"
+      limits:
+        memory: "128Gi"
+
+  # Envoy Proxy Container Resources
+  proxy:
+    resources:
+      requests:
+        cpu: "8"
+        memory: "2Gi"
+      limits:
+        memory: "4Gi"
+```
+
+To apply these values during deployment, run the Helm install or upgrade command with your custom values file:
+
+```bash
+helm install optimize-baseline ./config/charts/llm-d-router-standalone -f resource_overrides.yaml
+```


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify Helm values.yaml schema contains router.epp.resources and router.proxy.resources keys. # Check for epp resources path rg -n 'router:' config/charts/llm-d-router-standalone/values.yaml -A 30 | grep -E '(epp:|proxy:|resources:)' # Confirm exact nesting structure yq '.router.epp.resources' config/charts/llm-d-router-standalone/values.yaml 2>/dev/null || echo "Path does not exist" yq '.router.proxy.resources' config/charts/llm-d-router-standalone/values.yaml 2>/dev/null || echo "Path does not exist"

Repository: opendatahub-io/llm-d-router

Length of output: 181

🏁 Script executed:

cat -n config/charts/llm-d-router-standalone/values.yaml | head -60

Repository: opendatahub-io/llm-d-router

Length of output: 2190

🏁 Script executed:

# Search for 'epp:' in the values.yaml file rg -n 'epp:' config/charts/llm-d-router-standalone/values.yaml

Repository: opendatahub-io/llm-d-router

Length of output: 53

🏁 Script executed:

# Get the complete router section structure rg -n '^ [a-z]' config/charts/llm-d-router-standalone/values.yaml | head -20

Repository: opendatahub-io/llm-d-router

Length of output: 124

🏁 Script executed:

# Get full values.yaml to understand all top-level keys under router cat config/charts/llm-d-router-standalone/values.yaml | grep -E '^ [a-z]' | sort -u

Repository: opendatahub-io/llm-d-router

Length of output: 117

Remove invalid router.epp.resources configuration from Helm example; retain only valid router.proxy.resources.

The chart values.yaml does not contain a router.epp key. The documented configuration at lines 111–136 references router.epp.resources which does not exist in the chart schema. Users copying this example will have invalid YAML that fails deployment. The router.proxy.resources configuration is valid; remove the non-existent EPP block.

Actual chart structure

router: modelServers: extraServicePorts: proxy: resources: # <-- only this path exists requests: cpu: "4" memory: 8Gi limits: memory: 16Gi

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/operations.md` around lines 111 - 136, The Helm configuration example in the documentation includes a router.epp.resources section that does not exist in the actual chart schema. Remove the entire EPP container resources block (lines showing epp: with its nested resources, requests for cpu and memory, and limits) and retain only the valid router.proxy.resources configuration block. This will ensure users copying the example get valid YAML that matches the actual chart structure which only supports router.proxy resources configuration.

coderabbitai · 2026-06-22T06:37:18Z

+missing_endpoint_selector_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.inferencePool.create=false --set router.modelServers.type=vllm --set 'router.modelServers.targetPorts[0].number=8000' >/dev/null"
+echo "Executing: ${missing_endpoint_selector_command}"
+if eval "${missing_endpoint_selector_command}"; then
+  echo "Helm template unexpectedly succeeded for inferencePool.create=false without modelServers.matchLabels"
+  exit 1
+fi
+
 invalid_proxy_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.modelServers.matchLabels.app=llm-instance-gateway --set router.inferencePool.create=false --set router.proxy.proxyType=bogus >/dev/null"
 echo "Executing: ${invalid_proxy_command}"
 if eval "${invalid_proxy_command}"; then
  echo "Helm template unexpectedly succeeded for invalid proxyType"
  exit 1
 fi

-missing_agentgateway_service_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.modelServers.matchLabels.app=llm-instance-gateway --set router.inferencePool.create=false --set router.proxy.proxyType=agentgateway >/dev/null"
-echo "Executing: ${missing_agentgateway_service_command}"
-if eval "${missing_agentgateway_service_command}"; then
-  echo "Helm template unexpectedly succeeded for missing agentgateway service.name"
+deprecated_agentgateway_service_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.modelServers.matchLabels.app=llm-instance-gateway --set router.inferencePool.create=false --set router.proxy.proxyType=agentgateway --set router.proxy.agentgateway.service.name=foo >/dev/null"
+echo "Executing: ${deprecated_agentgateway_service_command}"
+if eval "${deprecated_agentgateway_service_command}"; then
+  echo "Helm template unexpectedly succeeded for deprecated agentgateway.service configuration"
  exit 1
 fi

-unsupported_agentgateway_llm_d_router_gateway_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.proxy.proxyType=agentgateway --set router.proxy.agentgateway.service.name=llm-instance-gateway --set 'router.proxy.agentgateway.service.ports[0]=8000' --set router.inferencePool.create=true --set router.modelServers.matchLabels.app=llm-instance-gateway >/dev/null"
+unsupported_agentgateway_llm_d_router_gateway_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.proxy.proxyType=agentgateway --set router.inferencePool.create=true --set router.modelServers.matchLabels.app=llm-instance-gateway >/dev/null"
 echo "Executing: ${unsupported_agentgateway_llm_d_router_gateway_command}"
 if eval "${unsupported_agentgateway_llm_d_router_gateway_command}"; then
  echo "Helm template unexpectedly succeeded for unsupported agentgateway createInferencePool=true configuration"
  exit 1
 fi

-mismatched_agentgateway_ports_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.proxy.proxyType=agentgateway --set router.proxy.agentgateway.service.name=llm-instance-gateway --set 'router.proxy.agentgateway.service.ports[0]=8001' --set router.modelServers.matchLabels.app=llm-instance-gateway --set router.inferencePool.create=false --set 'router.modelServers.targetPorts[0].number=8000' >/dev/null"
-echo "Executing: ${mismatched_agentgateway_ports_command}"
-if eval "${mismatched_agentgateway_ports_command}"; then
-  echo "Helm template unexpectedly succeeded for mismatched agentgateway service.ports"
-  exit 1
-fi
-
-unsupported_agentgateway_listener_port_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.proxy.proxyType=agentgateway --set router.proxy.agentgateway.service.name=llm-instance-gateway --set 'router.proxy.agentgateway.service.ports[0]=8000' --set router.modelServers.matchLabels.app=llm-instance-gateway --set router.inferencePool.create=false --set 'router.modelServers.targetPorts[0].number=8000' --set 'router.extraServicePorts[0].name=proxy' --set 'router.extraServicePorts[0].port=9000' --set 'router.extraServicePorts[0].protocol=TCP' --set 'router.extraServicePorts[0].targetPort=9000' >/dev/null"
+unsupported_agentgateway_listener_port_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.proxy.proxyType=agentgateway --set router.modelServers.matchLabels.app=llm-instance-gateway --set router.inferencePool.create=false --set 'router.modelServers.targetPorts[0].number=8000' --set 'router.extraServicePorts[0].name=proxy' --set 'router.extraServicePorts[0].port=9000' --set 'router.extraServicePorts[0].protocol=TCP' --set 'router.extraServicePorts[0].targetPort=9000' >/dev/null"
 echo "Executing: ${unsupported_agentgateway_listener_port_command}"
 if eval "${unsupported_agentgateway_listener_port_command}"; then
  echo "Helm template unexpectedly succeeded without an agentgateway listener Service port named http"
  exit 1
 fi

-mismatched_agentgateway_listener_target_port_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.proxy.proxyType=agentgateway --set router.proxy.agentgateway.service.name=llm-instance-gateway --set 'router.proxy.agentgateway.service.ports[0]=8000' --set router.modelServers.matchLabels.app=llm-instance-gateway --set router.inferencePool.create=false --set 'router.modelServers.targetPorts[0].number=8000' --set 'router.extraServicePorts[0].name=http' --set 'router.extraServicePorts[0].port=9000' --set 'router.extraServicePorts[0].protocol=TCP' --set 'router.extraServicePorts[0].targetPort=9001' >/dev/null"
+mismatched_agentgateway_listener_target_port_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.proxy.proxyType=agentgateway --set router.modelServers.matchLabels.app=llm-instance-gateway --set router.inferencePool.create=false --set 'router.modelServers.targetPorts[0].number=8000' --set 'router.extraServicePorts[0].name=http' --set 'router.extraServicePorts[0].port=9000' --set 'router.extraServicePorts[0].protocol=TCP' --set 'router.extraServicePorts[0].targetPort=9001' >/dev/null"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove eval command execution in test invocations (CWE-78).

Line 153, Line 160, Line 167, Line 174, and Line 181 execute dynamically built strings with eval. That creates avoidable command-injection surface and quoting fragility in CI scripts. Execute commands via arrays and "$\{cmd[@]\}" instead.

Safer execution pattern (apply to each command block)

-missing_endpoint_selector_command="${HELM} template ${SCRIPT_ROOT}/config/charts/llm-d-router-standalone --set router.inferencePool.create=false --set router.modelServers.type=vllm --set 'router.modelServers.targetPorts[0].number=8000' >/dev/null" -echo "Executing: ${missing_endpoint_selector_command}" -if eval "${missing_endpoint_selector_command}"; then +missing_endpoint_selector_cmd=( + "${HELM}" template "${SCRIPT_ROOT}/config/charts/llm-d-router-standalone" + --set router.inferencePool.create=false + --set router.modelServers.type=vllm + --set 'router.modelServers.targetPorts[0].number=8000' +) +echo "Executing: ${missing_endpoint_selector_cmd[*]}" +if "${missing_endpoint_selector_cmd[@]}" >/dev/null; then echo "Helm template unexpectedly succeeded for inferencePool.create=false without modelServers.matchLabels" exit 1 fi

As per coding guidelines, **/*.sh: “Avoid eval and dynamic command execution” and “Quote all variables to prevent injection.”

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hack/verify-helm.sh` around lines 151 - 186, The code uses `eval` to execute dynamically constructed command strings, creating a command injection vulnerability. Replace each command variable (missing_endpoint_selector_command, invalid_proxy_command, deprecated_agentgateway_service_command, unsupported_agentgateway_llm_d_router_gateway_command, unsupported_agentgateway_listener_port_command, mismatched_agentgateway_listener_target_port_command) with bash arrays instead of strings, and execute them using the array syntax `if "${cmd_array[@]}" >/dev/null; then` instead of `if eval "${cmd_string}"; then`. This eliminates the security risk while preserving the same command execution behavior.

Source: Coding guidelines

coderabbitai · 2026-06-22T06:37:18Z

+	Context("poolSelector", func() {
+		It("should extract selector from GA InferencePool (matchLabels)", func() {
+			av := &AllowlistValidator{
+				gvr: schema.GroupVersionResource{
+					Group:    routing.InferencePoolAPIGroup,
+					Version:  "v1",
+					Resource: "inferencepools",
+				},
+			}
+			pool := &unstructured.Unstructured{
+				Object: map[string]interface{}{
+					"apiVersion": "inference.networking.k8s.io/v1",
+					"kind":       "InferencePool",
+					"metadata":   map[string]interface{}{"name": "test-pool"},
+					"spec": map[string]interface{}{
+						"selector": map[string]interface{}{
+							"matchLabels": map[string]interface{}{
+								"app.kubernetes.io/name": "my-model",
+								"component":              "serving",
+							},
+						},
+					},
+				},
+			}
+
+			selector, err := av.poolSelector(pool)
+			Expect(err).ToNot(HaveOccurred())
+			Expect(selector.String()).To(SatisfyAll(
+				ContainSubstring("app.kubernetes.io/name=my-model"),
+				ContainSubstring("component=serving"),
+			))
+		})
+
+		It("should extract selector from deprecated alpha InferencePool (flat map)", func() {
+			av := &AllowlistValidator{
+				gvr: schema.GroupVersionResource{
+					Group:    "inference.networking.x-k8s.io",
+					Version:  "v1alpha2",
+					Resource: "inferencepools",
+				},
+			}
+			pool := &unstructured.Unstructured{
+				Object: map[string]interface{}{
+					"apiVersion": "inference.networking.x-k8s.io/v1alpha2",
+					"kind":       "InferencePool",
+					"metadata":   map[string]interface{}{"name": "test-pool"},
+					"spec": map[string]interface{}{
+						"selector": map[string]interface{}{
+							"app.kubernetes.io/name": "my-model",
+							"component":              "serving",
+						},
+					},
+				},
+			}
+
+			selector, err := av.poolSelector(pool)
+			Expect(err).ToNot(HaveOccurred())
+			Expect(selector.String()).To(SatisfyAll(
+				ContainSubstring("app.kubernetes.io/name=my-model"),
+				ContainSubstring("component=serving"),
+			))
+		})
+
+		It("should fail for GA pool with flat selector (no matchLabels)", func() {
+			av := &AllowlistValidator{
+				gvr: schema.GroupVersionResource{
+					Group:    routing.InferencePoolAPIGroup,
+					Version:  "v1",
+					Resource: "inferencepools",
+				},
+			}
+			pool := &unstructured.Unstructured{
+				Object: map[string]interface{}{
+					"apiVersion": "inference.networking.k8s.io/v1",
+					"kind":       "InferencePool",
+					"metadata":   map[string]interface{}{"name": "test-pool"},
+					"spec": map[string]interface{}{
+						"selector": map[string]interface{}{
+							"app": "my-model",
+						},
+					},
+				},
+			}
+
+			_, err := av.poolSelector(pool)
+			Expect(err).To(HaveOccurred())
+			Expect(err.Error()).To(ContainSubstring("matchLabels"))
+		})
+
+		It("should fail when spec is missing", func() {
+			av := &AllowlistValidator{
+				gvr: schema.GroupVersionResource{
+					Group:    routing.InferencePoolAPIGroup,
+					Version:  "v1",
+					Resource: "inferencepools",
+				},
+			}
+			pool := &unstructured.Unstructured{
+				Object: map[string]interface{}{
+					"apiVersion": "inference.networking.k8s.io/v1",
+					"kind":       "InferencePool",
+					"metadata":   map[string]interface{}{"name": "test-pool"},
+				},
+			}
+
+			_, err := av.poolSelector(pool)
+			Expect(err).To(HaveOccurred())
+			Expect(err.Error()).To(ContainSubstring("spec"))
+		})
+
+		It("should fail when selector is missing", func() {
+			av := &AllowlistValidator{
+				gvr: schema.GroupVersionResource{
+					Group:    "inference.networking.x-k8s.io",
+					Version:  "v1alpha2",
+					Resource: "inferencepools",
+				},
+			}
+			pool := &unstructured.Unstructured{
+				Object: map[string]interface{}{
+					"apiVersion": "inference.networking.x-k8s.io/v1alpha2",
+					"kind":       "InferencePool",
+					"metadata":   map[string]interface{}{"name": "test-pool"},
+					"spec":       map[string]interface{}{},
+				},
+			}
+
+			_, err := av.poolSelector(pool)
+			Expect(err).To(HaveOccurred())
+			Expect(err.Error()).To(ContainSubstring("selector"))
+		})
+	})


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add regression tests for empty selector fail-closed behavior (CWE-284).

Current cases validate missing fields but not matchLabels: {} (GA) or selector: {} (alpha). Add explicit assertions that these return errors, so match-all selector regressions are blocked.

Suggested test additions

Context("poolSelector", func() { + It("should fail for GA pool with empty matchLabels", func() { + av := &AllowlistValidator{ + gvr: schema.GroupVersionResource{ + Group: routing.InferencePoolAPIGroup, + Version: "v1", + Resource: "inferencepools", + }, + } + pool := &unstructured.Unstructured{ + Object: map[string]interface{}{ + "apiVersion": "inference.networking.k8s.io/v1", + "kind": "InferencePool", + "metadata": map[string]interface{}{"name": "test-pool"}, + "spec": map[string]interface{}{ + "selector": map[string]interface{}{ + "matchLabels": map[string]interface{}{}, + }, + }, + }, + } + _, err := av.poolSelector(pool) + Expect(err).To(HaveOccurred()) + }) + + It("should fail for alpha pool with empty selector", func() { + av := &AllowlistValidator{ + gvr: schema.GroupVersionResource{ + Group: "inference.networking.x-k8s.io", + Version: "v1alpha2", + Resource: "inferencepools", + }, + } + pool := &unstructured.Unstructured{ + Object: map[string]interface{}{ + "apiVersion": "inference.networking.x-k8s.io/v1alpha2", + "kind": "InferencePool", + "metadata": map[string]interface{}{"name": "test-pool"}, + "spec": map[string]interface{}{ + "selector": map[string]interface{}{}, + }, + }, + } + _, err := av.poolSelector(pool) + Expect(err).To(HaveOccurred()) + })

As per coding guidelines, "**: REVIEW PRIORITIES: Security vulnerabilities (provide severity, exploit scenario, and remediation code)."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/sidecar/proxy/allowlist_test.go` around lines 46 - 177, The poolSelector Context is missing regression tests for empty selector fail-closed behavior. Add two new test cases within the poolSelector Context: one that verifies the poolSelector method on an AllowlistValidator returns an error when the GA InferencePool has an empty matchLabels map in its spec.selector field, and another that verifies it returns an error when the alpha InferencePool has an empty selector map in its spec field. Both test cases should follow the existing error case pattern (using the same pool setup structure and Expect assertions for HaveOccurred and error message validation) to ensure match-all selector regressions are prevented.

Source: Coding guidelines

coderabbitai · 2026-06-22T06:37:18Z

+	selector, err := av.poolSelector(poolObj)
+	if err != nil {
+		av.logger.Error(err, "failed to extract selector from InferencePool", "name", poolName)
 		return
 	}


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Fail-open informer state on selector parse errors (CWE-284).

Line 253-Line 257 returns before the informer replacement/stop path (Line 288-Line 295), so a malformed pool update can leave the previous selector active and keep stale targets authorized by SSRF checks.

Use the same teardown path used for informer replacement (plus target-state recomputation) before returning, so this path fails closed.

As per coding guidelines, "**: REVIEW PRIORITIES: Security vulnerabilities (provide severity, exploit scenario, and remediation code)."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/sidecar/proxy/allowlist.go` around lines 253 - 257, The early return at lines 253-257 when av.poolSelector fails bypasses the informer replacement and stop path that occurs at lines 288-295, creating a security vulnerability where a malformed pool update can leave the previous selector active and keep stale targets authorized. Instead of returning immediately on selector extraction error, execute the same informer teardown logic used in the informer replacement path (including stopping and clearing the informer) plus recompute the target state to ensure the system fails closed rather than leaving stale authorization rules in place.

Source: Coding guidelines

coderabbitai · 2026-06-22T06:37:18Z

+	selectorData, found, err := unstructured.NestedStringMap(spec, selectorPath...)
+	if err != nil || !found {
+		return nil, fmt.Errorf("missing or invalid selector field at %v (found=%t): %w", selectorPath, found, err)
 	}

-	// Create or update pod informer for this selector
-	av.createPodInformer(poolName, labelSelector.AsSelector())
+	return labels.Set(selectorData).AsSelector(), nil


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

In Kubernetes, does an empty ListOptions.labelSelector match all resources? Please cite official Kubernetes API docs.

💡 Result:

Yes, in the Kubernetes API, an empty label selector effectively matches all resources [1][2][3]. When performing a list or watch operation, the label selector in ListOptions is used to filter resources. If the label selector is not specified or is left empty, the filter is effectively a no-op, meaning it does not restrict the returned set of objects, and all resources are returned [4][5][6]. Official Kubernetes API documentation defines a LabelSelector as a query over a set of resources; specifically, it states that "an empty label selector matches all objects" [1][2][3]. Sources: [1] https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/label-selector/ [2] https://kubernetes.io/docs/reference/kubernetes-api/definitions/label-selector-v1-meta/ [4] https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/ [3] https://kubernetes.io/docs/reference/kubernetes-api/definitions/ [5] kubernetes/kubectl#545 [6] https://kubernetes.website.cncfstack.com/docs/reference/kubernetes-api/common-parameters/common-parameters/

Citations:

1: https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/label-selector/

2: https://kubernetes.io/docs/reference/kubernetes-api/definitions/label-selector-v1-meta/

3: https://kubernetes.io/docs/reference/kubernetes-api/definitions/

4: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/

5: kubectl get pod -n default -l " " output behaviour kubernetes/kubectl#545

6: https://kubernetes.website.cncfstack.com/docs/reference/kubernetes-api/common-parameters/common-parameters/

🏁 Script executed:

cd pkg/sidecar/proxy && wc -l allowlist.go

Repository: opendatahub-io/llm-d-router

Length of output: 88

🏁 Script executed:

cd pkg/sidecar/proxy && sed -n '275,310p' allowlist.go

Repository: opendatahub-io/llm-d-router

Length of output: 1422

🏁 Script executed:

cd pkg/sidecar/proxy && sed -n '282,320p' allowlist.go

Repository: opendatahub-io/llm-d-router

Length of output: 1444

🏁 Script executed:

cd pkg/sidecar/proxy && rg -n "createPodInformer|fromRawExtension" allowlist.go -B 3 -A 3

Repository: opendatahub-io/llm-d-router

Length of output: 577

Reject empty selector maps to prevent namespace-wide allowlisting (CWE-284, CWE-918).

Line 280 accepts empty selectorData. An empty selector.String() results in an empty LabelSelector passed to List/Watch operations at lines 300 and 308. Per Kubernetes API documentation, an empty label selector matches all resources in the namespace, bypassing the intended SSRF allowlist scope.

Suggested fix

selectorData, found, err := unstructured.NestedStringMap(spec, selectorPath...) if err != nil || !found { return nil, fmt.Errorf("missing or invalid selector field at %v (found=%t): %w", selectorPath, found, err) } +if len(selectorData) == 0 { + return nil, fmt.Errorf("empty selector field at %v", selectorPath) +} return labels.Set(selectorData).AsSelector(), nil

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/sidecar/proxy/allowlist.go` around lines 275 - 280, The code currently accepts empty selectorData which results in an empty label selector that matches all resources in a namespace, creating a security vulnerability. After successfully retrieving selectorData from the spec using unstructured.NestedStringMap (when the found check passes), add a validation to reject empty selectorData maps by checking if len(selectorData) == 0 and returning an appropriate error message before the return statement that calls labels.Set(selectorData).AsSelector(). This ensures the allowlist scope is properly enforced and prevents namespace-wide resource access.

Source: Coding guidelines

coderabbitai · 2026-06-22T06:37:18Z

+	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))
+


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Find global propagator mutations and check whether each has cleanup/restore. rg -n -C3 --type=go 'SetTextMapPropagator\s*\('

Repository: opendatahub-io/llm-d-router

Length of output: 1840

🏁 Script executed:

#!/bin/bash # Check if request_test.go has cleanup for its SetTextMapPropagator call rg -n -A 20 'SetTextMapPropagator' pkg/epp/handlers/request_test.go | head -40

Repository: opendatahub-io/llm-d-router

Length of output: 640

🏁 Script executed:

#!/bin/bash # Look for t.Cleanup patterns in test files to understand cleanup practices rg -n 't\.Cleanup' --type=go pkg/sidecar/proxy/transport_test.go pkg/epp/handlers/request_test.go

Repository: opendatahub-io/llm-d-router

Length of output: 53

🏁 Script executed:

#!/bin/bash # View the full test function in request_test.go to check for cleanup sed -n '35,80p' pkg/epp/handlers/request_test.go

Repository: opendatahub-io/llm-d-router

Length of output: 1235

🏁 Script executed:

#!/bin/bash # View the full test function in transport_test.go to confirm the issue sed -n '31,60p' pkg/sidecar/proxy/transport_test.go

Repository: opendatahub-io/llm-d-router

Length of output: 1181

🏁 Script executed:

#!/bin/bash # Check if there are ANY tests in the codebase using proper cleanup for global state rg -B5 -A5 'prev.*otel|t\.Cleanup.*otel' --type=go

Repository: opendatahub-io/llm-d-router

Length of output: 945

Restore the global OTel propagator after this test.

Line 34 mutates process-global state and does not reset it, creating order-dependent failures in other tests. The same issue exists in pkg/epp/handlers/request_test.go:36. The codebase already establishes the correct pattern in pkg/common/observability/tracing/telemetry_test.go: save the original state and restore it in t.Cleanup().

Suggested remediation

func TestNewProxyTransportInjectsTraceContext(t *testing.T) { + prev := otel.GetTextMapPropagator() + t.Cleanup(func() { otel.SetTextMapPropagator(prev) }) otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))

func TestNewProxyTransportInjectsTraceContext(t *testing.T) {

prev := otel.GetTextMapPropagator()

t.Cleanup(func() { otel.SetTextMapPropagator(prev) })

otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/sidecar/proxy/transport_test.go` around lines 34 - 35, The test mutates global OTel propagator state via otel.SetTextMapPropagator without restoring it, causing order-dependent failures in other tests. Save the original propagator returned by otel.GetTextMapPropagator before calling otel.SetTextMapPropagator, then restore it in t.Cleanup() following the pattern already established in pkg/common/observability/tracing/telemetry_test.go. This ensures the global state is properly reset after the test completes.

coderabbitai · 2026-06-22T06:37:19Z

 EPP_IMAGE="${EPP_IMAGE:-ghcr.io/llm-d/llm-d-router-endpoint-picker:dev}"
 SIM_IMAGE="${VLLM_IMAGE:-ghcr.io/llm-d/llm-d-inference-sim:v0.9.2}"
-MANIFEST_PATH="${MANIFEST_PATH:-${DIR}/../test/testdata/sim-deployment.yaml}"
+MANIFEST_PATH="${MANIFEST_PATH:-${DIR}/../testdata/sim-deployment.yaml}"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix simulator image override key mismatch.

Line 25 assigns SIM_IMAGE from VLLM_IMAGE, so SIM_IMAGE=... overrides are ignored.

Proposed fix

-SIM_IMAGE="${VLLM_IMAGE:-ghcr.io/llm-d/llm-d-inference-sim:v0.9.2}" +SIM_IMAGE="${SIM_IMAGE:-${VLLM_IMAGE:-ghcr.io/llm-d/llm-d-inference-sim:v0.9.2}}"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

EPP_IMAGE="${EPP_IMAGE:-ghcr.io/llm-d/llm-d-router-endpoint-picker:dev}"

SIM_IMAGE="${VLLM_IMAGE:-ghcr.io/llm-d/llm-d-inference-sim:v0.9.2}"

MANIFEST_PATH="${MANIFEST_PATH:-${DIR}/../test/testdata/sim-deployment.yaml}"

MANIFEST_PATH="${MANIFEST_PATH:-${DIR}/../testdata/sim-deployment.yaml}"

EPP_IMAGE="${EPP_IMAGE:-ghcr.io/llm-d/llm-d-router-endpoint-picker:dev}"

SIM_IMAGE="${SIM_IMAGE:-${VLLM_IMAGE:-ghcr.io/llm-d/llm-d-inference-sim:v0.9.2}}"

MANIFEST_PATH="${MANIFEST_PATH:-${DIR}/../testdata/sim-deployment.yaml}"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/scripts/test-e2e-gaie.sh` around lines 24 - 26, The SIM_IMAGE variable assignment uses the wrong environment variable reference. On the line where SIM_IMAGE is being set with a default value, it currently references ${VLLM_IMAGE:-...} as the fallback, but it should reference ${SIM_IMAGE:-...} instead. Change the variable name in the parameter expansion from VLLM_IMAGE to SIM_IMAGE so that environment variable overrides for SIM_IMAGE are properly honored.

coderabbitai · 2026-06-22T06:37:19Z

+# Set trap only for interruption signals.
+# Normally kind cluster cleanup is done by AfterSuite; this trap deletes the
+# e2e-tests cluster on Ctrl-C. The delete is unconditional and only meaningful
+# when the suite created that cluster (K8S_CONTEXT unset); when running against
+# an existing context it is a no-op unless a cluster named e2e-tests happens to
+# exist.
+trap 'e2e_handle_interrupt "e2e-tests"' INT TERM


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Gate interrupt cleanup to owned clusters only (CWE-706).

Line 30 hard-codes e2e-tests in the trap. On SIGINT/SIGTERM, this can delete an unrelated local kind cluster with the same name.

Proposed fix

-# Set trap only for interruption signals. -# Normally kind cluster cleanup is done by AfterSuite; this trap deletes the -# e2e-tests cluster on Ctrl-C. The delete is unconditional and only meaningful -# when the suite created that cluster (K8S_CONTEXT unset); when running against -# an existing context it is a no-op unless a cluster named e2e-tests happens to -# exist. -trap 'e2e_handle_interrupt "e2e-tests"' INT TERM +# Set trap only for interruption signals. +# Only mark e2e-tests as owned when not targeting an existing context. +CREATED_CLUSTER="" +if [ -z "${K8S_CONTEXT:-}" ]; then + CREATED_CLUSTER="e2e-tests" +fi +trap 'e2e_handle_interrupt "${CREATED_CLUSTER}"' INT TERM

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Set trap only for interruption signals.

# Normally kind cluster cleanup is done by AfterSuite; this trap deletes the

# e2e-tests cluster on Ctrl-C. The delete is unconditional and only meaningful

# when the suite created that cluster (K8S_CONTEXT unset); when running against

# an existing context it is a no-op unless a cluster named e2e-tests happens to

# exist.

trap 'e2e_handle_interrupt "e2e-tests"' INT TERM

# Set trap only for interruption signals.

# Only mark e2e-tests as owned when not targeting an existing context.

CREATED_CLUSTER=""

if [ -z "${K8S_CONTEXT:-}" ]; then

CREATED_CLUSTER="e2e-tests"

fi

trap 'e2e_handle_interrupt "${CREATED_CLUSTER}"' INT TERM

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/scripts/test-e2e-router.sh` around lines 24 - 30, The trap handler for INT and TERM signals hard-codes the cluster name "e2e-tests" and unconditionally calls e2e_handle_interrupt, which can delete an unrelated cluster with the same name if the script is running against an existing K8S_CONTEXT. Gate the cleanup to only occur when the script actually created the cluster by conditionally setting the trap or passing a flag to e2e_handle_interrupt that indicates ownership. The cleanup should only be active when K8S_CONTEXT is unset (meaning the script created the cluster), not when using an existing context.

gyliu513 and others added 9 commits June 18, 2026 15:44

fix(sidecar): inject W3C trace context on outbound proxy requests (ll…

bdbf7cb

…m-d#1667) Signed-off-by: Guangya Liu <gyliu513@gmail.com>

deps(go): bump llm-d-kv-cache to v0.9.0 (llm-d#1691)

48b7dff

Update github.com/llm-d/llm-d-kv-cache from v0.9.0-rc.1 to the released v0.9.0. Signed-off-by: Maroon Ayoub <mayoub@redhat.com>

fix(scripts): support single quotes in owned image check (llm-d#1676)

8b6f5fe

Signed-off-by: weizhoublue <weizhou.lan@daocloud.io>

fix(chart): update latency predictor image path to match subchart nes…

f7e8852

…ting (llm-d#1683) Signed-off-by: weizhoublue <weizhou.lan@daocloud.io>

github-actions Bot added the size/XL label Jun 22, 2026

Sync upstream llm-d/llm-d-router f7e8852

c085bd7

coderabbitai Bot reviewed Jun 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[sync] upstream llm-d/llm-d-router f7e88524 [2026-06-22]#261

[sync] upstream llm-d/llm-d-router f7e88524 [2026-06-22]#261
zdtsw wants to merge 10 commits into
opendatahub-io:mainfrom
zdtsw-forking:sync/upstream-f7e88524

zdtsw commented Jun 22, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading

Walkthrough

Estimated code review effort

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 22, 2026

Uh oh!

coderabbitai Bot Jun 22, 2026

Uh oh!

coderabbitai Bot Jun 22, 2026

Uh oh!

coderabbitai Bot Jun 22, 2026

Uh oh!

coderabbitai Bot Jun 22, 2026

Uh oh!

coderabbitai Bot Jun 22, 2026

Uh oh!

coderabbitai Bot Jun 22, 2026

Uh oh!

coderabbitai Bot Jun 22, 2026

Uh oh!

coderabbitai Bot Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

		otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))

-	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))
+func TestNewProxyTransportInjectsTraceContext(t *testing.T) {
+	prev := otel.GetTextMapPropagator()
+	t.Cleanup(func() { otel.SetTextMapPropagator(prev) })
+	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))

Conversation

zdtsw commented Jun 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

zdtsw commented Jun 22, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading