[sync] upstream llm-d/llm-d-router 1fa3803d [2026-06-28] by zdtsw · Pull Request #262 · opendatahub-io/llm-d-router

zdtsw · 2026-06-28T07:50:52Z

Syncs llm-d/llm-d-router main into opendatahub-io/llm-d-router main.

Upstream commit: llm-d@1fa3803

Summary by CodeRabbit

New Features
- Added support for leader election/HA deployment settings and release artifact uploads in the release workflow.
- Expanded flow-control with a new priority queue, clearer queue semantics, and support for distinguishing “no endpoints available” rejections.
- Added request max-output-token handling across supported parsers and token estimation.
Bug Fixes
- Improved metrics naming and release-note generation consistency.
- Updated CI and docs for newer checkout behavior and refreshed release guidance.

Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>

…d#1572) GitHub Actions runners share public IPs that frequently hit the HF Hub anonymous API quota, causing 429 errors during vllm-render model downloads. This accounts for a large fraction of e2e flakes. Propagate HF_TOKEN from a repo secret into the e2e builder container so the vllm-render sidecar and the model-existence check authenticate against HF Hub. Authenticated requests have a substantially higher rate limit than anonymous ones. Changes: - Add HF_TOKEN to the Makefile E2E_ENV_VARS forwarding list - Read HF_TOKEN from the environment in setup_test.go instead of hardcoding an empty string - Attach a Bearer token header to the isModelReal HF API call in utils_test.go - Pass secrets.HF_TOKEN in both e2e-gaie and e2e-router CI steps The secret must be created as a GitHub repo secret. If the secret is not set, behavior is unchanged (empty token, anonymous requests). Closes llm-d#1527 Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>

Signed-off-by: Cong Liu <conliu@google.com>

…3 updates (llm-d#1715) Bumps the github-actions group with 3 updates in the / directory: [actions/checkout](https://github.com/actions/checkout), [actions/setup-go](https://github.com/actions/setup-go) and [azure/setup-kubectl](https://github.com/azure/setup-kubectl). Updates `actions/checkout` from 6 to 7 - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@v6...v7) Updates `actions/setup-go` from 5 to 6 - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](actions/setup-go@v5...v6) Updates `azure/setup-kubectl` from 4 to 5 - [Release notes](https://github.com/azure/setup-kubectl/releases) - [Changelog](https://github.com/Azure/setup-kubectl/blob/main/CHANGELOG.md) - [Commits](Azure/setup-kubectl@v4...v5) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions - dependency-name: actions/setup-go dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions - dependency-name: azure/setup-kubectl dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bumps the go-dependencies group with 1 update: [github.com/jellydator/ttlcache/v3](https://github.com/jellydator/ttlcache). Updates `github.com/jellydator/ttlcache/v3` from 3.4.0 to 3.4.1 - [Release notes](https://github.com/jellydator/ttlcache/releases) - [Commits](jellydator/ttlcache@v3.4.0...v3.4.1) --- updated-dependencies: - dependency-name: github.com/jellydator/ttlcache/v3 dependency-version: 3.4.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…#1720) Bumps golang.org/x/crypto from v0.51.0 to v0.52.0 to fix 8 HIGH severity CVEs (CVE-2026-39827 through CVE-2026-46598) that caused the sidecar image Trivy scan to fail in CI. Signed-off-by: Cong Liu <conliu@google.com>

@shmuelk

llm-d#1690) * refactor: single-ended queue backed by heap The dispatch SafeQueue was double-ended, but nothing consumed PeekTail other than the flowQueueAccessor passthrough itself: dispatch needs only the head, and eviction uses an independent ordering. With PeekTail gone the queue is single-ended, so the bespoke max-min heap that gave O(1) access to both ends is no longer justified. - Remove the unused PeekTail from the SafeQueue contract and all implementers, mocks, and tests; rename PeekHead -> Peek now that there is a single end. - Replace the hand-rolled max-min heap (maxminheap.go) with a standard library container/heap priority queue (priorityqueue.go), eliminating the custom bubble-up/down logic and the latent bug class it carried. - Drop the side handle map: membership is validated by index identity (a *heapItem lives in exactly one queue), preserving the contract's Remove error semantics with less state. O(log n) targeted removal is retained via per-item index tracking. - Rename the registered queue MaxMinHeap -> PriorityQueue. Signed-off-by: Luke Van Drie <lukevandrie@google.com> * refactor(flowcontrol): apply review feedback on the priority queue Addresses review comments from @shmuelk on llm-d#1690: - Correct the new-file license headers to "2026 The llm-d Authors". - Clarify the PriorityQueue doc: ordering is maintained by an internal container/heap (the queue does not itself implement heap.Interface). - Narrow the Add and Remove critical sections to the shared heap mutation. Precondition checks (nil, type assertion) and the atomic byteSize update move outside the lock; the isInvalidated and index reads stay inside it, since those fields are written under the lock by Remove/Cleanup/Drain and reading them lock-free would race. - Compact Cleanup survivors in place instead of allocating a second slice, clearing the vacated tail so removed items are not retained. Signed-off-by: Luke Van Drie <lukevandrie@google.com> --------- Signed-off-by: Luke Van Drie <lukevandrie@google.com>

Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>

…1723) - remove explicitly config to use 8200 in deployment and test - keep local dev test with 8001 Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Evaluate inotify limit on the kind node, not the host The Prometheus config-reloader inotify pre-flight check ran an unconditional cat of /proc/sys/fs/inotify/max_user_instances on the host, before the cluster existed. On macOS that proc file is absent (kind runs inside the container-runtime VM), so the check aborted env-dev-kind with PROM_ENABLED=true. The limit that applies to pods is the kind node's kernel: the host kernel on Linux, the runtime VM kernel on macOS. Move the check to after the node container is created and read the value from inside it, which is correct on both platforms. Update the remediation message to cover raising the limit on the host (Linux) and in the runtime VM (macOS). Signed-off-by: roytman <roytman@il.ibm.com> * fix copilot comments Signed-off-by: roytman <roytman@il.ibm.com> --------- Signed-off-by: roytman <roytman@il.ibm.com>

* feat(inflightloadproducer): allow configurable output ratio Signed-off-by: mohammadkhan <mohammadkhan@digitalocean.com> * Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Mohammad Ashar Khan <140482588+asharkhan3101@users.noreply.github.com> * ci: fix copilot line removal Signed-off-by: mohammadkhan <mohammadkhan@digitalocean.com> --------- Signed-off-by: mohammadkhan <mohammadkhan@digitalocean.com> Signed-off-by: Mohammad Ashar Khan <140482588+asharkhan3101@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>

…tive (llm-d#1728) * Support enabling and disabling feature flags Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Removed deprecated dataLayer feature flag Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updated feature flag registration with default value Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updated tests WRT new feature flag features Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

…ences (llm-d#1733) * fix(metrics): reset flow control metrics and correct deprecated references Add Reset() calls for flowControlDispatchCycleDuration and llmdFlowControlDispatchCycleDuration, which were registered but never reset. Update help strings of deprecated metrics to point to the actual metric names (e.g. using llm_d_epp_request_size_bytes instead of llm_d_epp_request_sizes). Signed-off-by: Ishwar <ishwarcm@iitbhilai.ac.in> * test(metrics): update expectations for deprecated metric references Signed-off-by: Ishwar <ishwarcm@iitbhilai.ac.in> --------- Signed-off-by: Ishwar <ishwarcm@iitbhilai.ac.in>

…llm-d#1747) * test(flowcontrol): add regression test for eager ManagedQueue resolution Verify that processAllQueuesConcurrently resolves ManagedQueue handles in Phase 1 (during IterateQueues) rather than deferring to Phase 3 workers. Deferred resolution races with flow GC, which can delete the flow between collection and lookup. Uses a phase-aware mock: ManagedQueueFunc succeeds while IterateQueues is running and fails after it returns, so the test passes only when resolution happens eagerly inside the callback. Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> * fix(flowcontrol): resolve ManagedQueue eagerly and fix FlowKey.String for negative priorities Resolve ManagedQueue handles in Phase 1 of processAllQueuesConcurrently (inside the IterateQueues callback) instead of deferring to Phase 3 workers. This eliminates a race where flow GC deletes a flow between collection and lookup, producing spurious error logs. Fix FlowKey.String() to use strconv.Itoa instead of FormatUint so negative priorities render correctly in logs. Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> * fix(flowcontrol): improve variable naming and trim test comment Rename work to resolvedQueues and t to task for clarity. Condense test comment to remove pseudocode arrows. Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> * fix(flowcontrol): log skipped GC'd flows at DEBUG level Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> * fix(flowcontrol): include error in debug log for skipped queues Address review feedback: include the error value in the debug log so ErrPriorityBandNotFound (real inconsistency) is distinguishable from ErrFlowInstanceNotFound (benign GC race). Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> --------- Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>

…lm-d#1716) Signed-off-by: Cong Liu <conliu@google.com>

* release: set LATENCY_PREDICTOR_TAG to v0.8.0-rc.1 Signed-off-by: Cong Liu <conliu@google.com> * deps(go): bump llm-d-kv-cache to v0.9.0 (llm-d#1691) Update github.com/llm-d/llm-d-kv-cache from v0.9.0-rc.1 to the released v0.9.0. Signed-off-by: Maroon Ayoub <mayoub@redhat.com> Signed-off-by: Cong Liu <conliu@google.com> * fix(chart): update latency predictor image path to match subchart nesting (llm-d#1683) Signed-off-by: weizhoublue <weizhou.lan@daocloud.io> Signed-off-by: Cong Liu <conliu@google.com> * consume official latency predictor v0.8 release (llm-d#1729) Signed-off-by: greg pereira <grpereir@redhat.com> * ci: add artifacts target and upload-artifacts job for CRDs (llm-d#1713) Signed-off-by: Cong Liu <conliu@google.com> * fix: bump golang.org/x/crypto to v0.52.0 to resolve Trivy CVEs (llm-d#1720) Bumps golang.org/x/crypto from v0.51.0 to v0.52.0 to fix 8 HIGH severity CVEs (CVE-2026-39827 through CVE-2026-46598) that caused the sidecar image Trivy scan to fail in CI. Signed-off-by: Cong Liu <conliu@google.com> * docs: assemble release notes for v0.9.0 Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> * revert the latency predictor change Signed-off-by: Cong Liu <conliu@google.com> * Update push-chart.sh Signed-off-by: Cong Liu <conliu@google.com> --------- Signed-off-by: Cong Liu <conliu@google.com> Signed-off-by: Maroon Ayoub <mayoub@redhat.com> Signed-off-by: weizhoublue <weizhou.lan@daocloud.io> Signed-off-by: greg pereira <grpereir@redhat.com> Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: Cong Liu <conliu@google.com> Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com> Co-authored-by: weizhoublue <45163302+weizhoublue@users.noreply.github.com> Co-authored-by: Greg Pereira <grpereir@redhat.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>

Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>

…ap (llm-d#1739) * fix(inflightload): prevent in-flight counter underflow on endpoint flap The per-endpoint in-flight request/token counters are a +1/-1 ledger keyed only by NamespacedName. PreRequest increments via the tracker map; release (OnEvicted/releaseTokensEarly) re-looked-up the counter by endpoint ID and decremented whatever instance was currently mapped. When an endpoint flaps (NotReady -> EventDelete drops the counter -> rejoins under the same NamespacedName -> new traffic recreates it), an in-flight request's release landed on the recreated counter instead of the one it incremented, driving it negative. This surfaced as negative pool saturation (observed ~ -1.4), amplified when a pod crashed while holding many in-flight requests. The existing registeredEndpoints pointer-identity guard only drops out-of-order delete events for an already-replaced pointer; it does not cover the normal flap sequence, so the underflow persisted. - Capture a direct pointer to the exact *atomic.Int64 each request increments, and decrement that instance on release. A stale release after a flap now lands harmlessly on the orphaned counter; the live (recreated) counter only ever sees its own request's +1/-1. - Remove the now-unnecessary addIfPresent/decIfPresent 'is it present' guard. - Add a decrementClamped CAS helper (mirroring the canonical predictedlatency.decrementEndpointCounter primitive) that floors decrements at 0 as defense-in-depth, so saturation can never go negative even under a future regression. - Add regression tests for the delete-then-recreate flap and the crash-with-high-load drain (both fail without the fix), plus coverage for the stale-delete guard. Tests use the realistic AddOrUpdate-then-Delete ordering. Signed-off-by: Luke Van Drie <lukevandrie@google.com> * test(inflightload): table-driven flap test covering token counters Consolidate the per-request flap regression into a single table-driven test and extend coverage to the token counter, which receives the same instance-capture fix but was previously only asserted via the request counter. Cases cover both counters across both release paths: - request counter via the EndOfStream eviction path (OnEvicted) - token counter via the EndOfStream eviction path (OnEvicted) - token counter via the StartOfStream early-release path (releaseTokensEarly) Each case runs the same flap sequence (join, track A, delete + rejoin, track B, release A then B) and asserts A's release lands on the orphaned counter, leaving B's live counter accurate and never negative. Signed-off-by: Luke Van Drie <lukevandrie@google.com> --------- Signed-off-by: Luke Van Drie <lukevandrie@google.com>

…#1751) * fix(flowcontrol): return 503 (not 429) when no endpoints exist When the candidate pool has zero endpoints (e.g. scaled to zero), the flow control queue acts as a scale-from-zero waiting room: requests are buffered until capacity comes online. Previously, a request that overflowed the queue in that state was finalized as QueueOutcomeRejectedCapacity and surfaced as HTTP 429, painting a picture of partial availability when there is none. The mix of 429 (capacity overflow) and 503 (TTL expiry) for the same "no backends" condition was confusing for upstreams. Distinguish this case with a new QueueOutcomeRejectedNoEndpoints outcome that the admission layer maps to 503 Service Unavailable (with an x-llm-d-request-dropped-reason: rejected-no-endpoints header), matching upstream EPP behavior. The processor caches pool emptiness from the dispatch cycle and consults it when rejecting at capacity. Scale-from-zero buffering is preserved; only the terminal error code changes, and only when the pool is empty. TTL eviction already returns 503 and is unchanged. Fixes llm-d#1735 Signed-off-by: Luke Van Drie <lukevandrie@google.com> * test(flowcontrol): cover empty-pool no-endpoints classification The capacity-enforcement integration tests block dispatch with a saturated detector and an empty endpoint pool, then assert QueueOutcomeRejectedCapacity. With the no-endpoints fix, a queue-capacity rejection on an empty pool now classifies as QueueOutcomeRejectedNoEndpoints, so these counts broke. The detector ignores the pool, so giving those tests a single placeholder endpoint keeps dispatch blocked while restoring the RejectedCapacity classification they intend to exercise. Add TestEmptyPoolRejectsAsNoEndpoints to cover the scale-from-zero path end-to-end: with an empty pool, a full-queue rejection is surfaced as RejectedNoEndpoints (wrapping ErrNoEndpoints), never RejectedCapacity. Signed-off-by: Luke Van Drie <lukevandrie@google.com> --------- Signed-off-by: Luke Van Drie <lukevandrie@google.com>

…r caps (llm-d#1750) * feat(requesthandling): normalize client max output tokens across parsers Add InferenceRequestBody.MaxOutputTokens, a single normalized view of the client-requested output-token cap, populated by each parser from its API-specific field: - OpenAI completions: max_tokens - OpenAI chat completions: max_completion_tokens, falling back to max_tokens - OpenAI responses: max_output_tokens - Anthropic messages: max_tokens - vLLM gRPC/HTTP generate: SamplingParams.max_tokens A shared MaxOutputTokensFromPayload helper reads the value from the already -retained payload map with per-API key precedence, treating an explicit 0 as a binding cap and ignoring absent/negative/non-integral values. Downstream plugins can now bound behavior by the client's requested output limit. Signed-off-by: Luke Van Drie <lukevandrie@google.com> * feat(inflightload): bound output estimate by client and operator caps The in-flight load producer estimated output tokens as round(input * ratio), which badly overstates output for high-input/low-output workloads (e.g. long -context summarization). Bound the estimate two ways: - Client cap: when the request carries an output limit (now surfaced as Body.MaxOutputTokens), use it as an upper bound on the estimate. - Operator cap: new maxEstimatedOutputTokens config caps the estimate regardless of input or client value (validated non-negative). EstimateOutput now takes the client cap; the producer threads the request through estimateRequestTokens. Estimate becomes min(round(input*ratio), clientCap?, operatorCap?). Signed-off-by: Luke Van Drie <lukevandrie@google.com> * refactor(requesthandling): polish max-output-tokens handling Address review feedback on the max-output-tokens change: - MaxOutputTokensFromPayload now falls back to the next key when a present key holds a malformed value (wrong type, negative, non-integral) instead of returning nil. A chat request with an invalid max_completion_tokens but a valid max_tokens now correctly uses the latter. - Resolve the OpenAI API type once in ParseRequest and pass it to extractRequestBody, removing a duplicate path parse + classification on the request hot path. - Use k8s.io/utils/ptr.To for *int64/*uint32 in tests instead of bespoke per-file pointer helpers, matching the rest of the codebase. Signed-off-by: Luke Van Drie <lukevandrie@google.com> --------- Signed-off-by: Luke Van Drie <lukevandrie@google.com>

Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>

…m-d#1699) * Added the ability to set the number of replicas and to enable leader selection Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added RBAC for leader selection Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Set extra variables for development runs Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Use the same version of kind in both e2e tests Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Refactored functions for more flexibility in tests Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Refactored metrics fetching Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Aliased additional IGW test utilities Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Refcatored EPP creation for leadership tests Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Refactored tests, added encoding and leadership tests Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Fixed lint error Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

…diffs (llm-d#1827) * fix: base release-notes assembly branch on main to exclude unrelated diffs The workflow previously created its branch from the tagged commit. When the tag diverged from main HEAD, the PR diff included unrelated changes (e.g. hack/push-chart.sh in llm-d#1688). Split the script into three phases: 1. Read fragments from the tagged commit (preserves the guard against post-tag fragments). 2. Fetch and branch from origin/main so the PR only diffs release-notes files against main. 3. Apply changes and remove only the fragments that exist on main. Signed-off-by: Cong Liu <conliu@google.com> * fix: add comment explaining fragment removal loop Signed-off-by: Cong Liu <conliu@google.com> * fix: skip release-notes assembly for RC tags Signed-off-by: Cong Liu <conliu@google.com> * fix: add comment for RC tag exclusion Signed-off-by: Cong Liu <conliu@google.com> --------- Signed-off-by: Cong Liu <conliu@google.com>

- 0.9.0 has main-dev set in annoataion llm-d.ai/bundle-version Signed-off-by: Wen Zhou <wenzhou@redhat.com>

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

Add documentation for the seven flow control metrics exposed when the flowControl feature gate is enabled: queue duration, dispatch cycle duration, enqueue duration, queue size, queue bytes, pool saturation, and requests total. Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> Co-authored-by: Luke Van Drie <lukevandrie@google.com>

…m-d#1701) Replace the string-based --endpoint-selector flag with Kubernetes-native labels.Selector to support set-based label expressions (e.g., 'env in (prod,staging),tier!=frontend', 'key,!deprecated') in standalone mode. - EndpointSelector type: string -> labels.Selector - Complete(): parse flag string via labels.Parse() - EndpointPool.Selector: map[string]string -> labels.Selector - PoolLabelsMatch simplified with native Selector.Matches() - podResyncAll passes Selector directly to K8s client - All callers and tests updated for new type Signed-off-by: shichaooutlook <shichao.outlook@gmail.com> Co-authored-by: shichaooutlook <268144704+shichaooutlook@users.noreply.github.com>

* fix: clear KV block index entries on endpoint removal On EventDelete, call Index.Clear with the pod's IP address to remove stale entries that reference blocks no longer cached on the removed pod. Fixes llm-d#1736 Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> * Update pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go Co-authored-by: Bongwoo Bak <bongwoobak@gmail.com> Signed-off-by: Rishabh Saini <rishabhsaini01@gmail.com> * fix(test): match Clear assertion to IP:Port PodIdentifier format Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> * Update pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go Co-authored-by: Maroon Ayoub <Maroonay@gmail.com> Signed-off-by: Rishabh Saini <rishabhsaini01@gmail.com> * Update pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go Co-authored-by: Maroon Ayoub <Maroonay@gmail.com> Signed-off-by: Rishabh Saini <rishabhsaini01@gmail.com> --------- Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> Signed-off-by: Rishabh Saini <rishabhsaini01@gmail.com> Co-authored-by: Bongwoo Bak <bongwoobak@gmail.com> Co-authored-by: Maroon Ayoub <Maroonay@gmail.com>

coderabbitai · 2026-06-28T07:51:22Z

📝 Walkthrough

Walkthrough

This PR prepares the v0.9.0 release with several independent but coordinated changes: actions/checkout bumped to v7 across all workflows; EndpointPool.Selector changed from map[string]string to labels.Selector; RegisterFeatureGate extended with an isEnabledByDefault bool and name=value parsing; ExperimentalDatalayerFeatureGate removed. Queue interface consolidates PeekHead/PeekTail into Peek; MaxMinHeap replaced by a container/heap-backed PriorityQueue. Flow control adds ErrNoEndpoints/QueueOutcomeRejectedNoEndpoints mapping to HTTP 503. MaxOutputTokens propagated through all parsers and the token estimator. InFlightLoadProducer captures counter pointers at increment time to prevent underflow on endpoint flap. HA leader election added to deploy manifests with RBAC. BundleVersion becomes a var; artifacts Makefile target and a ci-release upload job added. E2E infrastructure gains leader-election failover tests and verifyMetrics. Metrics help strings updated to renamed llm_d_epp_request_* names.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

coderabbitai

Actionable comments posted: 11

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

pkg/epp/flowcontrol/contracts/mocks/mocks.go (1)

364-373: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Make MockManagedQueue.Peek() return the actual head item.

This implementation violates the new Peek() contract by returning whichever item Go yields first from m.items. Map iteration order is nondeterministic, so any test that uses this mock to exercise queue ordering can become flaky or validate the wrong winner.

Possible fix

 func (m *MockManagedQueue) Peek() flowcontrol.QueueItemAccessor {
 	m.mu.Lock()
 	defer m.mu.Unlock()
 	m.init()
-	for _, item := range m.items {
-		return item // Return first item found
-	}
-	return nil // Queue is empty
+	var best flowcontrol.QueueItemAccessor
+	policy := m.OrderingPolicy()
+	for _, item := range m.items {
+		if best == nil {
+			best = item
+			continue
+		}
+		if policy != nil && policy.Less(item, best) {
+			best = item
+		}
+	}
+	return best
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/flowcontrol/contracts/mocks/mocks.go` around lines 364 - 373,
`MockManagedQueue.Peek()` is using nondeterministic map iteration instead of
returning the queue head, which can make ordering tests flaky. Update the mock
in `MockManagedQueue.Peek` to select the same item that `Pop`/queue order would
treat as first, using the mock’s existing ordering state rather than ranging
over `m.items`. Keep the behavior aligned with the new `Peek()` contract and
preserve the nil case when the queue is empty.

🟠 Major comments (14)

.github/workflows/ci-release.yaml-55-57 (1)
55-57: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Gate upload-artifacts on build-and-push.

This job only depends on set-params, so it can create or update the GitHub release even when image/chart publication fails. That leaves release state split across systems: GitHub has tagged assets, while the registry publish path failed. Add build-and-push to needs so the release is only mutated after the runtime artifacts succeed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci-release.yaml around lines 55 - 57, The upload-artifacts
job is only gated on set-params, so it can update the GitHub release before the
build-and-push path succeeds. Update the ci-release workflow so upload-artifacts
also depends on build-and-push, ensuring the release mutation only happens after
runtime artifact publication completes successfully.
.github/ISSUE_TEMPLATE/new-release.md-73-74 (1)
73-74: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

These commands never switch to the release branch.

git reset --hard ${REMOTE}/release-${BRANCH_VERSION} rewrites whatever branch the operator currently has checked out. If they are still on main, this resets main instead of checking out release-${BRANCH_VERSION}.
Suggested fix
- git fetch ${REMOTE}
- git reset --hard ${REMOTE}/release-${BRANCH_VERSION}
+ git fetch "${REMOTE}"
+ git checkout -B "release-${BRANCH_VERSION}" "${REMOTE}/release-${BRANCH_VERSION}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/ISSUE_TEMPLATE/new-release.md around lines 73 - 74, The release
setup commands currently reset the checked-out branch instead of switching to
the release branch, so update the workflow in the new-release template to
explicitly check out release-${BRANCH_VERSION} after git fetch ${REMOTE}. Adjust
the command sequence in the template section containing git fetch and git reset
so it creates or switches to the intended release branch rather than mutating
whatever branch is active.
.github/workflows/ci-release.yaml-74-91 (1)
74-91: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Move needs.set-params.outputs.tag into env:. ${{ needs.set-params.outputs.tag }} is interpolated directly into run: in both steps, creating a CWE-94 injection boundary for ref-derived data. Use a quoted shell variable instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci-release.yaml around lines 74 - 91, The workflow step
uses needs.set-params.outputs.tag directly inside the run script, which creates
an injection risk for ref-derived data. Move the tag value into env for both
Build Artifacts and Upload Release Assets, then reference a quoted shell
variable inside the script instead of interpolating it inline. Update the shell
commands around the artifact build and gh release upload/create logic to use
that env-backed variable consistently.
.github/workflows/release-notes-assemble.yaml-92-106 (1)
92-106: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Make the assembly step idempotent before prepending the new section.

This block always rewrites RELEASE-NOTES.md by prepending RELEASE ${TAG}. A manual rerun after the assembly PR has already merged will add the same tag section again instead of cleanly no-oping.
Remediation
           # Phase 3: apply release-notes changes on top of main
+          if grep -q "^RELEASE ${TAG} " RELEASE-NOTES.md 2>/dev/null; then
+            echo "Release notes for ${TAG} already assembled; exiting."
+            exit 0
+          fi
           touch RELEASE-NOTES.md
           cat "$new_section" RELEASE-NOTES.md > RELEASE-NOTES.md.new
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release-notes-assemble.yaml around lines 92 - 106, The
release-notes assembly step in the workflow should be idempotent: before the
`cat "$new_section" RELEASE-NOTES.md > RELEASE-NOTES.md.new` prepend in the
Phase 3 block, check whether `RELEASE-NOTES.md` already contains the `RELEASE
${TAG}` section and skip the rewrite if it does. Update the logic around the
existing `new_section`/`RELEASE-NOTES.md` merge so rerunning the job after the
assembly PR has merged becomes a clean no-op instead of duplicating the same
release section.
.github/workflows/unstale.yaml-31-32 (1)
31-32: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Remove the unused checkout step (CWE-829) at .github/workflows/unstale.yaml:31-32. gh issue edit does not need a repository checkout, so actions/checkout@v7 only adds an avoidable unpinned third-party action to a job with issues: write and pull-requests: write. Delete it; if it must stay, pin it by full SHA.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/unstale.yaml around lines 31 - 32, The unstale workflow
includes an unnecessary repository checkout step, which adds an extra
third-party action without being used by the job. Remove the checkout step from
the workflow job that runs gh issue edit, or if the checkout is still required
for some other reason, update the actions/checkout usage in unstale.yaml to a
full commit SHA. Use the checkout step label and the gh issue edit job context
to locate the change.
Source: Path instructions
pkg/epp/flowcontrol/framework/plugins/queue/priorityqueue.go-159-168 (1)
159-168: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Serialize byteSize with the heap mutation (CWE-362).

Add releases pq.mu before updating pq.byteSize. If Drain or Cleanup acquires the lock between Line 165 and Line 167, the heap can be emptied and byteSize reset to 0, then Add restores a stale byte count on an empty queue. That corrupts max-bytes accounting and can trigger false capacity rejections. Move the byte update into the same critical section as heap.Push.
Proposed fix
 func (pq *priorityQueue) Add(item flowcontrol.QueueItemAccessor) {
 	hi := &heapItem{item: item}
 	item.SetHandle(hi)

 	pq.mu.Lock()
 	heap.Push(pq.heap, hi)
-	pq.mu.Unlock()
-
 	pq.byteSize.Add(item.OriginalRequest().ByteSize())
+	pq.mu.Unlock()
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/flowcontrol/framework/plugins/queue/priorityqueue.go` around lines
159 - 168, The Add method in priorityQueue updates byteSize outside the pq.mu
critical section, which can race with Drain or Cleanup and leave stale
accounting on an empty heap. Move the pq.byteSize.Add call into the same locked
section as heap.Push in priorityQueue.Add, using the existing pq.mu and heap
mutation path so heap state and byte accounting stay serialized.
pkg/epp/flowcontrol/controller/internal/processor.go-289-296 (1)
289-296: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Refresh endpoint emptiness before classifying capacity rejection.

poolEmpty is only updated in dispatchCycle, but Run calls enqueue before dispatchCycle for new items. Startup or endpoint flaps can use stale state here and return the wrong rejection outcome, mapping no-endpoints to capacity or capacity to no-endpoints.
Proposed fix
-			sp.enqueue(item)
+			sp.enqueue(ctx, item)
 			sp.dispatchCycle(ctx) // Process immediately when an item arrives
-func (sp *Processor) enqueue(item *FlowItem) {
+func (sp *Processor) enqueue(ctx context.Context, item *FlowItem) {
 	if ok, stats := sp.hasCapacity(key.Priority, req.ByteSize()); !ok {
+		pool := sp.endpointCandidates.Locate(ctx, nil)
+		sp.poolEmpty = len(pool) == 0
 		// When the pool has no endpoints, the queue is acting as a scale-from-zero waiting room. A capacity rejection in
 		// that state reflects genuine unavailability (surfaced as 503), not backpressure against a contended pool (429).
 		if sp.poolEmpty {
Update direct unit-test calls to pass context.Background().
Also applies to: 363-364
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/flowcontrol/controller/internal/processor.go` around lines 289 - 296,
The rejection path in processor.go is using a stale sp.poolEmpty value inside
enqueue/classification logic, so update the endpoint-emptiness check before
deciding QueueOutcomeRejectedNoEndpoints versus capacity rejection. Refresh the
pool state in the path that handles new items in Run/enqueue, or call the same
state update used by dispatchCycle before the capacity check, so the decision is
based on current endpoint availability. Also update the direct unit-test calls
mentioned in the comment to pass context.Background() where required.
pkg/epp/framework/interface/requesthandling/types.go-114-148 (1)
114-148: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Reject lossy or overflowing token caps before converting to int64. json.Unmarshal gives JSON numbers as float64, so large client caps can round before this helper sees them, and out-of-range values can still hit implementation-dependent float-to-int behavior (CWE-190). Decode with json.Decoder.UseNumber() upstream or bound-check before returning the cap.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/framework/interface/requesthandling/types.go` around lines 114 - 148,
MaxOutputTokensFromPayload currently accepts float64/json.Number values and
converts them to int64 without rejecting lossy or out-of-range caps. Update the
parsing path in MaxOutputTokensFromPayload to preserve numeric precision and
verify bounds before conversion, and ensure callers decode request JSON with
json.Decoder.UseNumber where the payload enters this helper. Keep the existing
key precedence and validation behavior, but reject values that cannot be
represented exactly or safely as int64 before returning the cap.
pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go-55-60 (1)
55-60: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Propagate KV index cleanup failures from endpoint deletes.

If KVBlockIndex().Clear() fails here, Extract still returns nil after removing the subscriber. That leaves stale pod entries in the prefix-cache index with no retry signal, so later scheduling can keep reading deleted endpoint metadata until some unrelated event rebuilds the index.
Proposed fix
 	case fwkdl.EventDelete:
 		p.subscribersManager.RemoveSubscriber(ctx, endpointKey)
 		if meta.Address != "" {
 			if err := p.kvCacheIndexer.KVBlockIndex().Clear(ctx, fmt.Sprintf("%s:%s", meta.Address, meta.Port)); err != nil {
 				logger.Error(err, "Failed to clear index entries for removed endpoint",
 					"endpoint", endpointKey, "address", meta.Address, "port", meta.Port)
+				return fmt.Errorf("clear kv index for %s: %w", endpointKey, err)
 			}
 		}
 		logger.V(logging.DEBUG).Info("Removed KV-events subscriber", "endpoint", endpointKey)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go`
around lines 55 - 60, Propagate the KV index cleanup failure from the
endpoint-delete path in Extract instead of only logging it. In the block that
calls p.kvCacheIndexer.KVBlockIndex().Clear(), return the error (or wrap it with
context) so the caller sees the failure after the subscriber is removed, and
keep the existing logger.Error call for observability. Use the Extract method
and KVBlockIndex().Clear as the key symbols to update.
scripts/kind-dev-env.sh-289-289 (1)
289-289: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Quote CONTAINER_RUNTIME before executing it.

Line 289 runs ${CONTAINER_RUNTIME} unquoted. If that environment variable is tampered with, the bootstrap path can execute unintended shell syntax during cluster setup (CWE-78). Use "${CONTAINER_RUNTIME}" exec ... here.

As per path instructions, "**/*.sh: Quote all variables to prevent injection".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/kind-dev-env.sh` at line 289, The cluster bootstrap command in the
inotify check is using CONTAINER_RUNTIME without quotes, which can allow shell
injection if the variable is manipulated. Update the command in the script’s
inotify_limit check to use the quoted CONTAINER_RUNTIME invocation style,
keeping the exec call to CONTAINER_NAME and the
/proc/sys/fs/inotify/max_user_instances read intact.
Source: Path instructions
scripts/kind-dev-env.sh-409-412 (1)
409-412: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Do not overwrite caller-provided HA settings.

These exports force every invocation back to EPP_REPLICA_COUNT=1 and ENABLE_LEADER_ELECTION=false, which disables the HA path this cohort is trying to exercise. Default them only when unset.
Proposed fix
-export EPP_REPLICA_COUNT=1
+export EPP_REPLICA_COUNT="${EPP_REPLICA_COUNT:-1}"
-# Some end to end tests enable leader election
-export ENABLE_LEADER_ELECTION=false
+# Some end to end tests enable leader election
+export ENABLE_LEADER_ELECTION="${ENABLE_LEADER_ELECTION:-false}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/kind-dev-env.sh` around lines 409 - 412, The HA defaults in
kind-dev-env.sh are being forced on every run, which overwrites caller-provided
settings. Update the setup around the EPP_REPLICA_COUNT and
ENABLE_LEADER_ELECTION exports so they only apply when those variables are
unset, preserving any values passed in by the caller while still providing
defaults for local runs.
deploy/components/inference-gateway/rbac.yaml-71-73 (1)
71-73: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Scope the lease RBAC to least privilege.

Lines 71-73 grant the EPP ServiceAccount write access to every Lease in the namespace, including delete. If that identity is abused, an attacker can seize or remove other controllers' locks and force failover or denial of service (CWE-732). Restrict the rule to the single leader-election lease name and drop delete unless the client library proves it needs it.
Remediation sketch
 - apiGroups: [ "coordination.k8s.io" ]
   resources: [ "leases" ]
-  verbs: [ "get", "list", "watch", "create", "update", "patch", "delete" ]
+  verbs: [ "get", "create", "update", "patch" ]
+  # If the lock name is fixed, also scope this rule with resourceNames.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/components/inference-gateway/rbac.yaml` around lines 71 - 73, The RBAC
rule for lease access in the inference-gateway config is too broad; update the
Lease permissions on the relevant Role/RoleBinding entry to follow least
privilege. In the RBAC manifest, scope the leases rule used by the EPP
ServiceAccount to the single leader-election Lease name via resourceNames, and
remove delete unless the leader-election client in this codebase explicitly
requires it. Use the existing lease rule block in rbac.yaml to locate and
tighten the permissions.
test/scripts/test-e2e-gaie.sh-28-28 (1)
28-28: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Pin KIND_NODE_IMAGE by digest.
kindest/node:v1.31.12 is a mutable reference (CWE-829); a retagged image can swap the Kubernetes node runtime for every E2E cluster. Use kindest/node@sha256:... instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/scripts/test-e2e-gaie.sh` at line 28, The KIND node image reference in
the E2E script is mutable and should be pinned to an immutable digest. Update
the KIND_NODE_IMAGE default in test-e2e-gaie.sh to use the kindest/node image by
sha256 digest instead of the v1.31.12 tag, so the cluster setup in the script
always pulls the exact intended node image.
test/e2e/e2e_test.go-69-149 (1)
69-149: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Register cleanup before the first assertion.

These specs only clean up at the tail. Any failed Expect or timeout before Lines 71, 87, 106, or 146 leaves EPP or model-server objects behind, and the next ordered spec can see the wrong pod count or stale metrics. Use ginkgo.DeferCleanup immediately after each create call.
Remediation sketch
 modelServers := createModelServersDecode(1)
+ginkgo.DeferCleanup(testutils.DeleteObjects, testConfig, modelServers)

 epp := createEndPointPickerHelper(simpleConfig, numOfPods, true, false)
+ginkgo.DeferCleanup(testutils.DeleteObjects, testConfig, epp)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/e2e_test.go` around lines 69 - 149, Register cleanup immediately
after each resource creation in these Ginkgo specs so failures before the tail
cleanup do not leak objects into later ordered tests. In the affected test
cases, add ginkgo.DeferCleanup right after createInferencePool,
createModelServersDecode, createEndPointPicker, and createEndPointPickerHelper,
and remove reliance on the final testutils.DeleteObjects calls for cleanup. This
keeps the generated resources tied to the lifecycle of each spec and avoids
stale pod counts or metrics between runs.

🟡 Minor comments (11)

docs/metrics.md-49-55 (1)

49-55: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Add RejectedNoEndpoints to the outcome label contract.

The new empty-pool 503 path introduces a terminal outcome that is missing from this metric definition. Consumers will miss a real bucket unless the docs match the implemented label set.

Suggested edit

-    *   `outcome`: string (`Dispatched`, `RejectedCapacity`, `RejectedOther`, `EvictedTTL`, `EvictedContextCancelled`, `EvictedOther`)
+    *   `outcome`: string (`Dispatched`, `RejectedCapacity`, `RejectedNoEndpoints`, `RejectedOther`, `EvictedTTL`, `EvictedContextCancelled`, `EvictedOther`)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/metrics.md` around lines 49 - 55, The
`flow_control_request_queue_duration_seconds` metric label contract is missing
the new terminal outcome introduced by the empty-pool 503 path. Update the
`outcome` label list in `docs/metrics.md` to include `RejectedNoEndpoints`
alongside the existing values so the documented contract matches the implemented
labels.

AGENTS.md-33-33 (1)

33-33: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Fix the typo in the PR checklist.

passs should be passes.

Remediation

-- Verify the code passs `make presubmit` locally before submitting a PR.
+- Verify the code passes `make presubmit` locally before submitting a PR.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@AGENTS.md` at line 33, The PR checklist text in AGENTS.md has a typo in the
presubmit verification item; update the checklist entry so it says the code
passes make presubmit locally before submitting a PR, correcting the misspelling
in that sentence.

Source: Linters/SAST tools

pkg/epp/config/loader/configloader.go-316-324 (1)

316-324: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reject malformed feature-gate overrides. CWE-20: strings.Split accepts name=true=false and drops the trailing segment, so a malformed override is treated as valid input. Use strings.Cut or SplitN(..., 2) and reject any value containing a second =.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/config/loader/configloader.go` around lines 316 - 324, The
feature-gate parsing in config loader is too permissive because the current
string split accepts malformed overrides like name=true=false and silently
truncates them. Update the parsing in configloader.go to use a two-part split
via strings.Cut or strings.SplitN(..., 2) in the gate handling logic, and
explicitly reject any override whose value still contains another = so malformed
input returns an error instead of being accepted.

pkg/epp/config/loader/configloader_test.go-79-80 (1)

79-80: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Do not mutate the global feature-gate registry from a parallel test.

The writes on Lines 79-80 make the existing t.Parallel() on Line 76 unsafe. This file already keeps TestInstantiateAndConfigure serial for another global registry, so TestLoadRawConfiguration can become order-dependent as more loader tests are added. Make this test serial or snapshot and restore the feature-gate registry.
Proposed fix
 func TestLoadRawConfiguration(t *testing.T) {
-	t.Parallel()
-
 	// Register known feature gates for validation.
 	RegisterFeatureGate(testFeatureGate, true)
 	RegisterFeatureGate(flowcontrol.FeatureGate, false)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/config/loader/configloader_test.go` around lines 79 - 80,
TestLoadRawConfiguration is mutating the global feature-gate registry via
RegisterFeatureGate while also running with t.Parallel, which makes the test
order-dependent. Make this test serial by removing t.Parallel, or snapshot and
restore the feature-gate registry around the RegisterFeatureGate calls so
parallel execution does not leak global state. Use the existing
TestLoadRawConfiguration setup and its RegisterFeatureGate calls as the place to
contain this mutation.

pkg/epp/config/loader/validation.go-125-133 (1)

125-133: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reject malformed name=value feature-gate entries with extra =.

Line 125 uses strings.Split, so flowControl=true=garbage passes validation because only parts[1] is parsed as a bool. That lets malformed config through this phase and makes downstream feature-gate handling ambiguous. Use strings.Cut or explicitly reject more than one = and add a regression case next to errorBadFeatureGateText.

Proposed fix

-		parts := strings.Split(gate, "=")
-		if _, ok := registeredFeatureGates[parts[0]]; !ok {
-			return fmt.Errorf("feature gate '%s' is unknown or unregistered", parts[0])
+		name, value, hasValue := strings.Cut(gate, "=")
+		name = strings.TrimSpace(name)
+		if _, ok := registeredFeatureGates[name]; !ok {
+			return fmt.Errorf("feature gate '%s' is unknown or unregistered", name)
 		}
-		if len(parts) > 1 {
-			_, err := strconv.ParseBool(strings.ToLower(strings.TrimSpace(parts[1])))
+		if hasValue {
+			value = strings.TrimSpace(value)
+			if strings.Contains(value, "=") {
+				return fmt.Errorf("feature gate '%s' has an invalid value %q", name, value)
+			}
+			_, err := strconv.ParseBool(value)
 			if err != nil {
-				return fmt.Errorf("%s is not a valid value for the feature gate %s (error: %w)", parts[1], parts[0], err)
+				return fmt.Errorf("%s is not a valid value for the feature gate %s (error: %w)", value, name, err)
 			}
 		}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/config/loader/validation.go` around lines 125 - 133, The feature-gate
validation in validation.go accepts malformed name=value entries because
strings.Split allows extra "=" segments, so fix the parsing in the feature-gate
loop to reject any gate with more than one "=" or switch to strings.Cut and fail
when a second separator is present. Update the validation logic around the
registeredFeatureGates check and strconv.ParseBool handling so only exactly one
optional value is allowed, and add a regression test alongside
errorBadFeatureGateText for inputs like flowControl=true=garbage.

pkg/epp/requestcontrol/admission.go-230-232 (1)

230-232: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Avoid duplicating the no-endpoints message.

If err is types.ErrNoEndpoints, Line 232 returns no endpoints available: no endpoints available. Reuse msg as-is for that sentinel, or only prepend when err carries extra detail.

Suggested fix

 	case types.QueueOutcomeRejectedNoEndpoints:
 		// No serving capacity exists (e.g. pool scaled to zero): signal genuine unavailability rather than backpressure.
-		return errcommon.Error{Code: errcommon.ServiceUnavailable, Msg: "no endpoints available: " + msg, Headers: map[string]string{errcommon.RequestDroppedReasonHeaderKey: string(errcommon.RequestDroppedReasonNoEndpoints)}}
+		if errors.Is(err, types.ErrNoEndpoints) {
+			return errcommon.Error{Code: errcommon.ServiceUnavailable, Msg: msg, Headers: map[string]string{errcommon.RequestDroppedReasonHeaderKey: string(errcommon.RequestDroppedReasonNoEndpoints)}}
+		}
+		return errcommon.Error{Code: errcommon.ServiceUnavailable, Msg: "no endpoints available: " + msg, Headers: map[string]string{errcommon.RequestDroppedReasonHeaderKey: string(errcommon.RequestDroppedReasonNoEndpoints)}}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/requestcontrol/admission.go` around lines 230 - 232, The
`QueueOutcomeRejectedNoEndpoints` branch in the admission error mapping is
duplicating the no-endpoints text when `err` is `types.ErrNoEndpoints`. Update
this case so it uses `msg` directly for the sentinel, and only prefixes with “no
endpoints available:” when the incoming error carries additional detail. Keep
the existing `errcommon.Error` construction and
`RequestDroppedReasonNoEndpoints` header unchanged.

pkg/epp/flowcontrol/framework/plugins/queue/functional_test.go-510-512 (1)

510-512: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Remove the non-atomic Len/Peek assertion.

Another goroutine can remove the peeked item between Line 510 and Line 511, making Len() == 0 while peeked is still non-nil. This makes the concurrency test flaky.

Proposed fix

-								peeked := q.Peek()
-								if q.Len() == 0 {
-									assert.Nil(t, peeked, "Peek on empty queue expected nil")
-								}
+								_ = q.Peek()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/flowcontrol/framework/plugins/queue/functional_test.go` around lines
510 - 512, The concurrency test in the queue functional test is relying on a
non-atomic Len/Peek check, which can become flaky if another goroutine modifies
the queue between the two calls. Update the assertion around q.Peek() and
q.Len() in the test to avoid depending on their combined state, and instead
verify Peek behavior in a way that does not assume no concurrent removal. Use
the q.Peek and q.Len calls in this test block as the location to adjust the
check.

test/e2e/e2e_suite_test.go-32-33 (1)

32-33: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Do not hardcode the EPP app label.

eppName is fixed to e2e-epp, but the deployed pod label comes from ${EPP_NAME}. Any suite run that overrides EPP_NAME will stop finding the leader pod and fail the HA assertions. Read the same env var here, with e2e-epp only as the default.
Proposed fix
-	eppName = "e2e-epp"
+	eppName = env.GetEnvString("EPP_NAME", "e2e-epp", ginkgo.GinkgoLogr)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/e2e_suite_test.go` around lines 32 - 33, The eppName test constant
is hardcoded to e2e-epp, which can break leader-pod discovery when EPP_NAME is
overridden in the environment. Update the e2e suite setup around eppName to read
the same EPP_NAME env var used by the deployment, and keep e2e-epp only as the
fallback default. Make the change in the test initialization where eppName is
defined so the HA assertions and pod label matching stay aligned with the
deployed app label.

test/e2e/requests_test.go-292-295 (1)

292-295: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Assert the malformed requests fail or move the error counters.

These calls are not validated today. The helper only checks metric-family presence, so it still passes if invalid bodies start returning 200 or stop incrementing the error series. Assert statusCode >= 400 here, or compare the pre/post values of the request-error counters before accepting the scrape.

Also applies to: 365-367
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/requests_test.go` around lines 292 - 295, The malformed request loop
in requests_test.go is only exercising the helper without verifying the outcome,
so it can pass even if invalid bodies stop failing or no longer affect error
metrics. Update the relevant calls around the chat completions error cases
(including the second occurrence noted in the comment) to assert each response
has statusCode >= 400, or capture the request-error counter values before and
after the loop and verify they increase. Use the existing doPostWithError helper
and the surrounding test flow in requests_test.go to anchor the fix.

test/e2e/setup_test.go-139-148 (1)

139-148: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Bound the readiness probe with an HTTP timeout.

A blocked http.Get can hang longer than readyTimeout, because Eventually cannot interrupt a single stuck request. Use a short-lived client or per-request context here so a wedged gateway does not freeze the suite.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/setup_test.go` around lines 139 - 148, The gateway readiness probe
in the test setup currently uses an unbounded http.Get, which can block past
readyTimeout and stall the suite. Update the readiness check in the Eventually
block to use a short-lived HTTP client or a per-request context timeout around
the request, then keep the existing status/body readiness logic and cleanup in
place so the probe fails fast if the gateway is wedged.

test/e2e/utils_test.go-197-205 (1)

197-205: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Add timeouts to the new network helpers.

Both helpers use the default client with no deadline. One stalled Hugging Face request or half-open metrics scrape can block forever and bypass the surrounding retry logic.

Proposed fix

+var e2eHTTPClient = &http.Client{Timeout: 10 * time.Second}
+
 func isModelReal(modelName string) bool {
 	req, err := http.NewRequest("GET", "https://huggingface.co/api/models/"+modelName, nil)
 	if err != nil {
 		return false
 	}
 	if token := os.Getenv("HF_TOKEN"); token != "" {
 		req.Header.Set("Authorization", "Bearer "+token)
 	}
 
-	resp, err := http.DefaultClient.Do(req)
+	resp, err := e2eHTTPClient.Do(req)
 	if err != nil {
 		return false
 	}
 	defer resp.Body.Close()
@@
 func getMetrics(metricsURL string) []string {
 	var body []byte
 	gomega.Eventually(func() error {
-		resp, err := http.Get(metricsURL)
+		resp, err := e2eHTTPClient.Get(metricsURL)
 		if err != nil {
 			return err
 		}

Also applies to: 285-298

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/utils_test.go` around lines 197 - 205, The new network helpers are
using http.DefaultClient without any deadline, so a stalled Hugging Face request
or metrics scrape can hang indefinitely and skip retries. Update the helper(s)
that make these calls to use a client with a timeout or per-request context
deadline instead of DefaultClient, and apply the same fix to both request paths
referenced by the tests. Keep the existing request construction and token header
logic intact while ensuring the network call always fails fast.

🧹 Nitpick comments (4)

cmd/epp/runner/runner.go (1)
290-290: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Remove the inline mode paraphrase.

The comment repeats the condition and is too long for an inline note.
Proposed fix
-	startCrdReconcilers := opts.EndpointSelector == nil // If endpointSelector is nil, it means it's not in the standalone mode. Then we should start the inferencePool and other CRD Reconciler.
+	// CRD reconcilers own pool discovery outside standalone mode.
+	startCrdReconcilers := opts.EndpointSelector == nil
As per coding guidelines, "Write comments tersely and only when the reason is non-obvious; do not paraphrase the code."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/epp/runner/runner.go` at line 290, The comment on startCrdReconcilers in
runner.go is too long and just restates the nil-check, so remove the inline
paraphrase and replace it with a terse note only if needed. Keep the explanation
focused on the non-obvious reason for starting the CRD reconcilers in
non-standalone mode, and do not repeat the condition already expressed in the
code.
Source: Coding guidelines
pkg/epp/server/options.go (1)
76-76: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Keep selector parsing facts in one location.

Line 76 and Line 116 both document the --endpoint-selector/Complete() parsing path. Keep the public field comment canonical and make the internal field comment terse.
Proposed fix
-	EndpointSelector            labels.Selector // Parsed selector to filter model server pods on. Set via --endpoint-selector flag and parsed in Complete().
+	EndpointSelector            labels.Selector // Selector to filter model server pods.
 	EndpointTargetPorts         []int           // Target ports of model server pods.
 	DisableEndpointSubsetFilter bool            // Disables respecting destination endpoint subset metadata in EPP.
@@
-	fs                  *pflag.FlagSet // FlagSet used in AddFlags() and consulted in Validate()
-	endpointSelectorStr string         // Raw string from --endpoint-selector flag, parsed to EndpointSelector in Complete()
+	fs                  *pflag.FlagSet // FlagSet used in AddFlags() and consulted in Validate()
+	endpointSelectorStr string         // Raw --endpoint-selector value.
As per coding guidelines, "State each fact once in its canonical location; do not duplicate the same information across struct docs, prose, tables, inline comments, and examples."

Also applies to: 115-116
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/server/options.go` at line 76, The `EndpointSelector` docs repeat the
same `--endpoint-selector` parsing details in both the public field comment and
the internal `Complete()`-related comment; keep the full parsing explanation
only on the canonical `EndpointSelector` comment and make the
`Complete`/internal field comment terse so each fact is stated once. Update the
comments in `options.go` around `EndpointSelector` and `Complete()` to remove
duplicated wording while preserving the core meaning.
Source: Coding guidelines
pkg/epp/flowcontrol/types/errors.go (1)
49-52: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Collapse this duplicated rationale into one canonical doc site.

The scale-from-zero and 503-vs-429 explanation is repeated here and again on QueueOutcomeRejectedNoEndpoints. Keep the sentinel error comment terse and leave the longer behavior note in one place. As per coding guidelines, "Write comments tersely and only when the reason is non-obvious; do not paraphrase the code." and "State each fact once in its canonical location; do not duplicate the same information across struct docs, prose, tables, inline comments, and examples."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/flowcontrol/types/errors.go` around lines 49 - 52, The comment on
ErrNoEndpoints duplicates the longer scale-from-zero and 503-vs-429 rationale
that already belongs with QueueOutcomeRejectedNoEndpoints. Keep the
ErrNoEndpoints sentinel comment brief and non-redundant, and move or retain the
detailed behavior explanation only in the canonical
QueueOutcomeRejectedNoEndpoints documentation so the same fact is stated once.
Source: Coding guidelines
pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/producer.go (1)
144-159: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Trim the new comment blocks.

These comments restate the implementation in detail. Keep only the invariant so the comments do not become a second copy of the code. As per coding guidelines, "Write comments tersely and only when the reason is non-obvious; do not paraphrase the code."

Also applies to: 637-674
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/producer.go`
around lines 144 - 159, Trim the new comment blocks in addedTokensEntry and
Clone so they only state the non-obvious invariant instead of restating the
implementation. Update the comments near tokenCounter, requestCounter, requests,
and the Clone method to be terse, focusing on the
shared-counter/independent-clone behavior rather than describing how
increment/release works step by step.
Source: Coding guidelines

ℹ️ Review info

⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: b39aa87a-a414-4894-b899-95f2e37babcb

📥 Commits

Reviewing files that changed from the base of the PR and between b750ef4 and 1b55af3.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum, !**/*.sum

📒 Files selected for processing (159)

.github/ISSUE_TEMPLATE/new-release.md
.github/workflows/check-typos.yaml
.github/workflows/ci-build-images.yaml
.github/workflows/ci-dependency-review.yaml
.github/workflows/ci-lint.yaml
.github/workflows/ci-pr-checks.yaml
.github/workflows/ci-release.yaml
.github/workflows/md-link-check.yml
.github/workflows/release-notes-assemble.yaml
.github/workflows/release-notes-update.yaml
.github/workflows/unstale.yaml
AGENTS.md
Makefile
RELEASE-NOTES.md
cmd/epp/runner/runner.go
deploy/components/inference-gateway/deployment.yaml
deploy/components/inference-gateway/rbac.yaml
deploy/components/vllm-decode/deployment.yaml
deploy/environments/dev/base-kind-istio/patch-deployments.yaml
deploy/environments/dev/e-p-d/patch-decode.yaml
deploy/environments/dev/e-pd/patch-decode.yaml
deploy/environments/dev/p-d/patch-decode.yaml
docs/metrics.md
go.mod
hack/update-codegen.sh
pkg/common/error/error.go
pkg/epp/config/loader/configloader.go
pkg/epp/config/loader/configloader_test.go
pkg/epp/config/loader/testdata_test.go
pkg/epp/config/loader/validation.go
pkg/epp/controller/inferencepool_reconciler_test.go
pkg/epp/datalayer/endpoint_pool.go
pkg/epp/datalayer/factory.go
pkg/epp/datastore/datastore.go
pkg/epp/datastore/datastore_test.go
pkg/epp/flowcontrol/contracts/mocks/mocks.go
pkg/epp/flowcontrol/controller/internal/processor.go
pkg/epp/flowcontrol/controller/internal/processor_test.go
pkg/epp/flowcontrol/framework/plugins/queue/benchmark_test.go
pkg/epp/flowcontrol/framework/plugins/queue/functional_test.go
pkg/epp/flowcontrol/framework/plugins/queue/listqueue.go
pkg/epp/flowcontrol/framework/plugins/queue/maxminheap.go
pkg/epp/flowcontrol/framework/plugins/queue/maxminheap_test.go
pkg/epp/flowcontrol/framework/plugins/queue/priorityqueue.go
pkg/epp/flowcontrol/framework/plugins/queue/priorityqueue_test.go
pkg/epp/flowcontrol/integration_helpers_test.go
pkg/epp/flowcontrol/integration_test.go
pkg/epp/flowcontrol/registry/config.go
pkg/epp/flowcontrol/registry/config_test.go
pkg/epp/flowcontrol/registry/managedqueue.go
pkg/epp/flowcontrol/registry/managedqueue_test.go
pkg/epp/flowcontrol/types/errors.go
pkg/epp/flowcontrol/types/outcomes.go
pkg/epp/framework/interface/flowcontrol/flow.go
pkg/epp/framework/interface/flowcontrol/mocks/mocks.go
pkg/epp/framework/interface/flowcontrol/queue.go
pkg/epp/framework/interface/requesthandling/types.go
pkg/epp/framework/interface/requesthandling/types_test.go
pkg/epp/framework/plugins/flowcontrol/fairness/functional_test.go
pkg/epp/framework/plugins/flowcontrol/fairness/globalstrict/global_strict.go
pkg/epp/framework/plugins/flowcontrol/fairness/globalstrict/global_strict_test.go
pkg/epp/framework/plugins/flowcontrol/fairness/program-aware/plugin.go
pkg/epp/framework/plugins/flowcontrol/fairness/program-aware/plugin_test.go
pkg/epp/framework/plugins/flowcontrol/fairness/program-aware/strategy_las.go
pkg/epp/framework/plugins/flowcontrol/fairness/program-aware/strategy_las_test.go
pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/README.md
pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/producer.go
pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/producer_test.go
pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/token_estimator.go
pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/token_estimator_test.go
pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go
pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor_test.go
pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/producer_test.go
pkg/epp/framework/plugins/requestcontrol/dataproducer/predictedlatency/metrics.go
pkg/epp/framework/plugins/requesthandling/parsers/anthropic/anthropic.go
pkg/epp/framework/plugins/requesthandling/parsers/anthropic/anthropic_test.go
pkg/epp/framework/plugins/requesthandling/parsers/openai/openai.go
pkg/epp/framework/plugins/requesthandling/parsers/openai/openai_test.go
pkg/epp/framework/plugins/requesthandling/parsers/vllmgrpc/vllmgrpc.go
pkg/epp/framework/plugins/requesthandling/parsers/vllmgrpc/vllmgrpc_test.go
pkg/epp/framework/plugins/requesthandling/parsers/vllmhttp/vllmhttp.go
pkg/epp/framework/plugins/requesthandling/parsers/vllmhttp/vllmhttp_test.go
pkg/epp/metrics/metrics.go
pkg/epp/metrics/testdata/input_tokens_metric
pkg/epp/metrics/testdata/normalized_time_per_output_token_seconds_metric
pkg/epp/metrics/testdata/output_tokens_metric
pkg/epp/metrics/testdata/request_sizes_metric
pkg/epp/metrics/testdata/response_sizes_metric
pkg/epp/metrics/testdata/running_requests_metrics
pkg/epp/requestcontrol/admission.go
pkg/epp/requestcontrol/admission_test.go
pkg/epp/server/options.go
pkg/epp/util/pool/pool.go
pkg/sidecar/proxy/options.go
pkg/sidecar/proxy/options_test.go
release-notes.d/unreleased/1030.md
release-notes.d/unreleased/1121.md
release-notes.d/unreleased/1160.md
release-notes.d/unreleased/1206.md
release-notes.d/unreleased/1218.md
release-notes.d/unreleased/1234.md
release-notes.d/unreleased/1244.md
release-notes.d/unreleased/1247.md
release-notes.d/unreleased/1248.md
release-notes.d/unreleased/1276.md
release-notes.d/unreleased/1288.md
release-notes.d/unreleased/1302.md
release-notes.d/unreleased/1372.md
release-notes.d/unreleased/1402.md
release-notes.d/unreleased/1418.md
release-notes.d/unreleased/1426.md
release-notes.d/unreleased/1429.md
release-notes.d/unreleased/1436.md
release-notes.d/unreleased/1444.md
release-notes.d/unreleased/1449.md
release-notes.d/unreleased/1475.md
release-notes.d/unreleased/1488.md
release-notes.d/unreleased/1493.md
release-notes.d/unreleased/1509.md
release-notes.d/unreleased/1513.md
release-notes.d/unreleased/1515.md
release-notes.d/unreleased/1536.md
release-notes.d/unreleased/1539.md
release-notes.d/unreleased/1548.md
release-notes.d/unreleased/1550.md
release-notes.d/unreleased/1554.md
release-notes.d/unreleased/1575.md
release-notes.d/unreleased/1603.md
release-notes.d/unreleased/1607.md
release-notes.d/unreleased/1608.md
release-notes.d/unreleased/1616.md
release-notes.d/unreleased/1626.md
release-notes.d/unreleased/1631.md
release-notes.d/unreleased/1651.md
release-notes.d/unreleased/1653.md
release-notes.d/unreleased/1661.md
release-notes.d/unreleased/1667.md
release-notes.d/unreleased/1690.md
release-notes.d/unreleased/1728.md
release-notes.d/unreleased/1731.md
release-notes.d/unreleased/1739.md
release-notes.d/unreleased/1750.md
release-notes.d/unreleased/1751.md
scripts/kind-dev-env.sh
test/e2e/e2e_suite_test.go
test/e2e/e2e_test.go
test/e2e/epp/utils_test.go
test/e2e/requests_test.go
test/e2e/setup_test.go
test/e2e/utils_test.go
test/integration/epp/grpc_test.go
test/integration/epp/harness.go
test/integration/epp/hermetic_test.go
test/scripts/test-e2e-gaie.sh
test/sidecar/config/nixl/qwen-decoder-pod.yaml
test/testdata/configloader_1_test.yaml
test/utils/igw/utils.go
test/utils/k8s_objects.go
version/version.go

💤 Files with no reviewable changes (48)

release-notes.d/unreleased/1436.md
release-notes.d/unreleased/1513.md
release-notes.d/unreleased/1536.md
release-notes.d/unreleased/1206.md
release-notes.d/unreleased/1302.md
release-notes.d/unreleased/1653.md
release-notes.d/unreleased/1509.md
release-notes.d/unreleased/1539.md
release-notes.d/unreleased/1418.md
release-notes.d/unreleased/1449.md
release-notes.d/unreleased/1550.md
release-notes.d/unreleased/1575.md
release-notes.d/unreleased/1429.md
release-notes.d/unreleased/1244.md
release-notes.d/unreleased/1444.md
release-notes.d/unreleased/1218.md
release-notes.d/unreleased/1426.md
release-notes.d/unreleased/1661.md
release-notes.d/unreleased/1276.md
release-notes.d/unreleased/1603.md
release-notes.d/unreleased/1234.md
release-notes.d/unreleased/1372.md
release-notes.d/unreleased/1288.md
release-notes.d/unreleased/1493.md
release-notes.d/unreleased/1515.md
deploy/environments/dev/p-d/patch-decode.yaml
release-notes.d/unreleased/1488.md
deploy/environments/dev/e-pd/patch-decode.yaml
deploy/environments/dev/e-p-d/patch-decode.yaml
release-notes.d/unreleased/1475.md
deploy/components/vllm-decode/deployment.yaml
release-notes.d/unreleased/1631.md
release-notes.d/unreleased/1121.md
release-notes.d/unreleased/1626.md
release-notes.d/unreleased/1608.md
release-notes.d/unreleased/1247.md
release-notes.d/unreleased/1030.md
release-notes.d/unreleased/1160.md
release-notes.d/unreleased/1548.md
release-notes.d/unreleased/1554.md
release-notes.d/unreleased/1607.md
release-notes.d/unreleased/1248.md
test/sidecar/config/nixl/qwen-decoder-pod.yaml
pkg/epp/flowcontrol/framework/plugins/queue/maxminheap.go
release-notes.d/unreleased/1651.md
release-notes.d/unreleased/1402.md
pkg/epp/datalayer/factory.go
pkg/epp/flowcontrol/framework/plugins/queue/maxminheap_test.go

coderabbitai · 2026-06-28T08:15:52Z

CVE-2025-30066

      - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@v7


🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

Pin actions/checkout by full SHA and set persist-credentials: false. actions/checkout@v7 is a movable ref (CWE-829; same tag-move risk class as CVE-2025-30066), and the default checkout behavior leaves ${{ github.token }} in .git/config (CWE-200).

🧰 Tools

🪛 zizmor (1.26.1)

[warning] 19-20: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/check-typos.yaml around lines 19 - 20, The checkout step uses a movable actions ref and leaves the token persisted, so update the workflow’s Checkout code step to use a full SHA for actions/checkout instead of the version tag, and set persist-credentials to false. Make the change in the workflow job that runs the checkout so the action reference is pinned and credentials are not stored in .git/config.

coderabbitai · 2026-06-28T08:15:52Z

      - name: Checkout source
-        uses: actions/checkout@v6
+        uses: actions/checkout@v7


🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail file=".github/workflows/ci-build-images.yaml" echo "== file size ==" wc -l "$file" echo echo "== relevant lines ==" nl -ba "$file" | sed -n '1,180p' echo echo "== checkout occurrences ==" rg -n "actions/checkout|persist-credentials|uses: \./|uses: .*\.github" "$file"

Repository: opendatahub-io/llm-d-router

Length of output: 287

🏁 Script executed:

#!/bin/bash set -euo pipefail file=".github/workflows/ci-build-images.yaml" echo "== lines 1-140 with numbers ==" awk '{printf "%4d %s\n", NR, $0}' "$file" | sed -n '1,140p' echo echo "== direct matches ==" grep -nE 'actions/checkout|persist-credentials|uses: \./|uses: .*\.github' "$file" || true

Repository: opendatahub-io/llm-d-router

Length of output: 5290

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "persist-credentials|actions/checkout@" .github/workflows .github/actions || true

Repository: opendatahub-io/llm-d-router

Length of output: 1574

🌐 Web query:

actions/checkout persist-credentials default true documentation

💡 Result:

The persist-credentials input for the actions/checkout GitHub Action has a default value of true [1][2][3]. When set to true (the default), the action configures the authentication token or SSH key used for the checkout into the local git configuration [1][3]. This allows subsequent git commands executed within your workflow scripts to run authenticated [1][4]. If you wish to opt out of this behavior—for example, for security reasons if you do not want the credentials to persist in the git configuration—you must explicitly set persist-credentials to false in your workflow file [1][5]. Regardless of this setting, the action ensures that credentials are removed during the post-job cleanup process [1][4].

Citations:

1: https://github.com/actions/checkout/?tab=readme-ov-file

2: https://github.com/actions/checkout/blob/master/action.yml

3: https://github.com/actions/checkout/blob/v4/action.yml

4: https://github.com/actions/checkout/blob/main/README.md

5: https://github.com/actions/checkout/blob/v4/README.md

Pin the three checkout steps by SHA and disable persisted credentials.
actions/checkout@v7 is a movable tag in all three jobs, so a retargeted tag can inject code into this workflow (CWE-829). persist-credentials defaults to true, leaving the token in .git/config for later steps; set it to false (CWE-200).

🧰 Tools

🪛 zizmor (1.26.1)

[warning] 32-33: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci-build-images.yaml around lines 32 - 33, Update each Checkout source step in the workflow to use a fixed commit SHA instead of the movable actions/checkout@v7 tag, and set persist-credentials to false on all three checkout steps. Make the change in every job that uses the checkout action so the workflow consistently pins the action and does not leave the token in .git/config.

Sources: Path instructions, Linters/SAST tools

coderabbitai · 2026-06-28T08:15:52Z

      - name: Checkout source
-        uses: actions/checkout@v6
+        uses: actions/checkout@v7


🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

sed -n '1,120p' .github/workflows/ci-dependency-review.yaml

Repository: opendatahub-io/llm-d-router

Length of output: 655

Pin checkout and disable persisted credentials

actions/checkout@v7 is a mutable tag (CWE-829). Its default auth setup also writes ${{ github.token }} into the local git config (CWE-200). Pin it by full SHA and set persist-credentials: false here.

🧰 Tools

🪛 zizmor (1.26.1)

[warning] 15-16: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci-dependency-review.yaml around lines 15 - 16, The checkout step is using a mutable action tag and leaving persisted credentials enabled. Update the Checkout source step in the workflow to use a full SHA-pinned reference for actions/checkout, and set persist-credentials to false in that same step so the default token is not written into local git config.

Sources: Path instructions, Linters/SAST tools

coderabbitai · 2026-06-28T08:15:52Z

CVE-2025-30066

      - name: Checkout source
-        uses: actions/checkout@v6
+        uses: actions/checkout@v7


🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Show the workflow file with line numbers around the referenced lines. sed -n '1,120p' .github/workflows/ci-lint.yaml | cat -n # Locate all checkout usages in this workflow. rg -n 'uses:\s*actions/checkout@' .github/workflows/ci-lint.yaml

Repository: opendatahub-io/llm-d-router

Length of output: 4021

🏁 Script executed:

#!/bin/bash set -euo pipefail # List all action references in the workflow. rg -n 'uses:\s*' .github/workflows/ci-lint.yaml # Show surrounding lines for every action reference. sed -n '30,110p' .github/workflows/ci-lint.yaml | cat -n

Repository: opendatahub-io/llm-d-router

Length of output: 3177

Pin every workflow action by full SHA. actions/checkout@v7 (both steps), dorny/paths-filter@v4, actions/setup-go@v6, and actions/cache@v5 are mutable tags and widen the CI supply-chain attack surface (CWE-829 / CVE-2025-30066 class). Replace each uses: reference with a vetted commit SHA.

🧰 Tools

🪛 zizmor (1.26.1)

[warning] 36-37: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci-lint.yaml around lines 36 - 37, The workflow currently uses mutable action tags, so update every `uses:` reference in this CI config to a vetted full commit SHA. Replace the `actions/checkout`, `dorny/paths-filter`, `actions/setup-go`, and `actions/cache` entries with pinned SHAs, including both checkout steps, and keep the existing step names intact so the workflow behavior remains the same.

Source: Path instructions

coderabbitai · 2026-06-28T08:15:52Z

    steps:
      - name: Checkout source
-        uses: actions/checkout@v6
+        uses: actions/checkout@v7


🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

Pin every actions/checkout reference to a full SHA.

actions/checkout@v7 is a mutable tag (CWE-829). Replace each occurrence in this workflow (28, 56, 73, 231, 281, 327, 402) with the vetted commit SHA; otherwise a retargeted release can run arbitrary code in CI.

🧰 Tools

🪛 zizmor (1.26.1)

[warning] 27-28: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci-pr-checks.yaml at line 28, The workflow is using mutable actions/checkout tags, which should be replaced with pinned commit SHAs. Update every actions/checkout reference in ci-pr-checks.yaml, including the checkout steps used in the workflow jobs, to the vetted full SHA instead of v7. Keep the change consistent across all occurrences so the workflow always runs the exact approved checkout revision.

coderabbitai · 2026-06-28T08:15:52Z

CVE-2025-30066

      - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@v7


🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

sed -n '1,220p' .github/workflows/md-link-check.yml

Repository: opendatahub-io/llm-d-router

Length of output: 1532

Critical: pin every uses: reference by full SHA (CWE-829, CVE-2025-30066 class).
actions/checkout@v7 and lycheeverse/lychee-action@v2.8.0 are mutable tags. Replace both with vetted commit SHAs.

🧰 Tools

🪛 zizmor (1.26.1)

[warning] 32-33: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/md-link-check.yml around lines 32 - 33, Both action references are using mutable tags, so update the workflow to pin each `uses:` entry to a vetted full commit SHA instead of version tags. Locate the `actions/checkout` step and the `lycheeverse/lychee-action` step in the workflow, and replace `@v7` / `@v2.8.0` with their approved SHAs.

Source: Path instructions

coderabbitai · 2026-06-28T08:15:53Z

CVE-2025-30066

        # into main between the tag push and this workflow's run is not
        # accidentally rolled into the wrong release.
-        uses: actions/checkout@v6
+        uses: actions/checkout@v7


🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

python3 - <<'PY' from pathlib import Path p = Path('.github/workflows/release-notes-assemble.yaml') text = p.read_text() for needle in ['actions/checkout@v7', 'origin/main', 'RELEASE-NOTES.md', 'rerun', 'overwrite']: print(f'## {needle}') for i, line in enumerate(text.splitlines(), 1): if needle in line: print(f'{i}: {line}') PY

Repository: opendatahub-io/llm-d-router

Length of output: 753

🏁 Script executed:

sed -n '70,130p' .github/workflows/release-notes-assemble.yaml | nl -ba

Repository: opendatahub-io/llm-d-router

Length of output: 205

🏁 Script executed:

awk 'NR>=70 && NR<=130 { printf "%4d %s\n", NR, $0 }' .github/workflows/release-notes-assemble.yaml

Repository: opendatahub-io/llm-d-router

Length of output: 2756

Pin actions/checkout and make the release-notes write idempotent.

.github/workflows/release-notes-assemble.yaml:36 uses actions/checkout@v7 instead of a full SHA (CWE-829; CVE-2025-30066 class).

The job prepends a fresh RELEASE-NOTES.md section on every run, so rerunning after the release notes already landed on main will duplicate the same tag block.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/release-notes-assemble.yaml at line 36, The release-notes workflow currently uses an unpinned actions/checkout reference and blindly prepends a new RELEASE-NOTES.md section each run. Update the checkout step in the release-notes assembly workflow to a full commit SHA, and make the write logic in the release-notes generation step idempotent by checking for an existing tag section before adding it so reruns do not duplicate the same block.

Source: Path instructions

coderabbitai · 2026-06-28T08:15:53Z

CVE-2025-30066

      - name: Checkout main
-        uses: actions/checkout@v6
+        uses: actions/checkout@v7


🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail file=".github/workflows/release-notes-update.yaml" wc -l "$file" echo "----" cat -n "$file" | sed -n '1,220p'

Repository: opendatahub-io/llm-d-router

Length of output: 4522

Critical: pin the GitHub Actions here by full SHA (CWE-829, CVE-2025-30066 class). This pull_request_target workflow runs with a GitHub App token and later force-pushes a branch, so actions/create-github-app-token@v3 and actions/checkout@v7 are both supply-chain entry points.

🧰 Tools

🪛 zizmor (1.26.1)

[warning] 32-36: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/release-notes-update.yaml around lines 32 - 33, The workflow uses unpinned third-party GitHub Actions, which leaves the pull_request_target job exposed to supply-chain risk. Update the actions referenced in the release-notes update workflow, especially actions/create-github-app-token and actions/checkout, to full commit SHAs instead of version tags, and keep the existing job logic intact while changing only the action references.

Source: Path instructions

coderabbitai · 2026-06-28T08:15:53Z

+	apiType := determineAPITypeFromPath(request.GetRequestPath(headers))
+	extractedBody, err := extractRequestBody(apiType, body)
 	if err != nil {
 		return nil, err
 	}
 	extractedBody.Payload = fwkrh.PayloadMap(bodyMap)
+	extractedBody.MaxOutputTokens = maxOutputTokensForAPI(apiType, bodyMap)


🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find the exact definition locations if the first pass did not surface them. rg -n "type PayloadMap|MaxOutputTokensFromPayload|maxOutputTokensForAPI" pkg -S

Repository: opendatahub-io/llm-d-router

Length of output: 1952

Pass fwkrh.PayloadMap through the max-token path

MaxOutputTokensFromPayload takes fwkrh.PayloadMap, but maxOutputTokensForAPI still accepts map[string]any and forwards it unchanged, which breaks the build on every branch here. Convert once and pass the typed payload through the helper. CWE-704.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/epp/framework/plugins/requesthandling/parsers/openai/openai.go` around lines 115 - 121, The max-token flow is still using an untyped map, which causes the build break in the OpenAI parser path. Update the request handling path in openai.go so `maxOutputTokensForAPI` accepts and uses `fwkrh.PayloadMap` instead of `map[string]any`, and pass the typed payload through from the `extractRequestBody` call site where `extractedBody.Payload` and `extractedBody.MaxOutputTokens` are set. Make the conversion once and keep the payload typed all the way through the helper chain, including any `MaxOutputTokensFromPayload` usage.

coderabbitai · 2026-06-28T08:15:53Z

+	// max_tokens lives under sampling_params in the generate wire format.
+	if sp, ok := bodyMap["sampling_params"].(map[string]any); ok {
+		body.MaxOutputTokens = fwkrh.MaxOutputTokensFromPayload(sp, "max_tokens")
+	}


🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1,220p' pkg/epp/framework/plugins/requesthandling/parsers/vllmhttp/vllmhttp.go printf '\n--- helper search ---\n' rg -n "type PayloadMap|MaxOutputTokensFromPayload|sampling_params" pkg/epp/framework -S printf '\n--- type definition context ---\n' file=$(rg -l "type PayloadMap" pkg/epp/framework -S | head -n 1) if [ -n "${file:-}" ]; then sed -n '1,220p' "$file" fi

Repository: opendatahub-io/llm-d-router

Length of output: 16611

Convert sampling_params to fwkrh.PayloadMap first
Line 127 passes map[string]any into MaxOutputTokensFromPayload, which expects fwkrh.PayloadMap (CWE-704). This is a compile-time type mismatch. Cast the nested map before calling the helper.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/epp/framework/plugins/requesthandling/parsers/vllmhttp/vllmhttp.go` around lines 125 - 128, The sampling_params handling in vllmhttp.ParseGenerateRequest passes a plain map[string]any into MaxOutputTokensFromPayload, but that helper expects fwkrh.PayloadMap. Update the nested sampling_params extraction in vllmhttp.go to cast the map to fwkrh.PayloadMap before calling fwkrh.MaxOutputTokensFromPayload, keeping the existing body.MaxOutputTokens assignment unchanged.

llm-d-router-release-notes Bot and others added 30 commits June 22, 2026 09:41

ci: add artifacts target and upload-artifacts job for CRDs (llm-d#1713)

8fa2015

Signed-off-by: Cong Liu <conliu@google.com>

chore: switch default vllm port from 8001 to 8200 for sidecar (llm-d#…

20e61b8

…1723) - remove explicitly config to use 8200 in deployment and test - keep local dev test with 8001 Signed-off-by: Wen Zhou <wenzhou@redhat.com>

Update release instruction for fetching the existing release branch (l…

fb8bd2d

…lm-d#1716) Signed-off-by: Cong Liu <conliu@google.com>

fix: wrong bundle version in artifacts CRDs (llm-d#1742)

e2e7253

- 0.9.0 has main-dev set in annoataion llm-d.ai/bundle-version Signed-off-by: Wen Zhou <wenzhou@redhat.com>

emphasize PR template use (llm-d#1682)

6fd2cfe

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

shichaooutlook and others added 2 commits June 27, 2026 10:44

github-actions Bot added the size/L label Jun 28, 2026

Sync upstream llm-d/llm-d-router 1fa3803

1b55af3

coderabbitai Bot reviewed Jun 28, 2026

View reviewed changes

kylape approved these changes Jun 30, 2026

View reviewed changes

zdtsw merged commit fae8ee5 into opendatahub-io:main Jun 30, 2026
41 of 43 checks passed

Uh oh!

Conversation

zdtsw commented Jun 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

12 participants

zdtsw commented Jun 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 28, 2026 •

edited

Loading