[sync] upstream llm-d/llm-d-router 1fa3803d [2026-06-28]#262
Conversation
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
…d#1572) GitHub Actions runners share public IPs that frequently hit the HF Hub anonymous API quota, causing 429 errors during vllm-render model downloads. This accounts for a large fraction of e2e flakes. Propagate HF_TOKEN from a repo secret into the e2e builder container so the vllm-render sidecar and the model-existence check authenticate against HF Hub. Authenticated requests have a substantially higher rate limit than anonymous ones. Changes: - Add HF_TOKEN to the Makefile E2E_ENV_VARS forwarding list - Read HF_TOKEN from the environment in setup_test.go instead of hardcoding an empty string - Attach a Bearer token header to the isModelReal HF API call in utils_test.go - Pass secrets.HF_TOKEN in both e2e-gaie and e2e-router CI steps The secret must be created as a GitHub repo secret. If the secret is not set, behavior is unchanged (empty token, anonymous requests). Closes llm-d#1527 Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>
Signed-off-by: Cong Liu <conliu@google.com>
…3 updates (llm-d#1715) Bumps the github-actions group with 3 updates in the / directory: [actions/checkout](https://github.com/actions/checkout), [actions/setup-go](https://github.com/actions/setup-go) and [azure/setup-kubectl](https://github.com/azure/setup-kubectl). Updates `actions/checkout` from 6 to 7 - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@v6...v7) Updates `actions/setup-go` from 5 to 6 - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](actions/setup-go@v5...v6) Updates `azure/setup-kubectl` from 4 to 5 - [Release notes](https://github.com/azure/setup-kubectl/releases) - [Changelog](https://github.com/Azure/setup-kubectl/blob/main/CHANGELOG.md) - [Commits](Azure/setup-kubectl@v4...v5) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions - dependency-name: actions/setup-go dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions - dependency-name: azure/setup-kubectl dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the go-dependencies group with 1 update: [github.com/jellydator/ttlcache/v3](https://github.com/jellydator/ttlcache). Updates `github.com/jellydator/ttlcache/v3` from 3.4.0 to 3.4.1 - [Release notes](https://github.com/jellydator/ttlcache/releases) - [Commits](jellydator/ttlcache@v3.4.0...v3.4.1) --- updated-dependencies: - dependency-name: github.com/jellydator/ttlcache/v3 dependency-version: 3.4.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#1720) Bumps golang.org/x/crypto from v0.51.0 to v0.52.0 to fix 8 HIGH severity CVEs (CVE-2026-39827 through CVE-2026-46598) that caused the sidecar image Trivy scan to fail in CI. Signed-off-by: Cong Liu <conliu@google.com>
llm-d#1690) * refactor: single-ended queue backed by heap The dispatch SafeQueue was double-ended, but nothing consumed PeekTail other than the flowQueueAccessor passthrough itself: dispatch needs only the head, and eviction uses an independent ordering. With PeekTail gone the queue is single-ended, so the bespoke max-min heap that gave O(1) access to both ends is no longer justified. - Remove the unused PeekTail from the SafeQueue contract and all implementers, mocks, and tests; rename PeekHead -> Peek now that there is a single end. - Replace the hand-rolled max-min heap (maxminheap.go) with a standard library container/heap priority queue (priorityqueue.go), eliminating the custom bubble-up/down logic and the latent bug class it carried. - Drop the side handle map: membership is validated by index identity (a *heapItem lives in exactly one queue), preserving the contract's Remove error semantics with less state. O(log n) targeted removal is retained via per-item index tracking. - Rename the registered queue MaxMinHeap -> PriorityQueue. Signed-off-by: Luke Van Drie <lukevandrie@google.com> * refactor(flowcontrol): apply review feedback on the priority queue Addresses review comments from @shmuelk on llm-d#1690: - Correct the new-file license headers to "2026 The llm-d Authors". - Clarify the PriorityQueue doc: ordering is maintained by an internal container/heap (the queue does not itself implement heap.Interface). - Narrow the Add and Remove critical sections to the shared heap mutation. Precondition checks (nil, type assertion) and the atomic byteSize update move outside the lock; the isInvalidated and index reads stay inside it, since those fields are written under the lock by Remove/Cleanup/Drain and reading them lock-free would race. - Compact Cleanup survivors in place instead of allocating a second slice, clearing the vacated tail so removed items are not retained. Signed-off-by: Luke Van Drie <lukevandrie@google.com> --------- Signed-off-by: Luke Van Drie <lukevandrie@google.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
…1723) - remove explicitly config to use 8200 in deployment and test - keep local dev test with 8001 Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* Evaluate inotify limit on the kind node, not the host The Prometheus config-reloader inotify pre-flight check ran an unconditional cat of /proc/sys/fs/inotify/max_user_instances on the host, before the cluster existed. On macOS that proc file is absent (kind runs inside the container-runtime VM), so the check aborted env-dev-kind with PROM_ENABLED=true. The limit that applies to pods is the kind node's kernel: the host kernel on Linux, the runtime VM kernel on macOS. Move the check to after the node container is created and read the value from inside it, which is correct on both platforms. Update the remediation message to cover raising the limit on the host (Linux) and in the runtime VM (macOS). Signed-off-by: roytman <roytman@il.ibm.com> * fix copilot comments Signed-off-by: roytman <roytman@il.ibm.com> --------- Signed-off-by: roytman <roytman@il.ibm.com>
* feat(inflightloadproducer): allow configurable output ratio Signed-off-by: mohammadkhan <mohammadkhan@digitalocean.com> * Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Mohammad Ashar Khan <140482588+asharkhan3101@users.noreply.github.com> * ci: fix copilot line removal Signed-off-by: mohammadkhan <mohammadkhan@digitalocean.com> --------- Signed-off-by: mohammadkhan <mohammadkhan@digitalocean.com> Signed-off-by: Mohammad Ashar Khan <140482588+asharkhan3101@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
…tive (llm-d#1728) * Support enabling and disabling feature flags Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Removed deprecated dataLayer feature flag Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updated feature flag registration with default value Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updated tests WRT new feature flag features Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
…ences (llm-d#1733) * fix(metrics): reset flow control metrics and correct deprecated references Add Reset() calls for flowControlDispatchCycleDuration and llmdFlowControlDispatchCycleDuration, which were registered but never reset. Update help strings of deprecated metrics to point to the actual metric names (e.g. using llm_d_epp_request_size_bytes instead of llm_d_epp_request_sizes). Signed-off-by: Ishwar <ishwarcm@iitbhilai.ac.in> * test(metrics): update expectations for deprecated metric references Signed-off-by: Ishwar <ishwarcm@iitbhilai.ac.in> --------- Signed-off-by: Ishwar <ishwarcm@iitbhilai.ac.in>
…llm-d#1747) * test(flowcontrol): add regression test for eager ManagedQueue resolution Verify that processAllQueuesConcurrently resolves ManagedQueue handles in Phase 1 (during IterateQueues) rather than deferring to Phase 3 workers. Deferred resolution races with flow GC, which can delete the flow between collection and lookup. Uses a phase-aware mock: ManagedQueueFunc succeeds while IterateQueues is running and fails after it returns, so the test passes only when resolution happens eagerly inside the callback. Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> * fix(flowcontrol): resolve ManagedQueue eagerly and fix FlowKey.String for negative priorities Resolve ManagedQueue handles in Phase 1 of processAllQueuesConcurrently (inside the IterateQueues callback) instead of deferring to Phase 3 workers. This eliminates a race where flow GC deletes a flow between collection and lookup, producing spurious error logs. Fix FlowKey.String() to use strconv.Itoa instead of FormatUint so negative priorities render correctly in logs. Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> * fix(flowcontrol): improve variable naming and trim test comment Rename work to resolvedQueues and t to task for clarity. Condense test comment to remove pseudocode arrows. Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> * fix(flowcontrol): log skipped GC'd flows at DEBUG level Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> * fix(flowcontrol): include error in debug log for skipped queues Address review feedback: include the error value in the debug log so ErrPriorityBandNotFound (real inconsistency) is distinguishable from ErrFlowInstanceNotFound (benign GC race). Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> --------- Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
…lm-d#1716) Signed-off-by: Cong Liu <conliu@google.com>
* release: set LATENCY_PREDICTOR_TAG to v0.8.0-rc.1 Signed-off-by: Cong Liu <conliu@google.com> * deps(go): bump llm-d-kv-cache to v0.9.0 (llm-d#1691) Update github.com/llm-d/llm-d-kv-cache from v0.9.0-rc.1 to the released v0.9.0. Signed-off-by: Maroon Ayoub <mayoub@redhat.com> Signed-off-by: Cong Liu <conliu@google.com> * fix(chart): update latency predictor image path to match subchart nesting (llm-d#1683) Signed-off-by: weizhoublue <weizhou.lan@daocloud.io> Signed-off-by: Cong Liu <conliu@google.com> * consume official latency predictor v0.8 release (llm-d#1729) Signed-off-by: greg pereira <grpereir@redhat.com> * ci: add artifacts target and upload-artifacts job for CRDs (llm-d#1713) Signed-off-by: Cong Liu <conliu@google.com> * fix: bump golang.org/x/crypto to v0.52.0 to resolve Trivy CVEs (llm-d#1720) Bumps golang.org/x/crypto from v0.51.0 to v0.52.0 to fix 8 HIGH severity CVEs (CVE-2026-39827 through CVE-2026-46598) that caused the sidecar image Trivy scan to fail in CI. Signed-off-by: Cong Liu <conliu@google.com> * docs: assemble release notes for v0.9.0 Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> * revert the latency predictor change Signed-off-by: Cong Liu <conliu@google.com> * Update push-chart.sh Signed-off-by: Cong Liu <conliu@google.com> --------- Signed-off-by: Cong Liu <conliu@google.com> Signed-off-by: Maroon Ayoub <mayoub@redhat.com> Signed-off-by: weizhoublue <weizhou.lan@daocloud.io> Signed-off-by: greg pereira <grpereir@redhat.com> Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: Cong Liu <conliu@google.com> Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com> Co-authored-by: weizhoublue <45163302+weizhoublue@users.noreply.github.com> Co-authored-by: Greg Pereira <grpereir@redhat.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
…ap (llm-d#1739) * fix(inflightload): prevent in-flight counter underflow on endpoint flap The per-endpoint in-flight request/token counters are a +1/-1 ledger keyed only by NamespacedName. PreRequest increments via the tracker map; release (OnEvicted/releaseTokensEarly) re-looked-up the counter by endpoint ID and decremented whatever instance was currently mapped. When an endpoint flaps (NotReady -> EventDelete drops the counter -> rejoins under the same NamespacedName -> new traffic recreates it), an in-flight request's release landed on the recreated counter instead of the one it incremented, driving it negative. This surfaced as negative pool saturation (observed ~ -1.4), amplified when a pod crashed while holding many in-flight requests. The existing registeredEndpoints pointer-identity guard only drops out-of-order delete events for an already-replaced pointer; it does not cover the normal flap sequence, so the underflow persisted. - Capture a direct pointer to the exact *atomic.Int64 each request increments, and decrement that instance on release. A stale release after a flap now lands harmlessly on the orphaned counter; the live (recreated) counter only ever sees its own request's +1/-1. - Remove the now-unnecessary addIfPresent/decIfPresent 'is it present' guard. - Add a decrementClamped CAS helper (mirroring the canonical predictedlatency.decrementEndpointCounter primitive) that floors decrements at 0 as defense-in-depth, so saturation can never go negative even under a future regression. - Add regression tests for the delete-then-recreate flap and the crash-with-high-load drain (both fail without the fix), plus coverage for the stale-delete guard. Tests use the realistic AddOrUpdate-then-Delete ordering. Signed-off-by: Luke Van Drie <lukevandrie@google.com> * test(inflightload): table-driven flap test covering token counters Consolidate the per-request flap regression into a single table-driven test and extend coverage to the token counter, which receives the same instance-capture fix but was previously only asserted via the request counter. Cases cover both counters across both release paths: - request counter via the EndOfStream eviction path (OnEvicted) - token counter via the EndOfStream eviction path (OnEvicted) - token counter via the StartOfStream early-release path (releaseTokensEarly) Each case runs the same flap sequence (join, track A, delete + rejoin, track B, release A then B) and asserts A's release lands on the orphaned counter, leaving B's live counter accurate and never negative. Signed-off-by: Luke Van Drie <lukevandrie@google.com> --------- Signed-off-by: Luke Van Drie <lukevandrie@google.com>
…#1751) * fix(flowcontrol): return 503 (not 429) when no endpoints exist When the candidate pool has zero endpoints (e.g. scaled to zero), the flow control queue acts as a scale-from-zero waiting room: requests are buffered until capacity comes online. Previously, a request that overflowed the queue in that state was finalized as QueueOutcomeRejectedCapacity and surfaced as HTTP 429, painting a picture of partial availability when there is none. The mix of 429 (capacity overflow) and 503 (TTL expiry) for the same "no backends" condition was confusing for upstreams. Distinguish this case with a new QueueOutcomeRejectedNoEndpoints outcome that the admission layer maps to 503 Service Unavailable (with an x-llm-d-request-dropped-reason: rejected-no-endpoints header), matching upstream EPP behavior. The processor caches pool emptiness from the dispatch cycle and consults it when rejecting at capacity. Scale-from-zero buffering is preserved; only the terminal error code changes, and only when the pool is empty. TTL eviction already returns 503 and is unchanged. Fixes llm-d#1735 Signed-off-by: Luke Van Drie <lukevandrie@google.com> * test(flowcontrol): cover empty-pool no-endpoints classification The capacity-enforcement integration tests block dispatch with a saturated detector and an empty endpoint pool, then assert QueueOutcomeRejectedCapacity. With the no-endpoints fix, a queue-capacity rejection on an empty pool now classifies as QueueOutcomeRejectedNoEndpoints, so these counts broke. The detector ignores the pool, so giving those tests a single placeholder endpoint keeps dispatch blocked while restoring the RejectedCapacity classification they intend to exercise. Add TestEmptyPoolRejectsAsNoEndpoints to cover the scale-from-zero path end-to-end: with an empty pool, a full-queue rejection is surfaced as RejectedNoEndpoints (wrapping ErrNoEndpoints), never RejectedCapacity. Signed-off-by: Luke Van Drie <lukevandrie@google.com> --------- Signed-off-by: Luke Van Drie <lukevandrie@google.com>
…r caps (llm-d#1750) * feat(requesthandling): normalize client max output tokens across parsers Add InferenceRequestBody.MaxOutputTokens, a single normalized view of the client-requested output-token cap, populated by each parser from its API-specific field: - OpenAI completions: max_tokens - OpenAI chat completions: max_completion_tokens, falling back to max_tokens - OpenAI responses: max_output_tokens - Anthropic messages: max_tokens - vLLM gRPC/HTTP generate: SamplingParams.max_tokens A shared MaxOutputTokensFromPayload helper reads the value from the already -retained payload map with per-API key precedence, treating an explicit 0 as a binding cap and ignoring absent/negative/non-integral values. Downstream plugins can now bound behavior by the client's requested output limit. Signed-off-by: Luke Van Drie <lukevandrie@google.com> * feat(inflightload): bound output estimate by client and operator caps The in-flight load producer estimated output tokens as round(input * ratio), which badly overstates output for high-input/low-output workloads (e.g. long -context summarization). Bound the estimate two ways: - Client cap: when the request carries an output limit (now surfaced as Body.MaxOutputTokens), use it as an upper bound on the estimate. - Operator cap: new maxEstimatedOutputTokens config caps the estimate regardless of input or client value (validated non-negative). EstimateOutput now takes the client cap; the producer threads the request through estimateRequestTokens. Estimate becomes min(round(input*ratio), clientCap?, operatorCap?). Signed-off-by: Luke Van Drie <lukevandrie@google.com> * refactor(requesthandling): polish max-output-tokens handling Address review feedback on the max-output-tokens change: - MaxOutputTokensFromPayload now falls back to the next key when a present key holds a malformed value (wrong type, negative, non-integral) instead of returning nil. A chat request with an invalid max_completion_tokens but a valid max_tokens now correctly uses the latter. - Resolve the OpenAI API type once in ParseRequest and pass it to extractRequestBody, removing a duplicate path parse + classification on the request hot path. - Use k8s.io/utils/ptr.To for *int64/*uint32 in tests instead of bespoke per-file pointer helpers, matching the rest of the codebase. Signed-off-by: Luke Van Drie <lukevandrie@google.com> --------- Signed-off-by: Luke Van Drie <lukevandrie@google.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
…m-d#1699) * Added the ability to set the number of replicas and to enable leader selection Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added RBAC for leader selection Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Set extra variables for development runs Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Use the same version of kind in both e2e tests Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Refactored functions for more flexibility in tests Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Refactored metrics fetching Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Aliased additional IGW test utilities Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Refcatored EPP creation for leadership tests Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Refactored tests, added encoding and leadership tests Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Fixed lint error Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
…diffs (llm-d#1827) * fix: base release-notes assembly branch on main to exclude unrelated diffs The workflow previously created its branch from the tagged commit. When the tag diverged from main HEAD, the PR diff included unrelated changes (e.g. hack/push-chart.sh in llm-d#1688). Split the script into three phases: 1. Read fragments from the tagged commit (preserves the guard against post-tag fragments). 2. Fetch and branch from origin/main so the PR only diffs release-notes files against main. 3. Apply changes and remove only the fragments that exist on main. Signed-off-by: Cong Liu <conliu@google.com> * fix: add comment explaining fragment removal loop Signed-off-by: Cong Liu <conliu@google.com> * fix: skip release-notes assembly for RC tags Signed-off-by: Cong Liu <conliu@google.com> * fix: add comment for RC tag exclusion Signed-off-by: Cong Liu <conliu@google.com> --------- Signed-off-by: Cong Liu <conliu@google.com>
- 0.9.0 has main-dev set in annoataion llm-d.ai/bundle-version Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Add documentation for the seven flow control metrics exposed when the flowControl feature gate is enabled: queue duration, dispatch cycle duration, enqueue duration, queue size, queue bytes, pool saturation, and requests total. Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> Co-authored-by: Luke Van Drie <lukevandrie@google.com>
…m-d#1701) Replace the string-based --endpoint-selector flag with Kubernetes-native labels.Selector to support set-based label expressions (e.g., 'env in (prod,staging),tier!=frontend', 'key,!deprecated') in standalone mode. - EndpointSelector type: string -> labels.Selector - Complete(): parse flag string via labels.Parse() - EndpointPool.Selector: map[string]string -> labels.Selector - PoolLabelsMatch simplified with native Selector.Matches() - podResyncAll passes Selector directly to K8s client - All callers and tests updated for new type Signed-off-by: shichaooutlook <shichao.outlook@gmail.com> Co-authored-by: shichaooutlook <268144704+shichaooutlook@users.noreply.github.com>
* fix: clear KV block index entries on endpoint removal On EventDelete, call Index.Clear with the pod's IP address to remove stale entries that reference blocks no longer cached on the removed pod. Fixes llm-d#1736 Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> * Update pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go Co-authored-by: Bongwoo Bak <bongwoobak@gmail.com> Signed-off-by: Rishabh Saini <rishabhsaini01@gmail.com> * fix(test): match Clear assertion to IP:Port PodIdentifier format Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> * Update pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go Co-authored-by: Maroon Ayoub <Maroonay@gmail.com> Signed-off-by: Rishabh Saini <rishabhsaini01@gmail.com> * Update pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go Co-authored-by: Maroon Ayoub <Maroonay@gmail.com> Signed-off-by: Rishabh Saini <rishabhsaini01@gmail.com> --------- Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com> Signed-off-by: Rishabh Saini <rishabhsaini01@gmail.com> Co-authored-by: Bongwoo Bak <bongwoobak@gmail.com> Co-authored-by: Maroon Ayoub <Maroonay@gmail.com>
📝 WalkthroughWalkthroughThis PR prepares the v0.9.0 release with several independent but coordinated changes: Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes |
There was a problem hiding this comment.
Actionable comments posted: 11
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/epp/flowcontrol/contracts/mocks/mocks.go (1)
364-373: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winMake
MockManagedQueue.Peek()return the actual head item.This implementation violates the new
Peek()contract by returning whichever item Go yields first fromm.items. Map iteration order is nondeterministic, so any test that uses this mock to exercise queue ordering can become flaky or validate the wrong winner.Possible fix
func (m *MockManagedQueue) Peek() flowcontrol.QueueItemAccessor { m.mu.Lock() defer m.mu.Unlock() m.init() - for _, item := range m.items { - return item // Return first item found - } - return nil // Queue is empty + var best flowcontrol.QueueItemAccessor + policy := m.OrderingPolicy() + for _, item := range m.items { + if best == nil { + best = item + continue + } + if policy != nil && policy.Less(item, best) { + best = item + } + } + return best }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/epp/flowcontrol/contracts/mocks/mocks.go` around lines 364 - 373, `MockManagedQueue.Peek()` is using nondeterministic map iteration instead of returning the queue head, which can make ordering tests flaky. Update the mock in `MockManagedQueue.Peek` to select the same item that `Pop`/queue order would treat as first, using the mock’s existing ordering state rather than ranging over `m.items`. Keep the behavior aligned with the new `Peek()` contract and preserve the nil case when the queue is empty.
🟠 Major comments (14)
.github/workflows/ci-release.yaml-55-57 (1)
55-57: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winGate
upload-artifactsonbuild-and-push.This job only depends on
set-params, so it can create or update the GitHub release even when image/chart publication fails. That leaves release state split across systems: GitHub has tagged assets, while the registry publish path failed. Addbuild-and-pushtoneedsso the release is only mutated after the runtime artifacts succeed.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci-release.yaml around lines 55 - 57, The upload-artifacts job is only gated on set-params, so it can update the GitHub release before the build-and-push path succeeds. Update the ci-release workflow so upload-artifacts also depends on build-and-push, ensuring the release mutation only happens after runtime artifact publication completes successfully..github/ISSUE_TEMPLATE/new-release.md-73-74 (1)
73-74: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winThese commands never switch to the release branch.
git reset --hard ${REMOTE}/release-${BRANCH_VERSION}rewrites whatever branch the operator currently has checked out. If they are still onmain, this resetsmaininstead of checking outrelease-${BRANCH_VERSION}.Suggested fix
- git fetch ${REMOTE} - git reset --hard ${REMOTE}/release-${BRANCH_VERSION} + git fetch "${REMOTE}" + git checkout -B "release-${BRANCH_VERSION}" "${REMOTE}/release-${BRANCH_VERSION}"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/ISSUE_TEMPLATE/new-release.md around lines 73 - 74, The release setup commands currently reset the checked-out branch instead of switching to the release branch, so update the workflow in the new-release template to explicitly check out release-${BRANCH_VERSION} after git fetch ${REMOTE}. Adjust the command sequence in the template section containing git fetch and git reset so it creates or switches to the intended release branch rather than mutating whatever branch is active..github/workflows/ci-release.yaml-74-91 (1)
74-91: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winMove
needs.set-params.outputs.tagintoenv:.${{ needs.set-params.outputs.tag }}is interpolated directly intorun:in both steps, creating a CWE-94 injection boundary for ref-derived data. Use a quoted shell variable instead.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci-release.yaml around lines 74 - 91, The workflow step uses needs.set-params.outputs.tag directly inside the run script, which creates an injection risk for ref-derived data. Move the tag value into env for both Build Artifacts and Upload Release Assets, then reference a quoted shell variable inside the script instead of interpolating it inline. Update the shell commands around the artifact build and gh release upload/create logic to use that env-backed variable consistently..github/workflows/release-notes-assemble.yaml-92-106 (1)
92-106: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winMake the assembly step idempotent before prepending the new section.
This block always rewrites
RELEASE-NOTES.mdby prependingRELEASE ${TAG}. A manual rerun after the assembly PR has already merged will add the same tag section again instead of cleanly no-oping.Remediation
# Phase 3: apply release-notes changes on top of main + if grep -q "^RELEASE ${TAG} " RELEASE-NOTES.md 2>/dev/null; then + echo "Release notes for ${TAG} already assembled; exiting." + exit 0 + fi touch RELEASE-NOTES.md cat "$new_section" RELEASE-NOTES.md > RELEASE-NOTES.md.new🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/release-notes-assemble.yaml around lines 92 - 106, The release-notes assembly step in the workflow should be idempotent: before the `cat "$new_section" RELEASE-NOTES.md > RELEASE-NOTES.md.new` prepend in the Phase 3 block, check whether `RELEASE-NOTES.md` already contains the `RELEASE ${TAG}` section and skip the rewrite if it does. Update the logic around the existing `new_section`/`RELEASE-NOTES.md` merge so rerunning the job after the assembly PR has merged becomes a clean no-op instead of duplicating the same release section..github/workflows/unstale.yaml-31-32 (1)
31-32: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winRemove the unused checkout step (CWE-829) at
.github/workflows/unstale.yaml:31-32.gh issue editdoes not need a repository checkout, soactions/checkout@v7only adds an avoidable unpinned third-party action to a job withissues: writeandpull-requests: write. Delete it; if it must stay, pin it by full SHA.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/unstale.yaml around lines 31 - 32, The unstale workflow includes an unnecessary repository checkout step, which adds an extra third-party action without being used by the job. Remove the checkout step from the workflow job that runs gh issue edit, or if the checkout is still required for some other reason, update the actions/checkout usage in unstale.yaml to a full commit SHA. Use the checkout step label and the gh issue edit job context to locate the change.Source: Path instructions
pkg/epp/flowcontrol/framework/plugins/queue/priorityqueue.go-159-168 (1)
159-168: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winSerialize
byteSizewith the heap mutation (CWE-362).
Addreleasespq.mubefore updatingpq.byteSize. IfDrainorCleanupacquires the lock between Line 165 and Line 167, the heap can be emptied andbyteSizereset to 0, thenAddrestores a stale byte count on an empty queue. That corrupts max-bytes accounting and can trigger false capacity rejections. Move the byte update into the same critical section asheap.Push.Proposed fix
func (pq *priorityQueue) Add(item flowcontrol.QueueItemAccessor) { hi := &heapItem{item: item} item.SetHandle(hi) pq.mu.Lock() heap.Push(pq.heap, hi) - pq.mu.Unlock() - pq.byteSize.Add(item.OriginalRequest().ByteSize()) + pq.mu.Unlock() }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/epp/flowcontrol/framework/plugins/queue/priorityqueue.go` around lines 159 - 168, The Add method in priorityQueue updates byteSize outside the pq.mu critical section, which can race with Drain or Cleanup and leave stale accounting on an empty heap. Move the pq.byteSize.Add call into the same locked section as heap.Push in priorityQueue.Add, using the existing pq.mu and heap mutation path so heap state and byte accounting stay serialized.pkg/epp/flowcontrol/controller/internal/processor.go-289-296 (1)
289-296: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winRefresh endpoint emptiness before classifying capacity rejection.
poolEmptyis only updated indispatchCycle, butRuncallsenqueuebeforedispatchCyclefor new items. Startup or endpoint flaps can use stale state here and return the wrong rejection outcome, mapping no-endpoints to capacity or capacity to no-endpoints.Proposed fix
- sp.enqueue(item) + sp.enqueue(ctx, item) sp.dispatchCycle(ctx) // Process immediately when an item arrives-func (sp *Processor) enqueue(item *FlowItem) { +func (sp *Processor) enqueue(ctx context.Context, item *FlowItem) {if ok, stats := sp.hasCapacity(key.Priority, req.ByteSize()); !ok { + pool := sp.endpointCandidates.Locate(ctx, nil) + sp.poolEmpty = len(pool) == 0 // When the pool has no endpoints, the queue is acting as a scale-from-zero waiting room. A capacity rejection in // that state reflects genuine unavailability (surfaced as 503), not backpressure against a contended pool (429). if sp.poolEmpty {Update direct unit-test calls to pass
context.Background().Also applies to: 363-364
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/epp/flowcontrol/controller/internal/processor.go` around lines 289 - 296, The rejection path in processor.go is using a stale sp.poolEmpty value inside enqueue/classification logic, so update the endpoint-emptiness check before deciding QueueOutcomeRejectedNoEndpoints versus capacity rejection. Refresh the pool state in the path that handles new items in Run/enqueue, or call the same state update used by dispatchCycle before the capacity check, so the decision is based on current endpoint availability. Also update the direct unit-test calls mentioned in the comment to pass context.Background() where required.pkg/epp/framework/interface/requesthandling/types.go-114-148 (1)
114-148: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winReject lossy or overflowing token caps before converting to int64.
json.Unmarshalgives JSON numbers asfloat64, so large client caps can round before this helper sees them, and out-of-range values can still hit implementation-dependent float-to-int behavior (CWE-190). Decode withjson.Decoder.UseNumber()upstream or bound-check before returning the cap.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/epp/framework/interface/requesthandling/types.go` around lines 114 - 148, MaxOutputTokensFromPayload currently accepts float64/json.Number values and converts them to int64 without rejecting lossy or out-of-range caps. Update the parsing path in MaxOutputTokensFromPayload to preserve numeric precision and verify bounds before conversion, and ensure callers decode request JSON with json.Decoder.UseNumber where the payload enters this helper. Keep the existing key precedence and validation behavior, but reject values that cannot be represented exactly or safely as int64 before returning the cap.pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go-55-60 (1)
55-60: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winPropagate KV index cleanup failures from endpoint deletes.
If
KVBlockIndex().Clear()fails here,Extractstill returnsnilafter removing the subscriber. That leaves stale pod entries in the prefix-cache index with no retry signal, so later scheduling can keep reading deleted endpoint metadata until some unrelated event rebuilds the index.Proposed fix
case fwkdl.EventDelete: p.subscribersManager.RemoveSubscriber(ctx, endpointKey) if meta.Address != "" { if err := p.kvCacheIndexer.KVBlockIndex().Clear(ctx, fmt.Sprintf("%s:%s", meta.Address, meta.Port)); err != nil { logger.Error(err, "Failed to clear index entries for removed endpoint", "endpoint", endpointKey, "address", meta.Address, "port", meta.Port) + return fmt.Errorf("clear kv index for %s: %w", endpointKey, err) } } logger.V(logging.DEBUG).Info("Removed KV-events subscriber", "endpoint", endpointKey)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go` around lines 55 - 60, Propagate the KV index cleanup failure from the endpoint-delete path in Extract instead of only logging it. In the block that calls p.kvCacheIndexer.KVBlockIndex().Clear(), return the error (or wrap it with context) so the caller sees the failure after the subscriber is removed, and keep the existing logger.Error call for observability. Use the Extract method and KVBlockIndex().Clear as the key symbols to update.scripts/kind-dev-env.sh-289-289 (1)
289-289: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winQuote
CONTAINER_RUNTIMEbefore executing it.Line 289 runs
${CONTAINER_RUNTIME}unquoted. If that environment variable is tampered with, the bootstrap path can execute unintended shell syntax during cluster setup (CWE-78). Use"${CONTAINER_RUNTIME}" exec ...here.As per path instructions, "
**/*.sh: Quote all variables to prevent injection".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/kind-dev-env.sh` at line 289, The cluster bootstrap command in the inotify check is using CONTAINER_RUNTIME without quotes, which can allow shell injection if the variable is manipulated. Update the command in the script’s inotify_limit check to use the quoted CONTAINER_RUNTIME invocation style, keeping the exec call to CONTAINER_NAME and the /proc/sys/fs/inotify/max_user_instances read intact.Source: Path instructions
scripts/kind-dev-env.sh-409-412 (1)
409-412: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winDo not overwrite caller-provided HA settings.
These exports force every invocation back to
EPP_REPLICA_COUNT=1andENABLE_LEADER_ELECTION=false, which disables the HA path this cohort is trying to exercise. Default them only when unset.Proposed fix
-export EPP_REPLICA_COUNT=1 +export EPP_REPLICA_COUNT="${EPP_REPLICA_COUNT:-1}" -# Some end to end tests enable leader election -export ENABLE_LEADER_ELECTION=false +# Some end to end tests enable leader election +export ENABLE_LEADER_ELECTION="${ENABLE_LEADER_ELECTION:-false}"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/kind-dev-env.sh` around lines 409 - 412, The HA defaults in kind-dev-env.sh are being forced on every run, which overwrites caller-provided settings. Update the setup around the EPP_REPLICA_COUNT and ENABLE_LEADER_ELECTION exports so they only apply when those variables are unset, preserving any values passed in by the caller while still providing defaults for local runs.deploy/components/inference-gateway/rbac.yaml-71-73 (1)
71-73: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winScope the lease RBAC to least privilege.
Lines 71-73 grant the EPP ServiceAccount write access to every
Leasein the namespace, includingdelete. If that identity is abused, an attacker can seize or remove other controllers' locks and force failover or denial of service (CWE-732). Restrict the rule to the single leader-election lease name and dropdeleteunless the client library proves it needs it.Remediation sketch
- apiGroups: [ "coordination.k8s.io" ] resources: [ "leases" ] - verbs: [ "get", "list", "watch", "create", "update", "patch", "delete" ] + verbs: [ "get", "create", "update", "patch" ] + # If the lock name is fixed, also scope this rule with resourceNames.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deploy/components/inference-gateway/rbac.yaml` around lines 71 - 73, The RBAC rule for lease access in the inference-gateway config is too broad; update the Lease permissions on the relevant Role/RoleBinding entry to follow least privilege. In the RBAC manifest, scope the leases rule used by the EPP ServiceAccount to the single leader-election Lease name via resourceNames, and remove delete unless the leader-election client in this codebase explicitly requires it. Use the existing lease rule block in rbac.yaml to locate and tighten the permissions.test/scripts/test-e2e-gaie.sh-28-28 (1)
28-28: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winPin
KIND_NODE_IMAGEby digest.
kindest/node:v1.31.12is a mutable reference (CWE-829); a retagged image can swap the Kubernetes node runtime for every E2E cluster. Usekindest/node@sha256:...instead.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/scripts/test-e2e-gaie.sh` at line 28, The KIND node image reference in the E2E script is mutable and should be pinned to an immutable digest. Update the KIND_NODE_IMAGE default in test-e2e-gaie.sh to use the kindest/node image by sha256 digest instead of the v1.31.12 tag, so the cluster setup in the script always pulls the exact intended node image.test/e2e/e2e_test.go-69-149 (1)
69-149: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winRegister cleanup before the first assertion.
These specs only clean up at the tail. Any failed
Expector timeout before Lines 71, 87, 106, or 146 leaves EPP or model-server objects behind, and the next ordered spec can see the wrong pod count or stale metrics. Useginkgo.DeferCleanupimmediately after each create call.Remediation sketch
modelServers := createModelServersDecode(1) +ginkgo.DeferCleanup(testutils.DeleteObjects, testConfig, modelServers) epp := createEndPointPickerHelper(simpleConfig, numOfPods, true, false) +ginkgo.DeferCleanup(testutils.DeleteObjects, testConfig, epp)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/e2e_test.go` around lines 69 - 149, Register cleanup immediately after each resource creation in these Ginkgo specs so failures before the tail cleanup do not leak objects into later ordered tests. In the affected test cases, add ginkgo.DeferCleanup right after createInferencePool, createModelServersDecode, createEndPointPicker, and createEndPointPickerHelper, and remove reliance on the final testutils.DeleteObjects calls for cleanup. This keeps the generated resources tied to the lifecycle of each spec and avoids stale pod counts or metrics between runs.
🟡 Minor comments (11)
docs/metrics.md-49-55 (1)
49-55: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winAdd
RejectedNoEndpointsto theoutcomelabel contract.The new empty-pool 503 path introduces a terminal outcome that is missing from this metric definition. Consumers will miss a real bucket unless the docs match the implemented label set.
Suggested edit
- * `outcome`: string (`Dispatched`, `RejectedCapacity`, `RejectedOther`, `EvictedTTL`, `EvictedContextCancelled`, `EvictedOther`) + * `outcome`: string (`Dispatched`, `RejectedCapacity`, `RejectedNoEndpoints`, `RejectedOther`, `EvictedTTL`, `EvictedContextCancelled`, `EvictedOther`)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/metrics.md` around lines 49 - 55, The `flow_control_request_queue_duration_seconds` metric label contract is missing the new terminal outcome introduced by the empty-pool 503 path. Update the `outcome` label list in `docs/metrics.md` to include `RejectedNoEndpoints` alongside the existing values so the documented contract matches the implemented labels.AGENTS.md-33-33 (1)
33-33: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winFix the typo in the PR checklist.
passsshould bepasses.Remediation
-- Verify the code passs `make presubmit` locally before submitting a PR. +- Verify the code passes `make presubmit` locally before submitting a PR.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@AGENTS.md` at line 33, The PR checklist text in AGENTS.md has a typo in the presubmit verification item; update the checklist entry so it says the code passes make presubmit locally before submitting a PR, correcting the misspelling in that sentence.Source: Linters/SAST tools
pkg/epp/config/loader/configloader.go-316-324 (1)
316-324: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winReject malformed feature-gate overrides. CWE-20:
strings.Splitacceptsname=true=falseand drops the trailing segment, so a malformed override is treated as valid input. Usestrings.CutorSplitN(..., 2)and reject any value containing a second=.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/epp/config/loader/configloader.go` around lines 316 - 324, The feature-gate parsing in config loader is too permissive because the current string split accepts malformed overrides like name=true=false and silently truncates them. Update the parsing in configloader.go to use a two-part split via strings.Cut or strings.SplitN(..., 2) in the gate handling logic, and explicitly reject any override whose value still contains another = so malformed input returns an error instead of being accepted.pkg/epp/config/loader/configloader_test.go-79-80 (1)
79-80: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick winDo not mutate the global feature-gate registry from a parallel test.
The writes on Lines 79-80 make the existing
t.Parallel()on Line 76 unsafe. This file already keepsTestInstantiateAndConfigureserial for another global registry, soTestLoadRawConfigurationcan become order-dependent as more loader tests are added. Make this test serial or snapshot and restore the feature-gate registry.Proposed fix
func TestLoadRawConfiguration(t *testing.T) { - t.Parallel() - // Register known feature gates for validation. RegisterFeatureGate(testFeatureGate, true) RegisterFeatureGate(flowcontrol.FeatureGate, false)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/epp/config/loader/configloader_test.go` around lines 79 - 80, TestLoadRawConfiguration is mutating the global feature-gate registry via RegisterFeatureGate while also running with t.Parallel, which makes the test order-dependent. Make this test serial by removing t.Parallel, or snapshot and restore the feature-gate registry around the RegisterFeatureGate calls so parallel execution does not leak global state. Use the existing TestLoadRawConfiguration setup and its RegisterFeatureGate calls as the place to contain this mutation.pkg/epp/config/loader/validation.go-125-133 (1)
125-133: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winReject malformed
name=valuefeature-gate entries with extra=.Line 125 uses
strings.Split, soflowControl=true=garbagepasses validation because onlyparts[1]is parsed as a bool. That lets malformed config through this phase and makes downstream feature-gate handling ambiguous. Usestrings.Cutor explicitly reject more than one=and add a regression case next toerrorBadFeatureGateText.Proposed fix
- parts := strings.Split(gate, "=") - if _, ok := registeredFeatureGates[parts[0]]; !ok { - return fmt.Errorf("feature gate '%s' is unknown or unregistered", parts[0]) + name, value, hasValue := strings.Cut(gate, "=") + name = strings.TrimSpace(name) + if _, ok := registeredFeatureGates[name]; !ok { + return fmt.Errorf("feature gate '%s' is unknown or unregistered", name) } - if len(parts) > 1 { - _, err := strconv.ParseBool(strings.ToLower(strings.TrimSpace(parts[1]))) + if hasValue { + value = strings.TrimSpace(value) + if strings.Contains(value, "=") { + return fmt.Errorf("feature gate '%s' has an invalid value %q", name, value) + } + _, err := strconv.ParseBool(value) if err != nil { - return fmt.Errorf("%s is not a valid value for the feature gate %s (error: %w)", parts[1], parts[0], err) + return fmt.Errorf("%s is not a valid value for the feature gate %s (error: %w)", value, name, err) } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/epp/config/loader/validation.go` around lines 125 - 133, The feature-gate validation in validation.go accepts malformed name=value entries because strings.Split allows extra "=" segments, so fix the parsing in the feature-gate loop to reject any gate with more than one "=" or switch to strings.Cut and fail when a second separator is present. Update the validation logic around the registeredFeatureGates check and strconv.ParseBool handling so only exactly one optional value is allowed, and add a regression test alongside errorBadFeatureGateText for inputs like flowControl=true=garbage.pkg/epp/requestcontrol/admission.go-230-232 (1)
230-232: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winAvoid duplicating the no-endpoints message.
If
erristypes.ErrNoEndpoints, Line 232 returnsno endpoints available: no endpoints available. Reusemsgas-is for that sentinel, or only prepend whenerrcarries extra detail.Suggested fix
case types.QueueOutcomeRejectedNoEndpoints: // No serving capacity exists (e.g. pool scaled to zero): signal genuine unavailability rather than backpressure. - return errcommon.Error{Code: errcommon.ServiceUnavailable, Msg: "no endpoints available: " + msg, Headers: map[string]string{errcommon.RequestDroppedReasonHeaderKey: string(errcommon.RequestDroppedReasonNoEndpoints)}} + if errors.Is(err, types.ErrNoEndpoints) { + return errcommon.Error{Code: errcommon.ServiceUnavailable, Msg: msg, Headers: map[string]string{errcommon.RequestDroppedReasonHeaderKey: string(errcommon.RequestDroppedReasonNoEndpoints)}} + } + return errcommon.Error{Code: errcommon.ServiceUnavailable, Msg: "no endpoints available: " + msg, Headers: map[string]string{errcommon.RequestDroppedReasonHeaderKey: string(errcommon.RequestDroppedReasonNoEndpoints)}}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/epp/requestcontrol/admission.go` around lines 230 - 232, The `QueueOutcomeRejectedNoEndpoints` branch in the admission error mapping is duplicating the no-endpoints text when `err` is `types.ErrNoEndpoints`. Update this case so it uses `msg` directly for the sentinel, and only prefixes with “no endpoints available:” when the incoming error carries additional detail. Keep the existing `errcommon.Error` construction and `RequestDroppedReasonNoEndpoints` header unchanged.pkg/epp/flowcontrol/framework/plugins/queue/functional_test.go-510-512 (1)
510-512: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winRemove the non-atomic
Len/Peekassertion.Another goroutine can remove the peeked item between Line 510 and Line 511, making
Len() == 0whilepeekedis still non-nil. This makes the concurrency test flaky.Proposed fix
- peeked := q.Peek() - if q.Len() == 0 { - assert.Nil(t, peeked, "Peek on empty queue expected nil") - } + _ = q.Peek()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/epp/flowcontrol/framework/plugins/queue/functional_test.go` around lines 510 - 512, The concurrency test in the queue functional test is relying on a non-atomic Len/Peek check, which can become flaky if another goroutine modifies the queue between the two calls. Update the assertion around q.Peek() and q.Len() in the test to avoid depending on their combined state, and instead verify Peek behavior in a way that does not assume no concurrent removal. Use the q.Peek and q.Len calls in this test block as the location to adjust the check.test/e2e/e2e_suite_test.go-32-33 (1)
32-33: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winDo not hardcode the EPP app label.
eppNameis fixed toe2e-epp, but the deployed pod label comes from${EPP_NAME}. Any suite run that overridesEPP_NAMEwill stop finding the leader pod and fail the HA assertions. Read the same env var here, withe2e-epponly as the default.Proposed fix
- eppName = "e2e-epp" + eppName = env.GetEnvString("EPP_NAME", "e2e-epp", ginkgo.GinkgoLogr)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/e2e_suite_test.go` around lines 32 - 33, The eppName test constant is hardcoded to e2e-epp, which can break leader-pod discovery when EPP_NAME is overridden in the environment. Update the e2e suite setup around eppName to read the same EPP_NAME env var used by the deployment, and keep e2e-epp only as the fallback default. Make the change in the test initialization where eppName is defined so the HA assertions and pod label matching stay aligned with the deployed app label.test/e2e/requests_test.go-292-295 (1)
292-295: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winAssert the malformed requests fail or move the error counters.
These calls are not validated today. The helper only checks metric-family presence, so it still passes if invalid bodies start returning 200 or stop incrementing the error series. Assert
statusCode >= 400here, or compare the pre/post values of the request-error counters before accepting the scrape.Also applies to: 365-367
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/requests_test.go` around lines 292 - 295, The malformed request loop in requests_test.go is only exercising the helper without verifying the outcome, so it can pass even if invalid bodies stop failing or no longer affect error metrics. Update the relevant calls around the chat completions error cases (including the second occurrence noted in the comment) to assert each response has statusCode >= 400, or capture the request-error counter values before and after the loop and verify they increase. Use the existing doPostWithError helper and the surrounding test flow in requests_test.go to anchor the fix.test/e2e/setup_test.go-139-148 (1)
139-148: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick winBound the readiness probe with an HTTP timeout.
A blocked
http.Getcan hang longer thanreadyTimeout, becauseEventuallycannot interrupt a single stuck request. Use a short-lived client or per-request context here so a wedged gateway does not freeze the suite.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/setup_test.go` around lines 139 - 148, The gateway readiness probe in the test setup currently uses an unbounded http.Get, which can block past readyTimeout and stall the suite. Update the readiness check in the Eventually block to use a short-lived HTTP client or a per-request context timeout around the request, then keep the existing status/body readiness logic and cleanup in place so the probe fails fast if the gateway is wedged.test/e2e/utils_test.go-197-205 (1)
197-205: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick winAdd timeouts to the new network helpers.
Both helpers use the default client with no deadline. One stalled Hugging Face request or half-open metrics scrape can block forever and bypass the surrounding retry logic.
Proposed fix
+var e2eHTTPClient = &http.Client{Timeout: 10 * time.Second} + func isModelReal(modelName string) bool { req, err := http.NewRequest("GET", "https://huggingface.co/api/models/"+modelName, nil) if err != nil { return false } if token := os.Getenv("HF_TOKEN"); token != "" { req.Header.Set("Authorization", "Bearer "+token) } - resp, err := http.DefaultClient.Do(req) + resp, err := e2eHTTPClient.Do(req) if err != nil { return false } defer resp.Body.Close() @@ func getMetrics(metricsURL string) []string { var body []byte gomega.Eventually(func() error { - resp, err := http.Get(metricsURL) + resp, err := e2eHTTPClient.Get(metricsURL) if err != nil { return err }Also applies to: 285-298
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/utils_test.go` around lines 197 - 205, The new network helpers are using http.DefaultClient without any deadline, so a stalled Hugging Face request or metrics scrape can hang indefinitely and skip retries. Update the helper(s) that make these calls to use a client with a timeout or per-request context deadline instead of DefaultClient, and apply the same fix to both request paths referenced by the tests. Keep the existing request construction and token header logic intact while ensuring the network call always fails fast.
🧹 Nitpick comments (4)
cmd/epp/runner/runner.go (1)
290-290: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winRemove the inline mode paraphrase.
The comment repeats the condition and is too long for an inline note.
Proposed fix
- startCrdReconcilers := opts.EndpointSelector == nil // If endpointSelector is nil, it means it's not in the standalone mode. Then we should start the inferencePool and other CRD Reconciler. + // CRD reconcilers own pool discovery outside standalone mode. + startCrdReconcilers := opts.EndpointSelector == nilAs per coding guidelines, "Write comments tersely and only when the reason is non-obvious; do not paraphrase the code."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/epp/runner/runner.go` at line 290, The comment on startCrdReconcilers in runner.go is too long and just restates the nil-check, so remove the inline paraphrase and replace it with a terse note only if needed. Keep the explanation focused on the non-obvious reason for starting the CRD reconcilers in non-standalone mode, and do not repeat the condition already expressed in the code.Source: Coding guidelines
pkg/epp/server/options.go (1)
76-76: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winKeep selector parsing facts in one location.
Line 76 and Line 116 both document the
--endpoint-selector/Complete()parsing path. Keep the public field comment canonical and make the internal field comment terse.Proposed fix
- EndpointSelector labels.Selector // Parsed selector to filter model server pods on. Set via --endpoint-selector flag and parsed in Complete(). + EndpointSelector labels.Selector // Selector to filter model server pods. EndpointTargetPorts []int // Target ports of model server pods. DisableEndpointSubsetFilter bool // Disables respecting destination endpoint subset metadata in EPP. @@ - fs *pflag.FlagSet // FlagSet used in AddFlags() and consulted in Validate() - endpointSelectorStr string // Raw string from --endpoint-selector flag, parsed to EndpointSelector in Complete() + fs *pflag.FlagSet // FlagSet used in AddFlags() and consulted in Validate() + endpointSelectorStr string // Raw --endpoint-selector value.As per coding guidelines, "State each fact once in its canonical location; do not duplicate the same information across struct docs, prose, tables, inline comments, and examples."
Also applies to: 115-116
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/epp/server/options.go` at line 76, The `EndpointSelector` docs repeat the same `--endpoint-selector` parsing details in both the public field comment and the internal `Complete()`-related comment; keep the full parsing explanation only on the canonical `EndpointSelector` comment and make the `Complete`/internal field comment terse so each fact is stated once. Update the comments in `options.go` around `EndpointSelector` and `Complete()` to remove duplicated wording while preserving the core meaning.Source: Coding guidelines
pkg/epp/flowcontrol/types/errors.go (1)
49-52: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winCollapse this duplicated rationale into one canonical doc site.
The scale-from-zero and 503-vs-429 explanation is repeated here and again on
QueueOutcomeRejectedNoEndpoints. Keep the sentinel error comment terse and leave the longer behavior note in one place. As per coding guidelines, "Write comments tersely and only when the reason is non-obvious; do not paraphrase the code." and "State each fact once in its canonical location; do not duplicate the same information across struct docs, prose, tables, inline comments, and examples."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/epp/flowcontrol/types/errors.go` around lines 49 - 52, The comment on ErrNoEndpoints duplicates the longer scale-from-zero and 503-vs-429 rationale that already belongs with QueueOutcomeRejectedNoEndpoints. Keep the ErrNoEndpoints sentinel comment brief and non-redundant, and move or retain the detailed behavior explanation only in the canonical QueueOutcomeRejectedNoEndpoints documentation so the same fact is stated once.Source: Coding guidelines
pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/producer.go (1)
144-159: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winTrim the new comment blocks.
These comments restate the implementation in detail. Keep only the invariant so the comments do not become a second copy of the code. As per coding guidelines, "Write comments tersely and only when the reason is non-obvious; do not paraphrase the code."
Also applies to: 637-674
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/producer.go` around lines 144 - 159, Trim the new comment blocks in addedTokensEntry and Clone so they only state the non-obvious invariant instead of restating the implementation. Update the comments near tokenCounter, requestCounter, requests, and the Clone method to be terse, focusing on the shared-counter/independent-clone behavior rather than describing how increment/release works step by step.Source: Coding guidelines
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b39aa87a-a414-4894-b899-95f2e37babcb
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum,!**/*.sum
📒 Files selected for processing (159)
.github/ISSUE_TEMPLATE/new-release.md.github/workflows/check-typos.yaml.github/workflows/ci-build-images.yaml.github/workflows/ci-dependency-review.yaml.github/workflows/ci-lint.yaml.github/workflows/ci-pr-checks.yaml.github/workflows/ci-release.yaml.github/workflows/md-link-check.yml.github/workflows/release-notes-assemble.yaml.github/workflows/release-notes-update.yaml.github/workflows/unstale.yamlAGENTS.mdMakefileRELEASE-NOTES.mdcmd/epp/runner/runner.godeploy/components/inference-gateway/deployment.yamldeploy/components/inference-gateway/rbac.yamldeploy/components/vllm-decode/deployment.yamldeploy/environments/dev/base-kind-istio/patch-deployments.yamldeploy/environments/dev/e-p-d/patch-decode.yamldeploy/environments/dev/e-pd/patch-decode.yamldeploy/environments/dev/p-d/patch-decode.yamldocs/metrics.mdgo.modhack/update-codegen.shpkg/common/error/error.gopkg/epp/config/loader/configloader.gopkg/epp/config/loader/configloader_test.gopkg/epp/config/loader/testdata_test.gopkg/epp/config/loader/validation.gopkg/epp/controller/inferencepool_reconciler_test.gopkg/epp/datalayer/endpoint_pool.gopkg/epp/datalayer/factory.gopkg/epp/datastore/datastore.gopkg/epp/datastore/datastore_test.gopkg/epp/flowcontrol/contracts/mocks/mocks.gopkg/epp/flowcontrol/controller/internal/processor.gopkg/epp/flowcontrol/controller/internal/processor_test.gopkg/epp/flowcontrol/framework/plugins/queue/benchmark_test.gopkg/epp/flowcontrol/framework/plugins/queue/functional_test.gopkg/epp/flowcontrol/framework/plugins/queue/listqueue.gopkg/epp/flowcontrol/framework/plugins/queue/maxminheap.gopkg/epp/flowcontrol/framework/plugins/queue/maxminheap_test.gopkg/epp/flowcontrol/framework/plugins/queue/priorityqueue.gopkg/epp/flowcontrol/framework/plugins/queue/priorityqueue_test.gopkg/epp/flowcontrol/integration_helpers_test.gopkg/epp/flowcontrol/integration_test.gopkg/epp/flowcontrol/registry/config.gopkg/epp/flowcontrol/registry/config_test.gopkg/epp/flowcontrol/registry/managedqueue.gopkg/epp/flowcontrol/registry/managedqueue_test.gopkg/epp/flowcontrol/types/errors.gopkg/epp/flowcontrol/types/outcomes.gopkg/epp/framework/interface/flowcontrol/flow.gopkg/epp/framework/interface/flowcontrol/mocks/mocks.gopkg/epp/framework/interface/flowcontrol/queue.gopkg/epp/framework/interface/requesthandling/types.gopkg/epp/framework/interface/requesthandling/types_test.gopkg/epp/framework/plugins/flowcontrol/fairness/functional_test.gopkg/epp/framework/plugins/flowcontrol/fairness/globalstrict/global_strict.gopkg/epp/framework/plugins/flowcontrol/fairness/globalstrict/global_strict_test.gopkg/epp/framework/plugins/flowcontrol/fairness/program-aware/plugin.gopkg/epp/framework/plugins/flowcontrol/fairness/program-aware/plugin_test.gopkg/epp/framework/plugins/flowcontrol/fairness/program-aware/strategy_las.gopkg/epp/framework/plugins/flowcontrol/fairness/program-aware/strategy_las_test.gopkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/README.mdpkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/producer.gopkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/producer_test.gopkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/token_estimator.gopkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/token_estimator_test.gopkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.gopkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor_test.gopkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/producer_test.gopkg/epp/framework/plugins/requestcontrol/dataproducer/predictedlatency/metrics.gopkg/epp/framework/plugins/requesthandling/parsers/anthropic/anthropic.gopkg/epp/framework/plugins/requesthandling/parsers/anthropic/anthropic_test.gopkg/epp/framework/plugins/requesthandling/parsers/openai/openai.gopkg/epp/framework/plugins/requesthandling/parsers/openai/openai_test.gopkg/epp/framework/plugins/requesthandling/parsers/vllmgrpc/vllmgrpc.gopkg/epp/framework/plugins/requesthandling/parsers/vllmgrpc/vllmgrpc_test.gopkg/epp/framework/plugins/requesthandling/parsers/vllmhttp/vllmhttp.gopkg/epp/framework/plugins/requesthandling/parsers/vllmhttp/vllmhttp_test.gopkg/epp/metrics/metrics.gopkg/epp/metrics/testdata/input_tokens_metricpkg/epp/metrics/testdata/normalized_time_per_output_token_seconds_metricpkg/epp/metrics/testdata/output_tokens_metricpkg/epp/metrics/testdata/request_sizes_metricpkg/epp/metrics/testdata/response_sizes_metricpkg/epp/metrics/testdata/running_requests_metricspkg/epp/requestcontrol/admission.gopkg/epp/requestcontrol/admission_test.gopkg/epp/server/options.gopkg/epp/util/pool/pool.gopkg/sidecar/proxy/options.gopkg/sidecar/proxy/options_test.gorelease-notes.d/unreleased/1030.mdrelease-notes.d/unreleased/1121.mdrelease-notes.d/unreleased/1160.mdrelease-notes.d/unreleased/1206.mdrelease-notes.d/unreleased/1218.mdrelease-notes.d/unreleased/1234.mdrelease-notes.d/unreleased/1244.mdrelease-notes.d/unreleased/1247.mdrelease-notes.d/unreleased/1248.mdrelease-notes.d/unreleased/1276.mdrelease-notes.d/unreleased/1288.mdrelease-notes.d/unreleased/1302.mdrelease-notes.d/unreleased/1372.mdrelease-notes.d/unreleased/1402.mdrelease-notes.d/unreleased/1418.mdrelease-notes.d/unreleased/1426.mdrelease-notes.d/unreleased/1429.mdrelease-notes.d/unreleased/1436.mdrelease-notes.d/unreleased/1444.mdrelease-notes.d/unreleased/1449.mdrelease-notes.d/unreleased/1475.mdrelease-notes.d/unreleased/1488.mdrelease-notes.d/unreleased/1493.mdrelease-notes.d/unreleased/1509.mdrelease-notes.d/unreleased/1513.mdrelease-notes.d/unreleased/1515.mdrelease-notes.d/unreleased/1536.mdrelease-notes.d/unreleased/1539.mdrelease-notes.d/unreleased/1548.mdrelease-notes.d/unreleased/1550.mdrelease-notes.d/unreleased/1554.mdrelease-notes.d/unreleased/1575.mdrelease-notes.d/unreleased/1603.mdrelease-notes.d/unreleased/1607.mdrelease-notes.d/unreleased/1608.mdrelease-notes.d/unreleased/1616.mdrelease-notes.d/unreleased/1626.mdrelease-notes.d/unreleased/1631.mdrelease-notes.d/unreleased/1651.mdrelease-notes.d/unreleased/1653.mdrelease-notes.d/unreleased/1661.mdrelease-notes.d/unreleased/1667.mdrelease-notes.d/unreleased/1690.mdrelease-notes.d/unreleased/1728.mdrelease-notes.d/unreleased/1731.mdrelease-notes.d/unreleased/1739.mdrelease-notes.d/unreleased/1750.mdrelease-notes.d/unreleased/1751.mdscripts/kind-dev-env.shtest/e2e/e2e_suite_test.gotest/e2e/e2e_test.gotest/e2e/epp/utils_test.gotest/e2e/requests_test.gotest/e2e/setup_test.gotest/e2e/utils_test.gotest/integration/epp/grpc_test.gotest/integration/epp/harness.gotest/integration/epp/hermetic_test.gotest/scripts/test-e2e-gaie.shtest/sidecar/config/nixl/qwen-decoder-pod.yamltest/testdata/configloader_1_test.yamltest/utils/igw/utils.gotest/utils/k8s_objects.goversion/version.go
💤 Files with no reviewable changes (48)
- release-notes.d/unreleased/1436.md
- release-notes.d/unreleased/1513.md
- release-notes.d/unreleased/1536.md
- release-notes.d/unreleased/1206.md
- release-notes.d/unreleased/1302.md
- release-notes.d/unreleased/1653.md
- release-notes.d/unreleased/1509.md
- release-notes.d/unreleased/1539.md
- release-notes.d/unreleased/1418.md
- release-notes.d/unreleased/1449.md
- release-notes.d/unreleased/1550.md
- release-notes.d/unreleased/1575.md
- release-notes.d/unreleased/1429.md
- release-notes.d/unreleased/1244.md
- release-notes.d/unreleased/1444.md
- release-notes.d/unreleased/1218.md
- release-notes.d/unreleased/1426.md
- release-notes.d/unreleased/1661.md
- release-notes.d/unreleased/1276.md
- release-notes.d/unreleased/1603.md
- release-notes.d/unreleased/1234.md
- release-notes.d/unreleased/1372.md
- release-notes.d/unreleased/1288.md
- release-notes.d/unreleased/1493.md
- release-notes.d/unreleased/1515.md
- deploy/environments/dev/p-d/patch-decode.yaml
- release-notes.d/unreleased/1488.md
- deploy/environments/dev/e-pd/patch-decode.yaml
- deploy/environments/dev/e-p-d/patch-decode.yaml
- release-notes.d/unreleased/1475.md
- deploy/components/vllm-decode/deployment.yaml
- release-notes.d/unreleased/1631.md
- release-notes.d/unreleased/1121.md
- release-notes.d/unreleased/1626.md
- release-notes.d/unreleased/1608.md
- release-notes.d/unreleased/1247.md
- release-notes.d/unreleased/1030.md
- release-notes.d/unreleased/1160.md
- release-notes.d/unreleased/1548.md
- release-notes.d/unreleased/1554.md
- release-notes.d/unreleased/1607.md
- release-notes.d/unreleased/1248.md
- test/sidecar/config/nixl/qwen-decoder-pod.yaml
- pkg/epp/flowcontrol/framework/plugins/queue/maxminheap.go
- release-notes.d/unreleased/1651.md
- release-notes.d/unreleased/1402.md
- pkg/epp/datalayer/factory.go
- pkg/epp/flowcontrol/framework/plugins/queue/maxminheap_test.go
| - name: Checkout code | ||
| uses: actions/checkout@v6 | ||
| uses: actions/checkout@v7 |
There was a problem hiding this comment.
🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win
Pin actions/checkout by full SHA and set persist-credentials: false. actions/checkout@v7 is a movable ref (CWE-829; same tag-move risk class as CVE-2025-30066), and the default checkout behavior leaves ${{ github.token }} in .git/config (CWE-200).
🧰 Tools
🪛 zizmor (1.26.1)
[warning] 19-20: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
(artipacked)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/check-typos.yaml around lines 19 - 20, The checkout step
uses a movable actions ref and leaves the token persisted, so update the
workflow’s Checkout code step to use a full SHA for actions/checkout instead of
the version tag, and set persist-credentials to false. Make the change in the
workflow job that runs the checkout so the action reference is pinned and
credentials are not stored in .git/config.
| - name: Checkout source | ||
| uses: actions/checkout@v6 | ||
| uses: actions/checkout@v7 |
There was a problem hiding this comment.
🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file=".github/workflows/ci-build-images.yaml"
echo "== file size =="
wc -l "$file"
echo
echo "== relevant lines =="
nl -ba "$file" | sed -n '1,180p'
echo
echo "== checkout occurrences =="
rg -n "actions/checkout|persist-credentials|uses: \./|uses: .*\.github" "$file"Repository: opendatahub-io/llm-d-router
Length of output: 287
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file=".github/workflows/ci-build-images.yaml"
echo "== lines 1-140 with numbers =="
awk '{printf "%4d %s\n", NR, $0}' "$file" | sed -n '1,140p'
echo
echo "== direct matches =="
grep -nE 'actions/checkout|persist-credentials|uses: \./|uses: .*\.github' "$file" || trueRepository: opendatahub-io/llm-d-router
Length of output: 5290
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "persist-credentials|actions/checkout@" .github/workflows .github/actions || trueRepository: opendatahub-io/llm-d-router
Length of output: 1574
🌐 Web query:
actions/checkout persist-credentials default true documentation
💡 Result:
The persist-credentials input for the actions/checkout GitHub Action has a default value of true [1][2][3]. When set to true (the default), the action configures the authentication token or SSH key used for the checkout into the local git configuration [1][3]. This allows subsequent git commands executed within your workflow scripts to run authenticated [1][4]. If you wish to opt out of this behavior—for example, for security reasons if you do not want the credentials to persist in the git configuration—you must explicitly set persist-credentials to false in your workflow file [1][5]. Regardless of this setting, the action ensures that credentials are removed during the post-job cleanup process [1][4].
Citations:
- 1: https://github.com/actions/checkout/?tab=readme-ov-file
- 2: https://github.com/actions/checkout/blob/master/action.yml
- 3: https://github.com/actions/checkout/blob/v4/action.yml
- 4: https://github.com/actions/checkout/blob/main/README.md
- 5: https://github.com/actions/checkout/blob/v4/README.md
Pin the three checkout steps by SHA and disable persisted credentials.
actions/checkout@v7 is a movable tag in all three jobs, so a retargeted tag can inject code into this workflow (CWE-829). persist-credentials defaults to true, leaving the token in .git/config for later steps; set it to false (CWE-200).
🧰 Tools
🪛 zizmor (1.26.1)
[warning] 32-33: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
(artipacked)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/ci-build-images.yaml around lines 32 - 33, Update each
Checkout source step in the workflow to use a fixed commit SHA instead of the
movable actions/checkout@v7 tag, and set persist-credentials to false on all
three checkout steps. Make the change in every job that uses the checkout action
so the workflow consistently pins the action and does not leave the token in
.git/config.
Sources: Path instructions, Linters/SAST tools
| - name: Checkout source | ||
| uses: actions/checkout@v6 | ||
| uses: actions/checkout@v7 |
There was a problem hiding this comment.
🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
sed -n '1,120p' .github/workflows/ci-dependency-review.yamlRepository: opendatahub-io/llm-d-router
Length of output: 655
Pin checkout and disable persisted credentials
actions/checkout@v7 is a mutable tag (CWE-829). Its default auth setup also writes ${{ github.token }} into the local git config (CWE-200). Pin it by full SHA and set persist-credentials: false here.
🧰 Tools
🪛 zizmor (1.26.1)
[warning] 15-16: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
(artipacked)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/ci-dependency-review.yaml around lines 15 - 16, The
checkout step is using a mutable action tag and leaving persisted credentials
enabled. Update the Checkout source step in the workflow to use a full
SHA-pinned reference for actions/checkout, and set persist-credentials to false
in that same step so the default token is not written into local git config.
Sources: Path instructions, Linters/SAST tools
| - name: Checkout source | ||
| uses: actions/checkout@v6 | ||
| uses: actions/checkout@v7 |
There was a problem hiding this comment.
🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the workflow file with line numbers around the referenced lines.
sed -n '1,120p' .github/workflows/ci-lint.yaml | cat -n
# Locate all checkout usages in this workflow.
rg -n 'uses:\s*actions/checkout@' .github/workflows/ci-lint.yamlRepository: opendatahub-io/llm-d-router
Length of output: 4021
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# List all action references in the workflow.
rg -n 'uses:\s*' .github/workflows/ci-lint.yaml
# Show surrounding lines for every action reference.
sed -n '30,110p' .github/workflows/ci-lint.yaml | cat -nRepository: opendatahub-io/llm-d-router
Length of output: 3177
Pin every workflow action by full SHA. actions/checkout@v7 (both steps), dorny/paths-filter@v4, actions/setup-go@v6, and actions/cache@v5 are mutable tags and widen the CI supply-chain attack surface (CWE-829 / CVE-2025-30066 class). Replace each uses: reference with a vetted commit SHA.
🧰 Tools
🪛 zizmor (1.26.1)
[warning] 36-37: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
(artipacked)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/ci-lint.yaml around lines 36 - 37, The workflow currently
uses mutable action tags, so update every `uses:` reference in this CI config to
a vetted full commit SHA. Replace the `actions/checkout`, `dorny/paths-filter`,
`actions/setup-go`, and `actions/cache` entries with pinned SHAs, including both
checkout steps, and keep the existing step names intact so the workflow behavior
remains the same.
Source: Path instructions
| steps: | ||
| - name: Checkout source | ||
| uses: actions/checkout@v6 | ||
| uses: actions/checkout@v7 |
There was a problem hiding this comment.
🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win
Pin every actions/checkout reference to a full SHA.
actions/checkout@v7 is a mutable tag (CWE-829). Replace each occurrence in this workflow (28, 56, 73, 231, 281, 327, 402) with the vetted commit SHA; otherwise a retargeted release can run arbitrary code in CI.
🧰 Tools
🪛 zizmor (1.26.1)
[warning] 27-28: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
(artipacked)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/ci-pr-checks.yaml at line 28, The workflow is using
mutable actions/checkout tags, which should be replaced with pinned commit SHAs.
Update every actions/checkout reference in ci-pr-checks.yaml, including the
checkout steps used in the workflow jobs, to the vetted full SHA instead of v7.
Keep the change consistent across all occurrences so the workflow always runs
the exact approved checkout revision.
| - name: Checkout code | ||
| uses: actions/checkout@v6 | ||
| uses: actions/checkout@v7 |
There was a problem hiding this comment.
🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
sed -n '1,220p' .github/workflows/md-link-check.ymlRepository: opendatahub-io/llm-d-router
Length of output: 1532
Critical: pin every uses: reference by full SHA (CWE-829, CVE-2025-30066 class).
actions/checkout@v7 and lycheeverse/lychee-action@v2.8.0 are mutable tags. Replace both with vetted commit SHAs.
🧰 Tools
🪛 zizmor (1.26.1)
[warning] 32-33: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
(artipacked)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/md-link-check.yml around lines 32 - 33, Both action
references are using mutable tags, so update the workflow to pin each `uses:`
entry to a vetted full commit SHA instead of version tags. Locate the
`actions/checkout` step and the `lycheeverse/lychee-action` step in the
workflow, and replace `@v7` / `@v2.8.0` with their approved SHAs.
Source: Path instructions
| # into main between the tag push and this workflow's run is not | ||
| # accidentally rolled into the wrong release. | ||
| uses: actions/checkout@v6 | ||
| uses: actions/checkout@v7 |
There was a problem hiding this comment.
🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
python3 - <<'PY'
from pathlib import Path
p = Path('.github/workflows/release-notes-assemble.yaml')
text = p.read_text()
for needle in ['actions/checkout@v7', 'origin/main', 'RELEASE-NOTES.md', 'rerun', 'overwrite']:
print(f'## {needle}')
for i, line in enumerate(text.splitlines(), 1):
if needle in line:
print(f'{i}: {line}')
PYRepository: opendatahub-io/llm-d-router
Length of output: 753
🏁 Script executed:
sed -n '70,130p' .github/workflows/release-notes-assemble.yaml | nl -baRepository: opendatahub-io/llm-d-router
Length of output: 205
🏁 Script executed:
awk 'NR>=70 && NR<=130 { printf "%4d %s\n", NR, $0 }' .github/workflows/release-notes-assemble.yamlRepository: opendatahub-io/llm-d-router
Length of output: 2756
Pin actions/checkout and make the release-notes write idempotent.
.github/workflows/release-notes-assemble.yaml:36usesactions/checkout@v7instead of a full SHA (CWE-829; CVE-2025-30066 class).- The job prepends a fresh
RELEASE-NOTES.mdsection on every run, so rerunning after the release notes already landed onmainwill duplicate the same tag block.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/release-notes-assemble.yaml at line 36, The release-notes
workflow currently uses an unpinned actions/checkout reference and blindly
prepends a new RELEASE-NOTES.md section each run. Update the checkout step in
the release-notes assembly workflow to a full commit SHA, and make the write
logic in the release-notes generation step idempotent by checking for an
existing tag section before adding it so reruns do not duplicate the same block.
Source: Path instructions
| - name: Checkout main | ||
| uses: actions/checkout@v6 | ||
| uses: actions/checkout@v7 |
There was a problem hiding this comment.
🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file=".github/workflows/release-notes-update.yaml"
wc -l "$file"
echo "----"
cat -n "$file" | sed -n '1,220p'Repository: opendatahub-io/llm-d-router
Length of output: 4522
Critical: pin the GitHub Actions here by full SHA (CWE-829, CVE-2025-30066 class). This pull_request_target workflow runs with a GitHub App token and later force-pushes a branch, so actions/create-github-app-token@v3 and actions/checkout@v7 are both supply-chain entry points.
🧰 Tools
🪛 zizmor (1.26.1)
[warning] 32-36: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
(artipacked)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/release-notes-update.yaml around lines 32 - 33, The
workflow uses unpinned third-party GitHub Actions, which leaves the
pull_request_target job exposed to supply-chain risk. Update the actions
referenced in the release-notes update workflow, especially
actions/create-github-app-token and actions/checkout, to full commit SHAs
instead of version tags, and keep the existing job logic intact while changing
only the action references.
Source: Path instructions
| apiType := determineAPITypeFromPath(request.GetRequestPath(headers)) | ||
| extractedBody, err := extractRequestBody(apiType, body) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| extractedBody.Payload = fwkrh.PayloadMap(bodyMap) | ||
| extractedBody.MaxOutputTokens = maxOutputTokensForAPI(apiType, bodyMap) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find the exact definition locations if the first pass did not surface them.
rg -n "type PayloadMap|MaxOutputTokensFromPayload|maxOutputTokensForAPI" pkg -SRepository: opendatahub-io/llm-d-router
Length of output: 1952
Pass fwkrh.PayloadMap through the max-token path
MaxOutputTokensFromPayload takes fwkrh.PayloadMap, but maxOutputTokensForAPI still accepts map[string]any and forwards it unchanged, which breaks the build on every branch here. Convert once and pass the typed payload through the helper. CWE-704.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/epp/framework/plugins/requesthandling/parsers/openai/openai.go` around
lines 115 - 121, The max-token flow is still using an untyped map, which causes
the build break in the OpenAI parser path. Update the request handling path in
openai.go so `maxOutputTokensForAPI` accepts and uses `fwkrh.PayloadMap` instead
of `map[string]any`, and pass the typed payload through from the
`extractRequestBody` call site where `extractedBody.Payload` and
`extractedBody.MaxOutputTokens` are set. Make the conversion once and keep the
payload typed all the way through the helper chain, including any
`MaxOutputTokensFromPayload` usage.
| // max_tokens lives under sampling_params in the generate wire format. | ||
| if sp, ok := bodyMap["sampling_params"].(map[string]any); ok { | ||
| body.MaxOutputTokens = fwkrh.MaxOutputTokensFromPayload(sp, "max_tokens") | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,220p' pkg/epp/framework/plugins/requesthandling/parsers/vllmhttp/vllmhttp.go
printf '\n--- helper search ---\n'
rg -n "type PayloadMap|MaxOutputTokensFromPayload|sampling_params" pkg/epp/framework -S
printf '\n--- type definition context ---\n'
file=$(rg -l "type PayloadMap" pkg/epp/framework -S | head -n 1)
if [ -n "${file:-}" ]; then
sed -n '1,220p' "$file"
fiRepository: opendatahub-io/llm-d-router
Length of output: 16611
Convert sampling_params to fwkrh.PayloadMap first
Line 127 passes map[string]any into MaxOutputTokensFromPayload, which expects fwkrh.PayloadMap (CWE-704). This is a compile-time type mismatch. Cast the nested map before calling the helper.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/epp/framework/plugins/requesthandling/parsers/vllmhttp/vllmhttp.go`
around lines 125 - 128, The sampling_params handling in
vllmhttp.ParseGenerateRequest passes a plain map[string]any into
MaxOutputTokensFromPayload, but that helper expects fwkrh.PayloadMap. Update the
nested sampling_params extraction in vllmhttp.go to cast the map to
fwkrh.PayloadMap before calling fwkrh.MaxOutputTokensFromPayload, keeping the
existing body.MaxOutputTokens assignment unchanged.
Syncs llm-d/llm-d-router main into opendatahub-io/llm-d-router main.
Upstream commit: llm-d@1fa3803
Summary by CodeRabbit
New Features
Bug Fixes