Skip to content

[sync] upstream llm-d/llm-d-router 1fa3803d [2026-06-28]#262

Merged
zdtsw merged 33 commits into
opendatahub-io:mainfrom
zdtsw-forking:sync/upstream-1fa3803d
Jun 30, 2026
Merged

[sync] upstream llm-d/llm-d-router 1fa3803d [2026-06-28]#262
zdtsw merged 33 commits into
opendatahub-io:mainfrom
zdtsw-forking:sync/upstream-1fa3803d

Conversation

@zdtsw

@zdtsw zdtsw commented Jun 28, 2026

Copy link
Copy Markdown
Member

Syncs llm-d/llm-d-router main into opendatahub-io/llm-d-router main.

Upstream commit: llm-d@1fa3803

Summary by CodeRabbit

  • New Features

    • Added support for leader election/HA deployment settings and release artifact uploads in the release workflow.
    • Expanded flow-control with a new priority queue, clearer queue semantics, and support for distinguishing “no endpoints available” rejections.
    • Added request max-output-token handling across supported parsers and token estimation.
  • Bug Fixes

    • Improved metrics naming and release-note generation consistency.
    • Updated CI and docs for newer checkout behavior and refreshed release guidance.

llm-d-router-release-notes Bot and others added 30 commits June 22, 2026 09:41
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
…d#1572)

GitHub Actions runners share public IPs that frequently hit the
HF Hub anonymous API quota, causing 429 errors during vllm-render
model downloads. This accounts for a large fraction of e2e flakes.

Propagate HF_TOKEN from a repo secret into the e2e builder container
so the vllm-render sidecar and the model-existence check authenticate
against HF Hub. Authenticated requests have a substantially higher
rate limit than anonymous ones.

Changes:
- Add HF_TOKEN to the Makefile E2E_ENV_VARS forwarding list
- Read HF_TOKEN from the environment in setup_test.go instead of
  hardcoding an empty string
- Attach a Bearer token header to the isModelReal HF API call
  in utils_test.go
- Pass secrets.HF_TOKEN in both e2e-gaie and e2e-router CI steps

The secret must be created as a GitHub repo secret. If the secret
is not set, behavior is unchanged (empty token, anonymous requests).

Closes llm-d#1527

Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>
…3 updates (llm-d#1715)

Bumps the github-actions group with 3 updates in the / directory: [actions/checkout](https://github.com/actions/checkout), [actions/setup-go](https://github.com/actions/setup-go) and [azure/setup-kubectl](https://github.com/azure/setup-kubectl).


Updates `actions/checkout` from 6 to 7
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v6...v7)

Updates `actions/setup-go` from 5 to 6
- [Release notes](https://github.com/actions/setup-go/releases)
- [Commits](actions/setup-go@v5...v6)

Updates `azure/setup-kubectl` from 4 to 5
- [Release notes](https://github.com/azure/setup-kubectl/releases)
- [Changelog](https://github.com/Azure/setup-kubectl/blob/main/CHANGELOG.md)
- [Commits](Azure/setup-kubectl@v4...v5)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: actions/setup-go
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: azure/setup-kubectl
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the go-dependencies group with 1 update: [github.com/jellydator/ttlcache/v3](https://github.com/jellydator/ttlcache).


Updates `github.com/jellydator/ttlcache/v3` from 3.4.0 to 3.4.1
- [Release notes](https://github.com/jellydator/ttlcache/releases)
- [Commits](jellydator/ttlcache@v3.4.0...v3.4.1)

---
updated-dependencies:
- dependency-name: github.com/jellydator/ttlcache/v3
  dependency-version: 3.4.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#1720)

Bumps golang.org/x/crypto from v0.51.0 to v0.52.0 to fix 8 HIGH
severity CVEs (CVE-2026-39827 through CVE-2026-46598) that caused
the sidecar image Trivy scan to fail in CI.

Signed-off-by: Cong Liu <conliu@google.com>
llm-d#1690)

* refactor: single-ended queue backed by heap

The dispatch SafeQueue was double-ended, but nothing consumed PeekTail
other than the flowQueueAccessor passthrough itself: dispatch needs only
the head, and eviction uses an independent ordering. With PeekTail gone
the queue is single-ended, so the bespoke max-min heap that gave O(1)
access to both ends is no longer justified.

- Remove the unused PeekTail from the SafeQueue contract and all
  implementers, mocks, and tests; rename PeekHead -> Peek now that there
  is a single end.
- Replace the hand-rolled max-min heap (maxminheap.go) with a standard
  library container/heap priority queue (priorityqueue.go), eliminating
  the custom bubble-up/down logic and the latent bug class it carried.
- Drop the side handle map: membership is validated by index identity
  (a *heapItem lives in exactly one queue), preserving the contract's
  Remove error semantics with less state. O(log n) targeted removal is
  retained via per-item index tracking.
- Rename the registered queue MaxMinHeap -> PriorityQueue.

Signed-off-by: Luke Van Drie <lukevandrie@google.com>

* refactor(flowcontrol): apply review feedback on the priority queue

Addresses review comments from @shmuelk on llm-d#1690:

- Correct the new-file license headers to "2026 The llm-d Authors".
- Clarify the PriorityQueue doc: ordering is maintained by an internal
  container/heap (the queue does not itself implement heap.Interface).
- Narrow the Add and Remove critical sections to the shared heap
  mutation. Precondition checks (nil, type assertion) and the atomic
  byteSize update move outside the lock; the isInvalidated and index
  reads stay inside it, since those fields are written under the lock by
  Remove/Cleanup/Drain and reading them lock-free would race.
- Compact Cleanup survivors in place instead of allocating a second
  slice, clearing the vacated tail so removed items are not retained.

Signed-off-by: Luke Van Drie <lukevandrie@google.com>

---------

Signed-off-by: Luke Van Drie <lukevandrie@google.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
…1723)

- remove explicitly config to use 8200 in deployment and test
- keep local dev test with 8001

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* Evaluate inotify limit on the kind node, not the host

The Prometheus config-reloader inotify pre-flight check ran an
unconditional cat of /proc/sys/fs/inotify/max_user_instances on the
host, before the cluster existed. On macOS that proc file is absent
(kind runs inside the container-runtime VM), so the check aborted
env-dev-kind with PROM_ENABLED=true.

The limit that applies to pods is the kind node's kernel: the host
kernel on Linux, the runtime VM kernel on macOS. Move the check to
after the node container is created and read the value from inside it,
which is correct on both platforms. Update the remediation message to
cover raising the limit on the host (Linux) and in the runtime VM
(macOS).

Signed-off-by: roytman <roytman@il.ibm.com>

* fix copilot comments

Signed-off-by: roytman <roytman@il.ibm.com>

---------

Signed-off-by: roytman <roytman@il.ibm.com>
* feat(inflightloadproducer): allow configurable output ratio

Signed-off-by: mohammadkhan <mohammadkhan@digitalocean.com>

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Mohammad Ashar Khan <140482588+asharkhan3101@users.noreply.github.com>

* ci: fix copilot line removal

Signed-off-by: mohammadkhan <mohammadkhan@digitalocean.com>

---------

Signed-off-by: mohammadkhan <mohammadkhan@digitalocean.com>
Signed-off-by: Mohammad Ashar Khan <140482588+asharkhan3101@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
…tive (llm-d#1728)

* Support enabling and disabling feature flags

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Removed deprecated dataLayer feature flag

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated feature flag registration with default value

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated tests WRT new feature flag features

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
…ences (llm-d#1733)

* fix(metrics): reset flow control metrics and correct deprecated references

Add Reset() calls for flowControlDispatchCycleDuration and llmdFlowControlDispatchCycleDuration, which were registered but never reset.

Update help strings of deprecated metrics to point to the actual metric names (e.g. using llm_d_epp_request_size_bytes instead of llm_d_epp_request_sizes).

Signed-off-by: Ishwar <ishwarcm@iitbhilai.ac.in>

* test(metrics): update expectations for deprecated metric references

Signed-off-by: Ishwar <ishwarcm@iitbhilai.ac.in>

---------

Signed-off-by: Ishwar <ishwarcm@iitbhilai.ac.in>
…llm-d#1747)

* test(flowcontrol): add regression test for eager ManagedQueue resolution

Verify that processAllQueuesConcurrently resolves ManagedQueue handles
in Phase 1 (during IterateQueues) rather than deferring to Phase 3
workers. Deferred resolution races with flow GC, which can delete the
flow between collection and lookup.

Uses a phase-aware mock: ManagedQueueFunc succeeds while IterateQueues
is running and fails after it returns, so the test passes only when
resolution happens eagerly inside the callback.

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>

* fix(flowcontrol): resolve ManagedQueue eagerly and fix FlowKey.String for negative priorities

Resolve ManagedQueue handles in Phase 1 of processAllQueuesConcurrently
(inside the IterateQueues callback) instead of deferring to Phase 3
workers. This eliminates a race where flow GC deletes a flow between
collection and lookup, producing spurious error logs.

Fix FlowKey.String() to use strconv.Itoa instead of FormatUint so
negative priorities render correctly in logs.

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>

* fix(flowcontrol): improve variable naming and trim test comment

Rename work to resolvedQueues and t to task for clarity. Condense
test comment to remove pseudocode arrows.

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>

* fix(flowcontrol): log skipped GC'd flows at DEBUG level

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>

* fix(flowcontrol): include error in debug log for skipped queues

Address review feedback: include the error value in the debug log so
ErrPriorityBandNotFound (real inconsistency) is distinguishable from
ErrFlowInstanceNotFound (benign GC race).

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>

---------

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
* release: set LATENCY_PREDICTOR_TAG to v0.8.0-rc.1

Signed-off-by: Cong Liu <conliu@google.com>

* deps(go): bump llm-d-kv-cache to v0.9.0 (llm-d#1691)

Update github.com/llm-d/llm-d-kv-cache from v0.9.0-rc.1 to the
released v0.9.0.

Signed-off-by: Maroon Ayoub <mayoub@redhat.com>
Signed-off-by: Cong Liu <conliu@google.com>

* fix(chart): update latency predictor image path to match subchart nesting (llm-d#1683)

Signed-off-by: weizhoublue <weizhou.lan@daocloud.io>
Signed-off-by: Cong Liu <conliu@google.com>

* consume official latency predictor v0.8 release (llm-d#1729)

Signed-off-by: greg pereira <grpereir@redhat.com>

* ci: add artifacts target and upload-artifacts job for CRDs (llm-d#1713)

Signed-off-by: Cong Liu <conliu@google.com>

* fix: bump golang.org/x/crypto to v0.52.0 to resolve Trivy CVEs (llm-d#1720)

Bumps golang.org/x/crypto from v0.51.0 to v0.52.0 to fix 8 HIGH
severity CVEs (CVE-2026-39827 through CVE-2026-46598) that caused
the sidecar image Trivy scan to fail in CI.

Signed-off-by: Cong Liu <conliu@google.com>

* docs: assemble release notes for v0.9.0

Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>

* revert the latency predictor change

Signed-off-by: Cong Liu <conliu@google.com>

* Update push-chart.sh

Signed-off-by: Cong Liu <conliu@google.com>

---------

Signed-off-by: Cong Liu <conliu@google.com>
Signed-off-by: Maroon Ayoub <mayoub@redhat.com>
Signed-off-by: weizhoublue <weizhou.lan@daocloud.io>
Signed-off-by: greg pereira <grpereir@redhat.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: Cong Liu <conliu@google.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: weizhoublue <45163302+weizhoublue@users.noreply.github.com>
Co-authored-by: Greg Pereira <grpereir@redhat.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
…ap (llm-d#1739)

* fix(inflightload): prevent in-flight counter underflow on endpoint flap

The per-endpoint in-flight request/token counters are a +1/-1 ledger keyed
only by NamespacedName. PreRequest increments via the tracker map; release
(OnEvicted/releaseTokensEarly) re-looked-up the counter by endpoint ID and
decremented whatever instance was currently mapped. When an endpoint flaps
(NotReady -> EventDelete drops the counter -> rejoins under the same
NamespacedName -> new traffic recreates it), an in-flight request's release
landed on the recreated counter instead of the one it incremented, driving it
negative. This surfaced as negative pool saturation (observed ~ -1.4),
amplified when a pod crashed while holding many in-flight requests.

The existing registeredEndpoints pointer-identity guard only drops
out-of-order delete events for an already-replaced pointer; it does not cover
the normal flap sequence, so the underflow persisted.

- Capture a direct pointer to the exact *atomic.Int64 each request increments,
  and decrement that instance on release. A stale release after a flap now
  lands harmlessly on the orphaned counter; the live (recreated) counter only
  ever sees its own request's +1/-1.
- Remove the now-unnecessary addIfPresent/decIfPresent 'is it present' guard.
- Add a decrementClamped CAS helper (mirroring the canonical
  predictedlatency.decrementEndpointCounter primitive) that floors decrements
  at 0 as defense-in-depth, so saturation can never go negative even under a
  future regression.
- Add regression tests for the delete-then-recreate flap and the
  crash-with-high-load drain (both fail without the fix), plus coverage for the
  stale-delete guard. Tests use the realistic AddOrUpdate-then-Delete ordering.

Signed-off-by: Luke Van Drie <lukevandrie@google.com>

* test(inflightload): table-driven flap test covering token counters

Consolidate the per-request flap regression into a single table-driven test and
extend coverage to the token counter, which receives the same instance-capture
fix but was previously only asserted via the request counter.

Cases cover both counters across both release paths:
- request counter via the EndOfStream eviction path (OnEvicted)
- token counter via the EndOfStream eviction path (OnEvicted)
- token counter via the StartOfStream early-release path (releaseTokensEarly)

Each case runs the same flap sequence (join, track A, delete + rejoin, track B,
release A then B) and asserts A's release lands on the orphaned counter, leaving
B's live counter accurate and never negative.

Signed-off-by: Luke Van Drie <lukevandrie@google.com>

---------

Signed-off-by: Luke Van Drie <lukevandrie@google.com>
…#1751)

* fix(flowcontrol): return 503 (not 429) when no endpoints exist

When the candidate pool has zero endpoints (e.g. scaled to zero), the
flow control queue acts as a scale-from-zero waiting room: requests are
buffered until capacity comes online. Previously, a request that
overflowed the queue in that state was finalized as
QueueOutcomeRejectedCapacity and surfaced as HTTP 429, painting a
picture of partial availability when there is none. The mix of 429
(capacity overflow) and 503 (TTL expiry) for the same "no backends"
condition was confusing for upstreams.

Distinguish this case with a new QueueOutcomeRejectedNoEndpoints outcome
that the admission layer maps to 503 Service Unavailable (with an
x-llm-d-request-dropped-reason: rejected-no-endpoints header), matching
upstream EPP behavior. The processor caches pool emptiness from the
dispatch cycle and consults it when rejecting at capacity.

Scale-from-zero buffering is preserved; only the terminal error code
changes, and only when the pool is empty. TTL eviction already returns
503 and is unchanged.

Fixes llm-d#1735

Signed-off-by: Luke Van Drie <lukevandrie@google.com>

* test(flowcontrol): cover empty-pool no-endpoints classification

The capacity-enforcement integration tests block dispatch with a saturated
detector and an empty endpoint pool, then assert QueueOutcomeRejectedCapacity.
With the no-endpoints fix, a queue-capacity rejection on an empty pool now
classifies as QueueOutcomeRejectedNoEndpoints, so these counts broke.

The detector ignores the pool, so giving those tests a single placeholder
endpoint keeps dispatch blocked while restoring the RejectedCapacity
classification they intend to exercise.

Add TestEmptyPoolRejectsAsNoEndpoints to cover the scale-from-zero path
end-to-end: with an empty pool, a full-queue rejection is surfaced as
RejectedNoEndpoints (wrapping ErrNoEndpoints), never RejectedCapacity.

Signed-off-by: Luke Van Drie <lukevandrie@google.com>

---------

Signed-off-by: Luke Van Drie <lukevandrie@google.com>
…r caps (llm-d#1750)

* feat(requesthandling): normalize client max output tokens across parsers

Add InferenceRequestBody.MaxOutputTokens, a single normalized view of the
client-requested output-token cap, populated by each parser from its
API-specific field:

- OpenAI completions: max_tokens
- OpenAI chat completions: max_completion_tokens, falling back to max_tokens
- OpenAI responses: max_output_tokens
- Anthropic messages: max_tokens
- vLLM gRPC/HTTP generate: SamplingParams.max_tokens

A shared MaxOutputTokensFromPayload helper reads the value from the already
-retained payload map with per-API key precedence, treating an explicit 0 as a
binding cap and ignoring absent/negative/non-integral values. Downstream
plugins can now bound behavior by the client's requested output limit.

Signed-off-by: Luke Van Drie <lukevandrie@google.com>

* feat(inflightload): bound output estimate by client and operator caps

The in-flight load producer estimated output tokens as round(input * ratio),
which badly overstates output for high-input/low-output workloads (e.g. long
-context summarization). Bound the estimate two ways:

- Client cap: when the request carries an output limit (now surfaced as
  Body.MaxOutputTokens), use it as an upper bound on the estimate.
- Operator cap: new maxEstimatedOutputTokens config caps the estimate
  regardless of input or client value (validated non-negative).

EstimateOutput now takes the client cap; the producer threads the request
through estimateRequestTokens. Estimate becomes
min(round(input*ratio), clientCap?, operatorCap?).

Signed-off-by: Luke Van Drie <lukevandrie@google.com>

* refactor(requesthandling): polish max-output-tokens handling

Address review feedback on the max-output-tokens change:

- MaxOutputTokensFromPayload now falls back to the next key when a present
  key holds a malformed value (wrong type, negative, non-integral) instead
  of returning nil. A chat request with an invalid max_completion_tokens but
  a valid max_tokens now correctly uses the latter.
- Resolve the OpenAI API type once in ParseRequest and pass it to
  extractRequestBody, removing a duplicate path parse + classification on
  the request hot path.
- Use k8s.io/utils/ptr.To for *int64/*uint32 in tests instead of bespoke
  per-file pointer helpers, matching the rest of the codebase.

Signed-off-by: Luke Van Drie <lukevandrie@google.com>

---------

Signed-off-by: Luke Van Drie <lukevandrie@google.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
…m-d#1699)

* Added the ability to set the number of replicas and to enable leader selection

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added RBAC for leader selection

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Set extra variables for development runs

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Use the same version of kind in both e2e tests

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Refactored functions for more flexibility in tests

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Refactored metrics fetching

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Aliased additional IGW test utilities

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Refcatored EPP creation for leadership tests

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Refactored tests, added encoding and leadership tests

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Fixed lint error

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
…diffs (llm-d#1827)

* fix: base release-notes assembly branch on main to exclude unrelated diffs

The workflow previously created its branch from the tagged commit.
When the tag diverged from main HEAD, the PR diff included unrelated
changes (e.g. hack/push-chart.sh in llm-d#1688).

Split the script into three phases:
1. Read fragments from the tagged commit (preserves the guard against
   post-tag fragments).
2. Fetch and branch from origin/main so the PR only diffs release-notes
   files against main.
3. Apply changes and remove only the fragments that exist on main.

Signed-off-by: Cong Liu <conliu@google.com>

* fix: add comment explaining fragment removal loop

Signed-off-by: Cong Liu <conliu@google.com>

* fix: skip release-notes assembly for RC tags

Signed-off-by: Cong Liu <conliu@google.com>

* fix: add comment for RC tag exclusion

Signed-off-by: Cong Liu <conliu@google.com>

---------

Signed-off-by: Cong Liu <conliu@google.com>
- 0.9.0 has main-dev set in annoataion llm-d.ai/bundle-version

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Add documentation for the seven flow control metrics exposed when the
flowControl feature gate is enabled: queue duration, dispatch cycle
duration, enqueue duration, queue size, queue bytes, pool saturation,
and requests total.

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
Co-authored-by: Luke Van Drie <lukevandrie@google.com>
shichaooutlook and others added 2 commits June 27, 2026 10:44
…m-d#1701)

Replace the string-based --endpoint-selector flag with
Kubernetes-native labels.Selector to support set-based
label expressions (e.g., 'env in (prod,staging),tier!=frontend',
'key,!deprecated') in standalone mode.

- EndpointSelector type: string -> labels.Selector
- Complete(): parse flag string via labels.Parse()
- EndpointPool.Selector: map[string]string -> labels.Selector
- PoolLabelsMatch simplified with native Selector.Matches()
- podResyncAll passes Selector directly to K8s client
- All callers and tests updated for new type

Signed-off-by: shichaooutlook <shichao.outlook@gmail.com>
Co-authored-by: shichaooutlook <268144704+shichaooutlook@users.noreply.github.com>
* fix: clear KV block index entries on endpoint removal

On EventDelete, call Index.Clear with the pod's IP address to
remove stale entries that reference blocks no longer cached on
the removed pod.

Fixes llm-d#1736

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>

* Update pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go

Co-authored-by: Bongwoo Bak <bongwoobak@gmail.com>
Signed-off-by: Rishabh Saini <rishabhsaini01@gmail.com>

* fix(test): match Clear assertion to IP:Port PodIdentifier format

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>

* Update pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go

Co-authored-by: Maroon Ayoub <Maroonay@gmail.com>
Signed-off-by: Rishabh Saini <rishabhsaini01@gmail.com>

* Update pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go

Co-authored-by: Maroon Ayoub <Maroonay@gmail.com>
Signed-off-by: Rishabh Saini <rishabhsaini01@gmail.com>

---------

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
Signed-off-by: Rishabh Saini <rishabhsaini01@gmail.com>
Co-authored-by: Bongwoo Bak <bongwoobak@gmail.com>
Co-authored-by: Maroon Ayoub <Maroonay@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This PR prepares the v0.9.0 release with several independent but coordinated changes: actions/checkout bumped to v7 across all workflows; EndpointPool.Selector changed from map[string]string to labels.Selector; RegisterFeatureGate extended with an isEnabledByDefault bool and name=value parsing; ExperimentalDatalayerFeatureGate removed. Queue interface consolidates PeekHead/PeekTail into Peek; MaxMinHeap replaced by a container/heap-backed PriorityQueue. Flow control adds ErrNoEndpoints/QueueOutcomeRejectedNoEndpoints mapping to HTTP 503. MaxOutputTokens propagated through all parsers and the token estimator. InFlightLoadProducer captures counter pointers at increment time to prevent underflow on endpoint flap. HA leader election added to deploy manifests with RBAC. BundleVersion becomes a var; artifacts Makefile target and a ci-release upload job added. E2E infrastructure gains leader-election failover tests and verifyMetrics. Metrics help strings updated to renamed llm_d_epp_request_* names.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/epp/flowcontrol/contracts/mocks/mocks.go (1)

364-373: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Make MockManagedQueue.Peek() return the actual head item.

This implementation violates the new Peek() contract by returning whichever item Go yields first from m.items. Map iteration order is nondeterministic, so any test that uses this mock to exercise queue ordering can become flaky or validate the wrong winner.

Possible fix
 func (m *MockManagedQueue) Peek() flowcontrol.QueueItemAccessor {
 	m.mu.Lock()
 	defer m.mu.Unlock()
 	m.init()
-	for _, item := range m.items {
-		return item // Return first item found
-	}
-	return nil // Queue is empty
+	var best flowcontrol.QueueItemAccessor
+	policy := m.OrderingPolicy()
+	for _, item := range m.items {
+		if best == nil {
+			best = item
+			continue
+		}
+		if policy != nil && policy.Less(item, best) {
+			best = item
+		}
+	}
+	return best
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/flowcontrol/contracts/mocks/mocks.go` around lines 364 - 373,
`MockManagedQueue.Peek()` is using nondeterministic map iteration instead of
returning the queue head, which can make ordering tests flaky. Update the mock
in `MockManagedQueue.Peek` to select the same item that `Pop`/queue order would
treat as first, using the mock’s existing ordering state rather than ranging
over `m.items`. Keep the behavior aligned with the new `Peek()` contract and
preserve the nil case when the queue is empty.
🟠 Major comments (14)
.github/workflows/ci-release.yaml-55-57 (1)

55-57: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Gate upload-artifacts on build-and-push.

This job only depends on set-params, so it can create or update the GitHub release even when image/chart publication fails. That leaves release state split across systems: GitHub has tagged assets, while the registry publish path failed. Add build-and-push to needs so the release is only mutated after the runtime artifacts succeed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci-release.yaml around lines 55 - 57, The upload-artifacts
job is only gated on set-params, so it can update the GitHub release before the
build-and-push path succeeds. Update the ci-release workflow so upload-artifacts
also depends on build-and-push, ensuring the release mutation only happens after
runtime artifact publication completes successfully.
.github/ISSUE_TEMPLATE/new-release.md-73-74 (1)

73-74: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

These commands never switch to the release branch.

git reset --hard ${REMOTE}/release-${BRANCH_VERSION} rewrites whatever branch the operator currently has checked out. If they are still on main, this resets main instead of checking out release-${BRANCH_VERSION}.

Suggested fix
- git fetch ${REMOTE}
- git reset --hard ${REMOTE}/release-${BRANCH_VERSION}
+ git fetch "${REMOTE}"
+ git checkout -B "release-${BRANCH_VERSION}" "${REMOTE}/release-${BRANCH_VERSION}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/ISSUE_TEMPLATE/new-release.md around lines 73 - 74, The release
setup commands currently reset the checked-out branch instead of switching to
the release branch, so update the workflow in the new-release template to
explicitly check out release-${BRANCH_VERSION} after git fetch ${REMOTE}. Adjust
the command sequence in the template section containing git fetch and git reset
so it creates or switches to the intended release branch rather than mutating
whatever branch is active.
.github/workflows/ci-release.yaml-74-91 (1)

74-91: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Move needs.set-params.outputs.tag into env:. ${{ needs.set-params.outputs.tag }} is interpolated directly into run: in both steps, creating a CWE-94 injection boundary for ref-derived data. Use a quoted shell variable instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci-release.yaml around lines 74 - 91, The workflow step
uses needs.set-params.outputs.tag directly inside the run script, which creates
an injection risk for ref-derived data. Move the tag value into env for both
Build Artifacts and Upload Release Assets, then reference a quoted shell
variable inside the script instead of interpolating it inline. Update the shell
commands around the artifact build and gh release upload/create logic to use
that env-backed variable consistently.
.github/workflows/release-notes-assemble.yaml-92-106 (1)

92-106: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Make the assembly step idempotent before prepending the new section.

This block always rewrites RELEASE-NOTES.md by prepending RELEASE ${TAG}. A manual rerun after the assembly PR has already merged will add the same tag section again instead of cleanly no-oping.

Remediation
           # Phase 3: apply release-notes changes on top of main
+          if grep -q "^RELEASE ${TAG} " RELEASE-NOTES.md 2>/dev/null; then
+            echo "Release notes for ${TAG} already assembled; exiting."
+            exit 0
+          fi
           touch RELEASE-NOTES.md
           cat "$new_section" RELEASE-NOTES.md > RELEASE-NOTES.md.new
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release-notes-assemble.yaml around lines 92 - 106, The
release-notes assembly step in the workflow should be idempotent: before the
`cat "$new_section" RELEASE-NOTES.md > RELEASE-NOTES.md.new` prepend in the
Phase 3 block, check whether `RELEASE-NOTES.md` already contains the `RELEASE
${TAG}` section and skip the rewrite if it does. Update the logic around the
existing `new_section`/`RELEASE-NOTES.md` merge so rerunning the job after the
assembly PR has merged becomes a clean no-op instead of duplicating the same
release section.
.github/workflows/unstale.yaml-31-32 (1)

31-32: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Remove the unused checkout step (CWE-829) at .github/workflows/unstale.yaml:31-32. gh issue edit does not need a repository checkout, so actions/checkout@v7 only adds an avoidable unpinned third-party action to a job with issues: write and pull-requests: write. Delete it; if it must stay, pin it by full SHA.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/unstale.yaml around lines 31 - 32, The unstale workflow
includes an unnecessary repository checkout step, which adds an extra
third-party action without being used by the job. Remove the checkout step from
the workflow job that runs gh issue edit, or if the checkout is still required
for some other reason, update the actions/checkout usage in unstale.yaml to a
full commit SHA. Use the checkout step label and the gh issue edit job context
to locate the change.

Source: Path instructions

pkg/epp/flowcontrol/framework/plugins/queue/priorityqueue.go-159-168 (1)

159-168: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Serialize byteSize with the heap mutation (CWE-362).

Add releases pq.mu before updating pq.byteSize. If Drain or Cleanup acquires the lock between Line 165 and Line 167, the heap can be emptied and byteSize reset to 0, then Add restores a stale byte count on an empty queue. That corrupts max-bytes accounting and can trigger false capacity rejections. Move the byte update into the same critical section as heap.Push.

Proposed fix
 func (pq *priorityQueue) Add(item flowcontrol.QueueItemAccessor) {
 	hi := &heapItem{item: item}
 	item.SetHandle(hi)

 	pq.mu.Lock()
 	heap.Push(pq.heap, hi)
-	pq.mu.Unlock()
-
 	pq.byteSize.Add(item.OriginalRequest().ByteSize())
+	pq.mu.Unlock()
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/flowcontrol/framework/plugins/queue/priorityqueue.go` around lines
159 - 168, The Add method in priorityQueue updates byteSize outside the pq.mu
critical section, which can race with Drain or Cleanup and leave stale
accounting on an empty heap. Move the pq.byteSize.Add call into the same locked
section as heap.Push in priorityQueue.Add, using the existing pq.mu and heap
mutation path so heap state and byte accounting stay serialized.
pkg/epp/flowcontrol/controller/internal/processor.go-289-296 (1)

289-296: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Refresh endpoint emptiness before classifying capacity rejection.

poolEmpty is only updated in dispatchCycle, but Run calls enqueue before dispatchCycle for new items. Startup or endpoint flaps can use stale state here and return the wrong rejection outcome, mapping no-endpoints to capacity or capacity to no-endpoints.

Proposed fix
-			sp.enqueue(item)
+			sp.enqueue(ctx, item)
 			sp.dispatchCycle(ctx) // Process immediately when an item arrives
-func (sp *Processor) enqueue(item *FlowItem) {
+func (sp *Processor) enqueue(ctx context.Context, item *FlowItem) {
 	if ok, stats := sp.hasCapacity(key.Priority, req.ByteSize()); !ok {
+		pool := sp.endpointCandidates.Locate(ctx, nil)
+		sp.poolEmpty = len(pool) == 0
 		// When the pool has no endpoints, the queue is acting as a scale-from-zero waiting room. A capacity rejection in
 		// that state reflects genuine unavailability (surfaced as 503), not backpressure against a contended pool (429).
 		if sp.poolEmpty {

Update direct unit-test calls to pass context.Background().

Also applies to: 363-364

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/flowcontrol/controller/internal/processor.go` around lines 289 - 296,
The rejection path in processor.go is using a stale sp.poolEmpty value inside
enqueue/classification logic, so update the endpoint-emptiness check before
deciding QueueOutcomeRejectedNoEndpoints versus capacity rejection. Refresh the
pool state in the path that handles new items in Run/enqueue, or call the same
state update used by dispatchCycle before the capacity check, so the decision is
based on current endpoint availability. Also update the direct unit-test calls
mentioned in the comment to pass context.Background() where required.
pkg/epp/framework/interface/requesthandling/types.go-114-148 (1)

114-148: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Reject lossy or overflowing token caps before converting to int64. json.Unmarshal gives JSON numbers as float64, so large client caps can round before this helper sees them, and out-of-range values can still hit implementation-dependent float-to-int behavior (CWE-190). Decode with json.Decoder.UseNumber() upstream or bound-check before returning the cap.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/framework/interface/requesthandling/types.go` around lines 114 - 148,
MaxOutputTokensFromPayload currently accepts float64/json.Number values and
converts them to int64 without rejecting lossy or out-of-range caps. Update the
parsing path in MaxOutputTokensFromPayload to preserve numeric precision and
verify bounds before conversion, and ensure callers decode request JSON with
json.Decoder.UseNumber where the payload enters this helper. Keep the existing
key precedence and validation behavior, but reject values that cannot be
represented exactly or safely as int64 before returning the cap.
pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go-55-60 (1)

55-60: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Propagate KV index cleanup failures from endpoint deletes.

If KVBlockIndex().Clear() fails here, Extract still returns nil after removing the subscriber. That leaves stale pod entries in the prefix-cache index with no retry signal, so later scheduling can keep reading deleted endpoint metadata until some unrelated event rebuilds the index.

Proposed fix
 	case fwkdl.EventDelete:
 		p.subscribersManager.RemoveSubscriber(ctx, endpointKey)
 		if meta.Address != "" {
 			if err := p.kvCacheIndexer.KVBlockIndex().Clear(ctx, fmt.Sprintf("%s:%s", meta.Address, meta.Port)); err != nil {
 				logger.Error(err, "Failed to clear index entries for removed endpoint",
 					"endpoint", endpointKey, "address", meta.Address, "port", meta.Port)
+				return fmt.Errorf("clear kv index for %s: %w", endpointKey, err)
 			}
 		}
 		logger.V(logging.DEBUG).Info("Removed KV-events subscriber", "endpoint", endpointKey)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go`
around lines 55 - 60, Propagate the KV index cleanup failure from the
endpoint-delete path in Extract instead of only logging it. In the block that
calls p.kvCacheIndexer.KVBlockIndex().Clear(), return the error (or wrap it with
context) so the caller sees the failure after the subscriber is removed, and
keep the existing logger.Error call for observability. Use the Extract method
and KVBlockIndex().Clear as the key symbols to update.
scripts/kind-dev-env.sh-289-289 (1)

289-289: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Quote CONTAINER_RUNTIME before executing it.

Line 289 runs ${CONTAINER_RUNTIME} unquoted. If that environment variable is tampered with, the bootstrap path can execute unintended shell syntax during cluster setup (CWE-78). Use "${CONTAINER_RUNTIME}" exec ... here.

As per path instructions, "**/*.sh: Quote all variables to prevent injection".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/kind-dev-env.sh` at line 289, The cluster bootstrap command in the
inotify check is using CONTAINER_RUNTIME without quotes, which can allow shell
injection if the variable is manipulated. Update the command in the script’s
inotify_limit check to use the quoted CONTAINER_RUNTIME invocation style,
keeping the exec call to CONTAINER_NAME and the
/proc/sys/fs/inotify/max_user_instances read intact.

Source: Path instructions

scripts/kind-dev-env.sh-409-412 (1)

409-412: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Do not overwrite caller-provided HA settings.

These exports force every invocation back to EPP_REPLICA_COUNT=1 and ENABLE_LEADER_ELECTION=false, which disables the HA path this cohort is trying to exercise. Default them only when unset.

Proposed fix
-export EPP_REPLICA_COUNT=1
+export EPP_REPLICA_COUNT="${EPP_REPLICA_COUNT:-1}"
-# Some end to end tests enable leader election
-export ENABLE_LEADER_ELECTION=false
+# Some end to end tests enable leader election
+export ENABLE_LEADER_ELECTION="${ENABLE_LEADER_ELECTION:-false}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/kind-dev-env.sh` around lines 409 - 412, The HA defaults in
kind-dev-env.sh are being forced on every run, which overwrites caller-provided
settings. Update the setup around the EPP_REPLICA_COUNT and
ENABLE_LEADER_ELECTION exports so they only apply when those variables are
unset, preserving any values passed in by the caller while still providing
defaults for local runs.
deploy/components/inference-gateway/rbac.yaml-71-73 (1)

71-73: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Scope the lease RBAC to least privilege.

Lines 71-73 grant the EPP ServiceAccount write access to every Lease in the namespace, including delete. If that identity is abused, an attacker can seize or remove other controllers' locks and force failover or denial of service (CWE-732). Restrict the rule to the single leader-election lease name and drop delete unless the client library proves it needs it.

Remediation sketch
 - apiGroups: [ "coordination.k8s.io" ]
   resources: [ "leases" ]
-  verbs: [ "get", "list", "watch", "create", "update", "patch", "delete" ]
+  verbs: [ "get", "create", "update", "patch" ]
+  # If the lock name is fixed, also scope this rule with resourceNames.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/components/inference-gateway/rbac.yaml` around lines 71 - 73, The RBAC
rule for lease access in the inference-gateway config is too broad; update the
Lease permissions on the relevant Role/RoleBinding entry to follow least
privilege. In the RBAC manifest, scope the leases rule used by the EPP
ServiceAccount to the single leader-election Lease name via resourceNames, and
remove delete unless the leader-election client in this codebase explicitly
requires it. Use the existing lease rule block in rbac.yaml to locate and
tighten the permissions.
test/scripts/test-e2e-gaie.sh-28-28 (1)

28-28: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Pin KIND_NODE_IMAGE by digest.
kindest/node:v1.31.12 is a mutable reference (CWE-829); a retagged image can swap the Kubernetes node runtime for every E2E cluster. Use kindest/node@sha256:... instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/scripts/test-e2e-gaie.sh` at line 28, The KIND node image reference in
the E2E script is mutable and should be pinned to an immutable digest. Update
the KIND_NODE_IMAGE default in test-e2e-gaie.sh to use the kindest/node image by
sha256 digest instead of the v1.31.12 tag, so the cluster setup in the script
always pulls the exact intended node image.
test/e2e/e2e_test.go-69-149 (1)

69-149: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Register cleanup before the first assertion.

These specs only clean up at the tail. Any failed Expect or timeout before Lines 71, 87, 106, or 146 leaves EPP or model-server objects behind, and the next ordered spec can see the wrong pod count or stale metrics. Use ginkgo.DeferCleanup immediately after each create call.

Remediation sketch
 modelServers := createModelServersDecode(1)
+ginkgo.DeferCleanup(testutils.DeleteObjects, testConfig, modelServers)

 epp := createEndPointPickerHelper(simpleConfig, numOfPods, true, false)
+ginkgo.DeferCleanup(testutils.DeleteObjects, testConfig, epp)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/e2e_test.go` around lines 69 - 149, Register cleanup immediately
after each resource creation in these Ginkgo specs so failures before the tail
cleanup do not leak objects into later ordered tests. In the affected test
cases, add ginkgo.DeferCleanup right after createInferencePool,
createModelServersDecode, createEndPointPicker, and createEndPointPickerHelper,
and remove reliance on the final testutils.DeleteObjects calls for cleanup. This
keeps the generated resources tied to the lifecycle of each spec and avoids
stale pod counts or metrics between runs.
🟡 Minor comments (11)
docs/metrics.md-49-55 (1)

49-55: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Add RejectedNoEndpoints to the outcome label contract.

The new empty-pool 503 path introduces a terminal outcome that is missing from this metric definition. Consumers will miss a real bucket unless the docs match the implemented label set.

Suggested edit
-    *   `outcome`: string (`Dispatched`, `RejectedCapacity`, `RejectedOther`, `EvictedTTL`, `EvictedContextCancelled`, `EvictedOther`)
+    *   `outcome`: string (`Dispatched`, `RejectedCapacity`, `RejectedNoEndpoints`, `RejectedOther`, `EvictedTTL`, `EvictedContextCancelled`, `EvictedOther`)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/metrics.md` around lines 49 - 55, The
`flow_control_request_queue_duration_seconds` metric label contract is missing
the new terminal outcome introduced by the empty-pool 503 path. Update the
`outcome` label list in `docs/metrics.md` to include `RejectedNoEndpoints`
alongside the existing values so the documented contract matches the implemented
labels.
AGENTS.md-33-33 (1)

33-33: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Fix the typo in the PR checklist.

passs should be passes.

Remediation
-- Verify the code passs `make presubmit` locally before submitting a PR.
+- Verify the code passes `make presubmit` locally before submitting a PR.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@AGENTS.md` at line 33, The PR checklist text in AGENTS.md has a typo in the
presubmit verification item; update the checklist entry so it says the code
passes make presubmit locally before submitting a PR, correcting the misspelling
in that sentence.

Source: Linters/SAST tools

pkg/epp/config/loader/configloader.go-316-324 (1)

316-324: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reject malformed feature-gate overrides. CWE-20: strings.Split accepts name=true=false and drops the trailing segment, so a malformed override is treated as valid input. Use strings.Cut or SplitN(..., 2) and reject any value containing a second =.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/config/loader/configloader.go` around lines 316 - 324, The
feature-gate parsing in config loader is too permissive because the current
string split accepts malformed overrides like name=true=false and silently
truncates them. Update the parsing in configloader.go to use a two-part split
via strings.Cut or strings.SplitN(..., 2) in the gate handling logic, and
explicitly reject any override whose value still contains another = so malformed
input returns an error instead of being accepted.
pkg/epp/config/loader/configloader_test.go-79-80 (1)

79-80: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Do not mutate the global feature-gate registry from a parallel test.

The writes on Lines 79-80 make the existing t.Parallel() on Line 76 unsafe. This file already keeps TestInstantiateAndConfigure serial for another global registry, so TestLoadRawConfiguration can become order-dependent as more loader tests are added. Make this test serial or snapshot and restore the feature-gate registry.

Proposed fix
 func TestLoadRawConfiguration(t *testing.T) {
-	t.Parallel()
-
 	// Register known feature gates for validation.
 	RegisterFeatureGate(testFeatureGate, true)
 	RegisterFeatureGate(flowcontrol.FeatureGate, false)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/config/loader/configloader_test.go` around lines 79 - 80,
TestLoadRawConfiguration is mutating the global feature-gate registry via
RegisterFeatureGate while also running with t.Parallel, which makes the test
order-dependent. Make this test serial by removing t.Parallel, or snapshot and
restore the feature-gate registry around the RegisterFeatureGate calls so
parallel execution does not leak global state. Use the existing
TestLoadRawConfiguration setup and its RegisterFeatureGate calls as the place to
contain this mutation.
pkg/epp/config/loader/validation.go-125-133 (1)

125-133: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reject malformed name=value feature-gate entries with extra =.

Line 125 uses strings.Split, so flowControl=true=garbage passes validation because only parts[1] is parsed as a bool. That lets malformed config through this phase and makes downstream feature-gate handling ambiguous. Use strings.Cut or explicitly reject more than one = and add a regression case next to errorBadFeatureGateText.

Proposed fix
-		parts := strings.Split(gate, "=")
-		if _, ok := registeredFeatureGates[parts[0]]; !ok {
-			return fmt.Errorf("feature gate '%s' is unknown or unregistered", parts[0])
+		name, value, hasValue := strings.Cut(gate, "=")
+		name = strings.TrimSpace(name)
+		if _, ok := registeredFeatureGates[name]; !ok {
+			return fmt.Errorf("feature gate '%s' is unknown or unregistered", name)
 		}
-		if len(parts) > 1 {
-			_, err := strconv.ParseBool(strings.ToLower(strings.TrimSpace(parts[1])))
+		if hasValue {
+			value = strings.TrimSpace(value)
+			if strings.Contains(value, "=") {
+				return fmt.Errorf("feature gate '%s' has an invalid value %q", name, value)
+			}
+			_, err := strconv.ParseBool(value)
 			if err != nil {
-				return fmt.Errorf("%s is not a valid value for the feature gate %s (error: %w)", parts[1], parts[0], err)
+				return fmt.Errorf("%s is not a valid value for the feature gate %s (error: %w)", value, name, err)
 			}
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/config/loader/validation.go` around lines 125 - 133, The feature-gate
validation in validation.go accepts malformed name=value entries because
strings.Split allows extra "=" segments, so fix the parsing in the feature-gate
loop to reject any gate with more than one "=" or switch to strings.Cut and fail
when a second separator is present. Update the validation logic around the
registeredFeatureGates check and strconv.ParseBool handling so only exactly one
optional value is allowed, and add a regression test alongside
errorBadFeatureGateText for inputs like flowControl=true=garbage.
pkg/epp/requestcontrol/admission.go-230-232 (1)

230-232: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Avoid duplicating the no-endpoints message.

If err is types.ErrNoEndpoints, Line 232 returns no endpoints available: no endpoints available. Reuse msg as-is for that sentinel, or only prepend when err carries extra detail.

Suggested fix
 	case types.QueueOutcomeRejectedNoEndpoints:
 		// No serving capacity exists (e.g. pool scaled to zero): signal genuine unavailability rather than backpressure.
-		return errcommon.Error{Code: errcommon.ServiceUnavailable, Msg: "no endpoints available: " + msg, Headers: map[string]string{errcommon.RequestDroppedReasonHeaderKey: string(errcommon.RequestDroppedReasonNoEndpoints)}}
+		if errors.Is(err, types.ErrNoEndpoints) {
+			return errcommon.Error{Code: errcommon.ServiceUnavailable, Msg: msg, Headers: map[string]string{errcommon.RequestDroppedReasonHeaderKey: string(errcommon.RequestDroppedReasonNoEndpoints)}}
+		}
+		return errcommon.Error{Code: errcommon.ServiceUnavailable, Msg: "no endpoints available: " + msg, Headers: map[string]string{errcommon.RequestDroppedReasonHeaderKey: string(errcommon.RequestDroppedReasonNoEndpoints)}}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/requestcontrol/admission.go` around lines 230 - 232, The
`QueueOutcomeRejectedNoEndpoints` branch in the admission error mapping is
duplicating the no-endpoints text when `err` is `types.ErrNoEndpoints`. Update
this case so it uses `msg` directly for the sentinel, and only prefixes with “no
endpoints available:” when the incoming error carries additional detail. Keep
the existing `errcommon.Error` construction and
`RequestDroppedReasonNoEndpoints` header unchanged.
pkg/epp/flowcontrol/framework/plugins/queue/functional_test.go-510-512 (1)

510-512: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Remove the non-atomic Len/Peek assertion.

Another goroutine can remove the peeked item between Line 510 and Line 511, making Len() == 0 while peeked is still non-nil. This makes the concurrency test flaky.

Proposed fix
-								peeked := q.Peek()
-								if q.Len() == 0 {
-									assert.Nil(t, peeked, "Peek on empty queue expected nil")
-								}
+								_ = q.Peek()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/flowcontrol/framework/plugins/queue/functional_test.go` around lines
510 - 512, The concurrency test in the queue functional test is relying on a
non-atomic Len/Peek check, which can become flaky if another goroutine modifies
the queue between the two calls. Update the assertion around q.Peek() and
q.Len() in the test to avoid depending on their combined state, and instead
verify Peek behavior in a way that does not assume no concurrent removal. Use
the q.Peek and q.Len calls in this test block as the location to adjust the
check.
test/e2e/e2e_suite_test.go-32-33 (1)

32-33: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Do not hardcode the EPP app label.

eppName is fixed to e2e-epp, but the deployed pod label comes from ${EPP_NAME}. Any suite run that overrides EPP_NAME will stop finding the leader pod and fail the HA assertions. Read the same env var here, with e2e-epp only as the default.

Proposed fix
-	eppName = "e2e-epp"
+	eppName = env.GetEnvString("EPP_NAME", "e2e-epp", ginkgo.GinkgoLogr)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/e2e_suite_test.go` around lines 32 - 33, The eppName test constant
is hardcoded to e2e-epp, which can break leader-pod discovery when EPP_NAME is
overridden in the environment. Update the e2e suite setup around eppName to read
the same EPP_NAME env var used by the deployment, and keep e2e-epp only as the
fallback default. Make the change in the test initialization where eppName is
defined so the HA assertions and pod label matching stay aligned with the
deployed app label.
test/e2e/requests_test.go-292-295 (1)

292-295: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Assert the malformed requests fail or move the error counters.

These calls are not validated today. The helper only checks metric-family presence, so it still passes if invalid bodies start returning 200 or stop incrementing the error series. Assert statusCode >= 400 here, or compare the pre/post values of the request-error counters before accepting the scrape.

Also applies to: 365-367

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/requests_test.go` around lines 292 - 295, The malformed request loop
in requests_test.go is only exercising the helper without verifying the outcome,
so it can pass even if invalid bodies stop failing or no longer affect error
metrics. Update the relevant calls around the chat completions error cases
(including the second occurrence noted in the comment) to assert each response
has statusCode >= 400, or capture the request-error counter values before and
after the loop and verify they increase. Use the existing doPostWithError helper
and the surrounding test flow in requests_test.go to anchor the fix.
test/e2e/setup_test.go-139-148 (1)

139-148: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Bound the readiness probe with an HTTP timeout.

A blocked http.Get can hang longer than readyTimeout, because Eventually cannot interrupt a single stuck request. Use a short-lived client or per-request context here so a wedged gateway does not freeze the suite.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/setup_test.go` around lines 139 - 148, The gateway readiness probe
in the test setup currently uses an unbounded http.Get, which can block past
readyTimeout and stall the suite. Update the readiness check in the Eventually
block to use a short-lived HTTP client or a per-request context timeout around
the request, then keep the existing status/body readiness logic and cleanup in
place so the probe fails fast if the gateway is wedged.
test/e2e/utils_test.go-197-205 (1)

197-205: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Add timeouts to the new network helpers.

Both helpers use the default client with no deadline. One stalled Hugging Face request or half-open metrics scrape can block forever and bypass the surrounding retry logic.

Proposed fix
+var e2eHTTPClient = &http.Client{Timeout: 10 * time.Second}
+
 func isModelReal(modelName string) bool {
 	req, err := http.NewRequest("GET", "https://huggingface.co/api/models/"+modelName, nil)
 	if err != nil {
 		return false
 	}
 	if token := os.Getenv("HF_TOKEN"); token != "" {
 		req.Header.Set("Authorization", "Bearer "+token)
 	}
 
-	resp, err := http.DefaultClient.Do(req)
+	resp, err := e2eHTTPClient.Do(req)
 	if err != nil {
 		return false
 	}
 	defer resp.Body.Close()
@@
 func getMetrics(metricsURL string) []string {
 	var body []byte
 	gomega.Eventually(func() error {
-		resp, err := http.Get(metricsURL)
+		resp, err := e2eHTTPClient.Get(metricsURL)
 		if err != nil {
 			return err
 		}

Also applies to: 285-298

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/utils_test.go` around lines 197 - 205, The new network helpers are
using http.DefaultClient without any deadline, so a stalled Hugging Face request
or metrics scrape can hang indefinitely and skip retries. Update the helper(s)
that make these calls to use a client with a timeout or per-request context
deadline instead of DefaultClient, and apply the same fix to both request paths
referenced by the tests. Keep the existing request construction and token header
logic intact while ensuring the network call always fails fast.
🧹 Nitpick comments (4)
cmd/epp/runner/runner.go (1)

290-290: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Remove the inline mode paraphrase.

The comment repeats the condition and is too long for an inline note.

Proposed fix
-	startCrdReconcilers := opts.EndpointSelector == nil // If endpointSelector is nil, it means it's not in the standalone mode. Then we should start the inferencePool and other CRD Reconciler.
+	// CRD reconcilers own pool discovery outside standalone mode.
+	startCrdReconcilers := opts.EndpointSelector == nil

As per coding guidelines, "Write comments tersely and only when the reason is non-obvious; do not paraphrase the code."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/epp/runner/runner.go` at line 290, The comment on startCrdReconcilers in
runner.go is too long and just restates the nil-check, so remove the inline
paraphrase and replace it with a terse note only if needed. Keep the explanation
focused on the non-obvious reason for starting the CRD reconcilers in
non-standalone mode, and do not repeat the condition already expressed in the
code.

Source: Coding guidelines

pkg/epp/server/options.go (1)

76-76: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Keep selector parsing facts in one location.

Line 76 and Line 116 both document the --endpoint-selector/Complete() parsing path. Keep the public field comment canonical and make the internal field comment terse.

Proposed fix
-	EndpointSelector            labels.Selector // Parsed selector to filter model server pods on. Set via --endpoint-selector flag and parsed in Complete().
+	EndpointSelector            labels.Selector // Selector to filter model server pods.
 	EndpointTargetPorts         []int           // Target ports of model server pods.
 	DisableEndpointSubsetFilter bool            // Disables respecting destination endpoint subset metadata in EPP.
@@
-	fs                  *pflag.FlagSet // FlagSet used in AddFlags() and consulted in Validate()
-	endpointSelectorStr string         // Raw string from --endpoint-selector flag, parsed to EndpointSelector in Complete()
+	fs                  *pflag.FlagSet // FlagSet used in AddFlags() and consulted in Validate()
+	endpointSelectorStr string         // Raw --endpoint-selector value.

As per coding guidelines, "State each fact once in its canonical location; do not duplicate the same information across struct docs, prose, tables, inline comments, and examples."

Also applies to: 115-116

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/server/options.go` at line 76, The `EndpointSelector` docs repeat the
same `--endpoint-selector` parsing details in both the public field comment and
the internal `Complete()`-related comment; keep the full parsing explanation
only on the canonical `EndpointSelector` comment and make the
`Complete`/internal field comment terse so each fact is stated once. Update the
comments in `options.go` around `EndpointSelector` and `Complete()` to remove
duplicated wording while preserving the core meaning.

Source: Coding guidelines

pkg/epp/flowcontrol/types/errors.go (1)

49-52: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Collapse this duplicated rationale into one canonical doc site.

The scale-from-zero and 503-vs-429 explanation is repeated here and again on QueueOutcomeRejectedNoEndpoints. Keep the sentinel error comment terse and leave the longer behavior note in one place. As per coding guidelines, "Write comments tersely and only when the reason is non-obvious; do not paraphrase the code." and "State each fact once in its canonical location; do not duplicate the same information across struct docs, prose, tables, inline comments, and examples."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/flowcontrol/types/errors.go` around lines 49 - 52, The comment on
ErrNoEndpoints duplicates the longer scale-from-zero and 503-vs-429 rationale
that already belongs with QueueOutcomeRejectedNoEndpoints. Keep the
ErrNoEndpoints sentinel comment brief and non-redundant, and move or retain the
detailed behavior explanation only in the canonical
QueueOutcomeRejectedNoEndpoints documentation so the same fact is stated once.

Source: Coding guidelines

pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/producer.go (1)

144-159: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Trim the new comment blocks.

These comments restate the implementation in detail. Keep only the invariant so the comments do not become a second copy of the code. As per coding guidelines, "Write comments tersely and only when the reason is non-obvious; do not paraphrase the code."

Also applies to: 637-674

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/producer.go`
around lines 144 - 159, Trim the new comment blocks in addedTokensEntry and
Clone so they only state the non-obvious invariant instead of restating the
implementation. Update the comments near tokenCounter, requestCounter, requests,
and the Clone method to be terse, focusing on the
shared-counter/independent-clone behavior rather than describing how
increment/release works step by step.

Source: Coding guidelines


ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: b39aa87a-a414-4894-b899-95f2e37babcb

📥 Commits

Reviewing files that changed from the base of the PR and between b750ef4 and 1b55af3.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum, !**/*.sum
📒 Files selected for processing (159)
  • .github/ISSUE_TEMPLATE/new-release.md
  • .github/workflows/check-typos.yaml
  • .github/workflows/ci-build-images.yaml
  • .github/workflows/ci-dependency-review.yaml
  • .github/workflows/ci-lint.yaml
  • .github/workflows/ci-pr-checks.yaml
  • .github/workflows/ci-release.yaml
  • .github/workflows/md-link-check.yml
  • .github/workflows/release-notes-assemble.yaml
  • .github/workflows/release-notes-update.yaml
  • .github/workflows/unstale.yaml
  • AGENTS.md
  • Makefile
  • RELEASE-NOTES.md
  • cmd/epp/runner/runner.go
  • deploy/components/inference-gateway/deployment.yaml
  • deploy/components/inference-gateway/rbac.yaml
  • deploy/components/vllm-decode/deployment.yaml
  • deploy/environments/dev/base-kind-istio/patch-deployments.yaml
  • deploy/environments/dev/e-p-d/patch-decode.yaml
  • deploy/environments/dev/e-pd/patch-decode.yaml
  • deploy/environments/dev/p-d/patch-decode.yaml
  • docs/metrics.md
  • go.mod
  • hack/update-codegen.sh
  • pkg/common/error/error.go
  • pkg/epp/config/loader/configloader.go
  • pkg/epp/config/loader/configloader_test.go
  • pkg/epp/config/loader/testdata_test.go
  • pkg/epp/config/loader/validation.go
  • pkg/epp/controller/inferencepool_reconciler_test.go
  • pkg/epp/datalayer/endpoint_pool.go
  • pkg/epp/datalayer/factory.go
  • pkg/epp/datastore/datastore.go
  • pkg/epp/datastore/datastore_test.go
  • pkg/epp/flowcontrol/contracts/mocks/mocks.go
  • pkg/epp/flowcontrol/controller/internal/processor.go
  • pkg/epp/flowcontrol/controller/internal/processor_test.go
  • pkg/epp/flowcontrol/framework/plugins/queue/benchmark_test.go
  • pkg/epp/flowcontrol/framework/plugins/queue/functional_test.go
  • pkg/epp/flowcontrol/framework/plugins/queue/listqueue.go
  • pkg/epp/flowcontrol/framework/plugins/queue/maxminheap.go
  • pkg/epp/flowcontrol/framework/plugins/queue/maxminheap_test.go
  • pkg/epp/flowcontrol/framework/plugins/queue/priorityqueue.go
  • pkg/epp/flowcontrol/framework/plugins/queue/priorityqueue_test.go
  • pkg/epp/flowcontrol/integration_helpers_test.go
  • pkg/epp/flowcontrol/integration_test.go
  • pkg/epp/flowcontrol/registry/config.go
  • pkg/epp/flowcontrol/registry/config_test.go
  • pkg/epp/flowcontrol/registry/managedqueue.go
  • pkg/epp/flowcontrol/registry/managedqueue_test.go
  • pkg/epp/flowcontrol/types/errors.go
  • pkg/epp/flowcontrol/types/outcomes.go
  • pkg/epp/framework/interface/flowcontrol/flow.go
  • pkg/epp/framework/interface/flowcontrol/mocks/mocks.go
  • pkg/epp/framework/interface/flowcontrol/queue.go
  • pkg/epp/framework/interface/requesthandling/types.go
  • pkg/epp/framework/interface/requesthandling/types_test.go
  • pkg/epp/framework/plugins/flowcontrol/fairness/functional_test.go
  • pkg/epp/framework/plugins/flowcontrol/fairness/globalstrict/global_strict.go
  • pkg/epp/framework/plugins/flowcontrol/fairness/globalstrict/global_strict_test.go
  • pkg/epp/framework/plugins/flowcontrol/fairness/program-aware/plugin.go
  • pkg/epp/framework/plugins/flowcontrol/fairness/program-aware/plugin_test.go
  • pkg/epp/framework/plugins/flowcontrol/fairness/program-aware/strategy_las.go
  • pkg/epp/framework/plugins/flowcontrol/fairness/program-aware/strategy_las_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/README.md
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/producer.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/producer_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/token_estimator.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/token_estimator_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/extractor_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/producer_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/predictedlatency/metrics.go
  • pkg/epp/framework/plugins/requesthandling/parsers/anthropic/anthropic.go
  • pkg/epp/framework/plugins/requesthandling/parsers/anthropic/anthropic_test.go
  • pkg/epp/framework/plugins/requesthandling/parsers/openai/openai.go
  • pkg/epp/framework/plugins/requesthandling/parsers/openai/openai_test.go
  • pkg/epp/framework/plugins/requesthandling/parsers/vllmgrpc/vllmgrpc.go
  • pkg/epp/framework/plugins/requesthandling/parsers/vllmgrpc/vllmgrpc_test.go
  • pkg/epp/framework/plugins/requesthandling/parsers/vllmhttp/vllmhttp.go
  • pkg/epp/framework/plugins/requesthandling/parsers/vllmhttp/vllmhttp_test.go
  • pkg/epp/metrics/metrics.go
  • pkg/epp/metrics/testdata/input_tokens_metric
  • pkg/epp/metrics/testdata/normalized_time_per_output_token_seconds_metric
  • pkg/epp/metrics/testdata/output_tokens_metric
  • pkg/epp/metrics/testdata/request_sizes_metric
  • pkg/epp/metrics/testdata/response_sizes_metric
  • pkg/epp/metrics/testdata/running_requests_metrics
  • pkg/epp/requestcontrol/admission.go
  • pkg/epp/requestcontrol/admission_test.go
  • pkg/epp/server/options.go
  • pkg/epp/util/pool/pool.go
  • pkg/sidecar/proxy/options.go
  • pkg/sidecar/proxy/options_test.go
  • release-notes.d/unreleased/1030.md
  • release-notes.d/unreleased/1121.md
  • release-notes.d/unreleased/1160.md
  • release-notes.d/unreleased/1206.md
  • release-notes.d/unreleased/1218.md
  • release-notes.d/unreleased/1234.md
  • release-notes.d/unreleased/1244.md
  • release-notes.d/unreleased/1247.md
  • release-notes.d/unreleased/1248.md
  • release-notes.d/unreleased/1276.md
  • release-notes.d/unreleased/1288.md
  • release-notes.d/unreleased/1302.md
  • release-notes.d/unreleased/1372.md
  • release-notes.d/unreleased/1402.md
  • release-notes.d/unreleased/1418.md
  • release-notes.d/unreleased/1426.md
  • release-notes.d/unreleased/1429.md
  • release-notes.d/unreleased/1436.md
  • release-notes.d/unreleased/1444.md
  • release-notes.d/unreleased/1449.md
  • release-notes.d/unreleased/1475.md
  • release-notes.d/unreleased/1488.md
  • release-notes.d/unreleased/1493.md
  • release-notes.d/unreleased/1509.md
  • release-notes.d/unreleased/1513.md
  • release-notes.d/unreleased/1515.md
  • release-notes.d/unreleased/1536.md
  • release-notes.d/unreleased/1539.md
  • release-notes.d/unreleased/1548.md
  • release-notes.d/unreleased/1550.md
  • release-notes.d/unreleased/1554.md
  • release-notes.d/unreleased/1575.md
  • release-notes.d/unreleased/1603.md
  • release-notes.d/unreleased/1607.md
  • release-notes.d/unreleased/1608.md
  • release-notes.d/unreleased/1616.md
  • release-notes.d/unreleased/1626.md
  • release-notes.d/unreleased/1631.md
  • release-notes.d/unreleased/1651.md
  • release-notes.d/unreleased/1653.md
  • release-notes.d/unreleased/1661.md
  • release-notes.d/unreleased/1667.md
  • release-notes.d/unreleased/1690.md
  • release-notes.d/unreleased/1728.md
  • release-notes.d/unreleased/1731.md
  • release-notes.d/unreleased/1739.md
  • release-notes.d/unreleased/1750.md
  • release-notes.d/unreleased/1751.md
  • scripts/kind-dev-env.sh
  • test/e2e/e2e_suite_test.go
  • test/e2e/e2e_test.go
  • test/e2e/epp/utils_test.go
  • test/e2e/requests_test.go
  • test/e2e/setup_test.go
  • test/e2e/utils_test.go
  • test/integration/epp/grpc_test.go
  • test/integration/epp/harness.go
  • test/integration/epp/hermetic_test.go
  • test/scripts/test-e2e-gaie.sh
  • test/sidecar/config/nixl/qwen-decoder-pod.yaml
  • test/testdata/configloader_1_test.yaml
  • test/utils/igw/utils.go
  • test/utils/k8s_objects.go
  • version/version.go
💤 Files with no reviewable changes (48)
  • release-notes.d/unreleased/1436.md
  • release-notes.d/unreleased/1513.md
  • release-notes.d/unreleased/1536.md
  • release-notes.d/unreleased/1206.md
  • release-notes.d/unreleased/1302.md
  • release-notes.d/unreleased/1653.md
  • release-notes.d/unreleased/1509.md
  • release-notes.d/unreleased/1539.md
  • release-notes.d/unreleased/1418.md
  • release-notes.d/unreleased/1449.md
  • release-notes.d/unreleased/1550.md
  • release-notes.d/unreleased/1575.md
  • release-notes.d/unreleased/1429.md
  • release-notes.d/unreleased/1244.md
  • release-notes.d/unreleased/1444.md
  • release-notes.d/unreleased/1218.md
  • release-notes.d/unreleased/1426.md
  • release-notes.d/unreleased/1661.md
  • release-notes.d/unreleased/1276.md
  • release-notes.d/unreleased/1603.md
  • release-notes.d/unreleased/1234.md
  • release-notes.d/unreleased/1372.md
  • release-notes.d/unreleased/1288.md
  • release-notes.d/unreleased/1493.md
  • release-notes.d/unreleased/1515.md
  • deploy/environments/dev/p-d/patch-decode.yaml
  • release-notes.d/unreleased/1488.md
  • deploy/environments/dev/e-pd/patch-decode.yaml
  • deploy/environments/dev/e-p-d/patch-decode.yaml
  • release-notes.d/unreleased/1475.md
  • deploy/components/vllm-decode/deployment.yaml
  • release-notes.d/unreleased/1631.md
  • release-notes.d/unreleased/1121.md
  • release-notes.d/unreleased/1626.md
  • release-notes.d/unreleased/1608.md
  • release-notes.d/unreleased/1247.md
  • release-notes.d/unreleased/1030.md
  • release-notes.d/unreleased/1160.md
  • release-notes.d/unreleased/1548.md
  • release-notes.d/unreleased/1554.md
  • release-notes.d/unreleased/1607.md
  • release-notes.d/unreleased/1248.md
  • test/sidecar/config/nixl/qwen-decoder-pod.yaml
  • pkg/epp/flowcontrol/framework/plugins/queue/maxminheap.go
  • release-notes.d/unreleased/1651.md
  • release-notes.d/unreleased/1402.md
  • pkg/epp/datalayer/factory.go
  • pkg/epp/flowcontrol/framework/plugins/queue/maxminheap_test.go

Comment on lines 19 to +20
- name: Checkout code
uses: actions/checkout@v6
uses: actions/checkout@v7

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

Pin actions/checkout by full SHA and set persist-credentials: false. actions/checkout@v7 is a movable ref (CWE-829; same tag-move risk class as CVE-2025-30066), and the default checkout behavior leaves ${{ github.token }} in .git/config (CWE-200).

🧰 Tools
🪛 zizmor (1.26.1)

[warning] 19-20: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/check-typos.yaml around lines 19 - 20, The checkout step
uses a movable actions ref and leaves the token persisted, so update the
workflow’s Checkout code step to use a full SHA for actions/checkout instead of
the version tag, and set persist-credentials to false. Make the change in the
workflow job that runs the checkout so the action reference is pinned and
credentials are not stored in .git/config.

Comment on lines 32 to +33
- name: Checkout source
uses: actions/checkout@v6
uses: actions/checkout@v7

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

file=".github/workflows/ci-build-images.yaml"

echo "== file size =="
wc -l "$file"

echo
echo "== relevant lines =="
nl -ba "$file" | sed -n '1,180p'

echo
echo "== checkout occurrences =="
rg -n "actions/checkout|persist-credentials|uses: \./|uses: .*\.github" "$file"

Repository: opendatahub-io/llm-d-router

Length of output: 287


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file=".github/workflows/ci-build-images.yaml"

echo "== lines 1-140 with numbers =="
awk '{printf "%4d  %s\n", NR, $0}' "$file" | sed -n '1,140p'

echo
echo "== direct matches =="
grep -nE 'actions/checkout|persist-credentials|uses: \./|uses: .*\.github' "$file" || true

Repository: opendatahub-io/llm-d-router

Length of output: 5290


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "persist-credentials|actions/checkout@" .github/workflows .github/actions || true

Repository: opendatahub-io/llm-d-router

Length of output: 1574


🌐 Web query:

actions/checkout persist-credentials default true documentation

💡 Result:

The persist-credentials input for the actions/checkout GitHub Action has a default value of true [1][2][3]. When set to true (the default), the action configures the authentication token or SSH key used for the checkout into the local git configuration [1][3]. This allows subsequent git commands executed within your workflow scripts to run authenticated [1][4]. If you wish to opt out of this behavior—for example, for security reasons if you do not want the credentials to persist in the git configuration—you must explicitly set persist-credentials to false in your workflow file [1][5]. Regardless of this setting, the action ensures that credentials are removed during the post-job cleanup process [1][4].

Citations:


Pin the three checkout steps by SHA and disable persisted credentials.
actions/checkout@v7 is a movable tag in all three jobs, so a retargeted tag can inject code into this workflow (CWE-829). persist-credentials defaults to true, leaving the token in .git/config for later steps; set it to false (CWE-200).

🧰 Tools
🪛 zizmor (1.26.1)

[warning] 32-33: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci-build-images.yaml around lines 32 - 33, Update each
Checkout source step in the workflow to use a fixed commit SHA instead of the
movable actions/checkout@v7 tag, and set persist-credentials to false on all
three checkout steps. Make the change in every job that uses the checkout action
so the workflow consistently pins the action and does not leave the token in
.git/config.

Sources: Path instructions, Linters/SAST tools

Comment on lines 15 to +16
- name: Checkout source
uses: actions/checkout@v6
uses: actions/checkout@v7

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

sed -n '1,120p' .github/workflows/ci-dependency-review.yaml

Repository: opendatahub-io/llm-d-router

Length of output: 655


Pin checkout and disable persisted credentials

actions/checkout@v7 is a mutable tag (CWE-829). Its default auth setup also writes ${{ github.token }} into the local git config (CWE-200). Pin it by full SHA and set persist-credentials: false here.

🧰 Tools
🪛 zizmor (1.26.1)

[warning] 15-16: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci-dependency-review.yaml around lines 15 - 16, The
checkout step is using a mutable action tag and leaving persisted credentials
enabled. Update the Checkout source step in the workflow to use a full
SHA-pinned reference for actions/checkout, and set persist-credentials to false
in that same step so the default token is not written into local git config.

Sources: Path instructions, Linters/SAST tools

Comment on lines 36 to +37
- name: Checkout source
uses: actions/checkout@v6
uses: actions/checkout@v7

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the workflow file with line numbers around the referenced lines.
sed -n '1,120p' .github/workflows/ci-lint.yaml | cat -n

# Locate all checkout usages in this workflow.
rg -n 'uses:\s*actions/checkout@' .github/workflows/ci-lint.yaml

Repository: opendatahub-io/llm-d-router

Length of output: 4021


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# List all action references in the workflow.
rg -n 'uses:\s*' .github/workflows/ci-lint.yaml

# Show surrounding lines for every action reference.
sed -n '30,110p' .github/workflows/ci-lint.yaml | cat -n

Repository: opendatahub-io/llm-d-router

Length of output: 3177


Pin every workflow action by full SHA. actions/checkout@v7 (both steps), dorny/paths-filter@v4, actions/setup-go@v6, and actions/cache@v5 are mutable tags and widen the CI supply-chain attack surface (CWE-829 / CVE-2025-30066 class). Replace each uses: reference with a vetted commit SHA.

🧰 Tools
🪛 zizmor (1.26.1)

[warning] 36-37: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci-lint.yaml around lines 36 - 37, The workflow currently
uses mutable action tags, so update every `uses:` reference in this CI config to
a vetted full commit SHA. Replace the `actions/checkout`, `dorny/paths-filter`,
`actions/setup-go`, and `actions/cache` entries with pinned SHAs, including both
checkout steps, and keep the existing step names intact so the workflow behavior
remains the same.

Source: Path instructions

steps:
- name: Checkout source
uses: actions/checkout@v6
uses: actions/checkout@v7

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

Pin every actions/checkout reference to a full SHA.

actions/checkout@v7 is a mutable tag (CWE-829). Replace each occurrence in this workflow (28, 56, 73, 231, 281, 327, 402) with the vetted commit SHA; otherwise a retargeted release can run arbitrary code in CI.

🧰 Tools
🪛 zizmor (1.26.1)

[warning] 27-28: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci-pr-checks.yaml at line 28, The workflow is using
mutable actions/checkout tags, which should be replaced with pinned commit SHAs.
Update every actions/checkout reference in ci-pr-checks.yaml, including the
checkout steps used in the workflow jobs, to the vetted full SHA instead of v7.
Keep the change consistent across all occurrences so the workflow always runs
the exact approved checkout revision.

Comment on lines 32 to +33
- name: Checkout code
uses: actions/checkout@v6
uses: actions/checkout@v7

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

sed -n '1,220p' .github/workflows/md-link-check.yml

Repository: opendatahub-io/llm-d-router

Length of output: 1532


Critical: pin every uses: reference by full SHA (CWE-829, CVE-2025-30066 class).
actions/checkout@v7 and lycheeverse/lychee-action@v2.8.0 are mutable tags. Replace both with vetted commit SHAs.

🧰 Tools
🪛 zizmor (1.26.1)

[warning] 32-33: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/md-link-check.yml around lines 32 - 33, Both action
references are using mutable tags, so update the workflow to pin each `uses:`
entry to a vetted full commit SHA instead of version tags. Locate the
`actions/checkout` step and the `lycheeverse/lychee-action` step in the
workflow, and replace `@v7` / `@v2.8.0` with their approved SHAs.

Source: Path instructions

# into main between the tag push and this workflow's run is not
# accidentally rolled into the wrong release.
uses: actions/checkout@v6
uses: actions/checkout@v7

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

python3 - <<'PY'
from pathlib import Path
p = Path('.github/workflows/release-notes-assemble.yaml')
text = p.read_text()
for needle in ['actions/checkout@v7', 'origin/main', 'RELEASE-NOTES.md', 'rerun', 'overwrite']:
    print(f'## {needle}')
    for i, line in enumerate(text.splitlines(), 1):
        if needle in line:
            print(f'{i}: {line}')
PY

Repository: opendatahub-io/llm-d-router

Length of output: 753


🏁 Script executed:

sed -n '70,130p' .github/workflows/release-notes-assemble.yaml | nl -ba

Repository: opendatahub-io/llm-d-router

Length of output: 205


🏁 Script executed:

awk 'NR>=70 && NR<=130 { printf "%4d  %s\n", NR, $0 }' .github/workflows/release-notes-assemble.yaml

Repository: opendatahub-io/llm-d-router

Length of output: 2756


Pin actions/checkout and make the release-notes write idempotent.

  • .github/workflows/release-notes-assemble.yaml:36 uses actions/checkout@v7 instead of a full SHA (CWE-829; CVE-2025-30066 class).
  • The job prepends a fresh RELEASE-NOTES.md section on every run, so rerunning after the release notes already landed on main will duplicate the same tag block.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release-notes-assemble.yaml at line 36, The release-notes
workflow currently uses an unpinned actions/checkout reference and blindly
prepends a new RELEASE-NOTES.md section each run. Update the checkout step in
the release-notes assembly workflow to a full commit SHA, and make the write
logic in the release-notes generation step idempotent by checking for an
existing tag section before adding it so reruns do not duplicate the same block.

Source: Path instructions

Comment on lines 32 to +33
- name: Checkout main
uses: actions/checkout@v6
uses: actions/checkout@v7

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
file=".github/workflows/release-notes-update.yaml"
wc -l "$file"
echo "----"
cat -n "$file" | sed -n '1,220p'

Repository: opendatahub-io/llm-d-router

Length of output: 4522


Critical: pin the GitHub Actions here by full SHA (CWE-829, CVE-2025-30066 class). This pull_request_target workflow runs with a GitHub App token and later force-pushes a branch, so actions/create-github-app-token@v3 and actions/checkout@v7 are both supply-chain entry points.

🧰 Tools
🪛 zizmor (1.26.1)

[warning] 32-36: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release-notes-update.yaml around lines 32 - 33, The
workflow uses unpinned third-party GitHub Actions, which leaves the
pull_request_target job exposed to supply-chain risk. Update the actions
referenced in the release-notes update workflow, especially
actions/create-github-app-token and actions/checkout, to full commit SHAs
instead of version tags, and keep the existing job logic intact while changing
only the action references.

Source: Path instructions

Comment on lines +115 to +121
apiType := determineAPITypeFromPath(request.GetRequestPath(headers))
extractedBody, err := extractRequestBody(apiType, body)
if err != nil {
return nil, err
}
extractedBody.Payload = fwkrh.PayloadMap(bodyMap)
extractedBody.MaxOutputTokens = maxOutputTokensForAPI(apiType, bodyMap)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find the exact definition locations if the first pass did not surface them.
rg -n "type PayloadMap|MaxOutputTokensFromPayload|maxOutputTokensForAPI" pkg -S

Repository: opendatahub-io/llm-d-router

Length of output: 1952


Pass fwkrh.PayloadMap through the max-token path

MaxOutputTokensFromPayload takes fwkrh.PayloadMap, but maxOutputTokensForAPI still accepts map[string]any and forwards it unchanged, which breaks the build on every branch here. Convert once and pass the typed payload through the helper. CWE-704.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/framework/plugins/requesthandling/parsers/openai/openai.go` around
lines 115 - 121, The max-token flow is still using an untyped map, which causes
the build break in the OpenAI parser path. Update the request handling path in
openai.go so `maxOutputTokensForAPI` accepts and uses `fwkrh.PayloadMap` instead
of `map[string]any`, and pass the typed payload through from the
`extractRequestBody` call site where `extractedBody.Payload` and
`extractedBody.MaxOutputTokens` are set. Make the conversion once and keep the
payload typed all the way through the helper chain, including any
`MaxOutputTokensFromPayload` usage.

Comment on lines +125 to +128
// max_tokens lives under sampling_params in the generate wire format.
if sp, ok := bodyMap["sampling_params"].(map[string]any); ok {
body.MaxOutputTokens = fwkrh.MaxOutputTokensFromPayload(sp, "max_tokens")
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1,220p' pkg/epp/framework/plugins/requesthandling/parsers/vllmhttp/vllmhttp.go

printf '\n--- helper search ---\n'
rg -n "type PayloadMap|MaxOutputTokensFromPayload|sampling_params" pkg/epp/framework -S

printf '\n--- type definition context ---\n'
file=$(rg -l "type PayloadMap" pkg/epp/framework -S | head -n 1)
if [ -n "${file:-}" ]; then
  sed -n '1,220p' "$file"
fi

Repository: opendatahub-io/llm-d-router

Length of output: 16611


Convert sampling_params to fwkrh.PayloadMap first
Line 127 passes map[string]any into MaxOutputTokensFromPayload, which expects fwkrh.PayloadMap (CWE-704). This is a compile-time type mismatch. Cast the nested map before calling the helper.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/epp/framework/plugins/requesthandling/parsers/vllmhttp/vllmhttp.go`
around lines 125 - 128, The sampling_params handling in
vllmhttp.ParseGenerateRequest passes a plain map[string]any into
MaxOutputTokensFromPayload, but that helper expects fwkrh.PayloadMap. Update the
nested sampling_params extraction in vllmhttp.go to cast the map to
fwkrh.PayloadMap before calling fwkrh.MaxOutputTokensFromPayload, keeping the
existing body.MaxOutputTokens assignment unchanged.

@zdtsw zdtsw merged commit fae8ee5 into opendatahub-io:main Jun 30, 2026
41 of 43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.