[sync] upstream llm-d/llm-d-router 2136215b [2026-06-11] #253

Merged
kylape merged 103 commits into opendatahub-io:main from kylape:sync-upstream
Jun 12, 2026

kylape commented Jun 12, 2026

Syncs llm-d/llm-d-router main into opendatahub-io/llm-d-router main.

  • Removed CODEOWNERS
  • Downgraded the Go requirement for the downstream go-toolchain builder image

Upstream commit: llm-d@2136215

Summary by CodeRabbit

  • New Features

    • Added explicit tokenized prompt support with configurable backends (vLLM, estimate).
    • Implemented flow control with priority-band topology management.
    • Added plugin state debugging endpoint for operational troubleshooting.
    • Extended metrics extraction with custom scalar metrics support.
  • Documentation

    • Updated sidecar configuration and disaggregation guides.
    • Added plugin metric protocol and debug endpoint documentation.
  • Chores

    • Updated Go version to 1.25.11 and container image versions.

sagearc and others added 30 commits June 1, 2026 08:31
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
* Use build push action for image builds

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* cr review comments 3 and 4

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

---------

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
* fix(datalayer): dispatch endpoint update notifications

Existing endpoint metadata changes updated the in-memory endpoint but did not notify endpoint-notification-source consumers. Dispatch EventAddOrUpdate when metadata changes so endpoint-aware plugins can refresh per-endpoint state.

Signed-off-by: zhouyou9505 <zhouyou9505@gmail.com>

* fix(datalayer): skip no-op endpoint updates

Signed-off-by: zhouyou9505 <zhouyou9505@gmail.com>

---------

Signed-off-by: zhouyou9505 <zhouyou9505@gmail.com>
…llm-d#1333)

* refactor: flow control, move plugin resolution into the config loader

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* apply suggestions

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* change copyright heading

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

---------

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
- default SELinux requires :z for the Go cache to be mounted
- fix permission denied on mkdir /go/cache

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
…-d#1424)

* ci: implement Phase 2 quality gates and reliability improvements

- Enforce coverage regression gate: compare-coverage.sh now exits
  non-zero on regression, and a final enforcement step fails the
  PR check while allowing e2e tests and cache saves to complete.
- Add dependency review workflow using actions/dependency-review-action
  to flag high-severity vulnerabilities in dependency changes on PRs.
- Add builder image workflow to validate Dockerfile.builder on PRs
  and push to GHCR on merge, with content-hash tagging and GHA cache.

Phase 2 item "Replace GHCR_TOKEN PAT with GITHUB_TOKEN" was already
completed upstream -- image push uses github.token.

Ref: llm-d#956
Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>

* ci: address review feedback on Phase 2 PR

Coverage gate: add configurable regression tolerance (default 2.0
percentage points). Regressions within the tolerance are reported but
do not fail the build, covering cases like deleted high-coverage code
or tests landing in a followup PR.
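The tolerance rule described above can be sketched in Go as follows. This is only an illustration: the real gate lives in the shell script compare-coverage.sh, and the helper name `checkCoverage`, its parameters, and its return values are assumptions made for this example.

```go
package main

import "fmt"

// defaultTolerance mirrors the default stated in the commit message:
// 2.0 percentage points.
const defaultTolerance = 2.0

// checkCoverage is a hypothetical helper, not the upstream script.
// regressed reports any drop in coverage; fail reports a drop that is
// larger than the tolerance and should therefore fail the build.
func checkCoverage(base, head, tolerance float64) (regressed, fail bool) {
	drop := base - head
	return drop > 0, drop > tolerance
}

func main() {
	for _, head := range []float64{81.0, 78.5, 77.0} {
		regressed, fail := checkCoverage(80.0, head, defaultTolerance)
		fmt.Printf("base=80.0 head=%.1f regressed=%t fail=%t\n", head, regressed, fail)
	}
}
```

A drop of 1.5 points is reported but passes, while a drop of 3.0 points fails.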

Builder workflow: removed for now. It can be added later as a
conditional job in ci-build-images.yaml, gated on Dockerfile.builder
path changes, which keeps all image builds in one place.

Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>

---------

Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>
… and token representations (llm-d#1380)

* add Prompt, Tokens, ExtractedCacheSalt to InferenceRequestBody

Signed-off-by: bobzetian <bobzetian@google.com>

* make tokenizedInput an array

Signed-off-by: bobzetian <bobzetian@google.com>

* refactor(types): introduce unified slice-of-struct fields for user inputs

Introduce unified, protocol-agnostic slice-of-structs fields (Prompts []UnifiedPrompt and TokenInputs []TokenizedInput) in InferenceRequestBody to support batched user inputs and pre-tokenized token IDs cleanly across all parser extensions.

Signed-off-by: bobzetian <bobzetian@google.com>

* remove dead code and rename BlockTypeDocument

Signed-off-by: bobzetian <bobzetian@google.com>

---------

Signed-off-by: bobzetian <bobzetian@google.com>
Start the InferenceObjective and InferenceModelRewrite reconcilers when either the llm-d.ai/v1alpha2 or legacy inference.networking.x-k8s.io/v1alpha2 resources are advertised by discovery.

Add discovery tests for the supported API groups and partial resource availability.

Signed-off-by: Richard Li <tianqi.li@oracle.com>
…m-d#1428)

* feat: swap tokenizer UDS for render

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: code review

- need to set HF_HOME in example

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
…pends on PR llm-d#1248 (llm-d#1288)

* Conditional decode logic implementation

Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com>

* Conditional decoding logic depends on the presence of the 'Prefer: if-available' header and does not depend on the 'EPP-Phase' header value

Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com>

* Move the 'Prefer: if-available' gate above the profile-handler layer so it also works without 'disagg-profile-handler' configured

Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com>

* Minor comment changes

Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com>

* Minor fix in comment syntax

Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com>

* Export PrefixCacheMatchInfoKey to align producer/consumer key lookup

Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com>

* Drop stale nil arg from conditional-decode Pick calls

Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com>

* Revert disagg_profile_handler and approx-prefix producer changes

Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com>

* Update pkg/common/routing/common.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: dmitripikus <46105577+dmitripikus@users.noreply.github.com>

* Use errors.As to preserve typed scheduler errors through wrapping

Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com>

* Drop redundant PrefixCacheMatchInfoKey constant

Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com>

* Drop redundant comment in IsConditionalDecode

Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com>

* Log false-return reasons in primaryEndpointHasCachedPrefix

Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com>

* Cover wrong-type attribute branch in TestPrimaryEndpointHasCachedPrefix

Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com>

* Drop duplicated semantics from IsConditionalDecode godoc

Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com>

* Use datalayer.NewTestRuntime in conditional-decode test helper

Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com>

---------

Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com>
Signed-off-by: dmitripikus <46105577+dmitripikus@users.noreply.github.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
…h action (llm-d#1437)

Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>
…eneric scorer (llm-d#1121)

Collapse the all-in-one precise-prefix-cache-scorer into a wrapper that
composes precise-prefix-cache-producer + the generic prefix-cache-scorer
behind the legacy plugin type. Existing YAML keeps working unchanged;
behavior tracks the new path because the two paths are now the same code.

Factory modes:
- Self-host: instantiate an internal producer + scorer pair from the
  legacy parameters and expose Scorer + DataProducer + PreRequest +
  EndpointExtractor by delegation.
- Defer-to-existing: when a precise-prefix-cache-producer is already
  configured in the handle, return a prefix-cache-scorer pointed at it
  and ignore the legacy parameters. Multiple precise producers raise an
  error since the wrapper cannot disambiguate.

The historical indexerConfig.tokenizersPoolConfig path is preserved
through an isolated legacyProducer that owns its own
tokenization.Pool: it pre-tokenizes the request prompt and stashes the
result on request.Body.TokenizedPrompt before delegating to the embedded
producer. The pool config is stripped from the inner indexer to avoid
double-construction.

Other behavior preserved from the original:
- PreRequest nil-guards on target and prefill endpoint metadata.
- Speculative TTL eviction bound to the callback's ctx.
- ttlcache.Start() in place of cleanCachePeriodically.
- OTel span llm_d.epp.scorer.prefix_cache on Score with the original
  attribute schema, so existing dashboards keep working.

Deletes the heavy scorer's standalone test files (uds, tokenized,
extractor) and testdata fixtures; the producer carries equivalent
coverage. Adds a small wrapper-focused test file (interface contract,
defer-mode behavior, legacy tokenizer pool path).

README rewritten as a migration guide.

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
…mory (llm-d#1160)

The routing-side prefix-cache indexer keeps one LRU entry per (pod, block),
so a small block size (e.g., vLLM's default of 16) inflates indexer memory
by ~64x. Apply a minBlockSizeTokens=64 floor uniformly in GetBlockSize,
regardless of whether the value originates from an endpoint metric, the
autotuned fallback, or a manual configuration. Routing intentionally
measures matches at coarser granularity than the model server's true block
size.

The previous design asymmetrically honored manual values with a startup
warning. Uniform clamping is simpler and provides the same memory bound
without leaving an operator footgun in place.

The clamp constant is a package var rather than const so prefix-match
tests that exercise behavior with deliberately small block sizes can
lower it via a test helper (disableMinBlockSizeClamp) without leaking a
knob into the public configuration surface.
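A minimal sketch of the clamp and the test helper described above. The names `minBlockSizeTokens`, `GetBlockSize`, and `disableMinBlockSizeClamp` come from the commit message; the function `clampBlockSize`, the signatures, and the `blocksFor` helper are assumptions for illustration and may not match the upstream code.

```go
package main

import "fmt"

// minBlockSizeTokens is a package-level var rather than a const so that
// tests can lower it; 64 is the floor named in the commit message.
var minBlockSizeTokens = 64

// clampBlockSize is a hypothetical stand-in for the clamp applied inside
// GetBlockSize: any value below the floor is raised to the floor, no
// matter which source it came from.
func clampBlockSize(blockSize int) int {
	if blockSize < minBlockSizeTokens {
		return minBlockSizeTokens
	}
	return blockSize
}

// disableMinBlockSizeClamp sketches the test helper: it lowers the floor
// to 1 and returns a function that restores the previous value.
func disableMinBlockSizeClamp() (restore func()) {
	prev := minBlockSizeTokens
	minBlockSizeTokens = 1
	return func() { minBlockSizeTokens = prev }
}

// blocksFor returns how many index entries one pod needs for a prompt of
// the given length, rounding up to whole blocks.
func blocksFor(promptTokens, blockSize int) int {
	return (promptTokens + blockSize - 1) / blockSize
}

func main() {
	fmt.Println(clampBlockSize(16), clampBlockSize(128))
	fmt.Println(blocksFor(4096, 16), blocksFor(4096, clampBlockSize(16)))

	restore := disableMinBlockSizeClamp()
	fmt.Println(clampBlockSize(16)) // floor lowered for a test
	restore()
	fmt.Println(clampBlockSize(16)) // floor back in place
}
```

Keeping the floor unexported means only in-package tests can change it, which is the point the commit message makes about not adding a public configuration knob.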

Fixes llm-d#1158

Signed-off-by: Greg Neighbors <26003+gkneighb@users.noreply.github.com>
Co-authored-by: Greg Neighbors <26003+gkneighb@users.noreply.github.com>
…lm-d#1449)

EPP cache configuration was hardcoded to use the new `llm-d.ai` API group
for `InferenceObjective` and `InferenceModelRewrite` resources. This caused
EPP to fail to start in clusters where only the legacy `inference.networking.x-k8s.io`
CRDs were installed, even though the config loader detected them.

This change:
- Updates `ControllerConfig` to detect and store the actual `GroupVersion` found in the cluster.
- Dynamically constructs the manager's `Scheme` using the detected `GroupVersion`s.
- Configures the cache to watch the correct GVK dynamically.
- Adds warning logs if both legacy and new CRDs are present, noting that EPP will prefer the new group.
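The group selection described in this list can be sketched as below. The two group names are taken from the commit message; the helper `selectGroup` and its return values are hypothetical, and the real code works with Kubernetes `GroupVersion` values from discovery rather than plain strings.

```go
package main

import "fmt"

const (
	newGroup    = "llm-d.ai"
	legacyGroup = "inference.networking.x-k8s.io"
)

// selectGroup is a hypothetical helper. Given the API groups advertised by
// discovery, it returns the group to watch, preferring the new group.
// both is true when the new and legacy groups are both present, in which
// case the caller would log a warning. found is false when neither exists.
func selectGroup(discovered []string) (group string, both, found bool) {
	hasNew, hasLegacy := false, false
	for _, g := range discovered {
		switch g {
		case newGroup:
			hasNew = true
		case legacyGroup:
			hasLegacy = true
		}
	}
	switch {
	case hasNew:
		return newGroup, hasLegacy, true
	case hasLegacy:
		return legacyGroup, false, true
	default:
		return "", false, false
	}
}

func main() {
	group, both, found := selectGroup([]string{legacyGroup, "apps", newGroup})
	fmt.Println(group, both, found)

	group, both, found = selectGroup([]string{legacyGroup})
	fmt.Println(group, both, found)
}
```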

TAG=agy
CONV=85a5b03d-2eab-46a0-8fe5-144ba375a5b7

Signed-off-by: ahg-g <ahg@google.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
* test: add Ginkgo Extended labels to slow scheduler e2e test blocks

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* split ci test workflow

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* split further

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* split image build and cache reuse

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* cache layers up to commit sha

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* remove e2e build aggregate

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fix render sidecar removal

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* share e2e images as artifacts

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* cleanup

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* tidy e2e image loading

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* preload renderer image

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* readinessProbe renderer

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fix e2e-image action call to use push and buildx-outputs

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* ci reviews

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

---------

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage <80211083+sagearc@users.noreply.github.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
* update: github action to scan image before push out

- steps: build amd64 image, generate tarball to scan, gate on high or critical
- if passed: build amd64 and arm64, push to registry

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: typo

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* metrics: Add plugin name and type labels to EPP metrics

Introduce plugin_name and plugin_type labels for the new EPP metrics
with prefix llm_d_router_epp under LLMDRouterEndpointPickerSubsystem.
This implements tracking of specific endpoint picker plugin details.

Signed-off-by: Cong Liu <conliu@google.com>

* test(predictedlatency): Add assertions for llmd predicted metrics

Add assertions in TestRecordRequestLatencyMetrics for newly introduced
llmd-prefixed predicted metrics and gauges, ensuring full test coverage
for plugin name and type labels in predicted latency metrics.

Signed-off-by: Cong Liu <conliu@google.com>

---------

Signed-off-by: Cong Liu <conliu@google.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
* make sure skipped requests can also go through profiles

Signed-off-by: bobzetian <bobzetian@google.com>

* test(server): add assertions for skipped request routing

Signed-off-by: bobzetian <bobzetian@google.com>

* epp: unify skip routing and enforce RawPayload globally

EPP previously had inconsistent behavior when request parsing was skipped
(e.g., for unsupported gRPC paths in the vllmgrpc parser). The vllmgrpc
parser returned a nil body on skip, which caused nil-pointer panics in
the Scheduling Director when it attempted model rewrite or repackaging.

This change unifies the skip story by enforcing a non-nil body with
RawPayload globally at the EPP framework level (in server.go) whenever
a parser signals Skip. This ensures the Director can safely execute
smart routing, admission control, and subsetting for skipped requests
without panics, while still bypassing the expensive response-phase
interception.

The RequestSkipped state comments and logging have also been improved
to accurately reflect that skipped requests are smartly routed by the
director rather than falling back to random endpoints.
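The framework-level rule above can be illustrated with a small sketch. The types here are simplified stand-ins, not the router's real `InferenceRequestBody` or parser result, and `ensureBody` is a hypothetical name for whatever server.go does after a parser returns.

```go
package main

import "fmt"

// requestBody and parseResult are simplified stand-ins for the router's
// types; only the fields needed for this illustration are included.
type requestBody struct {
	RawPayload []byte
}

type parseResult struct {
	Body *requestBody
	Skip bool
}

// ensureBody sketches the rule: whenever a parser signals Skip, the result
// leaves with a non-nil body that carries the raw payload, so later stages
// can route the request without nil checks.
func ensureBody(res parseResult, raw []byte) parseResult {
	if !res.Skip {
		return res
	}
	if res.Body == nil {
		res.Body = &requestBody{}
	}
	if res.Body.RawPayload == nil {
		res.Body.RawPayload = raw
	}
	return res
}

func main() {
	// A parser that skips and returns a nil body, as vllmgrpc did.
	res := ensureBody(parseResult{Skip: true}, []byte("raw-bytes"))
	fmt.Println(res.Body != nil, string(res.Body.RawPayload))
}
```

With this in place, individual parsers can return a nil body on skip and leave payload handling to the framework.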

TEST=make test-unit

Signed-off-by: bobzetian <bobzetian@google.com>

* epp: simplify vertexai parser to return nil body on skip

Following the global Skip payload-force implementation in server.go,
individual parsers no longer need to create and return a dummy body
with RawPayload on skip.

This change simplifies the vertexai parser to return nil body on skip,
aligning it with the vllmgrpc parser and keeping parser implementations
boilerplate-free. The vertexai unit test has also been updated.

TEST=make test-unit

Signed-off-by: bobzetian <bobzetian@google.com>

* refactor: rename Skip to SkipResponseProcessing and RequestSkipped to RequestResponseProcessingSkipped

To improve readability, code clarity, and explicitly communicate the
architectural intent of EPP skip behavior, this change renames the
Skip field in the Parser ParseResult struct to SkipResponseProcessing.

Additionally, the corresponding internal state constant in server.go
is renamed from RequestSkipped to RequestResponseProcessingSkipped.

All parser implementations (openai, passthrough, vertexai, vllmgrpc)
and their corresponding unit tests, along with the main server unit
tests, have been updated to reflect this complete renaming.

TEST=make test-unit

Signed-off-by: bobzetian <bobzetian@google.com>

* enforce populating Body.Payload when SkipResponseProcessing is true

Signed-off-by: bobzetian <bobzetian@google.com>

---------

Signed-off-by: bobzetian <bobzetian@google.com>
…e-pool-namespace flags (llm-d#1416)

The --inference-pool flag (introduced alongside the deprecation) accepts
"namespace/name" or just "name" and is the only supported way to select
the InferencePool. INFERENCE_POOL_NAMESPACE and INFERENCE_POOL_NAME
environment variables are removed for the same reason; INFERENCE_POOL
remains.
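As a rough illustration of the two accepted forms, the sketch below parses a flag value into a namespace and a name. The function `parseInferencePool` and its validation rules are assumptions; in particular, how the router picks a namespace when only a name is given is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// parseInferencePool is a hypothetical parser for "namespace/name" or
// "name". An empty namespace in the result means the caller has to supply
// its own default.
func parseInferencePool(value string) (namespace, name string, err error) {
	first, rest, hasSlash := strings.Cut(value, "/")
	if hasSlash {
		namespace, name = first, rest
	} else {
		name = first
	}
	// Reject empty parts and values with more than one slash.
	if name == "" || strings.Contains(name, "/") || (hasSlash && namespace == "") {
		return "", "", fmt.Errorf("invalid inference pool %q: want \"namespace/name\" or \"name\"", value)
	}
	return namespace, name, nil
}

func main() {
	for _, v := range []string{"prod/my-pool", "my-pool", "a/b/c"} {
		ns, name, err := parseInferencePool(v)
		fmt.Printf("%q -> namespace=%q name=%q err=%v\n", v, ns, name, err)
	}
}
```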

The SSRF-protection error message and the env-var test are updated to
reference only the new flag and env var.

Signed-off-by: Etai Lev Ran <elevran@gmail.com>
vMaroon and others added 25 commits June 9, 2026 07:09
Corrected the GitHub username for the precise prefix cache.

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
* Add support for the EC Nixl connector.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Cleanup code.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Update pkg/sidecar/proxy/connector_epd_ec_common.go

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>

* Update pkg/sidecar/proxy/connector_epd_ec_common.go

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>

* Update pkg/sidecar/proxy/connector_epd_ec_common.go

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>

* Update pkg/sidecar/proxy/connector_epd_ec.go

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Update pkg/sidecar/proxy/connector_epd_shared_storage.go

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>

* Fix lint error.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Replace V(4) calls in connector_epd_shared_storage.go with V(logging.DEBUG)

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
…:latest (llm-d#1457)

* Add script and Makefile target to check simulator image does not use :latest

Add scripts/check-latest-tags.sh which scans YAML files for
llm-d-inference-sim:latest references and exits non-zero if any are
found. Wire it into the presubmit gate via a new check-latest-tags
Makefile target.

Closes llm-d#1306

Signed-off-by: Aswin Raj <aswinraj7e@gmail.com>
Signed-off-by: apollofps <aswinraj7e@gmail.com>

* Expand latest-tag check to all images and add --warn mode

Expand the scope from simulator-only to all container image references
per reviewer feedback. Add a --warn flag that prints violations but
exits 0; the presubmit target uses warn mode until existing :latest
references in GPU manifests are pinned.

Signed-off-by: Aswin Raj <aswinraj7e@gmail.com>
Signed-off-by: apollofps <aswinraj7e@gmail.com>

* Address Copilot review: harden grep portability and prune .git

- Use grep -Hn instead of -rn so filenames are always present in
  output even when xargs passes a single file.
- Prune .git, vendor, and node_modules from the find tree.
- Match "[[:space:]]image:" as a YAML key instead of bare substring
  to avoid false positives from unrelated fields.
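The key-based match can be sketched in Go for illustration. The upstream check is a shell script built on find and grep; the pattern below is this example's own variation (it also accepts `image:` at the start of a line and optional quotes), and the registry paths in the sample manifest are made up.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// latestImage matches "image:" as a YAML key (at line start or after
// whitespace) whose value ends in the tag ":latest". Requiring whitespace
// or end of line after the tag avoids matching tags such as "latest-gpu".
var latestImage = regexp.MustCompile(`(^|\s)image:\s*["']?[^\s"']*:latest["']?(\s|$)`)

// findLatestTags returns the 1-based line numbers that reference :latest.
func findLatestTags(yaml string) []int {
	var lines []int
	sc := bufio.NewScanner(strings.NewReader(yaml))
	for n := 1; sc.Scan(); n++ {
		if latestImage.MatchString(sc.Text()) {
			lines = append(lines, n)
		}
	}
	return lines
}

// sampleManifest is a made-up fragment used only for this demonstration.
const sampleManifest = `containers:
- name: sim
  image: ghcr.io/llm-d/llm-d-inference-sim:latest
- name: other
  image: ghcr.io/example/other:v1.2.3
  baseimage: ghcr.io/example/base:latest
`

func main() {
	fmt.Println(findLatestTags(sampleManifest))
}
```

Only line 3 is reported: the `baseimage` field is ignored because `image:` must stand alone as a key, which is the false positive the commit describes.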

Signed-off-by: Aswin Raj <aswinraj7e@gmail.com>
Signed-off-by: apollofps <aswinraj7e@gmail.com>

---------

Signed-off-by: Aswin Raj <aswinraj7e@gmail.com>
Signed-off-by: apollofps <aswinraj7e@gmail.com>
* disagg: add test coverage for encode edge cases

Signed-off-by: namgyu-youn <namgyu.dev@gmail.com>

* test(disagg): replace unicode arrow with ASCII dash in test name

Swap the non-ASCII → character for a plain - in the EPD test case
name to stay consistent with the ASCII-only style used elsewhere.

Signed-off-by: namgyu-youn <namgyu.dev@gmail.com>

---------

Signed-off-by: namgyu-youn <namgyu.dev@gmail.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
* clean up references to inference-extension

Signed-off-by: ahg-g <ahg@google.com>

* Updated some variables to use epp prefix instead of router

Signed-off-by: ahg-g <ahg@google.com>

* fixed the helm verify error

Signed-off-by: ahg-g <ahg@google.com>

---------

Signed-off-by: ahg-g <ahg@google.com>
…timator for chatCompletion and messages (llm-d#1554)

* add tool estimation

Signed-off-by: bobzetian <bobzetian@google.com>

* move tools before system

Signed-off-by: bobzetian <bobzetian@google.com>

---------

Signed-off-by: bobzetian <bobzetian@google.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
* Add EPP model server protocol to docs

Signed-off-by: BenjaminBraunDev <benjaminbraun@google.com>

* Rename model-server-protocol to plugin-metric-protocol, adjust to focus on plugins adding a producer and consumer section to table

Signed-off-by: BenjaminBraunDev <benjaminbraun@google.com>

* Break metrics out of table, individual sections for each metric

Signed-off-by: BenjaminBraunDev <benjaminbraun@google.com>

* Remove doc link to fix lint

Signed-off-by: BenjaminBraunDev <benjaminbraun@google.com>

---------

Signed-off-by: BenjaminBraunDev <benjaminbraun@google.com>
* Add plugin state debug endpoint

Expose a metrics-server debug handler that dumps sanitized state from plugins that opt in via a new StateDumper interface. Add a concrete in-flight load state dumper and focused unit coverage for handler behavior and deterministic output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>

* Address Copilot review feedback

- Use copyright year 2025 in new pkg/epp/server files to match
  surrounding files.
- Switch debug payload to {timestamp, plugins{name: {type, state}}}
  to match the shape requested in issue llm-d#1074; inject a clock
  function so tests stay deterministic.
- Gate /debug/plugins/state behind a new EnablePluginStateDebug option
  (defaults to true, mirroring EnablePprof) and skip registration with
  a log message when the plugin handle is unavailable instead of
  failing setup.
- Document that InFlightLoadProducer.DumpState snapshots its request
  and token trackers under separate locks and is therefore not
  internally atomic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>

* Make plugin state debug endpoint opt-in

Default EnablePluginStateDebug to false so upgrades do not silently expose /debug/plugins/state on the metrics/admin server. Keep the explicit --enable-plugin-state-debug flag for operators who want the endpoint and cover both default and flag behavior in options tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>

* Fix module imports after rebase

Update the new plugin state debug server files to use the current github.com/llm-d/llm-d-router module path after rebasing onto the latest main branch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>

* Address plugin state debug review comments

Use explicit JSON state dumps so StateDumper implementations own serialization, document when state dumps should be used instead of metrics, bound the in-flight load debug payload to the busiest endpoints, and register the endpoint through a generic metrics handler registrar with localhost-only access instead of a separate enable flag.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>

* Document plugin state debug endpoint

Document the plugin state debug endpoint and include non-dumper plugins in the response with an explanatory message so operators can distinguish unsupported state collection from missing plugin configuration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>

* Use metrics server access controls for plugin state debug

Rely on the metrics/admin server exposure and authentication controls for /debug/plugins/state instead of adding a handler-level localhost check, matching the other handlers registered on the same server.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>

* Fix unused import after rebase

Remove a stale reflect import left from the rebase conflict resolution in the in-flight load producer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>

* Keep plugin debug state responses partial

Report per-plugin dump errors in each plugin entry instead of failing the entire endpoint, and document that the endpoint is not exposed by the standalone file-discovery metrics mux.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>

* Fix plugin debug state lint

Remove the unused error return from collectPluginState now that per-plugin dump errors are reported in each plugin entry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>

---------

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
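The response shape that these commits converge on can be sketched as follows. `StateDumper` and `collectPluginState` are named in the commit messages, but the method signature, the struct types, the JSON field names beyond `timestamp`, `plugins`, `type`, and `state`, and the sample plugins are all assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// StateDumper is the opt-in interface named in the commits above.
// The method signature is an assumption made for this sketch.
type StateDumper interface {
	DumpState() (json.RawMessage, error)
}

// plugin is a hypothetical registry entry. Dumper stays nil for plugins
// that do not implement StateDumper.
type plugin struct {
	Type   string
	Dumper StateDumper
}

// pluginEntry is one value in the "plugins" map of the response.
type pluginEntry struct {
	Type    string          `json:"type"`
	State   json.RawMessage `json:"state,omitempty"`
	Error   string          `json:"error,omitempty"`
	Message string          `json:"message,omitempty"`
}

// debugPayload follows the {timestamp, plugins{name: {type, state}}} shape.
type debugPayload struct {
	Timestamp time.Time              `json:"timestamp"`
	Plugins   map[string]pluginEntry `json:"plugins"`
}

// collectPluginState never fails as a whole: a dump error is recorded in
// that plugin's entry, and non-dumper plugins get an explanatory message.
// The clock is injected so tests can produce deterministic output.
func collectPluginState(now func() time.Time, plugins map[string]plugin) debugPayload {
	out := debugPayload{Timestamp: now(), Plugins: make(map[string]pluginEntry, len(plugins))}
	for name, p := range plugins {
		entry := pluginEntry{Type: p.Type}
		if p.Dumper == nil {
			entry.Message = "plugin does not implement StateDumper"
		} else if state, err := p.Dumper.DumpState(); err != nil {
			entry.Error = err.Error()
		} else {
			entry.State = state
		}
		out.Plugins[name] = entry
	}
	return out
}

// dumperFunc adapts a plain function to StateDumper for the demo below.
type dumperFunc func() (json.RawMessage, error)

func (f dumperFunc) DumpState() (json.RawMessage, error) { return f() }

// samplePayload builds a payload from three made-up plugins.
func samplePayload() debugPayload {
	clock := func() time.Time { return time.Date(2026, 6, 11, 0, 0, 0, 0, time.UTC) }
	return collectPluginState(clock, map[string]plugin{
		"load": {Type: "example-producer", Dumper: dumperFunc(func() (json.RawMessage, error) {
			return json.RawMessage(`{"inFlightRequests":3}`), nil
		})},
		"broken": {Type: "example-producer", Dumper: dumperFunc(func() (json.RawMessage, error) {
			return nil, errors.New("state unavailable")
		})},
		"scorer": {Type: "example-scorer"},
	})
}

func main() {
	out, err := json.MarshalIndent(samplePayload(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because `encoding/json` sorts map keys when marshaling and the clock is fixed, the printed document is the same on every run, which matches the deterministic-output goal stated in the first commit.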
…lm-d#1193)

- similar to sglang's approach, but with a different JSON payload
    - new --kv-connector=mooncake or --mooncake-bootstrap-port flag on main
    - HTTP /query to the prefill pod to get remote_engine_id from rank 0 (or
      keep the next one) for each request
    - use the default 8998 on the prefill pod to build remote_bootstrap_addr
    - generate a UUID for transfer_id
    - send concurrent requests to prefill and decode
    - add docs on how the different connectors are used
    - add new connector_mooncake_test.go with tests for P/D requests
    - add mooncake to options_test.go connector validation
    - store engine_id in an LRU so that only the first request needs to query
    - update: add support for multi-rank
    - set request header X-data-parallel-rank with rank_id
    - mooncake is not included in the common connector tests
    - update: bump llm-d-inference-sim version to include the new endpoint for query
    - update: force-set the mooncake port to 8000 in tests to work with llm-d-sim

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
…e producer (llm-d#1576)

The multimodal encoder-cache producer defined its own llmdSubsystem
constant set to "llm_d_router_epp", duplicating the centralized
LLMDRouterEndpointPickerSubsystem in pkg/epp/metrics. Same value today;
imports the constant to remove the drift risk and match the pattern in
approximateprefix and predictedlatency producers.

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
* fix race condition in logger test

Signed-off-by: Nicole Xin <nxin@google.com>

* fix typo

Signed-off-by: Nicole Xin <nxin@google.com>

---------

Signed-off-by: Nicole Xin <nxin@google.com>
Full-path stress benchmark exercises the real producer->detector->controller
pipeline under concurrent load across multiple priority bands with leak
detection via counter assertions and memprofile support.

Follows the production sequence: EnqueueAndWait (admission) -> PreRequest
(tracking) -> ResponseBody (release).

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Signed-off-by: Tessa Pham <tepham@redhat.com>
* add FirstTokenTimestamp to RequestContext

Signed-off-by: Tessa Pham <tepham@redhat.com>

* TTFT and TPOT histogram placeholders

Signed-off-by: Tessa Pham <tepham@redhat.com>

* define and register new TTFT and TPOT metrics

Signed-off-by: Tessa Pham <tepham@redhat.com>

* calculate FirstTokenTimestamp and record metrics

Signed-off-by: Tessa Pham <tepham@redhat.com>

* add tests

Signed-off-by: Tessa Pham <tepham@redhat.com>

* format

Signed-off-by: Tessa Pham <tepham@redhat.com>

* fix metrics name conflicts

Signed-off-by: Tessa Pham <tepham@redhat.com>

* retrigger CI

Signed-off-by: Tessa Pham <tepham@redhat.com>

* condense TTFT buckets above 120s

Signed-off-by: Tessa Pham <tepham@redhat.com>

* condense TPOT buckets above 2s

Signed-off-by: Tessa Pham <tepham@redhat.com>

* clarify TTFT metric for non-streaming requests

Signed-off-by: Tessa Pham <tepham@redhat.com>

* add streaming to TPOT metric name and clarify desc

Signed-off-by: Tessa Pham <tepham@redhat.com>

* add fairness_id and priority labels

Signed-off-by: Tessa Pham <tepham@redhat.com>

* update tests to add new labels

Signed-off-by: Tessa Pham <tepham@redhat.com>

* read fairnessID and priority from reqContext

Signed-off-by: Tessa Pham <tepham@redhat.com>

* add streaming label

Signed-off-by: Tessa Pham <tepham@redhat.com>

* add nil check for SchedulingRequest

Signed-off-by: Tessa Pham <tepham@redhat.com>

* retrigger CI

Signed-off-by: Tessa Pham <tepham@redhat.com>

* update metric names and descriptions

Signed-off-by: Tessa Pham <tepham@redhat.com>

* retrigger CI

Signed-off-by: Tessa Pham <tepham@redhat.com>

---------

Signed-off-by: Tessa Pham <tepham@redhat.com>
* docs: add README for requestcontrol/dataproducer directory

Summarises the seven data producer plugins, their produced attributes,
lifecycle hooks, and inter-plugin dependency ordering.

Signed-off-by: Rahul Gurnani <rahulgurnani@google.com>

* Add dataproducer diagram

Signed-off-by: Rahul Gurnani <rahulgurnani@google.com>

---------

Signed-off-by: Rahul Gurnani <rahulgurnani@google.com>
Signed-off-by: weizhoublue <weizhou.lan@daocloud.io>
…m-d#1354)

* flowcontrol: move priority band provisioning off request hot path

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

* Address comments from Luke and Shmuel

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

* Address comments from Luke

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

---------

Signed-off-by: Guangya Liu <gyliu513@gmail.com>
llm-d#1585)

* fix: strip query parameters from request path before parser resolution

The HTTP/2 :path pseudo-header includes query parameters (e.g.
/v1/messages?beta=true), which caused parser suffix matching and
per-parser endpoint validation to fail for clients that append query
strings. Strip query parameters at header ingestion in
HandleRequestHeaders so all downstream consumers see a clean path.
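
A minimal sketch of the idea described above, assuming the path arrives as a plain string. The helper name `pathWithoutQuery` is hypothetical and is not taken from the repository; it returns a new value instead of rewriting the header, in the spirit of the later "non mutating" follow-up commit.

```go
package main

import (
	"fmt"
	"strings"
)

// pathWithoutQuery returns the portion of an HTTP/2 :path value that
// precedes the first '?'. The input string is left untouched.
// Hypothetical helper for illustration only.
func pathWithoutQuery(rawPath string) string {
	before, _, _ := strings.Cut(rawPath, "?")
	return before
}

func main() {
	raw := "/v1/messages?beta=true"
	clean := pathWithoutQuery(raw)

	fmt.Println(clean)                                    // /v1/messages
	fmt.Println(strings.HasSuffix(raw, "/v1/messages"))   // false: query string defeats suffix matching
	fmt.Println(strings.HasSuffix(clean, "/v1/messages")) // true
}
```

The two `HasSuffix` calls show the failure mode from the commit message: suffix matching on the raw `:path` fails, while the same match on the cleaned path succeeds.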

Signed-off-by: greg pereira <gpereira@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: greg pereira <grpereir@redhat.com>

* fix: add comment explaining query parameter stripping from :path

Signed-off-by: greg pereira <grpereir@redhat.com>

* make implementation non mutating

Signed-off-by: greg pereira <grpereir@redhat.com>

---------

Signed-off-by: greg pereira <gpereira@redhat.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>

# Conflicts:
#	OWNERS
#	go.mod
The midstream uses Prow OWNERS files for ownership; the upstream
CODEOWNERS references upstream-only teams and would auto-request
reviews that do not apply downstream.

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Kyle Lape <klape@redhat.com>
@coderabbitai

coderabbitai Bot commented Jun 12, 2026


Important

Review skipped

Too many files!

This PR contains 299 files, which is 149 over the limit of 150.

To get a review, narrow the scope:
• coderabbit review --type committed # exclude uncommitted changes
• coderabbit review --dir # limit to a subdirectory
• coderabbit review --base # compare against a closer base

⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 9e49187f-2b71-46d3-86b5-da10dd43672f

📥 Commits

Reviewing files that changed from the base of the PR and between 3227f69 and ca6ad69.

⛔ Files ignored due to path filters (1)
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/image.png is excluded by !**/*.png
📒 Files selected for processing (299)
  • .github/ISSUE_TEMPLATE/new-release.md
  • .github/actions/docker-build-and-push/action.yml
  • .github/actions/e2e-runner-setup/action.yml
  • .github/actions/trivy-scan/action.yml
  • .github/dependabot.yml
  • .github/workflows/auto-assign.yaml
  • .github/workflows/check-typos.yaml
  • .github/workflows/ci-build-images.yaml
  • .github/workflows/ci-dependency-review.yaml
  • .github/workflows/ci-pr-checks.yaml
  • .github/workflows/pr-hold-gate.yml
  • .github/workflows/prow-github.yml
  • .github/workflows/prow-pr-automerge.yml
  • .github/workflows/prow-pr-remove-lgtm.yml
  • .lychee.toml
  • AGENTS.md
  • DEVELOPMENT.md
  • Dockerfile.builder
  • Dockerfile.epp
  • Dockerfile.epp.konflux
  • Dockerfile.sidecar
  • Dockerfile.sidecar.konflux
  • LEADS.md
  • Makefile
  • Makefile.tools.mk
  • apix/config/v1alpha1/endpointpickerconfig_types.go
  • apix/config/v1alpha1/endpointpickerconfig_types_test.go
  • apix/config/v1alpha1/zz_generated.deepcopy.go
  • apix/v1alpha2/inferencemodelrewrite_types.go
  • apix/v1alpha2/zz_generated.deepcopy.go
  • cmd/epp/main.go
  • cmd/epp/runner/health.go
  • cmd/epp/runner/health_test.go
  • cmd/epp/runner/runner.go
  • cmd/pd-sidecar/main.go
  • config/charts/README.md
  • config/charts/llm-d-router-gateway/templates/epp.yaml
  • config/charts/llm-d-router-gateway/templates/gke.yaml
  • config/charts/llm-d-router-gateway/templates/inferenceextension.yaml
  • config/charts/llm-d-router-gateway/templates/rbac.yaml
  • config/charts/llm-d-router-standalone/templates/epp.yaml
  • config/charts/llm-d-router-standalone/templates/inferenceextension.yaml
  • config/charts/routerlib/templates/_config.yaml
  • config/charts/routerlib/templates/_deployment.yaml
  • config/charts/routerlib/templates/_gke.yaml
  • config/charts/routerlib/templates/_helpers.tpl
  • config/charts/routerlib/templates/_inferenceobjective.yaml
  • config/charts/routerlib/templates/_inferencepool.yaml
  • config/charts/routerlib/templates/_leader-election-rbac.yaml
  • config/charts/routerlib/templates/_rbac.yaml
  • config/charts/routerlib/templates/_sa-token-secret.yaml
  • config/charts/routerlib/templates/_service.yaml
  • config/charts/routerlib/templates/_servicemonitor.yaml
  • config/charts/routerlib/values.yaml
  • config/manifests/vllm/sim-deployment.yaml
  • config/manifests/vllm/sim-grpc-deployment.yaml
  • deploy/components/inference-gateway/deployment.yaml
  • deploy/config/epp-mm-embeddings-cache-config.yaml
  • deploy/environments/dev/README.md
  • deploy/environments/dev/p-d/patch-decode.yaml
  • docs/disaggregation.md
  • docs/plugin-metric-protocol.md
  • docs/plugin_debug.md
  • go.mod
  • hack/test-e2e.sh
  • hack/verify-helm.sh
  • pkg/common/error/error.go
  • pkg/common/observability/tracing/telemetry.go
  • pkg/common/observability/tracing/telemetry_test.go
  • pkg/common/routing/common.go
  • pkg/common/routing/common_test.go
  • pkg/epp/config/config.go
  • pkg/epp/config/loader/configloader.go
  • pkg/epp/config/loader/configloader_test.go
  • pkg/epp/config/loader/defaults.go
  • pkg/epp/config/loader/defaults_test.go
  • pkg/epp/config/loader/flowcontrol.go
  • pkg/epp/config/loader/flowcontrol_test.go
  • pkg/epp/config/loader/testdata_test.go
  • pkg/epp/config/loader/validation.go
  • pkg/epp/controller/inferenceobjective_reconciler.go
  • pkg/epp/datalayer/collector.go
  • pkg/epp/datalayer/collector_test.go
  • pkg/epp/datalayer/data_graph.go
  • pkg/epp/datalayer/data_graph_test.go
  • pkg/epp/datalayer/factory.go
  • pkg/epp/datalayer/logger/logger_test.go
  • pkg/epp/datalayer/runtime.go
  • pkg/epp/datalayer/runtime_endpoint_dispatch_test.go
  • pkg/epp/datastore/datastore.go
  • pkg/epp/datastore/datastore_test.go
  • pkg/epp/flowcontrol/benchmark/benchmark.go
  • pkg/epp/flowcontrol/benchmark/benchmark_test.go
  • pkg/epp/flowcontrol/config.go
  • pkg/epp/flowcontrol/config_test.go
  • pkg/epp/flowcontrol/contracts/mocks/mocks.go
  • pkg/epp/flowcontrol/contracts/registry.go
  • pkg/epp/flowcontrol/controller/controller.go
  • pkg/epp/flowcontrol/controller/controller_test.go
  • pkg/epp/flowcontrol/controller/internal/processor.go
  • pkg/epp/flowcontrol/controller/internal/processor_test.go
  • pkg/epp/flowcontrol/registry/config.go
  • pkg/epp/flowcontrol/registry/config_test.go
  • pkg/epp/flowcontrol/registry/registry.go
  • pkg/epp/flowcontrol/registry/registry_helpers_test.go
  • pkg/epp/flowcontrol/registry/registry_test.go
  • pkg/epp/framework/common/request/headers.go
  • pkg/epp/framework/common/request/headers_test.go
  • pkg/epp/framework/interface/datalayer/attributemap.go
  • pkg/epp/framework/interface/datalayer/attributemap_test.go
  • pkg/epp/framework/interface/datalayer/endpoint_metadata.go
  • pkg/epp/framework/interface/datalayer/endpoint_metadata_test.go
  • pkg/epp/framework/interface/plugin/plugins.go
  • pkg/epp/framework/interface/requestcontrol/plugins.go
  • pkg/epp/framework/interface/requesthandling/plugins.go
  • pkg/epp/framework/interface/requesthandling/types.go
  • pkg/epp/framework/interface/requesthandling/types_test.go
  • pkg/epp/framework/interface/scheduling/types.go
  • pkg/epp/framework/plugins/datalayer/attribute/concurrency/data_types.go
  • pkg/epp/framework/plugins/datalayer/attribute/metrics/data_types.go
  • pkg/epp/framework/plugins/datalayer/attribute/multimodal/data_types.go
  • pkg/epp/framework/plugins/datalayer/attribute/prefix/data_types.go
  • pkg/epp/framework/plugins/datalayer/attribute/prefix/data_types_test.go
  • pkg/epp/framework/plugins/datalayer/extractor/metrics/README.md
  • pkg/epp/framework/plugins/datalayer/extractor/metrics/extractor.go
  • pkg/epp/framework/plugins/datalayer/extractor/metrics/extractor_test.go
  • pkg/epp/framework/plugins/datalayer/extractor/metrics/factories.go
  • pkg/epp/framework/plugins/datalayer/extractor/metrics/loraspec.go
  • pkg/epp/framework/plugins/datalayer/extractor/metrics/mapping.go
  • pkg/epp/framework/plugins/datalayer/extractor/metrics/metrics_extraction_from_config_test.go
  • pkg/epp/framework/plugins/datalayer/extractor/metrics/spec_test.go
  • pkg/epp/framework/plugins/datalayer/source/http/datasource.go
  • pkg/epp/framework/plugins/datalayer/source/http/datasource_test.go
  • pkg/epp/framework/plugins/flowcontrol/saturationdetector/concurrency/detector_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/README.md
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/approximateprefix/README.md
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/approximateprefix/hashing.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/approximateprefix/hashing_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/approximateprefix/indexer.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/approximateprefix/indexer_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/approximateprefix/metrics.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/approximateprefix/metrics_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/approximateprefix/plugin.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/approximateprefix/plugin_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/approximateprefix/token_estimator.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/approximateprefix/token_estimator_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/approximateprefix/types.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/producer.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/producer_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/token_estimator.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/inflightload/token_estimator_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/multimodal/README.md
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/multimodal/export_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/multimodal/metrics.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/multimodal/metrics_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/multimodal/prerequest.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/multimodal/producer.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/multimodal/producer_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/README.md
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/blockkeys.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/producer.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/producer_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/utils.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/utils_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/predictedlatency/dataproducer_hooks.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/predictedlatency/latencypredictorclient/tests/Dockerfile
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/predictedlatency/metrics.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/predictedlatency/metrics_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/predictedlatency/plugin.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/predictedlatency/plugin_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/predictedlatency/prediction.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/predictedlatency/prediction_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/predictedlatency/requestcontrol_hooks.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/predictedlatency/requestcontrol_hooks_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/predictedlatency/training.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/predictedlatency/training_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/tokenizer/README.md
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/tokenizer/backend.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/tokenizer/estimate.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/tokenizer/estimate_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/tokenizer/mm_estimate.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/tokenizer/tokenizer.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/tokenizer/tokenizer_test.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/tokenizer/uds.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/tokenizer/vllm_http.go
  • pkg/epp/framework/plugins/requestcontrol/dataproducer/tokenizer/vllm_http_test.go
  • pkg/epp/framework/plugins/requesthandling/parsers/anthropic/README.md
  • pkg/epp/framework/plugins/requesthandling/parsers/anthropic/anthropic.go
  • pkg/epp/framework/plugins/requesthandling/parsers/anthropic/anthropic_test.go
  • pkg/epp/framework/plugins/requesthandling/parsers/openai/openai.go
  • pkg/epp/framework/plugins/requesthandling/parsers/openai/openai_test.go
  • pkg/epp/framework/plugins/requesthandling/parsers/passthrough/passthrough.go
  • pkg/epp/framework/plugins/requesthandling/parsers/passthrough/passthrough_test.go
  • pkg/epp/framework/plugins/requesthandling/parsers/vertexai/vertexai.go
  • pkg/epp/framework/plugins/requesthandling/parsers/vertexai/vertexai_test.go
  • pkg/epp/framework/plugins/requesthandling/parsers/vllmgrpc/vllmgrpc.go
  • pkg/epp/framework/plugins/requesthandling/parsers/vllmgrpc/vllmgrpc_test.go
  • pkg/epp/framework/plugins/requesthandling/parsers/vllmhttp/vllmhttp.go
  • pkg/epp/framework/plugins/requesthandling/parsers/vllmhttp/vllmhttp_test.go
  • pkg/epp/framework/plugins/scheduling/profilehandler/disagg/README.md
  • pkg/epp/framework/plugins/scheduling/profilehandler/disagg/disagg_headers_handler.go
  • pkg/epp/framework/plugins/scheduling/profilehandler/disagg/disagg_profile_handler.go
  • pkg/epp/framework/plugins/scheduling/profilehandler/disagg/disagg_profile_handler_test.go
  • pkg/epp/framework/plugins/scheduling/profilehandler/disagg/metrics.go
  • pkg/epp/framework/plugins/scheduling/profilehandler/disagg/metrics_test.go
  • pkg/epp/framework/plugins/scheduling/profilehandler/disagg/multimodal_helpers.go
  • pkg/epp/framework/plugins/scheduling/profilehandler/disagg/pd_profile_handler.go
  • pkg/epp/framework/plugins/scheduling/profilehandler/disagg/pd_profile_handler_test.go
  • pkg/epp/framework/plugins/scheduling/profilehandler/disagg/prefix_based_pd_decider.go
  • pkg/epp/framework/plugins/scheduling/profilehandler/disagg/prefix_based_pd_decider_test.go
  • pkg/epp/framework/plugins/scheduling/profilehandler/disagg/scheduler_test.go
  • pkg/epp/framework/plugins/scheduling/scorer/activerequest/active_request_test.go
  • pkg/epp/framework/plugins/scheduling/scorer/contextlengthaware/README.md
  • pkg/epp/framework/plugins/scheduling/scorer/contextlengthaware/context_length_aware.go
  • pkg/epp/framework/plugins/scheduling/scorer/contextlengthaware/context_length_aware_test.go
  • pkg/epp/framework/plugins/scheduling/scorer/mmcacheaffinity/README.md
  • pkg/epp/framework/plugins/scheduling/scorer/preciseprefixcache/README.md
  • pkg/epp/framework/plugins/scheduling/scorer/preciseprefixcache/legacy_producer.go
  • pkg/epp/framework/plugins/scheduling/scorer/preciseprefixcache/precise_prefix_cache.go
  • pkg/epp/framework/plugins/scheduling/scorer/preciseprefixcache/precise_prefix_cache_extractor_test.go
  • pkg/epp/framework/plugins/scheduling/scorer/preciseprefixcache/precise_prefix_cache_test.go
  • pkg/epp/framework/plugins/scheduling/scorer/preciseprefixcache/precise_prefix_cache_tokenized_test.go
  • pkg/epp/framework/plugins/scheduling/scorer/preciseprefixcache/precise_prefix_cache_uds_test.go
  • pkg/epp/framework/plugins/scheduling/scorer/preciseprefixcache/testdata/test-model/config.json
  • pkg/epp/framework/plugins/scheduling/scorer/preciseprefixcache/testdata/test-model/special_tokens_map.json
  • pkg/epp/framework/plugins/scheduling/scorer/preciseprefixcache/testdata/test-model/tokenizer.json
  • pkg/epp/framework/plugins/scheduling/scorer/preciseprefixcache/testdata/test-model/tokenizer_config.json
  • pkg/epp/framework/plugins/scheduling/scorer/preciseprefixcache/utils.go
  • pkg/epp/framework/plugins/scheduling/scorer/tokenload/token_load.go
  • pkg/epp/framework/plugins/scheduling/scorer/tokenload/token_load_test.go
  • pkg/epp/handlers/parsers.go
  • pkg/epp/handlers/parsers_test.go
  • pkg/epp/handlers/request_test.go
  • pkg/epp/handlers/response.go
  • pkg/epp/handlers/response_test.go
  • pkg/epp/handlers/server.go
  • pkg/epp/metrics/llm_d_router_metrics.go
  • pkg/epp/metrics/metrics.go
  • pkg/epp/metrics/metrics_test.go
  • pkg/epp/requestcontrol/director.go
  • pkg/epp/requestcontrol/director_test.go
  • pkg/epp/requestcontrol/plugin_executor.go
  • pkg/epp/requestcontrol/plugin_executor_test.go
  • pkg/epp/server/controller_config.go
  • pkg/epp/server/controller_config_test.go
  • pkg/epp/server/controller_manager.go
  • pkg/epp/server/plugin_state_debug.go
  • pkg/epp/server/plugin_state_debug_test.go
  • pkg/epp/server/runserver.go
  • pkg/epp/server/server_test.go
  • pkg/metrics/llm_d_router_metrics.go
  • pkg/sidecar/constants/constants.go
  • pkg/sidecar/proxy/chat_completions.go
  • pkg/sidecar/proxy/chat_completions_test.go
  • pkg/sidecar/proxy/connector_ec_common.go
  • pkg/sidecar/proxy/connector_ec_nixl.go
  • pkg/sidecar/proxy/connector_ec_nixl_test.go
  • pkg/sidecar/proxy/connector_ec_shared_storage.go
  • pkg/sidecar/proxy/connector_ec_shared_storage_test.go
  • pkg/sidecar/proxy/connector_epd_shared_storage.go
  • pkg/sidecar/proxy/connector_mooncake.go
  • pkg/sidecar/proxy/connector_mooncake_test.go
  • pkg/sidecar/proxy/connector_nixlv2.go
  • pkg/sidecar/proxy/connector_nixlv2_test.go
  • pkg/sidecar/proxy/connector_sglang.go
  • pkg/sidecar/proxy/connector_test.go
  • pkg/sidecar/proxy/decode.go
  • pkg/sidecar/proxy/options.go
  • pkg/sidecar/proxy/options_test.go
  • pkg/sidecar/proxy/proxy.go
  • pkg/sidecar/proxy/proxy_helpers.go
  • pkg/telemetry/tracing.go
  • release-notes.d/unreleased/1121.md
  • release-notes.d/unreleased/1160.md
  • release-notes.d/unreleased/1288.md
  • release-notes.d/unreleased/1402.md
  • release-notes.d/unreleased/1426.md
  • release-notes.d/unreleased/1436.md
  • release-notes.d/unreleased/1444.md
  • release-notes.d/unreleased/1449.md
  • release-notes.d/unreleased/1475.md
  • release-notes.d/unreleased/1488.md
  • release-notes.d/unreleased/1493.md
  • release-notes.d/unreleased/1509.md
  • release-notes.d/unreleased/1513.md
  • release-notes.d/unreleased/1515.md
  • release-notes.d/unreleased/1536.md
  • release-notes.d/unreleased/1539.md
  • release-notes.d/unreleased/1554.md
  • scripts/check-latest-tags.sh
  • scripts/compare-coverage.sh
  • scripts/kind-dev-env.sh
  • scripts/pull_images.sh
  • test/e2e/disruption_test.go
  • test/e2e/e2e_suite_test.go
  • test/e2e/e2e_test.go
  • test/e2e/epp/e2e_suite_test.go
  • test/e2e/epp/e2e_test.go
  • test/e2e/setup_test.go

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Based on changes in 05c59e4, where the
`pkg/telemetry` dir was removed.

Signed-off-by: Kyle Lape <klape@redhat.com>

@anishasthana anishasthana left a comment

Member


/lgtm

@kylape kylape merged commit 6831d5f into opendatahub-io:main Jun 12, 2026
25 checks passed