[sync] upstream llm-d/llm-d-router 2136215b [2026-06-11] #253
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
* Use build push action for image builds Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * cr review comments 3 and 4 Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> --------- Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
* fix(datalayer): dispatch endpoint update notifications Existing endpoint metadata changes updated the in-memory endpoint but did not notify endpoint-notification-source consumers. Dispatch EventAddOrUpdate when metadata changes so endpoint-aware plugins can refresh per-endpoint state. Signed-off-by: zhouyou9505 <zhouyou9505@gmail.com> * fix(datalayer): skip no-op endpoint updates Signed-off-by: zhouyou9505 <zhouyou9505@gmail.com> --------- Signed-off-by: zhouyou9505 <zhouyou9505@gmail.com>
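The behaviour described in this commit (notify on a real metadata change, stay silent on a no-op update) can be sketched roughly as follows. Only the `EventAddOrUpdate` name comes from the commit message; `Store`, `Upsert`, and the label-map representation are hypothetical and are not the repository's API.

```go
package main

import (
	"fmt"
	"maps"
)

// EventType mirrors the EventAddOrUpdate name from the commit message;
// everything else in this file is a hypothetical illustration.
type EventType string

const EventAddOrUpdate EventType = "AddOrUpdate"

// Store keeps per-endpoint labels and a callback for notification consumers.
type Store struct {
	endpoints map[string]map[string]string
	notify    func(EventType, string)
}

// Upsert stores the labels and dispatches a notification only when the
// endpoint is new or its labels differ from the stored copy.
func (s *Store) Upsert(name string, labels map[string]string) bool {
	old, ok := s.endpoints[name]
	if ok && maps.Equal(old, labels) {
		return false // no-op update: skip the notification
	}
	s.endpoints[name] = maps.Clone(labels)
	s.notify(EventAddOrUpdate, name)
	return true
}

func main() {
	count := 0
	s := &Store{
		endpoints: map[string]map[string]string{},
		notify:    func(EventType, string) { count++ },
	}
	s.Upsert("pod-a", map[string]string{"role": "decode"})
	s.Upsert("pod-a", map[string]string{"role": "decode"}) // unchanged, no event
	s.Upsert("pod-a", map[string]string{"role": "prefill"})
	fmt.Println("notifications:", count)
}
```

Storing a clone of the labels keeps a later mutation by the caller from silently defeating the equality check.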
…llm-d#1333) * refactor: flow control, move plugin resolution into the config loader Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> * apply suggestions Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> * change copyright heading Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> --------- Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
- default SELinux requires :z for the Go cache to be mounted - fix permission denied on mkdir /go/cache Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
…-d#1424) * ci: implement Phase 2 quality gates and reliability improvements - Enforce coverage regression gate: compare-coverage.sh now exits non-zero on regression, and a final enforcement step fails the PR check while allowing e2e tests and cache saves to complete. - Add dependency review workflow using actions/dependency-review-action to flag high-severity vulnerabilities in dependency changes on PRs. - Add builder image workflow to validate Dockerfile.builder on PRs and push to GHCR on merge, with content-hash tagging and GHA cache. Phase 2 item "Replace GHCR_TOKEN PAT with GITHUB_TOKEN" was already completed upstream -- image push uses github.token. Ref: llm-d#956 Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> * ci: address review feedback on Phase 2 PR Coverage gate: add configurable regression tolerance (default 2.0 percentage points). Regressions within the tolerance are reported but do not fail the build, covering cases like deleted high-coverage code or tests landing in a followup PR. Builder workflow: removed for now. It can be added later as a conditional job in ci-build-images.yaml, gated on Dockerfile.builder path changes, which keeps all image builds in one place. Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com> --------- Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>
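The tolerance rule for the coverage gate reduces to a single comparison. The real gate lives in compare-coverage.sh; the Go function below only illustrates the arithmetic, and its name and the treatment of a drop exactly equal to the tolerance are assumptions.

```go
package main

import "fmt"

// exceedsTolerance reports whether coverage dropped by more than
// tolerance percentage points between the base and the head revision.
// A drop exactly equal to the tolerance is treated as acceptable here,
// which is an assumption of this sketch.
func exceedsTolerance(base, head, tolerance float64) bool {
	return base-head > tolerance
}

func main() {
	const tolerance = 2.0 // default named in the commit message

	// A 1.5 point drop is reported but would not fail the build.
	fmt.Println(exceedsTolerance(80.0, 78.5, tolerance))
	// A 3.0 point drop is beyond the tolerance.
	fmt.Println(exceedsTolerance(80.0, 77.0, tolerance))
}
```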
Signed-off-by: ahg-g <ahg@google.com>
… and token representations (llm-d#1380) * add Prompt, Tokens, ExtractedCacheSalt to InferenceRequestBody Signed-off-by: bobzetian <bobzetian@google.com> * make tokenizedInput an array Signed-off-by: bobzetian <bobzetian@google.com> * refactor(types): introduce unified slice-of-struct fields for user inputs Introduce unified, protocol-agnostic slice-of-structs fields (Prompts []UnifiedPrompt and TokenInputs []TokenizedInput) in InferenceRequestBody to support batched user inputs and pre-tokenized token IDs cleanly across all parser extensions. Signed-off-by: bobzetian <bobzetian@google.com> * remove dead code and rename BlockTypeDocument Signed-off-by: bobzetian <bobzetian@google.com> --------- Signed-off-by: bobzetian <bobzetian@google.com>
Signed-off-by: ahg-g <ahg@google.com>
Start the InferenceObjective and InferenceModelRewrite reconcilers when either the llm-d.ai/v1alpha2 or legacy inference.networking.x-k8s.io/v1alpha2 resources are advertised by discovery. Add discovery tests for the supported API groups and partial resource availability. Signed-off-by: Richard Li <tianqi.li@oracle.com>
…m-d#1428) * feat: swap tokenizer USD by render Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: code review - need set HF_HOME in example Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com>
…pends on PR llm-d#1248 (llm-d#1288) * Conditional decode logic implementation Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com> * Conditional decoding logic depends on presence of 'Prefer: if-available' header, and does not depend on 'EPP-Phase' header value Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com> * The 'Prefer: if-available' gate is moved above profile-handler layer, so it works also without 'disagg-profile-handler' configured Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com> * Minor comment changes Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com> * Minor fix in comment syntax Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com> * Export PrefixCacheMatchInfoKey to align producer/consumer key lookup Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com> * Drop stale nil arg from conditional-decode Pick calls Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com> * Revert disagg_profile_handler and approx-prefix producer changes Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com> * Update pkg/common/routing/common.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: dmitripikus <46105577+dmitripikus@users.noreply.github.com> * Use errors.As to preserve typed scheduler errors through wrapping Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com> * Drop redundant PrefixCacheMatchInfoKey constant Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com> * Drop redundant comment in IsConditionalDecode Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com> * Log false-return reasons in primaryEndpointHasCachedPrefix Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com> * Cover wrong-type attribute branch in TestPrimaryEndpointHasCachedPrefix Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com> * Drop duplicated semantics from IsConditionalDecode godoc Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com> * Use datalayer.NewTestRuntime in conditional-decode test helper Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com> --------- Signed-off-by: Dmitri Pikus <DPIKUS@il.ibm.com> Signed-off-by: dmitripikus <46105577+dmitripikus@users.noreply.github.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com>
…y prompt…" (llm-d#1440) This reverts commit 6c41b08.
…h action (llm-d#1437) Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>
…eneric scorer (llm-d#1121) Collapse the all-in-one precise-prefix-cache-scorer into a wrapper that composes precise-prefix-cache-producer + the generic prefix-cache-scorer behind the legacy plugin type. Existing YAML keeps working unchanged; behavior tracks the new path because the two paths are now the same code. Factory modes: - Self-host: instantiate an internal producer + scorer pair from the legacy parameters and expose Scorer + DataProducer + PreRequest + EndpointExtractor by delegation. - Defer-to-existing: when a precise-prefix-cache-producer is already configured in the handle, return a prefix-cache-scorer pointed at it and ignore the legacy parameters. Multiple precise producers raise an error since the wrapper cannot disambiguate. The historical indexerConfig.tokenizersPoolConfig path is preserved through an isolated legacyProducer that owns its own tokenization.Pool: it pre-tokenizes the request prompt and stashes the result on request.Body.TokenizedPrompt before delegating to the embedded producer. The pool config is stripped from the inner indexer to avoid double-construction. Other behavior preserved from the original: - PreRequest nil-guards on target and prefill endpoint metadata. - Speculative TTL eviction bound to the callback's ctx. - ttlcache.Start() in place of cleanCachePeriodically. - OTel span llm_d.epp.scorer.prefix_cache on Score with the original attribute schema, so existing dashboards keep working. Deletes the heavy scorer's standalone test files (uds, tokenized, extractor) and testdata fixtures; the producer carries equivalent coverage. Adds a small wrapper-focused test file (interface contract, defer-mode behavior, legacy tokenizer pool path). README rewritten as a migration guide. Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
…mory (llm-d#1160) The routing-side prefix-cache indexer keeps one LRU entry per (pod, block), so a small block size (e.g., vLLM's default of 16) inflates indexer memory by ~64x. Apply a minBlockSizeTokens=64 floor uniformly in GetBlockSize, regardless of whether the value originates from an endpoint metric, the autotuned fallback, or a manual configuration. Routing intentionally measures matches at coarser granularity than the model server's true block size. The previous design asymmetrically honored manual values with a startup warning. Uniform clamping is simpler and provides the same memory bound without leaving an operator footgun in place. The clamp constant is a package var rather than const so prefix-match tests that exercise behavior with deliberately small block sizes can lower it via a test helper (disableMinBlockSizeClamp) without leaking a knob into the public configuration surface. Fixes llm-d#1158 Signed-off-by: Greg Neighbors <26003+gkneighb@users.noreply.github.com> Co-authored-by: Greg Neighbors <26003+gkneighb@users.noreply.github.com>
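A minimal sketch of the uniform floor described in this commit. The `minBlockSizeTokens` name and the value 64 come from the commit message; `clampBlockSize` is a hypothetical helper and does not reproduce the actual `GetBlockSize` signature.

```go
package main

import "fmt"

// minBlockSizeTokens is a package variable rather than a constant so that
// a test helper can lower it, as the commit message describes.
var minBlockSizeTokens = 64

// clampBlockSize applies the floor regardless of whether the value came from
// an endpoint metric, the autotuned fallback, or manual configuration.
func clampBlockSize(blockSize int) int {
	if blockSize < minBlockSizeTokens {
		return minBlockSizeTokens
	}
	return blockSize
}

func main() {
	fmt.Println(clampBlockSize(16))  // small block size is raised to the floor
	fmt.Println(clampBlockSize(128)) // larger values pass through unchanged
}
```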
…lm-d#1449) EPP cache configuration was hardcoded to use the new `llm-d.ai` API group for `InferenceObjective` and `InferenceModelRewrite` resources. This caused EPP to fail to start in clusters where only the legacy `inference.networking.x-k8s.io` CRDs were installed, even though the config loader detected them. This change: - Updates `ControllerConfig` to detect and store the actual `GroupVersion` found in the cluster. - Dynamically constructs the manager's `Scheme` using the detected `GroupVersion`s. - Configures the cache to watch the correct GVK dynamically. - Adds warning logs if both legacy and new CRDs are present, noting that EPP will prefer the new group. TAG=agy CONV=85a5b03d-2eab-46a0-8fe5-144ba375a5b7 Signed-off-by: ahg-g <ahg@google.com>
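The selection rule described here (prefer the new group, fall back to the legacy one, warn when both are present) might look like the sketch below. The two group names are taken from the commit message; `pickGroup` and its return shape are hypothetical and leave out the real discovery client and scheme construction.

```go
package main

import "fmt"

const (
	newGroup    = "llm-d.ai"
	legacyGroup = "inference.networking.x-k8s.io"
)

// pickGroup chooses the API group to watch from the groups advertised by
// discovery. It prefers the new group; both is true when the legacy group
// was also present so the caller can log a warning; ok is false when
// neither group is installed.
func pickGroup(discovered []string) (group string, both bool, ok bool) {
	var hasNew, hasLegacy bool
	for _, g := range discovered {
		switch g {
		case newGroup:
			hasNew = true
		case legacyGroup:
			hasLegacy = true
		}
	}
	switch {
	case hasNew:
		return newGroup, hasLegacy, true
	case hasLegacy:
		return legacyGroup, false, true
	default:
		return "", false, false
	}
}

func main() {
	group, both, ok := pickGroup([]string{legacyGroup, newGroup})
	fmt.Println(group, both, ok)
}
```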
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
* test: add Ginkgo Extended labels to slow scheduler e2e test blocks Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * split ci test workflow Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * split further Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * split image build and cache reuse Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * cache layers up to commit sha Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * remove e2e build aggregate Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * fix render sidecar removal Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * share e2e images as artifacts Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * cleanup Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * tidy e2e image loading Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * preload renderer image Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * readinessProbe renderer Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * fix e2e-image action call to use push and buildx-outputs Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * ci reviews Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> --------- Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: Sage <80211083+sagearc@users.noreply.github.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
* update: github action to scan image before push out - steps: build amd64 image, generate tarball to scan, gate on high or critical - if passed: build amd64 and arm64, push to registry Signed-off-by: Wen Zhou <wenzhou@redhat.com> * fix: typo Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* metrics: Add plugin name and type labels to EPP metrics Introduce plugin_name and plugin_type labels for the new EPP metrics with prefix llm_d_router_epp under LLMDRouterEndpointPickerSubsystem. This implements tracking of specific endpoint picker plugin details. Signed-off-by: Cong Liu <conliu@google.com> * test(predictedlatency): Add assertions for llmd predicted metrics Add assertions in TestRecordRequestLatencyMetrics for newly introduced llmd-prefixed predicted metrics and gauges, ensuring full test coverage for plugin name and type labels in predicted latency metrics. Signed-off-by: Cong Liu <conliu@google.com> --------- Signed-off-by: Cong Liu <conliu@google.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
* make sure skipped requests can also go through profiles Signed-off-by: bobzetian <bobzetian@google.com> * test(server): add assertions for skipped request routing Signed-off-by: bobzetian <bobzetian@google.com> * epp: unify skip routing and enforce RawPayload globally EPP previously had inconsistent behavior when request parsing was skipped (e.g., for unsupported gRPC paths in the vllmgrpc parser). The vllmgrpc parser returned a nil body on skip, which caused nil-pointer panics in the Scheduling Director when it attempted model rewrite or repackaging. This change unifies the skip story by enforcing a non-nil body with RawPayload globally at the EPP framework level (in server.go) whenever a parser signals Skip. This ensures the Director can safely execute smart routing, admission control, and subsetting for skipped requests without panics, while still bypassing the expensive response-phase interception. The RequestSkipped state comments and logging have also been improved to accurately reflect that skipped requests are smartly routed by the director rather than falling back to random endpoints. TEST=make test-unit Signed-off-by: bobzetian <bobzetian@google.com> * epp: simplify vertexai parser to return nil body on skip Following the global Skip payload-force implementation in server.go, individual parsers no longer need to create and return a dummy body with RawPayload on skip. This change simplifies the vertexai parser to return nil body on skip, aligning it with the vllmgrpc parser and keeping parser implementations boilerplate-free. The vertexai unit test has also been updated. TEST=make test-unit Signed-off-by: bobzetian <bobzetian@google.com> * refactor: rename Skip to SkipResponseProcessing and RequestSkipped to RequestResponseProcessingSkipped To improve readability, code clarity, and explicitly communicate the architectural intent of EPP skip behavior, this change renames the Skip field in the Parser ParseResult struct to SkipResponseProcessing.
Additionally, the corresponding internal state constant in server.go is renamed from RequestSkipped to RequestResponseProcessingSkipped. All parser implementations (openai, passthrough, vertexai, vllmgrpc) and their corresponding unit tests, along with the main server unit tests, have been updated to reflect this complete renaming. TEST=make test-unit Signed-off-by: bobzetian <bobzetian@google.com> * enforce populating Body.Payload when SkipResponseProcessing is true Signed-off-by: bobzetian <bobzetian@google.com> --------- Signed-off-by: bobzetian <bobzetian@google.com>
…e-pool-namespace flags (llm-d#1416) The --inference-pool flag (introduced alongside the deprecation) accepts "namespace/name" or just "name" and is the only supported way to select the InferencePool. INFERENCE_POOL_NAMESPACE and INFERENCE_POOL_NAME environment variables are removed for the same reason; INFERENCE_POOL remains. The SSRF-protection error message and the env-var test are updated to reference only the new flag and env var. Signed-off-by: Etai Lev Ran <elevran@gmail.com>
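The "namespace/name" or "name" form accepted by `--inference-pool` could be parsed along these lines. `parsePoolRef` and the default-namespace fallback are assumptions of this sketch; the commit message does not say how a bare name is resolved.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePoolRef splits "namespace/name" or a bare "name".
// Falling back to defaultNS for a bare name is an assumption of this sketch.
func parsePoolRef(ref, defaultNS string) (namespace, name string, err error) {
	ns, n, found := strings.Cut(ref, "/")
	if !found {
		ns, n = defaultNS, ref
	}
	// Reject empty parts and extra slashes such as "a/b/c".
	if ns == "" || n == "" || strings.Contains(n, "/") {
		return "", "", fmt.Errorf("invalid inference pool reference %q", ref)
	}
	return ns, n, nil
}

func main() {
	for _, ref := range []string{"serving/my-pool", "my-pool", "a/b/c"} {
		ns, name, err := parsePoolRef(ref, "default")
		fmt.Println(ns, name, err)
	}
}
```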
Corrected the GitHub username for the precise prefix cache. Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
* Add support to the EC Nixl connector. Signed-off-by: Revital Sur <eres@il.ibm.com> * Cleanup code. Signed-off-by: Revital Sur <eres@il.ibm.com> * Update pkg/sidecar/proxy/connector_epd_ec_common.go Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Revital Sur <eres@il.ibm.com> * Update pkg/sidecar/proxy/connector_epd_ec_common.go Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Revital Sur <eres@il.ibm.com> * Update pkg/sidecar/proxy/connector_epd_ec_common.go Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Revital Sur <eres@il.ibm.com> * Update pkg/sidecar/proxy/connector_epd_ec.go Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Revital Sur <eres@il.ibm.com> * Address review comments. Signed-off-by: Revital Sur <eres@il.ibm.com> * Update pkg/sidecar/proxy/connector_epd_shared_storage.go Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Revital Sur <eres@il.ibm.com> * Fix lint error. Signed-off-by: Revital Sur <eres@il.ibm.com> * Replace V(4) calls in connector_epd_shared_storage.go to V(logging.DEBUG) Signed-off-by: Revital Sur <eres@il.ibm.com> * Address review comments. Signed-off-by: Revital Sur <eres@il.ibm.com> * Address review comments. Signed-off-by: Revital Sur <eres@il.ibm.com> * Address review comments. Signed-off-by: Revital Sur <eres@il.ibm.com> * Address review comments. Signed-off-by: Revital Sur <eres@il.ibm.com> * Address review comments. Signed-off-by: Revital Sur <eres@il.ibm.com> * Address review comments. Signed-off-by: Revital Sur <eres@il.ibm.com> --------- Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
…:latest (llm-d#1457) * Add script and Makefile target to check simulator image does not use :latest Add scripts/check-latest-tags.sh which scans YAML files for llm-d-inference-sim:latest references and exits non-zero if any are found. Wire it into the presubmit gate via a new check-latest-tags Makefile target. Closes llm-d#1306 Signed-off-by: Aswin Raj <aswinraj7e@gmail.com> Signed-off-by: apollofps <aswinraj7e@gmail.com> * Expand latest-tag check to all images and add --warn mode Expand the scope from simulator-only to all container image references per reviewer feedback. Add a --warn flag that prints violations but exits 0; the presubmit target uses warn mode until existing :latest references in GPU manifests are pinned. Signed-off-by: Aswin Raj <aswinraj7e@gmail.com> Signed-off-by: apollofps <aswinraj7e@gmail.com> * Address Copilot review: harden grep portability and prune .git - Use grep -Hn instead of -rn so filenames are always present in output even when xargs passes a single file. - Prune .git, vendor, and node_modules from the find tree. - Match "[[:space:]]image:" as a YAML key instead of bare substring to avoid false positives from unrelated fields. Signed-off-by: Aswin Raj <aswinraj7e@gmail.com> Signed-off-by: apollofps <aswinraj7e@gmail.com> --------- Signed-off-by: Aswin Raj <aswinraj7e@gmail.com> Signed-off-by: apollofps <aswinraj7e@gmail.com>
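The actual check is a shell script (scripts/check-latest-tags.sh). The Go sketch below only illustrates the matching idea mentioned in the review notes, where `image:` is matched as a YAML key preceded by whitespace rather than as a bare substring. The regular expression and `findLatest` are assumptions, not a port of the script.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// latestImage matches a YAML "image:" key whose value ends in ":latest".
// The key must sit at line start or after whitespace, so keys such as
// "baseimage:" do not match. The exact pattern is an assumption.
var latestImage = regexp.MustCompile(`(^|\s)image:\s*["']?\S+:latest["']?\s*$`)

// findLatest returns "line N: text" for every offending line in a YAML document.
func findLatest(yaml string) []string {
	var hits []string
	sc := bufio.NewScanner(strings.NewReader(yaml))
	for n := 1; sc.Scan(); n++ {
		if latestImage.MatchString(sc.Text()) {
			hits = append(hits, fmt.Sprintf("line %d: %s", n, strings.TrimSpace(sc.Text())))
		}
	}
	return hits
}

func main() {
	doc := "containers:\n  - image: ghcr.io/example/sim:latest\n  - image: ghcr.io/example/sim:v0.1.0\n"
	for _, h := range findLatest(doc) {
		fmt.Println(h)
	}
}
```

A warn mode like the one in the commit would print the same hits but exit with status 0.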
* disagg: add test coverage for encode edge cases Signed-off-by: namgyu-youn <namgyu.dev@gmail.com> * test(disagg): replace unicode arrow with ASCII dash in test name Swap the non-ASCII → character for a plain - in the EPD test case name to stay consistent with the ASCII-only style used elsewhere. Signed-off-by: namgyu-youn <namgyu.dev@gmail.com> --------- Signed-off-by: namgyu-youn <namgyu.dev@gmail.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
* clean up references to inference-extension Signed-off-by: ahg-g <ahg@google.com> * Updated some variables to use epp prefix instead of router Signed-off-by: ahg-g <ahg@google.com> * fixed the helm verify error Signed-off-by: ahg-g <ahg@google.com> --------- Signed-off-by: ahg-g <ahg@google.com>
…timator for chatCompletion and messages (llm-d#1554) * add tool estimation Signed-off-by: bobzetian <bobzetian@google.com> * move tools before system Signed-off-by: bobzetian <bobzetian@google.com> --------- Signed-off-by: bobzetian <bobzetian@google.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
* Add EPP model server protocol to docs Signed-off-by: BenjaminBraunDev <benjaminbraun@google.com> * Rename model-server-protocol to plugin-metric-protocol, adjust to focus on plugins adding a producer and consumer section to table Signed-off-by: BenjaminBraunDev <benjaminbraun@google.com> * Break metrics out of table, individual sections for each metric Signed-off-by: BenjaminBraunDev <benjaminbraun@google.com> * Remove doc link to fix lint Signed-off-by: BenjaminBraunDev <benjaminbraun@google.com> --------- Signed-off-by: BenjaminBraunDev <benjaminbraun@google.com>
* Add plugin state debug endpoint
Expose a metrics-server debug handler that dumps sanitized state from plugins that opt in via a new StateDumper interface. Add a concrete in-flight load state dumper and focused unit coverage for handler behavior and deterministic output.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
* Address Copilot review feedback
- Use copyright year 2025 in new pkg/epp/server files to match
surrounding files.
- Switch debug payload to {timestamp, plugins{name: {type, state}}}
to match the shape requested in issue llm-d#1074; inject a clock
function so tests stay deterministic.
- Gate /debug/plugins/state behind a new EnablePluginStateDebug option
(defaults to true, mirroring EnablePprof) and skip registration with
a log message when the plugin handle is unavailable instead of
failing setup.
- Document that InFlightLoadProducer.DumpState snapshots its request
and token trackers under separate locks and is therefore not
internally atomic.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
* Make plugin state debug endpoint opt-in
Default EnablePluginStateDebug to false so upgrades do not silently expose /debug/plugins/state on the metrics/admin server. Keep the explicit --enable-plugin-state-debug flag for operators who want the endpoint and cover both default and flag behavior in options tests.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
* Fix module imports after rebase
Update the new plugin state debug server files to use the current github.com/llm-d/llm-d-router module path after rebasing onto the latest main branch.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
* Address plugin state debug review comments
Use explicit JSON state dumps so StateDumper implementations own serialization, document when state dumps should be used instead of metrics, bound the in-flight load debug payload to the busiest endpoints, and register the endpoint through a generic metrics handler registrar with localhost-only access instead of a separate enable flag.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
* Document plugin state debug endpoint
Document the plugin state debug endpoint and include non-dumper plugins in the response with an explanatory message so operators can distinguish unsupported state collection from missing plugin configuration.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
* Use metrics server access controls for plugin state debug
Rely on the metrics/admin server exposure and authentication controls for /debug/plugins/state instead of adding a handler-level localhost check, matching the other handlers registered on the same server.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
* Fix unused import after rebase
Remove a stale reflect import left from the rebase conflict resolution in the in-flight load producer.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
* Keep plugin debug state responses partial
Report per-plugin dump errors in each plugin entry instead of failing the entire endpoint, and document that the endpoint is not exposed by the standalone file-discovery metrics mux.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
* Fix plugin debug state lint
Remove the unused error return from collectPluginState now that per-plugin dump errors are reported in each plugin entry.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
---------
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…lm-d#1193)
- similar to how sglang is doing but with different json payload
- new --kv-connector=mooncake or --mooncake-bootstrap-port flag on main
- http /query to prefill pod to get remote_engine_id from rank 0 (or keep next one) per each request
- use default 8998 on prefill pod to concat remote_bootstrap_addr
- generate uuid for transfer_id
- concurrent request to prefill and decode
- add docs for how we use different connector
- add new connector_mooncake_test.go with tests for P/D requests
- add mooncake to options_test.go connector validation
- change to store engine_id into LRU only need first request to do query
- update: add support for multi-rank
- set request header X-data-parallel-rank with rank_id
- won't have mooncake in common connector tests
- update: bump llm-d-inference-sim version to include new endpoint for query
- update: force set mooncake port to 8000 in test to work with llm-d-sim
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
…e producer (llm-d#1576) The multimodal encoder-cache producer defined its own llmdSubsystem constant set to "llm_d_router_epp", duplicating the centralized LLMDRouterEndpointPickerSubsystem in pkg/epp/metrics. Same value today; imports the constant to remove the drift risk and match the pattern in approximateprefix and predictedlatency producers. Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
* fix race condition in logger test Signed-off-by: Nicole Xin <nxin@google.com> * fix typo Signed-off-by: Nicole Xin <nxin@google.com> --------- Signed-off-by: Nicole Xin <nxin@google.com>
Full-path stress benchmark exercises the real producer->detector->controller pipeline under concurrent load across multiple priority bands with leak detection via counter assertions and memprofile support. Follows the production sequence: EnqueueAndWait (admission) -> PreRequest (tracking) -> ResponseBody (release). Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
Signed-off-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com> Co-authored-by: llm-d-router-release-notes[bot] <287676111+llm-d-router-release-notes[bot]@users.noreply.github.com>
Signed-off-by: Tessa Pham <tepham@redhat.com>
* add FirstTokenTimestamp to RequestContext Signed-off-by: Tessa Pham <tepham@redhat.com> * TTFT and TPOT histogram placeholders Signed-off-by: Tessa Pham <tepham@redhat.com> * define and register new TTFT and TPOT metrics Signed-off-by: Tessa Pham <tepham@redhat.com> * calculate FirstTokenTimestamp and record metrics Signed-off-by: Tessa Pham <tepham@redhat.com> * add tests Signed-off-by: Tessa Pham <tepham@redhat.com> * format Signed-off-by: Tessa Pham <tepham@redhat.com> * fix metrics name conflicts Signed-off-by: Tessa Pham <tepham@redhat.com> * retrigger CI Signed-off-by: Tessa Pham <tepham@redhat.com> * condense TTFT buckets above 120s Signed-off-by: Tessa Pham <tepham@redhat.com> * condense TPOT buckets above 2s Signed-off-by: Tessa Pham <tepham@redhat.com> * clarify TTFT metric for non-streaming requests Signed-off-by: Tessa Pham <tepham@redhat.com> * add streaming to TPOT metric name and clarify desc Signed-off-by: Tessa Pham <tepham@redhat.com> * add fairness_id and priority labels Signed-off-by: Tessa Pham <tepham@redhat.com> * update tests to add new labels Signed-off-by: Tessa Pham <tepham@redhat.com> * read fairnessID and priority from reqContext Signed-off-by: Tessa Pham <tepham@redhat.com> * add streaming label Signed-off-by: Tessa Pham <tepham@redhat.com> * add nil check for SchedulingRequest Signed-off-by: Tessa Pham <tepham@redhat.com> * retrigger CI Signed-off-by: Tessa Pham <tepham@redhat.com> * update metric names and descriptions Signed-off-by: Tessa Pham <tepham@redhat.com> * retrigger CI Signed-off-by: Tessa Pham <tepham@redhat.com> --------- Signed-off-by: Tessa Pham <tepham@redhat.com>
* docs: add README for requestcontrol/dataproducer directory Summarises the seven data producer plugins, their produced attributes, lifecycle hooks, and inter-plugin dependency ordering. Signed-off-by: Rahul Gurnani <rahulgurnani@google.com> * Add dataproducer diagram Signed-off-by: Rahul Gurnani <rahulgurnani@google.com> --------- Signed-off-by: Rahul Gurnani <rahulgurnani@google.com>
Signed-off-by: weizhoublue <weizhou.lan@daocloud.io>
…m-d#1354) * flowcontrol: move priority band provisioning off request hot path Signed-off-by: Guangya Liu <gyliu513@gmail.com> * Address comments from Luke and Shmuel Signed-off-by: Guangya Liu <gyliu513@gmail.com> * Address comments from Luke Signed-off-by: Guangya Liu <gyliu513@gmail.com> --------- Signed-off-by: Guangya Liu <gyliu513@gmail.com>
llm-d#1585) * fix: strip query parameters from request path before parser resolution The HTTP/2 :path pseudo-header includes query parameters (e.g. /v1/messages?beta=true), which caused parser suffix matching and per-parser endpoint validation to fail for clients that append query strings. Strip query parameters at header ingestion in HandleRequestHeaders so all downstream consumers see a clean path. Signed-off-by: greg pereira <gpereira@redhat.com> Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: greg pereira <grpereir@redhat.com> * fix: add comment explaining query parameter stripping from :path Signed-off-by: greg pereira <grpereir@redhat.com> * make implementation non mutating Signed-off-by: greg pereira <grpereir@redhat.com> --------- Signed-off-by: greg pereira <gpereira@redhat.com> Signed-off-by: greg pereira <grpereir@redhat.com> Co-authored-by: Claude <noreply@anthropic.com>
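A non-mutating version of the query-string stripping described in this fix can be sketched as follows. `pathWithoutQuery` is a hypothetical helper; the real change lives in `HandleRequestHeaders`.

```go
package main

import (
	"fmt"
	"strings"
)

// pathWithoutQuery returns the :path value with any query string removed.
// It returns a new string and leaves the caller's header value untouched.
func pathWithoutQuery(rawPath string) string {
	path, _, _ := strings.Cut(rawPath, "?")
	return path
}

func main() {
	raw := "/v1/messages?beta=true" // example path from the commit message
	fmt.Println(pathWithoutQuery(raw))
	fmt.Println(raw) // the original value is unchanged
}
```

With the query removed, suffix matching against a path such as `/v1/messages` sees the same string whether or not the client appended parameters.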
Signed-off-by: Wen Zhou <wenzhou@redhat.com> # Conflicts: # OWNERS # go.mod
The midstream uses Prow OWNERS files for ownership; the upstream CODEOWNERS references upstream-only teams and would auto-request reviews that do not apply downstream. Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Kyle Lape <klape@redhat.com>
Review skipped: too many files. This PR contains 299 files, which is 149 over the limit of 150.
Based on changes in 05c59e4, where the `pkg/telemetry` dir was removed. Signed-off-by: Kyle Lape <klape@redhat.com>
Syncs llm-d/llm-d-router main into opendatahub-io/llm-d-router main.
Upstream commit: llm-d@2136215