Skip to content

Backport upstream v0.7.1#175

Merged
openshift-merge-bot[bot] merged 75 commits into
opendatahub-io:mainfrom
Gregory-Pereira:backport-upstream-v0.7.1
Apr 3, 2026
Merged

Backport upstream v0.7.1#175
openshift-merge-bot[bot] merged 75 commits into
opendatahub-io:mainfrom
Gregory-Pereira:backport-upstream-v0.7.1

Conversation

@Gregory-Pereira

@Gregory-Pereira Gregory-Pereira commented Apr 3, 2026

Copy link
Copy Markdown
Member

cc @anishasthana @zdtsw

Summary by CodeRabbit

  • New Features

    • Added support for disaggregated Encode/Prefill/Decode (E/P/D) inference topologies with new scheduling handlers and deciders.
    • Introduced external tokenizer via Unix Domain Socket (UDS) for distributed token processing.
    • Added Prometheus monitoring integration with ServiceMonitor/PodMonitor resources.
  • Improvements

    • Refactored build infrastructure to containerized tooling with builder image.
    • Enhanced configuration management with centralized options and validation.
    • Expanded test coverage with disruption and benchmarking scenarios.

yangligt2 and others added 30 commits February 19, 2026 19:28
* Fix panic in SGLang proxy handling of concurrent requests

Signed-off-by: YANG LI <yangligt@google.com>

* Add concurrency unit test for SGLang context logic

Signed-off-by: YANG LI <yangligt@google.com>

---------

Signed-off-by: YANG LI <yangligt@google.com>
* Add opentelemetry tracing

    Add centralized telemetry package and custom spans
    following the llm-d distributed tracing proposal.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>

* update Dockerfile.sidecar

Signed-off-by: sallyom <somalley@redhat.com>

* tracing: remove extra success results & startup spans and cleanup

Signed-off-by: sallyom <somalley@redhat.com>

* fix: avoid os.Exit bypassing defer in main

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>

* fix: address review nits for tracing PR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>

* test: add edge case tests for StripScheme

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>

* remove extra comments from sidecar spans

Signed-off-by: sallyom <somalley@redhat.com>

* fix lint error

Signed-off-by: sallyom <somalley@redhat.com>

* protect against segfault on tests

Signed-off-by: greg pereira <grpereir@redhat.com>

---------

Signed-off-by: sallyom <somalley@redhat.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: greg pereira <grpereir@redhat.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
…m-d#661)

Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.39.0 to 1.40.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.40.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.40.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…dates (llm-d#662)

Bumps the go-dependencies group with 2 updates in the / directory: [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) and [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go).


Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.64.0 to 0.65.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.64.0...zpages/v0.65.0)

Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.39.0 to 1.40.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.40.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
  dependency-version: 0.65.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
  dependency-version: 1.40.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
…build (llm-d#664)

Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Bumps the kubernetes group with 5 updates:

| Package | From | To |
| --- | --- | --- |
| [k8s.io/api](https://github.com/kubernetes/api) | `0.34.4` | `0.34.5` |
| [k8s.io/apiextensions-apiserver](https://github.com/kubernetes/apiextensions-apiserver) | `0.34.4` | `0.34.5` |
| [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) | `0.34.4` | `0.34.5` |
| [k8s.io/client-go](https://github.com/kubernetes/client-go) | `0.34.4` | `0.34.5` |
| [k8s.io/component-base](https://github.com/kubernetes/component-base) | `0.34.4` | `0.34.5` |


Updates `k8s.io/api` from 0.34.4 to 0.34.5
- [Commits](kubernetes/api@v0.34.4...v0.34.5)

Updates `k8s.io/apiextensions-apiserver` from 0.34.4 to 0.34.5
- [Release notes](https://github.com/kubernetes/apiextensions-apiserver/releases)
- [Commits](kubernetes/apiextensions-apiserver@v0.34.4...v0.34.5)

Updates `k8s.io/apimachinery` from 0.34.4 to 0.34.5
- [Commits](kubernetes/apimachinery@v0.34.4...v0.34.5)

Updates `k8s.io/client-go` from 0.34.4 to 0.34.5
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](kubernetes/client-go@v0.34.4...v0.34.5)

Updates `k8s.io/component-base` from 0.34.4 to 0.34.5
- [Commits](kubernetes/component-base@v0.34.4...v0.34.5)

---
updated-dependencies:
- dependency-name: k8s.io/api
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apiextensions-apiserver
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apimachinery
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/client-go
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/component-base
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…dates (llm-d#674)

Bumps the go-dependencies group with 2 updates in the / directory: [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) and [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go).


Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.65.0 to 0.66.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.65.0...zpages/v0.66.0)

Updates `go.opentelemetry.io/otel` from 1.40.0 to 1.41.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0)

Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.40.0 to 1.41.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0)

Updates `go.opentelemetry.io/otel/sdk` from 1.40.0 to 1.41.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0)

Updates `go.opentelemetry.io/otel/trace` from 1.40.0 to 1.41.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
  dependency-version: 0.66.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/trace
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…m-d#671)

Bumps [lycheeverse/lychee-action](https://github.com/lycheeverse/lychee-action) from 2.7.0 to 2.8.0.
- [Release notes](https://github.com/lycheeverse/lychee-action/releases)
- [Commits](lycheeverse/lychee-action@v2.7.0...v2.8.0)

---
updated-dependencies:
- dependency-name: lycheeverse/lychee-action
  dependency-version: 2.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* ci: add dev image workflow for main and release branches

Build and push -dev variants of EPP and sidecar container images
on pushes to main and release-* branches, tagged with commit SHA.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* ci: extract reusable build workflow and tag dev images by branch

Refactor ci-release and ci-dev to call a shared ci-build-images
reusable workflow, reducing duplication. Tag dev images with the
branch name instead of commit SHA so each branch has exactly one
image that gets overwritten on push, avoiding image accumulation.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Newlines at EOF

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

---------

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.43.5 to 1.44.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.43.5...v1.44.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.44.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The v0.44.1 release was removed from GitHub, causing 404 errors
in the trivy-scan action. Update to the latest available version.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
* Allow sidecar server to reload TLS certificates

Enables TLS certificates to be rotated without restarting
sidecar and vLLM deployments.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Pass certPath to reloader

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Improvements

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

---------

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Bumps the go-dependencies group with 7 updates:

| Package | From | To |
| --- | --- | --- |
| [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) | `0.66.0` | `0.67.0` |
| [go.opentelemetry.io/otel](https://github.com/open-telemetry/opentelemetry-go) | `1.41.0` | `1.42.0` |
| [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go) | `1.41.0` | `1.42.0` |
| [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) | `1.41.0` | `1.42.0` |
| [go.opentelemetry.io/otel/trace](https://github.com/open-telemetry/opentelemetry-go) | `1.41.0` | `1.42.0` |
| [golang.org/x/sync](https://github.com/golang/sync) | `0.19.0` | `0.20.0` |
| [google.golang.org/grpc](https://github.com/grpc/grpc-go) | `1.79.1` | `1.79.2` |


Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.66.0 to 0.67.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.66.0...zpages/v0.67.0)

Updates `go.opentelemetry.io/otel` from 1.41.0 to 1.42.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.41.0...v1.42.0)

Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.41.0 to 1.42.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.41.0...v1.42.0)

Updates `go.opentelemetry.io/otel/sdk` from 1.41.0 to 1.42.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.41.0...v1.42.0)

Updates `go.opentelemetry.io/otel/trace` from 1.41.0 to 1.42.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.41.0...v1.42.0)

Updates `golang.org/x/sync` from 0.19.0 to 0.20.0
- [Commits](golang/sync@v0.19.0...v0.20.0)

Updates `google.golang.org/grpc` from 1.79.1 to 1.79.2
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.79.1...v1.79.2)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
  dependency-version: 0.67.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/trace
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: golang.org/x/sync
  dependency-version: 0.20.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: google.golang.org/grpc
  dependency-version: 1.79.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…gs (llm-d#682)

* feat(sidecar): simplify TLS command line options with StringSlice flags

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

* address comments from Etai

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

* keep deprecated flags

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

---------

Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: roytman <roytman@il.ibm.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* fix: simplify InferencePool flag to namespace/name format

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

* Address comments from Etai

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

---------

Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* initial E/PD extension of the sidecar

Signed-off-by: roytman <roytman@il.ibm.com>

* add Encoding span.SetAttributes, fix bug in connector_sglang_test.go

Signed-off-by: roytman <roytman@il.ibm.com>

* fix e2e tests

Signed-off-by: roytman <roytman@il.ibm.com>

* Update pkg/sidecar/proxy/connector_epd_shared_storage.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Alexey Roytman <roytman@il.ibm.com>

* flags order

Signed-off-by: roytman <roytman@il.ibm.com>

* fix comments

Signed-off-by: roytman <roytman@il.ibm.com>

* remove redundant comment

Signed-off-by: roytman <roytman@il.ibm.com>

---------

Signed-off-by: roytman <roytman@il.ibm.com>
Signed-off-by: Alexey Roytman <roytman@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
roytman and others added 17 commits March 26, 2026 18:29
* Prevent mismatch between new and deprecated APIs

Signed-off-by: roytman <roytman@il.ibm.com>

* simplify DisaggProfileHandlerFactory

Signed-off-by: roytman <roytman@il.ibm.com>

---------

Signed-off-by: roytman <roytman@il.ibm.com>
…lm-d#770)

* - import IGW 1.4.0
- turn tokenizer PrepareData plugin into a scorer and wrap the latter in no-build flag

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* address review

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
* Containerize build system with builder image

Introduce Dockerfile.builder based on the same Go image as production,
with all build tools (golangci-lint, typos, kind, kubectl, podman)
at pinned versions.

All build, test, lint, and format targets now run inside the builder
container. Go module and build caches use named volumes for persistence
across runs. Host requirements are reduced to a container runtime and
git. kubectl and envsubst are only needed for cluster deployment targets.

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* CI: remove host Go/lint setup, use containerized targets

All CI steps now delegate to make targets that run inside the builder
container. Remove Go setup, go mod tidy, and golangci-lint-action.
Expand paths filter to cover Dockerfile.builder, go.sum, and
.golangci.yml.

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Fix issues detected by linter

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Add builder shell targets for debugging

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* e2e-tests: load images into kind directly to save space

Signed-off-by: Antonio Cardace <acardace@redhat.com>

---------

Signed-off-by: Antonio Cardace <acardace@redhat.com>
Signed-off-by: roytman <roytman@il.ibm.com>
* implement context-length-aware plugin (scorer/filter)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* fix newline

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* apply review suggestion

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* code improvements

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Update pkg/plugins/multi/context_length_aware_bench_test.go

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>

* refactor UX

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* add tokenization and fix tests

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* use tokenizer plugin

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* fix lint

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* implement proximity for out-of-range requests

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* addressed comments

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* remove support for multiple ranges per pod

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* rebase on tokenizer PrepareData -> Scorer change

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* addressed comments

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
* change simulator version to v0.8.1

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update simulator version

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* make e2e test timeout 20min

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
…gger() to inject it via zap.Encoder() (llm-d#778)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* Add E2E_KEEP_CLUSTER_ON_FAILURE env variable

Signed-off-by: roytman <roytman@il.ibm.com>

* Update Development guide

Signed-off-by: roytman <roytman@il.ibm.com>

* fix comments

Signed-off-by: roytman <roytman@il.ibm.com>

* Update test/e2e/e2e_suite_test.go

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Alexey Roytman <roytman@il.ibm.com>

---------

Signed-off-by: roytman <roytman@il.ibm.com>
Signed-off-by: Alexey Roytman <roytman@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
…nizer chat rendering (llm-d#781)

* fix chat-completions -> RenderChatRequest building

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* add tests

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Centralize MM conversion helper, add unit tests

Address review feedback:
- Export ChatCompletionsToRenderChatRequest in preparedata package
- Remove duplicate conversion in scorer, call shared helper
- Add table-driven tests covering single image, multiple images,
  system+MM user, multi-turn with image, and text-only (no regression)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Move MM conversion test to preparedata package

The test exercises ChatCompletionsToRenderChatRequest which lives in
preparedata — keep it co-located with the function it tests.

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Add test verifying structured MM content reaches RenderChat

TestTokenizerScorer_RenderChat_ForwardsStructuredContent captures the
RenderChatRequest passed to the mock tokenizer and asserts that
image_url content blocks from Content.Structured are forwarded
correctly through the Score -> tokenize -> ChatCompletionsToRenderChatRequest
path. Also verifies MMFeatures propagate to CycleState.

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
# Conflicts:
#	.github/actions/trivy-scan/action.yml
#	.github/workflows/ci-build-images.yaml
#	.github/workflows/ci-dev.yaml
#	DEVELOPMENT.md
#	Makefile
#	OWNERS
#	cmd/pd-sidecar/main.go
#	go.mod
#	go.sum
#	pkg/plugins/pre-request/pd_prerequest.go
#	pkg/plugins/profile/pd_profile_handler.go
#	pkg/plugins/scorer/precise_prefix_cache.go
#	pkg/sidecar/proxy/chat_completions.go
#	pkg/sidecar/proxy/connector_nixlv2.go
#	pkg/sidecar/proxy/connector_nixlv2_test.go
#	pkg/sidecar/proxy/connector_sglang.go
#	pkg/sidecar/proxy/connector_sglang_test.go
#	pkg/sidecar/proxy/connector_test.go
#	pkg/sidecar/proxy/proxy.go
#	pkg/sidecar/proxy/proxy_test.go
@github-actions

github-actions Bot commented Apr 3, 2026

Copy link
Copy Markdown

Cannot approve the pull request: Error: openshift-ci[bot] is not included in the approvers role in the OWNERS file

@coderabbitai

coderabbitai Bot commented Apr 3, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This pull request refactors the inference scheduling system to support Encode/Prefill/Decode (E/P/D) disaggregation with Unix Domain Socket–based tokenization. It containerizes the build process via a new builder image, updates GitHub Actions workflows, introduces a unified disaggregation profile handler and multiple new scheduling plugins (tokenizer, context-length-aware, multimodal-decider), refactors the sidecar proxy to support encoder and EPD protocol routing, and updates Kubernetes manifests and configurations to integrate tokenization services. Dependencies are upgraded and the codebase moves from host-based tool execution to containerized execution within a builder image.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Security & Design Issues

CWE-434 (Unrestricted Upload of File with Dangerous Type) / SSRF concerns:

  • pkg/sidecar/proxy/chat_completions.go: SSRF allowlist validation added for encoder targets via common.EncoderEndpointsHeader, but the parsing logic blindly splits comma-delimited values without validation of the format. Ensure encoder endpoint URLs are strictly validated against the allowlist before use in runEPDConnectorProtocol. The header injection in disagg_headers_handler.go constructs EncoderEndpointsHeader by concatenating user-controlled endpoints—verify these are allowlisted before propagation.

CWE-613 (Insufficient Session Expiration):

  • pkg/sidecar/proxy/connector_epd_shared_storage.go: The speculative encoder indexing via TTL cache in PrecisePrefixCacheScorer relies on SpeculativeTTL parsing and eviction callbacks. Confirm TTL is validated as a positive duration and cache eviction is guaranteed to fire; otherwise, speculative entries may persist indefinitely, causing memory leaks or stale cache state.

CWE-476 (Null Pointer Dereference):

  • pkg/plugins/preparedata/tokenizer.go / tokenizer_preparedata.go: PrepareRequestData ignores errors from p.tokenize(ctx, request) and proceeds with nil tokenization. If tokenization fails (network error, service down), the plugin silently skips tokenization without logging—clients may receive incorrect scheduling decisions based on missing token counts. Log tokenization errors with context.

CWE-640 (Weak Password Recovery Mechanism) / Configuration Management:

  • cmd/pd-sidecar/main.go: Migration from manual flag to proxy.Options / pflag centralizes config handling, but the options struct now holds deprecated fields (deprecatedConnector, deprecatedTLS*) alongside new ones. Ensure deprecation paths are enforced and old fields cannot override new ones in Complete(). Currently Complete() migrates deprecated fields into new ones, but no error is raised if both old and new are simultaneously provided—add validation.

CWE-200 (Exposure of Sensitive Information):

  • pkg/sidecar/proxy/options.go: Config.String() and Config.MarshalJSON() may expose sensitive data (decoder URLs, connector names). Verify these methods do not leak secrets if used in logs or error messages. Review call sites in proxy_helpers.go and error handling.

Build/Supply Chain:

  • Dockerfile.builder: Installs tools (Docker CLI, buildx, golangci-lint, kubectl, kustomize, kind, typos) from external sources via wget. Verify checksums/signatures are validated (not visible in diff). Missing checksum validation introduces supply-chain risk (CWE-347: Improper Verification of Cryptographic Signature).

Logical Error in Filter Registration:

  • pkg/plugins/register.go: Registers EncodeRoleType via EncodeRoleFactory, but also registers DecodeRoleType and PrefillRoleType. Ensure the filter logic correctly distinguishes encode-stage filtering vs. prefill/decode. Review pkg/plugins/filter/roles.go constructors to confirm role label matching is stage-specific.

Architectural & Logic Issues

Plugin State Consistency:

  • pkg/plugins/scorer/precise_prefix_cache.go: PrepareRequestData populates PluginState with precomputed scores, and Score reuses them. However, if PrepareRequestData is not invoked (e.g., when the prepare-data extension is disabled via build tag), Score falls back to getScores(...), creating two code paths. Ensure both paths produce consistent results; test both with and without PrepareRequestData.

Disaggregation Decider Logic:

  • pkg/plugins/profile/disagg_profile_handler.go: The Pick method marks skipped profiles by setting profileResults[stageName] = nil, then in subsequent Pick calls checks for nil to skip re-evaluation. This state mutation is fragile if Pick is called multiple times without ProcessResults being invoked in between. Document the expected call sequence and add assertions.

Encoder Endpoint Deduplication:

  • pkg/sidecar/proxy/connector_epd_shared_storage.go: fanoutEncoderPrimer deduplicates by URL for image_url/audio_url but not for input_audio (inline content). This asymmetry may result in redundant encoder calls. Clarify intent: should inline items be deduplicated by content hash, or is re-encoding acceptable?

Context Length Estimation Fallback:

  • pkg/plugins/multi/context_length_aware.go: When tokenized prompt is unavailable, falls back to character-based estimation using AverageCharactersPerToken = 4. This constant is duplicated in pkg/plugins/profile/prefix_based_pd_decider.go. Define once in a shared constants file to prevent inconsistency; otherwise, different components may make divergent estimates.

Missing Error Propagation:

  • pkg/sidecar/proxy/chat_completions.go: When encoder fan-out fails in runEPDConnectorProtocol, the function returns early with a 4xx/5xx response, but the prefill/decode flow is not executed. If a prefill+decoder fallback is desired, either implement it or document that EPD failures are terminal.

@Gregory-Pereira

Copy link
Copy Markdown
Member Author

OK this passes except for the end to ends fail to spin up. This is because I have my docker socket symlinked to podman for some ungodly reason. Ill figure that out later but for now I think its ok:
is-0.7.1-make-test-output.log

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (15)
.github/workflows/ci-dev.yaml (2)

10-39: ⚠️ Potential issue | 🟠 Major

Define explicit least-privilege permissions per job (CWE-269).

This workflow does not set job-level permissions, so token scope falls back to repo/org defaults and can be broader than required for image publishing. Add explicit minimal scopes per job.

Suggested hardening patch
 jobs:
   set-params:
+    permissions:
+      contents: read
     runs-on: ubuntu-latest
@@
   build-and-push:
+    permissions:
+      contents: read
+      packages: write
     needs: set-params
     uses: ./.github/workflows/ci-build-images.yaml

As per coding guidelines, "Set least-privilege permissions per job, not workflow level".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci-dev.yaml around lines 10 - 39, The jobs "set-params"
and "build-and-push" currently lack job-level permissions; add explicit minimal
GitHub Actions permissions for each job (not workflow-level) to enforce least
privilege: for "set-params" (steps id: version, tag) grant only what is required
(e.g., contents: read if reading repo info), and for "build-and-push" (the job
that uses ./.github/workflows/ci-build-images.yaml and passes GHCR_TOKEN) grant
only the required package/registry and repo scopes (e.g., packages: write and
contents: read) — update the job blocks for set-params and build-and-push to
include a permissions map with the least-privilege scopes.

27-28: ⚠️ Potential issue | 🟠 Major

Do not write event-derived data directly in run output commands (CWE-94/CWE-77).

GITHUB_REF_NAME is event-derived and is being interpolated directly into a shell command that writes to $GITHUB_OUTPUT. Sanitize/validate before emitting, or avoid shell interpolation entirely.

Suggested hardening patch
       - name: Set branch name as tag
         id: tag
         run: |
-          echo "tag=${GITHUB_REF_NAME}" >> "$GITHUB_OUTPUT"
+          safe_tag="$(printf '%s' "$GITHUB_REF_NAME" | tr -c '[:alnum:]._-' '-')"
+          printf 'tag=%s\n' "$safe_tag" >> "$GITHUB_OUTPUT"

As per coding guidelines, "Never interpolate event data directly in run: blocks (script injection CWE-94)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci-dev.yaml around lines 27 - 28, The run step writes
event-derived GITHUB_REF_NAME directly into $GITHUB_OUTPUT, risking script
injection; update the run block that writes "tag=${GITHUB_REF_NAME}" so it
validates/sanitizes GITHUB_REF_NAME before emitting (for example enforce a
strict allowlist/regex like only alphanumerics, dots, dashes and underscores)
and only echo to $GITHUB_OUTPUT when the value passes validation, or derive the
value using a safe workflow expression and assign to an output via the step
outputs mechanism instead; reference the run step, the GITHUB_REF_NAME variable
and $GITHUB_OUTPUT when making this change.
pkg/plugins/datalayer/models/factories.go (2)

60-62: ⚠️ Potential issue | 🔴 Critical

CWE-295: Default InsecureSkipVerify: true enables MITM attacks.

Default should be false. Operators who explicitly need to skip verification can override via configuration.

Proposed fix
 func defaultDataSourceConfigParams() *modelsDatasourceParams {
-	return &modelsDatasourceParams{Scheme: "http", Path: "/v1/models", InsecureSkipVerify: true}
+	return &modelsDatasourceParams{Scheme: "http", Path: "/v1/models", InsecureSkipVerify: false}
 }

As per coding guidelines: "No InsecureSkipVerify in TLS configs (enables MITM attacks)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/datalayer/models/factories.go` around lines 60 - 62, The
defaultDataSourceConfigParams function currently sets
modelsDatasourceParams.InsecureSkipVerify to true, which enables MITM risks;
change the default to false in defaultDataSourceConfigParams so TLS verification
is enabled by default and ensure any existing configuration path that allows
operators to explicitly override modelsDatasourceParams.InsecureSkipVerify
continues to accept true when explicitly configured.

64-72: ⚠️ Potential issue | 🟠 Major

CWE-770: Unbounded io.ReadAll allows memory exhaustion.

A malicious or misconfigured model server returning an oversized response body can OOM the scheduler. Wrap with io.LimitReader.

Proposed fix
+const maxResponseBodySize = 10 * 1024 * 1024 // 10 MiB
+
 func parseModels(data io.Reader) (any, error) {
-	body, err := io.ReadAll(data)
+	body, err := io.ReadAll(io.LimitReader(data, maxResponseBodySize))
 	if err != nil {
 		return nil, fmt.Errorf("failed to read response body: %v", err)
 	}

As per coding guidelines: "Use io.LimitReader for HTTP response bodies (prevent memory exhaustion)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/datalayer/models/factories.go` around lines 64 - 72, The
parseModels function uses io.ReadAll on the provided io.Reader which can OOM;
wrap the incoming data reader with io.LimitReader to cap the maximum bytes
(e.g., define a reasonable const like maxModelResponseBytes) before reading,
then read and unmarshal into ModelResponse as before and return a clear error if
the read exceeds or fails the limit; update parseModels to use
io.LimitReader(data, maxModelResponseBytes) and keep the unmarshalling into
&modelsResponse.
.github/workflows/auto-assign.yaml (3)

9-11: ⚠️ Potential issue | 🟡 Minor

Move permissions to job level.

As per coding guidelines, set least-privilege permissions per job, not workflow level.

📋 Proposed fix
-permissions:
-  contents: read
-  pull-requests: write
-
 jobs:
   auto-assign:
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: write
     steps:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/auto-assign.yaml around lines 9 - 11, The workflow
currently defines top-level permissions (permissions: contents: read,
pull-requests: write) which grants broad privileges to all jobs; move these
permission entries into each job that requires them so least-privilege is
enforced. Edit the workflow to remove or clear the top-level permissions block
and add a permissions map under the specific job definitions (e.g.,
jobs.<job_name>.permissions) specifying only the needed scopes like contents:
read or pull-requests: write for that job; ensure any job that needs no special
permissions omits the permissions block or sets minimal rights.

3-8: ⚠️ Potential issue | 🟠 Major

Use only pull_request_target, remove pull_request trigger.

Both triggers on the same event create duplicate runs with different security contexts:

  • pull_request: reads OWNERS from PR head (untrusted), allowing malicious PR to modify reviewer assignments
  • pull_request_target: reads OWNERS from base (trusted)

Keep only pull_request_target to ensure reviewer logic always reads the trusted OWNERS file. As per coding guidelines, avoid mixing secure and insecure checkout contexts.

🔒 Proposed fix
 on:
-  pull_request:
-    types: [opened]
   pull_request_target:
     types: [opened]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/auto-assign.yaml around lines 3 - 8, Remove the insecure
duplicate pull_request trigger so only pull_request_target remains: delete the
"pull_request" key and its nested "types: [opened]" block and ensure the
workflow defines only "pull_request_target: { types: [opened] }" (refer to the
top-level triggers "pull_request" and "pull_request_target" in the YAML); this
ensures reviewer/OWNERS logic runs in the trusted base context and avoids
duplicate runs.

18-18: ⚠️ Potential issue | 🔴 Critical

Pin action by full commit SHA, not tag.

actions/checkout@v6 uses a mutable tag. An attacker who compromises the upstream repository can retag and inject malicious code (CWE-494: Download of Code Without Integrity Check). Pin all actions by full SHA to prevent supply chain attacks.

Diff
-        uses: actions/checkout@v6
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/auto-assign.yaml at line 18, The workflow uses a mutable
tag "actions/checkout@v6"; replace this with the action pinned to its full
commit SHA to prevent supply-chain tampering: locate the `uses:
actions/checkout@v6` entry in the workflow and update it to the corresponding
`actions/checkout@<full-commit-sha>` (obtain the exact SHA from the
actions/checkout GitHub repo releases or commit history for v6), then commit the
updated workflow ensuring no other mutable tags remain.
.github/workflows/md-link-check.yml (1)

16-20: ⚠️ Potential issue | 🟠 Major

Pin workflow actions to immutable commit SHAs.

Both actions use mutable tags instead of commit SHAs:

  • actions/checkout@v6 (line 17)
  • lycheeverse/lychee-action@v2.8.0 (line 20)

If either tag is moved or compromised upstream, CI executes attacker-controlled code with repository token scope. (CWE-829, CWE-494)

Replace tags with 40-character commit SHAs per GitHub Actions security best practices.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/md-link-check.yml around lines 16 - 20, Replace the
mutable action tags with their immutable 40-character commit SHAs: update the
uses entries for actions/checkout (currently "actions/checkout@v6") and
lycheeverse/lychee-action (currently "lycheeverse/lychee-action@v2.8.0") to the
corresponding commit SHAs (e.g., "actions/checkout@<40-char-sha>" and
"lycheeverse/lychee-action@<40-char-sha>"); fetch the exact SHA from each
action's GitHub repository release/commit history and substitute it so the
workflow pins to the specific immutable commits.
pkg/metrics/metrics_test.go (1)

10-27: ⚠️ Potential issue | 🟡 Minor

Missing Reset() in TestSchedulerPDDecisionCount can cause flaky results.

The existing test doesn't reset SchedulerPDDecisionCount before running, unlike the new tests. If test execution order varies, accumulated counter values from other tests could cause spurious failures.

Proposed fix
 func TestSchedulerPDDecisionCount(t *testing.T) {
+	SchedulerPDDecisionCount.Reset()
+
 	model := "test-model"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/metrics/metrics_test.go` around lines 10 - 27, The test
TestSchedulerPDDecisionCount can be flaky because SchedulerPDDecisionCount isn't
reset before use; add a reset at the start of the test (call
SchedulerPDDecisionCount.Reset()) so prior test runs don't leak counts, then
proceed to call RecordPDDecision(model, ...) and the existing CollectAndCompare
assertion remains unchanged.
.github/workflows/check-typos.yaml (1)

12-16: ⚠️ Potential issue | 🟠 Major

Pin workflow actions to immutable commit SHAs (CWE-494).

Both actions/checkout@v6 and crate-ci/typos@v1.44.0 use semantic version tags instead of full commit SHAs. Tag-pinning allows tag retagging and compromised releases to run untrusted code with access to the job token.

Remediation
       - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@<40-char-commit-sha>

       - name: Check typos
-        uses: crate-ci/typos@v1.44.0
+        uses: crate-ci/typos@<40-char-commit-sha>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/check-typos.yaml around lines 12 - 16, Replace the
semantic version tags with immutable commit SHAs for the referenced actions to
prevent tag-retagging risks: update the usage strings for actions/checkout@v6
and crate-ci/typos@v1.44.0 to their corresponding full commit SHAs (e.g.,
actions/checkout@<commit-sha> and crate-ci/typos@<commit-sha>), fetching the
exact commit SHAs from the respective repositories/tags and ensuring the
workflow references those SHAs instead of version tags.
test/e2e/yaml/vllm-sim-dp.yaml (1)

73-104: ⚠️ Potential issue | 🟠 Major

Missing resource limits and securityContext on vllm container.

Same issue as uds-tokenizer — the main vllm container lacks resource constraints and security hardening.

As per coding guidelines: "KUBERNETES CONFIG SECURITY: Resource limits defined (memory, CPU) - prevents DoS" and "SecurityContext configured (runAsNonRoot, readOnlyRootFilesystem)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/yaml/vllm-sim-dp.yaml` around lines 73 - 104, The vllm container
spec (container name "vllm") is missing resource limits/requests and a hardened
securityContext; update the vllm container to add CPU and memory requests and
limits (reasonable values for e2e tests), mirror the same resource settings used
for "uds-tokenizer", and add a securityContext with runAsNonRoot: true, an
explicit runAsUser (non-zero), readOnlyRootFilesystem: true while ensuring the
mounted path /tmp/tokenizer remains writable (use an emptyDir or set writable
mount), disallow privilege escalation and drop all capabilities, and set
seccompProfile/runtimeDefault—apply the same security and resource pattern used
for the uds-tokenizer container so the vllm container meets the Kubernetes
config security guidelines.
test/e2e/e2e_test.go (1)

723-747: ⚠️ Potential issue | 🟡 Minor

http.Get without timeout risks resource exhaustion.

http.Get uses http.DefaultClient which has no timeout. If the metrics endpoint hangs, this test helper will block indefinitely. Use a client with explicit timeout.

Proposed fix
 func getCounterMetric(metricsURL, metricName, labelMatch string) int {
-	resp, err := http.Get(metricsURL)
+	client := &http.Client{Timeout: 10 * time.Second}
+	resp, err := client.Get(metricsURL)
 	gomega.Expect(err).ShouldNot(gomega.HaveOccurred())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/e2e_test.go` around lines 723 - 747, The helper getCounterMetric
uses http.Get (http.DefaultClient) which has no timeout; update getCounterMetric
to create and use a dedicated http.Client with an explicit Timeout (e.g., 5s)
and call client.Get(metricsURL) (or build a request and client.Do(req)), keep
the existing error checks and deferred resp.Body.Close() behavior, and ensure
imports include time; this prevents the test from hanging if the metrics
endpoint becomes unresponsive.
pkg/plugins/profile/pd_profile_handler.go (1)

196-208: ⚠️ Potential issue | 🔴 Critical

Guard TargetEndpoints[0] before using it.

Lines 196 and 233 both assume the decode profile returned at least one endpoint. A successful-but-empty decode result will panic here instead of turning into a clean scheduling failure.

Suggested fix
 func (h *PdProfileHandler) Pick(ctx context.Context, _ *scheduling.CycleState, request *scheduling.LLMRequest, profiles map[string]scheduling.SchedulerProfile,
 	profileResults map[string]*scheduling.ProfileRunResult) map[string]scheduling.SchedulerProfile {
+	decodeResult := profileResults[h.decodeProfile]
+	if len(decodeResult.TargetEndpoints) == 0 {
+		return map[string]scheduling.SchedulerProfile{}
+	}
+
-	if h.decider != nil && h.decider.disaggregate(ctx, request, profileResults[h.decodeProfile].TargetEndpoints[0]) {
+	if h.decider != nil && h.decider.disaggregate(ctx, request, decodeResult.TargetEndpoints[0]) {
 		metrics.RecordPDDecision(request.TargetModel, metrics.DecisionTypePrefillDecode) //nolint:staticcheck // intentional: pd-profile-handler is itself deprecated
 		return map[string]scheduling.SchedulerProfile{
 			h.prefillProfile: profiles[h.prefillProfile],
 		}
 	}
 func (h *PdProfileHandler) ProcessResults(_ context.Context, _ *scheduling.CycleState, request *scheduling.LLMRequest,
 	profileResults map[string]*scheduling.ProfileRunResult) (*scheduling.SchedulingResult, error) {
 	decodeRunResults := profileResults[h.decodeProfile]
 	if decodeRunResults == nil { // if decode profile failed to run, we should fail
 		return nil, errors.New("failed to find available decode workers")
 	}
+	if len(decodeRunResults.TargetEndpoints) == 0 {
+		return nil, errors.New("decode profile returned no target endpoints")
+	}
+
 	updatedResults := map[string]*scheduling.ProfileRunResult{}
 
 	if h.primaryPort != "" {
 		targetEndpoint := decodeRunResults.TargetEndpoints[0].GetMetadata()

Also applies to: 233-243

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/profile/pd_profile_handler.go` around lines 196 - 208, The code
indexes profileResults[h.decodeProfile].TargetEndpoints[0] without checking the
slice length; update the checks around h.decider.disaggregate(...) and the later
use (around the block that currently assumes TargetEndpoints[0]) to first verify
len(profileResults[h.decodeProfile].TargetEndpoints) > 0 before indexing; if the
slice is empty, skip the prefill-disaggregate branch (so execution falls through
to the DecodeOnly path / metrics.RecordPDDecision(..., DecisionTypeDecodeOnly))
and avoid any direct indexing on TargetEndpoints elsewhere in
pd_profile_handler.go (references: h.decider, h.decider.disaggregate,
profileResults[h.decodeProfile].TargetEndpoints, h.prefillProfile, profiles).
pkg/sidecar/proxy/chat_completions.go (1)

70-78: ⚠️ Potential issue | 🟠 Major

Split every header element before sampling or allowlisting.

Header.Values() can legitimately return multiple entries where some still contain comma-joined hosts. With the current len(...) == 1 guard, a value like ["a:1,b:2", "c:3"] leaves a:1,b:2 unsplit, so sampling and allowlist checks run on the unparsed comma-separated string instead of individual endpoints. This breaks SSRF protection and may cause incorrect routing.

Affects lines 70–78 (prefill) and 126–129 (encoder) equally. Implement a helper that unconditionally splits all header values by commas and trims whitespace.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/chat_completions.go` around lines 70 - 78, The code only
splits header values when there is exactly one entry, which leaves comma-joined
hosts inside some header elements (e.g., ["a:1,b:2","c:3"]); create a helper
(e.g., splitAndTrimHeaderValues) that takes r.Header.Values(...) for
common.PrefillEndpointHeader (and the encoder header usage) and unconditionally
splits each element on ',' then trims whitespace and drops empty tokens, and
replace the current prefillHostPorts and encoder host extraction with calls to
this helper so sampling/allowlist always sees individual host:port strings.
pkg/sidecar/proxy/proxy.go (1)

338-353: ⚠️ Potential issue | 🟠 Major

Major: reject InsecureSkipVerify for backend TLS connections.

tls.Config.InsecureSkipVerify (CWE-295) disables certificate validation and exposes all backend hops to MITM attacks. An on-path attacker can impersonate prefiller, encoder, or decoder services and inject forged responses. Remove the configurable skip-verify option entirely; require proper certificate validation or use explicit ServerName pinning with a CA bundle instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/proxy.go` around lines 338 - 353, The current reverse proxy
setup in NewSingleHostReverseProxy assigns a Transport whose TLSClientConfig
uses the insecureSkipVerify flag to set InsecureSkipVerify, which disables
certificate validation; remove any use of InsecureSkipVerify in the
TLSClientConfig within the newProxy.Transport and instead enforce certificate
validation by not setting InsecureSkipVerify, and if you need to restrict
accepted backends use TLSClientConfig.RootCAs (load a CA bundle) and/or set
TLSClientConfig.ServerName to pin the expected hostname for the backend; update
the code around NewSingleHostReverseProxy/newProxy.Transport/TLSClientConfig to
load and use a CA pool or explicit ServerName rather than allowing
insecureSkipVerify.
🟡 Minor comments (13)
test/scripts/run_e2e.sh-23-23 (1)

23-23: ⚠️ Potential issue | 🟡 Minor

Quote ${DIR} in the go test path argument.

Line 23 expands ${DIR} unquoted, so paths containing spaces/globs can be split unexpectedly; harden this invocation per shell injection safeguards (CWE-78).

Suggested patch
-go test -v -timeout 20m ${DIR}/../e2e/ -ginkgo.v
+go test -v -timeout 20m "${DIR}/../e2e/" -ginkgo.v

As per coding guidelines, **/*.sh: "Quote all variables to prevent injection".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/scripts/run_e2e.sh` at line 23, The go test invocation uses an unquoted
variable ${DIR} which can break on spaces or globs; update the test command in
run_e2e.sh (the line invoking go test -v -timeout 20m ${DIR}/../e2e/ -ginkgo.v)
to quote the path argument so the directory expansion is treated as a single
token (e.g., use the quoted form of ${DIR}/../e2e/), ensuring shell-safe
handling of spaces and preventing word-splitting or glob expansion.
test/e2e/yaml/vllm-sim-pd.yaml-31-32 (1)

31-32: ⚠️ Potential issue | 🟡 Minor

HF_TOKEN exposed as plain-text env var.

The token is set via value: "${HF_TOKEN}" rather than a Secret reference. Acceptable for E2E test manifests, but ensure this pattern is not copied to production deployments where the token should be injected from a Kubernetes Secret.

Also applies to: 114-115

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/yaml/vllm-sim-pd.yaml` around lines 31 - 32, The manifest exposes
HF_TOKEN as a plain-text env var via value: "${HF_TOKEN}"—replace this with a
Kubernetes Secret reference by using valueFrom.secretKeyRef on the container env
entry for HF_TOKEN (set secret name and key to the secret that holds the token),
and update both occurrences (the HF_TOKEN env entries) so the token is injected
from the Secret rather than inlined.
cmd/pd-sidecar/main.go-71-74 (1)

71-74: ⚠️ Potential issue | 🟡 Minor

Exit code missing on startup failure.

When proxyServer.Start(ctx) fails, the error is logged but the process exits with code 0. Use os.Exit(1) or return from a function that sets exit code.

Proposed fix
 	proxyServer := proxy.NewProxy(opts.Config)
 	if err := proxyServer.Start(ctx); err != nil {
 		logger.Error(err, "failed to start proxy server")
+		os.Exit(1)
 	}

Add "os" to imports.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/pd-sidecar/main.go` around lines 71 - 74, The startup sequence logs
errors from proxyServer.Start(ctx) but doesn't set a non-zero exit code; update
the error handling after proxy.NewProxy(opts.Config) and proxyServer.Start to
call os.Exit(1) (or return an error that causes a non-zero exit) when Start
returns an error—add the "os" import and replace the current logger.Error(err,
"failed to start proxy server") call with logging followed by os.Exit(1) so
failures produce a non-zero exit status.
test/e2e/yaml/vllm-sim-dp.yaml-31-32 (1)

31-32: ⚠️ Potential issue | 🟡 Minor

HF_TOKEN exposed as plain environment variable.

Passing HF_TOKEN directly as an environment variable value exposes the token in pod specs and potentially in logs. For production, consider using a Secret reference.

Recommended approach for production
         - name: HF_TOKEN
-          value: "${HF_TOKEN}"
+          valueFrom:
+            secretKeyRef:
+              name: hf-token-secret
+              key: token

For e2e tests this may be acceptable, but ensure HF_TOKEN is not logged or exposed in CI artifacts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/yaml/vllm-sim-dp.yaml` around lines 31 - 32, Replace the plain
HF_TOKEN environment value with a Kubernetes Secret reference instead of
exposing the token inline: update the container spec's env entry for "HF_TOKEN"
to use valueFrom.secretKeyRef pointing to a Secret name (e.g., a new "hf-token"
secret) and the appropriate key (e.g., "token"); also update any e2e test
harness code that sets HF_TOKEN to create/populate that Secret before the pod
runs and ensure test logs/do-not-emit the token. Use the "HF_TOKEN" env name and
secretKeyRef when locating where to change the manifest and test setup.
docs/disaggregation.md-218-218 (1)

218-218: ⚠️ Potential issue | 🟡 Minor

Typo: "mumtimedia" → "multimedia".

📝 Fix
-- Cache coordinate (we can talk about 3 different types of cache: KV-cache, embeddings, and mumtimedia content)
+- Cache coordinate (we can talk about 3 different types of cache: KV-cache, embeddings, and multimedia content)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/disaggregation.md` at line 218, Replace the typo "mumtimedia" with the
correct spelling "multimedia" in the sentence that reads "Cache coordinate (we
can talk about 3 different types of cache: KV-cache, embeddings, and mumtimedia
content)"; update the phrase to "multimedia content" so the line reads
"...KV-cache, embeddings, and multimedia content" (edit the text near the "Cache
coordinate" sentence).
docs/disaggregation.md-525-525 (1)

525-525: ⚠️ Potential issue | 🟡 Minor

Broken URL - trailing } character.

The reference link has an errant closing brace.

📝 Fix
-- vLLM: [[RFC]: Prototype Separating Vision Encoder to Its Own Worker](https://github.com/vllm-project/vllm/issues/20799)}
+- vLLM: [[RFC]: Prototype Separating Vision Encoder to Its Own Worker](https://github.com/vllm-project/vllm/issues/20799)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/disaggregation.md` at line 525, The markdown reference for vLLM contains
an extra trailing brace in the link text "vLLM: [[RFC]: Prototype Separating
Vision Encoder to Its Own
Worker](https://github.com/vllm-project/vllm/issues/20799)}"; remove the errant
closing "}" so the link becomes a valid markdown link (i.e., ensure the URL ends
with ")") and verify the surrounding text still renders correctly.
pkg/plugins/register.go-41-41 (1)

41-41: ⚠️ Potential issue | 🟡 Minor

Fix typo in constant identifier: AlwaysDisaggMulimodalPluginTypeAlwaysDisaggMultimodalPluginType.

The Go identifier is missing a 't' in "Multimodal". Note: the constant's string value "always-disagg-multimodal-decider" is spelled correctly, so YAML configs won't be affected. However, the incorrect identifier should be fixed for code clarity and consistency with the comment on line 12 of always_disagg_mm_decider.go.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/register.go` at line 41, The register call uses a misspelled
identifier: replace profile.AlwaysDisaggMulimodalPluginType with
profile.AlwaysDisaggMultimodalPluginType in the plugin.Register call and update
any other usages to match the corrected constant name; ensure the constant
definition (in the file that defines it, referenced by the comment in
always_disagg_mm_decider.go) is renamed to AlwaysDisaggMultimodalPluginType so
references compile and remain consistent with the string value
"always-disagg-multimodal-decider".
.github/workflows/ci-pr-checks.yaml-40-56 (1)

40-56: ⚠️ Potential issue | 🟡 Minor

Shell variable expansion without quoting (SC2086).

The docker volume inspect output substitution should be quoted to prevent word splitting if paths contain spaces.

🛡️ Proposed fix
       run: |
         docker volume create llm-d-gomodcache
         docker volume create llm-d-gobuildcache
-        echo "mod=$(docker volume inspect llm-d-gomodcache -f '{{.Mountpoint}}')" >> $GITHUB_OUTPUT
-        echo "build=$(docker volume inspect llm-d-gobuildcache -f '{{.Mountpoint}}')" >> $GITHUB_OUTPUT
+        echo "mod=$(docker volume inspect llm-d-gomodcache -f '{{.Mountpoint}}')" >> "$GITHUB_OUTPUT"
+        echo "build=$(docker volume inspect llm-d-gobuildcache -f '{{.Mountpoint}}')" >> "$GITHUB_OUTPUT"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci-pr-checks.yaml around lines 40 - 56, The CI step with
id "go-cache" uses unquoted shell variable expansion for the redirection target
($GITHUB_OUTPUT), which can break if the path contains spaces; update the two
echo lines that write to $GITHUB_OUTPUT (the lines inside the "Create and
resolve Go cache volumes" step) to quote the redirection target (use
"$GITHUB_OUTPUT" instead of $GITHUB_OUTPUT) so the outputs written by docker
volume inspect (Mountpoint) are handled safely.
pkg/plugins/preparedata/tokenizer_preparedata.go-45-59 (1)

45-59: ⚠️ Potential issue | 🟡 Minor

Silent error suppression reduces debuggability.

Line 50 discards the error from p.tokenize(). While fail-open is intentional, the error should be logged for operational visibility. Without logging, tokenization failures will be invisible in production.

Proposed fix
-func (p *TokenizerPlugin) PrepareRequestData(ctx context.Context, request *scheduling.LLMRequest, pods []scheduling.Endpoint) error {
+func (p *TokenizerPlugin) PrepareRequestData(ctx context.Context, request *scheduling.LLMRequest, _ []scheduling.Endpoint) error {
 	if request.TokenizedPrompt != nil {
 		return nil
 	}
 
-	tokenIDs, _ := p.tokenize(ctx, request)
+	tokenIDs, err := p.tokenize(ctx, request)
+	if err != nil {
+		p.logger.V(4).Info("tokenization failed (fail-open)", "error", err, "requestID", request.RequestId)
+	}
 	if tokenIDs == nil {
 		return nil
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/preparedata/tokenizer_preparedata.go` around lines 45 - 59, In
PrepareRequestData, don’t discard the error returned by p.tokenize; capture its
error value and log it for operational visibility (e.g., "tokenization failed" +
err) while preserving the fail-open behavior (still set TokenizedPrompt only
when tokenIDs != nil and return nil). Update the call in
TokenizerPlugin.PrepareRequestData to assign both tokenIDs and err from
p.tokenize(ctx, request), and emit a clear log entry using the plugin's logger
(or a suitable logger) referencing the request or model context before
returning.
docs/architecture.md-731-732 (1)

731-732: ⚠️ Potential issue | 🟡 Minor

Update the stale plugin name in the numbered list.

This note still tells readers to configure PrefillHeader, but the page now documents DisaggHeadersHandler. That makes the migration guidance self-contradictory.

Suggested fix
-1. The `PrefillHeader`, `DisaggProfileHandler`, `DecodeFilter`, `PrefillFilter` and the `PrefixCachePlugin`
+1. The `DisaggHeadersHandler`, `DisaggProfileHandler`, `DecodeFilter`, `PrefillFilter` and the `PrefixCachePlugin`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/architecture.md` around lines 731 - 732, The numbered plugin list is
stale: replace the old PrefillHeader entry with the current DisaggHeadersHandler
name so the docs match the code; also verify the list includes the other plugins
by their exact names (DisaggProfileHandler, DecodeFilter, PrefillFilter,
PrefixCachePlugin) to ensure the documented “plugins instantiated” list matches
the actual plugin class/handler names used elsewhere.
Makefile.tools.mk-17-22 (1)

17-22: ⚠️ Potential issue | 🟡 Minor

Make check-kustomize depend on check-kubectl.

Running this target directly without kubectl installed currently reports the wrong error (--enable-helm unsupported) because the missing binary is swallowed by the pipeline.

Suggested fix
 .PHONY: check-kustomize
-check-kustomize:
+check-kustomize: check-kubectl
 	`@kubectl` kustomize --help 2>&1 | grep -q -- '--enable-helm' || { \
 	  echo "ERROR: kubectl kustomize does not support --enable-helm."; \
 	  echo "Upgrade kubectl to v1.25 or later (https://kubernetes.io/docs/tasks/tools/)"; \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile.tools.mk` around lines 17 - 22, The check-kustomize target should
depend on or call check-kubectl so that the presence of kubectl is verified
before running kubectl kustomize; update the Makefile by adding check-kubectl as
a prerequisite for the check-kustomize target (or invoke $(MAKE) check-kubectl
at the start of the check-kustomize recipe) so missing kubectl yields the
correct error instead of the misleading '--enable-helm' message; reference the
Makefile target name check-kustomize and the existing check-kubectl target when
making this change.
docs/architecture.md-708-710 (1)

708-710: ⚠️ Potential issue | 🟡 Minor

Fix the sample YAML indentation.

parameters is over-indented here, so the snippet is invalid as copied and will fail with a YAML parser.

Suggested fix
 - type: prefix-based-pd-decider
-    parameters:
-      nonCachedTokens: 8
+  parameters:
+    nonCachedTokens: 8
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/architecture.md` around lines 708 - 710, The YAML sample has incorrect
indentation: the parameters mapping is over-indented relative to the list item.
Edit the snippet so that the parameters key is aligned with the type key in the
same list item (the block containing "type: prefix-based-pd-decider"), and keep
"nonCachedTokens" nested under "parameters"; in other words, move "parameters:"
left to the same indentation level as "type: prefix-based-pd-decider" and indent
"nonCachedTokens: 8" under "parameters".
scripts/kind-dev-env.sh-264-266 (1)

264-266: ⚠️ Potential issue | 🟡 Minor

Quote the ConfigMap file paths.

Line 265 leaves EPP_CONFIG and TEMP_FILE subject to word splitting and glob expansion, so a path with spaces or wildcard characters can break the envsubst step or redirect output unexpectedly.

Suggested fix
-envsubst '$MODEL_NAME' < ${EPP_CONFIG} > ${TEMP_FILE}
+envsubst '$MODEL_NAME' < "${EPP_CONFIG}" > "${TEMP_FILE}"

As per coding guidelines, **/*.sh: Quote all variables to prevent injection.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/kind-dev-env.sh` around lines 264 - 266, The envsubst invocation and
subsequent kubectl create use unquoted variables (EPP_CONFIG, TEMP_FILE and
KUBE_CONTEXT) which allows word-splitting/globbing; update the commands to quote
these variables everywhere: use envsubst '$MODEL_NAME' < "${EPP_CONFIG}" >
"${TEMP_FILE}", and pass the temp file to kubectl create configmap as
--from-file=epp-config.yaml="${TEMP_FILE}", and quote "${KUBE_CONTEXT}" in
kubectl calls so EPP_CONFIG, TEMP_FILE and KUBE_CONTEXT are protected from
splitting/globbing.
🧹 Nitpick comments (13)
pkg/plugins/scorer/active_request_test.go (1)

247-307: Missing test case for default MaxBusyScore behavior.

Tests explicitly set MaxBusyScore, but don't verify the documented default (1.0) when the parameter is omitted. Adding such a test would catch the initialization bug where unset MaxBusyScore defaults to 0.0 instead of 1.0.

Proposed additional test case
t.Run("default maxBusyScore should be 1.0 when not specified", func(t *testing.T) {
    params := &ActiveRequestParameters{
        RequestTimeout: "1m",
        // MaxBusyScore not specified - should default to 1.0
    }
    scorer := NewActiveRequest(ctx, params)

    podA := newTestEndpoint("pod-a", 0)
    podB := newTestEndpoint("pod-b", 0)

    // Send requests to make pod A busy
    req1 := newTestRequest("req-1")
    req2 := newTestRequest("req-2")
    result := newTestSchedulingResult(map[string]scheduling.Endpoint{"primary": podA})
    scorer.PreRequest(ctx, req1, result)
    scorer.PreRequest(ctx, req2, result)

    scores := scorer.Score(ctx, nil, nil, []scheduling.Endpoint{podA, podB})
    // With default maxBusyScore=1.0, busy pod should NOT be 0.0
    assert.Equal(t, 1.0, scores[podB], "Idle pod scores 1.0")
    assert.Equal(t, 0.0, scores[podA], "Busiest pod scores 0.0 with default maxBusyScore=1.0")
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/scorer/active_request_test.go` around lines 247 - 307, Add a test
verifying the documented default MaxBusyScore = 1.0 when
ActiveRequestParameters.MaxBusyScore is omitted, and fix NewActiveRequest to
initialize maxBusyScore to 1.0 when params.MaxBusyScore is the zero value (treat
omitted zero as default); update or add a test (in
TestActiveRequest_IdleThresholdAndMaxBusyScore or a new subtest) that creates
params without MaxBusyScore, exercises PreRequest/Score on busy vs idle
endpoints, and asserts the idle pod scores 1.0 while the busiest pod can score
0.0, using NewActiveRequest, ActiveRequestParameters, PreRequest, and Score to
locate the code to change.
pkg/plugins/preparedata/tokenizer_test.go (1)

53-101: Consider adding a positive test case for valid configuration.

TestTokenizerPluginFactory_Validation only tests error paths. A success case would verify the happy path wiring works correctly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/preparedata/tokenizer_test.go` around lines 53 - 101, Add a
positive "happy path" test to TestTokenizerPluginFactory_Validation that passes
a valid JSON params blob (e.g. including a valid "modelName") and asserts no
error and a non-nil plugin; call TokenizerPluginFactory with a non-empty
json.RawMessage, require.NoError(t, err), and assert.NotNil(t, p) to verify
successful construction and wiring for the TokenizerPluginFactory function.
pkg/plugins/profile/prefix_based_pd_decider.go (2)

77-79: Inconsistent logger usage.

Line 78 uses log.Log.Info() (package-level logger) while disaggregate uses log.FromContext(ctx) (context logger). This breaks structured logging correlation.

Proposed fix

The factory and constructor don't have context. Consider deferring this log to the first disaggregate call or accepting a logger parameter:

-	if config.NonCachedTokens == 0 {
-		log.Log.Info("Prefix-based PD disabled (NonCachedTokens=0)")
-	}

Or document this is intentional startup-time logging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/profile/prefix_based_pd_decider.go` around lines 77 - 79, The
startup message uses the package-level logger call log.Log.Info("Prefix-based PD
disabled (NonCachedTokens=0)"), which breaks structured correlation with the
contextual logger used in disaggregate; either move or defer this message into
the first call of disaggregate so it can use log.FromContext(ctx), or change the
constructor/factory to accept a logger parameter and replace log.Log.Info with
that contextual logger; update the code path that emits the message (the
NonCachedTokens==0 branch) to use the same logger strategy as disaggregate to
preserve structured logging.

150-166: Token estimation includes JSON serialization overhead.

json.Marshal on messages (line 161) adds structural characters (braces, quotes, colons) that inflate the character count. For chat completions, this may overestimate actual prompt length by ~10-20%.

Given AverageCharactersPerToken=4 is already a rough heuristic, this is likely acceptable but worth documenting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/profile/prefix_based_pd_decider.go` around lines 150 - 166,
getUserInputLenInTokens is overcounting tokens by using json.Marshal on
request.Body.ChatCompletions.Messages which adds JSON structural characters;
instead iterate request.Body.ChatCompletions.Messages and sum the lengths of the
actual message text fields (e.g., Content and optionally Role/Name) to compute
characters, then divide by AverageCharactersPerToken; update the logic that
reads request.Body.ChatCompletions.Messages to use this concatenation/length-sum
approach and add a brief comment that this avoids JSON serialization overhead.
test/sidecar/config/overlays/llmd/nixl/README.md (1)

6-6: Acceptable switch to kubectl kustomize.

The change aligns with other Makefile updates. The MD014 lint warning about $ prefix is a style nit for documentation; either remove the $ or show expected output to satisfy it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sidecar/config/overlays/llmd/nixl/README.md` at line 6, The README
command currently starts with a shell prompt "$ kubectl kustomize
test/config/overlays/llmd/nixl | kubectl apply -f -" which triggers the MD014
lint about the "$" prompt; update the README in
test/sidecar/config/overlays/llmd/nixl/README.md by either removing the leading
"$" so the command is shown as a plain code line or replace the snippet with the
command plus its expected output to satisfy the linter; ensure you modify the
exact line containing "kubectl kustomize test/config/overlays/llmd/nixl |
kubectl apply -f -" in the README.
pkg/plugins/profile/pd_profile_handler_test.go (1)

346-348: Inconsistent context creation.

TestPdProfileHandler_PickSeries uses context.Background() directly while other tests in this file use utils.NewTestContext(t). This inconsistency could cause the test to miss logging/tracing setup.

 func TestPdProfileHandler_PickSeries(t *testing.T) {
-	ctx := context.Background()
+	ctx := utils.NewTestContext(t)
 	prompt := "hello world, hello world, hello world, hello world, hello world, hello world, hello world!"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/profile/pd_profile_handler_test.go` around lines 346 - 348, The
test TestPdProfileHandler_PickSeries creates a context with context.Background()
while other tests use utils.NewTestContext(t); replace the direct
context.Background() call with ctx := utils.NewTestContext(t) so the test gets
the same logging/tracing test context as the rest of the suite (update any
references to ctx in TestPdProfileHandler_PickSeries accordingly).
pkg/plugins/scorer/precise_prefix_cache_tokenized_test.go (2)

84-118: Test lacks assertion that TargetModel is read from request.

The test sets TargetModel: "test-model" on the request and captures capturedModel, but the assertion at line 117 hardcodes "test-model". Consider using request.TargetModel in the assertion to make the test more robust against copy-paste errors.

-	require.Equal(t, "test-model", capturedModel)
+	require.Equal(t, request.TargetModel, capturedModel)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/scorer/precise_prefix_cache_tokenized_test.go` around lines 84 -
118, In TestPrecisePrefixCacheScorer_UsesTokenizedPrompt, replace the hardcoded
assertion require.Equal(t, "test-model", capturedModel) with an assertion that
uses the request value (e.g., require.Equal(t, request.TargetModel,
capturedModel)) so the test verifies the scorer reads TargetModel from the
LLMRequest; update the assertion referencing the capturedModel variable
accordingly.

114-117: Return value of Score() not asserted.

The Score() return value (the scores map) is discarded. While this scorer returns zero scores, asserting the map is non-nil and has expected length confirms the full code path executes correctly.

Suggested improvement
-	scorer.Score(ctx, cycleState, request, testEndpoints)
+	scores := scorer.Score(ctx, cycleState, request, testEndpoints)
+	require.Len(t, scores, len(testEndpoints))

 	require.Equal(t, tokenIDs, capturedTokens)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/scorer/precise_prefix_cache_tokenized_test.go` around lines 114 -
117, The test currently discards the return value of scorer.Score; update the
call to capture the returned scores (e.g., scores := scorer.Score(ctx,
cycleState, request, testEndpoints)) and add assertions that scores is not nil
and has the expected length (e.g., len(scores) == len(tokenIDs)); optionally
assert that each entry in scores is zero to match the scorer's behavior. Ensure
you still keep the existing assertions for capturedTokens and capturedModel.
pkg/sidecar/proxy/data_parallel.go (2)

52-85: Duplicate URL parsing and port computation in both loops.

decoderPort, rankPort, and decoderURL are computed identically in the first loop (lines 52-61) and again in the second loop (lines 64-70). This duplicates allocations and error handling.

Consolidate into single loop
+	type dpRankInfo struct {
+		rankPort   string
+		hostPort   string
+		decoderURL *url.URL
+		handler    http.Handler
+	}
+	rankInfos := make([]dpRankInfo, 0, s.config.DataParallelSize-1)
+
 	// Fill in map of proxies, thus avoiding locks
 	for idx := range s.config.DataParallelSize - 1 {
 		decoderPort := strconv.Itoa(baseDecoderPort + idx + 1)
 		rankPort := strconv.Itoa(basePort + idx + 1)
 		hostPort := net.JoinHostPort(podIP, rankPort)
 		decoderURL, err := url.Parse(decoderScheme + "://localhost:" + decoderPort)
 		if err != nil {
 			return err
 		}
 		handler := s.createDecoderProxyHandler(decoderURL, s.config.InsecureSkipVerifyForDecoder)
 		s.dataParallelProxies[hostPort] = handler
+		rankInfos = append(rankInfos, dpRankInfo{rankPort: rankPort, hostPort: hostPort, decoderURL: decoderURL, handler: handler})
 	}
 
-	for idx := range s.config.DataParallelSize - 1 {
-		rankPort := strconv.Itoa(basePort + idx + 1)
-		decoderPort := strconv.Itoa(baseDecoderPort + idx + 1)
-		decoderURL, err := url.Parse(decoderScheme + "://localhost:" + decoderPort)
-		if err != nil {
-			return err
-		}
-
+	for _, info := range rankInfos {
 		clone := s.Clone()
-		clone.config.Port = rankPort
-		clone.config.DecoderURL = decoderURL
+		clone.config.Port = info.rankPort
+		clone.config.DecoderURL = info.decoderURL
 		clone.forwardDataParallel = false
 
 		grp.Go(func() error {
-			clone.logger = log.FromContext(ctx).WithName("proxy server on port " + rankPort)
+			clone.logger = log.FromContext(ctx).WithName("proxy server on port " + info.rankPort)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/data_parallel.go` around lines 52 - 85, The two loops
duplicate computation and parsing of decoderPort, rankPort, and decoderURL;
refactor into a single loop over s.config.DataParallelSize-1 that for each idx
computes decoderPort and rankPort, builds decoderURL (using decoderScheme,
podIP, baseDecoderPort, basePort), registers the proxy handler via
s.createDecoderProxyHandler into s.dataParallelProxies[hostPort], then clones s
(using s.Clone()), sets clone.config.Port = rankPort, clone.config.DecoderURL =
decoderURL, clone.forwardDataParallel = false, configures clone.logger,
clone.handler and clone.setKVConnector(), and finally spawns the goroutine with
grp.Go that calls clone.startHTTP(ctx); ensure error from url.Parse is handled
once per iteration.

19-21: Deprecation log message is confusing.

The log fires when DataParallelEndpointHeader is present, but the message "Use Istio >= 1.28.1" doesn't explain the deprecation context or what action the user should take.

-		s.logger.Info("The use of the x-data-parallel-host-port is deprecated. Use Istio >= 1.28.1.")
+		s.logger.Info("Header-based data-parallel routing is deprecated; prefer Istio >= 1.28.1 service mesh routing", "header", common.DataParallelEndpointHeader)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/data_parallel.go` around lines 19 - 21, The deprecation log
triggered when r.Header.Get(common.DataParallelEndpointHeader) returns a value
is unclear; update the message in the block that checks dataParallelPodHostPort
(and the surrounding code that calls s.logger.Info) to explicitly state that the
x-data-parallel-host-port header is deprecated, explain the recommended
replacement (e.g., remove the header and rely on Istio sidecar routing or
upgrade to Istio >= 1.28.1), and include a brief action for users (e.g., "remove
the header from requests or upgrade Istio to >=1.28.1 to enable built-in
routing") so operators know what to change.
pkg/sidecar/proxy/options_test.go (1)

142-154: Duplicate test case with identical expectations.

Test cases "empty string does not set values" (lines 142-147) and "deprecated flags take precedence when InferencePool is empty" (lines 148-154) have identical inputs and expectations. The second case name suggests it should test different behavior.

♻️ Either remove duplicate or test actual deprecated flag precedence
 		{
 			name:              "deprecated flags take precedence when InferencePool is empty",
 			inferencePool:     "",
-			expectedNamespace: "",
-			expectedName:      "",
+			// If testing deprecated flags, set them and verify they're used:
+			// This test case needs additional setup for deprecated flags
+			expectedNamespace: "from-deprecated",
+			expectedName:      "pool-from-deprecated",
 		},

Or remove the duplicate case if it's unintentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/options_test.go` around lines 142 - 154, The test table
contains two duplicate cases: "empty string does not set values" and "deprecated
flags take precedence when InferencePool is empty" have identical inputs and
expectations; update the second case to actually exercise deprecated-flag
precedence by populating the deprecated flag fields (the fields representing the
old namespace/name flags in the same test struct) and adjust
expectedNamespace/expectedName to match those deprecated values, or remove the
duplicate case entirely if it was unintentional; locate the test table in
options_test.go and modify the entry with name "deprecated flags take precedence
when InferencePool is empty" accordingly.
pkg/plugins/scorer/precise_prefix_cache_test.go (1)

917-918: time.Sleep in test creates flakiness risk.

Using time.Sleep(500 * time.Millisecond) to wait for TTL expiration is fragile under CI load. Consider using polling with Eventually or a test clock.

♻️ Alternative: use polling
-	// 3. Wait for TTL to expire and trigger eviction
-	time.Sleep(500 * time.Millisecond)
-	scorer.speculativeCache.DeleteExpired()
+	// 3. Wait for TTL to expire and trigger eviction
+	require.Eventually(t, func() bool {
+		scorer.speculativeCache.DeleteExpired()
+		item := scorer.speculativeCache.Get(request.RequestId)
+		return item == nil
+	}, 1*time.Second, 50*time.Millisecond)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/scorer/precise_prefix_cache_test.go` around lines 917 - 918,
Replace the fragile time.Sleep(500 * time.Millisecond) by actively polling for
TTL expiry: remove the call to time.Sleep and instead use a test
polling/assertion (e.g. Gomega's Eventually) that repeatedly invokes
scorer.speculativeCache.DeleteExpired (or advances the test clock if a clock is
available) and checks the concrete observable that an entry was removed (e.g.
the cache lookup/Contains/Get result for the key or
scorer.speculativeCache.IsEmpty) until it becomes true or a timeout is reached;
this ensures the test waits deterministically for DeleteExpired to have effect
rather than relying on a fixed sleep.
test/e2e/e2e_test.go (1)

779-807: runStreamingChatCompletion hardcodes model name, unlike its counterparts.

runStreamingCompletion accepts a theModel parameter, but runStreamingChatCompletion hardcodes simModelName at line 783. This inconsistency limits test flexibility.

Proposed fix for consistency
-func runStreamingChatCompletion(prompt string) (string, string) {
+func runStreamingChatCompletion(prompt, modelName string) (string, string) {
 	ginkgo.By(fmt.Sprintf("Sending Streaming Chat Completion Request: (port %s)", port))
 
 	// Use raw HTTP for streaming to capture headers
-	body := fmt.Sprintf(`{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":true}`, simModelName, prompt)
+	body := fmt.Sprintf(`{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":true}`, modelName, prompt)

Update call sites at lines 203, etc. to pass simModelName explicitly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/e2e_test.go` around lines 779 - 807, The runStreamingChatCompletion
function currently hardcodes simModelName; change its signature to accept a
model parameter (e.g., theModel string) and use that when building the request
body instead of simModelName, mirroring runStreamingCompletion's API, and then
update all call sites that invoke runStreamingChatCompletion to pass the desired
model (e.g., simModelName) so tests remain flexible and consistent.

Comment thread cmd/pd-sidecar/main.go
Comment thread Dockerfile.builder
Comment thread pkg/plugins/pre-request/disagg_headers_handler.go
Comment thread pkg/plugins/scorer/precise_prefix_cache_test.go
Comment thread pkg/plugins/scorer/precise_prefix_cache.go

@anishasthana anishasthana left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci

openshift-ci Bot commented Apr 3, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anishasthana, Gregory-Pereira

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [Gregory-Pereira,anishasthana]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-actions

github-actions Bot commented Apr 3, 2026

Copy link
Copy Markdown

Cannot approve the pull request: Error: openshift-ci[bot] is not included in the approvers role in the OWNERS file

@openshift-merge-bot openshift-merge-bot Bot merged commit fb6905d into opendatahub-io:main Apr 3, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.