Backport upstream v0.7.1#175
Conversation
* Fix panic in SGLang proxy handling of concurrent requests Signed-off-by: YANG LI <yangligt@google.com> * Add concurrency unit test for SGLang context logic Signed-off-by: YANG LI <yangligt@google.com> --------- Signed-off-by: YANG LI <yangligt@google.com>
* Add opentelemetry tracing
Add centralized telemetry package and custom spans
following the llm-d distributed tracing proposal.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
* update Dockerfile.sidecar
Signed-off-by: sallyom <somalley@redhat.com>
* tracing: remove extra success results & startup spans and cleanup
Signed-off-by: sallyom <somalley@redhat.com>
* fix: avoid os.Exit bypassing defer in main
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
* fix: address review nits for tracing PR
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
* test: add edge case tests for StripScheme
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
* remove extra comments from sidecar spans
Signed-off-by: sallyom <somalley@redhat.com>
* fix lint error
Signed-off-by: sallyom <somalley@redhat.com>
* protect against segfault on tests
Signed-off-by: greg pereira <grpereir@redhat.com>
---------
Signed-off-by: sallyom <somalley@redhat.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: greg pereira <grpereir@redhat.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
…m-d#661) Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.39.0 to 1.40.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.40.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/otel/sdk dependency-version: 1.40.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…dates (llm-d#662) Bumps the go-dependencies group with 2 updates in the / directory: [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) and [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go). Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.64.0 to 0.65.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.64.0...zpages/v0.65.0) Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.39.0 to 1.40.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.40.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp dependency-version: 0.65.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc dependency-version: 1.40.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
…build (llm-d#664) Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Bumps the kubernetes group with 5 updates: | Package | From | To | | --- | --- | --- | | [k8s.io/api](https://github.com/kubernetes/api) | `0.34.4` | `0.34.5` | | [k8s.io/apiextensions-apiserver](https://github.com/kubernetes/apiextensions-apiserver) | `0.34.4` | `0.34.5` | | [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) | `0.34.4` | `0.34.5` | | [k8s.io/client-go](https://github.com/kubernetes/client-go) | `0.34.4` | `0.34.5` | | [k8s.io/component-base](https://github.com/kubernetes/component-base) | `0.34.4` | `0.34.5` | Updates `k8s.io/api` from 0.34.4 to 0.34.5 - [Commits](kubernetes/api@v0.34.4...v0.34.5) Updates `k8s.io/apiextensions-apiserver` from 0.34.4 to 0.34.5 - [Release notes](https://github.com/kubernetes/apiextensions-apiserver/releases) - [Commits](kubernetes/apiextensions-apiserver@v0.34.4...v0.34.5) Updates `k8s.io/apimachinery` from 0.34.4 to 0.34.5 - [Commits](kubernetes/apimachinery@v0.34.4...v0.34.5) Updates `k8s.io/client-go` from 0.34.4 to 0.34.5 - [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md) - [Commits](kubernetes/client-go@v0.34.4...v0.34.5) Updates `k8s.io/component-base` from 0.34.4 to 0.34.5 - [Commits](kubernetes/component-base@v0.34.4...v0.34.5) --- updated-dependencies: - dependency-name: k8s.io/api dependency-version: 0.34.5 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes - dependency-name: k8s.io/apiextensions-apiserver dependency-version: 0.34.5 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes - dependency-name: k8s.io/apimachinery dependency-version: 0.34.5 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes - dependency-name: k8s.io/client-go dependency-version: 0.34.5 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes - dependency-name: k8s.io/component-base dependency-version: 0.34.5 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…dates (llm-d#674) Bumps the go-dependencies group with 2 updates in the / directory: [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) and [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go). Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.65.0 to 0.66.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.65.0...zpages/v0.66.0) Updates `go.opentelemetry.io/otel` from 1.40.0 to 1.41.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0) Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.40.0 to 1.41.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0) Updates `go.opentelemetry.io/otel/sdk` from 1.40.0 to 1.41.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0) Updates `go.opentelemetry.io/otel/trace` from 1.40.0 to 1.41.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp dependency-version: 0.66.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel dependency-version: 1.41.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc dependency-version: 1.41.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel/sdk dependency-version: 1.41.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel/trace dependency-version: 1.41.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…m-d#671) Bumps [lycheeverse/lychee-action](https://github.com/lycheeverse/lychee-action) from 2.7.0 to 2.8.0. - [Release notes](https://github.com/lycheeverse/lychee-action/releases) - [Commits](lycheeverse/lychee-action@v2.7.0...v2.8.0) --- updated-dependencies: - dependency-name: lycheeverse/lychee-action dependency-version: 2.8.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* ci: add dev image workflow for main and release branches Build and push -dev variants of EPP and sidecar container images on pushes to main and release-* branches, tagged with commit SHA. Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> * ci: extract reusable build workflow and tag dev images by branch Refactor ci-release and ci-dev to call a shared ci-build-images reusable workflow, reducing duplication. Tag dev images with the branch name instead of commit SHA so each branch has exactly one image that gets overwritten on push, avoiding image accumulation. Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> * Newlines at EOF Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> --------- Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.43.5 to 1.44.0. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.43.5...v1.44.0) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.44.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The v0.44.1 release was removed from GitHub, causing 404 errors in the trivy-scan action. Update to the latest available version. Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
* Allow sidecar server to reload TLS certificates Enables TLS certificates to be rotated without restarting sidecar and vLLM deployments. Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> * Pass certPath to reloader Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> * Improvements Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> --------- Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Bumps the go-dependencies group with 7 updates: | Package | From | To | | --- | --- | --- | | [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) | `0.66.0` | `0.67.0` | | [go.opentelemetry.io/otel](https://github.com/open-telemetry/opentelemetry-go) | `1.41.0` | `1.42.0` | | [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go) | `1.41.0` | `1.42.0` | | [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) | `1.41.0` | `1.42.0` | | [go.opentelemetry.io/otel/trace](https://github.com/open-telemetry/opentelemetry-go) | `1.41.0` | `1.42.0` | | [golang.org/x/sync](https://github.com/golang/sync) | `0.19.0` | `0.20.0` | | [google.golang.org/grpc](https://github.com/grpc/grpc-go) | `1.79.1` | `1.79.2` | Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.66.0 to 0.67.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.66.0...zpages/v0.67.0) Updates `go.opentelemetry.io/otel` from 1.41.0 to 1.42.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.41.0...v1.42.0) Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.41.0 to 1.42.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.41.0...v1.42.0) Updates `go.opentelemetry.io/otel/sdk` from 1.41.0 to 1.42.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.41.0...v1.42.0) Updates `go.opentelemetry.io/otel/trace` from 1.41.0 to 1.42.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.41.0...v1.42.0) Updates `golang.org/x/sync` from 0.19.0 to 0.20.0 - [Commits](golang/sync@v0.19.0...v0.20.0) Updates `google.golang.org/grpc` from 1.79.1 to 1.79.2 - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](grpc/grpc-go@v1.79.1...v1.79.2) --- updated-dependencies: - dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp dependency-version: 0.67.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel dependency-version: 1.42.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc dependency-version: 1.42.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel/sdk dependency-version: 1.42.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel/trace dependency-version: 1.42.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: golang.org/x/sync dependency-version: 0.20.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: google.golang.org/grpc dependency-version: 1.79.2 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…gs (llm-d#682) * feat(sidecar): simplify TLS command line options with StringSlice flags Signed-off-by: Guangya Liu <gyliu513@gmail.com> * address comments from Etai Signed-off-by: Guangya Liu <gyliu513@gmail.com> * keep deprecated flags Signed-off-by: Guangya Liu <gyliu513@gmail.com> --------- Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: roytman <roytman@il.ibm.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* fix: simplify InferencePool flag to namespace/name format Signed-off-by: Guangya Liu <gyliu513@gmail.com> * Address comments from Etai Signed-off-by: Guangya Liu <gyliu513@gmail.com> --------- Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* initial E/PD extension of the sidecar Signed-off-by: roytman <roytman@il.ibm.com> * add Encoding span.SetAttributes, fix bug in connector_sglang_test.go Signed-off-by: roytman <roytman@il.ibm.com> * fix e2e tests Signed-off-by: roytman <roytman@il.ibm.com> * Update pkg/sidecar/proxy/connector_epd_shared_storage.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Alexey Roytman <roytman@il.ibm.com> * flags order Signed-off-by: roytman <roytman@il.ibm.com> * fix comments Signed-off-by: roytman <roytman@il.ibm.com> * remove redundant comment Signed-off-by: roytman <roytman@il.ibm.com> --------- Signed-off-by: roytman <roytman@il.ibm.com> Signed-off-by: Alexey Roytman <roytman@il.ibm.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com>
* Prevent mismatch between new and deprecated APIs Signed-off-by: roytman <roytman@il.ibm.com> * simplify DisaggProfileHandlerFactory Signed-off-by: roytman <roytman@il.ibm.com> --------- Signed-off-by: roytman <roytman@il.ibm.com>
…lm-d#770) * - import IGW 1.4.0 - turn tokenizer PrepareData plugin into a scorer and wrap the latter in no-build flag Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * address review Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> --------- Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
…-d#715) Signed-off-by: Guangya Liu <gyliu513@gmail.com>
* Containerize build system with builder image Introduce Dockerfile.builder based on the same Go image as production, with all build tools (golangci-lint, typos, kind, kubectl, podman) at pinned versions. All build, test, lint, and format targets now run inside the builder container. Go module and build caches use named volumes for persistence across runs. Host requirements are reduced to a container runtime and git. kubectl and envsubst are only needed for cluster deployment targets. Signed-off-by: Antonio Cardace <acardace@redhat.com> * CI: remove host Go/lint setup, use containerized targets All CI steps now delegate to make targets that run inside the builder container. Remove Go setup, go mod tidy, and golangci-lint-action. Expand paths filter to cover Dockerfile.builder, go.sum, and .golangci.yml. Signed-off-by: Antonio Cardace <acardace@redhat.com> * Fix issues detected by linter Signed-off-by: Antonio Cardace <acardace@redhat.com> * Add builder shell targets for debugging Signed-off-by: Antonio Cardace <acardace@redhat.com> * e2e-tests: load images into kind directly to save space Signed-off-by: Antonio Cardace <acardace@redhat.com> --------- Signed-off-by: Antonio Cardace <acardace@redhat.com>
Signed-off-by: roytman <roytman@il.ibm.com>
* implement context-length-aware plugin (scorer/filter) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * fix newline Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * apply review suggestion Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * code improvements Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * Update pkg/plugins/multi/context_length_aware_bench_test.go Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Maroon Ayoub <Maroonay@gmail.com> * refactor UX Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * add tokenization and fix tests Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * use tokenizer plugin Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * fix lint Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * implement proximity for out-of-range requests Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * addressed comments Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * remove support for multiple ranges per pod Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * rebase on tokenizer PrepareData -> Scorer change Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * addressed comments Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> --------- Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Signed-off-by: Maroon Ayoub <Maroonay@gmail.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
* change simulator version to v0.8.1 Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update simulator version Signed-off-by: Maya Barnea <mayab@il.ibm.com> * make e2e test timeout 20min Signed-off-by: Maya Barnea <mayab@il.ibm.com> --------- Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
…gger() to inject it via zap.Encoder() (llm-d#778) Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* Add E2E_KEEP_CLUSTER_ON_FAILURE env variable Signed-off-by: roytman <roytman@il.ibm.com> * Update Development guide Signed-off-by: roytman <roytman@il.ibm.com> * fix comments Signed-off-by: roytman <roytman@il.ibm.com> * Update test/e2e/e2e_suite_test.go Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Alexey Roytman <roytman@il.ibm.com> --------- Signed-off-by: roytman <roytman@il.ibm.com> Signed-off-by: Alexey Roytman <roytman@il.ibm.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
…nizer chat rendering (llm-d#781) * fix chat-completions -> RenderChatRequest building Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * add tests Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * Centralize MM conversion helper, add unit tests Address review feedback: - Export ChatCompletionsToRenderChatRequest in preparedata package - Remove duplicate conversion in scorer, call shared helper - Add table-driven tests covering single image, multiple images, system+MM user, multi-turn with image, and text-only (no regression) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * Move MM conversion test to preparedata package The test exercises ChatCompletionsToRenderChatRequest which lives in preparedata — keep it co-located with the function it tests. Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * Add test verifying structured MM content reaches RenderChat TestTokenizerScorer_RenderChat_ForwardsStructuredContent captures the RenderChatRequest passed to the mock tokenizer and asserts that image_url content blocks from Content.Structured are forwarded correctly through the Score -> tokenize -> ChatCompletionsToRenderChatRequest path. Also verifies MMFeatures propagate to CycleState. Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> --------- Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
# Conflicts: # .github/actions/trivy-scan/action.yml # .github/workflows/ci-build-images.yaml # .github/workflows/ci-dev.yaml # DEVELOPMENT.md # Makefile # OWNERS # cmd/pd-sidecar/main.go # go.mod # go.sum # pkg/plugins/pre-request/pd_prerequest.go # pkg/plugins/profile/pd_profile_handler.go # pkg/plugins/scorer/precise_prefix_cache.go # pkg/sidecar/proxy/chat_completions.go # pkg/sidecar/proxy/connector_nixlv2.go # pkg/sidecar/proxy/connector_nixlv2_test.go # pkg/sidecar/proxy/connector_sglang.go # pkg/sidecar/proxy/connector_sglang_test.go # pkg/sidecar/proxy/connector_test.go # pkg/sidecar/proxy/proxy.go # pkg/sidecar/proxy/proxy_test.go
|
Cannot approve the pull request: Error: openshift-ci[bot] is not included in the approvers role in the OWNERS file |
📝 WalkthroughWalkthroughThis pull request refactors the inference scheduling system to support Encode/Prefill/Decode (E/P/D) disaggregation with Unix Domain Socket–based tokenization. It containerizes the build process via a new builder image, updates GitHub Actions workflows, introduces a unified disaggregation profile handler and multiple new scheduling plugins (tokenizer, context-length-aware, multimodal-decider), refactors the sidecar proxy to support encoder and EPD protocol routing, and updates Kubernetes manifests and configurations to integrate tokenization services. Dependencies are upgraded and the codebase moves from host-based tool execution to containerized execution within a builder image. Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Security & Design IssuesCWE-434 (Unrestricted Upload of File with Dangerous Type) / SSRF concerns:
CWE-613 (Insufficient Session Expiration):
CWE-476 (Null Pointer Dereference):
CWE-640 (Weak Password Recovery Mechanism) / Configuration Management:
CWE-200 (Exposure of Sensitive Information):
Build/Supply Chain:
Logical Error in Filter Registration:
Architectural & Logic IssuesPlugin State Consistency:
Disaggregation Decider Logic:
Encoder Endpoint Deduplication:
Context Length Estimation Fallback:
Missing Error Propagation:
|
|
OK this passes except for the end to ends fail to spin up. This is because I have my docker socket symlinked to podman for some ungodly reason. Ill figure that out later but for now I think its ok: |
There was a problem hiding this comment.
Actionable comments posted: 5
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (15)
.github/workflows/ci-dev.yaml (2)
10-39:⚠️ Potential issue | 🟠 MajorDefine explicit least-privilege
permissionsper job (CWE-269).This workflow does not set job-level
permissions, so token scope falls back to repo/org defaults and can be broader than required for image publishing. Add explicit minimal scopes per job.Suggested hardening patch
jobs: set-params: + permissions: + contents: read runs-on: ubuntu-latest @@ build-and-push: + permissions: + contents: read + packages: write needs: set-params uses: ./.github/workflows/ci-build-images.yamlAs per coding guidelines, "Set least-privilege permissions per job, not workflow level".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci-dev.yaml around lines 10 - 39, The jobs "set-params" and "build-and-push" currently lack job-level permissions; add explicit minimal GitHub Actions permissions for each job (not workflow-level) to enforce least privilege: for "set-params" (steps id: version, tag) grant only what is required (e.g., contents: read if reading repo info), and for "build-and-push" (the job that uses ./.github/workflows/ci-build-images.yaml and passes GHCR_TOKEN) grant only the required package/registry and repo scopes (e.g., packages: write and contents: read) — update the job blocks for set-params and build-and-push to include a permissions map with the least-privilege scopes.
27-28:⚠️ Potential issue | 🟠 MajorDo not write event-derived data directly in
runoutput commands (CWE-94/CWE-77).
GITHUB_REF_NAMEis event-derived and is being interpolated directly into a shell command that writes to$GITHUB_OUTPUT. Sanitize/validate before emitting, or avoid shell interpolation entirely.Suggested hardening patch
- name: Set branch name as tag id: tag run: | - echo "tag=${GITHUB_REF_NAME}" >> "$GITHUB_OUTPUT" + safe_tag="$(printf '%s' "$GITHUB_REF_NAME" | tr -c '[:alnum:]._-' '-')" + printf 'tag=%s\n' "$safe_tag" >> "$GITHUB_OUTPUT"As per coding guidelines, "Never interpolate event data directly in run: blocks (script injection CWE-94)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci-dev.yaml around lines 27 - 28, The run step writes event-derived GITHUB_REF_NAME directly into $GITHUB_OUTPUT, risking script injection; update the run block that writes "tag=${GITHUB_REF_NAME}" so it validates/sanitizes GITHUB_REF_NAME before emitting (for example enforce a strict allowlist/regex like only alphanumerics, dots, dashes and underscores) and only echo to $GITHUB_OUTPUT when the value passes validation, or derive the value using a safe workflow expression and assign to an output via the step outputs mechanism instead; reference the run step, the GITHUB_REF_NAME variable and $GITHUB_OUTPUT when making this change.pkg/plugins/datalayer/models/factories.go (2)
60-62:⚠️ Potential issue | 🔴 CriticalCWE-295: Default
InsecureSkipVerify: trueenables MITM attacks.Default should be
false. Operators who explicitly need to skip verification can override via configuration.Proposed fix
func defaultDataSourceConfigParams() *modelsDatasourceParams { - return &modelsDatasourceParams{Scheme: "http", Path: "/v1/models", InsecureSkipVerify: true} + return &modelsDatasourceParams{Scheme: "http", Path: "/v1/models", InsecureSkipVerify: false} }As per coding guidelines: "No InsecureSkipVerify in TLS configs (enables MITM attacks)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/datalayer/models/factories.go` around lines 60 - 62, The defaultDataSourceConfigParams function currently sets modelsDatasourceParams.InsecureSkipVerify to true, which enables MITM risks; change the default to false in defaultDataSourceConfigParams so TLS verification is enabled by default and ensure any existing configuration path that allows operators to explicitly override modelsDatasourceParams.InsecureSkipVerify continues to accept true when explicitly configured.
64-72:⚠️ Potential issue | 🟠 MajorCWE-770: Unbounded
io.ReadAllallows memory exhaustion.A malicious or misconfigured model server returning an oversized response body can OOM the scheduler. Wrap with
io.LimitReader.Proposed fix
+const maxResponseBodySize = 10 * 1024 * 1024 // 10 MiB + func parseModels(data io.Reader) (any, error) { - body, err := io.ReadAll(data) + body, err := io.ReadAll(io.LimitReader(data, maxResponseBodySize)) if err != nil { return nil, fmt.Errorf("failed to read response body: %v", err) }As per coding guidelines: "Use io.LimitReader for HTTP response bodies (prevent memory exhaustion)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/datalayer/models/factories.go` around lines 64 - 72, The parseModels function uses io.ReadAll on the provided io.Reader which can OOM; wrap the incoming data reader with io.LimitReader to cap the maximum bytes (e.g., define a reasonable const like maxModelResponseBytes) before reading, then read and unmarshal into ModelResponse as before and return a clear error if the read exceeds or fails the limit; update parseModels to use io.LimitReader(data, maxModelResponseBytes) and keep the unmarshalling into &modelsResponse..github/workflows/auto-assign.yaml (3)
9-11:⚠️ Potential issue | 🟡 MinorMove permissions to job level.
As per coding guidelines, set least-privilege permissions per job, not workflow level.
📋 Proposed fix
-permissions: - contents: read - pull-requests: write - jobs: auto-assign: runs-on: ubuntu-latest + permissions: + contents: read + pull-requests: write steps:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/auto-assign.yaml around lines 9 - 11, The workflow currently defines top-level permissions (permissions: contents: read, pull-requests: write) which grants broad privileges to all jobs; move these permission entries into each job that requires them so least-privilege is enforced. Edit the workflow to remove or clear the top-level permissions block and add a permissions map under the specific job definitions (e.g., jobs.<job_name>.permissions) specifying only the needed scopes like contents: read or pull-requests: write for that job; ensure any job that needs no special permissions omits the permissions block or sets minimal rights.
3-8:⚠️ Potential issue | 🟠 MajorUse only
pull_request_target, removepull_requesttrigger.Both triggers on the same event create duplicate runs with different security contexts:
pull_request: reads OWNERS from PR head (untrusted), allowing malicious PR to modify reviewer assignmentspull_request_target: reads OWNERS from base (trusted)Keep only
pull_request_targetto ensure reviewer logic always reads the trusted OWNERS file. As per coding guidelines, avoid mixing secure and insecure checkout contexts.🔒 Proposed fix
on: - pull_request: - types: [opened] pull_request_target: types: [opened]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/auto-assign.yaml around lines 3 - 8, Remove the insecure duplicate pull_request trigger so only pull_request_target remains: delete the "pull_request" key and its nested "types: [opened]" block and ensure the workflow defines only "pull_request_target: { types: [opened] }" (refer to the top-level triggers "pull_request" and "pull_request_target" in the YAML); this ensures reviewer/OWNERS logic runs in the trusted base context and avoids duplicate runs.
18-18:⚠️ Potential issue | 🔴 CriticalPin action by full commit SHA, not tag.
actions/checkout@v6uses a mutable tag. An attacker who compromises the upstream repository can retag and inject malicious code (CWE-494: Download of Code Without Integrity Check). Pin all actions by full SHA to prevent supply chain attacks.Diff
- uses: actions/checkout@v6 + uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/auto-assign.yaml at line 18, The workflow uses a mutable tag "actions/checkout@v6"; replace this with the action pinned to its full commit SHA to prevent supply-chain tampering: locate the `uses: actions/checkout@v6` entry in the workflow and update it to the corresponding `actions/checkout@<full-commit-sha>` (obtain the exact SHA from the actions/checkout GitHub repo releases or commit history for v6), then commit the updated workflow ensuring no other mutable tags remain..github/workflows/md-link-check.yml (1)
16-20:⚠️ Potential issue | 🟠 MajorPin workflow actions to immutable commit SHAs.
Both actions use mutable tags instead of commit SHAs:
actions/checkout@v6(line 17)lycheeverse/lychee-action@v2.8.0(line 20)If either tag is moved or compromised upstream, CI executes attacker-controlled code with repository token scope. (CWE-829, CWE-494)
Replace tags with 40-character commit SHAs per GitHub Actions security best practices.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/md-link-check.yml around lines 16 - 20, Replace the mutable action tags with their immutable 40-character commit SHAs: update the uses entries for actions/checkout (currently "actions/checkout@v6") and lycheeverse/lychee-action (currently "lycheeverse/lychee-action@v2.8.0") to the corresponding commit SHAs (e.g., "actions/checkout@<40-char-sha>" and "lycheeverse/lychee-action@<40-char-sha>"); fetch the exact SHA from each action's GitHub repository release/commit history and substitute it so the workflow pins to the specific immutable commits.pkg/metrics/metrics_test.go (1)
10-27:⚠️ Potential issue | 🟡 MinorMissing
Reset()inTestSchedulerPDDecisionCountcan cause flaky results.The existing test doesn't reset
SchedulerPDDecisionCountbefore running, unlike the new tests. If test execution order varies, accumulated counter values from other tests could cause spurious failures.Proposed fix
func TestSchedulerPDDecisionCount(t *testing.T) { + SchedulerPDDecisionCount.Reset() + model := "test-model"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/metrics/metrics_test.go` around lines 10 - 27, The test TestSchedulerPDDecisionCount can be flaky because SchedulerPDDecisionCount isn't reset before use; add a reset at the start of the test (call SchedulerPDDecisionCount.Reset()) so prior test runs don't leak counts, then proceed to call RecordPDDecision(model, ...) and the existing CollectAndCompare assertion remains unchanged..github/workflows/check-typos.yaml (1)
12-16:⚠️ Potential issue | 🟠 MajorPin workflow actions to immutable commit SHAs (CWE-494).
Both
actions/checkout@v6andcrate-ci/typos@v1.44.0use semantic version tags instead of full commit SHAs. Tag-pinning allows tag retagging and compromised releases to run untrusted code with access to the job token.Remediation
- name: Checkout code - uses: actions/checkout@v6 + uses: actions/checkout@<40-char-commit-sha> - name: Check typos - uses: crate-ci/typos@v1.44.0 + uses: crate-ci/typos@<40-char-commit-sha>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/check-typos.yaml around lines 12 - 16, Replace the semantic version tags with immutable commit SHAs for the referenced actions to prevent tag-retagging risks: update the usage strings for actions/checkout@v6 and crate-ci/typos@v1.44.0 to their corresponding full commit SHAs (e.g., actions/checkout@<commit-sha> and crate-ci/typos@<commit-sha>), fetching the exact commit SHAs from the respective repositories/tags and ensuring the workflow references those SHAs instead of version tags.test/e2e/yaml/vllm-sim-dp.yaml (1)
73-104:⚠️ Potential issue | 🟠 MajorMissing resource limits and securityContext on
vllmcontainer.Same issue as
uds-tokenizer— the mainvllmcontainer lacks resource constraints and security hardening.As per coding guidelines: "KUBERNETES CONFIG SECURITY: Resource limits defined (memory, CPU) - prevents DoS" and "SecurityContext configured (runAsNonRoot, readOnlyRootFilesystem)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/yaml/vllm-sim-dp.yaml` around lines 73 - 104, The vllm container spec (container name "vllm") is missing resource limits/requests and a hardened securityContext; update the vllm container to add CPU and memory requests and limits (reasonable values for e2e tests), mirror the same resource settings used for "uds-tokenizer", and add a securityContext with runAsNonRoot: true, an explicit runAsUser (non-zero), readOnlyRootFilesystem: true while ensuring the mounted path /tmp/tokenizer remains writable (use an emptyDir or set writable mount), disallow privilege escalation and drop all capabilities, and set seccompProfile/runtimeDefault—apply the same security and resource pattern used for the uds-tokenizer container so the vllm container meets the Kubernetes config security guidelines.test/e2e/e2e_test.go (1)
723-747:⚠️ Potential issue | 🟡 Minor
http.Getwithout timeout risks resource exhaustion.
http.Getuseshttp.DefaultClientwhich has no timeout. If the metrics endpoint hangs, this test helper will block indefinitely. Use a client with explicit timeout.Proposed fix
func getCounterMetric(metricsURL, metricName, labelMatch string) int { - resp, err := http.Get(metricsURL) + client := &http.Client{Timeout: 10 * time.Second} + resp, err := client.Get(metricsURL) gomega.Expect(err).ShouldNot(gomega.HaveOccurred())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/e2e_test.go` around lines 723 - 747, The helper getCounterMetric uses http.Get (http.DefaultClient) which has no timeout; update getCounterMetric to create and use a dedicated http.Client with an explicit Timeout (e.g., 5s) and call client.Get(metricsURL) (or build a request and client.Do(req)), keep the existing error checks and deferred resp.Body.Close() behavior, and ensure imports include time; this prevents the test from hanging if the metrics endpoint becomes unresponsive.pkg/plugins/profile/pd_profile_handler.go (1)
196-208:⚠️ Potential issue | 🔴 CriticalGuard
TargetEndpoints[0]before using it.Lines 196 and 233 both assume the decode profile returned at least one endpoint. A successful-but-empty decode result will panic here instead of turning into a clean scheduling failure.
Suggested fix
func (h *PdProfileHandler) Pick(ctx context.Context, _ *scheduling.CycleState, request *scheduling.LLMRequest, profiles map[string]scheduling.SchedulerProfile, profileResults map[string]*scheduling.ProfileRunResult) map[string]scheduling.SchedulerProfile { + decodeResult := profileResults[h.decodeProfile] + if len(decodeResult.TargetEndpoints) == 0 { + return map[string]scheduling.SchedulerProfile{} + } + - if h.decider != nil && h.decider.disaggregate(ctx, request, profileResults[h.decodeProfile].TargetEndpoints[0]) { + if h.decider != nil && h.decider.disaggregate(ctx, request, decodeResult.TargetEndpoints[0]) { metrics.RecordPDDecision(request.TargetModel, metrics.DecisionTypePrefillDecode) //nolint:staticcheck // intentional: pd-profile-handler is itself deprecated return map[string]scheduling.SchedulerProfile{ h.prefillProfile: profiles[h.prefillProfile], } }func (h *PdProfileHandler) ProcessResults(_ context.Context, _ *scheduling.CycleState, request *scheduling.LLMRequest, profileResults map[string]*scheduling.ProfileRunResult) (*scheduling.SchedulingResult, error) { decodeRunResults := profileResults[h.decodeProfile] if decodeRunResults == nil { // if decode profile failed to run, we should fail return nil, errors.New("failed to find available decode workers") } + if len(decodeRunResults.TargetEndpoints) == 0 { + return nil, errors.New("decode profile returned no target endpoints") + } + updatedResults := map[string]*scheduling.ProfileRunResult{} if h.primaryPort != "" { targetEndpoint := decodeRunResults.TargetEndpoints[0].GetMetadata()Also applies to: 233-243
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/profile/pd_profile_handler.go` around lines 196 - 208, The code indexes profileResults[h.decodeProfile].TargetEndpoints[0] without checking the slice length; update the checks around h.decider.disaggregate(...) and the later use (around the block that currently assumes TargetEndpoints[0]) to first verify len(profileResults[h.decodeProfile].TargetEndpoints) > 0 before indexing; if the slice is empty, skip the prefill-disaggregate branch (so execution falls through to the DecodeOnly path / metrics.RecordPDDecision(..., DecisionTypeDecodeOnly)) and avoid any direct indexing on TargetEndpoints elsewhere in pd_profile_handler.go (references: h.decider, h.decider.disaggregate, profileResults[h.decodeProfile].TargetEndpoints, h.prefillProfile, profiles).pkg/sidecar/proxy/chat_completions.go (1)
70-78:⚠️ Potential issue | 🟠 MajorSplit every header element before sampling or allowlisting.
Header.Values()can legitimately return multiple entries where some still contain comma-joined hosts. With the currentlen(...) == 1guard, a value like["a:1,b:2", "c:3"]leavesa:1,b:2unsplit, so sampling and allowlist checks run on the unparsed comma-separated string instead of individual endpoints. This breaks SSRF protection and may cause incorrect routing.Affects lines 70–78 (prefill) and 126–129 (encoder) equally. Implement a helper that unconditionally splits all header values by commas and trims whitespace.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/chat_completions.go` around lines 70 - 78, The code only splits header values when there is exactly one entry, which leaves comma-joined hosts inside some header elements (e.g., ["a:1,b:2","c:3"]); create a helper (e.g., splitAndTrimHeaderValues) that takes r.Header.Values(...) for common.PrefillEndpointHeader (and the encoder header usage) and unconditionally splits each element on ',' then trims whitespace and drops empty tokens, and replace the current prefillHostPorts and encoder host extraction with calls to this helper so sampling/allowlist always sees individual host:port strings.pkg/sidecar/proxy/proxy.go (1)
338-353:⚠️ Potential issue | 🟠 MajorMajor: reject
InsecureSkipVerifyfor backend TLS connections.
tls.Config.InsecureSkipVerify(CWE-295) disables certificate validation and exposes all backend hops to MITM attacks. An on-path attacker can impersonate prefiller, encoder, or decoder services and inject forged responses. Remove the configurable skip-verify option entirely; require proper certificate validation or use explicitServerNamepinning with a CA bundle instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/proxy.go` around lines 338 - 353, The current reverse proxy setup in NewSingleHostReverseProxy assigns a Transport whose TLSClientConfig uses the insecureSkipVerify flag to set InsecureSkipVerify, which disables certificate validation; remove any use of InsecureSkipVerify in the TLSClientConfig within the newProxy.Transport and instead enforce certificate validation by not setting InsecureSkipVerify, and if you need to restrict accepted backends use TLSClientConfig.RootCAs (load a CA bundle) and/or set TLSClientConfig.ServerName to pin the expected hostname for the backend; update the code around NewSingleHostReverseProxy/newProxy.Transport/TLSClientConfig to load and use a CA pool or explicit ServerName rather than allowing insecureSkipVerify.
🟡 Minor comments (13)
test/scripts/run_e2e.sh-23-23 (1)
23-23:⚠️ Potential issue | 🟡 MinorQuote
${DIR}in thego testpath argument.Line 23 expands
${DIR}unquoted, so paths containing spaces/globs can be split unexpectedly; harden this invocation per shell injection safeguards (CWE-78).Suggested patch
-go test -v -timeout 20m ${DIR}/../e2e/ -ginkgo.v +go test -v -timeout 20m "${DIR}/../e2e/" -ginkgo.vAs per coding guidelines,
**/*.sh: "Quote all variables to prevent injection".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/scripts/run_e2e.sh` at line 23, The go test invocation uses an unquoted variable ${DIR} which can break on spaces or globs; update the test command in run_e2e.sh (the line invoking go test -v -timeout 20m ${DIR}/../e2e/ -ginkgo.v) to quote the path argument so the directory expansion is treated as a single token (e.g., use the quoted form of ${DIR}/../e2e/), ensuring shell-safe handling of spaces and preventing word-splitting or glob expansion.test/e2e/yaml/vllm-sim-pd.yaml-31-32 (1)
31-32:⚠️ Potential issue | 🟡 Minor
HF_TOKENexposed as plain-text env var.The token is set via
value: "${HF_TOKEN}"rather than a Secret reference. Acceptable for E2E test manifests, but ensure this pattern is not copied to production deployments where the token should be injected from a Kubernetes Secret.Also applies to: 114-115
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/yaml/vllm-sim-pd.yaml` around lines 31 - 32, The manifest exposes HF_TOKEN as a plain-text env var via value: "${HF_TOKEN}"—replace this with a Kubernetes Secret reference by using valueFrom.secretKeyRef on the container env entry for HF_TOKEN (set secret name and key to the secret that holds the token), and update both occurrences (the HF_TOKEN env entries) so the token is injected from the Secret rather than inlined.cmd/pd-sidecar/main.go-71-74 (1)
71-74:⚠️ Potential issue | 🟡 MinorExit code missing on startup failure.
When
proxyServer.Start(ctx)fails, the error is logged but the process exits with code 0. Useos.Exit(1)or return from a function that sets exit code.Proposed fix
proxyServer := proxy.NewProxy(opts.Config) if err := proxyServer.Start(ctx); err != nil { logger.Error(err, "failed to start proxy server") + os.Exit(1) }Add
"os"to imports.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/pd-sidecar/main.go` around lines 71 - 74, The startup sequence logs errors from proxyServer.Start(ctx) but doesn't set a non-zero exit code; update the error handling after proxy.NewProxy(opts.Config) and proxyServer.Start to call os.Exit(1) (or return an error that causes a non-zero exit) when Start returns an error—add the "os" import and replace the current logger.Error(err, "failed to start proxy server") call with logging followed by os.Exit(1) so failures produce a non-zero exit status.test/e2e/yaml/vllm-sim-dp.yaml-31-32 (1)
31-32:⚠️ Potential issue | 🟡 Minor
HF_TOKENexposed as plain environment variable.Passing
HF_TOKENdirectly as an environment variable value exposes the token in pod specs and potentially in logs. For production, consider using a Secret reference.Recommended approach for production
- name: HF_TOKEN - value: "${HF_TOKEN}" + valueFrom: + secretKeyRef: + name: hf-token-secret + key: tokenFor e2e tests this may be acceptable, but ensure
HF_TOKENis not logged or exposed in CI artifacts.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/yaml/vllm-sim-dp.yaml` around lines 31 - 32, Replace the plain HF_TOKEN environment value with a Kubernetes Secret reference instead of exposing the token inline: update the container spec's env entry for "HF_TOKEN" to use valueFrom.secretKeyRef pointing to a Secret name (e.g., a new "hf-token" secret) and the appropriate key (e.g., "token"); also update any e2e test harness code that sets HF_TOKEN to create/populate that Secret before the pod runs and ensure test logs/do-not-emit the token. Use the "HF_TOKEN" env name and secretKeyRef when locating where to change the manifest and test setup.docs/disaggregation.md-218-218 (1)
218-218:⚠️ Potential issue | 🟡 MinorTypo: "mumtimedia" → "multimedia".
📝 Fix
-- Cache coordinate (we can talk about 3 different types of cache: KV-cache, embeddings, and mumtimedia content) +- Cache coordinate (we can talk about 3 different types of cache: KV-cache, embeddings, and multimedia content)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/disaggregation.md` at line 218, Replace the typo "mumtimedia" with the correct spelling "multimedia" in the sentence that reads "Cache coordinate (we can talk about 3 different types of cache: KV-cache, embeddings, and mumtimedia content)"; update the phrase to "multimedia content" so the line reads "...KV-cache, embeddings, and multimedia content" (edit the text near the "Cache coordinate" sentence).docs/disaggregation.md-525-525 (1)
525-525:⚠️ Potential issue | 🟡 MinorBroken URL - trailing
}character.The reference link has an errant closing brace.
📝 Fix
-- vLLM: [[RFC]: Prototype Separating Vision Encoder to Its Own Worker](https://github.com/vllm-project/vllm/issues/20799)} +- vLLM: [[RFC]: Prototype Separating Vision Encoder to Its Own Worker](https://github.com/vllm-project/vllm/issues/20799)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/disaggregation.md` at line 525, The markdown reference for vLLM contains an extra trailing brace in the link text "vLLM: [[RFC]: Prototype Separating Vision Encoder to Its Own Worker](https://github.com/vllm-project/vllm/issues/20799)}"; remove the errant closing "}" so the link becomes a valid markdown link (i.e., ensure the URL ends with ")") and verify the surrounding text still renders correctly.pkg/plugins/register.go-41-41 (1)
41-41:⚠️ Potential issue | 🟡 MinorFix typo in constant identifier:
AlwaysDisaggMulimodalPluginType→AlwaysDisaggMultimodalPluginType.The Go identifier is missing a 't' in "Multimodal". Note: the constant's string value
"always-disagg-multimodal-decider"is spelled correctly, so YAML configs won't be affected. However, the incorrect identifier should be fixed for code clarity and consistency with the comment on line 12 ofalways_disagg_mm_decider.go.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/register.go` at line 41, The register call uses a misspelled identifier: replace profile.AlwaysDisaggMulimodalPluginType with profile.AlwaysDisaggMultimodalPluginType in the plugin.Register call and update any other usages to match the corrected constant name; ensure the constant definition (in the file that defines it, referenced by the comment in always_disagg_mm_decider.go) is renamed to AlwaysDisaggMultimodalPluginType so references compile and remain consistent with the string value "always-disagg-multimodal-decider"..github/workflows/ci-pr-checks.yaml-40-56 (1)
40-56:⚠️ Potential issue | 🟡 MinorShell variable expansion without quoting (SC2086).
The
docker volume inspectoutput substitution should be quoted to prevent word splitting if paths contain spaces.🛡️ Proposed fix
run: | docker volume create llm-d-gomodcache docker volume create llm-d-gobuildcache - echo "mod=$(docker volume inspect llm-d-gomodcache -f '{{.Mountpoint}}')" >> $GITHUB_OUTPUT - echo "build=$(docker volume inspect llm-d-gobuildcache -f '{{.Mountpoint}}')" >> $GITHUB_OUTPUT + echo "mod=$(docker volume inspect llm-d-gomodcache -f '{{.Mountpoint}}')" >> "$GITHUB_OUTPUT" + echo "build=$(docker volume inspect llm-d-gobuildcache -f '{{.Mountpoint}}')" >> "$GITHUB_OUTPUT"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci-pr-checks.yaml around lines 40 - 56, The CI step with id "go-cache" uses unquoted shell variable expansion for the redirection target ($GITHUB_OUTPUT), which can break if the path contains spaces; update the two echo lines that write to $GITHUB_OUTPUT (the lines inside the "Create and resolve Go cache volumes" step) to quote the redirection target (use "$GITHUB_OUTPUT" instead of $GITHUB_OUTPUT) so the outputs written by docker volume inspect (Mountpoint) are handled safely.pkg/plugins/preparedata/tokenizer_preparedata.go-45-59 (1)
45-59:⚠️ Potential issue | 🟡 MinorSilent error suppression reduces debuggability.
Line 50 discards the error from
p.tokenize(). While fail-open is intentional, the error should be logged for operational visibility. Without logging, tokenization failures will be invisible in production.Proposed fix
-func (p *TokenizerPlugin) PrepareRequestData(ctx context.Context, request *scheduling.LLMRequest, pods []scheduling.Endpoint) error { +func (p *TokenizerPlugin) PrepareRequestData(ctx context.Context, request *scheduling.LLMRequest, _ []scheduling.Endpoint) error { if request.TokenizedPrompt != nil { return nil } - tokenIDs, _ := p.tokenize(ctx, request) + tokenIDs, err := p.tokenize(ctx, request) + if err != nil { + p.logger.V(4).Info("tokenization failed (fail-open)", "error", err, "requestID", request.RequestId) + } if tokenIDs == nil { return nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/preparedata/tokenizer_preparedata.go` around lines 45 - 59, In PrepareRequestData, don’t discard the error returned by p.tokenize; capture its error value and log it for operational visibility (e.g., "tokenization failed" + err) while preserving the fail-open behavior (still set TokenizedPrompt only when tokenIDs != nil and return nil). Update the call in TokenizerPlugin.PrepareRequestData to assign both tokenIDs and err from p.tokenize(ctx, request), and emit a clear log entry using the plugin's logger (or a suitable logger) referencing the request or model context before returning.docs/architecture.md-731-732 (1)
731-732:⚠️ Potential issue | 🟡 MinorUpdate the stale plugin name in the numbered list.
This note still tells readers to configure
PrefillHeader, but the page now documentsDisaggHeadersHandler. That makes the migration guidance self-contradictory.Suggested fix
-1. The `PrefillHeader`, `DisaggProfileHandler`, `DecodeFilter`, `PrefillFilter` and the `PrefixCachePlugin` +1. The `DisaggHeadersHandler`, `DisaggProfileHandler`, `DecodeFilter`, `PrefillFilter` and the `PrefixCachePlugin`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/architecture.md` around lines 731 - 732, The numbered plugin list is stale: replace the old PrefillHeader entry with the current DisaggHeadersHandler name so the docs match the code; also verify the list includes the other plugins by their exact names (DisaggProfileHandler, DecodeFilter, PrefillFilter, PrefixCachePlugin) to ensure the documented “plugins instantiated” list matches the actual plugin class/handler names used elsewhere.Makefile.tools.mk-17-22 (1)
17-22:⚠️ Potential issue | 🟡 MinorMake
check-kustomizedepend oncheck-kubectl.Running this target directly without
kubectlinstalled currently reports the wrong error (--enable-helmunsupported) because the missing binary is swallowed by the pipeline.Suggested fix
.PHONY: check-kustomize -check-kustomize: +check-kustomize: check-kubectl `@kubectl` kustomize --help 2>&1 | grep -q -- '--enable-helm' || { \ echo "ERROR: kubectl kustomize does not support --enable-helm."; \ echo "Upgrade kubectl to v1.25 or later (https://kubernetes.io/docs/tasks/tools/)"; \🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile.tools.mk` around lines 17 - 22, The check-kustomize target should depend on or call check-kubectl so that the presence of kubectl is verified before running kubectl kustomize; update the Makefile by adding check-kubectl as a prerequisite for the check-kustomize target (or invoke $(MAKE) check-kubectl at the start of the check-kustomize recipe) so missing kubectl yields the correct error instead of the misleading '--enable-helm' message; reference the Makefile target name check-kustomize and the existing check-kubectl target when making this change.docs/architecture.md-708-710 (1)
708-710:⚠️ Potential issue | 🟡 MinorFix the sample YAML indentation.
parametersis over-indented here, so the snippet is invalid as copied and will fail with a YAML parser.Suggested fix
- type: prefix-based-pd-decider - parameters: - nonCachedTokens: 8 + parameters: + nonCachedTokens: 8🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/architecture.md` around lines 708 - 710, The YAML sample has incorrect indentation: the parameters mapping is over-indented relative to the list item. Edit the snippet so that the parameters key is aligned with the type key in the same list item (the block containing "type: prefix-based-pd-decider"), and keep "nonCachedTokens" nested under "parameters"; in other words, move "parameters:" left to the same indentation level as "type: prefix-based-pd-decider" and indent "nonCachedTokens: 8" under "parameters".scripts/kind-dev-env.sh-264-266 (1)
264-266:⚠️ Potential issue | 🟡 MinorQuote the ConfigMap file paths.
Line 265 leaves
EPP_CONFIGandTEMP_FILEsubject to word splitting and glob expansion, so a path with spaces or wildcard characters can break theenvsubststep or redirect output unexpectedly.Suggested fix
-envsubst '$MODEL_NAME' < ${EPP_CONFIG} > ${TEMP_FILE} +envsubst '$MODEL_NAME' < "${EPP_CONFIG}" > "${TEMP_FILE}"As per coding guidelines,
**/*.sh: Quote all variables to prevent injection.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/kind-dev-env.sh` around lines 264 - 266, The envsubst invocation and subsequent kubectl create use unquoted variables (EPP_CONFIG, TEMP_FILE and KUBE_CONTEXT) which allows word-splitting/globbing; update the commands to quote these variables everywhere: use envsubst '$MODEL_NAME' < "${EPP_CONFIG}" > "${TEMP_FILE}", and pass the temp file to kubectl create configmap as --from-file=epp-config.yaml="${TEMP_FILE}", and quote "${KUBE_CONTEXT}" in kubectl calls so EPP_CONFIG, TEMP_FILE and KUBE_CONTEXT are protected from splitting/globbing.
🧹 Nitpick comments (13)
pkg/plugins/scorer/active_request_test.go (1)
247-307: Missing test case for defaultMaxBusyScorebehavior.Tests explicitly set
MaxBusyScore, but don't verify the documented default (1.0) when the parameter is omitted. Adding such a test would catch the initialization bug where unsetMaxBusyScoredefaults to 0.0 instead of 1.0.Proposed additional test case
t.Run("default maxBusyScore should be 1.0 when not specified", func(t *testing.T) { params := &ActiveRequestParameters{ RequestTimeout: "1m", // MaxBusyScore not specified - should default to 1.0 } scorer := NewActiveRequest(ctx, params) podA := newTestEndpoint("pod-a", 0) podB := newTestEndpoint("pod-b", 0) // Send requests to make pod A busy req1 := newTestRequest("req-1") req2 := newTestRequest("req-2") result := newTestSchedulingResult(map[string]scheduling.Endpoint{"primary": podA}) scorer.PreRequest(ctx, req1, result) scorer.PreRequest(ctx, req2, result) scores := scorer.Score(ctx, nil, nil, []scheduling.Endpoint{podA, podB}) // With default maxBusyScore=1.0, busy pod should NOT be 0.0 assert.Equal(t, 1.0, scores[podB], "Idle pod scores 1.0") assert.Equal(t, 0.0, scores[podA], "Busiest pod scores 0.0 with default maxBusyScore=1.0") })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/scorer/active_request_test.go` around lines 247 - 307, Add a test verifying the documented default MaxBusyScore = 1.0 when ActiveRequestParameters.MaxBusyScore is omitted, and fix NewActiveRequest to initialize maxBusyScore to 1.0 when params.MaxBusyScore is the zero value (treat omitted zero as default); update or add a test (in TestActiveRequest_IdleThresholdAndMaxBusyScore or a new subtest) that creates params without MaxBusyScore, exercises PreRequest/Score on busy vs idle endpoints, and asserts the idle pod scores 1.0 while the busiest pod can score 0.0, using NewActiveRequest, ActiveRequestParameters, PreRequest, and Score to locate the code to change.pkg/plugins/preparedata/tokenizer_test.go (1)
53-101: Consider adding a positive test case for valid configuration.
TestTokenizerPluginFactory_Validationonly tests error paths. A success case would verify the happy path wiring works correctly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/preparedata/tokenizer_test.go` around lines 53 - 101, Add a positive "happy path" test to TestTokenizerPluginFactory_Validation that passes a valid JSON params blob (e.g. including a valid "modelName") and asserts no error and a non-nil plugin; call TokenizerPluginFactory with a non-empty json.RawMessage, require.NoError(t, err), and assert.NotNil(t, p) to verify successful construction and wiring for the TokenizerPluginFactory function.pkg/plugins/profile/prefix_based_pd_decider.go (2)
77-79: Inconsistent logger usage.Line 78 uses
log.Log.Info()(package-level logger) whiledisaggregateuseslog.FromContext(ctx)(context logger). This breaks structured logging correlation.Proposed fix
The factory and constructor don't have context. Consider deferring this log to the first
disaggregatecall or accepting a logger parameter:- if config.NonCachedTokens == 0 { - log.Log.Info("Prefix-based PD disabled (NonCachedTokens=0)") - }Or document this is intentional startup-time logging.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/profile/prefix_based_pd_decider.go` around lines 77 - 79, The startup message uses the package-level logger call log.Log.Info("Prefix-based PD disabled (NonCachedTokens=0)"), which breaks structured correlation with the contextual logger used in disaggregate; either move or defer this message into the first call of disaggregate so it can use log.FromContext(ctx), or change the constructor/factory to accept a logger parameter and replace log.Log.Info with that contextual logger; update the code path that emits the message (the NonCachedTokens==0 branch) to use the same logger strategy as disaggregate to preserve structured logging.
150-166: Token estimation includes JSON serialization overhead.
json.Marshalon messages (line 161) adds structural characters (braces, quotes, colons) that inflate the character count. For chat completions, this may overestimate actual prompt length by ~10-20%.Given
AverageCharactersPerToken=4is already a rough heuristic, this is likely acceptable but worth documenting.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/profile/prefix_based_pd_decider.go` around lines 150 - 166, getUserInputLenInTokens is overcounting tokens by using json.Marshal on request.Body.ChatCompletions.Messages which adds JSON structural characters; instead iterate request.Body.ChatCompletions.Messages and sum the lengths of the actual message text fields (e.g., Content and optionally Role/Name) to compute characters, then divide by AverageCharactersPerToken; update the logic that reads request.Body.ChatCompletions.Messages to use this concatenation/length-sum approach and add a brief comment that this avoids JSON serialization overhead.test/sidecar/config/overlays/llmd/nixl/README.md (1)
6-6: Acceptable switch tokubectl kustomize.The change aligns with other Makefile updates. The MD014 lint warning about
$prefix is a style nit for documentation; either remove the$or show expected output to satisfy it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/sidecar/config/overlays/llmd/nixl/README.md` at line 6, The README command currently starts with a shell prompt "$ kubectl kustomize test/config/overlays/llmd/nixl | kubectl apply -f -" which triggers the MD014 lint about the "$" prompt; update the README in test/sidecar/config/overlays/llmd/nixl/README.md by either removing the leading "$" so the command is shown as a plain code line or replace the snippet with the command plus its expected output to satisfy the linter; ensure you modify the exact line containing "kubectl kustomize test/config/overlays/llmd/nixl | kubectl apply -f -" in the README.pkg/plugins/profile/pd_profile_handler_test.go (1)
346-348: Inconsistent context creation.
TestPdProfileHandler_PickSeriesusescontext.Background()directly while other tests in this file useutils.NewTestContext(t). This inconsistency could cause the test to miss logging/tracing setup.func TestPdProfileHandler_PickSeries(t *testing.T) { - ctx := context.Background() + ctx := utils.NewTestContext(t) prompt := "hello world, hello world, hello world, hello world, hello world, hello world, hello world!"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/profile/pd_profile_handler_test.go` around lines 346 - 348, The test TestPdProfileHandler_PickSeries creates a context with context.Background() while other tests use utils.NewTestContext(t); replace the direct context.Background() call with ctx := utils.NewTestContext(t) so the test gets the same logging/tracing test context as the rest of the suite (update any references to ctx in TestPdProfileHandler_PickSeries accordingly).pkg/plugins/scorer/precise_prefix_cache_tokenized_test.go (2)
84-118: Test lacks assertion thatTargetModelis read from request.The test sets
TargetModel: "test-model"on the request and capturescapturedModel, but the assertion at line 117 hardcodes"test-model". Consider usingrequest.TargetModelin the assertion to make the test more robust against copy-paste errors.- require.Equal(t, "test-model", capturedModel) + require.Equal(t, request.TargetModel, capturedModel)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/scorer/precise_prefix_cache_tokenized_test.go` around lines 84 - 118, In TestPrecisePrefixCacheScorer_UsesTokenizedPrompt, replace the hardcoded assertion require.Equal(t, "test-model", capturedModel) with an assertion that uses the request value (e.g., require.Equal(t, request.TargetModel, capturedModel)) so the test verifies the scorer reads TargetModel from the LLMRequest; update the assertion referencing the capturedModel variable accordingly.
114-117: Return value ofScore()not asserted.The
Score()return value (the scores map) is discarded. While this scorer returns zero scores, asserting the map is non-nil and has expected length confirms the full code path executes correctly.Suggested improvement
- scorer.Score(ctx, cycleState, request, testEndpoints) + scores := scorer.Score(ctx, cycleState, request, testEndpoints) + require.Len(t, scores, len(testEndpoints)) require.Equal(t, tokenIDs, capturedTokens)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/scorer/precise_prefix_cache_tokenized_test.go` around lines 114 - 117, The test currently discards the return value of scorer.Score; update the call to capture the returned scores (e.g., scores := scorer.Score(ctx, cycleState, request, testEndpoints)) and add assertions that scores is not nil and has the expected length (e.g., len(scores) == len(tokenIDs)); optionally assert that each entry in scores is zero to match the scorer's behavior. Ensure you still keep the existing assertions for capturedTokens and capturedModel.pkg/sidecar/proxy/data_parallel.go (2)
52-85: Duplicate URL parsing and port computation in both loops.
decoderPort,rankPort, anddecoderURLare computed identically in the first loop (lines 52-61) and again in the second loop (lines 64-70). This duplicates allocations and error handling.Consolidate into single loop
+ type dpRankInfo struct { + rankPort string + hostPort string + decoderURL *url.URL + handler http.Handler + } + rankInfos := make([]dpRankInfo, 0, s.config.DataParallelSize-1) + // Fill in map of proxies, thus avoiding locks for idx := range s.config.DataParallelSize - 1 { decoderPort := strconv.Itoa(baseDecoderPort + idx + 1) rankPort := strconv.Itoa(basePort + idx + 1) hostPort := net.JoinHostPort(podIP, rankPort) decoderURL, err := url.Parse(decoderScheme + "://localhost:" + decoderPort) if err != nil { return err } handler := s.createDecoderProxyHandler(decoderURL, s.config.InsecureSkipVerifyForDecoder) s.dataParallelProxies[hostPort] = handler + rankInfos = append(rankInfos, dpRankInfo{rankPort: rankPort, hostPort: hostPort, decoderURL: decoderURL, handler: handler}) } - for idx := range s.config.DataParallelSize - 1 { - rankPort := strconv.Itoa(basePort + idx + 1) - decoderPort := strconv.Itoa(baseDecoderPort + idx + 1) - decoderURL, err := url.Parse(decoderScheme + "://localhost:" + decoderPort) - if err != nil { - return err - } - + for _, info := range rankInfos { clone := s.Clone() - clone.config.Port = rankPort - clone.config.DecoderURL = decoderURL + clone.config.Port = info.rankPort + clone.config.DecoderURL = info.decoderURL clone.forwardDataParallel = false grp.Go(func() error { - clone.logger = log.FromContext(ctx).WithName("proxy server on port " + rankPort) + clone.logger = log.FromContext(ctx).WithName("proxy server on port " + info.rankPort)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/data_parallel.go` around lines 52 - 85, The two loops duplicate computation and parsing of decoderPort, rankPort, and decoderURL; refactor into a single loop over s.config.DataParallelSize-1 that for each idx computes decoderPort and rankPort, builds decoderURL (using decoderScheme, podIP, baseDecoderPort, basePort), registers the proxy handler via s.createDecoderProxyHandler into s.dataParallelProxies[hostPort], then clones s (using s.Clone()), sets clone.config.Port = rankPort, clone.config.DecoderURL = decoderURL, clone.forwardDataParallel = false, configures clone.logger, clone.handler and clone.setKVConnector(), and finally spawns the goroutine with grp.Go that calls clone.startHTTP(ctx); ensure error from url.Parse is handled once per iteration.
19-21: Deprecation log message is confusing.The log fires when
DataParallelEndpointHeaderis present, but the message "Use Istio >= 1.28.1" doesn't explain the deprecation context or what action the user should take.- s.logger.Info("The use of the x-data-parallel-host-port is deprecated. Use Istio >= 1.28.1.") + s.logger.Info("Header-based data-parallel routing is deprecated; prefer Istio >= 1.28.1 service mesh routing", "header", common.DataParallelEndpointHeader)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/data_parallel.go` around lines 19 - 21, The deprecation log triggered when r.Header.Get(common.DataParallelEndpointHeader) returns a value is unclear; update the message in the block that checks dataParallelPodHostPort (and the surrounding code that calls s.logger.Info) to explicitly state that the x-data-parallel-host-port header is deprecated, explain the recommended replacement (e.g., remove the header and rely on Istio sidecar routing or upgrade to Istio >= 1.28.1), and include a brief action for users (e.g., "remove the header from requests or upgrade Istio to >=1.28.1 to enable built-in routing") so operators know what to change.pkg/sidecar/proxy/options_test.go (1)
142-154: Duplicate test case with identical expectations.Test cases "empty string does not set values" (lines 142-147) and "deprecated flags take precedence when InferencePool is empty" (lines 148-154) have identical inputs and expectations. The second case name suggests it should test different behavior.
♻️ Either remove duplicate or test actual deprecated flag precedence
{ name: "deprecated flags take precedence when InferencePool is empty", inferencePool: "", - expectedNamespace: "", - expectedName: "", + // If testing deprecated flags, set them and verify they're used: + // This test case needs additional setup for deprecated flags + expectedNamespace: "from-deprecated", + expectedName: "pool-from-deprecated", },Or remove the duplicate case if it's unintentional.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/options_test.go` around lines 142 - 154, The test table contains two duplicate cases: "empty string does not set values" and "deprecated flags take precedence when InferencePool is empty" have identical inputs and expectations; update the second case to actually exercise deprecated-flag precedence by populating the deprecated flag fields (the fields representing the old namespace/name flags in the same test struct) and adjust expectedNamespace/expectedName to match those deprecated values, or remove the duplicate case entirely if it was unintentional; locate the test table in options_test.go and modify the entry with name "deprecated flags take precedence when InferencePool is empty" accordingly.pkg/plugins/scorer/precise_prefix_cache_test.go (1)
917-918:time.Sleepin test creates flakiness risk.Using
time.Sleep(500 * time.Millisecond)to wait for TTL expiration is fragile under CI load. Consider using polling withEventuallyor a test clock.♻️ Alternative: use polling
- // 3. Wait for TTL to expire and trigger eviction - time.Sleep(500 * time.Millisecond) - scorer.speculativeCache.DeleteExpired() + // 3. Wait for TTL to expire and trigger eviction + require.Eventually(t, func() bool { + scorer.speculativeCache.DeleteExpired() + item := scorer.speculativeCache.Get(request.RequestId) + return item == nil + }, 1*time.Second, 50*time.Millisecond)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/scorer/precise_prefix_cache_test.go` around lines 917 - 918, Replace the fragile time.Sleep(500 * time.Millisecond) by actively polling for TTL expiry: remove the call to time.Sleep and instead use a test polling/assertion (e.g. Gomega's Eventually) that repeatedly invokes scorer.speculativeCache.DeleteExpired (or advances the test clock if a clock is available) and checks the concrete observable that an entry was removed (e.g. the cache lookup/Contains/Get result for the key or scorer.speculativeCache.IsEmpty) until it becomes true or a timeout is reached; this ensures the test waits deterministically for DeleteExpired to have effect rather than relying on a fixed sleep.test/e2e/e2e_test.go (1)
779-807:runStreamingChatCompletionhardcodes model name, unlike its counterparts.
runStreamingCompletionaccepts atheModelparameter, butrunStreamingChatCompletionhardcodessimModelNameat line 783. This inconsistency limits test flexibility.Proposed fix for consistency
-func runStreamingChatCompletion(prompt string) (string, string) { +func runStreamingChatCompletion(prompt, modelName string) (string, string) { ginkgo.By(fmt.Sprintf("Sending Streaming Chat Completion Request: (port %s)", port)) // Use raw HTTP for streaming to capture headers - body := fmt.Sprintf(`{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":true}`, simModelName, prompt) + body := fmt.Sprintf(`{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":true}`, modelName, prompt)Update call sites at lines 203, etc. to pass
simModelNameexplicitly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/e2e_test.go` around lines 779 - 807, The runStreamingChatCompletion function currently hardcodes simModelName; change its signature to accept a model parameter (e.g., theModel string) and use that when building the request body instead of simModelName, mirroring runStreamingCompletion's API, and then update all call sites that invoke runStreamingChatCompletion to pass the desired model (e.g., simModelName) so tests remain flexible and consistent.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: anishasthana, Gregory-Pereira The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Cannot approve the pull request: Error: openshift-ci[bot] is not included in the approvers role in the OWNERS file |
cc @anishasthana @zdtsw
Summary by CodeRabbit
New Features
Improvements