Test: LGTM - open-pr (#71)

revit13 · vMaroon · shmuelk · web-flow · commit 8dc2bfffa9ef · 2026-02-23T17:10:16.000Z
* update llm-d-kv-cache import to v0.5.0-RC1 (llm-d#584)
  * update kvc version import
    Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
  * add go.mod to testable changes
    Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
  ---------
  Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Use 1.3.0 CRDs (llm-d#586)
  Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* free disk space on ci-release (llm-d#587)
  Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581)
  * feat: use Tinyllama as the "model" for kind test
    - in order to test precise-prefix-cache-score we cannot use food-review since it needs to call kv-cache-manager to get the tokenizer by getting a real model from HF
    - the change is to switch the "default model" to TinyLlama
    - also, to make the tokenizer folder writable, the permission for the USER in the Dockerfile needs to change
    - rename dp-epp-config.yaml sim-dp-epp-config.yaml as it is used for local test
    Signed-off-by: Wen Zhou <wenzhou@redhat.com>
  * update: revert back some config to keep using prefix-cache-scorer
    - revert file renaming
    Signed-off-by: Wen Zhou <wenzhou@redhat.com>
  ---------
  Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update linter configuration (llm-d#588)
  Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* fix: config should use new precise-prefix-cache-scorer (llm-d#576)
  - we have renamed prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0, configs need to migrate from the old one to the new one with spec.
  - rename plugin name
  - remove parameters.autoTune and parameters.mode: cache_tracking and lruCapacityPerServer
  - move hashBlockSize, maxPrefixBlocksToMatch under indexerConfig
  - for config using food-review keep old prefix-cache-scorer
  - keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer as KV and PD both need to be enabled, which is not done yet
  Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)
  Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
  - [Release notes](https://github.com/crate-ci/typos/releases)
  - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
  - [Commits](crate-ci/typos@v1.42.1...v1.42.2)
  ---
  updated-dependencies:
  - dependency-name: crate-ci/typos
    dependency-version: 1.42.2
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated to more recent GIE (llm-d#592)
  * Updated to more recent GIE
    Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
  * Updated to latest GIE and changes due to review comments
    Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
  * Added a true mock SchedulerProfile
    Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
  * Exploited mock SchedulerProfile
    Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
  ---------
  Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* pull kvc v0.5.0 libs (llm-d#595)
  Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)
  Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
  - [Release notes](https://github.com/crate-ci/typos/releases)
  - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
  - [Commits](crate-ci/typos@v1.42.2...v1.43.0)
  ---
  updated-dependencies:
  - dependency-name: crate-ci/typos
    dependency-version: 1.43.0
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* address nil,nil return linter error in test mock (llm-d#598)
  Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)
  Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).
  Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
  - [Release notes](https://github.com/onsi/ginkgo/releases)
  - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
  - [Commits](onsi/ginkgo@v2.27.5...v2.28.1)
  Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
  - [Release notes](https://github.com/onsi/gomega/releases)
  - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
  - [Commits](onsi/gomega@v1.39.0...v1.39.1)
  ---
  updated-dependencies:
  - dependency-name: github.com/onsi/ginkgo/v2
    dependency-version: 2.28.1
    dependency-type: direct:production
    update-type: version-update:semver-minor
    dependency-group: go-dependencies
  - dependency-name: github.com/onsi/gomega
    dependency-version: 1.39.1
    dependency-type: direct:production
    update-type: version-update:semver-patch
    dependency-group: go-dependencies
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Models extractor (llm-d#553)
  * Models extractor
    Signed-off-by: irar2 <irar@il.ibm.com>
  * Update register.go
    Signed-off-by: Ira Rosen <irar@il.ibm.com>
  * Updated for the newer GIE
    Signed-off-by: irar2 <irar@il.ibm.com>
  * Review comments
    Signed-off-by: irar2 <irar@il.ibm.com>
  * Check the scheme
    Signed-off-by: irar2 <irar@il.ibm.com>
  ---------
  Signed-off-by: irar2 <irar@il.ibm.com>
  Signed-off-by: Ira Rosen <irar@il.ibm.com>

* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)
  * feat: implement decode first flow on lmcache connector
    - if cache_hit_threshold field is present in completion request, then we perform a decode first flow
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * fix: error handling
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * chore: add back todo comment
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * refactor: reduce code complexity and duplication
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * refactor: improve header copying
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * chore: add comment explaining the cache_hit_threshold field and the new decode first flow
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * refactor: enhance logging for cache hit threshold in decode flow
    - decrease verbosity for common log
    - add cache_hit_threshold attribute
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * refactor: improve error handling and observability when failing to unmarshal decode response
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * chore: add deleted informational comments
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * typo
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * refactor: make error logs more descriptive of the failure reason
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * fix: typo
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * refactor: assign 0 cache_hit_threshold before final decode attempt
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * chore: update comment according to feedback
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * chore: remove istio workaround
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * fix: set cache hit threshold to 0 in prefill request for consistent execution
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * refactor: update the log
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * feat: support online decoding
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * fix: preserve request body in lmcache connector
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * fix: support sse format for streamed decode
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * chore: add and improve log descriptions
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * fix: typo
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * nit: undo capitalization
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * fix: typos
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * chore: improve error log observability
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * refactor: encapsulate http error checking in function and reuse
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * refactor: encapsulate and reuse code better
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * fix: lint error
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * refactor: improve code encapsulation and reduce duplication
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * refactor: rename and simplify SSE event signaling logic
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * refactor: rename lmcache to shared storage protocol
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * fix: remove unused function
    Signed-off-by: kyano <kyanokashi2@gmail.com>
  * test: e2e tests
    Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
  * chore: claude gitignore
    Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
  * fix: sim deployment
    Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
  * feat: make linter running on new code configurable
    Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
  * fix: lint errors
    Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
  ---------
  Signed-off-by: kyano <kyanokashi2@gmail.com>
  Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
  Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)
  * Initial step of a configurable pd decider which is responsible for decision whether disaggregation is required, use data added in prefix scorer plugin in PrepareRequestData
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * update version of GIE + fix lint
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * update yaml and the test according prefix plugin configuration change (blockSize replaced by blockSizeTokens)
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * Update docs/architecture.md
    Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * code review
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * code review
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * update version of GIE, update prefix_disagr_decider accordingly
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * fix typo
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * fix PD for short inputs
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * Update docs/architecture.md
    Co-authored-by: Etai Lev Ran <elevran@gmail.com>
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * Update pkg/plugins/profile/always_disaggr_decider.go
    Co-authored-by: Etai Lev Ran <elevran@gmail.com>
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * Update pkg/plugins/profile/always_disaggr_decider.go
    Co-authored-by: Etai Lev Ran <elevran@gmail.com>
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * Update pkg/plugins/profile/prefix_disagg_decider.go
    Co-authored-by: Etai Lev Ran <elevran@gmail.com>
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * updates according the PR comments
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * fix test
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * create pd decider plugin type with 2 implementations (for prefix based and test always), update deploy configuration according the new structure
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * fix e2e tests
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * changes according the pr comments
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * fix e2e test
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * add explanation about pd deciders to disagg_pd doc
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  * rename always_disaggr_decider to always_disagg_decider
    Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  ---------
  Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
  Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* chore: fix wrong port for NIXL (llm-d#593)
  - start with vLLM 0.11.1, default port for NIXL has been updated to 5600
  - leave ZMQ to use 5557
  Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)
  * fix: resolve JSON serialization error in active-request-scorer debug logs
    Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
  * feat: Add raw scores to debug
    Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
  ---------
  Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* Match documentation with default model in scripts (llm-d#615)
  Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Test: LGTM Workflow Automation (#32)
  * feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581)
  * Update linter configuration (llm-d#588)
  * fix: config should use new precise-prefix-cache-scorer (llm-d#576)
  * deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)
  * Updated to more recent GIE (llm-d#592)
  * pull kvc v0.5.0 libs (llm-d#595)
  * deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)
  * address nil,nil return linter error in test mock (llm-d#598)
  * deps(go): bump the go-dependencies group with 2 updates (llm-d#597)
  * Models extractor (llm-d#553)
  * feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)
  * Extend support for different ways to decide if disaggregated PD is required (llm-d#531)
  * chore: fix wrong port for NIXL (llm-d#593)
  * fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)
  * Implement "LGTM" ChatOps Workflow.
    Signed-off-by: Revital Sur <eres@il.ibm.com>
  * test
    Signed-off-by: Revital Sur <eres@il.ibm.com>
  * Lgtm2 (#17)
    * Implement "LGTM" ChatOps Workflow.
      Signed-off-by: Revital Sur <eres@il.ibm.com>
    * test
      Signed-off-by: Revital Sur <eres@il.ibm.com>
    ---------
    Signed-off-by: Revital Sur <eres@il.ibm.com>
  * test
  * test: automated LGTM workflow test (#19)
    This PR tests the /lgtm command workflow automation.
    Test suite: all
    Signed-off-by: Revital Sur <eres@il.ibm.com>
    Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
  * test: automated LGTM workflow test (#20)
    This PR tests the /lgtm command workflow automation.
    Test suite: all
    Signed-off-by: Revital Sur <eres@il.ibm.com>
    Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
  * test: automated LGTM workflow test (#21)
    This PR tests the /lgtm command workflow automation.
    Test suite: all
    Signed-off-by: Revital Sur <eres@il.ibm.com>
    Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
  * test: automated LGTM workflow test (#22)
    This PR tests the /lgtm command workflow automation.
    Test suite: reset
    Signed-off-by: Revital Sur <eres@il.ibm.com>
    Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
  * test
    Signed-off-by: Revital Sur <eres@il.ibm.com>
  * test: automated LGTM workflow test (#24)
    This PR tests the /lgtm command workflow automation.
    Test suite: reset
    Signed-off-by: Revital Sur <eres@il.ibm.com>
    Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
  * test
    Signed-off-by: Revital Sur <eres@il.ibm.com>
  * test: automated LGTM workflow test (#26)
    This PR tests the /lgtm command workflow automation.
    Test suite: reset
    Signed-off-by: Revital Sur <eres@il.ibm.com>
    Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
  * test
    Signed-off-by: Revital Sur <eres@il.ibm.com>
  * Address review comments.
    Signed-off-by: Revital Sur <eres@il.ibm.com>
  * test: automated LGTM workflow test
    This PR tests the /lgtm command workflow automation.
    Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
    Signed-off-by: Revital Sur <eres@il.ibm.com>
  ---------
  Signed-off-by: Wen Zhou <wenzhou@redhat.com>
  Signed-off-by: Etai Lev Ran <elevran@gmail.com>
  Signed-off-by: dependabot[bot] <support@github.com>
  Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
  Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
  Signed-off-by: irar2 <irar@il.ibm.com>
  Signed-off-by: Ira Rosen <irar@il.ibm.com>
  Signed-off-by: kyano <kyanokashi2@gmail.com>
  Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
  Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
  Signed-off-by: Maya Barnea <mayab@il.ibm.com>
  Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
  Signed-off-by: Revital Sur <eres@il.ibm.com>
  Co-authored-by: Wen Zhou <wenzhou@redhat.com>
  Co-authored-by: Etai Lev Ran <elevran@gmail.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
  Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
  Co-authored-by: Ira Rosen <irar@il.ibm.com>
  Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
  Co-authored-by: Maya Barnea <mayab@il.ibm.com>
  Co-authored-by: alberto <aperdomo@redhat.com>
  Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test
  Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: open-pr
  Tests that opening a PR triggers gatekeeper which blocks without lgtm label.
  Test timestamp: 1771188042
  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
diff --git a/.github/workflows/dispatch-on-lgtm.yml b/.github/workflows/dispatch-on-lgtm.yml
@@ -0,0 +1,18 @@
+name: ChatOps Dispatcher
+on:
+  issue_comment:
+    types: [created]
+
+jobs:
+  dispatch:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Slash Command Dispatch
+        uses: peter-evans/slash-command-dispatch@v3
+        with:
+          token: ${{ secrets.GITHUB_TOKEN }}
+          commands: lgtm
+          # 1. Basic Security: Only allow users with 'write' access to even trigger this
+          permission: write
+          issue-type: pull-request
+          reactions: false
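
Review note: the dispatcher above only gates on `write` permission; the `lgtm-command` worker it triggers additionally checks the commenter against the `reviewers` list in `OWNERS`. A minimal sketch for trying that check outside CI follows. The function name `is_reviewer` is hypothetical, and the sketch assumes `OWNERS` is a flat YAML file with a top-level `reviewers:` key followed by `- name` items; it uses POSIX character classes where the workflow uses `\s`.

```shell
#!/usr/bin/env bash
# Sketch only: is_reviewer is a hypothetical helper, not part of this commit.
# Assumes OWNERS has a top-level 'reviewers:' key followed by '- name' items.

# Succeeds (exit 0) when ACTOR is listed under 'reviewers:' in the given file.
is_reviewer() {
  local actor="$1" owners_file="$2"
  # Print from 'reviewers:' through the next top-level key, drop the key line,
  # then look for a list item that is exactly the actor name.
  # Note: the actor is interpolated into the pattern as a regex, as in the workflow.
  sed -n '/^reviewers:/,/^[^ -]/p' "$owners_file" \
    | grep -v '^reviewers:' \
    | grep -Eq "^[[:space:]]*-[[:space:]]*${actor}[[:space:]]*$"
}

# Example (hypothetical user name):
#   is_reviewer octocat OWNERS && echo "authorized"
```

Because the range ends at the next line that starts with neither a space nor a dash, names listed under a different key such as `approvers:` are not treated as reviewers.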
diff --git a/.github/workflows/lgtm-command.yml b/.github/workflows/lgtm-command.yml
@@ -0,0 +1,134 @@
+# ============================================================================
+# LGTM Command Worker
+# ============================================================================
+# Handles /lgtm commands from chatops-dispatcher
+#
+# Flow:
+# 1. User comments /lgtm on PR
+# 2. chatops-dispatcher catches it and dispatches here
+# 3. This workflow:
+#    - Verifies user is listed under 'reviewers' in OWNERS file
+#    - Checks if PR is draft
+#    - Checks for blocking labels (hold)
+#    - Adds lgtm label
+#    - Enables auto-merge
+# ============================================================================
+
+name: LGTM Command Worker
+on:
+  repository_dispatch:
+    types: [lgtm-command]
+
+env:
+  BLOCKING_LABELS: "hold"
+
+jobs:
+  apply-lgtm:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+      pull-requests: write
+      issues: write
+    steps:
+      - uses: actions/checkout@v4
+      - uses: tibdex/github-app-token@v1
+        id: generate-token
+        with:
+          app_id: ${{ secrets.VLLMD_BOT_APP_ID }}
+          private_key: ${{ secrets.VLLMD_BOT_APP_PRIVATE_KEY }}
+          repository: ${{ github.repository }}
+
+      # -----------------------------------------------------------------------
+      # STEP 1: AUTHORIZATION - Verify user is in OWNERS file
+      # -----------------------------------------------------------------------
+      # Only users listed as reviewers in OWNERS can use /lgtm
+      # This prevents unauthorized users from applying the lgtm label
+      - name: Check Permissions
+        env:
+          ACTOR: ${{ github.event.client_payload.github.actor }}
+        run: |
+          # Extract only the reviewers section and check if ACTOR is listed
+          REVIEWERS=$(sed -n '/^reviewers:/,/^[^ -]/p' OWNERS | grep -v '^reviewers:')
+          if echo "$REVIEWERS" | grep -q "^\s*-\s*$ACTOR\s*$"; then
+            echo "User $ACTOR is authorized"
+          else
+            echo "::error:: User $ACTOR is not a reviewer."
+            exit 1
+          fi
+
+      # -----------------------------------------------------------------------
+      # STEP 2: VALIDATION - Check if PR is in draft mode
+      # -----------------------------------------------------------------------
+      # Draft PRs cannot be approved - they must be marked ready for review
+      # This prevents accidental approval of incomplete work
+      - name: Check Draft Status
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PR_NUMBER: ${{ github.event.client_payload.github.payload.issue.number }}
+          REPO: ${{ github.repository }}
+        run: |
+          IS_DRAFT=$(gh pr view $PR_NUMBER --repo "$REPO" --json isDraft --jq '.isDraft')
+          if [ "$IS_DRAFT" = "true" ]; then
+            echo "::error:: Cannot LGTM a Draft PR."
+            gh issue comment $PR_NUMBER --repo "$REPO" --body "⚠️ **LGTM Failed**: PR is a Draft."
+            exit 1
+          fi
+
+      # -----------------------------------------------------------------------
+      # STEP 3: BLOCKING LABELS - Check for hold
+      # -----------------------------------------------------------------------
+      # If any blocking label exists, fail immediately
+      # This prevents approving PRs that are explicitly marked as not ready
+      - name: Check for Blocking Labels
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PR_URL: ${{ github.event.client_payload.github.payload.issue.html_url }}
+        run: |
+          LABELS=$(gh pr view "$PR_URL" --json labels --jq '.labels[].name')
+          if echo "$LABELS" | grep -Eiq "^($BLOCKING_LABELS)$"; then
+            echo "::error:: PR is blocked by label."
+            gh issue comment "$PR_URL" --body "**Merge Blocked**: Please remove the \`hold\` label before merging."
+            exit 1
+          fi
+
+      # -----------------------------------------------------------------------
+      # STEP 4: APPLY OR CANCEL LGTM
+      # -----------------------------------------------------------------------
+      # /lgtm:        add lgtm label (triggers gatekeeper to validate), then
+      #               enable auto-merge (PR will merge when all checks pass)
+      # /lgtm cancel: remove lgtm label and disable auto-merge
+      - name: Apply or Cancel LGTM
+        env:
+          GH_TOKEN: ${{ steps.generate-token.outputs.token }}
+          # Commenter's login; referenced as $ACTOR in the retraction notice below
+          ACTOR: ${{ github.event.client_payload.github.actor }}
+          PR_NUMBER: ${{ github.event.client_payload.github.payload.issue.number }}
+          PR_URL: ${{ github.event.client_payload.github.payload.issue.html_url }}
+          # Extract the full comment body from the dispatcher payload
+          COMMENT_BODY: ${{ github.event.client_payload.github.payload.comment.body }}
+        run: |
+          # Check if the command is a cancellation
+          if echo "$COMMENT_BODY" | grep -q "/lgtm cancel"; then
+            echo "🚨 Retracting LGTM status..."
+
+            # 1. Remove lgtm label
+            gh issue edit "$PR_NUMBER" --remove-label "lgtm" || echo "Label already gone"
+
+            # 2. Disable Auto-Merge
+            gh pr merge --disable-auto "$PR_URL" || echo "Auto-merge was not enabled"
+
+            # 3. Notify user
+            gh issue comment "$PR_URL" --body "Retracted: **LGTM** label removed and auto-merge disabled by @$ACTOR."
+          
+          else
+            echo "✅ Applying LGTM status..."
+
+            # 1. Add lgtm label
+            gh issue edit "$PR_NUMBER" --add-label "lgtm"
+
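+            # 2. Enable auto-merge (Squash)
+            # Capture the output directly: piping through tee would report tee's
+            # exit status (no pipefail in the default shell) and hide a gh failure.
+            if ! ERROR_MSG=$(gh pr merge --auto --squash "$PR_URL" 2>&1); then
+              echo "$ERROR_MSG"
+              gh issue comment "$PR_URL" --body "⚠️ **Auto-merge failed**: $ERROR_MSG"
+              exit 1
+            fi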
+
+            gh issue comment "$PR_URL" --body "✅ **LGTM**: auto-merge enabled."
+          fi
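The authorization step in `lgtm-command.yml` interpolates `$ACTOR` into a regular expression and matches case-sensitively. The following is a minimal sketch of an alternative, not the PR's implementation: it extracts the `reviewers` entries with `awk` and compares them as fixed strings, ignoring case because GitHub logins are case-insensitive. The function name `is_reviewer` and the sample OWNERS content are hypothetical.

```bash
# Hypothetical helper (not part of the PR): succeeds if login $1 is listed
# under "reviewers:" in the OWNERS-style file $2.
is_reviewer() {
  local actor="$1" owners_file="$2"
  # Print list entries only while inside the reviewers section, strip the
  # leading "- " and trailing whitespace, then compare as whole fixed strings.
  awk '
    /^reviewers:/            { in_section = 1; next }
    /^[^ \t-]/               { in_section = 0 }
    in_section && /^[ \t]*-/ {
      sub(/^[ \t]*-[ \t]*/, ""); sub(/[ \t]+$/, ""); print
    }
  ' "$owners_file" | grep -Fqxi -- "$actor"
}

# Sample data for illustration only.
owners=$(mktemp)
cat > "$owners" <<'EOF'
approvers:
- alice
reviewers:
- bob
- carol
EOF

is_reviewer bob "$owners"   && echo "bob: authorized"
is_reviewer alice "$owners" || echo "alice: not a reviewer"
```

With fixed-string matching (`grep -F`), a login can never be interpreted as a pattern, and `-x` keeps `bob` from matching an entry such as `bobby`.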
diff --git a/.github/workflows/lgtm-gatekeeper.yml b/.github/workflows/lgtm-gatekeeper.yml
@@ -0,0 +1,57 @@
+# ============================================================================
+# LGTM Gatekeeper - Required Status Check
+# ============================================================================
+# Rules Enforced:
+# 1. PR MUST have "lgtm" and "approve" labels
+# 2. PR MUST NOT have blocking labels (hold)
+# ============================================================================
+
+name: LGTM Gatekeeper
+on:
+  pull_request:
+    # Run on PR open/reopen and label changes
+    # NOT on synchronize (handled by lgtm-reset.yml)
+    types: [opened, labeled, unlabeled, reopened]
+
+env:
+  BLOCKING_LABELS: "hold"
+
+jobs:
+  validate-pr:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Enforce LGTM & Blockers
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PR_NUMBER: ${{ github.event.pull_request.number }}
+          REPO: ${{ github.repository }}
+        run: |
+          # Fetch current labels
+          LABELS=$(gh pr view $PR_NUMBER --repo "$REPO" --json labels --jq '.labels[].name')
+          
+          # Check 1: IS IT BLOCKED?
+          if echo "$LABELS" | grep -Eiq "^($BLOCKING_LABELS)$"; then
+            echo "::error:: ⛔ FAILED: PR is blocked by a 'hold' label."
+            exit 1
+          fi
+
+          # Check 2: IS IT APPROVED? (both lgtm and approve required)
+          HAS_LGTM=false
+          HAS_APPROVE=false
+
+          if echo "$LABELS" | grep -Fqx "lgtm"; then
+            HAS_LGTM=true
+          fi
+
+          if echo "$LABELS" | grep -Fqx "approve"; then
+            HAS_APPROVE=true
+          fi
+
+          if [ "$HAS_LGTM" = false ] || [ "$HAS_APPROVE" = false ]; then
+            echo "::error:: ⛔ FAILED: PR requires both 'lgtm' and 'approve' labels."
+            echo "  - lgtm: $HAS_LGTM"
+            echo "  - approve: $HAS_APPROVE"
+            exit 1
+          fi
+          
+          echo "✅ PASSED: Both lgtm and approve labels present, no blockers."
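The gatekeeper's decision can be exercised locally without calling `gh`. The sketch below assumes label names arrive one per line on stdin, which is what `gh pr view --json labels --jq '.labels[].name'` prints in the workflow; the function name `gate_verdict` and the verdict strings are hypothetical.

```bash
# Hypothetical helper (not part of the PR): reads label names on stdin,
# one per line, and prints "blocked", "pass", or "missing-approval".
gate_verdict() {
  local blocking="${BLOCKING_LABELS:-hold}" labels
  labels=$(cat)
  # Blocking labels win over approvals, as in the workflow's Check 1.
  if printf '%s\n' "$labels" | grep -Eiqx -- "($blocking)"; then
    echo "blocked"
  elif printf '%s\n' "$labels" | grep -Fqx -- "lgtm" &&
       printf '%s\n' "$labels" | grep -Fqx -- "approve"; then
    echo "pass"
  else
    echo "missing-approval"
  fi
}

printf 'lgtm\napprove\n' | gate_verdict        # prints "pass"
printf 'lgtm\napprove\nhold\n' | gate_verdict  # prints "blocked"
printf 'lgtm\n' | gate_verdict                 # prints "missing-approval"
```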
diff --git a/.github/workflows/lgtm-reset.yml b/.github/workflows/lgtm-reset.yml
@@ -0,0 +1,46 @@
+# ============================================================================
+# LGTM Reset - Auto-Remove LGTM on New Commits
+# ============================================================================
+# Kubernetes Prow behavior: When new commits are pushed, approval is invalidated
+#
+# What It Does:
+# 1. Detects when new commits are pushed to a PR
+# 2. Removes the "lgtm" label (if present)
+# 3. Disables auto-merge (safety net)
+# 4. Posts a comment explaining why
+# ============================================================================
+
+name: LGTM Reset
+on:
+  pull_request:
+    types: [synchronize] # Triggers instantly on new commits
+
+jobs:
+  reset-lgtm:
+    runs-on: ubuntu-latest
+    permissions:
+      pull-requests: write
+    steps:
+      - uses: tibdex/github-app-token@v1
+        id: generate-token
+        with:
+          app_id: ${{ secrets.VLLMD_BOT_APP_ID }}
+          private_key: ${{ secrets.VLLMD_BOT_APP_PRIVATE_KEY }}
+          repository: ${{ github.repository }}
+      - name: Invalidate LGTM
+        env:
+          GH_TOKEN: ${{ steps.generate-token.outputs.token }}
+          PR_NUMBER: ${{ github.event.pull_request.number }}
+          REPO: ${{ github.repository }}
+        run: |
+          echo "🚨 New code pushed. Resetting LGTM status..."
+
+          # 1. Remove the label (This triggers the Gatekeeper to run again)
+          gh pr edit $PR_NUMBER --repo "$REPO" --remove-label "lgtm" || true
+
+          # 2. Disable Auto-Merge (Safety net)
+          gh pr merge --disable-auto $PR_NUMBER --repo "$REPO" || true
+
+          # 3. Notify user
+          gh issue comment $PR_NUMBER --repo "$REPO" --body "🔄 **Reset**: New commits pushed. LGTM removed."
+          
diff --git a/.github/workflows/prow-github.yml b/.github/workflows/prow-github.yml
@@ -18,7 +18,7 @@ jobs:
     steps:
       - uses: jpmcb/prow-github-actions@v2.0.0
         with:
-          github-token: "${{ secrets.GITHUB_TOKEN }}"
+          github-token: "${{ secrets.BOT_TOKEN }}"
           prow-commands: "/assign
             /unassign
             /approve
@@ -27,7 +27,6 @@ jobs:
             /kind
             /priority
             /remove
-            /lgtm
             /close
             /reopen
             /lock
diff --git a/.github/workflows/prow-pr-automerge.yml b/.github/workflows/prow-pr-automerge.yml
diff --git a/.github/workflows/prow-pr-remove-lgtm.yml b/.github/workflows/prow-pr-remove-lgtm.yml
diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
@@ -355,3 +355,31 @@ helm uninstall kgateway-crds -n kgateway-system
 ```
 
 For more details, see the Gateway API inference Extension [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/)
+
+## PR Approval Process
+
+The project uses a Prow-inspired system to manage PR approvals. The process works as follows:
+
+### Approving a PR
+
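+To approve a PR, comment `/lgtm`. The system will then:
+
+- Authorize: Verify you are listed under `reviewers` in the OWNERS file.
+- Validate: Ensure the PR is not a Draft and has no blocking labels.
+- Finalize: Apply the `lgtm` label and enable auto-merge (squash).
+
+The PR will merge automatically once all required status checks pass, including the LGTM Gatekeeper check, which requires both the `lgtm` and `approve` labels. To retract an approval, comment `/lgtm cancel`; this removes the `lgtm` label and disables auto-merge.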
+
+### Approval Reset on New Commits
+
+When new commits are pushed to an approved PR, the `lgtm` label is automatically removed and auto-merge is disabled. This ensures approvals always reflect the latest code. The author must request a new `/lgtm` after pushing changes.
+
+### Blocking Labels
+
+The following label prevents a PR from being approved or merged:
+
+- `hold`: PR is on hold
+
+Remove it before requesting `/lgtm`. The set of blocking labels is defined by `BLOCKING_LABELS` in the LGTM workflows.
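The workflows in this PR set `BLOCKING_LABELS` to `hold` only. Because the value is dropped into an anchored `grep -E` alternation, further labels could be enforced by separating them with `|`. The sketch below shows that pattern under the assumption that `wip` and `do-not-merge` were added; the function name `is_blocked` is hypothetical.

```bash
# Assumption: an extended label set. The workflows in this PR use "hold" only.
BLOCKING_LABELS="hold|wip|do-not-merge"

# Hypothetical helper: $1 holds newline-separated label names.
is_blocked() {
  # ^(...)$ anchors the match to a whole label; -i ignores case.
  printf '%s\n' "$1" | grep -Eiq "^($BLOCKING_LABELS)$"
}

is_blocked "$(printf 'bug\nWIP')"     && echo "blocked"
is_blocked "$(printf 'bug\nhold-me')" || echo "not blocked"
```

The anchors matter here: without them, a label such as `hold-me` would count as a blocker.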
diff --git a/OWNERS b/OWNERS
@@ -1,4 +1,5 @@
 approvers:
+- revit13
 - elevran
 - kfswain
 - nilig
diff --git a/README.md b/README.md
@@ -5,7 +5,7 @@
 
 # Inference Scheduler
 
-This scheduler makes optimized routing decisions for inference requests to
+This schedulejjr makes optimized routing decisions for inference requests to
 the llm-d inference framework.
 
 ## About
diff --git a/test-lgtm-1770898248.txt b/test-lgtm-1770898248.txt
@@ -0,0 +1 @@
+This is a test file for LGTM workflow automation - Thu Feb 12 14:10:50 IST 2026
diff --git a/test-lgtm-1770899120.txt b/test-lgtm-1770899120.txt
@@ -0,0 +1 @@
+This is a test file for LGTM workflow automation - Thu Feb 12 14:25:22 IST 2026
diff --git a/test-lgtm-1770899284.txt b/test-lgtm-1770899284.txt
@@ -0,0 +1 @@
+This is a test file for LGTM workflow automation - Thu Feb 12 14:28:06 IST 2026
diff --git a/test-lgtm-1770900845.txt b/test-lgtm-1770900845.txt
@@ -0,0 +1 @@
+This is a test file for LGTM workflow automation - Thu Feb 12 14:54:07 IST 2026
diff --git a/test-lgtm-1770901687.txt b/test-lgtm-1770901687.txt
@@ -0,0 +1 @@
+This is a test file for LGTM workflow automation - Thu Feb 12 15:08:08 IST 2026
diff --git a/test-lgtm-1770902441.txt b/test-lgtm-1770902441.txt
@@ -0,0 +1 @@
+This is a test file for LGTM workflow automation - Thu Feb 12 15:20:43 IST 2026
diff --git a/test-lgtm-1771122992.txt b/test-lgtm-1771122992.txt
@@ -0,0 +1 @@
+This is a test file for LGTM workflow automation - Sun Feb 15 04:36:34 IST 2026
diff --git a/test-open-pr-1771188042.txt b/test-open-pr-1771188042.txt
@@ -0,0 +1 @@
+Test file for: open-pr - Sun Feb 15 22:40:45 IST 2026
