Commit 8dc2bff
Test: LGTM - open-pr (#71)
* update llm-d-kv-cache import to v0.5.0-RC1 (llm-d#584)
* update kvc version import
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
* add go.mod to testable changes
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
---------
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
* Use 1.3.0 CRDs (llm-d#586)
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
* free disk space on ci-release (llm-d#587)
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
* feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581)
* feat: use Tinyllama as the "model" for kind test
- to test precise-prefix-cache-scorer we cannot use
food-review, since the scorer needs to call kv-cache-manager to get a tokenizer,
which requires fetching a real model from HF
- the change switches the "default model" to TinyLlama
- to make the tokenizer folder writable, change its permissions for the
USER in the Dockerfile
- rename dp-epp-config.yaml to sim-dp-epp-config.yaml, as it is used for
local tests
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* update: revert back some config to keep using prefix-cache-scorer
- revert file renaming
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
---------
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* Update linter configuration (llm-d#588)
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* fix: config should use new precise-prefix-cache-scorer (llm-d#576)
- prefix-cache-scorer was renamed to precise-prefix-cache-scorer in 0.3.0; configs
need to migrate from the old plugin to the new one with its spec.
- rename the plugin
- remove parameters.autoTune, parameters.mode: cache_tracking, and
lruCapacityPerServer
- move hashBlockSize and maxPrefixBlocksToMatch under indexerConfig
- configs using food-review keep the old prefix-cache-scorer
- keep pd-epp-config and sim-pd-epp-config on prefix-cache-scorer, since
KV and PD both need to be enabled, which is not done yet
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.1...v1.42.2)
---
updated-dependencies:
- dependency-name: crate-ci/typos
dependency-version: 1.42.2
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Updated to more recent GIE (llm-d#592)
* Updated to more recent GIE
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
* Updated to latest GIE and changes due to review comments
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
* Added a true mock SchedulerProfile
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
* Exploited mock SchedulerProfile
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
---------
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
* pull kvc v0.5.0 libs (llm-d#595)
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.2...v1.43.0)
---
updated-dependencies:
- dependency-name: crate-ci/typos
dependency-version: 1.43.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* address nil,nil return linter error in test mock (llm-d#598)
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)
Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).
Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.5...v2.28.1)
Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.39.0...v1.39.1)
---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
dependency-version: 2.28.1
dependency-type: direct:production
update-type: version-update:semver-minor
dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
dependency-version: 1.39.1
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: go-dependencies
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Models extractor (llm-d#553)
* Models extractor
Signed-off-by: irar2 <irar@il.ibm.com>
* Update register.go
Signed-off-by: Ira Rosen <irar@il.ibm.com>
* Updated for the newer GIE
Signed-off-by: irar2 <irar@il.ibm.com>
* Review comments
Signed-off-by: irar2 <irar@il.ibm.com>
* Check the scheme
Signed-off-by: irar2 <irar@il.ibm.com>
---------
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)
* feat: implement decode first flow on lmcache connector
- if cache_hit_threshold field is present in completion request, then we perform a decode first flow
Signed-off-by: kyano <kyanokashi2@gmail.com>
* fix: error handling
Signed-off-by: kyano <kyanokashi2@gmail.com>
* chore: add back todo comment
Signed-off-by: kyano <kyanokashi2@gmail.com>
* refactor: reduce code complexity and duplication
Signed-off-by: kyano <kyanokashi2@gmail.com>
* refactor: improve header copying
Signed-off-by: kyano <kyanokashi2@gmail.com>
* chore: add comment explaining the cache_hit_threshold field and the new decode first flow
Signed-off-by: kyano <kyanokashi2@gmail.com>
* refactor: enhance logging for cache hit threshold in decode flow
- decrease verbosity for common log
- add cache_hit_threshold attribute
Signed-off-by: kyano <kyanokashi2@gmail.com>
* refactor: improve error handling and observability when failing to unmarshal decode response
Signed-off-by: kyano <kyanokashi2@gmail.com>
* chore: add deleted informational comments
Signed-off-by: kyano <kyanokashi2@gmail.com>
* typo
Signed-off-by: kyano <kyanokashi2@gmail.com>
* refactor: make error logs more descriptive of the failure reason
Signed-off-by: kyano <kyanokashi2@gmail.com>
* feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition
Signed-off-by: kyano <kyanokashi2@gmail.com>
* fix: typo
Signed-off-by: kyano <kyanokashi2@gmail.com>
* refactor: assign 0 cache_hit_threshold before final decode attempt
Signed-off-by: kyano <kyanokashi2@gmail.com>
* chore: update comment according to feedback
Signed-off-by: kyano <kyanokashi2@gmail.com>
* chore: remove istio workaround
Signed-off-by: kyano <kyanokashi2@gmail.com>
* fix: set cache hit threshold to 0 in prefill request for consistent execution
Signed-off-by: kyano <kyanokashi2@gmail.com>
* refactor: update the log
Signed-off-by: kyano <kyanokashi2@gmail.com>
* feat: support online decoding
Signed-off-by: kyano <kyanokashi2@gmail.com>
* fix: preserve request body in lmcache connector
Signed-off-by: kyano <kyanokashi2@gmail.com>
* fix: support sse format for streamed decode
Signed-off-by: kyano <kyanokashi2@gmail.com>
* chore: add and improve log descriptions
Signed-off-by: kyano <kyanokashi2@gmail.com>
* fix: typo
Signed-off-by: kyano <kyanokashi2@gmail.com>
* nit: undo capitalization
Signed-off-by: kyano <kyanokashi2@gmail.com>
* fix: typos
Signed-off-by: kyano <kyanokashi2@gmail.com>
* chore: improve error log observability
Signed-off-by: kyano <kyanokashi2@gmail.com>
* refactor: encapsulate http error checking in function and reuse
Signed-off-by: kyano <kyanokashi2@gmail.com>
* refactor: encapsulate and reuse code better
Signed-off-by: kyano <kyanokashi2@gmail.com>
* fix: lint error
Signed-off-by: kyano <kyanokashi2@gmail.com>
* refactor: improve code encapsulation and reduce duplication
Signed-off-by: kyano <kyanokashi2@gmail.com>
* refactor: rename and simplify SSE event signaling logic
Signed-off-by: kyano <kyanokashi2@gmail.com>
* refactor: rename lmcache to shared storage protocol
Signed-off-by: kyano <kyanokashi2@gmail.com>
* fix: remove unused function
Signed-off-by: kyano <kyanokashi2@gmail.com>
* test: e2e tests
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
* chore: claude gitignore
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
* fix: sim deployment
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
* feat: make linter running on new code configurable
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
* fix: lint errors
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
---------
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)
* Initial step of a configurable PD decider, which is responsible for deciding whether disaggregation is required; it uses data added by the prefix scorer plugin in PrepareRequestData
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* update version of GIE + fix lint
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* update yaml and the test according to the prefix plugin configuration change (blockSize replaced by blockSizeTokens)
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* Update docs/architecture.md
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* code review
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* code review
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* update version of GIE, update prefix_disagr_decider accordingly
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* fix typo
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* fix PD for short inputs
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* Update docs/architecture.md
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* Update pkg/plugins/profile/always_disaggr_decider.go
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* Update pkg/plugins/profile/always_disaggr_decider.go
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* Update pkg/plugins/profile/prefix_disagg_decider.go
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* updates according to the PR comments
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* fix test
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* create a PD decider plugin type with 2 implementations (prefix-based, and always-disaggregate for tests); update deploy configuration according to the new structure
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* fix e2e tests
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* changes according to the PR comments
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* fix e2e test
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* add explanation about pd deciders to disagg_pd doc
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
* rename always_disaggr_decider to always_disagg_decider
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
---------
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
* chore: fix wrong port for NIXL (llm-d#593)
- starting with vLLM 0.11.1, the default port for NIXL is 5600
- leave ZMQ on 5557
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)
* fix: resolve JSON serialization error in active-request-scorer debug logs
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
* feat: Add raw scores to debug
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
---------
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
* Match documentation with default model in scripts (llm-d#615)
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
* Test: LGTM Workflow Automation (#32)
* Implement "LGTM" ChatOps Workflow.
Signed-off-by: Revital Sur <eres@il.ibm.com>
* test
Signed-off-by: Revital Sur <eres@il.ibm.com>
* Lgtm2 (#17)
* Implement "LGTM" ChatOps Workflow.
Signed-off-by: Revital Sur <eres@il.ibm.com>
* test
Signed-off-by: Revital Sur <eres@il.ibm.com>
---------
Signed-off-by: Revital Sur <eres@il.ibm.com>
* test
* test: automated LGTM workflow test (#19)
This PR tests the /lgtm command workflow automation.
Test suite: all
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* test: automated LGTM workflow test (#20)
This PR tests the /lgtm command workflow automation.
Test suite: all
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* test: automated LGTM workflow test (#21)
This PR tests the /lgtm command workflow automation.
Test suite: all
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* test: automated LGTM workflow test (#22)
This PR tests the /lgtm command workflow automation.
Test suite: reset
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* test
Signed-off-by: Revital Sur <eres@il.ibm.com>
* test: automated LGTM workflow test (#24)
This PR tests the /lgtm command workflow automation.
Test suite: reset
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* test
Signed-off-by: Revital Sur <eres@il.ibm.com>
* test: automated LGTM workflow test (#26)
This PR tests the /lgtm command workflow automation.
Test suite: reset
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* test
Signed-off-by: Revital Sur <eres@il.ibm.com>
* Address review comments.
Signed-off-by: Revital Sur <eres@il.ibm.com>
* test: automated LGTM workflow test
This PR tests the /lgtm command workflow automation.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
---------
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* test
Signed-off-by: Revital Sur <eres@il.ibm.com>
* test: open-pr
Tests that opening a PR triggers gatekeeper which blocks without lgtm label.
Test timestamp: 1771188042
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent a0c8d17 commit 8dc2bff
File tree: .github/workflows
18 files changed, +294 -32 lines changed. Two files were deleted.