Match documentation with default model in scripts #615

Merged: github-actions[bot] merged 1 commit into llm-d:main from shmuelk:dev-fix on Feb 12, 2026

@shmuelk (Collaborator) commented Feb 12, 2026

This PR updates the example curl command to use the model now used in the development setups.

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
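
For illustration, here is a minimal Python sketch of the kind of request such an example sends, plus a helper that renders it as a curl command. The endpoint URL, the prompt, and the exact model ID are assumptions made for this sketch; this page only says that the development setups now default to TinyLlama, so take the real values from the repository's documentation and scripts.

```python
import json
import shlex

# Assumed values, not taken from the PR: adjust them to match your setup.
ASSUMED_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
ASSUMED_URL = "http://localhost:8080/v1/completions"


def build_completion_request(model: str, prompt: str, max_tokens: int = 32) -> dict:
    """Build an OpenAI-style completions payload for the given model."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


def as_curl(url: str, payload: dict) -> str:
    """Render the payload as a curl command line with shell-safe quoting."""
    body = json.dumps(payload)
    return " ".join(
        [
            "curl", "-s", shlex.quote(url),
            "-H", shlex.quote("Content-Type: application/json"),
            "-d", shlex.quote(body),
        ]
    )


if __name__ == "__main__":
    payload = build_completion_request(ASSUMED_MODEL, "Hello")
    print(as_curl(ASSUMED_URL, payload))
```

Keeping the model name in one constant means the documentation example and the scripts can be checked against the same value.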

@nirrozenbaum (Collaborator) commented

/lgtm
/approve

github-actions bot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Feb 12, 2026
github-actions bot merged commit cdd8760 into llm-d:main Feb 12, 2026
10 checks passed
github-project-automation bot moved this from In review to Done in llm-d-inference-scheduler Feb 12, 2026
github-actions bot pushed a commit to revit13/llm-d-inference-scheduler that referenced this pull request Feb 23, 2026:
* update llm-d-kv-cache import to v0.5.0-RC1 (llm-d#584)

* update kvc version import

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* add go.mod to testable changes

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Use 1.3.0 CRDs (llm-d#586)

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* free disk space on ci-release (llm-d#587)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581)

* feat: use Tinyllama as the "model" for kind test

- in order to test precise-prefix-cache-scorer we cannot use
  food-review, since it needs to call kv-cache-manager to get a tokenizer by
  fetching a real model from HF
- the change is to switch the "default model" to TinyLlama
- also, to make the tokenizer folder writable, the permissions for the
  USER in the Dockerfile need to change
- rename dp-epp-config.yaml to sim-dp-epp-config.yaml, as it is used for
  local tests

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: revert back some config to keep using prefix-cache-scorer

- revert file renaming

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update linter configuration (llm-d#588)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* fix: config should use new precise-prefix-cache-scorer (llm-d#576)

- we renamed prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0; configs
  need to migrate from the old plugin to the new one, following the spec:
  - rename the plugin name
  - remove parameters.autoTune, parameters.mode: cache_tracking, and
    lruCapacityPerServer
  - move hashBlockSize and maxPrefixBlocksToMatch under indexerConfig
- for configs using food-review, keep the old prefix-cache-scorer
- keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer, as
  KV and PD both need to be enabled, which is not done yet

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
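
The migration steps listed in llm-d#576 can be sketched as a small Python transformation over one plugin entry. This is only an illustration under stated assumptions: the `type`/`parameters` layout of a plugin entry is assumed, the target key is spelled `indexerConfig` on the assumption that this is what the commit message refers to, and the function name `migrate_plugin` is hypothetical.

```python
import copy

# Parameter names come from the commit message; the surrounding layout
# ("type" and "parameters" keys, "indexerConfig" spelling) is an assumption.
REMOVED_PARAMS = ("autoTune", "mode", "lruCapacityPerServer")
MOVED_PARAMS = ("hashBlockSize", "maxPrefixBlocksToMatch")


def migrate_plugin(plugin: dict) -> dict:
    """Return a migrated copy of one plugin entry; other plugins pass through."""
    migrated = copy.deepcopy(plugin)
    if migrated.get("type") != "prefix-cache-scorer":
        return migrated

    # Step 1: rename the plugin.
    migrated["type"] = "precise-prefix-cache-scorer"
    params = migrated.setdefault("parameters", {})

    # Step 2: drop the parameters the new plugin no longer takes.
    for key in REMOVED_PARAMS:
        params.pop(key, None)

    # Step 3: move the block-matching settings under indexerConfig.
    moved = {key: params.pop(key) for key in MOVED_PARAMS if key in params}
    if moved:
        params.setdefault("indexerConfig", {}).update(moved)
    return migrated
```

Per the commit message, configs that use food-review or that combine KV and PD would simply not be passed through such a migration.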

* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.1...v1.42.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated to more recent GIE (llm-d#592)

* Updated to more recent GIE

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated to latest GIE and changes due to review comments

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added a true mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Exploited mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* pull kvc v0.5.0 libs (llm-d#595)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.2...v1.43.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* address nil,nil return linter error in test mock (llm-d#598)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.5...v2.28.1)

Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.39.0...v1.39.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Models extractor (llm-d#553)

* Models extractor

Signed-off-by: irar2 <irar@il.ibm.com>

* Update register.go

Signed-off-by: Ira Rosen <irar@il.ibm.com>

* Updated for the newer GIE

Signed-off-by: irar2 <irar@il.ibm.com>

* Review comments

Signed-off-by: irar2 <irar@il.ibm.com>

* Check the scheme

Signed-off-by: irar2 <irar@il.ibm.com>

---------

Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>

* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)

* feat: implement decode first flow on lmcache connector

- if cache_hit_threshold field is present in completion request, then we perform a decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: error handling

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add back todo comment

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: reduce code complexity and duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve header copying

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add comment explaining the cache_hit_threshold field and the new decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: enhance logging for cache hit threshold in decode flow

- decrease verbosity for common log
- add cache_hit_threshold attribute

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve error handling and observability when failing to unmarshal decode response

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add deleted informational comments

Signed-off-by: kyano <kyanokashi2@gmail.com>

* typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: make error logs more descriptive of the failure reason

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: assign 0 cache_hit_threshold before final decode attempt

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: update comment according to feedback

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: remove istio workaround

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: set cache hit threshold to 0 in prefill request for consistent execution

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: update the log

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: support online decoding

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: preserve request body in lmcache connector

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: support sse format for streamed decode

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add and improve log descriptions

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* nit: undo capitalization

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typos

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: improve error log observability

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate http error checking in function and reuse

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate and reuse code better

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: lint error

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve code encapsulation and reduce duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename and simplify SSE event signaling logic

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename lmcache to shared storage protocol

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: remove unused function

Signed-off-by: kyano <kyanokashi2@gmail.com>

* test: e2e tests

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* chore: claude gitignore

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: sim deployment

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* feat: make linter running on new code configurable

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: lint errors

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
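
The decode-first flow described in the llm-d#509 commit messages above can be sketched roughly as follows. This is a hedged illustration, not the connector's implementation: `decode` and `prefill` are hypothetical callables, the boolean that reports whether the decode worker had a sufficient cache hit is an assumed signal (the page does not say how that is communicated), and the plain prefill-then-decode order for requests without the field is also an assumption.

```python
from typing import Callable, Tuple

# Hypothetical transport: takes a request body, returns (response, cache_hit_ok).
Send = Callable[[dict], Tuple[dict, bool]]


def run_request(body: dict, decode: Send, prefill: Send) -> dict:
    """Route one completion request, trying decode first when a threshold is set."""
    if "cache_hit_threshold" not in body:
        # No threshold in the request: assumed prefill-then-decode order.
        prefill(dict(body))
        response, _ = decode(dict(body))
        return response

    # Decode first: finish immediately if enough of the prompt was cached.
    response, cache_hit_ok = decode(dict(body))
    if cache_hit_ok:
        return response

    # Threshold 0 makes prefill, and the final decode attempt, run
    # regardless of the cache state. The caller's body is left untouched.
    forced = {**body, "cache_hit_threshold": 0}
    prefill(forced)
    response, _ = decode(forced)
    return response
```

Passing copies to the callables mirrors the "preserve request body" fix in the list above: the original request stays unchanged across retries.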

* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)

* Initial step of a configurable pd decider, which is responsible for deciding whether disaggregation is required; it uses data added by the prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE + fix lint

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update yaml and the test according to the prefix plugin configuration change (blockSize replaced by blockSizeTokens)

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE, update prefix_disagr_decider accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix PD for short inputs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/prefix_disagg_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* updates according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* create pd decider plugin type with 2 implementations (prefix-based, and always-disaggregate for tests), update deploy configuration according to the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e tests

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* changes according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* add explanation about pd deciders to disagg_pd doc

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* rename always_disaggr_decider to always_disagg_decider

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
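
The idea of a pluggable PD decider with two implementations (llm-d#531 above) can be sketched in Python as an interface plus two classes. The class names, the method signature, and the `min_uncached_tokens` threshold are hypothetical choices for this sketch; the page only states that one decider is prefix-based and the other always disaggregates for testing.

```python
from dataclasses import dataclass
from typing import Protocol


class PDDecider(Protocol):
    """Decides whether a request should be split into prefill and decode."""

    def disaggregate(self, prompt_tokens: int, cached_tokens: int) -> bool: ...


class AlwaysDisaggDecider:
    """Test-oriented decider: always asks for disaggregation."""

    def disaggregate(self, prompt_tokens: int, cached_tokens: int) -> bool:
        return True


@dataclass
class PrefixBasedDecider:
    """Disaggregates only when enough of the prompt is not already cached.

    The threshold name and the comparison are assumptions for this sketch.
    """

    min_uncached_tokens: int

    def disaggregate(self, prompt_tokens: int, cached_tokens: int) -> bool:
        uncached = max(prompt_tokens - cached_tokens, 0)
        return uncached >= self.min_uncached_tokens
```

Under this assumed rule, short inputs never reach the threshold, so they skip the separate prefill step, which matches the intent of the "fix PD for short inputs" commit as far as it can be read from its title.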

* chore: fix wrong port for NIXL (llm-d#593)

- starting with vLLM 0.11.1, the default port for NIXL is 5600
- leave ZMQ on 5557

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)

* fix: resolve JSON serialization error in active-request-scorer debug logs

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Add raw scores to debug

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* Match documentation with default model in scripts (llm-d#615)

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Test: LGTM Workflow Automation (#32)

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Lgtm2 (#17)

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

* test: automated LGTM workflow test (#19)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#20)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#21)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#22)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#24)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#26)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test

This PR tests the /lgtm command workflow automation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: open-pr

Tests that opening a PR triggers gatekeeper which blocks without lgtm label.

Test timestamp: 1771188042

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Labels

lgtm "Looks good to me", indicates that a PR is ready to be merged.

Projects

Status: Done

2 participants