Skip to content

Extend support for different ways to decide if disaggregated PD is required#531

Merged
github-actions[bot] merged 21 commits intollm-d:mainfrom
mayabar:pd-decider
Feb 10, 2026
Merged

Extend support for different ways to decide if disaggregated PD is required#531
github-actions[bot] merged 21 commits intollm-d:mainfrom
mayabar:pd-decider

Conversation

@mayabar
Copy link
Contributor

@mayabar mayabar commented Jan 5, 2026

Update PD profile handler to allow easier support for different ways to decide whether disaggregated PD is required.
Use prefix data stored in prepare data stage.
Currently supports prefix plugin, next step is to add support for precise prefix scorer plugin (#530)

Fixes #519

This PR should be merges after PR in GIE which stores in prefix scorer data number of tokens instead of number of blocks.

The GIE PR contains change in prefix plugin configuration structure, waiting with this PR until it will be merged.

@mayabar mayabar changed the title Extend support for different ways to decide if disaggregated PD is required [WIP] Extend support for different ways to decide if disaggregated PD is required Jan 6, 2026
@elevran
Copy link
Collaborator

elevran commented Jan 22, 2026

@mayabar can you kindly rebase and resolve conflicts?

@elevran elevran added this to the v0.6 milestone Jan 22, 2026
@mayabar mayabar changed the title [WIP] Extend support for different ways to decide if disaggregated PD is required Extend support for different ways to decide if disaggregated PD is required Feb 2, 2026
elevran
elevran previously approved these changes Feb 2, 2026
@elevran elevran self-requested a review February 2, 2026 20:07
@elevran
Copy link
Collaborator

elevran commented Feb 2, 2026

LGTM with the exception of failing tests

@mayabar
Copy link
Contributor Author

mayabar commented Feb 4, 2026

@elevran updated according your comments + tests are ok now

@elevran
Copy link
Collaborator

elevran commented Feb 5, 2026

/lgtm
/approve

@github-actions github-actions bot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 5, 2026
github-actions[bot]
github-actions bot previously approved these changes Feb 5, 2026
@elevran
Copy link
Collaborator

elevran commented Feb 5, 2026

@shmuelk could you please clear the "requested changes" state?

- type: prefix-cache-scorer
parameters:
blockSizeTokens: 8
autoTune: false
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not turn this one instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that turning this on could be confusing for the users since the block size will depends on the value defined in the vllm/simulator.
Removed this since false is the default.

- pluginRef: max-score-picker
- pluginRef: prefix-cache-scorer
weight: 2
- pluginRef: queue-scorer
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to do some evaluation here, kv-cache utilization might be useful to balance on across decode workers.

Copy link
Contributor Author

@mayabar mayabar Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, there could be multiple factors, including kv-cache utilization that are useful for balancing decode workers. I'll open an issue for the evaluation (#604)

- type: prefill-header-handler
- type: prefix-cache-scorer
parameters:
blockSizeTokens: 8
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this changed from the default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed for easier testing, reverted


const (
// AlwaysDisaggrDeciderPluginType is the type-name of the alwaysDisaggrPDDecider plugin.
AlwaysDisaggrDeciderPluginType = "always-disaggr-pd-decider"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disaggr is an uncommon abbreviation of disaggregation, I recommend to fully spell it or use "disagg"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed

type pdDeciderPlugin interface {
plugin.Plugin
// shouldDisaggregate checks if disaggregated PD is required for the given request and endpoint.
shouldDisaggregate(ctx context.Context, inputTokens int, endpoint scheduling.Endpoint) bool
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is more idiomatic to name it only disaggregate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

)

const (
// PrefixBasedPDDeciderPluginType is the type-name of the prefixBasedPDDecider plugin.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend to document the behavior of the plugin, how it decides when to disaggregate and when not to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to disagg_pd.md

// number of cached blocks
hitPrefixTokens := float64(prefixCacheMatchInfo.MatchBlocks() * prefixCacheMatchInfo.BlockSizeTokens())
// length of the cached part in percentages
hitPercentagePrefix := hitPrefixTokens / float64(inputTokens)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we calculating this? the threshold is in absolute tokens. We calculate the percentage here and in line 125 below we multiply it back by the inputTokens when we compare with the threshold, why not just compare with hitPrefixTokens directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

…cision whether disaggregation is required, use data added in prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
mayabar and others added 11 commits February 8, 2026 21:17
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
…d and test always), update deploy configuration according the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
@mayabar
Copy link
Contributor Author

mayabar commented Feb 9, 2026

@ahg-g I addressed your comments, can you please review

@ahg-g
Copy link

ahg-g commented Feb 9, 2026

Thanks, looks good with one minor issue, the file name "always_disaggr_decider.go‎" includes the abbreviation disaggr instead of disagg

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
@ahg-g
Copy link

ahg-g commented Feb 9, 2026

/lgtm

@github-actions
Copy link

github-actions bot commented Feb 9, 2026

Cannot apply the lgtm label because Error: ahg-g is not included in the reviewers role in the OWNERS file

@elevran
Copy link
Collaborator

elevran commented Feb 9, 2026

@shmuelk please resubmit a review using GH UI - currently blocked on your "Request Changes".
/lgtm

@elevran
Copy link
Collaborator

elevran commented Feb 9, 2026

/approve

@github-actions github-actions bot merged commit 3ed3d5c into llm-d:main Feb 10, 2026
8 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in llm-d-inference-scheduler Feb 10, 2026
github-actions bot pushed a commit to revit13/llm-d-inference-scheduler that referenced this pull request Feb 15, 2026
* feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581)

* feat: use Tinyllama as the "model" for kind test

- in order to test precies-prefix-cache-score we cannot use
  fool-reviewer since it need call kv-cache-manager to get tokenizer by
  getting a real model from HF
- the change is to switch the "default model" to TinyLlama
- also to make tokenizer folder writable need change permission to the
  USER in Dockerfile
- rename dp-epp-config.yaml sim-dp-epp-config.yaml as it is used for
  local test

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: revert back some config to keep using prefix-cache-scorer

- revert file renaming

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update linter configuration (llm-d#588)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* fix: config should use new precise-prefix-cache-scorer (llm-d#576)

- we have rename prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0, configs
  need migrate from the old one to the new one with spec.
  - rename plugin name
  - remove parameters.autoTune and parameters.mode: cache_tracking and
    lruCapacityPerServer
  - move hashBlockSize, maxPrefixBlocksToMatch under indexrConfig
- for config using food-review keep old prefix-cache-scorer
- keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer as
  KV and PD need both be enabled which is not done yet

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.1...v1.42.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated to more recent GIE (llm-d#592)

* Updated to more recent GIE

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated to latest GIE and chnages due to review comments

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added a true mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Exploited mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* pull kvc v0.5.0 libs (llm-d#595)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.2...v1.43.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* address nil,nil return linter error in test mock (llm-d#598)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.5...v2.28.1)

Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.39.0...v1.39.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Models extractor (llm-d#553)

* Models extractor

Signed-off-by: irar2 <irar@il.ibm.com>

* Update register.go

Signed-off-by: Ira Rosen <irar@il.ibm.com>

* Updated for the newer GIE

Signed-off-by: irar2 <irar@il.ibm.com>

* Review comments

Signed-off-by: irar2 <irar@il.ibm.com>

* Check the scheme

Signed-off-by: irar2 <irar@il.ibm.com>

---------

Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>

* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)

* feat: implement decode first flow on lmcache connector

- if cache_hit_threshold field is present in completion request, then we perform a decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: error handling

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add back todo comment

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: reduce code complexity and duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve header copying

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add comment explaning the cache_hit_threshold field and the new decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: enhance logging for cache hit threshold in decode flow

- decrease verbosity for common log
- add cache_hit_threshold attribute

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve error handling and observability when failing to unmarshal decode response

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add deleted informational comments

Signed-off-by: kyano <kyanokashi2@gmail.com>

* typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: make error logs more descriptive of the failure reason

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: assign 0 cache_hit_threshold before final decode attempt

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: update comment according to feedback

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: remove istio workaround

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: set cache hit threshold to 0 in prefill request for consistent execution

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: update the log

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: support online decoding

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: preserve request body in lmcache connector

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: support sse format for streamed decode

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add and improve log  descriptions

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* nit: undo capitalization

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typos

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: improve error log observability

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate http error checking in function and reuse

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate and reuse code better

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: lint error

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve code encapsulation and reduce duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename and simplify SSE event signaling logic

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename lmcache to shared storage protocol

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: remove unused function

Signed-off-by: kyano <kyanokashi2@gmail.com>

* test: e2e tests

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* chore: claude gitignore

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: sim deployment

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* feat: make linter running on new code configurable

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: lint errors

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)

* Initial step of a configurable pd decider which is responsible for decision whether disaggregation is required, use data added in prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE + fix lint

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update yaml and the test according prefix plugin configuration change (blockSize replaced by blockSizeTokens)

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE, update prefix_disagr_decider accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix PD for short inputs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/prefix_disagg_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* updates according the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* create pd decider plugin type with 2 implementations (for prefix based and test always), update deploy configuration according the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e tests

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* changes according the pr comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* add explanation about pd deciders to disagg_pd doc

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* rename always_disaggr_decider to always_disagg_decider

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* chore: fix wrong port for NIXL (llm-d#593)

- start with vLLM 0.11.1, default port for NIXL has been updated to 5600
- leave ZMQ to use 5557

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)

* fix: resolve JSON serialization error in active-request-scorer debug logs

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Add raw scores to debug

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Lgtm2 (#17)

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

* test: automated LGTM workflow test (#19)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#20)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#21)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#22)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#24)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#26)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test

This PR tests the /lgtm command workflow automation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
github-actions bot pushed a commit to revit13/llm-d-inference-scheduler that referenced this pull request Feb 16, 2026
* chore: bump gie to v1.2.1 (llm-d#504)

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* deps(go): bump sigs.k8s.io/gateway-api in the kubernetes group (llm-d#508)

Bumps the kubernetes group with 1 update: [sigs.k8s.io/gateway-api](https://github.com/kubernetes-sigs/gateway-api).


Updates `sigs.k8s.io/gateway-api` from 1.4.0 to 1.4.1
- [Release notes](https://github.com/kubernetes-sigs/gateway-api/releases)
- [Changelog](https://github.com/kubernetes-sigs/gateway-api/blob/main/RELEASE.md)
- [Commits](kubernetes-sigs/gateway-api@v1.4.0...v1.4.1)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/gateway-api
  dependency-version: 1.4.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(go): bump the go-dependencies group with 3 updates (llm-d#507)

Bumps the go-dependencies group with 3 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo), [github.com/onsi/gomega](https://github.com/onsi/gomega) and [golang.org/x/sync](https://github.com/golang/sync).


Updates `github.com/onsi/ginkgo/v2` from 2.27.2 to 2.27.3
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.2...v2.27.3)

Updates `github.com/onsi/gomega` from 1.38.2 to 1.38.3
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.38.2...v1.38.3)

Updates `golang.org/x/sync` from 0.18.0 to 0.19.0
- [Commits](golang/sync@v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.27.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.38.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: golang.org/x/sync
  dependency-version: 0.19.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Miscellaneous dependency updates (llm-d#510)

* Miscelaneous dependency updates

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Use latest GIE CRDs

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Fixed references to kv-cache-manager

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* deps(go): bump the kubernetes group with 5 updates (llm-d#513)

Bumps the kubernetes group with 5 updates:

| Package | From | To |
| --- | --- | --- |
| [k8s.io/api](https://github.com/kubernetes/api) | `0.34.2` | `0.34.3` |
| [k8s.io/apiextensions-apiserver](https://github.com/kubernetes/apiextensions-apiserver) | `0.34.2` | `0.34.3` |
| [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) | `0.34.2` | `0.34.3` |
| [k8s.io/client-go](https://github.com/kubernetes/client-go) | `0.34.2` | `0.34.3` |
| [k8s.io/component-base](https://github.com/kubernetes/component-base) | `0.34.2` | `0.34.3` |


Updates `k8s.io/api` from 0.34.2 to 0.34.3
- [Commits](kubernetes/api@v0.34.2...v0.34.3)

Updates `k8s.io/apiextensions-apiserver` from 0.34.2 to 0.34.3
- [Release notes](https://github.com/kubernetes/apiextensions-apiserver/releases)
- [Commits](kubernetes/apiextensions-apiserver@v0.34.2...v0.34.3)

Updates `k8s.io/apimachinery` from 0.34.2 to 0.34.3
- [Commits](kubernetes/apimachinery@v0.34.2...v0.34.3)

Updates `k8s.io/client-go` from 0.34.2 to 0.34.3
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](kubernetes/client-go@v0.34.2...v0.34.3)

Updates `k8s.io/component-base` from 0.34.2 to 0.34.3
- [Commits](kubernetes/component-base@v0.34.2...v0.34.3)

---
updated-dependencies:
- dependency-name: k8s.io/api
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apiextensions-apiserver
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apimachinery
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/client-go
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/component-base
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix kind-dev-env.sh (llm-d#512)

Running `make env-dev-kind` will fail if the vllm simulator image hasn't
been already pulled.

This fixes it by skipping the manual load & save of the image unless we're
dealing with a custom locally built image (using the dev tag).

The kubelet will anyway pull the right image when deploying the pod.

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* test: add precise_prefix_cache_test (llm-d#505)

* test: add precise_prefix_cache_test

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* test: add precise_prefix_cache_test

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

---------

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* test: reuse upstream data store and enable logr in unit tests (llm-d#518)

* enable logr in ut

Signed-off-by: MregXN <mregxn@gmail.com>

* fix package impoert order

Signed-off-by: MregXN <mregxn@gmail.com>

* apply comments

Signed-off-by: MregXN <mregxn@gmail.com>

---------

Signed-off-by: MregXN <mregxn@gmail.com>

* feat: allow pd_profile_handler to handle diverse plugin types (llm-d#516)

* Store the precise prefix cache score in cycleState.

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit test code

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

---------

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* deps(actions): bump crate-ci/typos from 1.40.0 to 1.40.1 (llm-d#526)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.40.0 to 1.40.1.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.40.0...v1.40.1)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.40.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(go): bump google.golang.org/grpc in the go-dependencies group (llm-d#527)

Bumps the go-dependencies group with 1 update: [google.golang.org/grpc](https://github.com/grpc/grpc-go).


Updates `google.golang.org/grpc` from 1.77.0 to 1.78.0
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.77.0...v1.78.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.78.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat(metrics): add model_name label to PD decision metric (llm-d#528)

Signed-off-by: CYJiang <googs1025@gmail.com>

* deps(actions): bump crate-ci/typos from 1.40.1 to 1.41.0 (llm-d#532)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.40.1 to 1.41.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.40.1...v1.41.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Configure dependabot ignores Go version updates (llm-d#533)

* dependabot ignores Go version updates

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* allow semver patch level updates to Go

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

---------

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* Updates the architecture description with reference to BBR and support for multiple GenAI models and LoRAs to remove confusion about llm-d only supporing one model per cluster (llm-d#525)

* finer control over package updates (llm-d#542)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* port auto-assign action from llm-d-kv-cache (llm-d#551)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* refactor: set python version and pin docker image with tag (llm-d#543)

- default set to 3.12 for python
- set 9.7(the current latest) for ubi image

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* chore(test): update API version for nixl test (llm-d#555)

- extentionRef was in old v1alpha2, in v1 it should be updated to
  endpointPickerRef
- remove InferenceModel
- update docs for test/sidecar

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#558)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.3 to 2.27.4
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.3...v2.27.4)

Updates `github.com/onsi/gomega` from 1.38.3 to 1.39.0
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.38.3...v1.39.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.27.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(actions): bump crate-ci/typos from 1.41.0 to 1.42.0 (llm-d#557)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.41.0 to 1.42.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.41.0...v1.42.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(actions): bump actions/checkout from 4 to 6 (llm-d#556)

Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* update auto-assign logic (llm-d#560)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* remove newline in unsigned commit message (llm-d#561)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* bump gie to v1.3.0 rc2 (llm-d#562)

* update OWNERS (llm-d#559)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* refactor: Makefile, update docs (llm-d#463)

* refactor: Makefile, update docs

- split Makefile
  1. tools: include install tools, check tools, download dependency(gcc
     etc) and tokenizer. these will be download into "bin" folder than
     global path
  2. cluster: include k8s and ocp
  3. kind
- rename "openshift-base" to "kubernetes-base" to be clear for purpose
- uplift Go lint version to 2.1.6 to align with the same one set in
  Github Action
- rename make targets for better visibility, deprcating old ones
- add more print in "make env"

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: code review

- move image tags from Makefile.tools.mk back to Makefile
- update docuement to reflact how image and tag are created
- do not export image tag env variables IMG_TAG
- fix patch-deployments.yaml after EPP_TAG is not used but should only
  use EPP_IMAGE
- fix kubernetes-dev-env.sh for EPP_IMAGE
- remove flag on golangci_lint fmt

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* code review:

- revert back to 1.3.0
- remove comments
- set default as default namespace

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update Makefile

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* docs: fix broken link in the docs

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>

* feat: add metrics validation in e2e test (llm-d#529)

Signed-off-by: CYJiang <googs1025@gmail.com>

* feat: make no-hit-lru P/D-aware (llm-d#522)

* feat: make no-hit-lru P/D-aware

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* hardcode prefill profile

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* remove spammy log

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* apply suggestions

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

---------

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* Update disaggregated Prefill/Decode inference serving documentation (llm-d#571)

* update pd docs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* typos

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.0 to 1.42.1 (llm-d#572)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.0 to 1.42.1.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.0...v1.42.1)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(go): bump github.com/onsi/ginkgo/v2 in the go-dependencies group (llm-d#573)

Bumps the go-dependencies group with 1 update: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo).


Updates `github.com/onsi/ginkgo/v2` from 2.27.4 to 2.27.5
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.4...v2.27.5)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.27.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix reviewers auto assign minor bug (llm-d#575)

* fix(scorer): make active request pd aware (llm-d#569)

* fix: decrement all pods on request complete instead of only final pod

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: append all pod endpoints from profile results

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* test(e2e): cleanup kind cluster (llm-d#563)

- if e2e-tests cluster exist, it fails to run "make test-e2e"
- main cleanup should be done in AfterSuite() call
- in certain case(kill/terminate) cluster might remain locally
  this PR is to add trap to preperly clean i up

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* refactor: add early validation in DP profile handler (llm-d#554)

- validate number of schedulingProfiles in EPP to be 1 otherwise return
  empty map to reduce computation on filter and scores.
- add unit test

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(go): bump the kubernetes group with 2 updates (llm-d#574)

Bumps the kubernetes group with 2 updates: [sigs.k8s.io/controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) and [sigs.k8s.io/gateway-api-inference-extension](https://github.com/kubernetes-sigs/gateway-api-inference-extension).


Updates `sigs.k8s.io/controller-runtime` from 0.22.4 to 0.22.5
- [Release notes](https://github.com/kubernetes-sigs/controller-runtime/releases)
- [Changelog](https://github.com/kubernetes-sigs/controller-runtime/blob/main/RELEASE.md)
- [Commits](kubernetes-sigs/controller-runtime@v0.22.4...v0.22.5)

Updates `sigs.k8s.io/gateway-api-inference-extension` from 1.3.0-rc.2 to 1.3.0-rc.3
- [Release notes](https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases)
- [Changelog](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/RELEASE.md)
- [Commits](kubernetes-sigs/gateway-api-inference-extension@v1.3.0-rc.2...v1.3.0-rc.3)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/controller-runtime
  dependency-version: 0.22.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: sigs.k8s.io/gateway-api-inference-extension
  dependency-version: 1.3.0-rc.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* refactor: kv cache manager repo (llm-d#570)

* refactor: kv cache manager repo name

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* go mod tidy

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fetch kv cache upstream instead of my fork

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* revert dockerfile to fetch kv cache manager from upstream instead of go mod replace

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update chat preprocessing structs

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update kv cache manager version

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* refactor kvblock.Key to kvblock.BlockHash

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* add context

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* add parent block key

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* refactor encode

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* validate model name

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* run setup.sh

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* clone vllm into build

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* edit

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit lint

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* delete fetch-python-wrapper.sh

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit git workflow

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* refactor TokenProcessorConfig in config

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fix kv cache repo name in docker file

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fix e2e tests

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* add ignore

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* update architecture docs

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

---------

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: HyunKyun Moon <mhg5303@gmail.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* bumping IGW version to the full released version (llm-d#583)

Signed-off-by: Kellen Swain <kfswain@google.com>

* Enable prefix-cache awareness in active-active multi-replica scheduler deployments (llm-d#578)

* - active-active-ha support

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>

* lint

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* Switch to pre-built vLLM wheels for CPU builds (llm-d#582)

* try use official vllm wheels in dockerfile.epp

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* wip

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* use wheels in makefile

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* wip

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* write permissions to setup.sh

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update kv cache manager commit

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* try instal py deps wo sudo

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* CR changes

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

---------

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update llm-d-kv-cache import to v0.5.0-RC1 (llm-d#584)

* update kvc version import

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* add go.mod to testable changes

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Use 1.3.0 CRDs (llm-d#586)

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* free disk space on ci-release (llm-d#587)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581)

* feat: use Tinyllama as the "model" for kind test

- in order to test precies-prefix-cache-score we cannot use
  fool-reviewer since it need call kv-cache-manager to get tokenizer by
  getting a real model from HF
- the change is to switch the "default model" to TinyLlama
- also to make tokenizer folder writable need change permission to the
  USER in Dockerfile
- rename dp-epp-config.yaml sim-dp-epp-config.yaml as it is used for
  local test

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: revert back some config to keep using prefix-cache-scorer

- revert file renaming

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update linter configuration (llm-d#588)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* fix: config should use new precise-prefix-cache-scorer (llm-d#576)

- we have rename prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0, configs
  need migrate from the old one to the new one with spec.
  - rename plugin name
  - remove parameters.autoTune and parameters.mode: cache_tracking and
    lruCapacityPerServer
  - move hashBlockSize, maxPrefixBlocksToMatch under indexrConfig
- for config using food-review keep old prefix-cache-scorer
- keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer as
  KV and PD need both be enabled which is not done yet

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.1...v1.42.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated to more recent GIE (llm-d#592)

* Updated to more recent GIE

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated to latest GIE and chnages due to review comments

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added a true mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Exploited mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* pull kvc v0.5.0 libs (llm-d#595)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.2...v1.43.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* address nil,nil return linter error in test mock (llm-d#598)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.5...v2.28.1)

Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.39.0...v1.39.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Models extractor (llm-d#553)

* Models extractor

Signed-off-by: irar2 <irar@il.ibm.com>

* Update register.go

Signed-off-by: Ira Rosen <irar@il.ibm.com>

* Updated for the newer GIE

Signed-off-by: irar2 <irar@il.ibm.com>

* Review comments

Signed-off-by: irar2 <irar@il.ibm.com>

* Check the scheme

Signed-off-by: irar2 <irar@il.ibm.com>

---------

Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>

* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)

* feat: implement decode first flow on lmcache connector

- if cache_hit_threshold field is present in completion request, then we perform a decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: error handling

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add back todo comment

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: reduce code complexity and duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve header copying

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add comment explaning the cache_hit_threshold field and the new decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: enhance logging for cache hit threshold in decode flow

- decrease verbosity for common log
- add cache_hit_threshold attribute

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve error handling and observability when failing to unmarshal decode response

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add deleted informational comments

Signed-off-by: kyano <kyanokashi2@gmail.com>

* typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: make error logs more descriptive of the failure reason

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: assign 0 cache_hit_threshold before final decode attempt

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: update comment according to feedback

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: remove istio workaround

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: set cache hit threshold to 0 in prefill request for consistent execution

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: update the log

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: support online decoding

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: preserve request body in lmcache connector

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: support sse format for streamed decode

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add and improve log  descriptions

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* nit: undo capitalization

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typos

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: improve error log observability

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate http error checking in function and reuse

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate and reuse code better

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: lint error

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve code encapsulation and reduce duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename and simplify SSE event signaling logic

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename lmcache to shared storage protocol

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: remove unused function

Signed-off-by: kyano <kyanokashi2@gmail.com>

* test: e2e tests

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* chore: claude gitignore

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: sim deployment

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* feat: make linter running on new code configurable

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: lint errors

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)

* Initial step of a configurable pd decider which is responsible for decision whether disaggregation is required, use data added in prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE + fix lint

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update yaml and the test according prefix plugin configuration change (blockSize replaced by blockSizeTokens)

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE, update prefix_disagr_decider accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix PD for short inputs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/prefix_disagg_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* updates according the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* create pd decider plugin type with 2 implementations (for prefix based and test always), update deploy configuration according the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e tests

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* changes according the pr comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* add explanation about pd deciders to disagg_pd doc

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* rename always_disaggr_decider to always_disagg_decider

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* chore: fix wrong port for NIXL (llm-d#593)

- start with vLLM 0.11.1, default port for NIXL has been updated to 5600
- leave ZMQ to use 5557

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)

* fix: resolve JSON serialization error in active-request-scorer debug logs

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Add raw scores to debug

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Lgtm2 (#17)

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

* test: automated LGTM workflow test (#19)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#20)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#21)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#22)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#24)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#26)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test

This PR tests the /lgtm command workflow automation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Antonio Cardace <acardace@redhat.com>
Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
Signed-off-by: MregXN <mregxn@gmail.com>
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: CYJiang <googs1025@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Kellen Swain <kfswain@google.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Antonio Cardace <anto.cardace@gmail.com>
Co-authored-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
Co-authored-by: MregXN <46479059+MregXN@users.noreply.github.com>
Co-authored-by: Hyunkyun Moon <mhg5303@gmail.com>
Co-authored-by: CYJiang <86391540+googs1025@users.noreply.github.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: David Breitgand <davidbreitgand@users.noreply.github.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Sage <80211083+sagearc@users.noreply.github.com>
Co-authored-by: Kellen Swain <kfswain@google.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
github-actions bot pushed a commit to revit13/llm-d-inference-scheduler that referenced this pull request Feb 23, 2026
* update llm-d-kv-cache import to v0.5.0-RC1 (llm-d#584)

* update kvc version import

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* add go.mod to testable changes

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Use 1.3.0 CRDs (llm-d#586)

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* free disk space on ci-release (llm-d#587)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581)

* feat: use Tinyllama as the "model" for kind test

- in order to test precies-prefix-cache-score we cannot use
  fool-reviewer since it need call kv-cache-manager to get tokenizer by
  getting a real model from HF
- the change is to switch the "default model" to TinyLlama
- also to make tokenizer folder writable need change permission to the
  USER in Dockerfile
- rename dp-epp-config.yaml sim-dp-epp-config.yaml as it is used for
  local test

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: revert back some config to keep using prefix-cache-scorer

- revert file renaming

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update linter configuration (llm-d#588)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* fix: config should use new precise-prefix-cache-scorer (llm-d#576)

- we have rename prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0, configs
  need migrate from the old one to the new one with spec.
  - rename plugin name
  - remove parameters.autoTune and parameters.mode: cache_tracking and
    lruCapacityPerServer
  - move hashBlockSize, maxPrefixBlocksToMatch under indexrConfig
- for config using food-review keep old prefix-cache-scorer
- keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer as
  KV and PD need both be enabled which is not done yet

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.1...v1.42.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated to more recent GIE (llm-d#592)

* Updated to more recent GIE

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated to latest GIE and chnages due to review comments

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added a true mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Exploited mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* pull kvc v0.5.0 libs (llm-d#595)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.2...v1.43.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* address nil,nil return linter error in test mock (llm-d#598)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.5...v2.28.1)

Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.39.0...v1.39.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Models extractor (llm-d#553)

* Models extractor

Signed-off-by: irar2 <irar@il.ibm.com>

* Update register.go

Signed-off-by: Ira Rosen <irar@il.ibm.com>

* Updated for the newer GIE

Signed-off-by: irar2 <irar@il.ibm.com>

* Review comments

Signed-off-by: irar2 <irar@il.ibm.com>

* Check the scheme

Signed-off-by: irar2 <irar@il.ibm.com>

---------

Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>

* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)

* feat: implement decode first flow on lmcache connector

- if cache_hit_threshold field is present in completion request, then we perform a decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: error handling

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add back todo comment

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: reduce code complexity and duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve header copying

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add comment explaning the cache_hit_threshold field and the new decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: enhance logging for cache hit threshold in decode flow

- decrease verbosity for common log
- add cache_hit_threshold attribute

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve error handling and observability when failing to unmarshal decode response

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add deleted informational comments

Signed-off-by: kyano <kyanokashi2@gmail.com>

* typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: make error logs more descriptive of the failure reason

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: assign 0 cache_hit_threshold before final decode attempt

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: update comment according to feedback

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: remove istio workaround

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: set cache hit threshold to 0 in prefill request for consistent execution

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: update the log

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: support online decoding

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: preserve request body in lmcache connector

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: support sse format for streamed decode

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add and improve log  descriptions

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* nit: undo capitalization

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typos

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: improve error log observability

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate http error checking in function and reuse

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate and reuse code better

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: lint error

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve code encapsulation and reduce duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename and simplify SSE event signaling logic

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename lmcache to shared storage protocol

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: remove unused function

Signed-off-by: kyano <kyanokashi2@gmail.com>

* test: e2e tests

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* chore: claude gitignore

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: sim deployment

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* feat: make linter running on new code configurable

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: lint errors

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)

* Initial step of a configurable pd decider which is responsible for decision whether disaggregation is required, use data added in prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE + fix lint

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update yaml and the test according prefix plugin configuration change (blockSize replaced by blockSizeTokens)

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE, update prefix_disagr_decider accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix PD for short inputs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/prefix_disagg_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* updates according the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* create pd decider plugin type with 2 implementations (for prefix based and test always), update deploy configuration according the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e tests

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* changes according the pr comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* add explanation about pd deciders to disagg_pd doc

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* rename always_disaggr_decider to always_disagg_decider

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* chore: fix wrong port for NIXL (llm-d#593)

- start with vLLM 0.11.1, default port for NIXL has been updated to 5600
- leave ZMQ to use 5557

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)

* fix: resolve JSON serialization error in active-request-scorer debug logs

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Add raw scores to debug

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* Match documentation with default model in scripts (llm-d#615)

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Test: LGTM Workflow Automation (#32)

* feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581)

* feat: use Tinyllama as the "model" for kind test

- in order to test precies-prefix-cache-score we cannot use
  fool-reviewer since it need call kv-cache-manager to get tokenizer by
  getting a real model from HF
- the change is to switch the "default model" to TinyLlama
- also to make tokenizer folder writable need change permission to the
  USER in Dockerfile
- rename dp-epp-config.yaml sim-dp-epp-config.yaml as it is used for
  local test

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: revert back some config to keep using prefix-cache-scorer

- revert file renaming

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update linter configuration (llm-d#588)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* fix: config should use new precise-prefix-cache-scorer (llm-d#576)

- we have rename prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0, configs
  need migrate from the old one to the new one with spec.
  - rename plugin name
  - remove parameters.autoTune and parameters.mode: cache_tracking and
    lruCapacityPerServer
  - move hashBlockSize, maxPrefixBlocksToMatch under indexrConfig
- for config using food-review keep old prefix-cache-scorer
- keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer as
  KV and PD need both be enabled which is not done yet

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.1...v1.42.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated to more recent GIE (llm-d#592)

* Updated to more recent GIE

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated to latest GIE and chnages due to review comments

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added a true mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Exploited mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* pull kvc v0.5.0 libs (llm-d#595)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.2...v1.43.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* address nil,nil return linter error in test mock (llm-d#598)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.5...v2.28.1)

Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.39.0...v1.39.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Models extractor (llm-d#553)

* Models extractor

Signed-off-by: irar2 <irar@il.ibm.com>

* Update register.go

Signed-off-by: Ira Rosen <irar@il.ibm.com>

* Updated for the newer GIE

Signed-off-by: irar2 <irar@il.ibm.com>

* Review comments

Signed-off-by: irar2 <irar@il.ibm.com>

* Check the scheme

Signed-off-by: irar2 <irar@il.ibm.com>

---------

Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>

* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)

* feat: implement decode first flow on lmcache connector

- if cache_hit_threshold field is present in completion request, then we perform a decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: error handling

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add back todo comment

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: reduce code complexity and duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve header copying

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add comment explaning the cache_hit_threshold field and the new decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: enhance logging for cache hit threshold in decode flow

- decrease verbosity for common log
- add cache_hit_threshold attribute

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve error handling and observability when failing to unmarshal decode response

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add deleted informational comments

Signed-off-by: kyano <kyanokashi2@gmail.com>

* typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: make error logs more descriptive of the failure reason

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: assign 0 cache_hit_threshold before final decode attempt

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: update comment according to feedback

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: remove istio workaround

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: set cache hit threshold to 0 in prefill request for consistent execution

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: update the log

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: support online decoding

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: preserve request body in lmcache connector

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: support sse format for streamed decode

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add and improve log  descriptions

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* nit: undo capitalization

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typos

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: improve error log observability

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate http error checking in function and reuse

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate and reuse code better

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: lint error

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve code encapsulation and reduce duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename and simplify SSE event signaling logic

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename lmcache to shared storage protocol

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: remove unused function

Signed-off-by: kyano <kyanokashi2@gmail.com>

* test: e2e tests

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* chore: claude gitignore

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: sim deployment

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* feat: make linter running on new code configurable

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: lint errors

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)

* Initial step of a configurable pd decider which is responsible for decision whether disaggregation is required, use data added in prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE + fix lint

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update yaml and the test according prefix plugin configuration change (blockSize replaced by blockSizeTokens)

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE, update prefix_disagr_decider accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix PD for short inputs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/prefix_disagg_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* updates according the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* create pd decider plugin type with 2 implementations (for prefix based and test always), update deploy configuration according the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e tests

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* changes according the pr comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* add explanation about pd deciders to disagg_pd doc

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* rename always_disaggr_decider to always_disagg_decider

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* chore: fix wrong port for NIXL (llm-d#593)

- start with vLLM 0.11.1, default port for NIXL has been updated to 5600
- leave ZMQ to use 5557

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)

* fix: resolve JSON serialization error in active-request-scorer debug logs

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Add raw scores to debug

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Lgtm2 (#17)

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

* test: automated LGTM workflow test (#19)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#20)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#21)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#22)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#24)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#26)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test

This PR tests the /lgtm command workflow automation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: open-pr

Tests that opening a PR triggers gatekeeper which blocks without lgtm label.

Test timestamp: 1771188042

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm "Looks good to me", indicates that a PR is ready to be merged.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

generalize pd-profile-handler behavior for better usability and unified token-based configuration

4 participants