Models extractor #553

Merged
github-actions[bot] merged 13 commits into llm-d:main from irar2:models
Feb 5, 2026
Conversation

@irar2
Contributor

@irar2 irar2 commented Jan 12, 2026

This PR adds the ability to collect information from /v1/models and store it in the endpoint's attributes.

Closes #466

Signed-off-by: irar2 <irar@il.ibm.com>
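
As a rough illustration of the flow described above (query /v1/models, then keep the result alongside the endpoint), here is a minimal Go sketch. It is not the PR's implementation: `fetchModelIDs`, `newFakeModelServer`, and the `attributes` map are hypothetical names introduced for this example, and a local test server stands in for a real model server.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// modelList is the part of a /v1/models response this sketch cares about.
type modelList struct {
	Data []struct {
		ID string `json:"id"`
	} `json:"data"`
}

// fetchModelIDs (hypothetical helper) queries baseURL + "/v1/models"
// and returns the IDs of the listed models.
func fetchModelIDs(client *http.Client, baseURL string) ([]string, error) {
	resp, err := client.Get(baseURL + "/v1/models")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var list modelList
	if err := json.NewDecoder(resp.Body).Decode(&list); err != nil {
		return nil, err
	}
	ids := make([]string, 0, len(list.Data))
	for _, m := range list.Data {
		ids = append(ids, m.ID)
	}
	return ids, nil
}

// newFakeModelServer (hypothetical) serves a fixed OpenAI-style model list.
func newFakeModelServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/v1/models" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"object":"list","data":[{"id":"base-model","object":"model"}]}`)
	}))
}

func main() {
	srv := newFakeModelServer()
	defer srv.Close()

	// attributes is a hypothetical stand-in for an endpoint's attribute store.
	attributes := map[string][]string{}
	ids, err := fetchModelIDs(srv.Client(), srv.URL)
	if err != nil {
		panic(err)
	}
	attributes["models"] = ids
	fmt.Println(attributes["models"])
}
```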
// ModelInfo defines model's data returned from /v1/models API
type ModelInfo struct {
	ID     string `json:"id"`
	Parent string `json:"parent,omitempty"`
Collaborator

The parent field is not part of the OpenAI API specification.
It is specific to vLLM and might not work with other model servers.
I also don't think it's used (or should be used) anywhere.
I recommend removing this field.

OpenAI standard here:
https://platform.openai.com/docs/api-reference/models/list

Collaborator

A few comments:

  • If the field is not present, omitempty kicks in, so I don't see the downside of having it.
  • For use cases that need the parent information for base/LoRA relations: if it is not provided by model extraction, one must assume the base model name is provided elsewhere. There is currently no other source of truth...

I think it is fine to rely on vLLM-specific behavior for that.

  1. It can be treated as part of the "contract" (the same way other model servers are expected to provide the MSP metrics, even if under a different name).
  2. Data sources are configured per EPP, so you can always leave this disabled for other model servers. This is valid usage as long as we use homogeneous model servers in a pool (other code breaks as well when this is not the case...).
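
To make the omitempty point concrete, here is a small Go sketch. It assumes only the two `ModelInfo` fields visible in the diff; `modelsResponse` and `parseModels` are hypothetical names, and the sample payload is made up. A base model without a parent decodes to an empty `Parent`, which is then dropped again on encoding, while a LoRA adapter keeps its base model name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModelInfo mirrors the two fields shown in the diff under review.
type ModelInfo struct {
	ID     string `json:"id"`
	Parent string `json:"parent,omitempty"`
}

// modelsResponse is a hypothetical wrapper for the list returned by /v1/models.
type modelsResponse struct {
	Data []ModelInfo `json:"data"`
}

// parseModels (hypothetical helper) decodes a /v1/models response body.
func parseModels(body []byte) ([]ModelInfo, error) {
	var resp modelsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decoding models response: %w", err)
	}
	return resp.Data, nil
}

func main() {
	// A base model without "parent" and a LoRA adapter that names its base.
	body := []byte(`{"object":"list","data":[
		{"id":"base-model","object":"model"},
		{"id":"my-lora","object":"model","parent":"base-model"}]}`)

	models, err := parseModels(body)
	if err != nil {
		panic(err)
	}
	for _, m := range models {
		out, _ := json.Marshal(m)
		fmt.Println(string(out))
	}
	// Prints:
	// {"id":"base-model"}
	// {"id":"my-lora","parent":"base-model"}
}
```

A server that does not send `parent` at all therefore costs nothing here; consumers only need to treat an empty `Parent` as "unknown".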

@vMaroon vMaroon requested a review from nirrozenbaum January 13, 2026 10:19
Signed-off-by: irar2 <irar@il.ibm.com>
@elevran
Collaborator

elevran commented Jan 14, 2026

/hold
This should go in after v0.5.

@github-actions github-actions bot added the "hold" label (PRs that are blocked on design, other features, release cycle, etc.) Jan 14, 2026
@elevran elevran added this to the v0.6 milestone Jan 22, 2026
@elevran elevran moved this to In review in llm-d-inference-scheduler Jan 22, 2026
@elevran elevran removed the "hold" label (PRs that are blocked on design, other features, release cycle, etc.) Jan 26, 2026
irar2 added 5 commits January 29, 2026 08:02
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: irar2 <irar@il.ibm.com>
}

// NewModelExtractor returns a new model extractor.
func NewModelExtractor() (*ModelExtractor, error) {
Collaborator

nit: at least in theory, the plugin could have a name...

Contributor Author

What do you mean?

Collaborator

ModelExtractor is a plugin. A plugin has a type and an optional name.
The code does not support setting a plugin name and it should.

Contributor Author

There is a WithName() method now.

Collaborator

Thanks.
I was also thinking NewModelExtractor() should be extended with a name string parameter. If the name is empty, it defaults to the plugin type, and WithName() is called internally.
I think that would have been more consistent with other plugins.
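
A minimal sketch of the constructor shape suggested above, assuming a simplified `ModelExtractor` with only type and name fields. The `ModelExtractorType` value and the struct layout are hypothetical and not taken from the PR; only the idea (name parameter, fallback to the type, `WithName()` called internally) comes from the comment.

```go
package main

import "fmt"

// ModelExtractorType is a hypothetical plugin type string used for illustration.
const ModelExtractorType = "model-extractor"

// ModelExtractor is a stripped-down stand-in for the plugin in the PR.
type ModelExtractor struct {
	typ  string
	name string
}

// WithName sets the plugin name and returns the extractor for chaining.
func (m *ModelExtractor) WithName(name string) *ModelExtractor {
	m.name = name
	return m
}

// NewModelExtractor takes an optional name; an empty name falls back to the type.
func NewModelExtractor(name string) (*ModelExtractor, error) {
	if name == "" {
		name = ModelExtractorType
	}
	m := &ModelExtractor{typ: ModelExtractorType}
	return m.WithName(name), nil
}

func main() {
	def, _ := NewModelExtractor("")
	named, _ := NewModelExtractor("models-a")
	fmt.Println(def.name, named.name) // prints "model-extractor models-a"
}
```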

}
}

ds := http.NewHTTPDataSource(cfg.Scheme, cfg.Path, cfg.InsecureSkipVerify, ModelsDataSourceType,
Collaborator

Q: does NewHTTPDataSource validate the scheme?

Contributor Author

No, it only checks whether the scheme is https.

Collaborator

Since we use the scheme passed in by the user, it should at least be sanitized to ensure it is one of a known set of acceptable values (e.g., "http" and "https").
This can be done in this PR, or in a separate one that adds scheme validation to HTTPDataSource.

Contributor Author

Added

Collaborator

thanks.
Please open a tracking issue to move this check into HTTPDataSource in GAIE. It should not be up to each data source, IMO.
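
The kind of scheme check discussed in this thread could look roughly like the sketch below. `validateScheme` is a hypothetical helper, not the code that was added in the PR; lower-casing and trimming the input is an extra assumption made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// validateScheme (hypothetical helper) normalizes a user-supplied scheme and
// rejects anything outside the known-good set {"http", "https"}.
func validateScheme(scheme string) (string, error) {
	s := strings.ToLower(strings.TrimSpace(scheme))
	switch s {
	case "http", "https":
		return s, nil
	default:
		return "", fmt.Errorf("unsupported scheme %q: must be \"http\" or \"https\"", scheme)
	}
}

func main() {
	for _, in := range []string{"https", "HTTP", "ftp", ""} {
		s, err := validateScheme(in)
		fmt.Printf("%q -> %q, err=%v\n", in, s, err)
	}
}
```

Placing such a check in the shared HTTP data source, as requested above, would let every data source benefit from it instead of repeating it per plugin.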

@elevran
Collaborator

elevran commented Feb 3, 2026

/lgtm
/approve
/hold

Overall looks good. Minor comments are left, so I'm placing a hold. I leave it to your discretion whether to amend, or to cancel the hold and allow merging as-is.

@github-actions github-actions bot added the "hold" (PRs that are blocked on design, other features, release cycle, etc.) and "lgtm" ("Looks good to me", indicates that a PR is ready to be merged) labels Feb 3, 2026
github-actions bot previously approved these changes Feb 3, 2026
irar2 added 2 commits February 3, 2026 12:39
Signed-off-by: irar2 <irar@il.ibm.com>
irar2 added 2 commits February 3, 2026 12:48
Signed-off-by: irar2 <irar@il.ibm.com>
@elevran elevran self-requested a review February 4, 2026 07:52
@elevran
Collaborator

elevran commented Feb 4, 2026

/lgtm
/approve

@elevran
Collaborator

elevran commented Feb 4, 2026

As a follow-up, we need a filter and a scorer to take advantage of the /v1/models information in request scheduling.
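
One possible shape for such a filter, purely as an illustration: the sketch below keeps only endpoints that advertise the requested model. The `endpoint` type, its `Models` field, and `filterByModel` are hypothetical and do not correspond to the scheduler's real plugin interfaces.

```go
package main

import "fmt"

// endpoint is a hypothetical stand-in for a scheduler endpoint; Models holds
// the IDs that a models extractor would have stored in its attributes.
type endpoint struct {
	Name   string
	Models []string
}

// filterByModel keeps only the endpoints that advertise the requested model.
func filterByModel(endpoints []endpoint, model string) []endpoint {
	var kept []endpoint
	for _, ep := range endpoints {
		for _, id := range ep.Models {
			if id == model {
				kept = append(kept, ep)
				break
			}
		}
	}
	return kept
}

func main() {
	eps := []endpoint{
		{Name: "pod-a", Models: []string{"base-model", "my-lora"}},
		{Name: "pod-b", Models: []string{"base-model"}},
	}
	for _, ep := range filterByModel(eps, "my-lora") {
		fmt.Println(ep.Name) // prints "pod-a"
	}
}
```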

@irar2
Contributor Author

irar2 commented Feb 5, 2026

/hold cancel

@github-actions github-actions bot removed the "hold" label (PRs that are blocked on design, other features, release cycle, etc.) Feb 5, 2026
@github-actions github-actions bot merged commit cf638f5 into llm-d:main Feb 5, 2026
8 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in llm-d-inference-scheduler Feb 5, 2026
@irar2 irar2 deleted the models branch February 5, 2026 10:46
github-actions bot pushed a commit to revit13/llm-d-inference-scheduler that referenced this pull request Feb 15, 2026
* feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581)

* feat: use Tinyllama as the "model" for kind test

- in order to test precise-prefix-cache-score we cannot use
  fool-reviewer since it needs to call kv-cache-manager to get the tokenizer by
  getting a real model from HF
- the change is to switch the "default model" to TinyLlama
- also, to make the tokenizer folder writable, the permission needs to be
  changed to the USER in the Dockerfile
- rename dp-epp-config.yaml to sim-dp-epp-config.yaml as it is used for
  local test

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: revert back some config to keep using prefix-cache-scorer

- revert file renaming

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update linter configuration (llm-d#588)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* fix: config should use new precise-prefix-cache-scorer (llm-d#576)

- we renamed prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0; configs
  need to migrate from the old one to the new one with spec.
  - rename plugin name
  - remove parameters.autoTune and parameters.mode: cache_tracking and
    lruCapacityPerServer
  - move hashBlockSize, maxPrefixBlocksToMatch under indexrConfig
- for config using food-review keep old prefix-cache-scorer
- keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer as
  KV and PD both need to be enabled, which is not done yet

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.1...v1.42.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated to more recent GIE (llm-d#592)

* Updated to more recent GIE

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated to latest GIE and changes due to review comments

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added a true mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Exploited mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* pull kvc v0.5.0 libs (llm-d#595)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.2...v1.43.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* address nil,nil return linter error in test mock (llm-d#598)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.5...v2.28.1)

Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.39.0...v1.39.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Models extractor (llm-d#553)

* Models extractor

Signed-off-by: irar2 <irar@il.ibm.com>

* Update register.go

Signed-off-by: Ira Rosen <irar@il.ibm.com>

* Updated for the newer GIE

Signed-off-by: irar2 <irar@il.ibm.com>

* Review comments

Signed-off-by: irar2 <irar@il.ibm.com>

* Check the scheme

Signed-off-by: irar2 <irar@il.ibm.com>

---------

Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>

* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)

* feat: implement decode first flow on lmcache connector

- if cache_hit_threshold field is present in completion request, then we perform a decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: error handling

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add back todo comment

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: reduce code complexity and duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve header copying

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add comment explaining the cache_hit_threshold field and the new decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: enhance logging for cache hit threshold in decode flow

- decrease verbosity for common log
- add cache_hit_threshold attribute

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve error handling and observability when failing to unmarshal decode response

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add deleted informational comments

Signed-off-by: kyano <kyanokashi2@gmail.com>

* typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: make error logs more descriptive of the failure reason

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: assign 0 cache_hit_threshold before final decode attempt

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: update comment according to feedback

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: remove istio workaround

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: set cache hit threshold to 0 in prefill request for consistent execution

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: update the log

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: support online decoding

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: preserve request body in lmcache connector

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: support sse format for streamed decode

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add and improve log  descriptions

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* nit: undo capitalization

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typos

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: improve error log observability

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate http error checking in function and reuse

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate and reuse code better

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: lint error

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve code encapsulation and reduce duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename and simplify SSE event signaling logic

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename lmcache to shared storage protocol

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: remove unused function

Signed-off-by: kyano <kyanokashi2@gmail.com>

* test: e2e tests

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* chore: claude gitignore

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: sim deployment

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* feat: make linter running on new code configurable

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: lint errors

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)

* Initial step of a configurable pd decider which is responsible for decision whether disaggregation is required, use data added in prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE + fix lint

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update yaml and the test according prefix plugin configuration change (blockSize replaced by blockSizeTokens)

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE, update prefix_disagr_decider accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix PD for short inputs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/prefix_disagg_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* updates according the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* create pd decider plugin type with 2 implementations (for prefix based and test always), update deploy configuration according the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e tests

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* changes according the pr comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* add explanation about pd deciders to disagg_pd doc

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* rename always_disaggr_decider to always_disagg_decider

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* chore: fix wrong port for NIXL (llm-d#593)

- starting with vLLM 0.11.1, the default port for NIXL has been updated to 5600
- leave ZMQ to use 5557

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)

* fix: resolve JSON serialization error in active-request-scorer debug logs

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Add raw scores to debug

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Lgtm2 (#17)

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

* test: automated LGTM workflow test (#19)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#20)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#21)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#22)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#24)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#26)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test

This PR tests the /lgtm command workflow automation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
github-actions bot pushed a commit to revit13/llm-d-inference-scheduler that referenced this pull request Feb 16, 2026
* chore: bump gie to v1.2.1 (llm-d#504)

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* deps(go): bump sigs.k8s.io/gateway-api in the kubernetes group (llm-d#508)

Bumps the kubernetes group with 1 update: [sigs.k8s.io/gateway-api](https://github.com/kubernetes-sigs/gateway-api).


Updates `sigs.k8s.io/gateway-api` from 1.4.0 to 1.4.1
- [Release notes](https://github.com/kubernetes-sigs/gateway-api/releases)
- [Changelog](https://github.com/kubernetes-sigs/gateway-api/blob/main/RELEASE.md)
- [Commits](kubernetes-sigs/gateway-api@v1.4.0...v1.4.1)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/gateway-api
  dependency-version: 1.4.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(go): bump the go-dependencies group with 3 updates (llm-d#507)

Bumps the go-dependencies group with 3 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo), [github.com/onsi/gomega](https://github.com/onsi/gomega) and [golang.org/x/sync](https://github.com/golang/sync).


Updates `github.com/onsi/ginkgo/v2` from 2.27.2 to 2.27.3
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.2...v2.27.3)

Updates `github.com/onsi/gomega` from 1.38.2 to 1.38.3
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.38.2...v1.38.3)

Updates `golang.org/x/sync` from 0.18.0 to 0.19.0
- [Commits](golang/sync@v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.27.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.38.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: golang.org/x/sync
  dependency-version: 0.19.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Miscellaneous dependency updates (llm-d#510)

* Miscellaneous dependency updates

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Use latest GIE CRDs

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Fixed references to kv-cache-manager

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* deps(go): bump the kubernetes group with 5 updates (llm-d#513)

Bumps the kubernetes group with 5 updates:

| Package | From | To |
| --- | --- | --- |
| [k8s.io/api](https://github.com/kubernetes/api) | `0.34.2` | `0.34.3` |
| [k8s.io/apiextensions-apiserver](https://github.com/kubernetes/apiextensions-apiserver) | `0.34.2` | `0.34.3` |
| [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) | `0.34.2` | `0.34.3` |
| [k8s.io/client-go](https://github.com/kubernetes/client-go) | `0.34.2` | `0.34.3` |
| [k8s.io/component-base](https://github.com/kubernetes/component-base) | `0.34.2` | `0.34.3` |


Updates `k8s.io/api` from 0.34.2 to 0.34.3
- [Commits](kubernetes/api@v0.34.2...v0.34.3)

Updates `k8s.io/apiextensions-apiserver` from 0.34.2 to 0.34.3
- [Release notes](https://github.com/kubernetes/apiextensions-apiserver/releases)
- [Commits](kubernetes/apiextensions-apiserver@v0.34.2...v0.34.3)

Updates `k8s.io/apimachinery` from 0.34.2 to 0.34.3
- [Commits](kubernetes/apimachinery@v0.34.2...v0.34.3)

Updates `k8s.io/client-go` from 0.34.2 to 0.34.3
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](kubernetes/client-go@v0.34.2...v0.34.3)

Updates `k8s.io/component-base` from 0.34.2 to 0.34.3
- [Commits](kubernetes/component-base@v0.34.2...v0.34.3)

---
updated-dependencies:
- dependency-name: k8s.io/api
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apiextensions-apiserver
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apimachinery
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/client-go
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/component-base
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix kind-dev-env.sh (llm-d#512)

Running `make env-dev-kind` will fail if the vllm simulator image hasn't
been already pulled.

This fixes it by skipping the manual load & save of the image unless we're
dealing with a custom locally built image (using the dev tag).

The kubelet will anyway pull the right image when deploying the pod.

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* test: add precise_prefix_cache_test (llm-d#505)

* test: add precise_prefix_cache_test

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* test: add precise_prefix_cache_test

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

---------

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* test: reuse upstream data store and enable logr in unit tests (llm-d#518)

* enable logr in ut

Signed-off-by: MregXN <mregxn@gmail.com>

* fix package import order

Signed-off-by: MregXN <mregxn@gmail.com>

* apply comments

Signed-off-by: MregXN <mregxn@gmail.com>

---------

Signed-off-by: MregXN <mregxn@gmail.com>

* feat: allow pd_profile_handler to handle diverse plugin types (llm-d#516)

* Store the precise prefix cache score in cycleState.

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit test code

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

---------

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* deps(actions): bump crate-ci/typos from 1.40.0 to 1.40.1 (llm-d#526)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.40.0 to 1.40.1.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.40.0...v1.40.1)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.40.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(go): bump google.golang.org/grpc in the go-dependencies group (llm-d#527)

Bumps the go-dependencies group with 1 update: [google.golang.org/grpc](https://github.com/grpc/grpc-go).


Updates `google.golang.org/grpc` from 1.77.0 to 1.78.0
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.77.0...v1.78.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.78.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat(metrics): add model_name label to PD decision metric (llm-d#528)

Signed-off-by: CYJiang <googs1025@gmail.com>

* deps(actions): bump crate-ci/typos from 1.40.1 to 1.41.0 (llm-d#532)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.40.1 to 1.41.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.40.1...v1.41.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Configure dependabot ignores Go version updates (llm-d#533)

* dependabot ignores Go version updates

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* allow semver patch level updates to Go

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

---------

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* Updates the architecture description with reference to BBR and support for multiple GenAI models and LoRAs to remove confusion about llm-d only supporting one model per cluster (llm-d#525)

* finer control over package updates (llm-d#542)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* port auto-assign action from llm-d-kv-cache (llm-d#551)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* refactor: set python version and pin docker image with tag (llm-d#543)

- set the default Python version to 3.12
- set the UBI image tag to 9.7 (the current latest)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* chore(test): update API version for nixl test (llm-d#555)

- extensionRef belonged to the old v1alpha2 API; in v1 it should be updated to
  endpointPickerRef
- remove InferenceModel
- update docs for test/sidecar

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#558)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.3 to 2.27.4
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.3...v2.27.4)

Updates `github.com/onsi/gomega` from 1.38.3 to 1.39.0
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.38.3...v1.39.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.27.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(actions): bump crate-ci/typos from 1.41.0 to 1.42.0 (llm-d#557)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.41.0 to 1.42.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.41.0...v1.42.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(actions): bump actions/checkout from 4 to 6 (llm-d#556)

Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* update auto-assign logic (llm-d#560)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* remove newline in unsigned commit message (llm-d#561)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* bump gie to v1.3.0 rc2 (llm-d#562)

* update OWNERS (llm-d#559)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* refactor: Makefile, update docs (llm-d#463)

* refactor: Makefile, update docs

- split Makefile
  1. tools: install tools, check tools, download dependencies (gcc,
     etc.) and the tokenizer. These are downloaded into the "bin" folder
     rather than a global path
  2. cluster: include k8s and ocp
  3. kind
- rename "openshift-base" to "kubernetes-base" to be clear for purpose
- uplift Go lint version to 2.1.6 to align with the same one set in
  Github Action
- rename make targets for better visibility, deprecating old ones
- add more print in "make env"

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: code review

- move image tags from Makefile.tools.mk back to Makefile
- update document to reflect how image and tag are created
- do not export image tag env variables IMG_TAG
- fix patch-deployments.yaml after EPP_TAG is not used but should only
  use EPP_IMAGE
- fix kubernetes-dev-env.sh for EPP_IMAGE
- remove flag on golangci_lint fmt

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* code review:

- revert back to 1.3.0
- remove comments
- set default as default namespace

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update Makefile

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* docs: fix broken link in the docs

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>

* feat: add metrics validation in e2e test (llm-d#529)

Signed-off-by: CYJiang <googs1025@gmail.com>

* feat: make no-hit-lru P/D-aware (llm-d#522)

* feat: make no-hit-lru P/D-aware

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* hardcode prefill profile

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* remove spammy log

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* apply suggestions

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

---------

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* Update disaggregated Prefill/Decode inference serving documentation (llm-d#571)

* update pd docs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* typos

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.0 to 1.42.1 (llm-d#572)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.0 to 1.42.1.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.0...v1.42.1)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(go): bump github.com/onsi/ginkgo/v2 in the go-dependencies group (llm-d#573)

Bumps the go-dependencies group with 1 update: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo).


Updates `github.com/onsi/ginkgo/v2` from 2.27.4 to 2.27.5
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.4...v2.27.5)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.27.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix reviewers auto assign minor bug (llm-d#575)

* fix(scorer): make active request pd aware (llm-d#569)

* fix: decrement all pods on request complete instead of only final pod

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: append all pod endpoints from profile results

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* test(e2e): cleanup kind cluster (llm-d#563)

- if the e2e-tests cluster already exists, "make test-e2e" fails to run
- the main cleanup should be done in the AfterSuite() call
- in certain cases (kill/terminate) the cluster might remain locally;
  this PR adds a trap to properly clean it up

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* refactor: add early validation in DP profile handler (llm-d#554)

- validate that the number of schedulingProfiles in EPP is 1, otherwise return
  an empty map to avoid needless filter and scorer computation.
- add unit test

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
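
The early validation described above can be sketched roughly as follows. This is a minimal illustration only: `profile` and `pickProfiles` are hypothetical names invented for this sketch, not the handler's actual types or methods, and the real plugin works with the scheduler framework's own profile types.

```go
package main

import "fmt"

// profile is a hypothetical stand-in for a scheduling profile.
type profile struct{ name string }

// pickProfiles returns the configured profiles only when exactly one is
// present. Otherwise it returns an empty map, so that no filters or
// scorers are run for an invalid configuration.
func pickProfiles(configured map[string]profile) map[string]profile {
	if len(configured) != 1 {
		return map[string]profile{}
	}
	return configured
}

func main() {
	one := map[string]profile{"default": {name: "default"}}
	two := map[string]profile{"a": {name: "a"}, "b": {name: "b"}}
	fmt.Println(len(pickProfiles(one)), len(pickProfiles(two)))
}
```

Returning an empty map rather than an error keeps the caller's loop over profiles trivially empty, which appears to be the intent of "reduce computation" in the commit message.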

* deps(go): bump the kubernetes group with 2 updates (llm-d#574)

Bumps the kubernetes group with 2 updates: [sigs.k8s.io/controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) and [sigs.k8s.io/gateway-api-inference-extension](https://github.com/kubernetes-sigs/gateway-api-inference-extension).


Updates `sigs.k8s.io/controller-runtime` from 0.22.4 to 0.22.5
- [Release notes](https://github.com/kubernetes-sigs/controller-runtime/releases)
- [Changelog](https://github.com/kubernetes-sigs/controller-runtime/blob/main/RELEASE.md)
- [Commits](kubernetes-sigs/controller-runtime@v0.22.4...v0.22.5)

Updates `sigs.k8s.io/gateway-api-inference-extension` from 1.3.0-rc.2 to 1.3.0-rc.3
- [Release notes](https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases)
- [Changelog](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/RELEASE.md)
- [Commits](kubernetes-sigs/gateway-api-inference-extension@v1.3.0-rc.2...v1.3.0-rc.3)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/controller-runtime
  dependency-version: 0.22.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: sigs.k8s.io/gateway-api-inference-extension
  dependency-version: 1.3.0-rc.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* refactor: kv cache manager repo (llm-d#570)

* refactor: kv cache manager repo name

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* go mod tidy

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fetch kv cache upstream instead of my fork

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* revert dockerfile to fetch kv cache manager from upstream instead of go mod replace

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update chat preprocessing structs

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update kv cache manager version

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* refactor kvblock.Key to kvblock.BlockHash

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* add context

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* add parent block key

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* refactor encode

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* validate model name

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* run setup.sh

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* clone vllm into build

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* edit

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit lint

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* delete fetch-python-wrapper.sh

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit git workflow

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* refactor TokenProcessorConfig in config

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fix kv cache repo name in docker file

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fix e2e tests

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* add ignore

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* update architecture docs

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

---------

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: HyunKyun Moon <mhg5303@gmail.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* bumping IGW version to the full released version (llm-d#583)

Signed-off-by: Kellen Swain <kfswain@google.com>

* Enable prefix-cache awareness in active-active multi-replica scheduler deployments (llm-d#578)

* - active-active-ha support

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>

* lint

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* Switch to pre-built vLLM wheels for CPU builds (llm-d#582)

* try use official vllm wheels in dockerfile.epp

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* wip

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* use wheels in makefile

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* wip

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* write permissions to setup.sh

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update kv cache manager commit

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* try install py deps w/o sudo

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* CR changes

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

---------

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update llm-d-kv-cache import to v0.5.0-RC1 (llm-d#584)

* update kvc version import

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* add go.mod to testable changes

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Use 1.3.0 CRDs (llm-d#586)

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* free disk space on ci-release (llm-d#587)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581)

* feat: use Tinyllama as the "model" for kind test

- in order to test precise-prefix-cache-score we cannot use
  food-review since it needs to call kv-cache-manager to get the tokenizer by
  fetching a real model from HF
- the change is to switch the "default model" to TinyLlama
- also, to make the tokenizer folder writable, change its permission to the
  USER in Dockerfile
- rename dp-epp-config.yaml to sim-dp-epp-config.yaml as it is used for
  local test

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: revert back some config to keep using prefix-cache-scorer

- revert file renaming

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update linter configuration (llm-d#588)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* fix: config should use new precise-prefix-cache-scorer (llm-d#576)

- we renamed prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0; configs
  need to migrate from the old one to the new one with these spec changes:
  - rename plugin name
  - remove parameters.autoTune and parameters.mode: cache_tracking and
    lruCapacityPerServer
  - move hashBlockSize, maxPrefixBlocksToMatch under indexerConfig
- for configs using food-review keep the old prefix-cache-scorer
- keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer as
  KV and PD both need to be enabled, which is not done yet

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.1...v1.42.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated to more recent GIE (llm-d#592)

* Updated to more recent GIE

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated to latest GIE and changes due to review comments

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added a true mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Exploited mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* pull kvc v0.5.0 libs (llm-d#595)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.2...v1.43.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* address nil,nil return linter error in test mock (llm-d#598)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.5...v2.28.1)

Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.39.0...v1.39.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Models extractor (llm-d#553)

* Models extractor

Signed-off-by: irar2 <irar@il.ibm.com>

* Update register.go

Signed-off-by: Ira Rosen <irar@il.ibm.com>

* Updated for the newer GIE

Signed-off-by: irar2 <irar@il.ibm.com>

* Review comments

Signed-off-by: irar2 <irar@il.ibm.com>

* Check the scheme

Signed-off-by: irar2 <irar@il.ibm.com>

---------

Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
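
Per the PR description, the extractor collects information from /v1/models and stores it in the endpoint's attributes. The decoding step might look roughly like the sketch below. The `ModelInfo` struct is the one quoted in the review thread; the `modelsResponse` envelope and the `parseModels` helper are hypothetical names introduced here, and the envelope assumes the OpenAI-style list reply with a `data` array. How the result is attached to endpoint attributes is not shown, since the source does not include that code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModelInfo mirrors the struct quoted in the PR review.
type ModelInfo struct {
	ID     string `json:"id"`
	Parent string `json:"parent,omitempty"`
}

// modelsResponse is a hypothetical envelope for the /v1/models reply,
// assuming the OpenAI-style {"object":"list","data":[...]} shape.
type modelsResponse struct {
	Data []ModelInfo `json:"data"`
}

// parseModels decodes a /v1/models response body into a slice of ModelInfo.
func parseModels(body []byte) ([]ModelInfo, error) {
	var resp modelsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode /v1/models response: %w", err)
	}
	return resp.Data, nil
}

func main() {
	body := []byte(`{"object":"list","data":[{"id":"base-model"},{"id":"my-lora","parent":"base-model"}]}`)
	models, err := parseModels(body)
	if err != nil {
		panic(err)
	}
	for _, m := range models {
		fmt.Printf("id=%s parent=%q\n", m.ID, m.Parent)
	}
}
```

A server that does not report `parent` simply leaves the field empty, which matches the reviewer's point that the vLLM-specific field is harmless when absent.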

* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)

* feat: implement decode first flow on lmcache connector

- if cache_hit_threshold field is present in completion request, then we perform a decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: error handling

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add back todo comment

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: reduce code complexity and duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve header copying

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add comment explaining the cache_hit_threshold field and the new decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: enhance logging for cache hit threshold in decode flow

- decrease verbosity for common log
- add cache_hit_threshold attribute

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve error handling and observability when failing to unmarshal decode response

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add deleted informational comments

Signed-off-by: kyano <kyanokashi2@gmail.com>

* typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: make error logs more descriptive of the failure reason

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: assign 0 cache_hit_threshold before final decode attempt

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: update comment according to feedback

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: remove istio workaround

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: set cache hit threshold to 0 in prefill request for consistent execution

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: update the log

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: support online decoding

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: preserve request body in lmcache connector

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: support sse format for streamed decode

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add and improve log descriptions

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* nit: undo capitalization

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typos

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: improve error log observability

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate http error checking in function and reuse

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate and reuse code better

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: lint error

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve code encapsulation and reduce duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename and simplify SSE event signaling logic

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename lmcache to shared storage protocol

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: remove unused function

Signed-off-by: kyano <kyanokashi2@gmail.com>

* test: e2e tests

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* chore: claude gitignore

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: sim deployment

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* feat: make linter running on new code configurable

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: lint errors

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
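
The trigger for the decode-first flow, as the commit messages above describe it, is the presence of a `cache_hit_threshold` field in the completion request. A minimal sketch of such a presence check follows. `useDecodeFirst` is a hypothetical helper written for illustration, not the connector's actual function, and the rest of the flow (prefill fallback, SSE handling) is not reproduced here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cacheHitThresholdField is the request field named in the commit messages.
const cacheHitThresholdField = "cache_hit_threshold"

// useDecodeFirst reports whether the completion request body carries the
// cache_hit_threshold field, regardless of its value (a value of 0 still counts).
func useDecodeFirst(body []byte) (bool, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(body, &fields); err != nil {
		return false, fmt.Errorf("parse completion request: %w", err)
	}
	_, present := fields[cacheHitThresholdField]
	return present, nil
}

func main() {
	withField := []byte(`{"model":"m","prompt":"hi","cache_hit_threshold":0.8}`)
	withoutField := []byte(`{"model":"m","prompt":"hi"}`)
	a, _ := useDecodeFirst(withField)
	b, _ := useDecodeFirst(withoutField)
	fmt.Println(a, b)
}
```

Checking for key presence rather than a non-zero value matters because later commits in this series set the threshold to 0 explicitly, and a zero value must remain distinguishable from an absent field.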

* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)

* Initial step of a configurable pd decider, which is responsible for deciding whether disaggregation is required; uses data added by the prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE + fix lint

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update yaml and the test according to the prefix plugin configuration change (blockSize replaced by blockSizeTokens)

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE, update prefix_disagr_decider accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix PD for short inputs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/prefix_disagg_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* updates according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* create pd decider plugin type with 2 implementations (for prefix based and test always), update deploy configuration according to the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e tests

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* changes according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* add explanation about pd deciders to disagg_pd doc

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* rename always_disaggr_decider to always_disagg_decider

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* chore: fix wrong port for NIXL (llm-d#593)

- starting with vLLM 0.11.1, the default port for NIXL has been updated to 5600
- leave ZMQ to use 5557

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)

* fix: resolve JSON serialization error in active-request-scorer debug logs

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Add raw scores to debug

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Lgtm2 (#17)

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

* test: automated LGTM workflow test (#19)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#20)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#21)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#22)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#24)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#26)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test

This PR tests the /lgtm command workflow automation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Antonio Cardace <acardace@redhat.com>
Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
Signed-off-by: MregXN <mregxn@gmail.com>
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: CYJiang <googs1025@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Kellen Swain <kfswain@google.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Antonio Cardace <anto.cardace@gmail.com>
Co-authored-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
Co-authored-by: MregXN <46479059+MregXN@users.noreply.github.com>
Co-authored-by: Hyunkyun Moon <mhg5303@gmail.com>
Co-authored-by: CYJiang <86391540+googs1025@users.noreply.github.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: David Breitgand <davidbreitgand@users.noreply.github.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Sage <80211083+sagearc@users.noreply.github.com>
Co-authored-by: Kellen Swain <kfswain@google.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
github-actions bot pushed a commit to revit13/llm-d-inference-scheduler that referenced this pull request Feb 23, 2026
* update llm-d-kv-cache import to v0.5.0-RC1 (llm-d#584)

* update kvc version import

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* add go.mod to testable changes

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Use 1.3.0 CRDs (llm-d#586)

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* free disk space on ci-release (llm-d#587)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581)

* feat: use Tinyllama as the "model" for kind test

- in order to test precise-prefix-cache-score we cannot use
  food-review since it needs to call kv-cache-manager to get the tokenizer by
  fetching a real model from HF
- the change is to switch the "default model" to TinyLlama
- also, to make the tokenizer folder writable, change its permission to the
  USER in Dockerfile
- rename dp-epp-config.yaml to sim-dp-epp-config.yaml as it is used for
  local test

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: revert back some config to keep using prefix-cache-scorer

- revert file renaming

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update linter configuration (llm-d#588)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* fix: config should use new precise-prefix-cache-scorer (llm-d#576)

- we renamed prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0; configs
  need to migrate from the old one to the new one with these spec changes:
  - rename plugin name
  - remove parameters.autoTune and parameters.mode: cache_tracking and
    lruCapacityPerServer
  - move hashBlockSize, maxPrefixBlocksToMatch under indexerConfig
- for configs using food-review keep the old prefix-cache-scorer
- keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer as
  KV and PD both need to be enabled, which is not done yet

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.1...v1.42.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated to more recent GIE (llm-d#592)

* Updated to more recent GIE

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated to latest GIE and changes due to review comments

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added a true mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Exploited mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* pull kvc v0.5.0 libs (llm-d#595)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.2...v1.43.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* address nil,nil return linter error in test mock (llm-d#598)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.5...v2.28.1)

Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.39.0...v1.39.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Models extractor (llm-d#553)

* Models extractor

Signed-off-by: irar2 <irar@il.ibm.com>

* Update register.go

Signed-off-by: Ira Rosen <irar@il.ibm.com>

* Updated for the newer GIE

Signed-off-by: irar2 <irar@il.ibm.com>

* Review comments

Signed-off-by: irar2 <irar@il.ibm.com>

* Check the scheme

Signed-off-by: irar2 <irar@il.ibm.com>

---------

Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>

* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)

* feat: implement decode first flow on lmcache connector

- if cache_hit_threshold field is present in completion request, then we perform a decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: error handling

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add back todo comment

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: reduce code complexity and duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve header copying

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add comment explaining the cache_hit_threshold field and the new decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: enhance logging for cache hit threshold in decode flow

- decrease verbosity for common log
- add cache_hit_threshold attribute

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve error handling and observability when failing to unmarshal decode response

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add deleted informational comments

Signed-off-by: kyano <kyanokashi2@gmail.com>

* typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: make error logs more descriptive of the failure reason

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: assign 0 cache_hit_threshold before final decode attempt

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: update comment according to feedback

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: remove istio workaround

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: set cache hit threshold to 0 in prefill request for consistent execution

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: update the log

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: support online decoding

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: preserve request body in lmcache connector

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: support sse format for streamed decode

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add and improve log descriptions

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* nit: undo capitalization

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typos

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: improve error log observability

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate http error checking in function and reuse

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate and reuse code better

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: lint error

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve code encapsulation and reduce duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename and simplify SSE event signaling logic

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename lmcache to shared storage protocol

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: remove unused function

Signed-off-by: kyano <kyanokashi2@gmail.com>

* test: e2e tests

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* chore: claude gitignore

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: sim deployment

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* feat: make linter running on new code configurable

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: lint errors

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)

* Initial step of a configurable PD decider, which is responsible for deciding whether disaggregation is required; uses data added by the prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE + fix lint

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update yaml and the test according to the prefix plugin configuration change (blockSize replaced by blockSizeTokens)

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE, update prefix_disagr_decider accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix PD for short inputs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/prefix_disagg_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* updates according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* create PD decider plugin type with 2 implementations (prefix-based and an always-disaggregate one for tests), update deploy configuration according to the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e tests

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* changes according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* add explanation about pd deciders to disagg_pd doc

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* rename always_disaggr_decider to always_disagg_decider

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* chore: fix wrong port for NIXL (llm-d#593)

- starting with vLLM 0.11.1, the default port for NIXL has been updated to 5600
- leave ZMQ to use 5557

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)

* fix: resolve JSON serialization error in active-request-scorer debug logs

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Add raw scores to debug

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* Match documentation with default model in scripts (llm-d#615)

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Test: LGTM Workflow Automation (#32)

* feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581)

* feat: use Tinyllama as the "model" for kind test

- in order to test precise-prefix-cache-score we cannot use
  fool-reviewer, since it needs to call kv-cache-manager to get the tokenizer by
  fetching a real model from HF
- the change switches the "default model" to TinyLlama
- also, to make the tokenizer folder writable, change its permission to the
  USER in the Dockerfile
- rename dp-epp-config.yaml to sim-dp-epp-config.yaml, as it is used for
  local tests

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: revert back some config to keep using prefix-cache-scorer

- revert file renaming

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update linter configuration (llm-d#588)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* fix: config should use new precise-prefix-cache-scorer (llm-d#576)

- prefix-cache-scorer was renamed to precise-prefix-cache-scorer in 0.3.0, so configs
  need to migrate from the old plugin to the new one with spec.
  - rename the plugin
  - remove parameters.autoTune, parameters.mode: cache_tracking, and
    lruCapacityPerServer
  - move hashBlockSize and maxPrefixBlocksToMatch under indexrConfig
- for configs using food-review, keep the old prefix-cache-scorer
- keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer, as
  KV and PD both need to be enabled, which is not done yet

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.1...v1.42.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated to more recent GIE (llm-d#592)

* Updated to more recent GIE

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated to latest GIE and changes due to review comments

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added a true mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Exploited mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* pull kvc v0.5.0 libs (llm-d#595)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.2...v1.43.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* address nil,nil return linter error in test mock (llm-d#598)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.5...v2.28.1)

Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.39.0...v1.39.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Models extractor (llm-d#553)

* Models extractor

Signed-off-by: irar2 <irar@il.ibm.com>

* Update register.go

Signed-off-by: Ira Rosen <irar@il.ibm.com>

* Updated for the newer GIE

Signed-off-by: irar2 <irar@il.ibm.com>

* Review comments

Signed-off-by: irar2 <irar@il.ibm.com>

* Check the scheme

Signed-off-by: irar2 <irar@il.ibm.com>

---------

Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>

* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)

* feat: implement decode first flow on lmcache connector

- if cache_hit_threshold field is present in completion request, then we perform a decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: error handling

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add back todo comment

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: reduce code complexity and duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve header copying

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add comment explaining the cache_hit_threshold field and the new decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: enhance logging for cache hit threshold in decode flow

- decrease verbosity for common log
- add cache_hit_threshold attribute

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve error handling and observability when failing to unmarshal decode response

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add deleted informational comments

Signed-off-by: kyano <kyanokashi2@gmail.com>

* typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: make error logs more descriptive of the failure reason

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: assign 0 cache_hit_threshold before final decode attempt

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: update comment according to feedback

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: remove istio workaround

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: set cache hit threshold to 0 in prefill request for consistent execution

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: update the log

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: support online decoding

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: preserve request body in lmcache connector

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: support sse format for streamed decode

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add and improve log descriptions

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* nit: undo capitalization

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typos

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: improve error log observability

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate http error checking in function and reuse

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate and reuse code better

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: lint error

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve code encapsulation and reduce duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename and simplify SSE event signaling logic

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename lmcache to shared storage protocol

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: remove unused function

Signed-off-by: kyano <kyanokashi2@gmail.com>

* test: e2e tests

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* chore: claude gitignore

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: sim deployment

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* feat: make linter running on new code configurable

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: lint errors

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)

* Initial step of a configurable PD decider, which is responsible for deciding whether disaggregation is required; uses data added by the prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE + fix lint

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update yaml and the test according to the prefix plugin configuration change (blockSize replaced by blockSizeTokens)

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE, update prefix_disagr_decider accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix PD for short inputs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/prefix_disagg_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* updates according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* create PD decider plugin type with 2 implementations (prefix-based and an always-disaggregate one for tests), update deploy configuration according to the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e tests

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* changes according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* add explanation about pd deciders to disagg_pd doc

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* rename always_disaggr_decider to always_disagg_decider

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* chore: fix wrong port for NIXL (llm-d#593)

- starting with vLLM 0.11.1, the default port for NIXL has been updated to 5600
- leave ZMQ to use 5557

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)

* fix: resolve JSON serialization error in active-request-scorer debug logs

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Add raw scores to debug

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Lgtm2 (#17)

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

* test: automated LGTM workflow test (#19)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#20)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#21)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#22)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#24)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#26)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test

This PR tests the /lgtm command workflow automation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: open-pr

Tests that opening a PR triggers gatekeeper which blocks without lgtm label.

Test timestamp: 1771188042

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
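
For readers skimming the log above, the "Models extractor (llm-d#553)" entry is the change this PR introduces: collecting information from /v1/models and storing it in the endpoint's attributes. The following is only a minimal sketch of the decoding step, not the code merged in this PR. `ModelInfo` copies the two fields quoted in the review thread; `modelsResponse` and `parseModels` are hypothetical names invented for illustration, and the `{"object": "list", "data": [...]}` envelope is assumed from the OpenAI list-models format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModelInfo mirrors the struct quoted in the review thread of this PR.
// Parent is vLLM-specific and is simply left empty when a server omits it.
type ModelInfo struct {
	ID     string `json:"id"`
	Parent string `json:"parent,omitempty"`
}

// modelsResponse is a hypothetical wrapper for the list envelope
// (assumed shape: {"object": "list", "data": [...]}).
type modelsResponse struct {
	Object string      `json:"object"`
	Data   []ModelInfo `json:"data"`
}

// parseModels is a hypothetical helper that decodes a /v1/models body
// into the per-model entries an extractor could attach to an endpoint.
func parseModels(body []byte) ([]ModelInfo, error) {
	var resp modelsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decoding /v1/models response: %w", err)
	}
	return resp.Data, nil
}

func main() {
	// Made-up sample payload: one base model and one LoRA adapter pointing at it.
	body := []byte(`{"object":"list","data":[{"id":"base-model"},{"id":"lora-adapter","parent":"base-model"}]}`)

	models, err := parseModels(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range models {
		fmt.Printf("id=%s parent=%q\n", m.ID, m.Parent)
	}
}
```

Because `parent` is optional in this sketch, a server that follows only the OpenAI schema still decodes cleanly; the base/LoRA relation is then just unavailable for that endpoint, which matches the trade-off discussed in the review comments.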
Labels

lgtm "Looks good to me", indicates that a PR is ready to be merged.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Enable collection of configured / loaded models in each inference serving endpoint