
feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present #509

Merged
github-actions[bot] merged 50 commits into llm-d:main from kyanokashi:feat/sidecar/lmcache-connector/decode-first
Feb 6, 2026

Conversation

@kyanokashi
Contributor

@kyanokashi kyanokashi commented Dec 9, 2025

This PR updates the lmcache connector on the sidecar to decode first when cache_hit_threshold is present in the completion request.
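For illustration, the trigger condition amounts to a presence check on the JSON request body. The sketch below is hypothetical: hasCacheHitThreshold and setCacheHitThreshold are not the connector's real function names. The second helper mirrors what later commits in this PR describe, namely setting cache_hit_threshold to 0 on the prefill request and before the final decode attempt so that those steps run regardless of the cache condition.

```go
package main

import (
	"encoding/json"
	"errors"
)

// hasCacheHitThreshold reports whether the top-level JSON object in a
// completion request body contains a "cache_hit_threshold" key.
// Only presence is checked; the value itself is not inspected.
// (Hypothetical helper for illustration.)
func hasCacheHitThreshold(body []byte) (bool, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(body, &fields); err != nil {
		return false, err
	}
	_, ok := fields["cache_hit_threshold"]
	return ok, nil
}

// setCacheHitThreshold returns a copy of body with cache_hit_threshold set
// to v. Other fields are carried over as raw JSON, so their values are not
// altered, although encoding/json emits the keys in sorted order.
// (Hypothetical helper for illustration.)
func setCacheHitThreshold(body []byte, v float64) ([]byte, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(body, &fields); err != nil {
		return nil, err
	}
	if fields == nil {
		// A literal `null` body decodes without error into a nil map.
		return nil, errors.New("request body is not a JSON object")
	}
	raw, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	fields["cache_hit_threshold"] = raw
	return json.Marshal(fields)
}
```

For the Scenario 7 request body, hasCacheHitThreshold reports true, and setCacheHitThreshold with 0 keeps model, messages, max_tokens and stream intact while zeroing the threshold.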

The new flow is as follows:

  1. If cache_hit_threshold is present, send the request to the decode node first.
  2. If decode succeeds, write the response to the client and return.
  3. If the finish reason is cache_threshold, the decode node did not meet the cache hit threshold.
  4. In that case, continue with the original flow: prefill, then decode.
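The steps above can be sketched as one control-flow function. This is a simplified model under stated assumptions, not the connector's implementation: decodeResult, stage and serveDecodeFirst are hypothetical names, the callbacks stand in for the real HTTP round trips, and an upstream decode error is assumed to be returned to the client rather than retried through prefill (consistent with the 429 in Scenario 7).

```go
package main

// finishReasonCacheThreshold is the finish reason the decode node reports
// when the request's cache hit threshold was not met.
const finishReasonCacheThreshold = "cache_threshold"

// decodeResult is a hypothetical summary of one upstream response.
type decodeResult struct {
	FinishReason string
	Body         []byte
}

// stage is a hypothetical stand-in for one upstream round trip.
type stage func(body []byte) (decodeResult, error)

// serveDecodeFirst tries the decode node first when cache_hit_threshold is
// present and falls back to prefill-then-decode only when the decoder
// reports a cache_threshold finish reason.
func serveDecodeFirst(body []byte, hasThreshold bool, decode, prefillThenDecode stage) (decodeResult, error) {
	if !hasThreshold {
		// Field absent: keep the original flow unchanged.
		return prefillThenDecode(body)
	}
	// Step 1: decode first.
	res, err := decode(body)
	if err != nil {
		// Assumption: upstream errors go back to the caller, not through P/D.
		return decodeResult{}, err
	}
	// Step 2: any other finish reason means decode succeeded.
	if res.FinishReason != finishReasonCacheThreshold {
		return res, nil
	}
	// Steps 3-4: threshold not met, so run the original prefill -> decode flow.
	return prefillThenDecode(body)
}
```

Keying the fallback on finish_reason alone means every other outcome, including a normal stop or length, is treated as a successful decode.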

Test Report

Environment:

  • Kind Cluster: llm-d-inference-scheduler-dev
  • vLLM Simulator Image: kyanokashi/llm-d-inference-sim:dev
  • Sidecar Connector: lmcache
  • Log Level: --zap-log-level=4

Test Scenarios

Scenario 1: Non-Streaming WITHOUT cache_hit_threshold

Description: Basic request without lmcache protocol activation.

Result: ✅ PASS - Request proxied directly to decoder.


Scenario 2: Non-Streaming WITH cache_hit_threshold

Description: Request with cache_hit_threshold field, no cache_threshold triggered.

Result: ✅ PASS - LMCache protocol runs, response returned successfully.


Scenario 3: Non-Streaming WITH cache_hit_threshold AND X-Cache-Threshold: true

Description: Request configured to trigger cache_threshold finish_reason.

Result: ✅ PASS - LMCache protocol detects cache_threshold, triggers prefill→decode.


Scenario 4: Streaming WITHOUT cache_hit_threshold

Description: Streaming request without lmcache protocol activation.

Result: ✅ PASS - SSE chunks streamed directly from decoder.


Scenario 5: Streaming WITH cache_hit_threshold

Description: Streaming request with cache_hit_threshold, without the X-Cache-Threshold header.

Result: ✅ PASS - LMCache protocol parses SSE chunks, streams through when no cache_threshold found.


Scenario 6: Streaming WITH cache_hit_threshold AND X-Cache-Threshold: true

Description: Streaming request with cache_hit_threshold and the X-Cache-Threshold: true header.

Result: ✅ PASS - LMCache protocol detects cache_threshold in SSE, triggers prefill→decode.


Failure Injection Test

To enable failure injection on the simulator:

--failure-injection-rate=100
--failure-types=rate_limit

Scenario 7: Streaming Decode Error (tryDecodeStreaming)

Request:

curl -s http://localhost:30080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "food-review",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 5,
    "cache_hit_threshold": 0.5,
    "stream": true
  }'

Path: tryDecodeStreaming → error

Result: ✅ PASS

  • HTTP Status: 429
  • Error returned as JSON (not SSE)

Response:

{
  "error": {
    "code": 429,
    "message": "Rate limit reached for food-review...",
    "type": "RateLimitError"
  }
}
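Scenario 7 relies on the sidecar not committing to an SSE response before it knows the decode outcome. A minimal sketch of that decision, assuming it is made from the upstream status code and Content-Type; isEventStream is a hypothetical helper, not code from this PR.

```go
package main

import "mime"

// isEventStream reports whether an upstream decode response should be
// relayed to the client as Server-Sent Events. Only a 2xx response whose
// media type is text/event-stream qualifies; anything else (such as the
// 429 JSON error above) is relayed as a plain response instead.
func isEventStream(status int, contentType string) bool {
	if status < 200 || status > 299 {
		return false
	}
	// ParseMediaType strips parameters such as "; charset=utf-8" and
	// lower-cases the media type; an empty header yields an error.
	mediaType, _, err := mime.ParseMediaType(contentType)
	return err == nil && mediaType == "text/event-stream"
}
```

When it returns false, the upstream status, Content-Type and body can be copied to the client unchanged, which produces the JSON error shown above instead of an event stream.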

Summary

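| Scenario | Streaming | cache_hit_threshold | X-Cache-Threshold | Result |
| --- | --- | --- | --- | --- |
| 1 | No | No | No | ✅ PASS |
| 2 | No | Yes | No | ✅ PASS |
| 3 | No | Yes | Yes | ✅ PASS |
| 4 | Yes | No | No | ✅ PASS |
| 5 | Yes | Yes | No | ✅ PASS |
| 6 | Yes | Yes | Yes | ✅ PASS |
| 7 | Yes | Yes | No (error) | ✅ PASS |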

All scenarios passed successfully.

@github-actions

github-actions bot commented Dec 9, 2025

🚨 Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits,
please see GitHub Documentation.

@kyanokashi kyanokashi changed the title feat: implement decode first flow on lmcache connector feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present Dec 9, 2025
@kyanokashi kyanokashi force-pushed the feat/sidecar/lmcache-connector/decode-first branch from 77b64c5 to 2a6e437 December 9, 2025 23:37
- if cache_hit_threshold field is present in completion request, then we perform a decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>
kyanokashi and others added 2 commits December 9, 2025 18:55
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
@kyanokashi kyanokashi force-pushed the feat/sidecar/lmcache-connector/decode-first branch from a120186 to a6ae771 December 10, 2025 17:34
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
@elevran elevran moved this to In review in llm-d-inference-scheduler Dec 11, 2025
@kfirwolfson kfirwolfson left a comment

Nice implementation, @kyanokashi. Please see comments inline, there are some things that I think should be modified for it to work properly.

}

// Create prefiller request. Set max_tokens to 1.
if s.forwardDataParallel && s.dataParallelHandler(w, r) {

This whole "if" code block and its contents should be removed. We don't want to start with "prefill". We want to start with "tryDecode" flow.

Contributor Author

This is to account for the temporary Istio workaround that @shmuelk mentioned he put in place. In that case we would only prefill, so we don't want to call tryDecode.

@kfirwolfson kfirwolfson Dec 15, 2025

I believe that if DP is in use, it's only for the Decode phase, and prefill works normally. @shmuelk, please correct me if I am mistaken. Assuming this code will be removed soon, maybe we can avoid special handling for s.forwardDataParallel == true.

Collaborator

This code is for the workaround we have for Istio 1.28.0.

Istio 1.28.1 has a fix for this issue. We will be removing this code in the future.

Should it be when sending to prefill or decode, @shmuelk ?

Contributor Author

So the Istio workaround specifically intends to avoid decoding and only prefill in some scenarios.

I guess the question is whether the condition caused by the Istio workaround and the presence of cache_hit_threshold in the request can occur together. Technically it shouldn't, because that field is only relevant for decoding; correct me if I'm wrong, @kfirwolfson.

I am not sure anyone will try running the connector_lmcache.py code with s.forwardDataParallel==True, let alone with cache_hit_threshold>0, before the s.forwardDataParallel code will be removed, so I don't think it matters much.
But for now, let's just remove this whole "if" statement. It's inaccurate for when s.forwardDataParallel==True and irrelevant for when s.forwardDataParallel==False.

@kfirwolfson
I suggest focusing on the d-first use case in this PR, and handling the Decode preemption problem (explained in detail in vLLM issue 24256) in a separate PR.

…w decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>
- decrease verbosity for common log
- add cache_hit_threshold attribute

Signed-off-by: kyano <kyanokashi2@gmail.com>
…marshal decode response

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
…regardless of cache condition

Signed-off-by: kyano <kyanokashi2@gmail.com>

Not sure how this would work with Streaming (vLLM online inference with partial responses). Would it?

Contributor Author

Hmmm this is tricky.

Previously we were passing a pass through writer to the decode proxy which was responsible for writing responses back to the client.

Now, because we need to parse the response to determine the finish reason, we use a buffered writer so we can read the first choice.

Let me explore some options. I'm thinking of either updating bufferedResponseWriter to support streaming or implementing a new writer type that handles the cache_threshold case specifically

Contributor Author

@kyanokashi kyanokashi Dec 22, 2025

Here's what I ended up doing: 88739c6

It's a bit complex, but I couldn't think of a better way to do it.

I still need to test this.

@kfirwolfson

nice work.

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
@kyanokashi kyanokashi force-pushed the feat/sidecar/lmcache-connector/decode-first branch from 2ebde52 to 515b385 December 17, 2025 18:54
Signed-off-by: kyano <kyanokashi2@gmail.com>

// ConnectorLMCache enables (now deprecated) P/D LMCache protocol
ConnectorLMCache = "lmcache"
// ConnectorSharedStorage enables (now deprecated) P/D Shared Storage protocol

Suggest removing the "(now deprecated)" text. We're working to enable it.

Contributor Author

I'll remove it in the follow up refactor PR. It has massive refactoring and I prefer not changing the base if I can.

@kfirwolfson
kfirwolfson commented Jan 18, 2026

Overall, both logic and code look good to me, @elevran.

@elevran elevran added this to the v0.6 milestone Jan 22, 2026
@kyanokashi
Contributor Author

@kfirwolfson @elevran @vMaroon what's left before we can merge this?

@kfirwolfson
@kfirwolfson @elevran @vMaroon what's left before we can merge this?

Looks great to me.

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
@kyanokashi
Contributor Author

I went ahead and added coverage in the e2e tests: d71b10b

@kyanokashi kyanokashi requested a review from elevran February 2, 2026 19:40
@elevran
Collaborator

elevran commented Feb 5, 2026

@kyanokashi please fix the lint errors and ping me for approval. Other than that, I think this is ready to go in.
@kfirwolfson you may need to dismiss the Request Changes using the github UI before this can merge.

@kyanokashi
Contributor Author

@elevran I think you were referring to the test errors? Anyways they are fixed now

@kfirwolfson kfirwolfson left a comment

@elevran done. Approved from my pov.

@elevran
Collaborator

elevran commented Feb 6, 2026

@elevran I think you were referring to the test errors? Anyways they are fixed now

No, I was referring to the lint errors in the e2e test code introduced by this PR. Check the lint-and-test action:

Error: test/e2e/e2e_test.go:620:23: Error return value of `resp.Body.Close` is not checked (errcheck)
  	defer resp.Body.Close()
  	                     ^
  Error: test/e2e/e2e_test.go:647:23: Error return value of `resp.Body.Close` is not checked (errcheck)
  	defer resp.Body.Close()
  	                     ^
  Error: test/e2e/e2e_test.go:681:23: Error return value of `resp.Body.Close` is not checked (errcheck)
  	defer resp.Body.Close()
  	                     ^
  Error: test/e2e/e2e_test.go:715:23: Error return value of `resp.Body.Close` is not checked (errcheck)
  	defer resp.Body.Close()
  	                     ^
  Error: test/e2e/e2e_test.go:771:12: string-format: fmt.Sprintf can be replaced with string concatenation (perfsprint)
  	ginkgo.By(fmt.Sprintf("Getting request count from prefill pod: %s", prefillPodName))

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
@kyanokashi
Contributor Author

kyanokashi commented Feb 6, 2026

Ok, fixed now. There's an issue with the CI runs: after a run fails and I push a new commit, the checks show as pending and I no longer have access to the previous failed runs.

I tried running make lint locally, which didn't show any errors because the linter was configured not to run on new code for some reason. I went ahead and made that configurable as well.

@elevran
Collaborator

elevran commented Feb 6, 2026

/lgtm
/approve

@github-actions github-actions bot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 6, 2026
@github-actions github-actions bot merged commit dc96b95 into llm-d:main Feb 6, 2026
8 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in llm-d-inference-scheduler Feb 6, 2026
github-actions bot pushed a commit to revit13/llm-d-inference-scheduler that referenced this pull request Feb 15, 2026
* feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581)

* feat: use Tinyllama as the "model" for kind test

- in order to test precise-prefix-cache-score we cannot use
  food-review, since it needs to call kv-cache-manager to get the tokenizer by
  fetching a real model from HF
- the change is to switch the "default model" to TinyLlama
- also, to make the tokenizer folder writable, the permissions need to be
  changed for the USER in the Dockerfile
- rename dp-epp-config.yaml to sim-dp-epp-config.yaml as it is used for
  local test

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: revert back some config to keep using prefix-cache-scorer

- revert file renaming

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update linter configuration (llm-d#588)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* fix: config should use new precise-prefix-cache-scorer (llm-d#576)

- we renamed prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0, so configs
  need to migrate from the old one to the new one with spec.
  - rename plugin name
  - remove parameters.autoTune and parameters.mode: cache_tracking and
    lruCapacityPerServer
  - move hashBlockSize, maxPrefixBlocksToMatch under indexerConfig
- for configs using food-review, keep the old prefix-cache-scorer
- keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer, as
  KV and PD both need to be enabled, which is not done yet

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.1...v1.42.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated to more recent GIE (llm-d#592)

* Updated to more recent GIE

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated to latest GIE and changes due to review comments

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added a true mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Exploited mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* pull kvc v0.5.0 libs (llm-d#595)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.2...v1.43.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* address nil,nil return linter error in test mock (llm-d#598)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.5...v2.28.1)

Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.39.0...v1.39.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Models extractor (llm-d#553)

* Models extractor

Signed-off-by: irar2 <irar@il.ibm.com>

* Update register.go

Signed-off-by: Ira Rosen <irar@il.ibm.com>

* Updated for the newer GIE

Signed-off-by: irar2 <irar@il.ibm.com>

* Review comments

Signed-off-by: irar2 <irar@il.ibm.com>

* Check the scheme

Signed-off-by: irar2 <irar@il.ibm.com>

---------

Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>

* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)

* feat: implement decode first flow on lmcache connector

- if cache_hit_threshold field is present in completion request, then we perform a decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: error handling

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add back todo comment

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: reduce code complexity and duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve header copying

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add comment explaining the cache_hit_threshold field and the new decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: enhance logging for cache hit threshold in decode flow

- decrease verbosity for common log
- add cache_hit_threshold attribute

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve error handling and observability when failing to unmarshal decode response

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add deleted informational comments

Signed-off-by: kyano <kyanokashi2@gmail.com>

* typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: make error logs more descriptive of the failure reason

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: assign 0 cache_hit_threshold before final decode attempt

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: update comment according to feedback

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: remove istio workaround

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: set cache hit threshold to 0 in prefill request for consistent execution

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: update the log

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: support online decoding

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: preserve request body in lmcache connector

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: support sse format for streamed decode

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add and improve log descriptions

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* nit: undo capitalization

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typos

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: improve error log observability

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate http error checking in function and reuse

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate and reuse code better

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: lint error

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve code encapsulation and reduce duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename and simplify SSE event signaling logic

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename lmcache to shared storage protocol

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: remove unused function

Signed-off-by: kyano <kyanokashi2@gmail.com>

* test: e2e tests

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* chore: claude gitignore

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: sim deployment

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* feat: make linter running on new code configurable

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: lint errors

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)

* Initial step of a configurable PD decider, which is responsible for deciding whether disaggregation is required; uses data added by the prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE + fix lint

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update yaml and the test according to the prefix plugin configuration change (blockSize replaced by blockSizeTokens)

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE, update prefix_disagr_decider accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix PD for short inputs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/prefix_disagg_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* updates according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* create PD decider plugin type with 2 implementations (prefix-based and always, for tests), update deploy configuration according to the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e tests

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* changes according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* add explanation about pd deciders to disagg_pd doc

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* rename always_disaggr_decider to always_disagg_decider

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* chore: fix wrong port for NIXL (llm-d#593)

- starting with vLLM 0.11.1, the default port for NIXL has been updated to 5600
- keep ZMQ on 5557

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)

* fix: resolve JSON serialization error in active-request-scorer debug logs

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Add raw scores to debug

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Lgtm2 (#17)

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

* test: automated LGTM workflow test (#19)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#20)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#21)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#22)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#24)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#26)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test

This PR tests the /lgtm command workflow automation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
github-actions bot pushed a commit to revit13/llm-d-inference-scheduler that referenced this pull request Feb 16, 2026
* chore: bump gie to v1.2.1 (llm-d#504)

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* deps(go): bump sigs.k8s.io/gateway-api in the kubernetes group (llm-d#508)

Bumps the kubernetes group with 1 update: [sigs.k8s.io/gateway-api](https://github.com/kubernetes-sigs/gateway-api).


Updates `sigs.k8s.io/gateway-api` from 1.4.0 to 1.4.1
- [Release notes](https://github.com/kubernetes-sigs/gateway-api/releases)
- [Changelog](https://github.com/kubernetes-sigs/gateway-api/blob/main/RELEASE.md)
- [Commits](kubernetes-sigs/gateway-api@v1.4.0...v1.4.1)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/gateway-api
  dependency-version: 1.4.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(go): bump the go-dependencies group with 3 updates (llm-d#507)

Bumps the go-dependencies group with 3 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo), [github.com/onsi/gomega](https://github.com/onsi/gomega) and [golang.org/x/sync](https://github.com/golang/sync).


Updates `github.com/onsi/ginkgo/v2` from 2.27.2 to 2.27.3
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.2...v2.27.3)

Updates `github.com/onsi/gomega` from 1.38.2 to 1.38.3
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.38.2...v1.38.3)

Updates `golang.org/x/sync` from 0.18.0 to 0.19.0
- [Commits](golang/sync@v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.27.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.38.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: golang.org/x/sync
  dependency-version: 0.19.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Miscellaneous dependency updates (llm-d#510)

* Miscellaneous dependency updates

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Use latest GIE CRDs

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Fixed references to kv-cache-manager

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* deps(go): bump the kubernetes group with 5 updates (llm-d#513)

Bumps the kubernetes group with 5 updates:

| Package | From | To |
| --- | --- | --- |
| [k8s.io/api](https://github.com/kubernetes/api) | `0.34.2` | `0.34.3` |
| [k8s.io/apiextensions-apiserver](https://github.com/kubernetes/apiextensions-apiserver) | `0.34.2` | `0.34.3` |
| [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) | `0.34.2` | `0.34.3` |
| [k8s.io/client-go](https://github.com/kubernetes/client-go) | `0.34.2` | `0.34.3` |
| [k8s.io/component-base](https://github.com/kubernetes/component-base) | `0.34.2` | `0.34.3` |


Updates `k8s.io/api` from 0.34.2 to 0.34.3
- [Commits](kubernetes/api@v0.34.2...v0.34.3)

Updates `k8s.io/apiextensions-apiserver` from 0.34.2 to 0.34.3
- [Release notes](https://github.com/kubernetes/apiextensions-apiserver/releases)
- [Commits](kubernetes/apiextensions-apiserver@v0.34.2...v0.34.3)

Updates `k8s.io/apimachinery` from 0.34.2 to 0.34.3
- [Commits](kubernetes/apimachinery@v0.34.2...v0.34.3)

Updates `k8s.io/client-go` from 0.34.2 to 0.34.3
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](kubernetes/client-go@v0.34.2...v0.34.3)

Updates `k8s.io/component-base` from 0.34.2 to 0.34.3
- [Commits](kubernetes/component-base@v0.34.2...v0.34.3)

---
updated-dependencies:
- dependency-name: k8s.io/api
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apiextensions-apiserver
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apimachinery
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/client-go
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/component-base
  dependency-version: 0.34.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix kind-dev-env.sh (llm-d#512)

Running `make env-dev-kind` will fail if the vllm simulator image hasn't
been already pulled.

This fixes it by skipping the manual load & save of the image unless we're
dealing with a custom locally built image (using the dev tag).

The kubelet will anyway pull the right image when deploying the pod.

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* test: add precise_prefix_cache_test (llm-d#505)

* test: add precise_prefix_cache_test

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* test: add precise_prefix_cache_test

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

---------

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* test: reuse upstream data store and enable logr in unit tests (llm-d#518)

* enable logr in ut

Signed-off-by: MregXN <mregxn@gmail.com>

* fix package import order

Signed-off-by: MregXN <mregxn@gmail.com>

* apply comments

Signed-off-by: MregXN <mregxn@gmail.com>

---------

Signed-off-by: MregXN <mregxn@gmail.com>

* feat: allow pd_profile_handler to handle diverse plugin types (llm-d#516)

* Store the precise prefix cache score in cycleState.

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit test code

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

---------

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* deps(actions): bump crate-ci/typos from 1.40.0 to 1.40.1 (llm-d#526)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.40.0 to 1.40.1.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.40.0...v1.40.1)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.40.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(go): bump google.golang.org/grpc in the go-dependencies group (llm-d#527)

Bumps the go-dependencies group with 1 update: [google.golang.org/grpc](https://github.com/grpc/grpc-go).


Updates `google.golang.org/grpc` from 1.77.0 to 1.78.0
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.77.0...v1.78.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.78.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat(metrics): add model_name label to PD decision metric (llm-d#528)

Signed-off-by: CYJiang <googs1025@gmail.com>

* deps(actions): bump crate-ci/typos from 1.40.1 to 1.41.0 (llm-d#532)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.40.1 to 1.41.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.40.1...v1.41.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Configure dependabot ignores Go version updates (llm-d#533)

* dependabot ignores Go version updates

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* allow semver patch level updates to Go

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

---------

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* Updates the architecture description with reference to BBR and support for multiple GenAI models and LoRAs to remove confusion about llm-d only supporting one model per cluster (llm-d#525)

* finer control over package updates (llm-d#542)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* port auto-assign action from llm-d-kv-cache (llm-d#551)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* refactor: set python version and pin docker image with tag (llm-d#543)

- default set to 3.12 for python
- set 9.7 (the current latest) for the UBI image

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* chore(test): update API version for nixl test (llm-d#555)

- extensionRef was in the old v1alpha2; in v1 it should be updated to
  endpointPickerRef
- remove InferenceModel
- update docs for test/sidecar

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#558)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.3 to 2.27.4
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.3...v2.27.4)

Updates `github.com/onsi/gomega` from 1.38.3 to 1.39.0
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.38.3...v1.39.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.27.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(actions): bump crate-ci/typos from 1.41.0 to 1.42.0 (llm-d#557)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.41.0 to 1.42.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.41.0...v1.42.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(actions): bump actions/checkout from 4 to 6 (llm-d#556)

Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* update auto-assign logic (llm-d#560)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* remove newline in unsigned commit message (llm-d#561)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* bump gie to v1.3.0 rc2 (llm-d#562)

* update OWNERS (llm-d#559)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* refactor: Makefile, update docs (llm-d#463)

* refactor: Makefile, update docs

- split Makefile
  1. tools: install tools, check tools, download dependencies (gcc
     etc.) and the tokenizer. These are downloaded into the "bin" folder
     rather than a global path
  2. cluster: include k8s and ocp
  3. kind
- rename "openshift-base" to "kubernetes-base" to make its purpose clear
- uplift Go lint version to 2.1.6 to align with the version set in
  GitHub Actions
- rename make targets for better visibility, deprecating old ones
- add more print in "make env"

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: code review

- move image tags from Makefile.tools.mk back to Makefile
- update document to reflect how image and tag are created
- do not export image tag env variables IMG_TAG
- fix patch-deployments.yaml: EPP_TAG is no longer used and only
  EPP_IMAGE should be used
- fix kubernetes-dev-env.sh for EPP_IMAGE
- remove flag on golangci_lint fmt

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* code review:

- revert back to 1.3.0
- remove comments
- set default as default namespace

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update Makefile

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* docs: fix broken link in the docs

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>

* feat: add metrics validation in e2e test (llm-d#529)

Signed-off-by: CYJiang <googs1025@gmail.com>

* feat: make no-hit-lru P/D-aware (llm-d#522)

* feat: make no-hit-lru P/D-aware

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* hardcode prefill profile

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* remove spammy log

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* apply suggestions

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

---------

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

* Update disaggregated Prefill/Decode inference serving documentation (llm-d#571)

* update pd docs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* typos

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.0 to 1.42.1 (llm-d#572)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.0 to 1.42.1.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.0...v1.42.1)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* deps(go): bump github.com/onsi/ginkgo/v2 in the go-dependencies group (llm-d#573)

Bumps the go-dependencies group with 1 update: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo).


Updates `github.com/onsi/ginkgo/v2` from 2.27.4 to 2.27.5
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.4...v2.27.5)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.27.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix reviewers auto assign minor bug (llm-d#575)

* fix(scorer): make active request pd aware (llm-d#569)

* fix: decrement all pods on request complete instead of only final pod

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: append all pod endpoints from profile results

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
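The two fixes above make the active-request count P/D-aware: every pod chosen across the profile results gets incremented, and every one of them is decremented when the request completes. A minimal sketch of that bookkeeping, assuming a simple in-memory counter; `activeRequests`, `onScheduled`, `onComplete` and `count` are hypothetical names, not the scorer's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// activeRequests is a hypothetical in-flight counter keyed by pod name;
// it is an illustration, not the scorer's real data structure.
type activeRequests struct {
	mu     sync.Mutex
	counts map[string]int      // pod -> in-flight requests
	byReq  map[string][]string // request ID -> every pod it was scheduled to
}

func newActiveRequests() *activeRequests {
	return &activeRequests{counts: map[string]int{}, byReq: map[string][]string{}}
}

// onScheduled records every pod picked across all profile results
// (for example the prefill pod and the decode pod), not only the final one.
func (a *activeRequests) onScheduled(reqID string, pods []string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.byReq[reqID] = append([]string(nil), pods...)
	for _, p := range pods {
		a.counts[p]++
	}
}

// onComplete decrements every pod recorded for the request, so the prefill
// pod does not keep a stale in-flight count after a P/D request finishes.
func (a *activeRequests) onComplete(reqID string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	for _, p := range a.byReq[reqID] {
		if a.counts[p] > 0 {
			a.counts[p]--
		}
	}
	delete(a.byReq, reqID)
}

// count returns the current in-flight count for a pod.
func (a *activeRequests) count(pod string) int {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.counts[pod]
}

func main() {
	ar := newActiveRequests()
	ar.onScheduled("req-1", []string{"prefill-0", "decode-0"})
	fmt.Println(ar.count("prefill-0"), ar.count("decode-0")) // 1 1
	ar.onComplete("req-1")
	fmt.Println(ar.count("prefill-0"), ar.count("decode-0")) // 0 0
}
```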

* test(e2e): cleanup kind cluster (llm-d#563)

- if the e2e-tests cluster exists, "make test-e2e" fails to run
- main cleanup should be done in the AfterSuite() call
- in certain cases (kill/terminate) the cluster might remain locally;
  this PR adds a trap to properly clean it up

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* refactor: add early validation in DP profile handler (llm-d#554)

- validate number of schedulingProfiles in EPP to be 1 otherwise return
  empty map to reduce computation on filter and scores.
- add unit test

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(go): bump the kubernetes group with 2 updates (llm-d#574)

Bumps the kubernetes group with 2 updates: [sigs.k8s.io/controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) and [sigs.k8s.io/gateway-api-inference-extension](https://github.com/kubernetes-sigs/gateway-api-inference-extension).


Updates `sigs.k8s.io/controller-runtime` from 0.22.4 to 0.22.5
- [Release notes](https://github.com/kubernetes-sigs/controller-runtime/releases)
- [Changelog](https://github.com/kubernetes-sigs/controller-runtime/blob/main/RELEASE.md)
- [Commits](kubernetes-sigs/controller-runtime@v0.22.4...v0.22.5)

Updates `sigs.k8s.io/gateway-api-inference-extension` from 1.3.0-rc.2 to 1.3.0-rc.3
- [Release notes](https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases)
- [Changelog](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/RELEASE.md)
- [Commits](kubernetes-sigs/gateway-api-inference-extension@v1.3.0-rc.2...v1.3.0-rc.3)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/controller-runtime
  dependency-version: 0.22.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: sigs.k8s.io/gateway-api-inference-extension
  dependency-version: 1.3.0-rc.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* refactor: kv cache manager repo (llm-d#570)

* refactor: kv cache manager repo name

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* go mod tidy

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fetch kv cache upstream instead of my fork

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* revert dockerfile to fetch kv cache manager from upstream instead of go mod replace

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update chat preprocessing structs

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update kv cache manager version

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* refactor kvblock.Key to kvblock.BlockHash

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* add context

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* add parent block key

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* refactor encode

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* validate model name

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* run setup.sh

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* clone vllm into build

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* edit

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit lint

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* delete fetch-python-wrapper.sh

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit git workflow

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* edit

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* refactor TokenProcessorConfig in config

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fix kv cache repo name in docker file

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fix e2e tests

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* add ignore

Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>

* update architecture docs

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

---------

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: HyunKyun Moon <mhg5303@gmail.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* bumping IGW version to the full released version (llm-d#583)

Signed-off-by: Kellen Swain <kfswain@google.com>

* Enable prefix-cache awareness in active-active multi-replica scheduler deployments (llm-d#578)

* - active-active-ha support

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>

* lint

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* Switch to pre-built vLLM wheels for CPU builds (llm-d#582)

* try use official vllm wheels in dockerfile.epp

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* wip

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* use wheels in makefile

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* wip

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* write permissions to setup.sh

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update kv cache manager commit

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* try install py deps w/o sudo

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* CR changes

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

---------

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* update llm-d-kv-cache import to v0.5.0-RC1 (llm-d#584)

* update kvc version import

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* add go.mod to testable changes

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Use 1.3.0 CRDs (llm-d#586)

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* free disk space on ci-release (llm-d#587)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581)

* feat: use Tinyllama as the "model" for kind test

- in order to test precise-prefix-cache-score we cannot use
  food-review, since it needs to call kv-cache-manager to get the tokenizer
  by fetching a real model from HF
- the change is to switch the "default model" to TinyLlama
- also, to make the tokenizer folder writable, permissions need to be
  changed for the USER in the Dockerfile
- rename dp-epp-config.yaml to sim-dp-epp-config.yaml as it is used for
  local test

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: revert back some config to keep using prefix-cache-scorer

- revert file renaming

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update linter configuration (llm-d#588)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* fix: config should use new precise-prefix-cache-scorer (llm-d#576)

- we renamed prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0, so configs
  need to migrate from the old one to the new one with the new spec:
  - rename plugin name
  - remove parameters.autoTune and parameters.mode: cache_tracking and
    lruCapacityPerServer
  - move hashBlockSize, maxPrefixBlocksToMatch under indexerConfig
- for config using food-review keep old prefix-cache-scorer
- keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer as
  KV and PD need both be enabled which is not done yet

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.1...v1.42.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated to more recent GIE (llm-d#592)

* Updated to more recent GIE

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated to latest GIE and changes due to review comments

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added a true mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Exploited mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* pull kvc v0.5.0 libs (llm-d#595)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.2...v1.43.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* address nil,nil return linter error in test mock (llm-d#598)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.5...v2.28.1)

Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.39.0...v1.39.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Models extractor (llm-d#553)

* Models extractor

Signed-off-by: irar2 <irar@il.ibm.com>

* Update register.go

Signed-off-by: Ira Rosen <irar@il.ibm.com>

* Updated for the newer GIE

Signed-off-by: irar2 <irar@il.ibm.com>

* Review comments

Signed-off-by: irar2 <irar@il.ibm.com>

* Check the scheme

Signed-off-by: irar2 <irar@il.ibm.com>

---------

Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>

* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)

* feat: implement decode first flow on lmcache connector

- if cache_hit_threshold field is present in completion request, then we perform a decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>
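The decode-first flow described above (decode alone when cache_hit_threshold is present, and fall back to prefill then decode only when the decoder finishes with reason cache_threshold) can be sketched as follows. This is an illustration under stated assumptions, not the sidecar's code: `serveDecodeFirst`, `decodeFn` and `prefillFn` are hypothetical stand-ins for the proxy calls, and the decode call is assumed to surface the finish_reason it received.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// thresholdProbe decodes only the field this sketch inspects.
type thresholdProbe struct {
	CacheHitThreshold *float64 `json:"cache_hit_threshold"`
}

// Hypothetical stand-ins for the sidecar's proxy calls. decodeFn returns
// the finish_reason reported by the decode worker.
type (
	decodeFn  func(body []byte) (string, error)
	prefillFn func(body []byte) error
)

// serveDecodeFirst tries decode first when cache_hit_threshold is present;
// otherwise, or when the decoder reports "cache_threshold", it runs the
// original prefill -> decode flow.
func serveDecodeFirst(body []byte, decode decodeFn, prefill prefillFn) (string, error) {
	var probe thresholdProbe
	if err := json.Unmarshal(body, &probe); err != nil {
		return "", fmt.Errorf("parse request: %w", err)
	}
	if probe.CacheHitThreshold != nil {
		reason, err := decode(body)
		if err != nil {
			return "", err
		}
		if reason != "cache_threshold" {
			return reason, nil // threshold met: the decode response is final
		}
		// Threshold not met: fall through to the original flow.
	}
	if err := prefill(body); err != nil {
		return "", fmt.Errorf("prefill: %w", err)
	}
	return decode(body)
}

func main() {
	decodes := 0
	decode := func([]byte) (string, error) {
		decodes++
		if decodes == 1 {
			return "cache_threshold", nil
		}
		return "stop", nil
	}
	prefill := func([]byte) error { return nil }
	reason, err := serveDecodeFirst([]byte(`{"prompt":"hi","cache_hit_threshold":0.8}`), decode, prefill)
	fmt.Println(reason, err, decodes) // stop <nil> 2
}
```

A request without the field never pays for the extra decode attempt, which keeps the pre-existing behaviour unchanged.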

* fix: error handling

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add back todo comment

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: reduce code complexity and duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve header copying

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add comment explaining the cache_hit_threshold field and the new decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: enhance logging for cache hit threshold in decode flow

- decrease verbosity for common log
- add cache_hit_threshold attribute

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve error handling and observability when failing to unmarshal decode response

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add deleted informational comments

Signed-off-by: kyano <kyanokashi2@gmail.com>

* typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: make error logs more descriptive of the failure reason

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: assign 0 cache_hit_threshold before final decode attempt

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: update comment according to feedback

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: remove istio workaround

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: set cache hit threshold to 0 in prefill request for consistent execution

Signed-off-by: kyano <kyanokashi2@gmail.com>
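Forcing cache_hit_threshold to 0 on the prefill request (and, per the earlier commit, on the final decode attempt) amounts to rewriting one field of the JSON body. A minimal sketch, assuming the body is re-encoded through a generic map; `withZeroCacheHitThreshold` is a hypothetical helper name. `UseNumber` keeps numeric fields such as max_tokens as their original literals instead of converting them to float64.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// withZeroCacheHitThreshold returns a re-encoded copy of a JSON request body
// with cache_hit_threshold set to 0, so the receiving worker runs regardless
// of its cache hit rate. Other fields are preserved; key order may change.
func withZeroCacheHitThreshold(body []byte) ([]byte, error) {
	dec := json.NewDecoder(bytes.NewReader(body))
	dec.UseNumber()
	var fields map[string]any
	if err := dec.Decode(&fields); err != nil {
		return nil, fmt.Errorf("parse request body: %w", err)
	}
	if fields == nil { // a literal `null` body decodes to a nil map
		return nil, errors.New("request body is not a JSON object")
	}
	fields["cache_hit_threshold"] = 0
	return json.Marshal(fields)
}

func main() {
	out, err := withZeroCacheHitThreshold([]byte(`{"model":"m","max_tokens":16,"cache_hit_threshold":0.9}`))
	fmt.Println(string(out), err) // {"cache_hit_threshold":0,"max_tokens":16,"model":"m"} <nil>
}
```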

* refactor: update the log

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: support online decoding

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: preserve request body in lmcache connector

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: support sse format for streamed decode

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add and improve log descriptions

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* nit: undo capitalization

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typos

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: improve error log observability

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate http error checking in function and reuse

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate and reuse code better

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: lint error

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve code encapsulation and reduce duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename and simplify SSE event signaling logic

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename lmcache to shared storage protocol

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: remove unused function

Signed-off-by: kyano <kyanokashi2@gmail.com>

* test: e2e tests

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* chore: claude gitignore

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: sim deployment

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* feat: make linter running on new code configurable

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: lint errors

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)

* Initial step of a configurable pd decider which is responsible for deciding whether disaggregation is required; uses data added by the prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE + fix lint

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update yaml and the test according to the prefix plugin configuration change (blockSize replaced by blockSizeTokens)

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE, update prefix_disagr_decider accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix PD for short inputs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/prefix_disagg_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* updates according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* create a PD decider plugin type with 2 implementations (prefix-based, and always-disaggregate for testing); update the deploy configuration according to the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e tests

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* changes according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* add explanation about pd deciders to disagg_pd doc

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* rename always_disaggr_decider to always_disagg_decider

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
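
The PD decider commits above describe a plugin type with two implementations: one that always disaggregates (for testing) and one that decides from prefix-cache data, with a fix for short inputs. The Go sketch below illustrates that shape under stated assumptions: the interface, type names, and the rule "disaggregate only when the uncached part of the prompt reaches a threshold" are hypothetical and are not taken from the plugin's real API.

```go
package main

// pdDecider is a hypothetical interface: given the prompt length and how many
// of its tokens are already cached, decide whether to run a separate prefill.
type pdDecider interface {
	disaggregate(promptTokens, cachedTokens int) bool
}

// alwaysDisaggDecider always disaggregates (useful for tests).
type alwaysDisaggDecider struct{}

func (alwaysDisaggDecider) disaggregate(int, int) bool { return true }

// prefixDisaggDecider disaggregates only when the uncached part of the prompt
// is at least nonCachedTokensThreshold tokens, so short or mostly cached
// prompts are served by the decode pod alone (assumed rule).
type prefixDisaggDecider struct {
	nonCachedTokensThreshold int
}

func (d prefixDisaggDecider) disaggregate(promptTokens, cachedTokens int) bool {
	uncached := promptTokens - cachedTokens
	if uncached < 0 {
		uncached = 0 // guard against cache estimates exceeding the prompt
	}
	return uncached >= d.nonCachedTokensThreshold
}
```

Keeping the decision behind an interface lets the deploy configuration select an implementation by name without changing the scheduling code.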

* chore: fix wrong port for NIXL (llm-d#593)

- starting with vLLM 0.11.1, the default NIXL port has changed to 5600
- ZMQ stays on port 5557

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)

* fix: resolve JSON serialization error in active-request-scorer debug logs

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Add raw scores to debug

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Lgtm2 (#17)

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

* test: automated LGTM workflow test (#19)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#20)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#21)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#22)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#24)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#26)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test

This PR tests the /lgtm command workflow automation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Antonio Cardace <acardace@redhat.com>
Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
Signed-off-by: MregXN <mregxn@gmail.com>
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: CYJiang <googs1025@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Kellen Swain <kfswain@google.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Antonio Cardace <anto.cardace@gmail.com>
Co-authored-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
Co-authored-by: MregXN <46479059+MregXN@users.noreply.github.com>
Co-authored-by: Hyunkyun Moon <mhg5303@gmail.com>
Co-authored-by: CYJiang <86391540+googs1025@users.noreply.github.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: David Breitgand <davidbreitgand@users.noreply.github.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Sage <80211083+sagearc@users.noreply.github.com>
Co-authored-by: Kellen Swain <kfswain@google.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
github-actions bot pushed a commit to revit13/llm-d-inference-scheduler that referenced this pull request Feb 23, 2026
* update llm-d-kv-cache import to v0.5.0-RC1 (llm-d#584)

* update kvc version import

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* add go.mod to testable changes

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* Use 1.3.0 CRDs (llm-d#586)

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* free disk space on ci-release (llm-d#587)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* feat: use Tinyllama as the "model" for kind test and switch to using precise-prefix-cache-scorer in config (llm-d#581)

* feat: use Tinyllama as the "model" for kind test

- in order to test precise-prefix-cache-scorer we cannot use
  food-review, since the scorer needs to call kv-cache-manager to get the
  tokenizer, which requires a real model from HF
- the change is to switch the "default model" to TinyLlama
- also, to make the tokenizer folder writable, the permissions for the
  USER in the Dockerfile need to change
- rename dp-epp-config.yaml to sim-dp-epp-config.yaml as it is used for
  local test

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: revert some configs to keep using prefix-cache-scorer

- revert file renaming

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update linter configuration (llm-d#588)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* fix: config should use new precise-prefix-cache-scorer (llm-d#576)

- we renamed prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0, so configs
  need to migrate from the old one to the new one with its spec:
  - rename the plugin
  - remove parameters.autoTune, parameters.mode: cache_tracking, and
    lruCapacityPerServer
  - move hashBlockSize and maxPrefixBlocksToMatch under indexerConfig
- configs using food-review keep the old prefix-cache-scorer
- pd-epp-config and sim-pd-epp-config keep prefix-cache-scorer, since
  KV and PD both need to be enabled, which is not done yet

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.1...v1.42.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated to more recent GIE (llm-d#592)

* Updated to more recent GIE

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated to latest GIE and changes due to review comments

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added a true mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Exploited mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* pull kvc v0.5.0 libs (llm-d#595)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.2...v1.43.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* address nil,nil return linter error in test mock (llm-d#598)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.5...v2.28.1)

Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.39.0...v1.39.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Models extractor (llm-d#553)

* Models extractor

Signed-off-by: irar2 <irar@il.ibm.com>

* Update register.go

Signed-off-by: Ira Rosen <irar@il.ibm.com>

* Updated for the newer GIE

Signed-off-by: irar2 <irar@il.ibm.com>

* Review comments

Signed-off-by: irar2 <irar@il.ibm.com>

* Check the scheme

Signed-off-by: irar2 <irar@il.ibm.com>

---------

Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>

* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)

* feat: implement decode first flow on lmcache connector

- if the cache_hit_threshold field is present in the completion request, perform a decode-first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: error handling

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add back todo comment

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: reduce code complexity and duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve header copying

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add comment explaining the cache_hit_threshold field and the new decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: enhance logging for cache hit threshold in decode flow

- decrease verbosity for common log
- add cache_hit_threshold attribute

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve error handling and observability when failing to unmarshal decode response

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add deleted informational comments

Signed-off-by: kyano <kyanokashi2@gmail.com>

* typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: make error logs more descriptive of the failure reason

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: assign 0 cache_hit_threshold before final decode attempt

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: update comment according to feedback

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: remove istio workaround

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: set cache hit threshold to 0 in prefill request for consistent execution

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: update the log

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: support online decoding

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: preserve request body in lmcache connector

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: support sse format for streamed decode

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add and improve log descriptions

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* nit: undo capitalization

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typos

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: improve error log observability

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate http error checking in function and reuse

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate and reuse code better

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: lint error

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve code encapsulation and reduce duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename and simplify SSE event signaling logic

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename lmcache to shared storage protocol

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: remove unused function

Signed-off-by: kyano <kyanokashi2@gmail.com>

* test: e2e tests

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* chore: claude gitignore

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: sim deployment

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* feat: make running the linter on new code configurable

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: lint errors

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
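
The decode-first flow described in these commits (decode first; if the decoder reports finish_reason `cache_threshold`, prefill and then decode again, with cache_hit_threshold set to 0 for both) can be sketched in Go as below. This is a simplified, non-streaming illustration only: `sendFunc`, `decodeFirst`, and the other helpers are hypothetical stand-ins for the sidecar's HTTP calls, and the real connector also forwards headers and may adjust other request fields that are omitted here. Requests without cache_hit_threshold are assumed to take the existing path and are not handled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// completionResponse is a hypothetical minimal view of an OpenAI-style
// completion response; only finish_reason is inspected.
type completionResponse struct {
	Choices []struct {
		FinishReason string `json:"finish_reason"`
	} `json:"choices"`
}

// sendFunc stands in for the sidecar's call to a prefill or decode pod:
// it takes the JSON request body and returns the raw response body.
type sendFunc func(body map[string]any) ([]byte, error)

// hitCacheThreshold reports whether any choice finished with "cache_threshold",
// i.e. the decode pod declined because the cache hit threshold was not met.
func hitCacheThreshold(resp []byte) (bool, error) {
	var r completionResponse
	if err := json.Unmarshal(resp, &r); err != nil {
		return false, fmt.Errorf("unmarshal decode response: %w", err)
	}
	for _, c := range r.Choices {
		if c.FinishReason == "cache_threshold" {
			return true, nil
		}
	}
	return false, nil
}

// withZeroThreshold returns a copy of body with cache_hit_threshold set to 0,
// so the receiving pod runs regardless of how much of the prompt is cached.
func withZeroThreshold(body map[string]any) map[string]any {
	out := make(map[string]any, len(body)+1)
	for k, v := range body {
		out[k] = v
	}
	out["cache_hit_threshold"] = 0
	return out
}

// decodeFirst handles a request that carries cache_hit_threshold.
func decodeFirst(body map[string]any, prefill, decode sendFunc) ([]byte, error) {
	// 1. Try the decode pod first with the caller's threshold.
	resp, err := decode(body)
	if err != nil {
		return nil, fmt.Errorf("decode-first attempt: %w", err)
	}
	// 2. If the decoder did not stop on cache_threshold, its answer is final.
	miss, err := hitCacheThreshold(resp)
	if err != nil {
		return nil, err
	}
	if !miss {
		return resp, nil
	}
	// 3-4. Threshold not met: prefill, then decode again, both with threshold 0.
	retry := withZeroThreshold(body)
	if _, err := prefill(retry); err != nil {
		return nil, fmt.Errorf("prefill: %w", err)
	}
	return decode(retry)
}
```

Copying the request before zeroing the threshold keeps the caller's original body intact, which matters if it is logged or reused after the fallback.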

* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)

* Initial step of a configurable PD decider, which decides whether disaggregation is required, using data added by the prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE + fix lint

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update yaml and the test according to the prefix plugin configuration change (blockSize replaced by blockSizeTokens)

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE, update prefix_disagr_decider accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix PD for short inputs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/prefix_disagg_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* updates according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* create a PD decider plugin type with 2 implementations (prefix-based, and always-disaggregate for testing); update the deploy configuration according to the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e tests

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* changes according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* add explanation about pd deciders to disagg_pd doc

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* rename always_disaggr_decider to always_disagg_decider

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* chore: fix wrong port for NIXL (llm-d#593)

- starting with vLLM 0.11.1, the default NIXL port has changed to 5600
- ZMQ stays on port 5557

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)

* fix: resolve JSON serialization error in active-request-scorer debug logs

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Add raw scores to debug

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* Match documentation with default model in scripts (llm-d#615)

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Test: LGTM Workflow Automation (#32)

* feat: use Tinyllama as the "model" for kind test and switch to using precise-prefix-cache-scorer in config (llm-d#581)

* feat: use Tinyllama as the "model" for kind test

- in order to test precise-prefix-cache-scorer we cannot use
  food-review, since the scorer needs to call kv-cache-manager to get the
  tokenizer, which requires a real model from HF
- the change is to switch the "default model" to TinyLlama
- also, to make the tokenizer folder writable, the permissions for the
  USER in the Dockerfile need to change
- rename dp-epp-config.yaml to sim-dp-epp-config.yaml as it is used for
  local test

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: revert some configs to keep using prefix-cache-scorer

- revert file renaming

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update linter configuration (llm-d#588)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* fix: config should use new precise-prefix-cache-scorer (llm-d#576)

- we renamed prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0, so configs
  need to migrate from the old one to the new one with its spec:
  - rename the plugin
  - remove parameters.autoTune, parameters.mode: cache_tracking, and
    lruCapacityPerServer
  - move hashBlockSize and maxPrefixBlocksToMatch under indexerConfig
- configs using food-review keep the old prefix-cache-scorer
- pd-epp-config and sim-pd-epp-config keep prefix-cache-scorer, since
  KV and PD both need to be enabled, which is not done yet

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.1...v1.42.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.42.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated to more recent GIE (llm-d#592)

* Updated to more recent GIE

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updated to latest GIE and changes due to review comments

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added a true mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Exploited mock SchedulerProfile

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* pull kvc v0.5.0 libs (llm-d#595)

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.42.2...v1.43.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* address nil,nil return linter error in test mock (llm-d#598)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* deps(go): bump the go-dependencies group with 2 updates (llm-d#597)

Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.27.5...v2.28.1)

Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.39.0...v1.39.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Models extractor (llm-d#553)

* Models extractor

Signed-off-by: irar2 <irar@il.ibm.com>

* Update register.go

Signed-off-by: Ira Rosen <irar@il.ibm.com>

* Updated for the newer GIE

Signed-off-by: irar2 <irar@il.ibm.com>

* Review comments

Signed-off-by: irar2 <irar@il.ibm.com>

* Check the scheme

Signed-off-by: irar2 <irar@il.ibm.com>

---------

Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>

* feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509)

* feat: implement decode first flow on lmcache connector

- if the cache_hit_threshold field is present in the completion request, perform a decode-first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: error handling

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add back todo comment

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: reduce code complexity and duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve header copying

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add comment explaining the cache_hit_threshold field and the new decode first flow

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: enhance logging for cache hit threshold in decode flow

- decrease verbosity for common log
- add cache_hit_threshold attribute

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve error handling and observability when failing to unmarshal decode response

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add deleted informational comments

Signed-off-by: kyano <kyanokashi2@gmail.com>

* typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: make error logs more descriptive of the failure reason

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: assign 0 cache_hit_threshold before final decode attempt

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: update comment according to feedback

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: remove istio workaround

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: set cache hit threshold to 0 in prefill request for consistent execution

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: update the log

Signed-off-by: kyano <kyanokashi2@gmail.com>

* feat: support online decoding

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: preserve request body in lmcache connector

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: support sse format for streamed decode

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: add and improve log descriptions

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typo

Signed-off-by: kyano <kyanokashi2@gmail.com>

* nit: undo capitalization

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: typos

Signed-off-by: kyano <kyanokashi2@gmail.com>

* chore: improve error log observability

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate http error checking in function and reuse

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: encapsulate and reuse code better

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: lint error

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: improve code encapsulation and reduce duplication

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename and simplify SSE event signaling logic

Signed-off-by: kyano <kyanokashi2@gmail.com>

* refactor: rename lmcache to shared storage protocol

Signed-off-by: kyano <kyanokashi2@gmail.com>

* fix: remove unused function

Signed-off-by: kyano <kyanokashi2@gmail.com>

* test: e2e tests

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* chore: claude gitignore

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: sim deployment

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* feat: make running the linter on new code configurable

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* fix: lint errors

Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

---------

Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>

* Extend support for different ways to decide if disaggregated PD is required (llm-d#531)

* Initial step of a configurable PD decider, which decides whether disaggregation is required, using data added by the prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE + fix lint

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update yaml and the test according to the prefix plugin configuration change (blockSize replaced by blockSizeTokens)

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE, update prefix_disagr_decider accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix PD for short inputs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/prefix_disagg_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* updates according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* create a PD decider plugin type with 2 implementations (prefix-based, and always-disaggregate for testing); update the deploy configuration according to the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e tests

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* changes according to the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* add explanation about pd deciders to disagg_pd doc

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* rename always_disaggr_decider to always_disagg_decider

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>

* chore: fix wrong port for NIXL (llm-d#593)

- starting with vLLM 0.11.1, the default NIXL port has changed to 5600
- ZMQ stays on port 5557

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602)

* fix: resolve JSON serialization error in active-request-scorer debug logs

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Add raw scores to debug

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Lgtm2 (#17)

* Implement "LGTM" ChatOps Workflow.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test

* test: automated LGTM workflow test (#19)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#20)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#21)

This PR tests the /lgtm command workflow automation.

Test suite: all

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: automated LGTM workflow test (#22)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#24)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test (#26)

This PR tests the /lgtm command workflow automation.

Test suite: reset

Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* Address review comments.

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: automated LGTM workflow test

This PR tests the /lgtm command workflow automation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* test

Signed-off-by: Revital Sur <eres@il.ibm.com>

* test: open-pr

Tests that opening a PR triggers the gatekeeper, which blocks the PR while the lgtm label is absent.

Test timestamp: 1771188042

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: kyano <kyanokashi2@gmail.com>
Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Signed-off-by: kyanokashi <kyanokashi2@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: alberto <aperdomo@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

Labels

lgtm "Looks good to me", indicates that a PR is ready to be merged.

Projects

Status: Done

5 participants