Models extractor by irar2 · Pull Request #553 · llm-d/llm-d-inference-scheduler

irar2 · 2026-01-12T10:47:13Z

This PR adds the ability to collect information from /v1/models and store it in the endpoint's attributes.

Closes #466

Signed-off-by: irar2 <irar@il.ibm.com>

nirrozenbaum · 2026-01-12T12:11:16Z

pkg/plugins/datalayer/models/extractor.go

+// ModelInfo defines model's data returned from /v1/models API
+type ModelInfo struct {
+	ID     string `json:"id"`
+	Parent string `json:"parent,omitempty"`


The parent field is not part of the OpenAI standard.
It is specific to vLLM and might not work with other model servers.
I also don't think it is used (or should be used) anywhere.
I recommend removing this field.

OpenAI standard here:
https://platform.openai.com/docs/api-reference/models/list

A few comments:

If the field is not present, omitempty kicks in, so I don't see a downside to having it.

For use cases that need the parent information for base/LoRA relations: if model extraction does not provide it, the base model name has to come from somewhere else, and there is currently no other source of truth...

I think it is fine to rely on vLLM-specific behavior for that.

It can be treated as part of the "contract" (same as the case when other model servers are expected to provide the MSP metrics, even if by a different name).

Configuration of data sources is per EPP, so you can always leave this disabled for other model servers. This is valid usage as long as we use homogeneous model servers in a pool (other code breaks as well when this is not the case...).

elevran · 2026-01-14T13:10:51Z

/hold
This should go in after v0.5.

elevran · 2026-02-03T09:47:16Z

pkg/plugins/datalayer/models/extractor.go

+}
+
+// NewModelExtractor returns a new model extractor.
+func NewModelExtractor() (*ModelExtractor, error) {


nit: at least in theory, the plugin could have a name...

What do you mean?

ModelExtractor is a plugin. A plugin has a type and an optional name.
The code does not support setting a plugin name and it should.

There is the WithName() method now.

Thanks.
I was also thinking NewModelExtractor() should be extended with a name string parameter. If it is empty, the name is set to the plugin type, and WithName() is called internally.
I think that would have been more consistent with other plugins.
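The constructor shape suggested in this thread could look roughly like the sketch below. Everything here is an assumption for illustration: the `ModelExtractorType` constant, the struct fields, and the `WithName` signature are hypothetical and are not taken from the PR.

```go
package main

import "fmt"

// ModelExtractorType is a hypothetical plugin type identifier.
const ModelExtractorType = "model-extractor"

// ModelExtractor is a reduced stand-in for the plugin in the PR.
type ModelExtractor struct {
	typ  string
	name string
}

// WithName sets the plugin name and returns the extractor for chaining.
// The signature is assumed, not copied from the PR.
func (e *ModelExtractor) WithName(name string) *ModelExtractor {
	e.name = name
	return e
}

// NewModelExtractor takes an optional name; an empty name falls back to
// the plugin type, and WithName is called internally either way.
func NewModelExtractor(name string) (*ModelExtractor, error) {
	if name == "" {
		name = ModelExtractorType
	}
	e := &ModelExtractor{typ: ModelExtractorType}
	return e.WithName(name), nil
}

func main() {
	unnamed, _ := NewModelExtractor("")
	named, _ := NewModelExtractor("models-for-pool-a")
	fmt.Println(unnamed.name, named.name)
}
```

With this shape, callers that do not care about naming pass an empty string, and the plugin still ends up with a non-empty name.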

elevran · 2026-02-03T09:50:24Z

pkg/plugins/datalayer/models/factories.go

+		}
+	}
+
+	ds := http.NewHTTPDataSource(cfg.Scheme, cfg.Path, cfg.InsecureSkipVerify, ModelsDataSourceType,


Q: does NewHTTPDataSource validate the scheme?

No, it only checks whether the scheme is https.

Since we use the scheme passed in by the user, we should at least sanitize it to ensure it is one of a known set of acceptable values (e.g., "http" and "https").
This can be done in this PR, or separately by adding scheme validation to HTTPDataSource.

thanks.
Please open a tracking issue to move this check into HTTPDataSource in GAIE. It should not be up to each data source, IMO.
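A minimal sketch of the kind of scheme check discussed here, assuming the accepted set is exactly "http" and "https". The `validateScheme` helper is a hypothetical name; it is not the check that was added in the PR, nor an existing HTTPDataSource function.

```go
package main

import (
	"fmt"
	"strings"
)

// validateScheme is a hypothetical helper that normalizes a user-supplied
// scheme and rejects anything outside the assumed allow-list.
func validateScheme(scheme string) (string, error) {
	s := strings.ToLower(strings.TrimSpace(scheme))
	switch s {
	case "http", "https":
		return s, nil
	default:
		return "", fmt.Errorf("unsupported scheme %q: must be \"http\" or \"https\"", scheme)
	}
}

func main() {
	for _, in := range []string{"https", " HTTP ", "ftp", ""} {
		s, err := validateScheme(in)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", s)
	}
}
```

Placing a helper like this inside the data source constructor, rather than in each factory, would match the request to centralize the check.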

elevran · 2026-02-03T09:53:36Z

/lgtm
/approve
/hold

Overall looks good. A few minor comments are left, so I'm placing a hold. I leave it to your discretion whether to amend, or to cancel the hold and merge as-is.

elevran · 2026-02-04T07:57:17Z

/lgtm
/approve

elevran · 2026-02-04T07:58:00Z

As a follow-up, we need a filter and a scorer to take advantage of the /v1/models information in request scheduling.
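As a rough illustration of what such a filter might do, the sketch below keeps only the endpoints whose extracted model list contains the requested model. The `Endpoint` type, its `Models` field, and `filterByModel` are hypothetical; the real scheduler plugin interfaces and the way attributes are stored on endpoints are not shown in this thread.

```go
package main

import "fmt"

// ModelInfo is a reduced copy of the struct from the PR.
type ModelInfo struct {
	ID     string
	Parent string
}

// Endpoint is a hypothetical stand-in for a scheduler endpoint that
// carries the models collected from /v1/models.
type Endpoint struct {
	Name   string
	Models []ModelInfo
}

// servesModel reports whether the endpoint lists the given model ID.
func servesModel(ep Endpoint, model string) bool {
	for _, m := range ep.Models {
		if m.ID == model {
			return true
		}
	}
	return false
}

// filterByModel returns the endpoints that list the requested model.
func filterByModel(endpoints []Endpoint, model string) []Endpoint {
	kept := make([]Endpoint, 0, len(endpoints))
	for _, ep := range endpoints {
		if servesModel(ep, model) {
			kept = append(kept, ep)
		}
	}
	return kept
}

func main() {
	endpoints := []Endpoint{
		{Name: "pod-a", Models: []ModelInfo{{ID: "base-model"}}},
		{Name: "pod-b", Models: []ModelInfo{{ID: "base-model"}, {ID: "lora-a", Parent: "base-model"}}},
	}
	for _, ep := range filterByModel(endpoints, "lora-a") {
		fmt.Println(ep.Name)
	}
}
```

A scorer could build on the same data, for example by preferring endpoints that already list a LoRA adapter over ones that only list its parent.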

irar2 · 2026-02-05T10:29:53Z

/hold cancel

Models extractor (llm-d#553)

* Models extractor
* Update register.go
* Updated for the newer GIE
* Review comments
* Check the scheme

Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>

* chore: bump gie to v1.2.1 (llm-d#504) Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * deps(go): bump sigs.k8s.io/gateway-api in the kubernetes group (llm-d#508) Bumps the kubernetes group with 1 update: [sigs.k8s.io/gateway-api](https://github.com/kubernetes-sigs/gateway-api). Updates `sigs.k8s.io/gateway-api` from 1.4.0 to 1.4.1 - [Release notes](https://github.com/kubernetes-sigs/gateway-api/releases) - [Changelog](https://github.com/kubernetes-sigs/gateway-api/blob/main/RELEASE.md) - [Commits](kubernetes-sigs/gateway-api@v1.4.0...v1.4.1) --- updated-dependencies: - dependency-name: sigs.k8s.io/gateway-api dependency-version: 1.4.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * deps(go): bump the go-dependencies group with 3 updates (llm-d#507) Bumps the go-dependencies group with 3 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo), [github.com/onsi/gomega](https://github.com/onsi/gomega) and [golang.org/x/sync](https://github.com/golang/sync). 
Updates `github.com/onsi/ginkgo/v2` from 2.27.2 to 2.27.3 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.27.2...v2.27.3) Updates `github.com/onsi/gomega` from 1.38.2 to 1.38.3 - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.38.2...v1.38.3) Updates `golang.org/x/sync` from 0.18.0 to 0.19.0 - [Commits](golang/sync@v0.18.0...v0.19.0) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-version: 2.27.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies - dependency-name: github.com/onsi/gomega dependency-version: 1.38.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies - dependency-name: golang.org/x/sync dependency-version: 0.19.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Miscellaneous dependency updates (llm-d#510) * Miscelaneous dependency updates Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Use latest GIE CRDs Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Fixed references to kv-cache-manager Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * deps(go): bump the kubernetes group with 5 updates (llm-d#513) Bumps the kubernetes group with 5 updates: | Package | From | To | | --- | --- | --- | | [k8s.io/api](https://github.com/kubernetes/api) | `0.34.2` | `0.34.3` | | [k8s.io/apiextensions-apiserver](https://github.com/kubernetes/apiextensions-apiserver) | `0.34.2` | `0.34.3` | | [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) | `0.34.2` | `0.34.3` | | [k8s.io/client-go](https://github.com/kubernetes/client-go) | `0.34.2` | `0.34.3` | | [k8s.io/component-base](https://github.com/kubernetes/component-base) | `0.34.2` | `0.34.3` | Updates `k8s.io/api` from 0.34.2 to 0.34.3 - [Commits](kubernetes/api@v0.34.2...v0.34.3) Updates `k8s.io/apiextensions-apiserver` from 0.34.2 to 0.34.3 - [Release notes](https://github.com/kubernetes/apiextensions-apiserver/releases) - [Commits](kubernetes/apiextensions-apiserver@v0.34.2...v0.34.3) Updates `k8s.io/apimachinery` from 0.34.2 to 0.34.3 - [Commits](kubernetes/apimachinery@v0.34.2...v0.34.3) Updates `k8s.io/client-go` from 0.34.2 to 0.34.3 - [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md) - [Commits](kubernetes/client-go@v0.34.2...v0.34.3) Updates `k8s.io/component-base` from 0.34.2 to 0.34.3 - [Commits](kubernetes/component-base@v0.34.2...v0.34.3) --- updated-dependencies: - dependency-name: k8s.io/api dependency-version: 0.34.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes - 
dependency-name: k8s.io/apiextensions-apiserver dependency-version: 0.34.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes - dependency-name: k8s.io/apimachinery dependency-version: 0.34.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes - dependency-name: k8s.io/client-go dependency-version: 0.34.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes - dependency-name: k8s.io/component-base dependency-version: 0.34.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Fix kind-dev-env.sh (llm-d#512) Running `make env-dev-kind` will fail if the vllm simulator image hasn't been already pulled. This fixes it by skipping the manual load & save of the image unless we're dealing with a custom locally built image (using the dev tag). The kubelet will anyway pull the right image when deploying the pod. Signed-off-by: Antonio Cardace <acardace@redhat.com> * test: add precise_prefix_cache_test (llm-d#505) * test: add precise_prefix_cache_test Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> * test: add precise_prefix_cache_test Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> --------- Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> * test: reuse upstream data store and enable logr in unit tests (llm-d#518) * enable logr in ut Signed-off-by: MregXN <mregxn@gmail.com> * fix package impoert order Signed-off-by: MregXN <mregxn@gmail.com> * apply comments Signed-off-by: MregXN <mregxn@gmail.com> --------- Signed-off-by: MregXN <mregxn@gmail.com> * feat: allow pd_profile_handler to handle diverse plugin types (llm-d#516) * Store the precise prefix cache score in cycleState. 
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * edit test code Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> --------- Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * deps(actions): bump crate-ci/typos from 1.40.0 to 1.40.1 (llm-d#526) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.40.0 to 1.40.1. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.40.0...v1.40.1) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.40.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * deps(go): bump google.golang.org/grpc in the go-dependencies group (llm-d#527) Bumps the go-dependencies group with 1 update: [google.golang.org/grpc](https://github.com/grpc/grpc-go). Updates `google.golang.org/grpc` from 1.77.0 to 1.78.0 - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](grpc/grpc-go@v1.77.0...v1.78.0) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-version: 1.78.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * feat(metrics): add model_name label to PD decision metric (llm-d#528) Signed-off-by: CYJiang <googs1025@gmail.com> * deps(actions): bump crate-ci/typos from 1.40.1 to 1.41.0 (llm-d#532) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.40.1 to 1.41.0. 
- [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.40.1...v1.41.0) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.41.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Configure dependabot ignores Go version updates (llm-d#533) * dependabot ignores Go version updates Signed-off-by: Etai Lev Ran <elevran@gmail.com> * allow semver patch level updates to Go Signed-off-by: Etai Lev Ran <elevran@gmail.com> --------- Signed-off-by: Etai Lev Ran <elevran@gmail.com> * Updates the architecture description with reference to BBR and support for multiple GenAI models and LoRAs to remove confusion about llm-d only supporing one model per cluster (llm-d#525) * finer control over package updates (llm-d#542) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * port auto-assign action from llm-d-kv-cache (llm-d#551) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * refactor: set python version and pin docker image with tag (llm-d#543) - default set to 3.12 for python - set 9.7(the current latest) for ubi image Signed-off-by: Wen Zhou <wenzhou@redhat.com> * chore(test): update API version for nixl test (llm-d#555) - extentionRef was in old v1alpha2, in v1 it should be updated to endpointPickerRef - remove InferenceModel - update docs for test/sidecar Signed-off-by: Wen Zhou <wenzhou@redhat.com> * deps(go): bump the go-dependencies group with 2 updates (llm-d#558) Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega). 
Updates `github.com/onsi/ginkgo/v2` from 2.27.3 to 2.27.4 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.27.3...v2.27.4) Updates `github.com/onsi/gomega` from 1.38.3 to 1.39.0 - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.38.3...v1.39.0) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-version: 2.27.4 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies - dependency-name: github.com/onsi/gomega dependency-version: 1.39.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * deps(actions): bump crate-ci/typos from 1.41.0 to 1.42.0 (llm-d#557) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.41.0 to 1.42.0. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.41.0...v1.42.0) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.42.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * deps(actions): bump actions/checkout from 4 to 6 (llm-d#556) Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6. 
- [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@v4...v6) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * update auto-assign logic (llm-d#560) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * remove newline in unsigned commit message (llm-d#561) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * bump gie to v1.3.0 rc2 (llm-d#562) * update OWNERS (llm-d#559) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * refactor: Makefile, update docs (llm-d#463) * refactor: Makefile, update docs - split Makefile 1. tools: include install tools, check tools, download dependency(gcc etc) and tokenizer. these will be download into "bin" folder than global path 2. cluster: include k8s and ocp 3. 
kind - rename "openshift-base" to "kubernetes-base" to be clear for purpose - uplift Go lint version to 2.1.6 to align with the same one set in Github Action - rename make targets for better visibility, deprcating old ones - add more print in "make env" Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: code review - move image tags from Makefile.tools.mk back to Makefile - update docuement to reflact how image and tag are created - do not export image tag env variables IMG_TAG - fix patch-deployments.yaml after EPP_TAG is not used but should only use EPP_IMAGE - fix kubernetes-dev-env.sh for EPP_IMAGE - remove flag on golangci_lint fmt Signed-off-by: Wen Zhou <wenzhou@redhat.com> * code review: - revert back to 1.3.0 - remove comments - set default as default namespace Signed-off-by: Wen Zhou <wenzhou@redhat.com> * Update Makefile Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Wen Zhou <wenzhou@redhat.com> * docs: fix broken link in the docs Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> * feat: add metrics validation in e2e test (llm-d#529) Signed-off-by: CYJiang <googs1025@gmail.com> * feat: make no-hit-lru P/D-aware (llm-d#522) * feat: make no-hit-lru P/D-aware Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> * hardcode prefill profile Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> * remove spammy log Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> * apply suggestions Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> --------- Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> * Update disaggregated Prefill/Decode inference serving documentation (llm-d#571) * update pd docs Signed-off-by: Maya Barnea <mayab@il.ibm.com> * typos Signed-off-by: Maya Barnea <mayab@il.ibm.com> * typo Signed-off-by: Maya Barnea <mayab@il.ibm.com> --------- Signed-off-by: Maya 
Barnea <mayab@il.ibm.com> * deps(actions): bump crate-ci/typos from 1.42.0 to 1.42.1 (llm-d#572) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.0 to 1.42.1. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.42.0...v1.42.1) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.42.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * deps(go): bump github.com/onsi/ginkgo/v2 in the go-dependencies group (llm-d#573) Bumps the go-dependencies group with 1 update: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo). Updates `github.com/onsi/ginkgo/v2` from 2.27.4 to 2.27.5 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.27.4...v2.27.5) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-version: 2.27.5 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * fix reviewers auto assign minor bug (llm-d#575) * fix(scorer): make active request pd aware (llm-d#569) * fix: decrement all pods on request complete instead of only final pod Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * fix: append all pod endpoints from profile results Signed-off-by: kyanokashi <kyanokashi2@gmail.com> --------- Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * test(e2e): cleanup kind cluster (llm-d#563) - if the e2e-tests cluster already exists, "make test-e2e" fails to run - main cleanup should be done in the AfterSuite() call - in certain cases (kill/terminate) the cluster might remain locally; this PR adds a trap to properly clean it up Signed-off-by: Wen Zhou <wenzhou@redhat.com> * refactor: add early validation in DP profile handler (llm-d#554) - validate that the number of schedulingProfiles in EPP is 1, otherwise return an empty map to reduce computation on filters and scorers. - add unit test Signed-off-by: Wen Zhou <wenzhou@redhat.com> * deps(go): bump the kubernetes group with 2 updates (llm-d#574) Bumps the kubernetes group with 2 updates: [sigs.k8s.io/controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) and [sigs.k8s.io/gateway-api-inference-extension](https://github.com/kubernetes-sigs/gateway-api-inference-extension).
Updates `sigs.k8s.io/controller-runtime` from 0.22.4 to 0.22.5 - [Release notes](https://github.com/kubernetes-sigs/controller-runtime/releases) - [Changelog](https://github.com/kubernetes-sigs/controller-runtime/blob/main/RELEASE.md) - [Commits](kubernetes-sigs/controller-runtime@v0.22.4...v0.22.5) Updates `sigs.k8s.io/gateway-api-inference-extension` from 1.3.0-rc.2 to 1.3.0-rc.3 - [Release notes](https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases) - [Changelog](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/RELEASE.md) - [Commits](kubernetes-sigs/gateway-api-inference-extension@v1.3.0-rc.2...v1.3.0-rc.3) --- updated-dependencies: - dependency-name: sigs.k8s.io/controller-runtime dependency-version: 0.22.5 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes - dependency-name: sigs.k8s.io/gateway-api-inference-extension dependency-version: 1.3.0-rc.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * refactor: kv cache manager repo (llm-d#570) * refactor: kv cache manager repo name Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * go mod tidy Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * fetch kv cache upstream instead of my fork Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * revert dockerfile to fetch kv cache manager from upstream instead of go mod replace Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * update chat preprocessing structs Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * update kv cache manager version Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * refactor kvblock.Key to kvblock.BlockHash Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * add context Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * add parent block key Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * refactor encode Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * validate model name Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * run setup.sh Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * clone vllm into build Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * edit Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * edit lint Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * delete fetch-python-wrapper.sh Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * edit git workflow Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * edit Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * refactor TokenProcessorConfig in config Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * fix kv cache repo name in docker file Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * fix e2e tests Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * add ignore Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * update architecture docs Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> --------- Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: 
HyunKyun Moon <mhg5303@gmail.com> Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Co-authored-by: HyunKyun Moon <mhg5303@gmail.com> Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com> * bumping IGW version to the full released version (llm-d#583) Signed-off-by: Kellen Swain <kfswain@google.com> * Enable prefix-cache awareness in active-active multi-replica scheduler deployments (llm-d#578) * - active-active-ha support Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * Update docs/architecture.md Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maroon Ayoub <Maroonay@gmail.com> * lint Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> --------- Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Signed-off-by: Maroon Ayoub <Maroonay@gmail.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> * Switch to pre-built vLLM wheels for CPU builds (llm-d#582) * try use official vllm wheels in dockerfile.epp Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * wip Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * use wheels in makefile Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * wip Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * write permissions to setup.sh Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * update kv cache manager commit Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * try instal py deps wo sudo Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * CR changes Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> --------- Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * update llm-d-kv-cache import to v0.5.0-RC1 (llm-d#584) * update kvc version import Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * add go.mod to testable changes Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> --------- Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * Use 1.3.0 CRDs (llm-d#586) Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * free disk space on ci-release (llm-d#587) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * feat: use Tinyllama as 
the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581) * feat: use Tinyllama as the "model" for kind test - in order to test precies-prefix-cache-score we cannot use fool-reviewer since it need call kv-cache-manager to get tokenizer by getting a real model from HF - the change is to switch the "default model" to TinyLlama - also to make tokenizer folder writable need change permission to the USER in Dockerfile - rename dp-epp-config.yaml sim-dp-epp-config.yaml as it is used for local test Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: revert back some config to keep using prefix-cache-scorer - revert file renaming Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> * Update linter configuration (llm-d#588) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * fix: config should use new precise-prefix-cache-scorer (llm-d#576) - we have rename prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0, configs need migrate from the old one to the new one with spec. - rename plugin name - remove parameters.autoTune and parameters.mode: cache_tracking and lruCapacityPerServer - move hashBlockSize, maxPrefixBlocksToMatch under indexrConfig - for config using food-review keep old prefix-cache-scorer - keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer as KV and PD need both be enabled which is not done yet Signed-off-by: Wen Zhou <wenzhou@redhat.com> * deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.42.1...v1.42.2) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.42.2 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Updated to more recent GIE (llm-d#592) * Updated to more recent GIE Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updated to latest GIE and changes due to review comments Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added a true mock SchedulerProfile Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Exploited mock SchedulerProfile Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * pull kvc v0.5.0 libs (llm-d#595) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.42.2...v1.43.0) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.43.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * address nil,nil return linter error in test mock (llm-d#598) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * deps(go): bump the go-dependencies group with 2 updates (llm-d#597) Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).
Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.27.5...v2.28.1) Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1 - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.39.0...v1.39.1) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-version: 2.28.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: github.com/onsi/gomega dependency-version: 1.39.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Models extractor (llm-d#553) * Models extractor Signed-off-by: irar2 <irar@il.ibm.com> * Update register.go Signed-off-by: Ira Rosen <irar@il.ibm.com> * Updated for the newer GIE Signed-off-by: irar2 <irar@il.ibm.com> * Review comments Signed-off-by: irar2 <irar@il.ibm.com> * Check the scheme Signed-off-by: irar2 <irar@il.ibm.com> --------- Signed-off-by: irar2 <irar@il.ibm.com> Signed-off-by: Ira Rosen <irar@il.ibm.com> * feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509) * feat: implement decode first flow on lmcache connector - if cache_hit_threshold field is present in completion request, then we perform a decode first flow Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: error handling Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add back todo comment Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: reduce code complexity and duplication Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve 
header copying Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add comment explaning the cache_hit_threshold field and the new decode first flow Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: enhance logging for cache hit threshold in decode flow - decrease verbosity for common log - add cache_hit_threshold attribute Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve error handling and observability when failing to unmarshal decode response Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add deleted informational comments Signed-off-by: kyano <kyanokashi2@gmail.com> * typo Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: make error logs more descriptive of the failure reason Signed-off-by: kyano <kyanokashi2@gmail.com> * feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typo Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: assign 0 cache_hit_threshold before final decode attempt Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: update comment according to feedback Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: remove istio workaround Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: set cache hit threshold to 0 in prefill request for consistent execution Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: update the log Signed-off-by: kyano <kyanokashi2@gmail.com> * feat: support online decoding Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: preserve request body in lmcache connector Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: support sse format for streamed decode Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add and improve log descriptions Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typo Signed-off-by: kyano <kyanokashi2@gmail.com> * nit: undo capitalization Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typos Signed-off-by: kyano <kyanokashi2@gmail.com> * 
chore: improve error log observability Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: encapsulate http error checking in function and reuse Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: encapsulate and reuse code better Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: lint error Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve code encapsulation and reduce duplication Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: rename and simplify SSE event signaling logic Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: rename lmcache to shared storage protocol Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: remove unused function Signed-off-by: kyano <kyanokashi2@gmail.com> * test: e2e tests Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * chore: claude gitignore Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * fix: sim deployment Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * feat: make linter running on new code configurable Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * fix: lint errors Signed-off-by: kyanokashi <kyanokashi2@gmail.com> --------- Signed-off-by: kyano <kyanokashi2@gmail.com> Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * Extend support for different ways to decide if disaggregated PD is required (llm-d#531) * Initial step of a configurable pd decider which is responsible for decision whether disaggregation is required, use data added in prefix scorer plugin in PrepareRequestData Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update version of GIE + fix lint Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update yaml and the test according prefix plugin configuration change (blockSize replaced by blockSizeTokens) Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update docs/architecture.md Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * code review 
Signed-off-by: Maya Barnea <mayab@il.ibm.com> * code review Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update version of GIE, update prefix_disagr_decider accordingly Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix typo Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix PD for short inputs Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update docs/architecture.md Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/always_disaggr_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/always_disaggr_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/prefix_disagg_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * updates according the PR comments Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix test Signed-off-by: Maya Barnea <mayab@il.ibm.com> * create pd decider plugin type with 2 implementations (for prefix based and test always), update deploy configuration according the new structure Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix e2e tests Signed-off-by: Maya Barnea <mayab@il.ibm.com> * changes according the pr comments Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix e2e test Signed-off-by: Maya Barnea <mayab@il.ibm.com> * add explanation about pd deciders to disagg_pd doc Signed-off-by: Maya Barnea <mayab@il.ibm.com> * rename always_disaggr_decider to always_disagg_decider Signed-off-by: Maya Barnea <mayab@il.ibm.com> --------- Signed-off-by: Maya Barnea <mayab@il.ibm.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> * chore: fix wrong port for NIXL (llm-d#593) - start with vLLM 0.11.1, default port for NIXL has been updated to 5600 - leave ZMQ to use 5557 Signed-off-by: Wen Zhou 
<wenzhou@redhat.com> * fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602) * fix: resolve JSON serialization error in active-request-scorer debug logs Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * feat: Add raw scores to debug Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> --------- Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * Implement "LGTM" ChatOps Workflow. Signed-off-by: Revital Sur <eres@il.ibm.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * Lgtm2 (#17) * Implement "LGTM" ChatOps Workflow. Signed-off-by: Revital Sur <eres@il.ibm.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> --------- Signed-off-by: Revital Sur <eres@il.ibm.com> * test * test: automated LGTM workflow test (#19) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#20) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#21) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#22) This PR tests the /lgtm command workflow automation. Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test (#24) This PR tests the /lgtm command workflow automation. 
Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test (#26) This PR tests the /lgtm command workflow automation. Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * Address review comments. Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test This PR tests the /lgtm command workflow automation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Revital Sur <eres@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Antonio Cardace <acardace@redhat.com> Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> Signed-off-by: MregXN <mregxn@gmail.com> Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> Signed-off-by: CYJiang <googs1025@gmail.com> Signed-off-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Signed-off-by: Wen Zhou <wenzhou@redhat.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> Signed-off-by: kyanokashi <kyanokashi2@gmail.com> Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: Kellen Swain <kfswain@google.com> Signed-off-by: Maroon Ayoub <Maroonay@gmail.com> Signed-off-by: irar2 <irar@il.ibm.com> Signed-off-by: Ira Rosen <irar@il.ibm.com> Signed-off-by: kyano <kyanokashi2@gmail.com> Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> 
Co-authored-by: Antonio Cardace <anto.cardace@gmail.com> Co-authored-by: Edoardo Vacchi <evacchi@users.noreply.github.com> Co-authored-by: MregXN <46479059+MregXN@users.noreply.github.com> Co-authored-by: Hyunkyun Moon <mhg5303@gmail.com> Co-authored-by: CYJiang <86391540+googs1025@users.noreply.github.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> Co-authored-by: David Breitgand <davidbreitgand@users.noreply.github.com> Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com> Co-authored-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Maya Barnea <mayab@il.ibm.com> Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Co-authored-by: Sage <80211083+sagearc@users.noreply.github.com> Co-authored-by: Kellen Swain <kfswain@google.com> Co-authored-by: Ira Rosen <irar@il.ibm.com> Co-authored-by: alberto <aperdomo@redhat.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* Match documentation with default model in scripts (llm-d#615) Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Test: LGTM Workflow Automation (#32) * feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581) * feat: use Tinyllama as the "model" for kind test - in order to test precies-prefix-cache-score we cannot use fool-reviewer since it need call kv-cache-manager to get tokenizer by getting a real model from HF -
the change is to switch the "default model" to TinyLlama - also to make tokenizer folder writable need change permission to the USER in Dockerfile - rename dp-epp-config.yaml sim-dp-epp-config.yaml as it is used for local test Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: revert back some config to keep using prefix-cache-scorer - revert file renaming Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> * Update linter configuration (llm-d#588) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * fix: config should use new precise-prefix-cache-scorer (llm-d#576) - we have rename prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0, configs need migrate from the old one to the new one with spec. - rename plugin name - remove parameters.autoTune and parameters.mode: cache_tracking and lruCapacityPerServer - move hashBlockSize, maxPrefixBlocksToMatch under indexrConfig - for config using food-review keep old prefix-cache-scorer - keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer as KV and PD need both be enabled which is not done yet Signed-off-by: Wen Zhou <wenzhou@redhat.com> * deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.42.1...v1.42.2) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.42.2 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Updated to more recent GIE (llm-d#592) * Updated to more recent GIE Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updated to latest GIE and chnages due to review comments Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added a true mock SchedulerProfile Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Exploited mock SchedulerProfile Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * pull kvc v0.5.0 libs (llm-d#595) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.42.2...v1.43.0) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.43.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * address nil,nil return linter error in test mock (llm-d#598) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * deps(go): bump the go-dependencies group with 2 updates (llm-d#597) Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega). 
Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.27.5...v2.28.1) Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1 - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.39.0...v1.39.1) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-version: 2.28.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: github.com/onsi/gomega dependency-version: 1.39.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Models extractor (llm-d#553) * Models extractor Signed-off-by: irar2 <irar@il.ibm.com> * Update register.go Signed-off-by: Ira Rosen <irar@il.ibm.com> * Updated for the newer GIE Signed-off-by: irar2 <irar@il.ibm.com> * Review comments Signed-off-by: irar2 <irar@il.ibm.com> * Check the scheme Signed-off-by: irar2 <irar@il.ibm.com> --------- Signed-off-by: irar2 <irar@il.ibm.com> Signed-off-by: Ira Rosen <irar@il.ibm.com> * feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509) * feat: implement decode first flow on lmcache connector - if cache_hit_threshold field is present in completion request, then we perform a decode first flow Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: error handling Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add back todo comment Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: reduce code complexity and duplication Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve 
header copying Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add comment explaning the cache_hit_threshold field and the new decode first flow Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: enhance logging for cache hit threshold in decode flow - decrease verbosity for common log - add cache_hit_threshold attribute Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve error handling and observability when failing to unmarshal decode response Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add deleted informational comments Signed-off-by: kyano <kyanokashi2@gmail.com> * typo Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: make error logs more descriptive of the failure reason Signed-off-by: kyano <kyanokashi2@gmail.com> * feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typo Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: assign 0 cache_hit_threshold before final decode attempt Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: update comment according to feedback Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: remove istio workaround Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: set cache hit threshold to 0 in prefill request for consistent execution Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: update the log Signed-off-by: kyano <kyanokashi2@gmail.com> * feat: support online decoding Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: preserve request body in lmcache connector Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: support sse format for streamed decode Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add and improve log descriptions Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typo Signed-off-by: kyano <kyanokashi2@gmail.com> * nit: undo capitalization Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typos Signed-off-by: kyano <kyanokashi2@gmail.com> * 
chore: improve error log observability Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: encapsulate http error checking in function and reuse Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: encapsulate and reuse code better Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: lint error Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve code encapsulation and reduce duplication Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: rename and simplify SSE event signaling logic Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: rename lmcache to shared storage protocol Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: remove unused function Signed-off-by: kyano <kyanokashi2@gmail.com> * test: e2e tests Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * chore: claude gitignore Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * fix: sim deployment Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * feat: make linter running on new code configurable Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * fix: lint errors Signed-off-by: kyanokashi <kyanokashi2@gmail.com> --------- Signed-off-by: kyano <kyanokashi2@gmail.com> Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * Extend support for different ways to decide if disaggregated PD is required (llm-d#531) * Initial step of a configurable pd decider which is responsible for decision whether disaggregation is required, use data added in prefix scorer plugin in PrepareRequestData Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update version of GIE + fix lint Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update yaml and the test according prefix plugin configuration change (blockSize replaced by blockSizeTokens) Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update docs/architecture.md Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * code review 
Signed-off-by: Maya Barnea <mayab@il.ibm.com> * code review Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update version of GIE, update prefix_disagr_decider accordingly Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix typo Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix PD for short inputs Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update docs/architecture.md Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/always_disaggr_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/always_disaggr_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/prefix_disagg_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * updates according the PR comments Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix test Signed-off-by: Maya Barnea <mayab@il.ibm.com> * create pd decider plugin type with 2 implementations (for prefix based and test always), update deploy configuration according the new structure Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix e2e tests Signed-off-by: Maya Barnea <mayab@il.ibm.com> * changes according the pr comments Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix e2e test Signed-off-by: Maya Barnea <mayab@il.ibm.com> * add explanation about pd deciders to disagg_pd doc Signed-off-by: Maya Barnea <mayab@il.ibm.com> * rename always_disaggr_decider to always_disagg_decider Signed-off-by: Maya Barnea <mayab@il.ibm.com> --------- Signed-off-by: Maya Barnea <mayab@il.ibm.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> * chore: fix wrong port for NIXL (llm-d#593) - start with vLLM 0.11.1, default port for NIXL has been updated to 5600 - leave ZMQ to use 5557 Signed-off-by: Wen Zhou 
<wenzhou@redhat.com> * fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602) * fix: resolve JSON serialization error in active-request-scorer debug logs Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * feat: Add raw scores to debug Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> --------- Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * Implement "LGTM" ChatOps Workflow. Signed-off-by: Revital Sur <eres@il.ibm.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * Lgtm2 (#17) * Implement "LGTM" ChatOps Workflow. Signed-off-by: Revital Sur <eres@il.ibm.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> --------- Signed-off-by: Revital Sur <eres@il.ibm.com> * test * test: automated LGTM workflow test (#19) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#20) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#21) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#22) This PR tests the /lgtm command workflow automation. Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test (#24) This PR tests the /lgtm command workflow automation. 
Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test (#26) This PR tests the /lgtm command workflow automation. Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * Address review comments. Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test This PR tests the /lgtm command workflow automation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Revital Sur <eres@il.ibm.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> Signed-off-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Signed-off-by: irar2 <irar@il.ibm.com> Signed-off-by: Ira Rosen <irar@il.ibm.com> Signed-off-by: kyano <kyanokashi2@gmail.com> Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Signed-off-by: kyanokashi <kyanokashi2@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com> Co-authored-by: Ira Rosen <irar@il.ibm.com> Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Co-authored-by: Maya Barnea <mayab@il.ibm.com> Co-authored-by: alberto <aperdomo@redhat.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * test: open-pr Tests that opening a 
PR triggers gatekeeper which blocks without lgtm label. Test timestamp: 1771188042 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Wen Zhou <wenzhou@redhat.com> Signed-off-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: irar2 <irar@il.ibm.com> Signed-off-by: Ira Rosen <irar@il.ibm.com> Signed-off-by: kyano <kyanokashi2@gmail.com> Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Signed-off-by: kyanokashi <kyanokashi2@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Co-authored-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Ira Rosen <irar@il.ibm.com> Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Co-authored-by: Maya Barnea <mayab@il.ibm.com> Co-authored-by: alberto <aperdomo@redhat.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

Models extractor

4bb0dfc

Signed-off-by: irar2 <irar@il.ibm.com>

github-project-automation bot added this to llm-d-inference-scheduler Jan 12, 2026

vMaroon requested review from elevran, kfirtoledo, kfswain, nilig, nirrozenbaum and shmuelk January 12, 2026 10:47

nirrozenbaum reviewed Jan 12, 2026

Merge branch 'main' into models

8b6e87e

vMaroon requested a review from nirrozenbaum January 13, 2026 10:19

Merge branch 'main' into models

4f37fa5

Signed-off-by: irar2 <irar@il.ibm.com>

github-actions bot added the hold label (PRs that are blocked on design, other features, release cycle, etc.) Jan 14, 2026

Merge branch 'main' into models

8e970f1

elevran added this to the v0.6 milestone Jan 22, 2026

elevran moved this to In review in llm-d-inference-scheduler Jan 22, 2026

elevran removed the hold label (PRs that are blocked on design, other features, release cycle, etc.) Jan 26, 2026

irar2 added 5 commits January 29, 2026 08:02

Merge branch 'main' into models

e6e9852

Signed-off-by: Ira Rosen <irar@il.ibm.com>

Merge branch 'main' into models

4628de9

Signed-off-by: Ira Rosen <irar@il.ibm.com>

Update register.go

55996a2

Signed-off-by: Ira Rosen <irar@il.ibm.com>

Updated for the newer GIE

f3f582a

Signed-off-by: irar2 <irar@il.ibm.com>

Merge branch 'main' into models

c0aac23

elevran reviewed Feb 3, 2026

github-actions bot added the hold (PRs that are blocked on design, other features, release cycle, etc.) and lgtm ("Looks good to me", indicates that a PR is ready to be merged) labels Feb 3, 2026

github-actions bot previously approved these changes Feb 3, 2026

irar2 added 2 commits February 3, 2026 12:39

Review comments

8eaf3ce

Signed-off-by: irar2 <irar@il.ibm.com>

Merge branch 'models' of github.com:irar2/irar2-llm-d-inference-scheduler into models

7d16b18

irar2 dismissed github-actions[bot]’s stale review via 7d16b18 February 3, 2026 10:48

irar2 added 2 commits February 3, 2026 12:48

Merge branch 'main' into models

9c510f8

Check the scheme

5454388

Signed-off-by: irar2 <irar@il.ibm.com>

elevran self-requested a review February 4, 2026 07:52

github-actions bot approved these changes Feb 4, 2026

github-actions bot removed the hold label (PRs that are blocked on design, other features, release cycle, etc.) Feb 5, 2026

github-actions bot merged commit cf638f5 into llm-d:main Feb 5, 2026
8 checks passed

github-project-automation bot moved this from In review to Done in llm-d-inference-scheduler Feb 5, 2026

irar2 deleted the models branch February 5, 2026 10:46

Conversation

irar2 commented Jan 12, 2026

elevran commented Jan 14, 2026

elevran commented Feb 3, 2026

elevran commented Feb 4, 2026

elevran commented Feb 4, 2026

irar2 commented Feb 5, 2026

3 participants