Signed-off-by: irar2 <irar@il.ibm.com>
// ModelInfo defines model's data returned from /v1/models API
type ModelInfo struct {
	ID string `json:"id"`
	Parent string `json:"parent,omitempty"`
There was a problem hiding this comment.
The `parent` field is not part of the OpenAI API specification.
It's specific to vLLM and might not work with other model servers.
I also don't think it's used (or should be used) anywhere.
I recommend removing this field.
OpenAI standard here:
https://platform.openai.com/docs/api-reference/models/list
There was a problem hiding this comment.
A few comments
- If not present, `omitempty` kicks in, so I don't see the downside of having it.
- For use cases that need the parent information for Base/LoRA relations, if it is not provided by model extraction then one must assume the base model name is provided elsewhere. There is currently no other source of truth... I think it is fine to rely on vLLM-specific behavior for that.
- It can be treated as part of the "contract" (same as the case when other model servers are expected to provide the MSP metrics even if by a different name).
- Configuration of data sources is per EPP, so you can always leave this disabled for other model servers. This is valid usage as long as we use homogeneous model servers in a pool (other code breaks as well when this is not the case...).
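To make the trade-off above concrete, here is a minimal sketch of how the optional field behaves. `ModelInfo` is copied from the diff; the `modelList` wrapper, `parseModels`, and `baseModel` are hypothetical names introduced only for illustration and are not taken from the PR. When a server omits `parent`, the field simply decodes to the empty string, and `omitempty` drops it again on re-encoding.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModelInfo mirrors the struct under review: ID is part of the OpenAI
// schema, Parent is the vLLM-specific extension discussed above.
type ModelInfo struct {
	ID     string `json:"id"`
	Parent string `json:"parent,omitempty"`
}

// modelList is a hypothetical wrapper for the list response body
// ({"object":"list","data":[...]}).
type modelList struct {
	Data []ModelInfo `json:"data"`
}

// parseModels decodes a /v1/models response body. Unknown fields are ignored.
func parseModels(body []byte) ([]ModelInfo, error) {
	var list modelList
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, fmt.Errorf("decoding models response: %w", err)
	}
	return list.Data, nil
}

// baseModel returns the parent when the server reports one (for example,
// for a LoRA adapter) and falls back to the model's own ID otherwise.
func baseModel(m ModelInfo) string {
	if m.Parent != "" {
		return m.Parent
	}
	return m.ID
}

func main() {
	body := []byte(`{"object":"list","data":[
		{"id":"base-model","object":"model"},
		{"id":"my-lora","object":"model","parent":"base-model"}]}`)
	models, err := parseModels(body)
	if err != nil {
		panic(err)
	}
	for _, m := range models {
		out, _ := json.Marshal(m) // Parent is omitted when empty
		fmt.Printf("%s -> base %s\n", out, baseModel(m))
	}
}
```

A server that never sends `parent` still parses cleanly under this sketch; consumers just see every model as its own base.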
Signed-off-by: irar2 <irar@il.ibm.com>
/hold
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Signed-off-by: irar2 <irar@il.ibm.com>
}

// NewModelExtractor returns a new model extractor.
func NewModelExtractor() (*ModelExtractor, error) {
There was a problem hiding this comment.
nit: at least in theory, the plugin could have a name...
There was a problem hiding this comment.
ModelExtractor is a plugin. A plugin has a type and an optional name.
The code does not support setting a plugin name and it should.
There was a problem hiding this comment.
There is the WithName() method now
There was a problem hiding this comment.
Thanks.
I was also thinking NewModelExtractor() should be extended with a name string parameter. If the name is empty, it is set to the plugin type, and WithName() is called internally.
I think that would have been more consistent with other plugins.
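The suggestion above could look roughly like the following sketch. Everything here other than the names `NewModelExtractor`, `ModelExtractor`, and `WithName` is an assumption: the `ModelExtractorType` constant, its value, the `name` field, and the `Name()` accessor are hypothetical stand-ins, and the real plugin carries more state than shown.

```go
package main

import "fmt"

// ModelExtractorType is a hypothetical plugin type identifier.
const ModelExtractorType = "model-extractor"

// ModelExtractor is a stand-in for the plugin under review; only the
// naming logic is sketched here.
type ModelExtractor struct {
	name string
}

// NewModelExtractor returns a new model extractor. An empty name falls
// back to the plugin type. The error return is kept to match the
// signature shown in the diff; this sketch never returns an error.
func NewModelExtractor(name string) (*ModelExtractor, error) {
	if name == "" {
		name = ModelExtractorType
	}
	return (&ModelExtractor{}).WithName(name), nil
}

// WithName sets the plugin name and returns the receiver for chaining.
func (e *ModelExtractor) WithName(name string) *ModelExtractor {
	e.name = name
	return e
}

// Name reports the configured plugin name.
func (e *ModelExtractor) Name() string { return e.name }

func main() {
	def, _ := NewModelExtractor("")
	named, _ := NewModelExtractor("lora-models")
	fmt.Println(def.Name(), named.Name())
}
```

Defaulting inside the constructor means callers never observe an unnamed plugin, which is the consistency argument made in the comment.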
}
}

ds := http.NewHTTPDataSource(cfg.Scheme, cfg.Path, cfg.InsecureSkipVerify, ModelsDataSourceType,
There was a problem hiding this comment.
Q: Does NewHTTPDataSource validate the scheme?
There was a problem hiding this comment.
No, it only checks whether the scheme is https.
There was a problem hiding this comment.
Since we use the scheme passed in by the user, we should at least sanitize it to ensure it's one of a known set of acceptable values (e.g., "http" and "https").
This can be done in this PR or in a separate one that adds scheme validation to the HTTPDataSource.
There was a problem hiding this comment.
thanks.
Please open a tracking issue to move this check into HTTPDataSource in GAIE. It should not be up to each data source, IMO.
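A scheme check of the kind requested in this thread might be sketched as follows. The function name `validateScheme` is hypothetical and is not taken from the PR or from GAIE; lower-casing and trimming the input before comparison is an illustrative design choice, not something the reviewers asked for.

```go
package main

import (
	"fmt"
	"strings"
)

// validateScheme normalizes a user-supplied scheme and rejects anything
// outside the allowed set ("http", "https").
func validateScheme(scheme string) (string, error) {
	s := strings.ToLower(strings.TrimSpace(scheme))
	switch s {
	case "http", "https":
		return s, nil
	default:
		return "", fmt.Errorf("unsupported scheme %q: must be \"http\" or \"https\"", scheme)
	}
}

func main() {
	for _, s := range []string{"https", "HTTP", "ftp", ""} {
		normalized, err := validateScheme(s)
		fmt.Printf("%q -> %q, err=%v\n", s, normalized, err)
	}
}
```

Placing a check like this in the shared HTTP data source, as the last comment proposes, would spare each data source from repeating it.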
/lgtm
Overall looks good. Minor comments left, so placing a hold. Leaving it to your discretion whether to amend or to cancel the hold and allow merging as-is.
Signed-off-by: irar2 <irar@il.ibm.com>
Signed-off-by: irar2 <irar@il.ibm.com>
/lgtm
As a follow up, we need a filter and a scorer to take advantage of the
/hold cancel
* feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581) * feat: use Tinyllama as the "model" for kind test - in order to test precies-prefix-cache-score we cannot use fool-reviewer since it need call kv-cache-manager to get tokenizer by getting a real model from HF - the change is to switch the "default model" to TinyLlama - also to make tokenizer folder writable need change permission to the USER in Dockerfile - rename dp-epp-config.yaml sim-dp-epp-config.yaml as it is used for local test Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: revert back some config to keep using prefix-cache-scorer - revert file renaming Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> * Update linter configuration (llm-d#588) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * fix: config should use new precise-prefix-cache-scorer (llm-d#576) - we have rename prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0, configs need migrate from the old one to the new one with spec. - rename plugin name - remove parameters.autoTune and parameters.mode: cache_tracking and lruCapacityPerServer - move hashBlockSize, maxPrefixBlocksToMatch under indexrConfig - for config using food-review keep old prefix-cache-scorer - keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer as KV and PD need both be enabled which is not done yet Signed-off-by: Wen Zhou <wenzhou@redhat.com> * deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.42.1...v1.42.2) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.42.2 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Updated to more recent GIE (llm-d#592) * Updated to more recent GIE Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updated to latest GIE and chnages due to review comments Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added a true mock SchedulerProfile Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Exploited mock SchedulerProfile Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * pull kvc v0.5.0 libs (llm-d#595) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.42.2...v1.43.0) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.43.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * address nil,nil return linter error in test mock (llm-d#598) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * deps(go): bump the go-dependencies group with 2 updates (llm-d#597) Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega). 
Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.27.5...v2.28.1) Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1 - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.39.0...v1.39.1) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-version: 2.28.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: github.com/onsi/gomega dependency-version: 1.39.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Models extractor (llm-d#553) * Models extractor Signed-off-by: irar2 <irar@il.ibm.com> * Update register.go Signed-off-by: Ira Rosen <irar@il.ibm.com> * Updated for the newer GIE Signed-off-by: irar2 <irar@il.ibm.com> * Review comments Signed-off-by: irar2 <irar@il.ibm.com> * Check the scheme Signed-off-by: irar2 <irar@il.ibm.com> --------- Signed-off-by: irar2 <irar@il.ibm.com> Signed-off-by: Ira Rosen <irar@il.ibm.com> * feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509) * feat: implement decode first flow on lmcache connector - if cache_hit_threshold field is present in completion request, then we perform a decode first flow Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: error handling Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add back todo comment Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: reduce code complexity and duplication Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve 
header copying Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add comment explaning the cache_hit_threshold field and the new decode first flow Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: enhance logging for cache hit threshold in decode flow - decrease verbosity for common log - add cache_hit_threshold attribute Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve error handling and observability when failing to unmarshal decode response Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add deleted informational comments Signed-off-by: kyano <kyanokashi2@gmail.com> * typo Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: make error logs more descriptive of the failure reason Signed-off-by: kyano <kyanokashi2@gmail.com> * feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typo Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: assign 0 cache_hit_threshold before final decode attempt Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: update comment according to feedback Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: remove istio workaround Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: set cache hit threshold to 0 in prefill request for consistent execution Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: update the log Signed-off-by: kyano <kyanokashi2@gmail.com> * feat: support online decoding Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: preserve request body in lmcache connector Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: support sse format for streamed decode Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add and improve log descriptions Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typo Signed-off-by: kyano <kyanokashi2@gmail.com> * nit: undo capitalization Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typos Signed-off-by: kyano <kyanokashi2@gmail.com> * 
chore: improve error log observability Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: encapsulate http error checking in function and reuse Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: encapsulate and reuse code better Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: lint error Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve code encapsulation and reduce duplication Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: rename and simplify SSE event signaling logic Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: rename lmcache to shared storage protocol Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: remove unused function Signed-off-by: kyano <kyanokashi2@gmail.com> * test: e2e tests Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * chore: claude gitignore Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * fix: sim deployment Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * feat: make linter running on new code configurable Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * fix: lint errors Signed-off-by: kyanokashi <kyanokashi2@gmail.com> --------- Signed-off-by: kyano <kyanokashi2@gmail.com> Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * Extend support for different ways to decide if disaggregated PD is required (llm-d#531) * Initial step of a configurable pd decider which is responsible for decision whether disaggregation is required, use data added in prefix scorer plugin in PrepareRequestData Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update version of GIE + fix lint Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update yaml and the test according prefix plugin configuration change (blockSize replaced by blockSizeTokens) Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update docs/architecture.md Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * code review 
Signed-off-by: Maya Barnea <mayab@il.ibm.com> * code review Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update version of GIE, update prefix_disagr_decider accordingly Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix typo Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix PD for short inputs Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update docs/architecture.md Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/always_disaggr_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/always_disaggr_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/prefix_disagg_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * updates according the PR comments Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix test Signed-off-by: Maya Barnea <mayab@il.ibm.com> * create pd decider plugin type with 2 implementations (for prefix based and test always), update deploy configuration according the new structure Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix e2e tests Signed-off-by: Maya Barnea <mayab@il.ibm.com> * changes according the pr comments Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix e2e test Signed-off-by: Maya Barnea <mayab@il.ibm.com> * add explanation about pd deciders to disagg_pd doc Signed-off-by: Maya Barnea <mayab@il.ibm.com> * rename always_disaggr_decider to always_disagg_decider Signed-off-by: Maya Barnea <mayab@il.ibm.com> --------- Signed-off-by: Maya Barnea <mayab@il.ibm.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> * chore: fix wrong port for NIXL (llm-d#593) - start with vLLM 0.11.1, default port for NIXL has been updated to 5600 - leave ZMQ to use 5557 Signed-off-by: Wen Zhou 
<wenzhou@redhat.com> * fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602) * fix: resolve JSON serialization error in active-request-scorer debug logs Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * feat: Add raw scores to debug Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> --------- Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * Implement "LGTM" ChatOps Workflow. Signed-off-by: Revital Sur <eres@il.ibm.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * Lgtm2 (#17) * Implement "LGTM" ChatOps Workflow. Signed-off-by: Revital Sur <eres@il.ibm.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> --------- Signed-off-by: Revital Sur <eres@il.ibm.com> * test * test: automated LGTM workflow test (#19) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#20) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#21) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#22) This PR tests the /lgtm command workflow automation. Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test (#24) This PR tests the /lgtm command workflow automation. 
Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test (#26) This PR tests the /lgtm command workflow automation. Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * Address review comments. Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test This PR tests the /lgtm command workflow automation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Revital Sur <eres@il.ibm.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> Signed-off-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Signed-off-by: irar2 <irar@il.ibm.com> Signed-off-by: Ira Rosen <irar@il.ibm.com> Signed-off-by: kyano <kyanokashi2@gmail.com> Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Signed-off-by: kyanokashi <kyanokashi2@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com> Co-authored-by: Ira Rosen <irar@il.ibm.com> Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Co-authored-by: Maya Barnea <mayab@il.ibm.com> Co-authored-by: alberto <aperdomo@redhat.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* chore: bump gie to v1.2.1 (llm-d#504) Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * deps(go): bump sigs.k8s.io/gateway-api in the kubernetes group (llm-d#508) Bumps the kubernetes group with 1 update: [sigs.k8s.io/gateway-api](https://github.com/kubernetes-sigs/gateway-api). Updates `sigs.k8s.io/gateway-api` from 1.4.0 to 1.4.1 - [Release notes](https://github.com/kubernetes-sigs/gateway-api/releases) - [Changelog](https://github.com/kubernetes-sigs/gateway-api/blob/main/RELEASE.md) - [Commits](kubernetes-sigs/gateway-api@v1.4.0...v1.4.1) --- updated-dependencies: - dependency-name: sigs.k8s.io/gateway-api dependency-version: 1.4.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * deps(go): bump the go-dependencies group with 3 updates (llm-d#507) Bumps the go-dependencies group with 3 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo), [github.com/onsi/gomega](https://github.com/onsi/gomega) and [golang.org/x/sync](https://github.com/golang/sync). 
Updates `github.com/onsi/ginkgo/v2` from 2.27.2 to 2.27.3 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.27.2...v2.27.3) Updates `github.com/onsi/gomega` from 1.38.2 to 1.38.3 - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.38.2...v1.38.3) Updates `golang.org/x/sync` from 0.18.0 to 0.19.0 - [Commits](golang/sync@v0.18.0...v0.19.0) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-version: 2.27.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies - dependency-name: github.com/onsi/gomega dependency-version: 1.38.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies - dependency-name: golang.org/x/sync dependency-version: 0.19.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Miscellaneous dependency updates (llm-d#510) * Miscelaneous dependency updates Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Use latest GIE CRDs Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Fixed references to kv-cache-manager Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * deps(go): bump the kubernetes group with 5 updates (llm-d#513) Bumps the kubernetes group with 5 updates: | Package | From | To | | --- | --- | --- | | [k8s.io/api](https://github.com/kubernetes/api) | `0.34.2` | `0.34.3` | | [k8s.io/apiextensions-apiserver](https://github.com/kubernetes/apiextensions-apiserver) | `0.34.2` | `0.34.3` | | [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) | `0.34.2` | `0.34.3` | | [k8s.io/client-go](https://github.com/kubernetes/client-go) | `0.34.2` | `0.34.3` | | [k8s.io/component-base](https://github.com/kubernetes/component-base) | `0.34.2` | `0.34.3` | Updates `k8s.io/api` from 0.34.2 to 0.34.3 - [Commits](kubernetes/api@v0.34.2...v0.34.3) Updates `k8s.io/apiextensions-apiserver` from 0.34.2 to 0.34.3 - [Release notes](https://github.com/kubernetes/apiextensions-apiserver/releases) - [Commits](kubernetes/apiextensions-apiserver@v0.34.2...v0.34.3) Updates `k8s.io/apimachinery` from 0.34.2 to 0.34.3 - [Commits](kubernetes/apimachinery@v0.34.2...v0.34.3) Updates `k8s.io/client-go` from 0.34.2 to 0.34.3 - [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md) - [Commits](kubernetes/client-go@v0.34.2...v0.34.3) Updates `k8s.io/component-base` from 0.34.2 to 0.34.3 - [Commits](kubernetes/component-base@v0.34.2...v0.34.3) --- updated-dependencies: - dependency-name: k8s.io/api dependency-version: 0.34.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes - 
dependency-name: k8s.io/apiextensions-apiserver dependency-version: 0.34.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes - dependency-name: k8s.io/apimachinery dependency-version: 0.34.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes - dependency-name: k8s.io/client-go dependency-version: 0.34.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes - dependency-name: k8s.io/component-base dependency-version: 0.34.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Fix kind-dev-env.sh (llm-d#512) Running `make env-dev-kind` will fail if the vllm simulator image hasn't been already pulled. This fixes it by skipping the manual load & save of the image unless we're dealing with a custom locally built image (using the dev tag). The kubelet will anyway pull the right image when deploying the pod. Signed-off-by: Antonio Cardace <acardace@redhat.com> * test: add precise_prefix_cache_test (llm-d#505) * test: add precise_prefix_cache_test Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> * test: add precise_prefix_cache_test Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> --------- Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> * test: reuse upstream data store and enable logr in unit tests (llm-d#518) * enable logr in ut Signed-off-by: MregXN <mregxn@gmail.com> * fix package impoert order Signed-off-by: MregXN <mregxn@gmail.com> * apply comments Signed-off-by: MregXN <mregxn@gmail.com> --------- Signed-off-by: MregXN <mregxn@gmail.com> * feat: allow pd_profile_handler to handle diverse plugin types (llm-d#516) * Store the precise prefix cache score in cycleState. 
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * edit test code Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> --------- Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * deps(actions): bump crate-ci/typos from 1.40.0 to 1.40.1 (llm-d#526) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.40.0 to 1.40.1. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.40.0...v1.40.1) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.40.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * deps(go): bump google.golang.org/grpc in the go-dependencies group (llm-d#527) Bumps the go-dependencies group with 1 update: [google.golang.org/grpc](https://github.com/grpc/grpc-go). Updates `google.golang.org/grpc` from 1.77.0 to 1.78.0 - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](grpc/grpc-go@v1.77.0...v1.78.0) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-version: 1.78.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * feat(metrics): add model_name label to PD decision metric (llm-d#528) Signed-off-by: CYJiang <googs1025@gmail.com> * deps(actions): bump crate-ci/typos from 1.40.1 to 1.41.0 (llm-d#532) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.40.1 to 1.41.0. 
- [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.40.1...v1.41.0) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.41.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Configure dependabot ignores Go version updates (llm-d#533) * dependabot ignores Go version updates Signed-off-by: Etai Lev Ran <elevran@gmail.com> * allow semver patch level updates to Go Signed-off-by: Etai Lev Ran <elevran@gmail.com> --------- Signed-off-by: Etai Lev Ran <elevran@gmail.com> * Updates the architecture description with reference to BBR and support for multiple GenAI models and LoRAs to remove confusion about llm-d only supporing one model per cluster (llm-d#525) * finer control over package updates (llm-d#542) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * port auto-assign action from llm-d-kv-cache (llm-d#551) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * refactor: set python version and pin docker image with tag (llm-d#543) - default set to 3.12 for python - set 9.7(the current latest) for ubi image Signed-off-by: Wen Zhou <wenzhou@redhat.com> * chore(test): update API version for nixl test (llm-d#555) - extentionRef was in old v1alpha2, in v1 it should be updated to endpointPickerRef - remove InferenceModel - update docs for test/sidecar Signed-off-by: Wen Zhou <wenzhou@redhat.com> * deps(go): bump the go-dependencies group with 2 updates (llm-d#558) Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega). 
Updates `github.com/onsi/ginkgo/v2` from 2.27.3 to 2.27.4 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.27.3...v2.27.4) Updates `github.com/onsi/gomega` from 1.38.3 to 1.39.0 - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.38.3...v1.39.0) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-version: 2.27.4 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies - dependency-name: github.com/onsi/gomega dependency-version: 1.39.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * deps(actions): bump crate-ci/typos from 1.41.0 to 1.42.0 (llm-d#557) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.41.0 to 1.42.0. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.41.0...v1.42.0) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.42.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * deps(actions): bump actions/checkout from 4 to 6 (llm-d#556) Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6. 
- [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@v4...v6) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * update auto-assign logic (llm-d#560) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * remove newline in unsigned commit message (llm-d#561) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * bump gie to v1.3.0 rc2 (llm-d#562) * update OWNERS (llm-d#559) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * refactor: Makefile, update docs (llm-d#463) * refactor: Makefile, update docs - split Makefile 1. tools: include install tools, check tools, download dependency(gcc etc) and tokenizer. these will be download into "bin" folder than global path 2. cluster: include k8s and ocp 3. 
kind - rename "openshift-base" to "kubernetes-base" to be clear for purpose - uplift Go lint version to 2.1.6 to align with the same one set in Github Action - rename make targets for better visibility, deprcating old ones - add more print in "make env" Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: code review - move image tags from Makefile.tools.mk back to Makefile - update docuement to reflact how image and tag are created - do not export image tag env variables IMG_TAG - fix patch-deployments.yaml after EPP_TAG is not used but should only use EPP_IMAGE - fix kubernetes-dev-env.sh for EPP_IMAGE - remove flag on golangci_lint fmt Signed-off-by: Wen Zhou <wenzhou@redhat.com> * code review: - revert back to 1.3.0 - remove comments - set default as default namespace Signed-off-by: Wen Zhou <wenzhou@redhat.com> * Update Makefile Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Wen Zhou <wenzhou@redhat.com> * docs: fix broken link in the docs Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> * feat: add metrics validation in e2e test (llm-d#529) Signed-off-by: CYJiang <googs1025@gmail.com> * feat: make no-hit-lru P/D-aware (llm-d#522) * feat: make no-hit-lru P/D-aware Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> * hardcode prefill profile Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> * remove spammy log Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> * apply suggestions Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> --------- Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> * Update disaggregated Prefill/Decode inference serving documentation (llm-d#571) * update pd docs Signed-off-by: Maya Barnea <mayab@il.ibm.com> * typos Signed-off-by: Maya Barnea <mayab@il.ibm.com> * typo Signed-off-by: Maya Barnea <mayab@il.ibm.com> --------- Signed-off-by: Maya 
Barnea <mayab@il.ibm.com> * deps(actions): bump crate-ci/typos from 1.42.0 to 1.42.1 (llm-d#572) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.0 to 1.42.1. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.42.0...v1.42.1) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.42.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * deps(go): bump github.com/onsi/ginkgo/v2 in the go-dependencies group (llm-d#573) Bumps the go-dependencies group with 1 update: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo). Updates `github.com/onsi/ginkgo/v2` from 2.27.4 to 2.27.5 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.27.4...v2.27.5) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-version: 2.27.5 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * fix reviewers auto assign minor bug (llm-d#575) * fix(scorer): make active request pd aware (llm-d#569) * fix: decrement all pods on request complete instead of only final pod Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * fix: append all pod endpoints from profile results Signed-off-by: kyanokashi <kyanokashi2@gmail.com> --------- Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * test(e2e): cleanup kind cluster (llm-d#563) - if the e2e-tests cluster already exists, "make test-e2e" fails to run - main cleanup should be done in the AfterSuite() call - in certain cases (kill/terminate) the cluster might remain locally; this PR adds a trap to properly clean it up Signed-off-by: Wen Zhou <wenzhou@redhat.com> * refactor: add early validation in DP profile handler (llm-d#554) - validate that the number of schedulingProfiles in EPP is 1, otherwise return an empty map to reduce computation on filters and scorers. - add unit test Signed-off-by: Wen Zhou <wenzhou@redhat.com> * deps(go): bump the kubernetes group with 2 updates (llm-d#574) Bumps the kubernetes group with 2 updates: [sigs.k8s.io/controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) and [sigs.k8s.io/gateway-api-inference-extension](https://github.com/kubernetes-sigs/gateway-api-inference-extension). 
Updates `sigs.k8s.io/controller-runtime` from 0.22.4 to 0.22.5 - [Release notes](https://github.com/kubernetes-sigs/controller-runtime/releases) - [Changelog](https://github.com/kubernetes-sigs/controller-runtime/blob/main/RELEASE.md) - [Commits](kubernetes-sigs/controller-runtime@v0.22.4...v0.22.5) Updates `sigs.k8s.io/gateway-api-inference-extension` from 1.3.0-rc.2 to 1.3.0-rc.3 - [Release notes](https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases) - [Changelog](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/RELEASE.md) - [Commits](kubernetes-sigs/gateway-api-inference-extension@v1.3.0-rc.2...v1.3.0-rc.3) --- updated-dependencies: - dependency-name: sigs.k8s.io/controller-runtime dependency-version: 0.22.5 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes - dependency-name: sigs.k8s.io/gateway-api-inference-extension dependency-version: 1.3.0-rc.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: kubernetes ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * refactor: kv cache manager repo (llm-d#570) * refactor: kv cache manager repo name Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * go mod tidy Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * fetch kv cache upstream instead of my fork Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * revert dockerfile to fetch kv cache manager from upstream instead of go mod replace Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * update chat preprocessing structs Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * update kv cache manager version Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * refactor kvblock.Key to kvblock.BlockHash Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * add context Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * add parent block key Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * refactor encode Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * validate model name Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * run setup.sh Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * clone vllm into build Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * edit Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * edit lint Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * delete fetch-python-wrapper.sh Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * edit git workflow Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * edit Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * refactor TokenProcessorConfig in config Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * fix kv cache repo name in docker file Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * fix e2e tests Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * add ignore Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> * update architecture docs Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> --------- Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: 
HyunKyun Moon <mhg5303@gmail.com> Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Co-authored-by: HyunKyun Moon <mhg5303@gmail.com> Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com> * bumping IGW version to the full released version (llm-d#583) Signed-off-by: Kellen Swain <kfswain@google.com> * Enable prefix-cache awareness in active-active multi-replica scheduler deployments (llm-d#578) * - active-active-ha support Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * Update docs/architecture.md Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maroon Ayoub <Maroonay@gmail.com> * lint Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> --------- Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Signed-off-by: Maroon Ayoub <Maroonay@gmail.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> * Switch to pre-built vLLM wheels for CPU builds (llm-d#582) * try use official vllm wheels in dockerfile.epp Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * wip Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * use wheels in makefile Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * wip Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * write permissions to setup.sh Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * update kv cache manager commit Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * try instal py deps wo sudo Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * CR changes Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> --------- Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> * update llm-d-kv-cache import to v0.5.0-RC1 (llm-d#584) * update kvc version import Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * add go.mod to testable changes Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> --------- Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * Use 1.3.0 CRDs (llm-d#586) Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * free disk space on ci-release (llm-d#587) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * feat: use Tinyllama as 
the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581) * feat: use Tinyllama as the "model" for kind test - in order to test precies-prefix-cache-score we cannot use fool-reviewer since it need call kv-cache-manager to get tokenizer by getting a real model from HF - the change is to switch the "default model" to TinyLlama - also to make tokenizer folder writable need change permission to the USER in Dockerfile - rename dp-epp-config.yaml sim-dp-epp-config.yaml as it is used for local test Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: revert back some config to keep using prefix-cache-scorer - revert file renaming Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> * Update linter configuration (llm-d#588) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * fix: config should use new precise-prefix-cache-scorer (llm-d#576) - we have rename prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0, configs need migrate from the old one to the new one with spec. - rename plugin name - remove parameters.autoTune and parameters.mode: cache_tracking and lruCapacityPerServer - move hashBlockSize, maxPrefixBlocksToMatch under indexrConfig - for config using food-review keep old prefix-cache-scorer - keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer as KV and PD need both be enabled which is not done yet Signed-off-by: Wen Zhou <wenzhou@redhat.com> * deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.42.1...v1.42.2) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.42.2 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Updated to more recent GIE (llm-d#592) * Updated to more recent GIE Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updated to latest GIE and changes due to review comments Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added a true mock SchedulerProfile Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Exploited mock SchedulerProfile Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * pull kvc v0.5.0 libs (llm-d#595) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.42.2...v1.43.0) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.43.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * address nil,nil return linter error in test mock (llm-d#598) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * deps(go): bump the go-dependencies group with 2 updates (llm-d#597) Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega). 
Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.27.5...v2.28.1) Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1 - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.39.0...v1.39.1) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-version: 2.28.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: github.com/onsi/gomega dependency-version: 1.39.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Models extractor (llm-d#553) * Models extractor Signed-off-by: irar2 <irar@il.ibm.com> * Update register.go Signed-off-by: Ira Rosen <irar@il.ibm.com> * Updated for the newer GIE Signed-off-by: irar2 <irar@il.ibm.com> * Review comments Signed-off-by: irar2 <irar@il.ibm.com> * Check the scheme Signed-off-by: irar2 <irar@il.ibm.com> --------- Signed-off-by: irar2 <irar@il.ibm.com> Signed-off-by: Ira Rosen <irar@il.ibm.com> * feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509) * feat: implement decode first flow on lmcache connector - if cache_hit_threshold field is present in completion request, then we perform a decode first flow Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: error handling Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add back todo comment Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: reduce code complexity and duplication Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve 
header copying Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add comment explaning the cache_hit_threshold field and the new decode first flow Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: enhance logging for cache hit threshold in decode flow - decrease verbosity for common log - add cache_hit_threshold attribute Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve error handling and observability when failing to unmarshal decode response Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add deleted informational comments Signed-off-by: kyano <kyanokashi2@gmail.com> * typo Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: make error logs more descriptive of the failure reason Signed-off-by: kyano <kyanokashi2@gmail.com> * feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typo Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: assign 0 cache_hit_threshold before final decode attempt Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: update comment according to feedback Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: remove istio workaround Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: set cache hit threshold to 0 in prefill request for consistent execution Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: update the log Signed-off-by: kyano <kyanokashi2@gmail.com> * feat: support online decoding Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: preserve request body in lmcache connector Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: support sse format for streamed decode Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add and improve log descriptions Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typo Signed-off-by: kyano <kyanokashi2@gmail.com> * nit: undo capitalization Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typos Signed-off-by: kyano <kyanokashi2@gmail.com> * 
chore: improve error log observability Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: encapsulate http error checking in function and reuse Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: encapsulate and reuse code better Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: lint error Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve code encapsulation and reduce duplication Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: rename and simplify SSE event signaling logic Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: rename lmcache to shared storage protocol Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: remove unused function Signed-off-by: kyano <kyanokashi2@gmail.com> * test: e2e tests Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * chore: claude gitignore Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * fix: sim deployment Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * feat: make linter running on new code configurable Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * fix: lint errors Signed-off-by: kyanokashi <kyanokashi2@gmail.com> --------- Signed-off-by: kyano <kyanokashi2@gmail.com> Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * Extend support for different ways to decide if disaggregated PD is required (llm-d#531) * Initial step of a configurable pd decider which is responsible for decision whether disaggregation is required, use data added in prefix scorer plugin in PrepareRequestData Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update version of GIE + fix lint Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update yaml and the test according prefix plugin configuration change (blockSize replaced by blockSizeTokens) Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update docs/architecture.md Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * code review 
Signed-off-by: Maya Barnea <mayab@il.ibm.com> * code review Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update version of GIE, update prefix_disagr_decider accordingly Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix typo Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix PD for short inputs Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update docs/architecture.md Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/always_disaggr_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/always_disaggr_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/prefix_disagg_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * updates according the PR comments Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix test Signed-off-by: Maya Barnea <mayab@il.ibm.com> * create pd decider plugin type with 2 implementations (for prefix based and test always), update deploy configuration according the new structure Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix e2e tests Signed-off-by: Maya Barnea <mayab@il.ibm.com> * changes according the pr comments Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix e2e test Signed-off-by: Maya Barnea <mayab@il.ibm.com> * add explanation about pd deciders to disagg_pd doc Signed-off-by: Maya Barnea <mayab@il.ibm.com> * rename always_disaggr_decider to always_disagg_decider Signed-off-by: Maya Barnea <mayab@il.ibm.com> --------- Signed-off-by: Maya Barnea <mayab@il.ibm.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> * chore: fix wrong port for NIXL (llm-d#593) - start with vLLM 0.11.1, default port for NIXL has been updated to 5600 - leave ZMQ to use 5557 Signed-off-by: Wen Zhou 
<wenzhou@redhat.com> * fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602) * fix: resolve JSON serialization error in active-request-scorer debug logs Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * feat: Add raw scores to debug Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> --------- Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * Implement "LGTM" ChatOps Workflow. Signed-off-by: Revital Sur <eres@il.ibm.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * Lgtm2 (#17) * Implement "LGTM" ChatOps Workflow. Signed-off-by: Revital Sur <eres@il.ibm.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> --------- Signed-off-by: Revital Sur <eres@il.ibm.com> * test * test: automated LGTM workflow test (#19) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#20) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#21) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#22) This PR tests the /lgtm command workflow automation. Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test (#24) This PR tests the /lgtm command workflow automation. 
Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test (#26) This PR tests the /lgtm command workflow automation. Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * Address review comments. Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test This PR tests the /lgtm command workflow automation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Revital Sur <eres@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Antonio Cardace <acardace@redhat.com> Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com> Signed-off-by: MregXN <mregxn@gmail.com> Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> Signed-off-by: CYJiang <googs1025@gmail.com> Signed-off-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Signed-off-by: Wen Zhou <wenzhou@redhat.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> Signed-off-by: kyanokashi <kyanokashi2@gmail.com> Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: Kellen Swain <kfswain@google.com> Signed-off-by: Maroon Ayoub <Maroonay@gmail.com> Signed-off-by: irar2 <irar@il.ibm.com> Signed-off-by: Ira Rosen <irar@il.ibm.com> Signed-off-by: kyano <kyanokashi2@gmail.com> Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> 
Co-authored-by: Antonio Cardace <anto.cardace@gmail.com> Co-authored-by: Edoardo Vacchi <evacchi@users.noreply.github.com> Co-authored-by: MregXN <46479059+MregXN@users.noreply.github.com> Co-authored-by: Hyunkyun Moon <mhg5303@gmail.com> Co-authored-by: CYJiang <86391540+googs1025@users.noreply.github.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> Co-authored-by: David Breitgand <davidbreitgand@users.noreply.github.com> Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com> Co-authored-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Maya Barnea <mayab@il.ibm.com> Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Co-authored-by: Sage <80211083+sagearc@users.noreply.github.com> Co-authored-by: Kellen Swain <kfswain@google.com> Co-authored-by: Ira Rosen <irar@il.ibm.com> Co-authored-by: alberto <aperdomo@redhat.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* Match documentation with default model in scripts (llm-d#615) Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Test: LGTM Workflow Automation (#32) * feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581) * feat: use Tinyllama as the "model" for kind test - in order to test precies-prefix-cache-score we cannot use fool-reviewer since it need call kv-cache-manager to get tokenizer by getting a real model from HF - 
the change is to switch the "default model" to TinyLlama - also to make tokenizer folder writable need change permission to the USER in Dockerfile - rename dp-epp-config.yaml sim-dp-epp-config.yaml as it is used for local test Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: revert back some config to keep using prefix-cache-scorer - revert file renaming Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> * Update linter configuration (llm-d#588) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * fix: config should use new precise-prefix-cache-scorer (llm-d#576) - we have rename prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0, configs need migrate from the old one to the new one with spec. - rename plugin name - remove parameters.autoTune and parameters.mode: cache_tracking and lruCapacityPerServer - move hashBlockSize, maxPrefixBlocksToMatch under indexrConfig - for config using food-review keep old prefix-cache-scorer - keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer as KV and PD need both be enabled which is not done yet Signed-off-by: Wen Zhou <wenzhou@redhat.com> * deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.42.1...v1.42.2) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.42.2 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Updated to more recent GIE (llm-d#592) * Updated to more recent GIE Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updated to latest GIE and chnages due to review comments Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added a true mock SchedulerProfile Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Exploited mock SchedulerProfile Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * pull kvc v0.5.0 libs (llm-d#595) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.42.2...v1.43.0) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.43.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * address nil,nil return linter error in test mock (llm-d#598) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * deps(go): bump the go-dependencies group with 2 updates (llm-d#597) Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega). 
Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.27.5...v2.28.1) Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1 - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.39.0...v1.39.1) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-version: 2.28.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: github.com/onsi/gomega dependency-version: 1.39.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Implement "LGTM" ChatOps Workflow. Signed-off-by: Revital Sur <eres@il.ibm.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * Lgtm2 (#17) * Implement "LGTM" ChatOps Workflow. Signed-off-by: Revital Sur <eres@il.ibm.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> --------- Signed-off-by: Revital Sur <eres@il.ibm.com> * test * test: automated LGTM workflow test (#19) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#20) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#21) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#22) This PR tests the /lgtm command workflow automation. Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test (#24) This PR tests the /lgtm command workflow automation.
Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test (#26) This PR tests the /lgtm command workflow automation. Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * Address review comments. Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test This PR tests the /lgtm command workflow automation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Revital Sur <eres@il.ibm.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> Signed-off-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Signed-off-by: irar2 <irar@il.ibm.com> Signed-off-by: Ira Rosen <irar@il.ibm.com> Signed-off-by: kyano <kyanokashi2@gmail.com> Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Signed-off-by: kyanokashi <kyanokashi2@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com> Co-authored-by: Ira Rosen <irar@il.ibm.com> Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Co-authored-by: Maya Barnea <mayab@il.ibm.com> Co-authored-by: alberto <aperdomo@redhat.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * test: open-pr Tests that opening a 
PR triggers gatekeeper which blocks without lgtm label. Test timestamp: 1771188042 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Wen Zhou <wenzhou@redhat.com> Signed-off-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: irar2 <irar@il.ibm.com> Signed-off-by: Ira Rosen <irar@il.ibm.com> Signed-off-by: kyano <kyanokashi2@gmail.com> Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Signed-off-by: kyanokashi <kyanokashi2@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Co-authored-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Ira Rosen <irar@il.ibm.com> Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Co-authored-by: Maya Barnea <mayab@il.ibm.com> Co-authored-by: alberto <aperdomo@redhat.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
This PR adds the ability to collect model information from the /v1/models API and store it in the endpoint's attributes.
Closes #466
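To make the flow in the PR summary concrete, here is a minimal Go sketch of querying /v1/models and keeping the result as an endpoint attribute. It is an illustration under assumptions, not the code from this PR: `ModelInfo` follows the struct quoted in the review thread, while `modelsResponse`, `parseModels`, `fetchModels`, and the `"models"` attribute key are hypothetical names. The response shape (`{"object":"list","data":[...]}`) follows the OpenAI models list format, and `parent` is the vLLM-specific field discussed in the review.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// ModelInfo mirrors the struct quoted in the review thread.
// Parent is vLLM-specific and is omitted when empty.
type ModelInfo struct {
	ID     string `json:"id"`
	Parent string `json:"parent,omitempty"`
}

// modelsResponse is a hypothetical wrapper for the list payload.
type modelsResponse struct {
	Data []ModelInfo `json:"data"`
}

// parseModels decodes a /v1/models response body (hypothetical helper).
func parseModels(body []byte) ([]ModelInfo, error) {
	var resp modelsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode /v1/models response: %w", err)
	}
	return resp.Data, nil
}

// fetchModels issues GET <baseURL>/v1/models and parses the result
// (hypothetical helper, not the extractor's real entry point).
func fetchModels(ctx context.Context, client *http.Client, baseURL string) ([]ModelInfo, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/v1/models", nil)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d from /v1/models", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseModels(body)
}

func main() {
	// A stand-in model server that returns one base model and one LoRA adapter.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"object":"list","data":[{"id":"base-model"},{"id":"lora-a","parent":"base-model"}]}`)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	models, err := fetchModels(ctx, srv.Client(), srv.URL)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	// Hypothetical attribute store; the real endpoint attribute API may differ.
	attrs := map[string]any{"models": models}
	fmt.Printf("%+v\n", attrs["models"])
}
```

Because `parent` is tagged `omitempty` and is simply left empty when the server does not send it, the same decoding works against model servers that return only the OpenAI-standard fields.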