Skip to content

Releases: llm-d/llm-d-inference-scheduler

v0.5.0

02 Feb 18:18
v0.5.0
5df2c23

Choose a tag to compare

Docker image is available at:

docker pull ghcr.io/llm-d/llm-d-inference-scheduler:v0.5.0

Notable

  • Prefill/Decode disaggregation awareness in filters and scorers
  • Support for data parallel serving validated with vLLM and inference-sim
  • Various CICD enhancements
  • IGW: Flow control scale from/to zero support
  • IGW: Standalone EPP

What's Changed

  • Fix for flaky sites during lychee md link checker by @pierDipi in #485
  • deps(actions): bump actions/checkout from 5 to 6 by @dependabot[bot] in #487
  • deps(go): bump google.golang.org/grpc from 1.76.0 to 1.77.0 in the go-dependencies group by @dependabot[bot] in #486
  • Use kv-cache-manager based on Go mod version instead of hardcoded by @pierDipi in #484
  • update llm-d-kv-cache version to v0.4.0 by @vMaroon in #492
  • fix: github action missing Trivy scan on sidecar image by @zdtsw in #481
  • [Fix] Enhance macOS Makefile to Support Non-Homebrew Python Installations by @hyeongyun0916 in #489
  • feat(allowlist): support both v1 and v1alpha2 InferencePool APIs with flag by @googs1025 in #474
  • fix: make 'install-dependencies' and 'build' target by @zdtsw in #493
  • skip lint and test when only docs change by @setsunakute in #494
  • fix: Fixes for Data Parallel support when also running with Prefix Disaggregation by @shmuelk in #498
  • sync gie to v1.2.0 by @nirrozenbaum in #499
  • Error if PYTHON_CONFIG is empty by @elevran in #497
  • Add GH action to check for signed and verified commits in PR by @elevran in #500
  • deps(actions): bump crate-ci/typos from 1.39.2 to 1.40.0 by @dependabot[bot] in #501
  • build: make build should use CGO_CFLAGS, CGO_LDFLAGS by @evacchi in #503
  • chore: bump gie to v1.2.1 by @nirrozenbaum in #504
  • deps(go): bump sigs.k8s.io/gateway-api from 1.4.0 to 1.4.1 in the kubernetes group by @dependabot[bot] in #508
  • deps(go): bump the go-dependencies group with 3 updates by @dependabot[bot] in #507
  • Miscellaneous dependency updates by @shmuelk in #510
  • deps(go): bump the kubernetes group with 5 updates by @dependabot[bot] in #513
  • Fix running make env-dev-kind by @acardace in #512
  • test: add precise_prefix_cache_test by @evacchi in #505
  • test: reuse upstream data store and enable logr in unit tests by @MregXN in #518
  • feat: allow pd_profile_handler to handle diverse plugin types by @hyeongyun0916 in #516
  • deps(actions): bump crate-ci/typos from 1.40.0 to 1.40.1 by @dependabot[bot] in #526
  • deps(go): bump google.golang.org/grpc from 1.77.0 to 1.78.0 in the go-dependencies group by @dependabot[bot] in #527
  • feat(metrics): add model_name label to PD decision metric by @googs1025 in #528
  • deps(actions): bump crate-ci/typos from 1.40.1 to 1.41.0 by @dependabot[bot] in #532
  • Configure dependabot ignores Go version updates by @elevran in #533
  • Updates the architecture description by @davidbreitgand in #525
  • Dependabot: exert finer control over package updates by @elevran in #542
  • port auto-assign action from llm-d-kv-cache by @vMaroon in #551
  • refactor: set python version and pin docker image with tag by @zdtsw in #543
  • chore(test): update API version for nixl test by @zdtsw in #555
  • deps(go): bump the go-dependencies group with 2 updates by @dependabot[bot] in #558
  • deps(actions): bump crate-ci/typos from 1.41.0 to 1.42.0 by @dependabot[bot] in #557
  • deps(actions): bump actions/checkout from 4 to 6 by @dependabot[bot] in #556
  • Update auto-assign logic by @elevran in #560
  • Remove newline in unsigned commit message by @elevran in #561
  • bump gie to v1.3.0 rc2 by @nirrozenbaum in #562
  • Update OWNERS by @elevran in #559
  • refactor: Makefile, update docs by @zdtsw in #463
  • feat: add metrics validation in e2e test by @googs1025 in #529
  • feat: make no-hit-lru P/D-aware by @evacchi in #522
  • Update disaggregated Prefill/Decode inference serving documentation by @mayabar in #571
  • deps(actions): bump crate-ci/typos from 1.42.0 to 1.42.1 by @dependabot[bot] in #572
  • deps(go): bump github.com/onsi/ginkgo/v2 from 2.27.4 to 2.27.5 in the go-dependencies group by @dependabot[bot] in #573
  • fix reviewers auto assign minor bug by @nirrozenbaum in #575
  • fix(scorer): make active request pd aware by @kyanokashi in #569
  • test(e2e): cleanup kind cluster by @zdtsw in #563
  • refactor: add early validation in DP profile handler by @zdtsw in #554
  • deps(go): bump the kubernetes group with 2 updates by @dependabot[bot] in #574
  • refactor: kv cache manager repo by @sagearc in #570
  • bumping IGW version to the full released version by @kfswain in #583
  • Enable prefix-cache awareness in active-active multi-replica scheduler deployments by @vMaroon in #578
  • Switch to pre-built vLLM wheels for CPU builds by @sagearc in #582
  • update llm-d-kv-cache import to v0.5.0-RC1 by @vMaroon in #584
  • Use 1.3.0 CRDs by @shmuelk in #586

Updates in Inference Gateway Extension v1.3.0

llm-d-inference-scheduler v0.5.0 has been updated to use the latest version of the Inference Gateway Extension which is 1.3.0.
You can see those changes here

New Contributors

Full Changelog: v0.4.0...v0.5.0

v0.5.0-rc.1

22 Jan 18:43
v0.5.0-rc.1
91c80c4

Choose a tag to compare

v0.5.0-rc.1 Pre-release
Pre-release

What's Changed

  • Fix for flaky sites during lychee md link checker by @pierDipi in #485
  • deps(actions): bump actions/checkout from 5 to 6 by @dependabot[bot] in #487
  • deps(go): bump google.golang.org/grpc from 1.76.0 to 1.77.0 in the go-dependencies group by @dependabot[bot] in #486
  • Use kv-cache-manager based on Go mod version instead of hardcoded by @pierDipi in #484
  • update llm-d-kv-cache version to v0.4.0 by @vMaroon in #492
  • fix: github action missing Trivy scan on sidecar image by @zdtsw in #481
  • [Fix] Enhance macOS Makefile to Support Non-Homebrew Python Installations by @hyeongyun0916 in #489
  • feat(allowlist): support both v1 and v1alpha2 InferencePool APIs with flag by @googs1025 in #474
  • fix: make 'install-dependencies' and 'build' target by @zdtsw in #493
  • skip lint and test when only docs change by @setsunakute in #494
  • fix: Fixes for Data Parallel support when also running with Prefix Disaggregation by @shmuelk in #498
  • sync gie to v1.2.0 by @nirrozenbaum in #499
  • Error if PYTHON_CONFIG is empty by @elevran in #497
  • Add GH action to check for signed and verified commits in PR by @elevran in #500
  • deps(actions): bump crate-ci/typos from 1.39.2 to 1.40.0 by @dependabot[bot] in #501
  • build: make build should use CGO_CFLAGS, CGO_LDFLAGS by @evacchi in #503
  • chore: bump gie to v1.2.1 by @nirrozenbaum in #504
  • deps(go): bump sigs.k8s.io/gateway-api from 1.4.0 to 1.4.1 in the kubernetes group by @dependabot[bot] in #508
  • deps(go): bump the go-dependencies group with 3 updates by @dependabot[bot] in #507
  • Miscellaneous dependency updates by @shmuelk in #510
  • deps(go): bump the kubernetes group with 5 updates by @dependabot[bot] in #513
  • Fix running make env-dev-kind by @acardace in #512
  • test: add precise_prefix_cache_test by @evacchi in #505
  • test: reuse upstream data store and enable logr in unit tests by @MregXN in #518
  • feat: allow pd_profile_handler to handle diverse plugin types by @hyeongyun0916 in #516
  • deps(actions): bump crate-ci/typos from 1.40.0 to 1.40.1 by @dependabot[bot] in #526
  • deps(go): bump google.golang.org/grpc from 1.77.0 to 1.78.0 in the go-dependencies group by @dependabot[bot] in #527
  • feat(metrics): add model_name label to PD decision metric by @googs1025 in #528
  • deps(actions): bump crate-ci/typos from 1.40.1 to 1.41.0 by @dependabot[bot] in #532
  • Configure dependabot ignores Go version updates by @elevran in #533
  • Updates the architecture description by @davidbreitgand in #525
  • Dependabot: exert finer control over package updates by @elevran in #542
  • port auto-assign action from llm-d-kv-cache by @vMaroon in #551
  • refactor: set python version and pin docker image with tag by @zdtsw in #543
  • chore(test): update API version for nixl test by @zdtsw in #555
  • deps(go): bump the go-dependencies group with 2 updates by @dependabot[bot] in #558
  • deps(actions): bump crate-ci/typos from 1.41.0 to 1.42.0 by @dependabot[bot] in #557
  • deps(actions): bump actions/checkout from 4 to 6 by @dependabot[bot] in #556
  • Update auto-assign logic by @elevran in #560
  • Remove newline in unsigned commit message by @elevran in #561
  • bump gie to v1.3.0 rc2 by @nirrozenbaum in #562
  • Update OWNERS by @elevran in #559
  • refactor: Makefile, update docs by @zdtsw in #463
  • feat: add metrics validation in e2e test by @googs1025 in #529
  • feat: make no-hit-lru P/D-aware by @evacchi in #522
  • Update disaggregated Prefill/Decode inference serving documentation by @mayabar in #571
  • deps(actions): bump crate-ci/typos from 1.42.0 to 1.42.1 by @dependabot[bot] in #572
  • deps(go): bump github.com/onsi/ginkgo/v2 from 2.27.4 to 2.27.5 in the go-dependencies group by @dependabot[bot] in #573
  • fix reviewers auto assign minor bug by @nirrozenbaum in #575
  • fix(scorer): make active request pd aware by @kyanokashi in #569
  • test(e2e): cleanup kind cluster by @zdtsw in #563
  • refactor: add early validation in DP profile handler by @zdtsw in #554
  • deps(go): bump the kubernetes group with 2 updates by @dependabot[bot] in #574
  • refactor: kv cache manager repo by @sagearc in #570
  • bumping IGW version to the full released version by @kfswain in #583
  • Enable prefix-cache awareness in active-active multi-replica scheduler deployments by @vMaroon in #578
  • Switch to pre-built vLLM wheels for CPU builds by @sagearc in #582
  • update llm-d-kv-cache import to v0.5.0-RC1 by @vMaroon in #584
  • Use 1.3.0 CRDs by @shmuelk in #586

Updates in Inference Gateway Extension v1.3.0

llm-d-inference-scheduler v0.5.0 has been updated to use the latest version of the Inference Gateway Extension which is 1.3.0.
You can see those changes here

New Contributors

Full Changelog: v0.4.0-rc.1...v0.5.0-rc.1

v0.4.0

01 Dec 16:17
v0.4.0
86f5af7

Choose a tag to compare

Docker image is available at:

docker pull ghcr.io/llm-d/llm-d-inference-scheduler:v0.4.0

What's Changed

  • Use a production version of Istio by @shmuelk in #334
  • add vMaroon as code owner by @elevran in #342
  • Upgrade github.com/llm-d/llm-d-kv-cache-manager import to v0.3.0 by @vMaroon in #344
  • add a hold label when PRs are pushed to branch other than main by @nirrozenbaum in #345
  • sync gic to latest v1.0.0 release by @nirrozenbaum in #353
  • deps(actions): bump actions/stale from 9 to 10 by @dependabot[bot] in #350
  • deps(actions): bump actions/setup-go from 5 to 6 by @dependabot[bot] in #351
  • deps(actions): bump crate-ci/typos from 1.35.7 to 1.36.2 by @dependabot[bot] in #348
  • deps(go): bump the go-dependencies group with 7 updates by @dependabot[bot] in #349
  • bump llm-d-kv-cache-manager version by @vMaroon in #359
  • fix: Rename config to kv-cache-utilization-scorer from kv-cache-scorer by @yankay in #358
  • updating release issue-template by @kfswain in #361
  • bump llm-d-kv-cache-manager version (v0.3.2) by @vMaroon in #365
  • deps(go): bump github.com/onsi/ginkgo/v2 from 2.25.3 to 2.26.0 in the go-dependencies group by @dependabot[bot] in #368
  • feat: Add a scoring plugin to distribute new groups evenly by @usize in #357
  • implement PreRequest and PostResponse interface checks by @learner0810 in #372
  • deps(go): bump the kubernetes group with 2 updates by @dependabot[bot] in #369
  • deps(go): bump google.golang.org/grpc from 1.75.1 to 1.76.0 in the go-dependencies group by @dependabot[bot] in #374
  • Supports the ResponseComplete plugin by @learner0810 in #378
  • deps(actions): bump crate-ci/typos from 1.36.2 to 1.38.1 by @dependabot[bot] in #373
  • Fix multi-architecture image issues with Kind by @shmuelk in #362
  • feat: Moved the Routing Sidecar from its own repo to the inference-scheduler repo by @shmuelk in #379
  • Upgrade to use Gateway Inference Extension 1.1.0 rc.1 by @shmuelk in #384
  • deps(go): bump github.com/onsi/ginkgo/v2 from 2.26.0 to 2.27.1 in the go-dependencies group by @dependabot[bot] in #389
  • Ensure that max_completion_tokens=1 in Prefill by @shmuelk in #403
  • Add explanation of inference-scheduler relation to IGW/GIE by @elevran in #393
  • Add test coverage to test-unit Makefile target by @carlory in #391
  • Add regression tests for max_completion_tokens by @pierDipi in #411
  • Makefile refactoring to minimize the number of targets by @shmuelk in #397
  • feat: Add vLLM Data Parallel support to llm-d-inference-scheduler by @shmuelk in #392
  • fix(scorer): prevent potential division by zero in ActiveRequest.Score by @googs1025 in #413
  • Fixed wildcard targets by @shmuelk in #416
  • deps(actions): bump crate-ci/typos from 1.38.1 to 1.39.0 by @dependabot[bot] in #419
  • deps(go): bump github.com/onsi/ginkgo/v2 from 2.27.1 to 2.27.2 in the go-dependencies group by @dependabot[bot] in #417
  • Missed change to the Go code coverage output file names in the Makefile refactoring by @shmuelk in #422
  • Fix: Remove reference to the missing make target by @andreyod in #423
  • deps(actions): bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #420
  • deps(go): bump sigs.k8s.io/controller-runtime from 0.22.3 to 0.22.4 in the kubernetes group by @dependabot[bot] in #418
  • Enhancement: return 503 instead of 502 when decode node is not ready by @Phil-OSophy-42 in #412
  • Remove endpointslices from RBAC by @elevran in #424
  • Fix Image Loading for Podman in E2E Tests by @hdefazio in #406
  • readme meetings update by @nirrozenbaum in #427
  • Fix references to the SideCar's tag by @shmuelk in #428
  • Remove duplicate error logs by @hyeongyun0916 in #429
  • Upgrade to istio-1.28 by @irar2 in #431
  • Complete upgrade to Istio 1.28.0 by @shmuelk in #433
  • Upgrade GIE dependency to 1.1.0 by @shmuelk in #435
  • Remove dev from branch list in PR actions by @elevran in #434
  • Added support for Data Parallel in a Disagregated Prefil/Decode setup by @shmuelk in #432
  • Remove code coverage from CI workflow by @carlory in #437
  • test: Scale up and down the model server during an end to end test by @shmuelk in #354
  • fix: add validation in ByLabelFactory to prevent invalid configurations by @googs1025 in #440
  • deps(actions): bump golangci/golangci-lint-action from 8 to 9 by @dependabot[bot] in #444
  • change lmcache connector to nixlv2 by @googs1025 in #446
  • fix: Roll back automatic updates to Dockerfiles by @shmuelk in #447
  • deps(go): bump golang.org/x/sync from 0.17.0 to 0.18.0 in the go-dependencies group by @dependabot[bot] in #443
  • fix(profile): validate handler parameters to prevent invalid config by @googs1025 in #449
  • Added chat completions preprocessing support by @guygir in #426
  • docs: add integration guide for external prefill/decode workloads by @googs1025 in #451
  • Define and manage PR lifecycle by @elevran in #450
  • test: End to End test for Data Parallel support by @shmuelk in #442
  • docs: add PD-aware examples for by-label and by-label-selector plugins by @googs1025 in #454
  • deps(actions): bump crate-ci/typos from 1.39.0 to 1.39.2 by @dependabot[bot] in #459
  • Add SGLang Connector for Prefill/Decode Disaggregation (migrated from llm-d-routing-sidecar#64) by @bongwoobak in #456
  • deps(go): bump the kubernetes group with 4 updates by @dependabot[bot] in #460
  • add unit test in scheduler plugin part(by-label, data-parallel-profile-handler, pd-profile-handler) by @googs1025 in #461
  • test: Enable running the end to end tests on K8S clusters other than Kind by @shmuelk in #453
  • Allow the sidecar to sample from a list of prefill host ports by @smarterclayton in #404
  • fix: Fixed issues running locally 'make lint' and 'make test-unit' by @shmuelk in #464
  • cleanup: Followup to Python paths fix by @shmuelk in #468
  • Replace tab with spaces to avoid treating as make target by @elevran in #469
  • minor refactoring of precise-prefix-cache scorer plugin by @vMaroon in #473
  • feat: Add initial metrics and update dependencies by...
Read more

v0.4.0-rc.1

24 Nov 12:07
v0.4.0-rc.1
cd7f004

Choose a tag to compare

v0.4.0-rc.1 Pre-release
Pre-release

Docker image is available here:

docker pull ghcr.io/llm-d/llm-d-inference-scheduler:v0.4.0-rc.1

What's Changed

  • Use a production version of Istio by @shmuelk in #334
  • add vMaroon as code owner by @elevran in #342
  • Upgrade github.com/llm-d/llm-d-kv-cache-manager import to v0.3.0 by @vMaroon in #344
  • add a hold label when PRs are pushed to branch other than main by @nirrozenbaum in #345
  • sync gic to latest v1.0.0 release by @nirrozenbaum in #353
  • deps(actions): bump actions/stale from 9 to 10 by @dependabot[bot] in #350
  • deps(actions): bump actions/setup-go from 5 to 6 by @dependabot[bot] in #351
  • deps(actions): bump crate-ci/typos from 1.35.7 to 1.36.2 by @dependabot[bot] in #348
  • deps(go): bump the go-dependencies group with 7 updates by @dependabot[bot] in #349
  • bump llm-d-kv-cache-manager version by @vMaroon in #359
  • fix: Rename config to kv-cache-utilization-scorer from kv-cache-scorer by @yankay in #358
  • updating release issue-template by @kfswain in #361
  • bump llm-d-kv-cache-manager version (v0.3.2) by @vMaroon in #365
  • deps(go): bump github.com/onsi/ginkgo/v2 from 2.25.3 to 2.26.0 in the go-dependencies group by @dependabot[bot] in #368
  • feat: Add a scoring plugin to distribute new groups evenly by @usize in #357
  • implement PreRequest and PostResponse interface checks by @learner0810 in #372
  • deps(go): bump the kubernetes group with 2 updates by @dependabot[bot] in #369
  • deps(go): bump google.golang.org/grpc from 1.75.1 to 1.76.0 in the go-dependencies group by @dependabot[bot] in #374
  • Supports the ResponseComplete plugin by @learner0810 in #378
  • deps(actions): bump crate-ci/typos from 1.36.2 to 1.38.1 by @dependabot[bot] in #373
  • Fix multi-architecture image issues with Kind by @shmuelk in #362
  • feat: Moved the Routing Sidecar from its own repo to the inference-scheduler repo by @shmuelk in #379
  • Upgrade to use Gateway Inference Extension 1.1.0 rc.1 by @shmuelk in #384
  • deps(go): bump github.com/onsi/ginkgo/v2 from 2.26.0 to 2.27.1 in the go-dependencies group by @dependabot[bot] in #389
  • Ensure that max_completion_tokens=1 in Prefill by @shmuelk in #403
  • Add explanation of inference-scheduler relation to IGW/GIE by @elevran in #393
  • Add test coverage to test-unit Makefile target by @carlory in #391
  • Add regression tests for max_completion_tokens by @pierDipi in #411
  • Makefile refactoring to minimize the number of targets by @shmuelk in #397
  • feat: Add vLLM Data Parallel support to llm-d-inference-scheduler by @shmuelk in #392
  • fix(scorer): prevent potential division by zero in ActiveRequest.Score by @googs1025 in #413
  • Fixed wildcard targets by @shmuelk in #416
  • deps(actions): bump crate-ci/typos from 1.38.1 to 1.39.0 by @dependabot[bot] in #419
  • deps(go): bump github.com/onsi/ginkgo/v2 from 2.27.1 to 2.27.2 in the go-dependencies group by @dependabot[bot] in #417
  • Missed change to the Go code coverage output file names in the Makefile refactoring by @shmuelk in #422
  • Fix: Remove reference to the missing make target by @andreyod in #423
  • deps(actions): bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #420
  • deps(go): bump sigs.k8s.io/controller-runtime from 0.22.3 to 0.22.4 in the kubernetes group by @dependabot[bot] in #418
  • Enhancement: return 503 instead of 502 when decode node is not ready by @Phil-OSophy-42 in #412
  • Remove endpointslices from RBAC by @elevran in #424
  • Fix Image Loading for Podman in E2E Tests by @hdefazio in #406
  • readme meetings update by @nirrozenbaum in #427
  • Fix references to the SideCar's tag by @shmuelk in #428
  • Remove duplicate error logs by @hyeongyun0916 in #429
  • Upgrade to istio-1.28 by @irar2 in #431
  • Complete upgrade to Istio 1.28.0 by @shmuelk in #433
  • Upgrade GIE dependency to 1.1.0 by @shmuelk in #435
  • Remove dev from branch list in PR actions by @elevran in #434
  • Added support for Data Parallel in a Disagregated Prefil/Decode setup by @shmuelk in #432
  • Remove code coverage from CI workflow by @carlory in #437
  • test: Scale up and down the model server during an end to end test by @shmuelk in #354
  • fix: add validation in ByLabelFactory to prevent invalid configurations by @googs1025 in #440
  • deps(actions): bump golangci/golangci-lint-action from 8 to 9 by @dependabot[bot] in #444
  • change lmcache connector to nixlv2 by @googs1025 in #446
  • fix: Roll back automatic updates to Dockerfiles by @shmuelk in #447
  • deps(go): bump golang.org/x/sync from 0.17.0 to 0.18.0 in the go-dependencies group by @dependabot[bot] in #443
  • fix(profile): validate handler parameters to prevent invalid config by @googs1025 in #449
  • Added chat completions preprocessing support by @guygir in #426
  • docs: add integration guide for external prefill/decode workloads by @googs1025 in #451
  • Define and manage PR lifecycle by @elevran in #450
  • test: End to End test for Data Parallel support by @shmuelk in #442
  • docs: add PD-aware examples for by-label and by-label-selector plugins by @googs1025 in #454
  • deps(actions): bump crate-ci/typos from 1.39.0 to 1.39.2 by @dependabot[bot] in #459
  • Add SGLang Connector for Prefill/Decode Disaggregation (migrated from llm-d-routing-sidecar#64) by @bongwoobak in #456
  • deps(go): bump the kubernetes group with 4 updates by @dependabot[bot] in #460
  • add unit test in scheduler plugin part(by-label, data-parallel-profile-handler, pd-profile-handler) by @googs1025 in #461
  • test: Enable running the end to end tests on K8S clusters other than Kind by @shmuelk in #453
  • Allow the sidecar to sample from a list of prefill host ports by @smarterclayton in #404
  • fix: Fixed issues running locally 'make lint' and 'make test-unit' by @shmuelk in #464
  • cleanup: Followup to Python paths fix by @shmuelk in #468
  • Replace tab with spaces to avoid treating as make target by @elevran in #469
  • minor refactoring of precise-prefix-cache scorer plugin by @vMaroon in #473
  • feat: Add initial metrics and update dependencies...
Read more

v0.3.2

09 Oct 00:08
v0.3.2

Choose a tag to compare

In addition to the below changes these patches include fixes to the kv-cache-manager dependency

What's Changed

New Contributors

Full Changelog: v0.2.1...v0.3.2

v0.3.2-rc.1

03 Oct 18:51
v0.3.2-rc.1

Choose a tag to compare

v0.3.2-rc.1 Pre-release
Pre-release

Small fixes to kv-cache-manager required updated dependencies

v0.3.1

29 Sep 20:52
v0.3.1

Choose a tag to compare

Small patch updating kv cache manager dependency to include support in v0.3

See the full v0.3 changes here:

What's Changed

New Contributors

Full Changelog: v0.2.1...v0.3.1

v0.3.1-rc.1

26 Sep 02:03
v0.3.1-rc.1

Choose a tag to compare

v0.3.1-rc.1 Pre-release
Pre-release

Full Changelog: v0.3.0...v0.3.1-rc.1

v0.3.0

24 Sep 19:30
v0.3.0
1889019

Choose a tag to compare

Image pull example: docker pull ghcr.io/llm-d/llm-d-inference-scheduler:v0.3.0

What's Changed

New Contributors

Full Changelog: v0.2.1...v0.3.0

v0.3.0-rc.2

17 Sep 11:06
v0.3.0-rc.2
1889019

Choose a tag to compare

v0.3.0-rc.2 Pre-release
Pre-release

Image is available here: docker pull ghcr.io/llm-d/llm-d-inference-scheduler:v0.3.0-rc.2