Releases: llm-d-incubation/llm-d-fast-model-actuation
Milestone 3 - Launcher-Based Inference with Sleep/Wake (Test Release #6)
What's Changed
- Reword GPU-ful to GPU-bearing to pass typo checker by @MikeSpreitzer in #395
- Consider port when selecting launcher by @waltforme in #396
- Use EnvVars map instead of copying it by @MikeSpreitzer in #401
- Add annotations to instances in launcher by @MikeSpreitzer in #399
- Control the GPU assignment for e2e test on OpenShift by @waltforme in #403
- Begin to use annotations in VllmConfig by @waltforme in #404
Full Changelog: v0.5.1-alpha.5...v0.5.1-alpha.6
Milestone 3 - Launcher-Based Inference with Sleep/Wake (Test Release #5)
What's Changed
- Null out `serverDat.Sleeping` when no vLLM instances associated yet by @waltforme in #359
- deps(actions): bump docker/login-action from 3.7.0 to 4.0.0 by @dependabot[bot] in #323
- deps(actions): bump actions/checkout from 4.2.2 to 6.0.2 by @dependabot[bot] in #324
- deps(actions): bump docker/setup-buildx-action from 3.12.0 to 4.0.0 by @dependabot[bot] in #325
- ci: fix actions/checkout version comments and pin by SHA by @MikeSpreitzer in #361
- deps(actions): bump docker/build-push-action from 6.18.0 to 7.0.0 by @dependabot[bot] in #326
- Add deploy_fma.sh and debug workflow for OCP E2E by @diegocastanibm in #357
- Discontinue the usage of LauncherGeneratedBy label by @waltforme in #365
- 🌱 Unify launcher unit testing by @MikeSpreitzer in #368
- Improve launcher logging - Part 2 by @diegocastanibm in #367
- Fix: Add enable-sleep-mode flag to enable sleep mode for vllm server by @aavarghese in #376
- deps(actions): bump actions/setup-go from 6.2.0 to 6.3.0 by @dependabot[bot] in #360
- Include creation parameters inline in launcher instance state replies by @MikeSpreitzer in #369
- 🌱 Hot fix to e2e test on OpenShift by @MikeSpreitzer in #382
- Sync unbound launcher-based server-providing pods by @waltforme in #362
- Preserve the final state at the end of the e2e test in kind by @waltforme in #390
- Extract launcher E2E test scenarios into reusable script by @MikeSpreitzer in #386
- Pin ko base image to chainguard/static digest by @MikeSpreitzer in #392
- deps(actions): bump docker/setup-qemu-action from 3.2.0 to 4.0.0 by @dependabot[bot] in #370
- deps(actions): bump actions/cache from 5.0.3 to 5.0.4 by @dependabot[bot] in #371
- deps(actions): bump docker/metadata-action from 5.10.0 to 6.0.0 by @dependabot[bot] in #372
- deps(go): bump the kubernetes group across 1 directory with 3 updates by @dependabot[bot] in #373
- deps: bump code-generator from v0.34.2 to v0.34.6 by @MikeSpreitzer in #393
- Dump logs for every container in e2e test by @waltforme in #394
- Self-annotation on launcher pods to signal hosted instance changes by @waltforme in #391
Full Changelog: v0.5.1-alpha.4...v0.5.1-alpha.5
Milestone 3 - Launcher-Based Inference with Sleep/Wake (Test Release #4)
What's Changed
- Adjust Node viewing ClusterRole in E2E-on-OCP workflow by @MikeSpreitzer in #313
- fix: wait for CRDs to be Established in OpenShift E2E workflow by @MikeSpreitzer in #314
- Test cases for multiple instances sharing one launcher pod by @waltforme in #264
- deps(actions): bump actions/upload-artifact from 6.0.0 to 7.0.0 by @dependabot[bot] in #301
- ✨ Add workflow summary step showing gate decision by @MikeSpreitzer in #316
- Improvements to the launcher-based tests by @waltforme in #319
- GPU assignment for launcher-based server-providing Pods by @waltforme in #317
- [DOCS] More files to align FMA with LLM-d by @diegocastanibm in #283
- [DOCS] Adding governance documents by @diegocastanibm in #278
- 🌱 Remove per-repo gh-aw typo/link/upstream workflows by @clubanderson in #321
- Launcher log improvements by @diegocastanibm in #286
- Install kubernetes python library on launcher dockerfile by @manoelmarques in #328
- 🌱 Bump vllm-openai image to v0.15.1 by @MikeSpreitzer in #329
- Rework `GetInferenceServerPort` by @waltforme in #330
- Improve launcher's GPU mock by @waltforme in #322
- Fix misplaced envar names in launcher.md by @waltforme in #344
- Configure launcher for ConfigMap-based GPU UUID-to-index translation by @MikeSpreitzer in #341
- Revise type VllmConfig by @waltforme in #346
- Remove the dependency on the `gpu-map` ConfigMap for M3 code in production by @waltforme in #349
- 🌱 Pin E2E-on-OpenShift test to vllm-d cluster by @MikeSpreitzer in #350
- Fix typos and add typos config by @MikeSpreitzer in #351
- deps(go): bump the kubernetes group with 3 updates by @dependabot[bot] in #300
- ci: Use real requester and launcher in OpenShift e2e by @rubambiza in #343
- ci: Address review feedback on OpenShift e2e by @rubambiza in #354
- ✨ Add cleanup of launcher image by @MikeSpreitzer in #356
- ci: Enable launcher-populator in OpenShift E2E and local tests by @aavarghese in #348
Full Changelog: v0.5.1-alpha.3...v0.5.1-alpha.4
Milestone 3 - Launcher-Based Inference with Sleep/Wake (Test Release #3)
What's Changed
- ci: remove dual-pods finalizers before namespace deletion by @MikeSpreitzer in #288
- ci: use literal Go build version instead of go.mod value by @MikeSpreitzer in #290
- 🐛 Stop the E2E on OpenShift workflow from deleting the CRDs by @MikeSpreitzer in #293
- fix: upgrade vllm CPU build from v0.15.0 to v0.15.1 by @MikeSpreitzer in #294
- ✨ Reorg E2E on OCP workflow to always dump state by @MikeSpreitzer in #295
- deps(actions): bump actions/download-artifact from 6.0.0 to 8.0.0 by @dependabot[bot] in #303
- deps(actions): bump actions/github-script from 7.0.1 to 8.0.0 by @dependabot[bot] in #304
- ci: upgrade docker/login-action to v3.7.0 by @MikeSpreitzer in #305
- ✨ New management for ValidatingAdmissionPolicy[Binding] objects by @MikeSpreitzer in #297
- ci: upgrade actions/cache to v5.0.3 by @MikeSpreitzer in #306
- fix: replace ClusterRoleBinding to view with namespace-scoped pods permission by @MikeSpreitzer in #310
Full Changelog: v0.5.1-alpha.2...v0.5.1-alpha.3
Milestone 3 - Launcher-Based Inference with Sleep/Wake (Test Release #2)
What's Changed
- ✨ Add launcher and test workload E2E tests to OpenShift CI by @clubanderson in #262
- ✨ Add GitHub Agentic Workflows for typo, link, and upstream checks by @clubanderson in #255
- [FEATURE] - Fetch stdout in launcher by @diegocastanibm in #242
- docs: Update FMA docs with ecosystem context, milestone status, and dependencies by @rubambiza in #266
- Refining ValidatingAdmissionPolicy tests to match current implementation of DP controller by @aavarghese in #234
- deps(actions): bump docker/setup-buildx-action from 3.7.1 to 3.12.0 by @dependabot[bot] in #249
- deps(go): bump the kubernetes group with 5 updates by @dependabot[bot] in #253
- Bump docker/metadata-action, actions/setup-go, and ko-build/setup-ko by @MikeSpreitzer in #281
- Bump actions/setup-python from 4 to 6.2.0 by @MikeSpreitzer in #280
- deps(go): bump github.com/spf13/pflag from 1.0.6 to 1.0.10 in the go-dependencies group by @dependabot[bot] in #254
- Combine helm charts by @rubambiza in #263
Full Changelog: v0.5.1-alpha...v0.5.1-alpha.2
Milestone 3 - Launcher-Based Inference with Sleep/Wake (Test Release)
This is a test release for Milestone 3, introducing launcher-based inference server management with sleep/wake capabilities for efficient GPU resource utilization and quick start-up.
Run TAG="v0.5.1-alpha"
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Release v0.5.1-alpha completed successfully!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Container Images:
• ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/dual-pods-controller:v0.5.1-alpha
• ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/launcher-populator:v0.5.1-alpha
• ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/launcher:v0.5.1-alpha
• ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/requester:v0.5.1-alpha
Helm Charts (version 0.5.1-alpha):
• oci://ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/charts/dual-pods-controller
• oci://ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/charts/launcher-populator
Install with:
helm install dpctlr oci://ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/charts/dual-pods-controller --version 0.5.1-alpha
helm install launcher-populator oci://ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/charts/launcher-populator --version 0.5.1-alpha
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
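The two install commands above can be wrapped in a small script so that both charts are pinned to the same version. This is only a sketch, not part of the release tooling: the `install_chart` helper and the `DRY_RUN` switch are hypothetical names introduced here, and the script assumes (as in this release) that the chart version is the image tag without the leading `v`. By default it only prints the commands; set `DRY_RUN=0` to actually run `helm`.

```shell
#!/bin/sh
# Sketch: install both FMA charts at one pinned version.
# Assumptions (not from the release itself): the DRY_RUN switch and the
# install_chart helper are illustrative names used only in this example.
set -eu

TAG="${TAG:-v0.5.1-alpha}"
# The chart version listed in this release is the tag without the leading "v".
CHART_VERSION="${TAG#v}"
REPO="oci://ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/charts"

install_chart() {
  # $1 = Helm release name, $2 = chart name
  cmd="helm install $1 $REPO/$2 --version $CHART_VERSION"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run (the default here): print the command instead of executing it.
    echo "$cmd"
  else
    $cmd
  fi
}

install_chart dpctlr dual-pods-controller
install_chart launcher-populator launcher-populator
```

Running it with `TAG` set to a later test release would pin both charts to that release instead, provided the same tag-to-chart-version convention holds there.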
Milestone 1: Dual pods without sleep/wake
Merge pull request #88 from MikeSpreitzer/source-reorg Controller source reorg