Milestone 3 - Launcher-Based Inference with Sleep/Wake (Test Release)
This is a test release for Milestone 3. It introduces launcher-based management of inference servers, with sleep/wake support for efficient GPU utilization and fast startup.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Release v0.5.1-alpha completed successfully!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Container Images:
• ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/dual-pods-controller:v0.5.1-alpha
• ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/launcher-populator:v0.5.1-alpha
• ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/launcher:v0.5.1-alpha
• ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/requester:v0.5.1-alpha
Helm Charts (version 0.5.1-alpha):
• oci://ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/charts/dual-pods-controller
• oci://ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/charts/launcher-populator
Install with:
helm install dpctlr oci://ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/charts/dual-pods-controller --version 0.5.1-alpha
helm install launcher-populator oci://ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/charts/launcher-populator --version 0.5.1-alpha
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
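The artifact names above follow a regular pattern, so a script can derive them from the release tag. The sketch below is illustrative only: the variable names (`TAG`, `CHART_VERSION`, `REGISTRY`, `IMAGES`, `INSTALL_CMDS`) are assumptions, and the rule that the chart version is the image tag without its leading `v` is inferred from this release alone, not from a documented convention. It prints the image references and install commands without running them.

```shell
set -eu

# Release tag as used for the container images in this release.
TAG="v0.5.1-alpha"
# Assumption: the chart version is the tag with its leading "v" stripped.
CHART_VERSION="${TAG#v}"
REGISTRY="ghcr.io/llm-d-incubation/llm-d-fast-model-actuation"

# Build one image reference per component, one per line.
IMAGES=""
for component in dual-pods-controller launcher-populator launcher requester; do
  IMAGES="${IMAGES}${REGISTRY}/${component}:${TAG}
"
done
printf '%s' "$IMAGES"

# Compose (but do not execute) the two install commands from these notes.
INSTALL_CMDS="helm install dpctlr oci://${REGISTRY}/charts/dual-pods-controller --version ${CHART_VERSION}
helm install launcher-populator oci://${REGISTRY}/charts/launcher-populator --version ${CHART_VERSION}"
printf '%s\n' "$INSTALL_CMDS"
```

Review the printed commands, then run them against the target cluster.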