Problem Statement
Currently, when multinode: true is set, the chart creates two independent LeaderWorkerSets (prefill and decode) with no coordination between them. As a result, rolling updates proceed independently: prefill may move to a new revision while decode is still on the old one, risking KV cache incompatibility and service disruption.
Proposed Solution
Adopt the new DisaggregatedSet CRD (disaggregatedset.x-k8s.io/v1alpha1), merged into LWS main, which provides:
- Unified management of prefill/decode as a single resource
- N-dimensional coordinated rolling updates preserving the prefill-to-decode ratio
- Automatic headless Service creation with revision-aware labels for traffic routing
A new template (e.g. disaggregatedset.yaml) could be added behind an opt-in flag, generating a single DisaggregatedSet instead of two separate LWS when both prefill and decode are enabled in multinode mode.
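To make the opt-in idea concrete, here is a rough sketch of what the template guard might look like. It is an illustration only: the `disaggregatedSet.enabled` flag and the `prefill.create` / `decode.create` keys are hypothetical names made up for this example, and the spec is deliberately left out because the v1alpha1 schema should be taken from the LWS repository rather than guessed here.

```yaml
# templates/disaggregatedset.yaml (sketch, not a working template)
#
# Assumed values.yaml keys:
#   multinode: true              # existing flag mentioned in this issue
#   disaggregatedSet:
#     enabled: false             # hypothetical opt-in flag, off by default
#   prefill.create / decode.create   # hypothetical "role enabled" switches
{{- if and .Values.multinode .Values.disaggregatedSet.enabled .Values.prefill.create .Values.decode.create }}
apiVersion: disaggregatedset.x-k8s.io/v1alpha1
kind: DisaggregatedSet
metadata:
  name: {{ .Release.Name }}
# spec: omitted on purpose; the prefill and decode role definitions would be
# filled in from the CRD schema published with LWS.
{{- end }}
```

With a guard like this, the existing prefill and decode LWS templates would need the inverse condition so that only one of the two paths renders. Leaving the flag off by default would keep current behavior unchanged for users who have not installed the alpha CRD.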
Opening this as a discussion — would love to hear thoughts on the approach and timing (e.g. wait for DisaggregatedSet to reach beta, or start with alpha support behind a feature flag).
Alternatives Considered
Keep the current approach of two independent LWS resources managed by Helm, and rely on users to coordinate updates manually.
Additional Context
cc @jgchn @kalantar