[Feature]: Adopt LWS DisaggregatedSet for coordinated prefill/decode lifecycle management #252

@yankay

Problem Statement

Currently, when multinode: true is set, the chart creates two independent LeaderWorkerSets (prefill and decode) with no coordination between them. As a result, rolling updates proceed independently: prefill may move to a new revision while decode is still on the old one, risking KV cache incompatibility and service disruption.

Proposed Solution

Adopt the new DisaggregatedSet CRD (disaggregatedset.x-k8s.io/v1alpha1), merged into LWS main, which provides:

  • Unified management of prefill/decode as a single resource
  • N-dimensional coordinated rolling updates preserving the prefill-to-decode ratio
  • Automatic headless Service creation with revision-aware labels for traffic routing
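
To make the ratio-preserving idea concrete, here is a minimal sketch of a coordinated rollout plan for two roles. It is only an illustration of the concept, not the DisaggregatedSet controller's actual algorithm; the function name `plan_coordinated_rollout` and the fixed step count are assumptions made for this example.

```python
def plan_coordinated_rollout(prefill: int, decode: int, steps: int) -> list[tuple[int, int]]:
    """Return cumulative (prefill_updated, decode_updated) counts for each step.

    Hypothetical helper: both roles advance together, so the updated replicas
    keep roughly the same prefill-to-decode ratio as the full deployment.
    """
    if prefill < 1 or decode < 1 or steps < 1:
        raise ValueError("prefill, decode and steps must all be >= 1")

    plan = []
    for i in range(1, steps + 1):
        # Integer ceiling of (total * i / steps); the last step always
        # reaches the full replica count for both roles.
        prefill_updated = (prefill * i + steps - 1) // steps
        decode_updated = (decode * i + steps - 1) // steps
        plan.append((prefill_updated, decode_updated))
    return plan


if __name__ == "__main__":
    # 2 prefill and 4 decode replicas, updated in 2 coordinated steps.
    print(plan_coordinated_rollout(2, 4, 2))  # [(1, 2), (2, 4)]
```

With two independent LeaderWorkerSets, nothing ties the two counts together, so one role can reach its final step while the other has not started.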

A new template (e.g. disaggregatedset.yaml) could be added behind an opt-in flag, generating a single DisaggregatedSet instead of two separate LWS when both prefill and decode are enabled in multinode mode.

Opening this as a discussion. Thoughts on the approach and timing are welcome (e.g. wait for DisaggregatedSet to reach beta, or start with alpha support behind a feature flag).

Alternatives Considered

Keep the current approach of two independent LWS resources managed by Helm, and rely on users to coordinate updates manually.

Additional Context

cc @jgchn @kalantar
