Motivation
The autoscaler currently only supports Deployment as a scaling target. All workload access — replica counts, GPU extraction, vLLM arg parsing, pod ownership resolution — is hardcoded to *appsv1.Deployment. This prevents scaling multi-node inference workloads that use LeaderWorkerSet (leaderworkerset.x-k8s.io/v1), the standard Kubernetes API for leader-worker patterns commonly used in tensor-parallel and pipeline-parallel vLLM deployments.
The ScaleTargetRef field in the VA CRD already accepts any Kind (it uses autoscalingv1.CrossVersionObjectReference), but the internal implementation ignores the Kind and always fetches a Deployment.
Note: The ScaleFromZero engine (internal/engines/scalefromzero/engine.go) already supports arbitrary Kinds via RESTMapper + unstructured client. This pattern should be extended to the saturation engine.
Current State: Deployment-Hardcoded Locations
| # | Component | File | Line | Issue |
|---|-----------|------|------|-------|
| 1 | `GetDeploymentWithBackoff()` | `internal/utils/utils.go` | 89 | Typed to `*appsv1.Deployment` parameter |
| 2 | `getDeploymentGPUsPerReplica()` | `internal/engines/saturation/engine.go` | 562 | Accesses `deploy.Spec.Template.Spec.Containers` |
| 3 | `ParseVLLMArgs()` | `internal/engines/analyzers/saturation_v2/deployment_parser.go` | 55 | Accesses `deploy.Spec.Template.Spec.Containers` |
| 4 | `GetCurrentDeploymentReplicas()` | `internal/actuator/actuator.go` | 28 | Reads `deploy.Status.Replicas` and `deploy.Spec.Replicas` |
| 5 | `findDeploymentForPod()` | `internal/collector/source/pod_va_mapper.go` | 88 | Hardcoded `owner.Kind != "Deployment"` string check |
| 6 | Deployment maps | `engine.go`, `engine_v2.go`, `replica_metrics.go` | various | `map[string]*appsv1.Deployment` throughout |
| 7 | Indexer Kind default | `internal/controller/indexers/indexers.go` | 45 | Defaults unknown Kinds to `apps/v1` |
LeaderWorkerSet vs Deployment
| Aspect | Deployment | LeaderWorkerSet |
|--------|------------|-----------------|
| API group | `apps/v1` | `leaderworkerset.x-k8s.io/v1` |
| Pod template | `spec.template` | `spec.leaderWorkerTemplate.workerTemplate` (required) + optional `spec.leaderWorkerTemplate.leaderTemplate` (`*PodTemplateSpec`, defaults to workerTemplate when nil) |
| Replica field | `spec.replicas` | `spec.replicas` (number of groups) |
| Group size | N/A (1 pod = 1 replica) | `spec.leaderWorkerTemplate.size` (total pods per group: 1 leader + Size-1 workers) |
| Status replicas | `status.replicas` | `status.replicas` (number of ready groups) |
| Pod ownership | Pod → ReplicaSet → Deployment | Pod → StatefulSet → LeaderWorkerSet (leader) or Pod → LeaderWorkerSet (workers) |
| GPU resources | On container in pod template | On containers in both leader and worker templates — leader also does compute |
| vLLM args | In container args/env | In leader container args/env (leader starts vLLM API server + Ray head; workers join as Ray workers) |
| vLLM metrics | From all pods | From leader pod (port 8080, serves OpenAI API + exposes `/metrics`) |
| Scale subresource | Supported | Supported |
Important: Leader Also Does Compute
In vLLM multi-node deployments, both leader and worker pods run vLLM and require GPUs. The leader is not just a coordinator — it runs the vLLM API server, acts as the Ray head node, and participates in tensor-parallel/pipeline-parallel computation alongside workers.
Example from the vLLM LWS deployment guide:
- Leader: 8x NVIDIA GPUs, runs `vllm.entrypoints.openai.api_server`, serves port 8080
- Workers: 8x NVIDIA GPUs each, join as Ray worker nodes
- Both have identical GPU/memory resource requests
Total GPUs per replica group = leader_GPUs + (Size - 1) × worker_GPUs.
In practice, leader and worker GPU counts are typically identical for TP/PP workloads.
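The group-total formula can be written as a small helper; the name `totalGPUsPerReplica` and its plain-int signature are illustrative, not the autoscaler's actual API:

```go
package main

import "fmt"

// totalGPUsPerReplica computes the GPU count for one replica group:
// the leader's GPUs plus (size-1) workers, each with workerGPUs.
// For a plain Deployment, size == 1 and only the leader term applies.
func totalGPUsPerReplica(leaderGPUs, workerGPUs, size int) int {
	if size < 1 {
		size = 1 // guard against an unset/invalid group size
	}
	return leaderGPUs + (size-1)*workerGPUs
}

func main() {
	// An LWS group of size 4 with 8 GPUs on every pod: 8 + 3*8 = 32.
	fmt.Println(totalGPUsPerReplica(8, 8, 4)) // → 32
	// A Deployment-style single-pod replica: just the leader's GPUs.
	fmt.Println(totalGPUsPerReplica(8, 8, 1)) // → 8
}
```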
Proposed Design
ScaleTargetAccessor interface
Introduce a ScaleTargetAccessor interface that provides a uniform API to extract scaling-relevant information from any supported workload kind:
```go
// ScaleTargetAccessor provides a uniform interface to extract scaling-relevant
// information from any supported scale target kind (Deployment, LeaderWorkerSet).
type ScaleTargetAccessor interface {
	// GetReplicas returns current spec replicas.
	GetReplicas() *int32

	// GetStatusReplicas returns status replicas (actual running).
	GetStatusReplicas() int32

	// GetLeaderPodTemplateSpec returns the pod template for the leader/primary pod.
	// For Deployment: the single pod template.
	// For LWS: the leader template (falls back to worker template if not set).
	// Use this for: vLLM args extraction (leader starts the API server),
	// metrics port discovery, pod label matching.
	GetLeaderPodTemplateSpec() corev1.PodTemplateSpec

	// GetWorkerPodTemplateSpec returns the pod template for worker pods.
	// For Deployment: same as GetLeaderPodTemplateSpec() (single template).
	// For LWS: the worker template.
	// Use this for: GPU resource extraction when workers differ from leader.
	GetWorkerPodTemplateSpec() corev1.PodTemplateSpec

	// GetTotalGPUsPerReplica returns total GPU count across all pods in a replica.
	// For Deployment: GPUs from the single pod template.
	// For LWS: leader_GPUs + (Size - 1) * worker_GPUs.
	GetTotalGPUsPerReplica() int

	// GetGroupSize returns the number of pods per replica.
	// For Deployment: always 1.
	// For LWS: spec.leaderWorkerTemplate.size (1 leader + N-1 workers).
	GetGroupSize() int32

	// GetObject returns the underlying client.Object for K8s operations.
	GetObject() client.Object
}
```
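To make the interface's semantics concrete, here is a compile-checked toy model of the two planned implementations. `PodTemplate` and the trimmed method set are stand-ins for the real `corev1`/`client` types, so this sketch builds without k8s dependencies; it is not the actual `DeploymentAccessor`/`LWSAccessor` code:

```go
package main

import "fmt"

// PodTemplate is a simplified stand-in for corev1.PodTemplateSpec,
// reduced to the one field this sketch needs.
type PodTemplate struct{ GPUsPerPod int }

// scaleTargetAccessor mirrors the proposed interface, trimmed of the
// replica and client.Object plumbing.
type scaleTargetAccessor interface {
	GetLeaderPodTemplateSpec() PodTemplate
	GetWorkerPodTemplateSpec() PodTemplate
	GetTotalGPUsPerReplica() int
	GetGroupSize() int32
}

// deploymentAccessor: one pod per replica, a single template.
type deploymentAccessor struct{ template PodTemplate }

func (d deploymentAccessor) GetLeaderPodTemplateSpec() PodTemplate { return d.template }
func (d deploymentAccessor) GetWorkerPodTemplateSpec() PodTemplate { return d.template }
func (d deploymentAccessor) GetGroupSize() int32                   { return 1 }
func (d deploymentAccessor) GetTotalGPUsPerReplica() int           { return d.template.GPUsPerPod }

// lwsAccessor: size pods per group; leaderTemplate may be nil, in which
// case it falls back to the worker template (as in the LWS API).
type lwsAccessor struct {
	leaderTemplate *PodTemplate
	workerTemplate PodTemplate
	size           int32
}

func (l lwsAccessor) GetLeaderPodTemplateSpec() PodTemplate {
	if l.leaderTemplate != nil {
		return *l.leaderTemplate
	}
	return l.workerTemplate
}
func (l lwsAccessor) GetWorkerPodTemplateSpec() PodTemplate { return l.workerTemplate }
func (l lwsAccessor) GetGroupSize() int32                   { return l.size }
func (l lwsAccessor) GetTotalGPUsPerReplica() int {
	return l.GetLeaderPodTemplateSpec().GPUsPerPod + int(l.size-1)*l.workerTemplate.GPUsPerPod
}

func main() {
	// Callers only see the interface, so engine code stays kind-agnostic.
	var a scaleTargetAccessor = lwsAccessor{workerTemplate: PodTemplate{GPUsPerPod: 8}, size: 4}
	fmt.Println(a.GetTotalGPUsPerReplica(), a.GetGroupSize()) // → 32 4
}
```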
Changes Required
1. Add LWS API dependency
```shell
go get sigs.k8s.io/lws@latest
```
Register LWS scheme in `cmd/main.go`:
```go
import lwsv1 "sigs.k8s.io/lws/api/leaderworkerset/v1"

func init() {
	utilruntime.Must(lwsv1.AddToScheme(scheme))
}
```
2. Create ScaleTargetAccessor package
New package: `internal/utils/scaletarget/`

| File | Contents |
|------|----------|
| `accessor.go` | `ScaleTargetAccessor` interface definition |
| `deployment.go` | `DeploymentAccessor` implementation |
| `lws.go` | `LWSAccessor` implementation |
| `fetch.go` | `FetchScaleTarget()` factory function |
| `accessor_test.go` | Unit tests for both implementations |
3. Refactor callers to use ScaleTargetAccessor
| Before | After |
|--------|-------|
| `getDeploymentGPUsPerReplica(deploy)` | `accessor.GetTotalGPUsPerReplica()` |
| `ParseVLLMArgs(deploy *appsv1.Deployment)` | `ParseVLLMArgs(podTemplate corev1.PodTemplateSpec)` — use `accessor.GetLeaderPodTemplateSpec()` (leader runs the vLLM API server with `--tensor-parallel-size`, `--model`, etc.) |
| `GetCurrentDeploymentReplicas(va)` | `accessor.GetStatusReplicas()` |
| `deployments map[string]*appsv1.Deployment` | `scaleTargets map[string]ScaleTargetAccessor` |
| `utils.GetDeploymentWithBackoff(...)` | `scaletarget.FetchScaleTarget(ctx, c, kind, name, ns)` |
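A minimal sketch of how a `FetchScaleTarget`-style factory could dispatch on Kind. The trimmed `accessor` interface and the hardcoded group size are stand-ins; the real function would take the context, client, name, and namespace, fetch the object, and build the accessor from it:

```go
package main

import "fmt"

// accessor is a trimmed stand-in for the proposed ScaleTargetAccessor.
type accessor interface{ GetGroupSize() int32 }

type deploymentAccessor struct{}

func (deploymentAccessor) GetGroupSize() int32 { return 1 }

type lwsAccessor struct{ size int32 }

func (l lwsAccessor) GetGroupSize() int32 { return l.size }

// fetchScaleTarget chooses an implementation by Kind and rejects
// anything it does not support with a clear error.
func fetchScaleTarget(kind string) (accessor, error) {
	switch kind {
	case "Deployment", "": // empty Kind keeps today's Deployment default
		return deploymentAccessor{}, nil
	case "LeaderWorkerSet":
		// size would come from the fetched object's leaderWorkerTemplate.size
		return lwsAccessor{size: 4}, nil
	default:
		return nil, fmt.Errorf("unsupported scale target kind %q", kind)
	}
}

func main() {
	a, _ := fetchScaleTarget("LeaderWorkerSet")
	fmt.Println(a.GetGroupSize()) // → 4
	if _, err := fetchScaleTarget("DaemonSet"); err != nil {
		fmt.Println(err)
	}
}
```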
4. Fix pod ownership chain
`internal/collector/source/pod_va_mapper.go:88`:
```go
// Before:
if rsOwner.Kind != "Deployment" {
	return ""
}

// After: support multiple ownership chains
// Deployment: Pod → ReplicaSet → Deployment
// LWS leader: Pod → StatefulSet → LeaderWorkerSet
// LWS worker: Pod → LeaderWorkerSet (direct)
```
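The multi-chain resolution could look like the following sketch. `ownerRef` and `scaleTargetKindForPod` are simplified illustrations; the real code would walk `metav1.OwnerReference`s, fetching the intermediate object to read its owner:

```go
package main

import "fmt"

// ownerRef is a simplified stand-in for metav1.OwnerReference.
type ownerRef struct{ Kind string }

// scaleTargetKindForPod resolves which scale-target kind a pod belongs to,
// given its immediate owner and, when that owner is an intermediate object
// (ReplicaSet or StatefulSet), the owner's own owner.
func scaleTargetKindForPod(podOwner ownerRef, ownerOwner *ownerRef) string {
	switch podOwner.Kind {
	case "ReplicaSet": // Deployment chain: Pod → ReplicaSet → Deployment
		if ownerOwner != nil && ownerOwner.Kind == "Deployment" {
			return "Deployment"
		}
	case "StatefulSet": // LWS leader chain: Pod → StatefulSet → LeaderWorkerSet
		if ownerOwner != nil && ownerOwner.Kind == "LeaderWorkerSet" {
			return "LeaderWorkerSet"
		}
	case "LeaderWorkerSet": // LWS worker chain: Pod → LeaderWorkerSet
		return "LeaderWorkerSet"
	}
	return "" // unsupported owner chain
}

func main() {
	fmt.Println(scaleTargetKindForPod(ownerRef{"ReplicaSet"}, &ownerRef{"Deployment"})) // → Deployment
	fmt.Println(scaleTargetKindForPod(ownerRef{"LeaderWorkerSet"}, nil))                // → LeaderWorkerSet
}
```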
5. Update indexer
`internal/controller/indexers/indexers.go`:
```go
switch ref.Kind {
case "Deployment":
	ref.APIVersion = "apps/v1"
case "LeaderWorkerSet":
	ref.APIVersion = "leaderworkerset.x-k8s.io/v1"
default:
	ref.APIVersion = "apps/v1"
}
```
6. Update deployment_parser.go
Change `ParseVLLMArgs` to accept `corev1.PodTemplateSpec` instead of `*appsv1.Deployment`:
```go
// Before:
func ParseVLLMArgs(deploy *appsv1.Deployment) VLLMEngineParams {
	for _, container := range deploy.Spec.Template.Spec.Containers { ... }
}

// After:
func ParseVLLMArgs(podTemplate corev1.PodTemplateSpec) VLLMEngineParams {
	for _, container := range podTemplate.Spec.Containers { ... }
}
```
Callers use `accessor.GetLeaderPodTemplateSpec()` to provide the template, because the leader pod starts the vLLM API server with `--tensor-parallel-size`, `--model`, `--max-num-seqs`, and other engine parameters. Workers join as Ray worker nodes and inherit their configuration from the leader.
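As a sketch of the per-flag extraction such a parser performs, assuming vLLM's usual `--flag value` and `--flag=value` forms (`flagValue` is a hypothetical helper, not the parser's real API):

```go
package main

import (
	"fmt"
	"strings"
)

// flagValue extracts the value of a CLI flag from a container's args,
// handling both "--flag value" and "--flag=value" forms. It returns the
// value and whether the flag was present.
func flagValue(args []string, flag string) (string, bool) {
	for i, a := range args {
		if a == flag && i+1 < len(args) {
			return args[i+1], true
		}
		if strings.HasPrefix(a, flag+"=") {
			return strings.TrimPrefix(a, flag+"="), true
		}
	}
	return "", false
}

func main() {
	// Args as they might appear on the leader container of an LWS group.
	args := []string{"--model", "meta-llama/Llama-3-70B", "--tensor-parallel-size=8"}
	tp, _ := flagValue(args, "--tensor-parallel-size")
	model, _ := flagValue(args, "--model")
	fmt.Println(model, tp) // → meta-llama/Llama-3-70B 8
}
```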
Example VA for LWS
```yaml
apiVersion: llmd.ai/v1alpha1
kind: VariantAutoscaling
metadata:
  name: llama-70b-tp8
  labels:
    inference.optimization/acceleratorName: "H100"
spec:
  scaleTargetRef:
    kind: LeaderWorkerSet
    name: llama-70b-tp8-lws
    apiVersion: leaderworkerset.x-k8s.io/v1
  modelID: "llama-70b"
  variantCost: "80.0"
```
Backward Compatibility
- Existing VAs with `kind: Deployment` work unchanged — `DeploymentAccessor` preserves current behavior
- The CRD `ScaleTargetRef` already accepts any Kind; no schema change needed
- The refactor is purely internal; no user-visible API changes
- LWS scheme registration is additive (does not affect Deployment handling)
- If LWS CRDs are not installed in the cluster, VAs with `kind: LeaderWorkerSet` will fail at fetch time with a clear error