Commit 00d9dca (parent 5a6e409)

Cleanup doc

Signed-off-by: Lionel Villard <villard@us.ibm.com>

24 files changed: +207 −1981 lines

.gitignore

Lines changed: 3 additions & 0 deletions

```diff
@@ -35,3 +35,6 @@ llmd-infra/
 
 *.tgz
 actionlint
+
+# AI
+.claude
```

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -56,7 +56,7 @@ The repository uses AI-powered workflows to automate repetitive tasks:
 - **Workflow Creation**: Interactive designer for new workflows
 - **Workflow Debugging**: Assists with troubleshooting
 
-Learn more in the [Agentic Workflows Guide](docs/developer-guide/agentic-workflows.md).
+Learn more in the [Developer Guide](docs/developer-guide/development.md).
 
 ## WVA Project Structure
 
```

README.md

Lines changed: 3 additions & 10 deletions

```diff
@@ -8,7 +8,7 @@ The Workload Variant Autoscaler (WVA) is a Kubernetes-based global autoscaler fo
 
 ### What is a variant?
 
-In WVA, a **variant** is a way of serving a given model: a scale target (Deployment, StatefulSet, or LWS) with a particular combination of hardware, runtimes, and serving approach. Variants for the same model share the same base model (e.g. meta/llama-3.1-8b); LoRA adapters can differ per variant. Each variant is a distinct setup—e.g. different accelerators (A100, H100, L4), parallelism, or performance requirements. Create one `VariantAutoscaling` per variant; when several variants serve the same model, WVA chooses which to scale (e.g. add capacity on the cheapest variant, remove it from the most expensive). See [Configuration](docs/user-guide/configuration.md) and [Saturation Analyzer](docs/saturation-analyzer.md) for details.
+In WVA, a **variant** is a way of serving a given model: a scale target (Deployment, StatefulSet, or LWS) with a particular combination of hardware, runtimes, and serving approach. Variants for the same model share the same base model (e.g. meta/llama-3.1-8b); LoRA adapters can differ per variant. Each variant is a distinct setup—e.g. different accelerators (A100, H100, L4), parallelism, or performance requirements. Create one `VariantAutoscaling` per variant; when several variants serve the same model, WVA chooses which to scale (e.g. add capacity on the cheapest variant, remove it from the most expensive). See [Configuration](docs/user-guide/configuration.md) and [Saturation Analyzer](docs/user-guide/saturation-analyzer.md) for details.
 
 <!--
 <![Architecture](docs/design/diagrams/inferno-WVA-design.png)>
@@ -29,16 +29,9 @@ In WVA, a **variant** is a way of serving a given model: a scale target (Deploym
 - [CRD Reference](docs/user-guide/crd-reference.md)
 - [Multi-Controller Isolation](docs/user-guide/multi-controller-isolation.md)
 
-<!--
-
-### Tutorials
-- [Quick Start Demo](docs/tutorials/demo.md)
-- [Parameter Estimation](docs/tutorials/parameter-estimation.md)
-- [vLLM Server Setup](docs/tutorials/vllm-samples.md)
--->
 ### Integrations
-- [HPA Integration](docs/integrations/hpa-integration.md)
-- [KEDA Integration](docs/integrations/keda-integration.md)
+- [HPA Integration](docs/user-guide/hpa-integration.md)
+- [KEDA Integration](docs/user-guide/keda-integration.md)
 - [Prometheus Metrics](docs/integrations/prometheus.md)
 
 <!--
```
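To make the "one `VariantAutoscaling` per variant" rule above concrete, here is a hypothetical sketch of two variants serving the same base model on different accelerators (the resource names and the H100 label value are invented for illustration; the field layout follows the `config/samples` manifests added in this commit):

```yaml
# Hypothetical: two VariantAutoscaling objects, one per variant of the same model.
apiVersion: llmd.ai/v1alpha1
kind: VariantAutoscaling
metadata:
  name: llama-8b-a100          # invented name for the A100 variant
  namespace: llm-d-sim
  labels:
    inference.optimization/acceleratorName: A100
spec:
  scaleTargetRef:
    kind: Deployment
    name: llama-8b-a100
  modelID: meta/llama-3.1-8b
---
apiVersion: llmd.ai/v1alpha1
kind: VariantAutoscaling
metadata:
  name: llama-8b-h100          # invented name for the H100 variant
  namespace: llm-d-sim
  labels:
    inference.optimization/acceleratorName: H100
spec:
  scaleTargetRef:
    kind: Deployment
    name: llama-8b-h100
  modelID: meta/llama-3.1-8b
```

Because both objects share the same `modelID`, WVA can decide which of the two variants to grow or shrink when demand for the model changes.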

charts/workload-variant-autoscaler/README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -248,7 +248,7 @@ HPA_STABILIZATION_SECONDS=120 ./deploy/install.sh
 - **Development**: Use 30-60 seconds for faster iteration
 - **E2E Tests**: Use 30 seconds for rapid validation
 
-See [HPA Integration Guide](../../docs/integrations/hpa-integration.md) for detailed information.
+See [HPA Integration Guide](../../docs/user-guide/hpa-integration.md) for detailed information.
 
 ### Usage Examples
 
```
Lines changed: 7 additions & 0 deletions

```diff
@@ -0,0 +1,7 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+metadata:
+  name: hpa-sample
+resources:
+- va.yaml
+- hpa.yaml
```
config/samples/hpa/va.yaml

Lines changed: 14 additions & 0 deletions

```diff
@@ -0,0 +1,14 @@
+# Example VariantAutoscaling for HPA/KEDA integration.
+# Ensure a Deployment named sample-deployment exists in llm-d-sim (e.g. from kind-emulator or e2e).
+apiVersion: llmd.ai/v1alpha1
+kind: VariantAutoscaling
+metadata:
+  name: sample-deployment
+  namespace: llm-d-sim
+  labels:
+    inference.optimization/acceleratorName: A100
+spec:
+  scaleTargetRef:
+    kind: Deployment
+    name: sample-deployment
+  modelID: default/default
```
Lines changed: 7 additions & 0 deletions

```diff
@@ -0,0 +1,7 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+metadata:
+  name: keda-sample
+resources:
+- va.yaml
+- scaledobject.yaml
```

config/samples/keda/va.yaml

Lines changed: 14 additions & 0 deletions

```diff
@@ -0,0 +1,14 @@
+# Example VariantAutoscaling for HPA/KEDA integration.
+# Ensure a Deployment named sample-deployment exists in llm-d-sim (e.g. from kind-emulator or e2e).
+apiVersion: llmd.ai/v1alpha1
+kind: VariantAutoscaling
+metadata:
+  name: sample-deployment
+  namespace: llm-d-sim
+  labels:
+    inference.optimization/acceleratorName: A100
+spec:
+  scaleTargetRef:
+    kind: Deployment
+    name: sample-deployment
+  modelID: default/default
```
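The sample kustomizations above can be exercised with kustomize-aware kubectl. A sketch, assuming the VariantAutoscaling CRD is installed and the `llm-d-sim` namespace and `sample-deployment` Deployment exist (the paths follow the `config/samples` layout added here):

```shell
# Render the HPA sample to stdout without applying it, for inspection.
kubectl kustomize config/samples/hpa

# Apply the HPA sample (VariantAutoscaling + HorizontalPodAutoscaler).
kubectl apply -k config/samples/hpa

# Or the KEDA sample (VariantAutoscaling + ScaledObject).
kubectl apply -k config/samples/keda
```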
