Commit 00d9dca (parent 5a6e409)

Cleanup doc

Signed-off-by: Lionel Villard <villard@us.ibm.com>

24 files changed: +207 −1981 lines

.gitignore

Lines changed: 3 additions & 0 deletions

```diff
@@ -35,3 +35,6 @@ llmd-infra/
 
 *.tgz
 actionlint
+
+# AI
+.claude
```

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -56,7 +56,7 @@ The repository uses AI-powered workflows to automate repetitive tasks:
 - **Workflow Creation**: Interactive designer for new workflows
 - **Workflow Debugging**: Assists with troubleshooting
 
-Learn more in the [Agentic Workflows Guide](docs/developer-guide/agentic-workflows.md).
+Learn more in the [Developer Guide](docs/developer-guide/development.md).
 
 ## WVA Project Structure
 
```

README.md

Lines changed: 3 additions & 10 deletions

```diff
@@ -8,7 +8,7 @@ The Workload Variant Autoscaler (WVA) is a Kubernetes-based global autoscaler fo
 
 ### What is a variant?
 
-In WVA, a **variant** is a way of serving a given model: a scale target (Deployment, StatefulSet, or LWS) with a particular combination of hardware, runtimes, and serving approach. Variants for the same model share the same base model (e.g. meta/llama-3.1-8b); LoRA adapters can differ per variant. Each variant is a distinct setup—e.g. different accelerators (A100, H100, L4), parallelism, or performance requirements. Create one `VariantAutoscaling` per variant; when several variants serve the same model, WVA chooses which to scale (e.g. add capacity on the cheapest variant, remove it from the most expensive). See [Configuration](docs/user-guide/configuration.md) and [Saturation Analyzer](docs/saturation-analyzer.md) for details.
+In WVA, a **variant** is a way of serving a given model: a scale target (Deployment, StatefulSet, or LWS) with a particular combination of hardware, runtimes, and serving approach. Variants for the same model share the same base model (e.g. meta/llama-3.1-8b); LoRA adapters can differ per variant. Each variant is a distinct setup—e.g. different accelerators (A100, H100, L4), parallelism, or performance requirements. Create one `VariantAutoscaling` per variant; when several variants serve the same model, WVA chooses which to scale (e.g. add capacity on the cheapest variant, remove it from the most expensive). See [Configuration](docs/user-guide/configuration.md) and [Saturation Analyzer](docs/user-guide/saturation-analyzer.md) for details.
 
 <!--
 <![Architecture](docs/design/diagrams/inferno-WVA-design.png)>
@@ -29,16 +29,9 @@ In WVA, a **variant** is a way of serving a given model: a scale target (Deploym
 - [CRD Reference](docs/user-guide/crd-reference.md)
 - [Multi-Controller Isolation](docs/user-guide/multi-controller-isolation.md)
 
-<!--
-
-### Tutorials
-- [Quick Start Demo](docs/tutorials/demo.md)
-- [Parameter Estimation](docs/tutorials/parameter-estimation.md)
-- [vLLM Server Setup](docs/tutorials/vllm-samples.md)
--->
 ### Integrations
-- [HPA Integration](docs/integrations/hpa-integration.md)
-- [KEDA Integration](docs/integrations/keda-integration.md)
+- [HPA Integration](docs/user-guide/hpa-integration.md)
+- [KEDA Integration](docs/user-guide/keda-integration.md)
 - [Prometheus Metrics](docs/integrations/prometheus.md)
 
 <!--
```
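To make the "one `VariantAutoscaling` per variant" rule above concrete, here is a hypothetical sketch of two variants serving the same base model on different accelerators (the resource names and the H100 label value are invented for illustration; the field layout follows the `config/samples` manifests added in this commit):

```yaml
# Hypothetical: two VariantAutoscaling objects, one per variant of the same model.
apiVersion: llmd.ai/v1alpha1
kind: VariantAutoscaling
metadata:
  name: llama-8b-a100          # invented name for the A100 variant
  namespace: llm-d-sim
  labels:
    inference.optimization/acceleratorName: A100
spec:
  scaleTargetRef:
    kind: Deployment
    name: llama-8b-a100
  modelID: meta/llama-3.1-8b
---
apiVersion: llmd.ai/v1alpha1
kind: VariantAutoscaling
metadata:
  name: llama-8b-h100          # invented name for the H100 variant
  namespace: llm-d-sim
  labels:
    inference.optimization/acceleratorName: H100
spec:
  scaleTargetRef:
    kind: Deployment
    name: llama-8b-h100
  modelID: meta/llama-3.1-8b
```

Because both objects share the same `modelID`, WVA can decide which of the two variants to grow or shrink when demand for the model changes.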

charts/workload-variant-autoscaler/README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -248,7 +248,7 @@ HPA_STABILIZATION_SECONDS=120 ./deploy/install.sh
 - **Development**: Use 30-60 seconds for faster iteration
 - **E2E Tests**: Use 30 seconds for rapid validation
 
-See [HPA Integration Guide](../../docs/integrations/hpa-integration.md) for detailed information.
+See [HPA Integration Guide](../../docs/user-guide/hpa-integration.md) for detailed information.
 
 ### Usage Examples
 
```
Lines changed: 7 additions & 0 deletions

```diff
@@ -0,0 +1,7 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+metadata:
+  name: hpa-sample
+resources:
+- va.yaml
+- hpa.yaml
```
config/samples/hpa/va.yaml

Lines changed: 14 additions & 0 deletions

```diff
@@ -0,0 +1,14 @@
+# Example VariantAutoscaling for HPA/KEDA integration.
+# Ensure a Deployment named sample-deployment exists in llm-d-sim (e.g. from kind-emulator or e2e).
+apiVersion: llmd.ai/v1alpha1
+kind: VariantAutoscaling
+metadata:
+  name: sample-deployment
+  namespace: llm-d-sim
+  labels:
+    inference.optimization/acceleratorName: A100
+spec:
+  scaleTargetRef:
+    kind: Deployment
+    name: sample-deployment
+  modelID: default/default
```
Lines changed: 7 additions & 0 deletions

```diff
@@ -0,0 +1,7 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+metadata:
+  name: keda-sample
+resources:
+- va.yaml
+- scaledobject.yaml
```

config/samples/keda/va.yaml

Lines changed: 14 additions & 0 deletions

```diff
@@ -0,0 +1,14 @@
+# Example VariantAutoscaling for HPA/KEDA integration.
+# Ensure a Deployment named sample-deployment exists in llm-d-sim (e.g. from kind-emulator or e2e).
+apiVersion: llmd.ai/v1alpha1
+kind: VariantAutoscaling
+metadata:
+  name: sample-deployment
+  namespace: llm-d-sim
+  labels:
+    inference.optimization/acceleratorName: A100
+spec:
+  scaleTargetRef:
+    kind: Deployment
+    name: sample-deployment
+  modelID: default/default
```
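The sample kustomizations above can be exercised with kustomize-aware kubectl. A sketch, assuming the VariantAutoscaling CRD is installed and the `llm-d-sim` namespace and `sample-deployment` Deployment exist (the paths follow the `config/samples` layout added here):

```shell
# Render the HPA sample to stdout without applying it, for inspection.
kubectl kustomize config/samples/hpa

# Apply the HPA sample (VariantAutoscaling + HorizontalPodAutoscaler).
kubectl apply -k config/samples/hpa

# Or the KEDA sample (VariantAutoscaling + ScaledObject).
kubectl apply -k config/samples/keda
```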
