**README.md**

Grove introduces four simple concepts:
| [PodGang](scheduler/api/core/v1alpha1/podgang.go) | The scheduler API that defines a unit of gang-scheduling. A PodGang is a collection of groups of similar pods, where each pod group defines a minimum number of replicas guaranteed for gang-scheduling. |

Get started with a step-by-step hands-on Grove tutorial here
**→ [Core Concepts Overview](docs/user-guide/01_core-concepts/01_overview.md)**

Refer to all Grove APIs here
**→ [API Reference](docs/api-reference/operator-api.md)**
**docs/quickstart.md**

Only the Grove operator pod should remain.

Now that you understand the basics, explore:

- **[Installation Guide](installation.md)** - Learn more about local and remote cluster deployment
- **[Core Concepts Tutorial](user-guide/01_core-concepts/01_overview.md)** - Step-by-step hands-on tutorial on Grove application development
- **[API Reference](api-reference/operator-api.md)** - Deep dive into all configuration options
- **[Samples](../operator/samples/)** - Explore more examples

Grove provides three levels of scaling to match different operational needs:

- **Scale PodClique replicas** (`kubectl scale pclq ...`) - Adjust the number of pods in a specific role. Use this for fine-tuning individual components (e.g., add more workers to an existing leader-worker group).
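
As a concrete sketch (the PodClique name below is hypothetical; run `kubectl get pclq` to see the names generated in your cluster):

```bash
# Scale a single role without touching the rest of the system.
kubectl scale pclq single-node-aggregated-0-model-worker --replicas=3

# Verify the new replica count.
kubectl get pclq
```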

In the [next guide](./02_pcs_and_pclq_intro.md) we go through some examples showcasing PodCliqueSet and PodClique.

In this guide we go over some hands-on examples showcasing how to use a PodCliqueSet and PodCliques.

Refer to [Overview](./01_overview.md) for instructions on how to run the examples in this guide.

## Example 1: Single-Node Aggregated Inference

```yaml
# ...
            containers:
              - name: model-worker
                image: nginx:latest
                command: ["/bin/sh"]
                args: ["-c", "echo 'Model Worker (Aggregated) on node:' && hostname && sleep infinity"]
                resources:
                  requests:
                    cpu: "10m"
                    memory: "32Mi"
```

### **Key Points:**

### **Deploy:**

In this example, we will deploy the file: [single-node-aggregated.yaml](../../../operator/samples/user-guide/01_core-concepts/single-node-aggregated.yaml)
```bash
# NOTE: Run the following commands from the `/path/to/grove/operator` directory,
# where `/path/to/grove` is the root of your cloned Grove repository.
kubectl apply -f samples/user-guide/01_core-concepts/single-node-aggregated.yaml
kubectl get pods -l app.kubernetes.io/part-of=single-node-aggregated -o wide
```
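
If the pods are still starting, you can stream updates until every replica is `Running` (standard kubectl; press `Ctrl-C` to stop watching):

```bash
kubectl get pods -l app.kubernetes.io/part-of=single-node-aggregated -w
```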

## Example 2: Single-Node Disaggregated Inference

```yaml
# ...
            containers:
              - name: prefill
                image: nginx:latest
                command: ["/bin/sh"]
                args: ["-c", "echo 'Prefill Worker on node:' && hostname && sleep infinity"]
                resources:
                  requests:
                    cpu: "10m"
                    memory: "32Mi"
      - name: decode
        spec:
          roleName: decode
          # ...
              - name: decode
                image: nginx:latest
                command: ["/bin/sh"]
                args: ["-c", "echo 'Decode Worker on node:' && hostname && sleep infinity"]
                resources:
                  requests:
                    cpu: "10m"
                    memory: "32Mi"
```

### **Key Points:**

### **Deploy**

In this example, we will deploy the file: [single-node-disaggregated.yaml](../../../operator/samples/user-guide/01_core-concepts/single-node-disaggregated.yaml)
```bash
# NOTE: Run the following commands from the `/path/to/grove/operator` directory,
# where `/path/to/grove` is the root of your cloned Grove repository.
kubectl apply -f samples/user-guide/01_core-concepts/single-node-disaggregated.yaml
kubectl get pods -l app.kubernetes.io/part-of=single-node-disaggregated -o wide
```

Expand All @@ -193,7 +193,7 @@ You can scale the `prefill` and `decode` PodCliques the same way the [`model-wor

Additionally, the `single-node-disaggregated` PodCliqueSet can be scaled the same way the `single-node-aggregated` PodCliqueSet was scaled in the previous example. When a PodCliqueSet is scaled, all of its constituent PodCliques are replicated, which is why scaling a PodCliqueSet should be treated as scaling the entire system (useful for canary deployments, A/B testing, or high availability across zones):
```bash
kubectl scale pcs single-node-disaggregated --replicas=2
```
After running this, you will observe that a second copy of every constituent PodClique is created for the new PodCliqueSet replica.

To teardown the example, delete the `single-node-disaggregated` PodCliqueSet; the operator will clean up all of its constituent resources:
```bash
kubectl delete pcs single-node-disaggregated
```
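
Once deletion completes, the pod listing for this example should come back empty (illustrative check):

```bash
# Expect no pods once the operator finishes cleaning up.
kubectl get pods -l app.kubernetes.io/part-of=single-node-disaggregated
```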

In the [next guide](./03_pcsg_intro.md) we showcase how to use PodCliqueScalingGroup to represent multi-node components.
# PodCliqueScalingGroup

In the [previous guide](./02_pcs_and_pclq_intro.md) we covered some hands-on examples of how to use PodCliqueSet and PodCliques. In this guide we go over some hands-on examples of how to use PodCliqueScalingGroup to represent multi-node components.

Refer to [Overview](./01_overview.md) for instructions on how to run the examples in this guide.

## Example 3: Multi-Node Aggregated Inference

```yaml
# ...
            containers:
              - name: model-leader
                image: nginx:latest
                command: ["/bin/sh"]
                args: ["-c", "echo 'Model Leader (Aggregated) on node:' && hostname && sleep infinity"]
                resources:
                  requests:
                    cpu: "10m"
                    memory: "32Mi"
      - name: worker
        spec:
          roleName: worker
          # ...
              - name: model-worker
                image: nginx:latest
                command: ["/bin/sh"]
                args: ["-c", "echo 'Model Worker (Aggregated) on node:' && hostname && sleep infinity"]
                resources:
                  requests:
                    cpu: "10m"
                    memory: "32Mi"
    podCliqueScalingGroups:
      - name: model-instance
        cliqueNames: [leader, worker]
```

### **Deploy:**

In this example, we will deploy the file: [multi-node-aggregated.yaml](../../../operator/samples/user-guide/01_core-concepts/multi-node-aggregated.yaml)
```bash
# NOTE: Run the following commands from the `/path/to/grove/operator` directory,
# where `/path/to/grove` is the root of your cloned Grove repository.
kubectl apply -f samples/user-guide/01_core-concepts/multi-node-aggregated.yaml
kubectl get pods -l app.kubernetes.io/part-of=multinode-aggregated -o wide
```
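
Each `model-instance` replica groups the leader and worker cliques, so their pods are intended to land together as one gang. Sorting the listing by node makes the placement easier to inspect (plain kubectl, illustrative only):

```bash
kubectl get pods -l app.kubernetes.io/part-of=multinode-aggregated -o wide \
  --sort-by=.spec.nodeName
```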

## Example 4: Multi-Node Disaggregated Inference

```yaml
# ...
            containers:
              - name: prefill-leader
                image: nginx:latest
                command: ["/bin/sh"]
                args: ["-c", "echo 'Prefill Leader on node:' && hostname && sleep infinity"]
                resources:
                  requests:
                    cpu: "10m"
                    memory: "32Mi"
      - name: pworker
        spec:
          roleName: pworker
          # ...
              - name: prefill-worker
                image: nginx:latest
                command: ["/bin/sh"]
                args: ["-c", "echo 'Prefill Worker on node:' && hostname && sleep infinity"]
                resources:
                  requests:
                    cpu: "10m"
                    memory: "32Mi"
      - name: dleader
        spec:
          roleName: dleader
          # ...
              - name: decode-leader
                image: nginx:latest
                command: ["/bin/sh"]
                args: ["-c", "echo 'Decode Leader on node:' && hostname && sleep infinity"]
                resources:
                  requests:
                    cpu: "10m"
                    memory: "32Mi"
      - name: dworker
        spec:
          roleName: dworker
          # ...
              - name: decode-worker
                image: nginx:latest
                command: ["/bin/sh"]
                args: ["-c", "echo 'Decode Worker on node:' && hostname && sleep infinity"]
                resources:
                  requests:
                    cpu: "10m"
                    memory: "32Mi"
    podCliqueScalingGroups:
      - name: prefill
        cliqueNames: [pleader, pworker]
      # ...
```

### **Deploy**

In this example, we will deploy the file: [multi-node-disaggregated.yaml](../../../operator/samples/user-guide/01_core-concepts/multi-node-disaggregated.yaml)
```bash
# NOTE: Run the following commands from the `/path/to/grove/operator` directory,
# where `/path/to/grove` is the root of your cloned Grove repository.
kubectl apply -f samples/user-guide/01_core-concepts/multi-node-disaggregated.yaml
kubectl get pods -l app.kubernetes.io/part-of=multinode-disaggregated -o wide
```
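
To scale out an entire prefill group (leader plus workers together), scale its PodCliqueScalingGroup rather than the individual PodCliques. This is a hypothetical sketch: the `pcsg` short name and the generated name `multinode-disaggregated-0-prefill` are assumptions, so verify the actual resource and object names in your cluster first:

```bash
# Hypothetical names; confirm with `kubectl api-resources` before running.
kubectl scale pcsg multinode-disaggregated-0-prefill --replicas=2
kubectl get pods -l app.kubernetes.io/part-of=multinode-disaggregated -o wide
```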

To teardown the example, delete the `multinode-disaggregated` PodCliqueSet; the operator will clean up all of its constituent resources:
```bash
kubectl delete pcs multinode-disaggregated
```
In the [next guide](./04_takeaways.md) we showcase how Grove can represent an arbitrary number of components and summarize the key takeaways.
# Takeaways

Refer to [Overview](./01_overview.md) for instructions on how to run the examples in this guide.

## Example 5: Complete Inference Pipeline

The [previous examples](./03_pcsg_intro.md) focused on mapping various inference workloads onto Grove primitives, centering on the model instances. The primitives are generic, however, and Grove lets you represent as many components as you'd like. To illustrate this, we now add components such as a frontend and a vision encoder: you simply add more PodCliques and PodCliqueScalingGroups to the PodCliqueSet.

```yaml
apiVersion: grove.io/v1alpha1
# ...
            containers:
              - name: frontend
                image: nginx:latest
                command: ["/bin/sh"]
                args: ["-c", "echo 'Frontend Service on node:' && hostname && sleep infinity"]
                resources:
                  requests:
                    cpu: "10m"
                    memory: "32Mi"
      - name: vision-encoder
        spec:
          roleName: vision-encoder
          # ...
              - name: vision-encoder
                image: nginx:latest
                command: ["/bin/sh"]
                args: ["-c", "echo 'Vision Encoder on node:' && hostname && sleep infinity"]
                resources:
                  requests:
                    cpu: "10m"
                    memory: "32Mi"
      # Multi-node components
      - name: pleader
        spec:
          # ...
              - name: prefill-leader
                image: nginx:latest
                command: ["/bin/sh"]
                args: ["-c", "echo 'Prefill Leader on node:' && hostname && sleep infinity"]
                resources:
                  requests:
                    cpu: "10m"
                    memory: "32Mi"
      - name: pworker
        spec:
          roleName: pworker
          # ...
              - name: prefill-worker
                image: nginx:latest
                command: ["/bin/sh"]
                args: ["-c", "echo 'Prefill Worker on node:' && hostname && sleep infinity"]
                resources:
                  requests:
                    cpu: "10m"
                    memory: "32Mi"
      - name: dleader
        spec:
          roleName: dleader
          # ...
              - name: decode-leader
                image: nginx:latest
                command: ["/bin/sh"]
                args: ["-c", "echo 'Decode Leader on node:' && hostname && sleep infinity"]
                resources:
                  requests:
                    cpu: "10m"
                    memory: "32Mi"
      - name: dworker
        spec:
          roleName: dworker
          # ...
              - name: decode-worker
                image: nginx:latest
                command: ["/bin/sh"]
                args: ["-c", "echo 'Decode Worker on node:' && hostname && sleep infinity"]
                resources:
                  requests:
                    cpu: "10m"
                    memory: "32Mi"
    podCliqueScalingGroups:
      - name: prefill
        cliqueNames: [pleader, pworker]
      # ...
```

**Deploy and explore:**

In this example, we will deploy the file: [complete-inference-pipeline.yaml](../../../operator/samples/user-guide/01_core-concepts/complete-inference-pipeline.yaml)
```bash
# NOTE: Run the following commands from the `/path/to/grove/operator` directory,
# where `/path/to/grove` is the root of your cloned Grove repository.
kubectl apply -f samples/user-guide/01_core-concepts/complete-inference-pipeline.yaml
kubectl get pods -l app.kubernetes.io/part-of=comp-inf-ppln -o wide
```
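
To see how the pipeline decomposes into Grove resources, list them by kind (`pcs` and `pclq` are the short names used elsewhere in this guide; the full plural name for PodCliqueScalingGroup below is an assumption):

```bash
kubectl get pcs
kubectl get pclq
kubectl get podcliquescalinggroups
```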

# Pod and Resource Naming Conventions

This section explains Grove's hierarchical naming scheme for pods and resources. Grove's naming convention is designed to be **self-documenting**: when you run `kubectl get pods`, the pod names immediately tell you which PodCliqueSet, PodCliqueScalingGroup (if applicable), and PodClique each pod belongs to.
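
As a quick illustration (generic kubectl; the exact names you see will follow the patterns described in the guides below):

```bash
# Pod names encode the owning PodCliqueSet, PodCliqueScalingGroup, and PodClique.
kubectl get pods -o wide

# Cross-reference the Grove resources that own them.
kubectl get pcs
kubectl get pclq
```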

## Prerequisites

Before starting this section:
- Review the [core concepts tutorial](../01_core-concepts/01_overview.md) to understand Grove's primitives
- Set up a cluster following the [installation guide](../../installation.md); the two options are:
  - [A local KIND demo cluster](../../installation.md#local-kind-cluster-set-up): create the cluster with `make kind-up FAKE_NODES=40`, set the `KUBECONFIG` environment variable as directed, and run `make deploy`
  - [A remote Kubernetes cluster](../../installation.md#remote-cluster-set-up) with [Grove installed from package](../../installation.md#install-grove-from-package)

## Guides in This Section

1. **[Naming Conventions](./02_naming-conventions.md)**: Learn the naming patterns, best practices, and how to plan names for your resources.

2. **[Hands-On Example](./03_hands-on-example.md)**: Deploy an example system with the structure of a multi-node disaggregated inference system and observe the naming hierarchy in action.