diff --git a/README.md b/README.md index 385419028..87e68b381 100644 --- a/README.md +++ b/README.md @@ -53,7 +53,7 @@ Grove introduces four simple concepts: | [PodGang](scheduler/api/core/v1alpha1/podgang.go) | The scheduler API that defines a unit of gang-scheduling. A PodGang is a collection of groups of similar pods, where each pod group defines a minimum number of replicas guaranteed for gang-scheduling. | Get started with a step-by-step hands-on Grove tutorial here -**→ [Core Concepts Overview](docs/user-guide/core-concepts/overview.md)** +**→ [Core Concepts Overview](docs/user-guide/01_core-concepts/01_overview.md)** Refer to all Grove APIs here **→ [API Reference](docs/api-reference/operator-api.md)** diff --git a/docs/quickstart.md b/docs/quickstart.md index 3d1928119..50d77c51c 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -186,7 +186,7 @@ Only the Grove operator pod should remain. Now that you understand the basics, explore: - **[Installation Guide](installation.md)** - Learn more about local and remote cluster deployment -- **[Core Concepts Tutorial](user-guide/core-concepts/overview.md)** - Step-by-step hands-on tutorial on Grove application development +- **[Core Concepts Tutorial](user-guide/01_core-concepts/01_overview.md)** - Step-by-step hands-on tutorial on Grove application development - **[API Reference](api-reference/operator-api.md)** - Deep dive into all configuration options - **[Samples](../operator/samples/)** - Explore more examples diff --git a/docs/user-guide/core-concepts/overview.md b/docs/user-guide/01_core-concepts/01_overview.md similarity index 96% rename from docs/user-guide/core-concepts/overview.md rename to docs/user-guide/01_core-concepts/01_overview.md index 0518d69a6..33ab83dc5 100644 --- a/docs/user-guide/core-concepts/overview.md +++ b/docs/user-guide/01_core-concepts/01_overview.md @@ -32,4 +32,4 @@ Grove provides three levels of scaling to match different operational needs: - **Scale PodClique replicas** (`kubectl scale pclq ...`) - Adjust the number of pods in a specific role. Use this for fine-tuning individual components (e.g., add more workers to an existing leader-worker group). -In the [next guide](./pcs_and_pclq_intro.md) we go through some examples showcasing PodCliqueSet and PodClique +In the [next guide](./02_pcs_and_pclq_intro.md) we go through some examples showcasing PodCliqueSet and PodClique diff --git a/docs/user-guide/core-concepts/pcs_and_pclq_intro.md b/docs/user-guide/01_core-concepts/02_pcs_and_pclq_intro.md similarity index 94% rename from docs/user-guide/core-concepts/pcs_and_pclq_intro.md rename to docs/user-guide/01_core-concepts/02_pcs_and_pclq_intro.md index 2ea37a19d..4522b3c91 100644 --- a/docs/user-guide/core-concepts/pcs_and_pclq_intro.md +++ b/docs/user-guide/01_core-concepts/02_pcs_and_pclq_intro.md @@ -2,7 +2,7 @@ In this guide we go over some hands-on examples showcasing how to use a PodCliqueSet and PodCliques. -Refer to [Overview](./overview.md) for instructions on how to run the examples in this guide. +Refer to [Overview](./01_overview.md) for instructions on how to run the examples in this guide. 
## Example 1: Single-Node Aggregated Inference @@ -31,11 +31,11 @@ spec: - name: model-worker image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Model Worker (Aggregated) on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Model Worker (Aggregated) on node:' && hostname && sleep infinity"] resources: requests: - cpu: "1" - memory: "2Gi" + cpu: "10m" + memory: "32Mi" ``` ### **Key Points:** @@ -46,11 +46,11 @@ spec: ### **Deploy:** -In this example, we will deploy the file: [single-node-aggregated.yaml](../../../operator/samples/user-guide/concept-overview/single-node-aggregated.yaml) +In this example, we will deploy the file: [single-node-aggregated.yaml](../../../operator/samples/user-guide/01_core-concepts/single-node-aggregated.yaml) ```bash # NOTE: Run the following commands from the `/path/to/grove/operator` directory, # where `/path/to/grove` is the root of your cloned Grove repository. -kubectl apply -f samples/user-guide/concept-overview/single-node-aggregated.yaml +kubectl apply -f samples/user-guide/01_core-concepts/single-node-aggregated.yaml kubectl get pods -l app.kubernetes.io/part-of=single-node-aggregated -o wide ``` @@ -135,11 +135,11 @@ spec: - name: prefill image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Prefill Worker on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Prefill Worker on node:' && hostname && sleep infinity"] resources: requests: - cpu: "2" - memory: "4Gi" + cpu: "10m" + memory: "32Mi" - name: decode spec: roleName: decode @@ -154,11 +154,11 @@ spec: - name: decode image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Decode Worker on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Decode Worker on node:' && hostname && sleep infinity"] resources: requests: - cpu: "1" - memory: "2Gi" + cpu: "10m" + memory: "32Mi" ``` ### **Key Points:** @@ -168,11 +168,11 @@ spec: ### **Deploy** -In this example, we will deploy the file: [single-node-disaggregated.yaml](../../../operator/samples/user-guide/concept-overview/single-node-disaggregated.yaml) +In this example, we will deploy the file: [single-node-disaggregated.yaml](../../../operator/samples/user-guide/01_core-concepts/single-node-disaggregated.yaml) ```bash # NOTE: Run the following commands from the `/path/to/grove/operator` directory, # where `/path/to/grove` is the root of your cloned Grove repository. -kubectl apply -f samples/user-guide/concept-overview/single-node-disaggregated.yaml +kubectl apply -f samples/user-guide/01_core-concepts/single-node-disaggregated.yaml kubectl get pods -l app.kubernetes.io/part-of=single-node-disaggregated -o wide ``` @@ -193,7 +193,7 @@ You can scale the `prefill` and `decode` PodCliques the same way the [`model-wor Additionally, the `single-node-disaggregated` PodCliqueSet can be scaled the same way the `single-node-aggregated` PodCliqueSet was scaled in the previous example. We show an example to demonstrate how when PodCliqueSets are scaled, all constituent PodCliques are replicated, underscoring why scaling PodCliqueSets should be treated as scaling the entire system (useful for canary deployments, A/B testing, or high availability across zones). 
```bash -kubectl scale pcs single-node-aggregated --replicas=2 +kubectl scale pcs single-node-disaggregated --replicas=2 ``` After running this you will observe ```bash @@ -219,4 +219,4 @@ To teardown the example delete the `single-node-disaggregated` PodCliqueSet, the kubectl delete pcs single-node-disaggregated ``` -In the [next guide](./pcsg_intro.md) we showcase how to use PodCliqueScalingGroup to represent multi-node components +In the [next guide](./03_pcsg_intro.md) we showcase how to use PodCliqueScalingGroup to represent multi-node components diff --git a/docs/user-guide/core-concepts/pcsg_intro.md b/docs/user-guide/01_core-concepts/03_pcsg_intro.md similarity index 93% rename from docs/user-guide/core-concepts/pcsg_intro.md rename to docs/user-guide/01_core-concepts/03_pcsg_intro.md index df73e471a..ccbfc305b 100644 --- a/docs/user-guide/core-concepts/pcsg_intro.md +++ b/docs/user-guide/01_core-concepts/03_pcsg_intro.md @@ -1,8 +1,8 @@ # PodCliqueScalingGroup -In the [previous guide](./pcs_and_pclq_intro.md) we covered some hands on examples on how to use PodCliqueSet and PodCliques. In this guide we go over some hands-on examples on how to use PodCliqueScalingGroup to represent multinode components. +In the [previous guide](./02_pcs_and_pclq_intro.md) we covered some hands on examples on how to use PodCliqueSet and PodCliques. In this guide we go over some hands-on examples on how to use PodCliqueScalingGroup to represent multinode components. -Refer to [Overview](./overview.md) for instructions on how to run the examples in this guide. +Refer to [Overview](./01_overview.md) for instructions on how to run the examples in this guide. ## Example 3: Multi-Node Aggregated Inference @@ -36,11 +36,11 @@ spec: - name: model-leader image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Model Leader (Aggregated) on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Model Leader (Aggregated) on node:' && hostname && sleep infinity"] resources: requests: - cpu: "2" - memory: "4Gi" + cpu: "10m" + memory: "32Mi" - name: worker spec: roleName: worker @@ -55,11 +55,11 @@ spec: - name: model-worker image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Model Worker (Aggregated) on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Model Worker (Aggregated) on node:' && hostname && sleep infinity"] resources: requests: - cpu: "4" - memory: "8Gi" + cpu: "10m" + memory: "32Mi" podCliqueScalingGroups: - name: model-instance cliqueNames: [leader, worker] @@ -74,11 +74,11 @@ spec: ### **Deploy:** -In this example, we will deploy the file: [multi-node-aggregated.yaml](../../../operator/samples/user-guide/concept-overview/multi-node-aggregated.yaml) +In this example, we will deploy the file: [multi-node-aggregated.yaml](../../../operator/samples/user-guide/01_core-concepts/multi-node-aggregated.yaml) ```bash # NOTE: Run the following commands from the `/path/to/grove/operator` directory, # where `/path/to/grove` is the root of your cloned Grove repository. 
-kubectl apply -f samples/user-guide/concept-overview/multi-node-aggregated.yaml +kubectl apply -f samples/user-guide/01_core-concepts/multi-node-aggregated.yaml kubectl get pods -l app.kubernetes.io/part-of=multinode-aggregated -o wide ``` @@ -207,11 +207,11 @@ spec: - name: prefill-leader image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Prefill Leader on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Prefill Leader on node:' && hostname && sleep infinity"] resources: requests: - cpu: "2" - memory: "4Gi" + cpu: "10m" + memory: "32Mi" - name: pworker spec: roleName: pworker @@ -226,11 +226,11 @@ spec: - name: prefill-worker image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Prefill Worker on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Prefill Worker on node:' && hostname && sleep infinity"] resources: requests: - cpu: "4" - memory: "8Gi" + cpu: "10m" + memory: "32Mi" - name: dleader spec: roleName: dleader @@ -245,11 +245,11 @@ spec: - name: decode-leader image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Decode Leader on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Decode Leader on node:' && hostname && sleep infinity"] resources: requests: - cpu: "1" - memory: "2Gi" + cpu: "10m" + memory: "32Mi" - name: dworker spec: roleName: dworker @@ -264,11 +264,11 @@ spec: - name: decode-worker image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Decode Worker on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Decode Worker on node:' && hostname && sleep infinity"] resources: requests: - cpu: "2" - memory: "4Gi" + cpu: "10m" + memory: "32Mi" podCliqueScalingGroups: - name: prefill cliqueNames: [pleader, pworker] @@ -288,11 +288,11 @@ spec: ### **Deploy** -In this example, we will deploy the file: [multi-node-disaggregated.yaml](../../../operator/samples/user-guide/concept-overview/multi-node-disaggregated.yaml) +In this example, we will deploy the file: [multi-node-disaggregated.yaml](../../../operator/samples/user-guide/01_core-concepts/multi-node-disaggregated.yaml) ```bash # NOTE: Run the following commands from the `/path/to/grove/operator` directory, # where `/path/to/grove` is the root of your cloned Grove repository. -kubectl apply -f samples/user-guide/concept-overview/multi-node-disaggregated.yaml +kubectl apply -f samples/user-guide/01_core-concepts/multi-node-disaggregated.yaml kubectl get pods -l app.kubernetes.io/part-of=multinode-disaggregated -o wide ``` @@ -325,4 +325,4 @@ To teardown the example delete the `multinode-disaggregated` PodCliqueSet, the o ```bash kubectl delete pcs multinode-disaggregated ``` -In the [next guide](./takeaways.md) we showcase how Grove can represent an arbitrary number of components and summarize the key takeaways. +In the [next guide](./04_takeaways.md) we showcase how Grove can represent an arbitrary number of components and summarize the key takeaways. diff --git a/docs/user-guide/core-concepts/takeaways.md b/docs/user-guide/01_core-concepts/04_takeaways.md similarity index 84% rename from docs/user-guide/core-concepts/takeaways.md rename to docs/user-guide/01_core-concepts/04_takeaways.md index 369aa3df4..f5f1f34eb 100644 --- a/docs/user-guide/core-concepts/takeaways.md +++ b/docs/user-guide/01_core-concepts/04_takeaways.md @@ -1,10 +1,10 @@ # Takeaways -Refer to [Overview](./overview.md) for instructions on how to run the examples in this guide. +Refer to [Overview](./01_overview.md) for instructions on how to run the examples in this guide. 
## Example 5: Complete Inference Pipeline -The [previous examples](./pcsg_intro.md) have focused on mapping various inference workloads into Grove primitives, focusing on the model instances. However, the primitives are generic and the point of Grove is to allow the user to represent as many components as they'd like. To illustrate this point we now provide an example where we represent additional components such as a frontend and vision encoder. To add additional components you simply add additional PodCliques and PodCliqueScalingGroups into the PodCliqueSet +The [previous examples](./03_pcsg_intro.md) have focused on mapping various inference workloads into Grove primitives, focusing on the model instances. However, the primitives are generic and the point of Grove is to allow the user to represent as many components as they'd like. To illustrate this point we now provide an example where we represent additional components such as a frontend and vision encoder. To add additional components you simply add additional PodCliques and PodCliqueScalingGroups into the PodCliqueSet ```yaml apiVersion: grove.io/v1alpha1 @@ -31,11 +31,11 @@ spec: - name: frontend image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Frontend Service on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Frontend Service on node:' && hostname && sleep infinity"] resources: requests: - cpu: "0.5" - memory: "1Gi" + cpu: "10m" + memory: "32Mi" - name: vision-encoder spec: roleName: vision-encoder @@ -50,11 +50,11 @@ spec: - name: vision-encoder image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Vision Encoder on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Vision Encoder on node:' && hostname && sleep infinity"] resources: requests: - cpu: "3" - memory: "6Gi" + cpu: "10m" + memory: "32Mi" # Multi-node components - name: pleader spec: @@ -70,11 +70,11 @@ spec: - name: prefill-leader image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Prefill Leader on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Prefill Leader on node:' && hostname && sleep infinity"] resources: requests: - cpu: "2" - memory: "4Gi" + cpu: "10m" + memory: "32Mi" - name: pworker spec: roleName: pworker @@ -89,11 +89,11 @@ spec: - name: prefill-worker image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Prefill Worker on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Prefill Worker on node:' && hostname && sleep infinity"] resources: requests: - cpu: "4" - memory: "8Gi" + cpu: "10m" + memory: "32Mi" - name: dleader spec: roleName: dleader @@ -108,11 +108,11 @@ spec: - name: decode-leader image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Decode Leader on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Decode Leader on node:' && hostname && sleep infinity"] resources: requests: - cpu: "1" - memory: "2Gi" + cpu: "10m" + memory: "32Mi" - name: dworker spec: roleName: dworker @@ -127,11 +127,11 @@ spec: - name: decode-worker image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Decode Worker on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Decode Worker on node:' && hostname && sleep infinity"] resources: requests: - cpu: "2" - memory: "4Gi" + cpu: "10m" + memory: "32Mi" podCliqueScalingGroups: - name: prefill cliqueNames: [pleader, pworker] @@ -149,11 +149,11 @@ spec: **Deploy and explore:** -In this example, we will deploy the file: 
[complete-inference-pipeline.yaml](../../../operator/samples/user-guide/concept-overview/complete-inference-pipeline.yaml) +In this example, we will deploy the file: [complete-inference-pipeline.yaml](../../../operator/samples/user-guide/01_core-concepts/complete-inference-pipeline.yaml) ```bash # NOTE: Run the following commands from the `/path/to/grove/operator` directory, # where `/path/to/grove` is the root of your cloned Grove repository. -kubectl apply -f samples/user-guide/concept-overview/complete-inference-pipeline.yaml +kubectl apply -f samples/user-guide/01_core-concepts/complete-inference-pipeline.yaml kubectl get pods -l app.kubernetes.io/part-of=comp-inf-ppln -o wide ``` diff --git a/docs/user-guide/02_pod-and-resource-naming-conventions/01_overview.md b/docs/user-guide/02_pod-and-resource-naming-conventions/01_overview.md new file mode 100644 index 000000000..08330e55c --- /dev/null +++ b/docs/user-guide/02_pod-and-resource-naming-conventions/01_overview.md @@ -0,0 +1,17 @@ +# Pod and Resource Naming Conventions + +This section explains Grove's hierarchical naming scheme for pods and resources. Grove's naming convention is designed to be **self-documenting**: when you run `kubectl get pods`, the pod names immediately tell you which PodCliqueSet, PodCliqueScalingGroup (if applicable), and PodClique each pod belongs to. + +## Prerequisites + +Before starting this section: +- Review the [core concepts tutorial](../01_core-concepts/01_overview.md) to understand Grove's primitives +- Set up a cluster following the [installation guide](../../installation.md), the two options are: + - [A local KIND demo cluster](../../installation.md#local-kind-cluster-set-up): Create the cluster with `make kind-up FAKE_NODES=40`, set `KUBECONFIG` env variable as directed, and run `make deploy` + - [A remote Kubernetes cluster](../../installation.md#remote-cluster-set-up) with [Grove installed from package](../../installation.md#install-grove-from-package) + +## Guides in This Section + +1. **[Naming Conventions](./02_naming-conventions.md)**: Learn the naming patterns, best practices, and how to plan names for your resources. + +2. **[Hands-On Example](./03_hands-on-example.md)**: Deploy an example system with the structure of a multi-node disaggregated inference system and observe the naming hierarchy in action. diff --git a/docs/user-guide/02_pod-and-resource-naming-conventions/02_naming-conventions.md b/docs/user-guide/02_pod-and-resource-naming-conventions/02_naming-conventions.md new file mode 100644 index 000000000..ac89a7d7a --- /dev/null +++ b/docs/user-guide/02_pod-and-resource-naming-conventions/02_naming-conventions.md @@ -0,0 +1,180 @@ +# Naming Conventions + +This guide explains Grove's hierarchical pod and resource naming scheme and best practices for naming your resources. + +## Why Hierarchical Naming Matters + +Grove's naming scheme serves two critical purposes: + +1. **Immediate Visual Understanding**: Pod names encode the complete hierarchy, so `kubectl get pods` output is self-explanatory. You can instantly see which pods belong together and how they're organized. + +2. **Programmatic Pod Discovery**: The hierarchical structure enables pods to discover and communicate with each other using fully qualified domain names (FQDNs). The [Environment Variables guide](../03_environment-variables-for-pod-discovery/01_overview.md) demonstrates how to programmatically construct these FQDNs using Grove's injected environment variables. 
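+
+Because each level of the hierarchy is encoded as a plain `-`-separated segment of the name, ordinary shell tools can slice the hierarchy. As a quick taste (a minimal sketch, assuming a deployed PodCliqueSet named `multinode-disaggregated` as in the examples below, and the `app.kubernetes.io/part-of` label used throughout this guide):
+
+```bash
+# Count pods per PodClique by stripping the random 5-character suffix
+kubectl get pods -l app.kubernetes.io/part-of=multinode-disaggregated \
+  -o custom-columns=NAME:.metadata.name --no-headers \
+  | sed 's/-[a-z0-9]\{5\}$//' | sort | uniq -c
+```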
+
+## Pod Naming Patterns
+
+### Standalone PodCliques
+
+For PodCliques that are **not** part of a PodCliqueScalingGroup, the pod naming follows this pattern:
+
+```
+<pcs-name>-<pcs-replica-index>-<pclq-name>-<random-suffix>
+```
+
+**Components:**
+- `<pcs-name>`: The name of the PodCliqueSet
+- `<pcs-replica-index>`: The replica index of the PodCliqueSet (0-based)
+- `<pclq-name>`: The name of the PodClique template defined in the PodCliqueSet spec
+- `<random-suffix>`: A random 5-character suffix generated by Kubernetes
+
+**Example:** `multinode-disaggregated-0-frontend-a7b3c`
+
+Looking at this name, you can immediately tell:
+- It belongs to the `multinode-disaggregated` PodCliqueSet
+- It's part of PodCliqueSet replica 0
+- It's from the `frontend` PodClique
+
+### PodCliques in a PodCliqueScalingGroup
+
+For PodCliques that **are** part of a PodCliqueScalingGroup, the pod naming includes the PCSG information:
+
+```
+<pcs-name>-<pcs-replica-index>-<pcsg-name>-<pcsg-replica-index>-<pclq-name>-<random-suffix>
+```
+
+**Components:**
+- `<pcs-name>`: The name of the PodCliqueSet
+- `<pcs-replica-index>`: The replica index of the PodCliqueSet (0-based)
+- `<pcsg-name>`: The name of the PodCliqueScalingGroup template
+- `<pcsg-replica-index>`: The replica index of the PodCliqueScalingGroup (0-based)
+- `<pclq-name>`: The name of the PodClique template within the PCSG
+- `<random-suffix>`: A random 5-character suffix generated by Kubernetes
+
+**Example:** `multinode-disaggregated-0-prefill-1-pworker-m9n0o`
+
+Looking at this name, you can immediately tell:
+- It belongs to the `multinode-disaggregated` PodCliqueSet (replica 0)
+- It's part of the `prefill` PodCliqueScalingGroup (replica 1)
+- It's from the `pworker` PodClique (prefill worker)
+
+## Naming Best Practices
+
+### Kubernetes Name Length Limit
+
+Kubernetes has a **63-character limit** for resource names. Since Grove constructs full pod names by combining multiple components, you need to be mindful of name lengths when choosing names for your resources.
+
+**How Grove constructs names:**
+
+For standalone PodCliques, the final pod name is:
+```
+<pcs-name>-<pcs-replica-index>-<pclq-name>-<5-char-suffix>
+```
+
+For PodCliques in a PCSG, the final pod name is:
+```
+<pcs-name>-<pcs-replica-index>-<pcsg-name>-<pcsg-replica-index>-<pclq-name>-<5-char-suffix>
+```
+
+**Character budget breakdown:**
+- `<5-char-suffix>`: 5 characters (fixed by Kubernetes)
+- `-` separators: 3-5 characters depending on structure
+- Replica indices: 1+ characters each (single digit for 0-9, two digits for 10-99, etc.)
+- Your chosen names: Remaining characters
+
+### Naming Guidelines
+
+1. **Use Short, Descriptive Names**: Choose concise but meaningful names
+   - ✅ Good: `frontend`, `api`, `db`, `cache`
+   - ❌ Avoid: `frontend-service-component`, `api-gateway-server` (too long)
+   - ❌ Avoid: `f`, `a`, `d`, `c` (too cryptic)
+
+2. **Use Abbreviations for Multi-Component Systems**: When you have multiple PodCliqueScalingGroups with similar roles, use prefixes or abbreviations
+   - ✅ Good: `pleader`, `pworker` (prefill), `dleader`, `dworker` (decode)
+   - ❌ Avoid: `prefill-leader`, `prefill-worker`, `decode-leader`, `decode-worker`
+
+3. **Keep PodCliqueSet Names Short**: Remember that the PCS name is included in every pod name
+   - ✅ Good: `ml-inference`, `web-app`, `data-pipeline`
+   - ❌ Avoid: `machine-learning-inference-service`, `web-application-stack`
+
+4. **Plan for Scaling**: Consider whether you'll need double-digit replica indices (adds 1 character per additional digit)
+   - If you plan to scale to 10+ or 100+ or 1000+ replicas, budget accordingly
+
+5. **Unique PodClique Names Within a PodCliqueSet**: All PodClique names must be unique within a PodCliqueSet. We explain the rationale for this further in the [Hands-On Example](./03_hands-on-example.md#why-unique-podclique-names-matter).
+   - If you have leader/worker patterns in multiple PCSGs, you **must** use different names (e.g., `pleader`/`pworker` and `dleader`/`dworker`)
+
+### Example: Planning Names for a Complex System
+
+Let's plan names for a multi-node disaggregated inference system with a frontend:
+
+**Requirements:**
+- 1 standalone frontend component
+- 2 multi-node components: prefill and decode
+- Each multi-node component has leader/worker roles
+- All PodClique names must be unique within the PodCliqueSet
+- Names should be short to allow for scaling headroom while remaining descriptive
+
+**Name choices:**
+- PodCliqueSet: `mn-disagg` (short, 9 chars)
+- Standalone PodClique: `frontend` (8 chars)
+- PCSG for prefill: `prefill` (7 chars)
+  - Leader PodClique: `pleader` (7 chars)
+  - Worker PodClique: `pworker` (7 chars)
+- PCSG for decode: `decode` (6 chars)
+  - Leader PodClique: `dleader` (7 chars)
+  - Worker PodClique: `dworker` (7 chars)
+
+**Resulting pod names:**
+- Frontend: `mn-disagg-0-frontend-a7b3c` (26 chars) ✅
+- Prefill leader: `mn-disagg-0-prefill-0-pleader-a7b3c` (35 chars) ✅
+- Prefill worker: `mn-disagg-0-prefill-0-pworker-a7b3c` (35 chars) ✅
+- Decode leader: `mn-disagg-0-decode-0-dleader-a7b3c` (34 chars) ✅
+- Decode worker: `mn-disagg-0-decode-0-dworker-a7b3c` (34 chars) ✅
+
+**Scaling headroom:** The longest name (`mn-disagg-0-prefill-0-pworker-a7b3c`) is 35 characters, leaving 28 characters of headroom. Each additional digit in a replica index adds 1 character:
+- 2-digit indices for PCS and PCSG (10-99): 37 chars → scales to 99 PCS replicas × 99 PCSG replicas ✅
+- 3-digit indices for PCS and PCSG (100-999): 39 chars → scales to 999 × 999 replicas ✅
+- 7-digit indices for PCS and PCSG: 47 chars → scales to millions of replicas ✅
+
+With these name choices, you could scale to millions of replicas on both dimensions without hitting the limit. All names are well under the 63-character limit with room for scaling growth while remaining descriptive!
+
+To deploy a PodCliqueSet with this structure and explore the naming hierarchy through `kubectl`, continue to the [Hands-On Example](./03_hands-on-example.md).
+
+---
+
+## Resource Naming Reference
+
+### Grove Resources and Their Naming
+
+| Resource | You Name | Grove Generates | Pattern |
+|----------|----------|-----------------|---------|
+| **PodCliqueSet** | ✅ | - | `<pcs-name>` |
+| **PodClique (template)** | ✅ | - | `<pclq-name>` (in spec.template.cliques) |
+| **PCSG (template)** | ✅ | - | `<pcsg-name>` (in spec.template.podCliqueScalingGroups) |
+| **PodClique (resource, standalone)** | - | ✅ | `<pcs-name>-<pcs-replica-index>-<pclq-name>` |
+| **PodClique (resource, in PCSG)** | - | ✅ | `<pcs-name>-<pcs-replica-index>-<pcsg-name>-<pcsg-replica-index>-<pclq-name>` |
+| **PCSG (resource)** | - | ✅ | `<pcs-name>-<pcs-replica-index>-<pcsg-name>` |
+| **Pod (standalone)** | - | ✅ | `<pcs-name>-<pcs-replica-index>-<pclq-name>-<random-suffix>` |
+| **Pod (in PCSG)** | - | ✅ | `<pcs-name>-<pcs-replica-index>-<pcsg-name>-<pcsg-replica-index>-<pclq-name>-<random-suffix>` |
+
+**You control:** PodCliqueSet name, PodClique template names, PCSG template names
+**Grove generates:** All resource instances with hierarchical naming
+
+## Key Takeaways
+
+1. **Self-Documenting Hierarchy**: Pod names encode the complete hierarchy from PodCliqueSet → PCSG (if applicable) → PodClique → Pod, making `kubectl get pods` output immediately understandable.
+
+2. **63-Character Limit**: Kubernetes enforces a 63-character limit on resource names. Use short, meaningful names for your resources, especially PodCliqueSet and PCSG names which appear in every generated name.
+
+3. **Unique PodClique Names**: All PodClique names must be unique within a PodCliqueSet.
When you have similar roles in multiple PCSGs (e.g., leader/worker in both prefill and decode), use prefixes or abbreviations (e.g., `pleader`/`pworker` and `dleader`/`dworker`). + +4. **Predictable Patterns**: The naming scheme is consistent whether you're using standalone PodCliques or PodCliqueScalingGroups, making it easy to understand your system at a glance. + +5. **Planning is Key**: Before creating resources, plan your names considering the full hierarchy and potential scaling needs. + +## Next Steps + +Now that you understand Grove's naming scheme and best practices: + +- **See it in action**: Continue to the [Hands-On Example](./03_hands-on-example.md) to deploy an example system and observe the naming hierarchy firsthand. + +- **Learn programmatic discovery**: Head to the [Environment Variables guide](../03_environment-variables-for-pod-discovery/01_overview.md) to learn how to use these names programmatically for pod discovery, including how Grove injects environment variables and how to construct FQDNs for pod-to-pod communication. + diff --git a/docs/user-guide/02_pod-and-resource-naming-conventions/03_hands-on-example.md b/docs/user-guide/02_pod-and-resource-naming-conventions/03_hands-on-example.md new file mode 100644 index 000000000..344c51736 --- /dev/null +++ b/docs/user-guide/02_pod-and-resource-naming-conventions/03_hands-on-example.md @@ -0,0 +1,284 @@ +# Hands-On Example: Multi-Node Disaggregated Inference + +This guide walks through deploying a realistic example that demonstrates both naming patterns and the requirement for unique PodClique names. Make sure you've read the [Naming Conventions](./02_naming-conventions.md) guide first to understand the patterns we'll see in action. + +## The Example System + +Let's deploy an example with the structure of a multi-node disaggregated inference system: +- A standalone frontend component +- A prefill PodCliqueScalingGroup with leader and workers +- A decode PodCliqueScalingGroup with leader and workers + +```yaml +apiVersion: grove.io/v1alpha1 +kind: PodCliqueSet +metadata: + name: multinode-disaggregated + namespace: default +spec: + replicas: 1 + template: + cliques: + # Standalone PodClique + - name: frontend + spec: + replicas: 2 + podSpec: + tolerations: + - key: fake-node + operator: Equal + value: "true" + effect: NoSchedule + containers: + - name: frontend + image: nginx:latest + command: ["/bin/sh"] + args: ["-c", "echo 'Frontend' && hostname && sleep infinity"] + resources: + requests: + cpu: "10m" + memory: "32Mi" + # Prefill PodCliqueScalingGroup PodCliques + - name: pleader + spec: + roleName: pleader + replicas: 1 + podSpec: + tolerations: + - key: fake-node + operator: Equal + value: "true" + effect: NoSchedule + containers: + - name: pleader + image: nginx:latest + command: ["/bin/sh"] + args: ["-c", "echo 'Prefill Leader' && hostname && sleep infinity"] + resources: + requests: + cpu: "10m" + memory: "32Mi" + - name: pworker + spec: + roleName: pworker + replicas: 3 + podSpec: + tolerations: + - key: fake-node + operator: Equal + value: "true" + effect: NoSchedule + containers: + - name: pworker + image: nginx:latest + command: ["/bin/sh"] + args: ["-c", "echo 'Prefill Worker' && hostname && sleep infinity"] + resources: + requests: + cpu: "10m" + memory: "32Mi" + # Decode PodCliqueScalingGroup PodCliques + - name: dleader + spec: + roleName: dleader + replicas: 1 + podSpec: + tolerations: + - key: fake-node + operator: Equal + value: "true" + effect: NoSchedule + containers: + - name: dleader + image: 
nginx:latest + command: ["/bin/sh"] + args: ["-c", "echo 'Decode Leader' && hostname && sleep infinity"] + resources: + requests: + cpu: "10m" + memory: "32Mi" + - name: dworker + spec: + roleName: dworker + replicas: 2 + podSpec: + tolerations: + - key: fake-node + operator: Equal + value: "true" + effect: NoSchedule + containers: + - name: dworker + image: nginx:latest + command: ["/bin/sh"] + args: ["-c", "echo 'Decode Worker' && hostname && sleep infinity"] + resources: + requests: + cpu: "10m" + memory: "32Mi" + podCliqueScalingGroups: + - name: prefill + cliqueNames: [pleader, pworker] + replicas: 2 + - name: decode + cliqueNames: [dleader, dworker] + replicas: 1 +``` + +## Why Unique PodClique Names Matter + +Notice in the YAML above: +- Prefill PCSG uses `pleader` and `pworker` (not just `leader` and `worker`) +- Decode PCSG uses `dleader` and `dworker` (not just `leader` and `worker`) + +**Why?** + +PodClique names **must be unique within a PodCliqueSet**. Reusing generic names like `leader` and `worker` across multiple PodCliqueScalingGroups is therefore **not allowed**. + +More importantly, this reflects **Grove's core philosophy**. + +In Grove, a `PodCliqueSet` is how you describe the **components of your system**. Each `PodClique` represents one component with a clearly defined, globally meaningful role. PodClique names are meant to be stable identifiers for those roles (e.g. prefill leader, decode worker), not local labels that change depending on how components are grouped or scaled. + +When multiple components need to be co-scheduled, scaled together, and function as a single logical unit (e.g. a *super-pod* for prefill), they are placed into a `PodCliqueScalingGroup`. The scaling group defines *how components run together*, but it does not redefine or merge their identities. + +That said, Grove also aims to keep names concise. Rather than naming a `PodClique` `prefill-leader`, we use short prefixes such as `p` (prefill) and `d` (decode) to preserve uniqueness while keeping names short and readable. The `PodCliqueScalingGroup` name already conveys which logical unit a PodClique belongs to, allowing the PodClique name itself to remain compact without losing clarity. + +This is why `pleader`/`pworker` and `dleader`/`dworker` are the intended and recommended pattern. + + +## Deploy and Observe + +In this example, we will deploy the file: [multinode-disaggregated-with-frontend.yaml](../../../operator/samples/user-guide/02_pod-and-resource-naming-conventions/multinode-disaggregated-with-frontend.yaml) + +```bash +# NOTE: Run the following commands from the `/path/to/grove/operator` directory, +# where `/path/to/grove` is the root of your cloned Grove repository. 
+kubectl apply -f samples/user-guide/02_pod-and-resource-naming-conventions/multinode-disaggregated-with-frontend.yaml
+
+# Get all pods - observe the self-documenting names
+kubectl get pods -l app.kubernetes.io/part-of=multinode-disaggregated -o wide
+```
+
+You should see output like:
+```
+NAME                                                READY   STATUS    RESTARTS   AGE
+multinode-disaggregated-0-decode-0-dleader-abc12    1/1     Running   0          45s
+multinode-disaggregated-0-decode-0-dworker-def34    1/1     Running   0          45s
+multinode-disaggregated-0-decode-0-dworker-ghi56    1/1     Running   0          45s
+multinode-disaggregated-0-frontend-jkl78            1/1     Running   0          45s
+multinode-disaggregated-0-frontend-mno90            1/1     Running   0          45s
+multinode-disaggregated-0-prefill-0-pleader-pqr12   1/1     Running   0          45s
+multinode-disaggregated-0-prefill-0-pworker-stu34   1/1     Running   0          45s
+multinode-disaggregated-0-prefill-0-pworker-vwx56   1/1     Running   0          45s
+multinode-disaggregated-0-prefill-0-pworker-yza78   1/1     Running   0          45s
+multinode-disaggregated-0-prefill-1-pleader-bcd90   1/1     Running   0          45s
+multinode-disaggregated-0-prefill-1-pworker-efg12   1/1     Running   0          45s
+multinode-disaggregated-0-prefill-1-pworker-hij34   1/1     Running   0          45s
+multinode-disaggregated-0-prefill-1-pworker-klm56   1/1     Running   0          45s
+```
+
+## Parsing the Naming Hierarchy
+
+Looking at this output, you can immediately understand the system structure:
+
+**1. Standalone PodClique (frontend):**
+```
+multinode-disaggregated-0-frontend-*
+```
+- Simpler naming: `<pcs-name>-<pcs-replica-index>-<pclq-name>-<random-suffix>`
+- 2 frontend pods serving requests
+
+**2. PodCliqueScalingGroup (prefill) - 2 replicas:**
+```
+multinode-disaggregated-0-prefill-0-*
+multinode-disaggregated-0-prefill-1-*
+```
+- Deeper hierarchy: `<pcs-name>-<pcs-replica-index>-<pcsg-name>-<pcsg-replica-index>-<pclq-name>-<random-suffix>`
+- Each replica has 1 `pleader` + 3 `pworker` pods
+- Two independent prefill clusters
+
+**3. PodCliqueScalingGroup (decode) - 1 replica:**
+```
+multinode-disaggregated-0-decode-0-*
+```
+- Same deep hierarchy as prefill
+- Has 1 `dleader` + 2 `dworker` pods
+- One decode cluster
+
+**4. Clear role identification through naming:**
+- `frontend` = frontend component
+- `pleader` = prefill leader
+- `pworker` = prefill worker
+- `dleader` = decode leader
+- `dworker` = decode worker
+
+## Examining Resources
+
+Let's look at the underlying Grove resources:
+
+```bash
+# List PodCliques
+kubectl get pclq
+```
+
+Output:
+```
+NAME                                          AGE
+multinode-disaggregated-0-decode-0-dleader    2m
+multinode-disaggregated-0-decode-0-dworker    2m
+multinode-disaggregated-0-frontend            2m
+multinode-disaggregated-0-prefill-0-pleader   2m
+multinode-disaggregated-0-prefill-0-pworker   2m
+multinode-disaggregated-0-prefill-1-pleader   2m
+multinode-disaggregated-0-prefill-1-pworker   2m
+```
+
+**Observations:**
+- Standalone PodClique: `multinode-disaggregated-0-frontend`
+- Prefill PCSG PodCliques: Names include `prefill-0` or `prefill-1` to show which replica
+- Decode PCSG PodCliques: Names include `decode-0`
+- All names are unique and under 63 characters
+
+```bash
+# List PodCliqueScalingGroups
+kubectl get pcsg
+```
+
+Output:
+```
+NAME                                AGE
+multinode-disaggregated-0-decode    2m
+multinode-disaggregated-0-prefill   2m
+```
+
+The PCSG names clearly identify the two scaling groups.
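+
+Because every level of the hierarchy is a name prefix, plain prefix matching selects any slice of the system (a small sketch reusing the names from this example):
+
+```bash
+# All pods belonging to the second prefill replica (PCSG replica index 1)
+kubectl get pods --no-headers -o custom-columns=NAME:.metadata.name \
+  | grep '^multinode-disaggregated-0-prefill-1-'
+```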
+ +## Name Length Analysis + +Let's verify our names fit within the 63-character limit: + +- Longest pod name: `multinode-disaggregated-0-prefill-1-pworker-klm56` + - Characters: 49 (well under 63) ✅ + +- PodClique names (max): `multinode-disaggregated-0-prefill-1-pworker` + - Characters: 43 (under 63) ✅ + +- PCSG names: `multinode-disaggregated-0-prefill` + - Characters: 33 (under 63) ✅ + +All resource names are comfortably within the limit! + +## Cleanup + +```bash +kubectl delete pcs multinode-disaggregated +``` + +--- + +## Next Steps + +Now that you've seen the naming conventions in action, check out: +- The [Key Takeaways](./02_naming-conventions.md#key-takeaways) section for a summary of naming best practices +- The [Environment Variables guide](../03_environment-variables-for-pod-discovery/01_overview.md) to learn how to use these names programmatically for pod discovery + diff --git a/docs/user-guide/03_environment-variables-for-pod-discovery/01_overview.md b/docs/user-guide/03_environment-variables-for-pod-discovery/01_overview.md new file mode 100644 index 000000000..ed0a84eed --- /dev/null +++ b/docs/user-guide/03_environment-variables-for-pod-discovery/01_overview.md @@ -0,0 +1,22 @@ +# Environment Variables for Pod Discovery + +This section explains the environment variables that Grove automatically injects into your pods and shows how to use them for pod discovery, coordination, and configuration in distributed systems. + +## Prerequisites + +Before starting this section: +- Review the [core concepts tutorial](../01_core-concepts/01_overview.md) to understand Grove's primitives +- Read the [Pod Naming guide](../02_pod-and-resource-naming-conventions/01_overview.md) to understand Grove's naming conventions +- Set up a cluster following the [installation guide](../../installation.md), the two options are: + - [A local KIND demo cluster](../../installation.md#local-kind-cluster-set-up): Create the cluster with `make kind-up FAKE_NODES=40`, set `KUBECONFIG` env variable as directed, and run `make deploy` + - [A remote Kubernetes cluster](../../installation.md#remote-cluster-set-up) with [Grove installed from package](../../installation.md#install-grove-from-package) + +> **Note:** The examples in this section require at least one real node to run actual containers and inspect environment variables in pod logs. The KIND cluster created with `make kind-up FAKE_NODES=40` includes one real control-plane node alongside the fake nodes, which is sufficient for these examples. + +## Guides in This Section + +1. **[Environment Variables Reference](./02_env_var_reference.md)**: Understand the distinction between pod names and hostnames, and see the complete reference of all injected Grove environment variables. + +2. **[Hands-On Examples](./03_hands-on-examples.md)**: Deploy example PodCliqueSets and use environment variables to construct FQDNs and discover other pods. **We strongly recommend working through these examples**—they demonstrate the practical techniques you'll need to implement pod discovery in your own applications. + +3. **[Common Patterns and Takeaways](./04_common-patterns-and-takeaways.md)**: Learn practical patterns for using environment variables in your applications. 
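+
+As a quick sanity check before diving in, you can dump the injected variables from any running Grove-managed pod (a minimal sketch; substitute one of your own pod names for the hypothetical `<pod-name>`):
+
+```bash
+kubectl exec <pod-name> -- env | grep '^GROVE_'
+```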
diff --git a/docs/user-guide/03_environment-variables-for-pod-discovery/02_env_var_reference.md b/docs/user-guide/03_environment-variables-for-pod-discovery/02_env_var_reference.md
new file mode 100644
index 000000000..8a631ed80
--- /dev/null
+++ b/docs/user-guide/03_environment-variables-for-pod-discovery/02_env_var_reference.md
@@ -0,0 +1,103 @@
+# Environment Variables Reference
+
+This guide covers the environment variables that Grove automatically injects into your pods.
+
+## Overview
+
+Grove automatically injects environment variables into every container and init container in your pods. These environment variables provide runtime information that your application can use for:
+- **Pod discovery**: Finding other pods in your system
+- **Coordination**: Understanding your role in a distributed system
+- **Configuration**: Self-configuring based on your position in the hierarchy
+
+Before we get to the environment variables, however, it is important to draw a distinction between a pod's name and its hostname.
+
+## Understanding Pod Names vs. Hostnames in Kubernetes
+
+A common source of confusion is the difference between a pod's **name** and its **hostname**. Understanding this distinction is essential for using Grove's environment variables correctly.
+
+### Pod Name (Kubernetes Resource Identifier)
+
+The **pod name** is the unique identifier for the Pod resource in Kubernetes (stored in `metadata.name`). When Grove creates pods, it uses Kubernetes' `generateName` feature, which appends a random 5-character suffix to ensure uniqueness:
+
+```
+<pclq-resource-name>-<random-suffix>
+Example: env-demo-standalone-0-frontend-abc12
+```
+
+This name is what you see when running `kubectl get pods`. However, you **cannot use this name for DNS-based pod discovery** because the random suffix is unpredictable.
+
+### Hostname (DNS-Resolvable Identity)
+
+The **hostname** is a separate field (`spec.hostname`) that Grove explicitly sets on each pod. Unlike the pod name, the hostname follows a **deterministic pattern**:
+
+```
+<pclq-resource-name>-<pod-index>
+Example: env-demo-standalone-0-frontend-0
+```
+
+Grove automatically creates a headless service for each PodCliqueSet replica, so you don't need to create one yourself. Grove also sets the pod's **subdomain** (`spec.subdomain`) to match the headless service name. In Kubernetes, when a pod has both `hostname` and `subdomain` set, and a matching headless service exists, the pod becomes DNS-resolvable at:
+
+```
+<hostname>.<subdomain>.<namespace>.svc.cluster.local
+```
+
+For example:
+```
+env-demo-standalone-0-frontend-0.env-demo-standalone-0.default.svc.cluster.local
+└──────── hostname ────────────┘ └──── subdomain ────┘ └─────────────────────┘
+                                                         namespace + cluster suffix
+```
+
+### Why This Matters
+
+| Attribute | Pod Name | Hostname |
+|-----------|----------|----------|
+| Source | `metadata.name` | `spec.hostname` |
+| Pattern | `<pclq-resource-name>-<random-suffix>` | `<pclq-resource-name>-<pod-index>` |
+| Predictable? | ❌ No (random suffix) | ✅ Yes (index-based) |
+| DNS resolvable? | ❌ No | ✅ Yes (with headless service) |
+| Use case | `kubectl` commands, logs | Pod discovery, pod-to-pod communication |
+
+**The environment variables Grove provides give you the building blocks to construct the hostname-based FQDN, not the pod name.** This is why pod discovery in Grove is deterministic and doesn't require knowledge of random suffixes.
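+
+You can inspect both identities on a running pod (a quick check, assuming at least one pod from your PodCliqueSet is running in the current namespace; substitute your own PodCliqueSet name for `<pcs-name>`):
+
+```bash
+# Pick any pod from the PodCliqueSet
+POD=$(kubectl get pods -l app.kubernetes.io/part-of=<pcs-name> -o jsonpath='{.items[0].metadata.name}')
+# metadata.name carries the random suffix; spec.hostname and spec.subdomain are deterministic
+kubectl get pod "$POD" -o jsonpath='{.metadata.name}{"\n"}{.spec.hostname}.{.spec.subdomain}{"\n"}'
+```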
+
+## Environment Variables Reference
+
+### Available in All Pods
+
+These environment variables are injected into every pod managed by Grove:
+
+| Environment Variable | Description | Example Value |
+|---------------------|-------------|---------------|
+| `GROVE_PCS_NAME` | Name of the PodCliqueSet (as specified in metadata.name) | `my-service` |
+| `GROVE_PCS_INDEX` | Replica index of the PodCliqueSet (0-based) | `0` |
+| `GROVE_PCLQ_NAME` | Fully qualified PodClique resource name (see structure below) | `my-service-0-frontend` |
+| `GROVE_HEADLESS_SERVICE` | FQDN of the headless service for the PodCliqueSet replica | `my-service-0.default.svc.cluster.local` |
+| `GROVE_PCLQ_POD_INDEX` | Index of this pod within its PodClique (0-based) | `2` |
+
+**Understanding `GROVE_PCLQ_NAME`:**
+- For **standalone PodCliques**: `<pcs-name>-<pcs-index>-<pclq-template-name>`
+  - Example: `my-service-0-frontend`
+- For **PodCliques in a PCSG**: `<pcs-name>-<pcs-index>-<pcsg-name>-<pcsg-index>-<pclq-template-name>`
+  - Example: `my-service-0-model-instance-0-leader`
+
+### Additional Variables for PodCliqueScalingGroup Pods
+
+If a pod belongs to a PodClique that is part of a PodCliqueScalingGroup, these additional environment variables are available:
+
+| Environment Variable | Description | Example Value |
+|---------------------|-------------|---------------|
+| `GROVE_PCSG_NAME` | Fully qualified PCSG resource name (see structure below) | `my-service-0-model-instance` |
+| `GROVE_PCSG_INDEX` | Replica index of the PodCliqueScalingGroup (0-based) | `1` |
+| `GROVE_PCSG_TEMPLATE_NUM_PODS` | Total number of pods in the PCSG template | `4` |
+
+**Understanding `GROVE_PCSG_NAME`:**
+- Structure: `<pcs-name>-<pcs-index>-<pcsg-template-name>`
+- Example: `my-service-0-model-instance`
+- **Note:** This does NOT include the PCSG replica index. To construct a sibling PodClique name within the same PCSG replica, use: `$GROVE_PCSG_NAME-$GROVE_PCSG_INDEX-<pclq-template-name>`
+
+**Note:** `GROVE_PCSG_TEMPLATE_NUM_PODS` represents the total number of pods defined in the PodCliqueScalingGroup template, calculated as the sum of replicas across all PodCliques in the PCSG. For example, if a PCSG has 1 leader replica and 3 worker replicas, this value would be 4. This value does not change when the PCSG is scaled, so it is only guaranteed to be accurate at startup.
+
+## Next Steps
+
+Continue to the [Hands-On Examples](./03_hands-on-examples.md) to deploy example PodCliqueSets and use environment variables to construct FQDNs and discover other pods. We strongly recommend working through these examples as they demonstrate the practical techniques you'll need to implement pod discovery in your own applications.
+ +```yaml +apiVersion: grove.io/v1alpha1 +kind: PodCliqueSet +metadata: + name: env-demo-standalone + namespace: default +spec: + replicas: 1 + template: + cliques: + - name: frontend + spec: + replicas: 2 + podSpec: + containers: + - name: app + image: busybox:latest + command: ["/bin/sh"] + args: + - "-c" + - | + echo "=== Grove Environment Variables ===" + echo "GROVE_PCS_NAME=$GROVE_PCS_NAME" + echo "GROVE_PCS_INDEX=$GROVE_PCS_INDEX" + echo "GROVE_PCLQ_NAME=$GROVE_PCLQ_NAME" + echo "GROVE_HEADLESS_SERVICE=$GROVE_HEADLESS_SERVICE" + echo "GROVE_PCLQ_POD_INDEX=$GROVE_PCLQ_POD_INDEX" + echo "" + echo "=== Pod Name vs Hostname ===" + echo "Pod Name (random suffix): $POD_NAME" + echo "Hostname (deterministic): $GROVE_PCLQ_NAME-$GROVE_PCLQ_POD_INDEX" + echo "" + echo "My FQDN: $GROVE_PCLQ_NAME-$GROVE_PCLQ_POD_INDEX.$GROVE_HEADLESS_SERVICE" + echo "" + echo "Sleeping..." + sleep infinity + env: + - name: POD_NAME + valueFrom: + fieldRef: + fieldPath: metadata.name + resources: + requests: + cpu: "10m" + memory: "32Mi" +``` + +### Deploy and Inspect + +In this example, we will deploy the file: [standalone-env-vars.yaml](../../../operator/samples/user-guide/03_environment-variables-for-pod-discovery/standalone-env-vars.yaml) + +```bash +# NOTE: Run the following commands from the `/path/to/grove/operator` directory, +# where `/path/to/grove` is the root of your cloned Grove repository. +kubectl apply -f samples/user-guide/03_environment-variables-for-pod-discovery/standalone-env-vars.yaml + +# Wait for pods to be ready (skip if you are manually verifying they are ready) +kubectl wait --for=condition=ready pod -l app.kubernetes.io/part-of=env-demo-standalone --timeout=60s + +# List the pods +kubectl get pods -l app.kubernetes.io/part-of=env-demo-standalone +``` + +You should see output similar to: +``` +NAME READY STATUS RESTARTS AGE +env-demo-standalone-0-frontend-abc12 1/1 Running 0 30s +env-demo-standalone-0-frontend-def34 1/1 Running 0 30s +``` + +Now, let's check the logs of one of the pods to see the environment variables: + +```bash +# Get the name of the first pod +POD_NAME=$(kubectl get pods -l app.kubernetes.io/part-of=env-demo-standalone -o jsonpath='{.items[0].metadata.name}') + +# View the logs +kubectl logs $POD_NAME +``` + +You should see output like: +``` +=== Grove Environment Variables === +GROVE_PCS_NAME=env-demo-standalone +GROVE_PCS_INDEX=0 +GROVE_PCLQ_NAME=env-demo-standalone-0-frontend +GROVE_HEADLESS_SERVICE=env-demo-standalone-0.default.svc.cluster.local +GROVE_PCLQ_POD_INDEX=0 + +=== Pod Name vs Hostname === +Pod Name (random suffix): env-demo-standalone-0-frontend-abc12 +Hostname (deterministic): env-demo-standalone-0-frontend-0 + +My FQDN: env-demo-standalone-0-frontend-0.env-demo-standalone-0.default.svc.cluster.local + +Sleeping... 
+``` + +**Key Observations:** +- The **pod name** (`env-demo-standalone-0-frontend-abc12`) has a random suffix—this is the Kubernetes resource identifier, not used for DNS +- The **hostname** (constructed as `$GROVE_PCLQ_NAME-$GROVE_PCLQ_POD_INDEX`) is deterministic—this is what you use for pod discovery +- `GROVE_PCLQ_NAME` contains the fully qualified PodClique name without the random suffix +- `GROVE_PCLQ_POD_INDEX` tells us this is the first pod (index 0) in the PodClique +- `GROVE_HEADLESS_SERVICE` provides the headless service domain, which you combine with the hostname to construct the pod's FQDN: `$GROVE_PCLQ_NAME-$GROVE_PCLQ_POD_INDEX.$GROVE_HEADLESS_SERVICE` + +### Cleanup + +```bash +kubectl delete pcs env-demo-standalone +``` + +--- + +## Example 2: PodCliqueScalingGroup with Leader-Worker Communication + +This example demonstrates a more complex scenario with a PodCliqueScalingGroup containing leader and worker pods. We'll show how workers can use environment variables to discover and connect to their leader. + +```yaml +apiVersion: grove.io/v1alpha1 +kind: PodCliqueSet +metadata: + name: env-demo-pcsg + namespace: default +spec: + replicas: 1 + template: + cliques: + - name: leader + spec: + roleName: leader + replicas: 1 + podSpec: + containers: + - name: leader + image: busybox:latest + command: ["/bin/sh"] + args: + - "-c" + - | + echo "=== Leader Pod ===" + echo "GROVE_PCS_NAME=$GROVE_PCS_NAME" + echo "GROVE_PCS_INDEX=$GROVE_PCS_INDEX" + echo "GROVE_PCLQ_NAME=$GROVE_PCLQ_NAME" + echo "GROVE_PCSG_NAME=$GROVE_PCSG_NAME" + echo "GROVE_PCSG_INDEX=$GROVE_PCSG_INDEX" + echo "GROVE_PCLQ_POD_INDEX=$GROVE_PCLQ_POD_INDEX" + echo "GROVE_PCSG_TEMPLATE_NUM_PODS=$GROVE_PCSG_TEMPLATE_NUM_PODS" + echo "" + echo "My FQDN: $GROVE_PCLQ_NAME-$GROVE_PCLQ_POD_INDEX.$GROVE_HEADLESS_SERVICE" + echo "" + echo "Listening for worker connections..." + sleep infinity + resources: + requests: + cpu: "10m" + memory: "32Mi" + - name: worker + spec: + roleName: worker + replicas: 3 + podSpec: + containers: + - name: worker + image: busybox:latest + command: ["/bin/sh"] + args: + - "-c" + - | + echo "=== Worker Pod ===" + echo "GROVE_PCS_NAME=$GROVE_PCS_NAME" + echo "GROVE_PCS_INDEX=$GROVE_PCS_INDEX" + echo "GROVE_PCLQ_NAME=$GROVE_PCLQ_NAME" + echo "GROVE_PCSG_NAME=$GROVE_PCSG_NAME" + echo "GROVE_PCSG_INDEX=$GROVE_PCSG_INDEX" + echo "GROVE_PCLQ_POD_INDEX=$GROVE_PCLQ_POD_INDEX" + echo "GROVE_PCSG_TEMPLATE_NUM_PODS=$GROVE_PCSG_TEMPLATE_NUM_PODS" + echo "" + echo "=== Constructing Leader Address ===" + # The leader PodClique name is: PCSG name + PCSG index + "-leader" + LEADER_PCLQ_NAME="$GROVE_PCSG_NAME-$GROVE_PCSG_INDEX-leader" + # Leader is always pod index 0 since there's only 1 leader replica + LEADER_POD_INDEX=0 + # Construct the leader's FQDN + LEADER_FQDN="$LEADER_PCLQ_NAME-$LEADER_POD_INDEX.$GROVE_HEADLESS_SERVICE" + echo "Connecting to leader at: $LEADER_FQDN" + echo "" + echo "Sleeping..." + sleep infinity + resources: + requests: + cpu: "10m" + memory: "32Mi" + podCliqueScalingGroups: + - name: model-instance + cliqueNames: [leader, worker] + replicas: 2 +``` + +### Deploy and Inspect + +In this example, we will deploy the file: [pcsg-env-vars.yaml](../../../operator/samples/user-guide/03_environment-variables-for-pod-discovery/pcsg-env-vars.yaml) + +```bash +# NOTE: Run the following commands from the `/path/to/grove/operator` directory, +# where `/path/to/grove` is the root of your cloned Grove repository. 
+kubectl apply -f samples/user-guide/03_environment-variables-for-pod-discovery/pcsg-env-vars.yaml + +# Wait for pods to be ready (skip if you are manually verifying they are ready) +kubectl wait --for=condition=ready pod -l app.kubernetes.io/part-of=env-demo-pcsg --timeout=60s + +# List all pods +kubectl get pods -l app.kubernetes.io/part-of=env-demo-pcsg -o wide +``` + +You should see 8 pods (2 PCSG replicas × (1 leader + 3 workers)): +``` +NAME READY STATUS RESTARTS AGE +env-demo-pcsg-0-model-instance-0-leader-abc12 1/1 Running 0 45s +env-demo-pcsg-0-model-instance-0-worker-def34 1/1 Running 0 45s +env-demo-pcsg-0-model-instance-0-worker-ghi56 1/1 Running 0 45s +env-demo-pcsg-0-model-instance-0-worker-jkl78 1/1 Running 0 45s +env-demo-pcsg-0-model-instance-1-leader-mno90 1/1 Running 0 45s +env-demo-pcsg-0-model-instance-1-worker-pqr12 1/1 Running 0 45s +env-demo-pcsg-0-model-instance-1-worker-stu34 1/1 Running 0 45s +env-demo-pcsg-0-model-instance-1-worker-vwx56 1/1 Running 0 45s +``` + +Let's inspect the leader logs from the first PCSG replica: + +```bash +# Get the leader pod name from the first PCSG replica (model-instance-0) +LEADER_POD=$(kubectl get pods -l app.kubernetes.io/part-of=env-demo-pcsg -o name | grep "model-instance-0-leader" | head -1) + +kubectl logs $LEADER_POD +``` + +You should see: +``` +=== Leader Pod === +GROVE_PCS_NAME=env-demo-pcsg +GROVE_PCS_INDEX=0 +GROVE_PCLQ_NAME=env-demo-pcsg-0-model-instance-0-leader +GROVE_PCSG_NAME=env-demo-pcsg-0-model-instance +GROVE_PCSG_INDEX=0 +GROVE_PCLQ_POD_INDEX=0 +GROVE_PCSG_TEMPLATE_NUM_PODS=4 + +My FQDN: env-demo-pcsg-0-model-instance-0-leader-0.env-demo-pcsg-0.default.svc.cluster.local + +Listening for worker connections... +``` + +Now let's check a worker pod from the same PCSG replica: + +```bash +# Get a worker pod name from the first PCSG replica (model-instance-0) +WORKER_POD=$(kubectl get pods -l app.kubernetes.io/part-of=env-demo-pcsg -o name | grep "model-instance-0-worker" | head -1) + +kubectl logs $WORKER_POD +``` + +You should see: +``` +=== Worker Pod === +GROVE_PCS_NAME=env-demo-pcsg +GROVE_PCS_INDEX=0 +GROVE_PCLQ_NAME=env-demo-pcsg-0-model-instance-0-worker +GROVE_PCSG_NAME=env-demo-pcsg-0-model-instance +GROVE_PCSG_INDEX=0 +GROVE_PCLQ_POD_INDEX=0 +GROVE_PCSG_TEMPLATE_NUM_PODS=4 + +=== Constructing Leader Address === +Connecting to leader at: env-demo-pcsg-0-model-instance-0-leader-0.env-demo-pcsg-0.default.svc.cluster.local + +Sleeping... 
+``` + +**Key Observations:** +- Both the leader and worker share the same `GROVE_PCSG_NAME` (`env-demo-pcsg-0-model-instance`) and `GROVE_PCSG_INDEX` (`0`), confirming they belong to the same PCSG replica +- The worker successfully constructed the leader's FQDN using environment variables: + - Leader PodClique name: `$GROVE_PCSG_NAME-$GROVE_PCSG_INDEX-leader` + - Leader pod index: `0` (since there's only 1 leader replica) + - Headless service: `$GROVE_HEADLESS_SERVICE` +- `GROVE_PCSG_TEMPLATE_NUM_PODS` is `4` (1 leader + 3 workers), which can be useful for workers to know the total cluster size + +### Verifying Leader-Worker Connectivity + +Let's verify that workers can actually reach their leader using DNS: + +```bash +# Get a worker pod from the first PCSG replica +WORKER_POD=$(kubectl get pods -l app.kubernetes.io/part-of=env-demo-pcsg -o name | grep "model-instance-0-worker" | head -1) + +# Try to resolve the leader from the worker +kubectl exec $WORKER_POD -- nslookup env-demo-pcsg-0-model-instance-0-leader-0.env-demo-pcsg-0.default.svc.cluster.local +``` + +You should see that the DNS name resolves successfully, confirming that the worker can discover its leader. + +### Cleanup + +```bash +kubectl delete pcs env-demo-pcsg +``` + +--- + +## Next Steps + +Now that you've seen the environment variables in action, continue to [Common Patterns and Takeaways](./04_common-patterns-and-takeaways.md) for reusable patterns you can adapt for your applications and a summary of key concepts. + diff --git a/docs/user-guide/03_environment-variables-for-pod-discovery/04_common-patterns-and-takeaways.md b/docs/user-guide/03_environment-variables-for-pod-discovery/04_common-patterns-and-takeaways.md new file mode 100644 index 000000000..81efac76e --- /dev/null +++ b/docs/user-guide/03_environment-variables-for-pod-discovery/04_common-patterns-and-takeaways.md @@ -0,0 +1,91 @@ +# Common Patterns and Takeaways + +This guide covers practical patterns for using Grove environment variables in your applications, along with key takeaways. 
+
+## Common Patterns for Using Environment Variables
+
+The following patterns come up repeatedly when building applications on top of Grove's injected variables:
+
+### Pattern 1: Constructing Pod FQDNs
+
+To construct the FQDN for any pod in your PodClique:
+
+```bash
+# For your own FQDN
+MY_FQDN="$GROVE_PCLQ_NAME-$GROVE_PCLQ_POD_INDEX.$GROVE_HEADLESS_SERVICE"
+
+# For another pod in the same PodClique (e.g., pod index 3)
+OTHER_POD_INDEX=3
+OTHER_POD_FQDN="$GROVE_PCLQ_NAME-$OTHER_POD_INDEX.$GROVE_HEADLESS_SERVICE"
+```
+
+### Pattern 2: Finding the Leader in a PCSG
+
+If you're in a worker pod and need to connect to the leader (assuming the leader PodClique is named "leader"):
+
+```bash
+# Construct the leader's PodClique name: PCSG name + PCSG index + "-leader"
+LEADER_PCLQ_NAME="$GROVE_PCSG_NAME-$GROVE_PCSG_INDEX-leader"
+
+# Leader is typically at index 0
+LEADER_FQDN="$LEADER_PCLQ_NAME-0.$GROVE_HEADLESS_SERVICE"
+```
+
+### Pattern 3: Discovering All Peers in a PodClique
+
+If you need to construct addresses for all pods in your PodClique:
+
+```bash
+# Assuming you have a way to know the total number of replicas in your PodClique
+# (this could be passed in as a custom env var or ConfigMap)
+TOTAL_REPLICAS=5
+
+for i in $(seq 0 $((TOTAL_REPLICAS - 1))); do
+  PEER_FQDN="$GROVE_PCLQ_NAME-$i.$GROVE_HEADLESS_SERVICE"
+  echo "Peer $i: $PEER_FQDN"
+done
+```
+
+### Pattern 4: Determining Your Role in a PCSG
+
+You can use `GROVE_PCLQ_NAME` to determine which role a pod plays:
+
+```bash
+# Extract the role from the PodClique name.
+# The role is the last component after the final hyphen
+# (this assumes the role name itself contains no hyphens).
+ROLE=$(echo "$GROVE_PCLQ_NAME" | awk -F- '{print $NF}')
+
+if [ "$ROLE" = "leader" ]; then
+  echo "I am a leader pod"
+elif [ "$ROLE" = "worker" ]; then
+  echo "I am a worker pod"
+fi
+```
+
+### Pattern 5: Using the Headless Service for Pod Discovery
+
+`GROVE_HEADLESS_SERVICE` provides a DNS name that resolves to all pods in the PodCliqueSet replica:
+
+```bash
+# This will return DNS records for all pods in the same PodCliqueSet replica
+nslookup "$GROVE_HEADLESS_SERVICE"
+```
+
+---
+
+## Key Takeaways
+
+1. **Automatic Context Injection**
+   Grove injects a consistent set of environment variables into every pod, giving each container precise runtime context about *where it sits* in the PodCliqueSet hierarchy.
+
+2. **Explicit, Predictable Addressing**
+   Grove does not hide pod topology. Instead, it provides the building blocks (`GROVE_PCS_NAME`, `GROVE_PCLQ_NAME`, `GROVE_PCSG_NAME`, indices, and the headless service domain) so applications can **explicitly construct the addresses they need**, including those of other PodCliques.
+
+3. **Stable Pod Identity**
+   `GROVE_PCLQ_POD_INDEX` gives each pod a stable, deterministic identity within its PodClique, making it easy to assign ranks, shard work, or implement leader/worker logic (a minimal sharding sketch follows below).
+
+4. **Scaling-Group Awareness**
+   For pods in a PodCliqueScalingGroup, Grove exposes additional variables that identify the PCSG replica and its composition. This allows components to understand which *logical unit (super-pod)* they belong to and how many peers are expected.
+
+5. **Designed for Distributed Systems**
+   Grove's environment variables are intentionally low-level and composable. They are meant to support a wide range of distributed system patterns (leader election, sharding, rendezvous, collective communication) without imposing a fixed discovery or coordination model.
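+
+---
+
+## A Minimal Sharding Sketch
+
+To close, here is one way `GROVE_PCLQ_POD_INDEX` might be combined with a user-supplied replica count to shard work deterministically across a PodClique. This is a sketch rather than a prescribed approach: `TOTAL_REPLICAS` and `NUM_SHARDS` are hypothetical values you would provide yourself (for example via an env var or ConfigMap, as in Pattern 3); Grove injects the pod index, but not a replica count for standalone PodCliques.
+
+```bash
+#!/bin/sh
+# Hypothetical, user-supplied configuration (not injected by Grove).
+TOTAL_REPLICAS=${TOTAL_REPLICAS:-4}
+NUM_SHARDS=${NUM_SHARDS:-16}
+
+# Each pod claims every TOTAL_REPLICAS-th shard, starting at its own pod
+# index, so the shards partition across the PodClique with no coordination.
+i=$GROVE_PCLQ_POD_INDEX
+while [ "$i" -lt "$NUM_SHARDS" ]; do
+  echo "Pod $GROVE_PCLQ_POD_INDEX of $GROVE_PCLQ_NAME owns shard $i"
+  i=$((i + TOTAL_REPLICAS))
+done
+```
+
+With the defaults above, pod `0` would own shards `0, 4, 8, 12`, pod `1` would own `1, 5, 9, 13`, and so on; because the assignment is a pure function of the pod index, it stays stable across pod restarts.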
diff --git a/operator/samples/user-guide/concept-overview/complete-inference-pipeline.yaml b/operator/samples/user-guide/01_core-concepts/complete-inference-pipeline.yaml similarity index 74% rename from operator/samples/user-guide/concept-overview/complete-inference-pipeline.yaml rename to operator/samples/user-guide/01_core-concepts/complete-inference-pipeline.yaml index 4471d14a3..ee29162b5 100644 --- a/operator/samples/user-guide/concept-overview/complete-inference-pipeline.yaml +++ b/operator/samples/user-guide/01_core-concepts/complete-inference-pipeline.yaml @@ -1,3 +1,15 @@ +# Example: Complete Inference Pipeline +# Documentation: docs/user-guide/01_core-concepts/04_takeaways.md +# +# Demonstrates a complete inference system with multiple components: +# - Single-node components: Frontend, Vision Encoder +# - Multi-node components: Prefill PCSG, Decode PCSG +# Shows how Grove can represent an arbitrary number of single-node and multi-node components. +# +# NOTE: This example uses fake-node tolerations for the local KIND demo cluster +# created with `make kind-up FAKE_NODES=40`. Remove the tolerations when deploying +# to a real cluster. +--- apiVersion: grove.io/v1alpha1 kind: PodCliqueSet metadata: @@ -22,11 +34,11 @@ spec: - name: frontend image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Frontend Service on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Frontend Service on node:' && hostname && sleep infinity"] resources: requests: - cpu: "0.5" - memory: "1Gi" + cpu: "10m" + memory: "32Mi" - name: vision-encoder spec: roleName: vision-encoder @@ -41,11 +53,11 @@ spec: - name: vision-encoder image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Vision Encoder on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Vision Encoder on node:' && hostname && sleep infinity"] resources: requests: - cpu: "3" - memory: "6Gi" + cpu: "10m" + memory: "32Mi" # Multi-node components - name: pleader spec: @@ -61,11 +73,11 @@ spec: - name: prefill-leader image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Prefill Leader on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Prefill Leader on node:' && hostname && sleep infinity"] resources: requests: - cpu: "2" - memory: "4Gi" + cpu: "10m" + memory: "32Mi" - name: pworker spec: roleName: pworker @@ -80,11 +92,11 @@ spec: - name: prefill-worker image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Prefill Worker on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Prefill Worker on node:' && hostname && sleep infinity"] resources: requests: - cpu: "4" - memory: "8Gi" + cpu: "10m" + memory: "32Mi" - name: dleader spec: roleName: dleader @@ -99,11 +111,11 @@ spec: - name: decode-leader image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Decode Leader on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Decode Leader on node:' && hostname && sleep infinity"] resources: requests: - cpu: "1" - memory: "2Gi" + cpu: "10m" + memory: "32Mi" - name: dworker spec: roleName: dworker @@ -118,11 +130,11 @@ spec: - name: decode-worker image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Decode Worker on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Decode Worker on node:' && hostname && sleep infinity"] resources: requests: - cpu: "2" - memory: "4Gi" + cpu: "10m" + memory: "32Mi" podCliqueScalingGroups: - name: prefill cliqueNames: [pleader, pworker] diff --git a/operator/samples/user-guide/concept-overview/multi-node-aggregated.yaml 
b/operator/samples/user-guide/01_core-concepts/multi-node-aggregated.yaml similarity index 63% rename from operator/samples/user-guide/concept-overview/multi-node-aggregated.yaml rename to operator/samples/user-guide/01_core-concepts/multi-node-aggregated.yaml index 7e5854f16..611e176ae 100644 --- a/operator/samples/user-guide/concept-overview/multi-node-aggregated.yaml +++ b/operator/samples/user-guide/01_core-concepts/multi-node-aggregated.yaml @@ -1,3 +1,14 @@ +# Example: Multi-Node Aggregated Inference +# Documentation: docs/user-guide/01_core-concepts/03_pcsg_intro.md +# +# Demonstrates PodCliqueScalingGroup for multi-node deployments with a +# leader-worker topology. Scaling the PCSG replicates the entire unit +# (1 leader + N workers) while preserving the ratio. +# +# NOTE: This example uses fake-node tolerations for the local KIND demo cluster +# created with `make kind-up FAKE_NODES=40`. Remove the tolerations when deploying +# to a real cluster. +--- apiVersion: grove.io/v1alpha1 kind: PodCliqueSet metadata: @@ -21,11 +32,11 @@ spec: - name: model-leader image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Model Leader (Aggregated) on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Model Leader (Aggregated) on node:' && hostname && sleep infinity"] resources: requests: - cpu: "2" - memory: "4Gi" + cpu: "10m" + memory: "32Mi" - name: worker spec: roleName: worker @@ -40,11 +51,11 @@ spec: - name: model-worker image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Model Worker (Aggregated) on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Model Worker (Aggregated) on node:' && hostname && sleep infinity"] resources: requests: - cpu: "4" - memory: "8Gi" + cpu: "10m" + memory: "32Mi" podCliqueScalingGroups: - name: model-instance cliqueNames: [leader, worker] diff --git a/operator/samples/user-guide/concept-overview/multi-node-disaggregated.yaml b/operator/samples/user-guide/01_core-concepts/multi-node-disaggregated.yaml similarity index 72% rename from operator/samples/user-guide/concept-overview/multi-node-disaggregated.yaml rename to operator/samples/user-guide/01_core-concepts/multi-node-disaggregated.yaml index fb8c60d1b..20fb40e58 100644 --- a/operator/samples/user-guide/concept-overview/multi-node-disaggregated.yaml +++ b/operator/samples/user-guide/01_core-concepts/multi-node-disaggregated.yaml @@ -1,3 +1,14 @@ +# Example: Multi-Node Disaggregated Inference +# Documentation: docs/user-guide/01_core-concepts/03_pcsg_intro.md +# +# Demonstrates a multi-node disaggregated serving scenario where +# both prefill and decode components are multi-node with their own PCSGs. +# Each PCSG has leader and worker PodCliques that scale together. +# +# NOTE: This example uses fake-node tolerations for the local KIND demo cluster +# created with `make kind-up FAKE_NODES=40`. Remove the tolerations when deploying +# to a real cluster. 
+--- apiVersion: grove.io/v1alpha1 kind: PodCliqueSet metadata: @@ -21,11 +32,11 @@ spec: - name: prefill-leader image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Prefill Leader on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Prefill Leader on node:' && hostname && sleep infinity"] resources: requests: - cpu: "2" - memory: "4Gi" + cpu: "10m" + memory: "32Mi" - name: pworker spec: roleName: pworker @@ -40,11 +51,11 @@ spec: - name: prefill-worker image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Prefill Worker on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Prefill Worker on node:' && hostname && sleep infinity"] resources: requests: - cpu: "4" - memory: "8Gi" + cpu: "10m" + memory: "32Mi" - name: dleader spec: roleName: dleader @@ -59,11 +70,11 @@ spec: - name: decode-leader image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Decode Leader on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Decode Leader on node:' && hostname && sleep infinity"] resources: requests: - cpu: "1" - memory: "2Gi" + cpu: "10m" + memory: "32Mi" - name: dworker spec: roleName: dworker @@ -78,11 +89,11 @@ spec: - name: decode-worker image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Decode Worker on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Decode Worker on node:' && hostname && sleep infinity"] resources: requests: - cpu: "2" - memory: "4Gi" + cpu: "10m" + memory: "32Mi" podCliqueScalingGroups: - name: prefill cliqueNames: [pleader, pworker] diff --git a/operator/samples/user-guide/concept-overview/single-node-aggregated.yaml b/operator/samples/user-guide/01_core-concepts/single-node-aggregated.yaml similarity index 52% rename from operator/samples/user-guide/concept-overview/single-node-aggregated.yaml rename to operator/samples/user-guide/01_core-concepts/single-node-aggregated.yaml index 291b0795d..325e12c28 100644 --- a/operator/samples/user-guide/concept-overview/single-node-aggregated.yaml +++ b/operator/samples/user-guide/01_core-concepts/single-node-aggregated.yaml @@ -1,3 +1,13 @@ +# Example: Single-Node Aggregated Inference +# Documentation: docs/user-guide/01_core-concepts/02_pcs_and_pclq_intro.md +# +# Demonstrates a simple PodCliqueSet with a single standalone PodClique where +# each pod is a complete model instance that can service requests independently. +# +# NOTE: This example uses fake-node tolerations for the local KIND demo cluster +# created with `make kind-up FAKE_NODES=40`. Remove the tolerations when deploying +# to a real cluster. 
+--- apiVersion: grove.io/v1alpha1 kind: PodCliqueSet metadata: @@ -21,8 +31,8 @@ spec: - name: model-worker image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Model Worker (Aggregated) on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Model Worker (Aggregated) on node:' && hostname && sleep infinity"] resources: requests: - cpu: "1" - memory: "2Gi" \ No newline at end of file + cpu: "10m" + memory: "32Mi" \ No newline at end of file diff --git a/operator/samples/user-guide/concept-overview/single-node-disaggregated.yaml b/operator/samples/user-guide/01_core-concepts/single-node-disaggregated.yaml similarity index 62% rename from operator/samples/user-guide/concept-overview/single-node-disaggregated.yaml rename to operator/samples/user-guide/01_core-concepts/single-node-disaggregated.yaml index 69aae30af..108abfb81 100644 --- a/operator/samples/user-guide/concept-overview/single-node-disaggregated.yaml +++ b/operator/samples/user-guide/01_core-concepts/single-node-disaggregated.yaml @@ -1,3 +1,13 @@ +# Example: Single-Node Disaggregated Inference +# Documentation: docs/user-guide/01_core-concepts/02_pcs_and_pclq_intro.md +# +# Demonstrates disaggregated serving with separate PodCliques for prefill and +# decode operations, allowing independent scaling of each component. +# +# NOTE: This example uses fake-node tolerations for the local KIND demo cluster +# created with `make kind-up FAKE_NODES=40`. Remove the tolerations when deploying +# to a real cluster. +--- apiVersion: grove.io/v1alpha1 kind: PodCliqueSet metadata: @@ -21,11 +31,11 @@ spec: - name: prefill image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Prefill Worker on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Prefill Worker on node:' && hostname && sleep infinity"] resources: requests: - cpu: "2" - memory: "4Gi" + cpu: "10m" + memory: "32Mi" - name: decode spec: roleName: decode @@ -40,8 +50,8 @@ spec: - name: decode image: nginx:latest command: ["/bin/sh"] - args: ["-c", "echo 'Decode Worker on node:' && hostname && sleep 3600"] + args: ["-c", "echo 'Decode Worker on node:' && hostname && sleep infinity"] resources: requests: - cpu: "1" - memory: "2Gi" \ No newline at end of file + cpu: "10m" + memory: "32Mi" \ No newline at end of file diff --git a/operator/samples/user-guide/02_pod-and-resource-naming-conventions/multinode-disaggregated-with-frontend.yaml b/operator/samples/user-guide/02_pod-and-resource-naming-conventions/multinode-disaggregated-with-frontend.yaml new file mode 100644 index 000000000..0793d4e1f --- /dev/null +++ b/operator/samples/user-guide/02_pod-and-resource-naming-conventions/multinode-disaggregated-with-frontend.yaml @@ -0,0 +1,128 @@ +# Example: Multi-Node Disaggregated Inference with Frontend +# Documentation: docs/user-guide/02_pod-and-resource-naming-conventions/03_hands-on-example.md +# +# Demonstrates Grove's hierarchical pod naming scheme with: +# - A standalone frontend PodClique +# - Prefill PodCliqueScalingGroup (pleader + pworker) +# - Decode PodCliqueScalingGroup (dleader + dworker) +# +# NOTE: This example uses fake-node tolerations for the local KIND demo cluster +# created with `make kind-up FAKE_NODES=40`. Remove the tolerations when deploying +# to a real cluster. 
+--- +apiVersion: grove.io/v1alpha1 +kind: PodCliqueSet +metadata: + name: multinode-disaggregated + namespace: default +spec: + replicas: 1 + template: + cliques: + # Standalone PodClique + - name: frontend + spec: + replicas: 2 + podSpec: + tolerations: + - key: fake-node + operator: Equal + value: "true" + effect: NoSchedule + containers: + - name: frontend + image: nginx:latest + command: ["/bin/sh"] + args: ["-c", "echo 'Frontend' && hostname && sleep infinity"] + resources: + requests: + cpu: "10m" + memory: "32Mi" + # Prefill PodCliqueScalingGroup PodCliques + - name: pleader + spec: + roleName: pleader + replicas: 1 + podSpec: + tolerations: + - key: fake-node + operator: Equal + value: "true" + effect: NoSchedule + containers: + - name: pleader + image: nginx:latest + command: ["/bin/sh"] + args: ["-c", "echo 'Prefill Leader' && hostname && sleep infinity"] + resources: + requests: + cpu: "10m" + memory: "32Mi" + - name: pworker + spec: + roleName: pworker + replicas: 3 + podSpec: + tolerations: + - key: fake-node + operator: Equal + value: "true" + effect: NoSchedule + containers: + - name: pworker + image: nginx:latest + command: ["/bin/sh"] + args: ["-c", "echo 'Prefill Worker' && hostname && sleep infinity"] + resources: + requests: + cpu: "10m" + memory: "32Mi" + # Decode PodCliqueScalingGroup PodCliques + - name: dleader + spec: + roleName: dleader + replicas: 1 + podSpec: + tolerations: + - key: fake-node + operator: Equal + value: "true" + effect: NoSchedule + containers: + - name: dleader + image: nginx:latest + command: ["/bin/sh"] + args: ["-c", "echo 'Decode Leader' && hostname && sleep infinity"] + resources: + requests: + cpu: "10m" + memory: "32Mi" + - name: dworker + spec: + roleName: dworker + replicas: 2 + podSpec: + tolerations: + - key: fake-node + operator: Equal + value: "true" + effect: NoSchedule + containers: + - name: dworker + image: nginx:latest + command: ["/bin/sh"] + args: ["-c", "echo 'Decode Worker' && hostname && sleep infinity"] + resources: + requests: + cpu: "10m" + memory: "32Mi" + podCliqueScalingGroups: + - name: prefill + cliqueNames: [pleader, pworker] + replicas: 2 + - name: decode + cliqueNames: [dleader, dworker] + replicas: 1 + + + diff --git a/operator/samples/user-guide/03_environment-variables-for-pod-discovery/pcsg-env-vars.yaml b/operator/samples/user-guide/03_environment-variables-for-pod-discovery/pcsg-env-vars.yaml new file mode 100644 index 000000000..dfeb4961a --- /dev/null +++ b/operator/samples/user-guide/03_environment-variables-for-pod-discovery/pcsg-env-vars.yaml @@ -0,0 +1,88 @@ +# Example: PodCliqueScalingGroup Environment Variables with Leader-Worker Discovery +# Documentation: docs/user-guide/03_environment-variables-for-pod-discovery/03_hands-on-examples.md +# +# Demonstrates PCSG-specific environment variables and leader-worker communication: +# - GROVE_PCSG_NAME, GROVE_PCSG_INDEX, GROVE_PCSG_TEMPLATE_NUM_PODS +# - How workers construct the leader's FQDN using environment variables +--- +apiVersion: grove.io/v1alpha1 +kind: PodCliqueSet +metadata: + name: env-demo-pcsg + namespace: default +spec: + replicas: 1 + template: + cliques: + - name: leader + spec: + roleName: leader + replicas: 1 + podSpec: + containers: + - name: leader + image: busybox:latest + command: ["/bin/sh"] + args: + - "-c" + - | + echo "=== Leader Pod ===" + echo "GROVE_PCS_NAME=$GROVE_PCS_NAME" + echo "GROVE_PCS_INDEX=$GROVE_PCS_INDEX" + echo "GROVE_PCLQ_NAME=$GROVE_PCLQ_NAME" + echo "GROVE_PCSG_NAME=$GROVE_PCSG_NAME" + echo 
"GROVE_PCSG_INDEX=$GROVE_PCSG_INDEX" + echo "GROVE_PCLQ_POD_INDEX=$GROVE_PCLQ_POD_INDEX" + echo "GROVE_PCSG_TEMPLATE_NUM_PODS=$GROVE_PCSG_TEMPLATE_NUM_PODS" + echo "" + echo "My FQDN: $GROVE_PCLQ_NAME-$GROVE_PCLQ_POD_INDEX.$GROVE_HEADLESS_SERVICE" + echo "" + echo "Listening for worker connections..." + sleep infinity + resources: + requests: + cpu: "10m" + memory: "32Mi" + - name: worker + spec: + roleName: worker + replicas: 3 + podSpec: + containers: + - name: worker + image: busybox:latest + command: ["/bin/sh"] + args: + - "-c" + - | + echo "=== Worker Pod ===" + echo "GROVE_PCS_NAME=$GROVE_PCS_NAME" + echo "GROVE_PCS_INDEX=$GROVE_PCS_INDEX" + echo "GROVE_PCLQ_NAME=$GROVE_PCLQ_NAME" + echo "GROVE_PCSG_NAME=$GROVE_PCSG_NAME" + echo "GROVE_PCSG_INDEX=$GROVE_PCSG_INDEX" + echo "GROVE_PCLQ_POD_INDEX=$GROVE_PCLQ_POD_INDEX" + echo "GROVE_PCSG_TEMPLATE_NUM_PODS=$GROVE_PCSG_TEMPLATE_NUM_PODS" + echo "" + echo "=== Constructing Leader Address ===" + # The leader PodClique name is: PCSG name + PCSG index + "-leader" + LEADER_PCLQ_NAME="$GROVE_PCSG_NAME-$GROVE_PCSG_INDEX-leader" + # Leader is always pod index 0 since there's only 1 leader replica + LEADER_POD_INDEX=0 + # Construct the leader's FQDN + LEADER_FQDN="$LEADER_PCLQ_NAME-$LEADER_POD_INDEX.$GROVE_HEADLESS_SERVICE" + echo "Connecting to leader at: $LEADER_FQDN" + echo "" + echo "Sleeping..." + sleep infinity + resources: + requests: + cpu: "10m" + memory: "32Mi" + podCliqueScalingGroups: + - name: model-instance + cliqueNames: [leader, worker] + replicas: 2 + + + diff --git a/operator/samples/user-guide/03_environment-variables-for-pod-discovery/standalone-env-vars.yaml b/operator/samples/user-guide/03_environment-variables-for-pod-discovery/standalone-env-vars.yaml new file mode 100644 index 000000000..8b8aba9c9 --- /dev/null +++ b/operator/samples/user-guide/03_environment-variables-for-pod-discovery/standalone-env-vars.yaml @@ -0,0 +1,54 @@ +# Example: Standalone PodClique Environment Variables +# Documentation: docs/user-guide/03_environment-variables-for-pod-discovery/03_hands-on-examples.md +# +# Demonstrates the environment variables Grove injects into standalone PodCliques: +# - GROVE_PCS_NAME, GROVE_PCS_INDEX, GROVE_PCLQ_NAME +# - GROVE_HEADLESS_SERVICE, GROVE_PCLQ_POD_INDEX +--- +apiVersion: grove.io/v1alpha1 +kind: PodCliqueSet +metadata: + name: env-demo-standalone + namespace: default +spec: + replicas: 1 + template: + cliques: + - name: frontend + spec: + replicas: 2 + podSpec: + containers: + - name: app + image: busybox:latest + command: ["/bin/sh"] + args: + - "-c" + - | + echo "=== Grove Environment Variables ===" + echo "GROVE_PCS_NAME=$GROVE_PCS_NAME" + echo "GROVE_PCS_INDEX=$GROVE_PCS_INDEX" + echo "GROVE_PCLQ_NAME=$GROVE_PCLQ_NAME" + echo "GROVE_HEADLESS_SERVICE=$GROVE_HEADLESS_SERVICE" + echo "GROVE_PCLQ_POD_INDEX=$GROVE_PCLQ_POD_INDEX" + echo "" + echo "=== Pod Name vs Hostname ===" + echo "Pod Name (random suffix): $POD_NAME" + echo "Hostname (deterministic): $GROVE_PCLQ_NAME-$GROVE_PCLQ_POD_INDEX" + echo "" + echo "My FQDN: $GROVE_PCLQ_NAME-$GROVE_PCLQ_POD_INDEX.$GROVE_HEADLESS_SERVICE" + echo "" + echo "Sleeping..." + sleep infinity + env: + - name: POD_NAME + valueFrom: + fieldRef: + fieldPath: metadata.name + resources: + requests: + cpu: "10m" + memory: "32Mi" + + +