# Naming Conventions

This guide explains Grove's hierarchical pod and resource naming scheme and best practices for naming your resources.

## Why Hierarchical Naming Matters

Grove's naming scheme serves two critical purposes:

1. **Immediate Visual Understanding**: Pod names encode the complete hierarchy, so `kubectl get pods` output is self-explanatory. You can instantly see which pods belong together and how they're organized.

2. **Programmatic Service Discovery**: The hierarchical structure enables pods to discover and communicate with each other using fully qualified domain names (FQDNs). The [Environment Variables guide](../environment-variables.md) demonstrates how to programmatically construct these FQDNs using Grove's injected environment variables.

## Prerequisites

Before starting this guide:
- Review the [core concepts tutorial](../core-concepts/overview.md) to understand Grove's primitives.
- Set up a cluster by following the [installation guide](../../installation.md). There are two options:
  - [A local KIND demo cluster](../../installation.md#local-kind-cluster-set-up): create the cluster with `make kind-up FAKE_NODES=40`, set the `KUBECONFIG` environment variable as directed, and run `make deploy`.
  - [A remote Kubernetes cluster](../../installation.md#remote-cluster-set-up) with [Grove installed from a package](../../installation.md#install-grove-from-package).

## Pod Naming Patterns

### Standalone PodCliques

For PodCliques that are **not** part of a PodCliqueScalingGroup, pods are named using this pattern:

```
<pcs-name>-<pcs-replica-index>-<pclq-name>-<random-suffix>
```

**Components:**
- `<pcs-name>`: The name of the PodCliqueSet (PCS)
- `<pcs-replica-index>`: The replica index of the PodCliqueSet (0-based)
- `<pclq-name>`: The name of the PodClique template defined in the PodCliqueSet spec
- `<random-suffix>`: A random 5-character suffix generated by Kubernetes

**Example:** `multinode-disaggregated-0-frontend-a7b3c`

Looking at this name, you can immediately tell:
- It belongs to the `multinode-disaggregated` PodCliqueSet
- It's part of PodCliqueSet replica 0
- It's from the `frontend` PodClique
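
The pattern above is plain string concatenation in hierarchy order. The following Python sketch reproduces the example; `standalone_pod_name` is a hypothetical helper written for illustration, not part of Grove, and the suffix is passed in explicitly only because Kubernetes normally generates it at random.

```python
def standalone_pod_name(pcs_name: str, pcs_replica_index: int, pclq_name: str, suffix: str) -> str:
    """Hypothetical helper: build the name of a pod in a standalone PodClique."""
    # Hierarchy order: PodCliqueSet -> PCS replica index -> PodClique -> random suffix.
    return f"{pcs_name}-{pcs_replica_index}-{pclq_name}-{suffix}"


# The suffix is hard-coded so the result matches the example above.
example = standalone_pod_name("multinode-disaggregated", 0, "frontend", "a7b3c")
print(example)  # multinode-disaggregated-0-frontend-a7b3c
```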

### PodCliques in a PodCliqueScalingGroup

For PodCliques that **are** part of a PodCliqueScalingGroup (PCSG), the pod name also includes the PCSG name and the PCSG replica index:

```
<pcs-name>-<pcs-replica-index>-<pcsg-name>-<pcsg-replica-index>-<pclq-name>-<random-suffix>
```

**Components:**
- `<pcs-name>`: The name of the PodCliqueSet
- `<pcs-replica-index>`: The replica index of the PodCliqueSet (0-based)
- `<pcsg-name>`: The name of the PodCliqueScalingGroup template
- `<pcsg-replica-index>`: The replica index of the PodCliqueScalingGroup (0-based)
- `<pclq-name>`: The name of the PodClique template within the PCSG
- `<random-suffix>`: A random 5-character suffix generated by Kubernetes

**Example:** `multinode-disaggregated-0-prefill-1-pworker-m9n0o`

Looking at this name, you can immediately tell:
- It belongs to the `multinode-disaggregated` PodCliqueSet (replica 0)
- It's part of the `prefill` PodCliqueScalingGroup (replica 1)
- It's from the `pworker` PodClique (prefill worker)
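
Because the names you choose can contain hyphens themselves (`multinode-disaggregated` has one), splitting a pod name on `-` is ambiguous. The Python sketch below assumes you already know the template names and anchors on them to recover the replica indices; both functions are hypothetical illustrations, not Grove APIs.

```python
import re


def pcsg_pod_name(pcs_name, pcs_idx, pcsg_name, pcsg_idx, pclq_name, suffix):
    """Hypothetical helper: build the name of a pod in a PCSG-managed PodClique."""
    return f"{pcs_name}-{pcs_idx}-{pcsg_name}-{pcsg_idx}-{pclq_name}-{suffix}"


def describe_pcsg_pod(pod_name, pcs_name, pcsg_name, pclq_name):
    """Hypothetical helper: recover replica indices from a pod name.

    Anchoring on the known template names avoids the ambiguity of splitting
    on "-" when a name such as "multinode-disaggregated" contains hyphens of
    its own. Returns None if the pod name does not match the pattern.
    """
    pattern = (
        rf"{re.escape(pcs_name)}-(\d+)-{re.escape(pcsg_name)}-(\d+)-"
        rf"{re.escape(pclq_name)}-([a-z0-9]{{5}})"
    )
    match = re.fullmatch(pattern, pod_name)
    if match is None:
        return None
    pcs_idx, pcsg_idx, suffix = match.groups()
    return {"pcs_replica": int(pcs_idx), "pcsg_replica": int(pcsg_idx), "suffix": suffix}


name = pcsg_pod_name("multinode-disaggregated", 0, "prefill", 1, "pworker", "m9n0o")
print(name)  # multinode-disaggregated-0-prefill-1-pworker-m9n0o
print(describe_pcsg_pod(name, "multinode-disaggregated", "prefill", "pworker"))
# {'pcs_replica': 0, 'pcsg_replica': 1, 'suffix': 'm9n0o'}
```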

## Naming Best Practices

### Kubernetes Name Length Limit

Many Kubernetes names are limited to **63 characters**: DNS labels (for example, Service names and pod hostnames) and label values must not exceed that length. Because Grove constructs full pod names by combining multiple components, keep name lengths in mind when naming your resources so that the generated names stay within this limit.

**How Grove constructs names:**

For standalone PodCliques, the final pod name is:
```
<pcs-name>-<pcs-replica-idx>-<pclq-name>-<5-char-suffix>
```

For PodCliques in a PCSG, the final pod name is:
```
<pcs-name>-<pcs-replica-idx>-<pcsg-name>-<pcsg-replica-idx>-<pclq-name>-<5-char-suffix>
```

**Character budget breakdown:**
- `<5-char-suffix>`: 5 characters (fixed by Kubernetes)
- `-` separators: 3 characters for standalone PodCliques, 5 for PodCliques in a PCSG
- Replica indices: 1+ characters each (one digit for indices 0-9, two for 10-99, and so on)
- Your chosen names: the remaining characters
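
The budget can be checked before you create anything. This Python sketch computes the worst-case pod name length from your chosen names and the largest replica count you expect, using the names planned in the example later in this guide. The function and parameter names are hypothetical, and the sketch assumes the 63-character limit and 5-character suffix described above.

```python
MAX_NAME_LENGTH = 63  # limit discussed in this section
SUFFIX_LENGTH = 5     # random suffix appended to every pod name


def index_width(replicas: int) -> int:
    """Characters needed for the highest 0-based replica index."""
    return len(str(max(replicas - 1, 0)))


def worst_case_pod_name_length(pcs_name, pclq_name, pcs_replicas, pcsg_name=None, pcsg_replicas=1):
    """Hypothetical helper: longest pod name generated for these inputs."""
    parts = [len(pcs_name), index_width(pcs_replicas)]
    if pcsg_name is not None:  # PodCliques in a PCSG add the PCSG name and its index
        parts += [len(pcsg_name), index_width(pcsg_replicas)]
    parts += [len(pclq_name), SUFFIX_LENGTH]
    separators = len(parts) - 1  # one "-" between adjacent components
    return sum(parts) + separators


standalone = worst_case_pod_name_length("mn-disagg", "frontend", pcs_replicas=1)
in_pcsg = worst_case_pod_name_length("mn-disagg", "pworker", 100, "prefill", 100)
print(standalone, in_pcsg)  # 26 37
print(in_pcsg <= MAX_NAME_LENGTH)  # True
```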

### Naming Guidelines

1. **Use Short, Descriptive Names**: Choose concise but meaningful names.
   - ✅ Good: `frontend`, `api`, `db`, `cache`
   - ❌ Avoid: `frontend-service-component`, `api-gateway-server` (too long)
   - ❌ Avoid: `f`, `a`, `d`, `c` (too cryptic)

2. **Use Abbreviations for Multi-Component Systems**: When multiple PodCliqueScalingGroups contain similar roles, distinguish their PodClique names with short prefixes or abbreviations.
   - ✅ Good: `pleader`, `pworker` (prefill), `dleader`, `dworker` (decode)
   - ❌ Avoid: `prefill-leader`, `prefill-worker`, `decode-leader`, `decode-worker`

3. **Keep PodCliqueSet Names Short**: The PCS name is included in every pod name.
   - ✅ Good: `ml-inference`, `web-app`, `data-pipeline`
   - ❌ Avoid: `machine-learning-inference-service`, `web-application-stack`

4. **Plan for Scaling**: Consider whether you'll need multi-digit replica indices; each additional digit adds 1 character.
   - Replica indices are 0-based, so a second digit is needed once you exceed 10 replicas (index 10), a third once you exceed 100, and so on. If you plan to scale that far, budget accordingly.

5. **Unique PodClique Names Within a PodCliqueSet**: All PodClique names must be unique within a PodCliqueSet. The [Hands-On Example](./03_hands-on-example.md#why-unique-podclique-names-matter) explains the rationale.
   - If you have leader/worker patterns in multiple PCSGs, you **must** use different names (e.g., `pleader`/`pworker` and `dleader`/`dworker`)
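
Guideline 5 is easy to check mechanically. The Python sketch below flags PodClique names that repeat anywhere in a PodCliqueSet. The input shape (a list of standalone names plus a mapping from each PCSG name to its PodClique names) is a simplified, hypothetical stand-in for the real PodCliqueSet spec.

```python
from collections import Counter


def duplicate_clique_names(standalone, scaling_groups):
    """Return PodClique names that occur more than once in one PodCliqueSet.

    standalone: PodClique names not in any PCSG.
    scaling_groups: maps each PCSG name to the PodClique names it contains.
    """
    names = list(standalone)
    for cliques in scaling_groups.values():
        names.extend(cliques)
    return sorted(name for name, count in Counter(names).items() if count > 1)


# Reusing "leader"/"worker" in both PCSGs violates the uniqueness rule.
print(duplicate_clique_names(["frontend"], {"prefill": ["leader", "worker"], "decode": ["leader", "worker"]}))
# ['leader', 'worker']

# Prefixed names are unique.
print(duplicate_clique_names(["frontend"], {"prefill": ["pleader", "pworker"], "decode": ["dleader", "dworker"]}))
# []
```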

### Example: Planning Names for a Complex System

Let's plan names for a multi-node disaggregated inference system with a frontend:

**Requirements:**
- 1 standalone frontend component
- 2 multi-node components: prefill and decode
- Each multi-node component has leader/worker roles
- All PodClique names must be unique within the PodCliqueSet
- Names should be short to allow for scaling headroom while remaining descriptive

**Name choices:**
- PodCliqueSet: `mn-disagg` (short, 9 chars)
- Standalone PodClique: `frontend` (8 chars)
- PCSG for prefill: `prefill` (7 chars)
  - Leader PodClique: `pleader` (7 chars)
  - Worker PodClique: `pworker` (7 chars)
- PCSG for decode: `decode` (6 chars)
  - Leader PodClique: `dleader` (7 chars)
  - Worker PodClique: `dworker` (7 chars)

**Resulting pod names:**
- Frontend: `mn-disagg-0-frontend-a7b3c` (26 chars) ✅
- Prefill leader: `mn-disagg-0-prefill-0-pleader-a7b3c` (35 chars) ✅
- Prefill worker: `mn-disagg-0-prefill-0-pworker-a7b3c` (35 chars) ✅
- Decode leader: `mn-disagg-0-decode-0-dleader-a7b3c` (34 chars) ✅
- Decode worker: `mn-disagg-0-decode-0-dworker-a7b3c` (34 chars) ✅
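
The lengths above can be reproduced with a few lines of Python. This sketch generates the replica-0 pod names for the chosen names and prints each length; `a7b3c` stands in for the random suffix, as in the list above, and the constant and function names are hypothetical.

```python
PCS_NAME = "mn-disagg"
STANDALONE_CLIQUES = ["frontend"]
SCALING_GROUPS = {"prefill": ["pleader", "pworker"], "decode": ["dleader", "dworker"]}
SUFFIX = "a7b3c"  # placeholder for the random suffix Kubernetes generates


def replica_zero_pod_names():
    """Pod names for PCS replica 0 and PCSG replica 0, in the order listed above."""
    names = [f"{PCS_NAME}-0-{clique}-{SUFFIX}" for clique in STANDALONE_CLIQUES]
    for group, cliques in SCALING_GROUPS.items():
        names += [f"{PCS_NAME}-0-{group}-0-{clique}-{SUFFIX}" for clique in cliques]
    return names


for pod in replica_zero_pod_names():
    print(f"{pod} ({len(pod)} chars)")
# mn-disagg-0-frontend-a7b3c (26 chars)
# mn-disagg-0-prefill-0-pleader-a7b3c (35 chars)
# mn-disagg-0-prefill-0-pworker-a7b3c (35 chars)
# mn-disagg-0-decode-0-dleader-a7b3c (34 chars)
# mn-disagg-0-decode-0-dworker-a7b3c (34 chars)
```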

**Scaling headroom:** The longest names (the prefill pods, such as `mn-disagg-0-prefill-0-pworker-a7b3c`) are 35 characters, leaving 28 characters of headroom. Each additional digit in a replica index adds 1 character:
- 2-digit indices for PCS and PCSG (up to 99): 37 chars → up to 100 PCS replicas × 100 PCSG replicas ✅
- 3-digit indices for PCS and PCSG (up to 999): 39 chars → up to 1,000 × 1,000 replicas ✅
- 7-digit indices for PCS and PCSG (up to 9,999,999): 47 chars → millions of replicas ✅

With these name choices, you could scale to millions of replicas in both dimensions and still stay well under the 63-character limit, while every name remains descriptive.
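
The headroom arithmetic is linear: with two replica indices in the name, each extra digit per index adds 2 characters. A short Python sketch of that calculation, taking the 35-character replica-0 name as the baseline (the function name is hypothetical):

```python
LIMIT = 63
BASE_LENGTH = len("mn-disagg-0-prefill-0-pworker-a7b3c")  # 35 characters at replica 0


def length_with_index_width(width: int) -> int:
    """Name length when both the PCS and PCSG indices are `width` digits long."""
    return BASE_LENGTH + 2 * (width - 1)


for width in (2, 3, 7):
    print(width, length_with_index_width(width))
# 2 37
# 3 39
# 7 47

# Widest index (in digits) that still fits within the limit.
print(max(w for w in range(1, LIMIT) if length_with_index_width(w) <= LIMIT))  # 15
```

In other words, each index could grow to 15 digits before these names reach 63 characters, far beyond any realistic replica count.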

To deploy a PodCliqueSet with this structure and explore the naming hierarchy through `kubectl`, continue to the [Hands-On Example](./03_hands-on-example.md).

---

## Resource Naming Reference

### Grove Resources and Their Naming

| Resource | Named by You | Generated by Grove | Pattern |
|----------|--------------|--------------------|---------|
| **PodCliqueSet** | ✅ | - | `<your-pcs-name>` |
| **PodClique (template)** | ✅ | - | `<your-pclq-name>` (in `spec.template.cliques`) |
| **PCSG (template)** | ✅ | - | `<your-pcsg-name>` (in `spec.template.podCliqueScalingGroups`) |
| **PodClique (resource, standalone)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pclq-name>` |
| **PodClique (resource, in PCSG)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pcsg-name>-<pcsg-idx>-<pclq-name>` |
| **PCSG (resource)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pcsg-name>` |
| **Pod (standalone)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pclq-name>-<suffix>` |
| **Pod (in PCSG)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pcsg-name>-<pcsg-idx>-<pclq-name>-<suffix>` |

**You control:** PodCliqueSet name, PodClique template names, PCSG template names

**Grove generates:** All resource instances with hierarchical naming
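
To see how the generated resource names in the table fit together, the Python sketch below enumerates the PodCliqueScalingGroup and PodClique resource names for a PodCliqueSet. The input shape, in particular the per-PCSG replica count, is a simplified, hypothetical stand-in for the real spec; only the name patterns come from the table.

```python
def generated_resource_names(pcs_name, pcs_replicas, standalone, scaling_groups):
    """Hypothetical helper: list (kind, name) pairs following the table's patterns.

    standalone: PodClique template names not in any PCSG.
    scaling_groups: maps a PCSG template name to (replica count, PodClique names).
    """
    resources = []
    for i in range(pcs_replicas):
        for clique in standalone:
            resources.append(("PodClique", f"{pcs_name}-{i}-{clique}"))
        for group, (group_replicas, cliques) in scaling_groups.items():
            resources.append(("PodCliqueScalingGroup", f"{pcs_name}-{i}-{group}"))
            for j in range(group_replicas):
                for clique in cliques:
                    resources.append(("PodClique", f"{pcs_name}-{i}-{group}-{j}-{clique}"))
    return resources


for kind, name in generated_resource_names(
    "mn-disagg", 1, ["frontend"], {"prefill": (2, ["pleader", "pworker"])}
):
    print(kind, name)
# PodClique mn-disagg-0-frontend
# PodCliqueScalingGroup mn-disagg-0-prefill
# PodClique mn-disagg-0-prefill-0-pleader
# PodClique mn-disagg-0-prefill-0-pworker
# PodClique mn-disagg-0-prefill-1-pleader
# PodClique mn-disagg-0-prefill-1-pworker
```

Pods are omitted because, per the table, a pod name is simply its PodClique resource name followed by the random suffix.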

## Key Takeaways

1. **Self-Documenting Hierarchy**: Pod names encode the complete hierarchy from PodCliqueSet → PCSG (if applicable) → PodClique → Pod, making `kubectl get pods` output immediately understandable.

2. **63-Character Limit**: Many Kubernetes names are limited to 63 characters. Use short, meaningful names for your resources, especially the PodCliqueSet name, which appears in every generated name, and PCSG names, which appear in every name generated for that group.

3. **Unique PodClique Names**: All PodClique names must be unique within a PodCliqueSet. When you have similar roles in multiple PCSGs (e.g., leader/worker in both prefill and decode), use prefixes or abbreviations (e.g., `pleader`/`pworker` and `dleader`/`dworker`).

4. **Predictable Patterns**: The naming scheme is consistent whether you're using standalone PodCliques or PodCliqueScalingGroups, making it easy to understand your system at a glance.

5. **Planning is Key**: Before creating resources, plan your names with the full hierarchy and your expected scaling needs in mind.

## Next Steps

Now that you understand Grove's naming scheme and best practices:

- **See it in action**: Continue to the [Hands-On Example](./03_hands-on-example.md) to deploy an example system and observe the naming hierarchy firsthand.

- **Learn programmatic discovery**: Head to the [Environment Variables guide](../environment-variables.md) to learn how to use these names programmatically for service discovery, including how Grove injects environment variables and how to construct FQDNs for pod-to-pod communication.
