Commit 116d0e5

split pod and resource naming conventions into pieces for improved readability
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
1 parent d69f909 commit 116d0e5

6 files changed (+484, -448 lines changed)

docs/user-guide/environment-variables.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ This guide explains the environment variables that Grove automatically injects i

Before starting this guide:
- Review the [core concepts tutorial](./core-concepts/overview.md) to understand Grove's primitives
9-
- Read the [Pod Naming guide](./pod-naming.md) to understand Grove's naming conventions
9+
- Read the [Pod Naming guide](./pod-and-resource-naming-conventions/01_overview.md) to understand Grove's naming conventions
- Set up a cluster following the [installation guide](../installation.md), the two options are:
  - [A local KIND demo cluster](../installation.md#local-kind-cluster-set-up): Create the cluster with `make kind-up FAKE_NODES=40`, set `KUBECONFIG` env variable as directed, and run `make deploy`
  - [A remote Kubernetes cluster](../installation.md#remote-cluster-set-up) with [Grove installed from package](../installation.md#install-grove-from-package)
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
# Pod and Resource Naming Conventions

This section explains Grove's hierarchical naming scheme for pods and resources. Grove's naming convention is designed to be **self-documenting**: when you run `kubectl get pods`, the pod names immediately tell you which PodCliqueSet, PodCliqueScalingGroup (if applicable), and PodClique each pod belongs to.

## Guides in This Section

1. **[Naming Conventions](./02_naming-conventions.md)**: Learn the naming patterns, best practices, and how to plan names for your resources.

2. **[Hands-On Example](./03_hands-on-example.md)**: Deploy an example with the structure of a multi-node disaggregated inference system and observe the naming hierarchy in action.

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
# Naming Conventions

This guide explains Grove's hierarchical pod and resource naming scheme and best practices for naming your resources.

## Why Hierarchical Naming Matters

Grove's naming scheme serves two critical purposes:

1. **Immediate Visual Understanding**: Pod names encode the complete hierarchy, so `kubectl get pods` output is self-explanatory. You can instantly see which pods belong together and how they're organized.

2. **Programmatic Service Discovery**: The hierarchical structure enables pods to discover and communicate with each other using fully qualified domain names (FQDNs). The [Environment Variables guide](../environment-variables.md) demonstrates how to programmatically construct these FQDNs using Grove's injected environment variables.

## Prerequisites

Before starting this guide:
- Review the [core concepts tutorial](../core-concepts/overview.md) to understand Grove's primitives
- Set up a cluster following the [installation guide](../../installation.md). The two options are:
  - [A local KIND demo cluster](../../installation.md#local-kind-cluster-set-up): Create the cluster with `make kind-up FAKE_NODES=40`, set `KUBECONFIG` env variable as directed, and run `make deploy`
  - [A remote Kubernetes cluster](../../installation.md#remote-cluster-set-up) with [Grove installed from package](../../installation.md#install-grove-from-package)

## Pod Naming Patterns

### Standalone PodCliques

For PodCliques that are **not** part of a PodCliqueScalingGroup, the pod naming follows this pattern:

```
<pcs-name>-<pcs-replica-index>-<pclq-name>-<random-suffix>
```

**Components:**
- `<pcs-name>`: The name of the PodCliqueSet
- `<pcs-replica-index>`: The replica index of the PodCliqueSet (0-based)
- `<pclq-name>`: The name of the PodClique template defined in the PodCliqueSet spec
- `<random-suffix>`: A random 5-character suffix generated by Kubernetes

**Example:** `multinode-disaggregated-0-frontend-a7b3c`

Looking at this name, you can immediately tell:
- It belongs to the `multinode-disaggregated` PodCliqueSet
- It's part of PodCliqueSet replica 0
- It's from the `frontend` PodClique

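As a minimal sketch, the hypothetical Python helper below assembles a standalone pod name from the components listed above. `standalone_pod_name` is not part of Grove; in a real cluster Kubernetes generates the suffix, so the random suffix produced here is only an illustrative stand-in (its alphabet is an assumption).

```python
import random
import string
from typing import Optional


def standalone_pod_name(
    pcs_name: str,
    pcs_replica_index: int,
    pclq_name: str,
    suffix: Optional[str] = None,
) -> str:
    """Hypothetical helper: <pcs-name>-<pcs-replica-index>-<pclq-name>-<random-suffix>."""
    if suffix is None:
        # Illustrative stand-in only: the real 5-character suffix comes from Kubernetes.
        suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=5))
    return f"{pcs_name}-{pcs_replica_index}-{pclq_name}-{suffix}"


print(standalone_pod_name("multinode-disaggregated", 0, "frontend", "a7b3c"))
# multinode-disaggregated-0-frontend-a7b3c
```

Passing an explicit suffix makes the output deterministic, which is convenient when comparing against `kubectl get pods` output.
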
### PodCliques in a PodCliqueScalingGroup

For PodCliques that **are** part of a PodCliqueScalingGroup, the pod naming includes the PCSG information:

```
<pcs-name>-<pcs-replica-index>-<pcsg-name>-<pcsg-replica-index>-<pclq-name>-<random-suffix>
```

**Components:**
- `<pcs-name>`: The name of the PodCliqueSet
- `<pcs-replica-index>`: The replica index of the PodCliqueSet (0-based)
- `<pcsg-name>`: The name of the PodCliqueScalingGroup template
- `<pcsg-replica-index>`: The replica index of the PodCliqueScalingGroup (0-based)
- `<pclq-name>`: The name of the PodClique template within the PCSG
- `<random-suffix>`: A random 5-character suffix generated by Kubernetes

**Example:** `multinode-disaggregated-0-prefill-1-pworker-m9n0o`

Looking at this name, you can immediately tell:
- It belongs to the `multinode-disaggregated` PodCliqueSet (replica 0)
- It's part of the `prefill` PodCliqueScalingGroup (replica 1)
- It's from the `pworker` PodClique (prefill worker)

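Because PodCliqueSet names such as `multinode-disaggregated` can themselves contain hyphens, a pod name cannot be split on `-` blindly. The sketch below, assuming you already know the PodCliqueSet name and the PCSG template names, builds a PCSG pod name and decodes a pod name back into its components. `pcsg_pod_name` and `describe_pod` are hypothetical illustrations, not Grove APIs.

```python
from typing import Dict, Iterable, Union

PodInfo = Dict[str, Union[str, int]]


def pcsg_pod_name(pcs_name: str, pcs_replica_index: int, pcsg_name: str,
                  pcsg_replica_index: int, pclq_name: str, suffix: str) -> str:
    """Hypothetical helper following the PCSG pod naming pattern."""
    return (f"{pcs_name}-{pcs_replica_index}-{pcsg_name}-"
            f"{pcsg_replica_index}-{pclq_name}-{suffix}")


def describe_pod(pod_name: str, pcs_name: str, pcsg_names: Iterable[str]) -> PodInfo:
    """Split a pod name into its parts, given the PCS name and PCSG template names."""
    prefix = f"{pcs_name}-"
    if not pod_name.startswith(prefix):
        raise ValueError(f"{pod_name!r} is not a pod of PodCliqueSet {pcs_name!r}")
    # The PCS replica index follows the PCS name; the Kubernetes suffix is always last.
    pcs_idx, rest = pod_name[len(prefix):].split("-", 1)
    body, suffix = rest.rsplit("-", 1)
    info: PodInfo = {"pcs": pcs_name, "pcs_replica": int(pcs_idx), "suffix": suffix}
    for pcsg in pcsg_names:
        if body.startswith(f"{pcsg}-"):
            # Expect "<pcsg-replica-index>-<pclq-name>" after the PCSG name.
            pcsg_idx, _, pclq = body[len(pcsg) + 1:].partition("-")
            if pcsg_idx.isdigit() and pclq:
                info.update(pcsg=pcsg, pcsg_replica=int(pcsg_idx), pclq=pclq)
                return info
    info["pclq"] = body  # no PCSG matched: a standalone PodClique
    return info


name = pcsg_pod_name("multinode-disaggregated", 0, "prefill", 1, "pworker", "m9n0o")
print(name)  # multinode-disaggregated-0-prefill-1-pworker-m9n0o
print(describe_pod(name, "multinode-disaggregated", ["prefill", "decode"]))
```

The decoder relies on the suffix always being the last hyphen-separated segment and on the PCSG name being followed by a numeric replica index. A standalone PodClique whose name happened to start with `<pcsg-name>-<digits>-` would be misclassified, so treat this as a reading aid, not a validator.
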
## Naming Best Practices

### Kubernetes Name Length Limit

Kubernetes has a **63-character limit** for resource names. Since Grove constructs full pod names by combining multiple components, you need to be mindful of name lengths when choosing names for your resources.

**How Grove constructs names:**

For standalone PodCliques, the final pod name is:
```
<pcs-name>-<pcs-replica-idx>-<pclq-name>-<5-char-suffix>
```

For PodCliques in a PCSG, the final pod name is:
```
<pcs-name>-<pcs-replica-idx>-<pcsg-name>-<pcsg-replica-idx>-<pclq-name>-<5-char-suffix>
```

**Character budget breakdown:**
- `<5-char-suffix>`: 5 characters (fixed by Kubernetes)
- `-` separators: 3 characters for standalone PodCliques, 5 characters for PodCliques in a PCSG
- Replica indices: 1+ characters each (single digit for 0-9, two digits for 10-99, etc.)
- Your chosen names: the remaining characters, shared by the PodCliqueSet, PCSG, and PodClique names (including any hyphens within them)

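The budget arithmetic above can be written down as a small function. This is a sketch that assumes the 63-character limit, the 5-character suffix, and the 0-based indices described in this guide; `index_width` and `remaining_name_budget` are hypothetical names.

```python
NAME_LIMIT = 63  # limit discussed in this guide
SUFFIX_LEN = 5   # Kubernetes-generated random suffix


def index_width(replicas: int) -> int:
    """Digits in the largest 0-based index, e.g. 10 replicas -> index 9 -> 1 digit."""
    return len(str(max(replicas - 1, 0)))


def remaining_name_budget(in_pcsg: bool, pcs_replicas: int, pcsg_replicas: int = 1) -> int:
    """Characters left for your chosen names (PCS + PCSG + PodClique names combined)."""
    separators = 5 if in_pcsg else 3
    digits = index_width(pcs_replicas)
    if in_pcsg:
        digits += index_width(pcsg_replicas)
    return NAME_LIMIT - SUFFIX_LEN - separators - digits


print(remaining_name_budget(in_pcsg=False, pcs_replicas=1))                      # 54
print(remaining_name_budget(in_pcsg=True, pcs_replicas=100, pcsg_replicas=100))  # 49
```

For the `mn-disagg` plan later in this guide, `mn-disagg`, `prefill`, and `pworker` together use 23 characters, well within the 51 left for one PCS replica and one PCSG replica.
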
### Naming Guidelines

1. **Use Short, Descriptive Names**: Choose concise but meaningful names
   - ✅ Good: `frontend`, `api`, `db`, `cache`
   - ❌ Avoid: `frontend-service-component`, `api-gateway-server` (too long)
   - ❌ Avoid: `f`, `a`, `d`, `c` (too cryptic)

2. **Use Abbreviations for Multi-Component Systems**: When you have multiple PodCliqueScalingGroups with similar roles, use prefixes or abbreviations
   - ✅ Good: `pleader`, `pworker` (prefill), `dleader`, `dworker` (decode)
   - ❌ Avoid: `prefill-leader`, `prefill-worker`, `decode-leader`, `decode-worker`

3. **Keep PodCliqueSet Names Short**: Remember that the PCS name is included in every pod name
   - ✅ Good: `ml-inference`, `web-app`, `data-pipeline`
   - ❌ Avoid: `machine-learning-inference-service`, `web-application-stack`

4. **Plan for Scaling**: Consider whether you'll need multi-digit replica indices; each additional digit adds 1 character to the pod name
   - If you plan to scale to 10+, 100+, or 1000+ replicas, budget accordingly

5. **Unique PodClique Names Within a PodCliqueSet**: All PodClique names must be unique within a PodCliqueSet. We explain the rationale for this further in the [Hands-On Example](./03_hands-on-example.md#why-unique-podclique-names-matter).
   - If you have leader/worker patterns in multiple PCSGs, you **must** use different names (e.g., `pleader`/`pworker` and `dleader`/`dworker`)

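Guideline 5 can be checked mechanically before you apply a manifest. The sketch below takes a hypothetical, simplified description of a PodCliqueSet (plain Python lists and dicts, not Grove's actual spec schema) and reports PodClique names that appear more than once across standalone PodCliques and PCSGs.

```python
from collections import Counter
from typing import Dict, List


def duplicate_clique_names(standalone: List[str], pcsgs: Dict[str, List[str]]) -> List[str]:
    """Return PodClique names used more than once in the whole PodCliqueSet.

    `standalone` lists PodCliques outside any PCSG; `pcsgs` maps each PCSG name
    to the PodClique names it contains (a simplified, hypothetical model).
    """
    all_names = list(standalone) + [name for cliques in pcsgs.values() for name in cliques]
    return sorted(name for name, count in Counter(all_names).items() if count > 1)


# Reusing leader/worker in two PCSGs violates guideline 5 ...
print(duplicate_clique_names(["frontend"], {"prefill": ["leader", "worker"],
                                            "decode": ["leader", "worker"]}))
# ['leader', 'worker']

# ... while role-prefixed names pass.
print(duplicate_clique_names(["frontend"], {"prefill": ["pleader", "pworker"],
                                            "decode": ["dleader", "dworker"]}))
# []
```
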
### Example: Planning Names for a Complex System

Let's plan names for a multi-node disaggregated inference system with a frontend:

**Requirements:**
- 1 standalone frontend component
- 2 multi-node components: prefill and decode
- Each multi-node component has leader/worker roles
- All PodClique names must be unique within the PodCliqueSet
- Names should be short to allow for scaling headroom while remaining descriptive

**Name choices:**
- PodCliqueSet: `mn-disagg` (short, 9 chars)
- Standalone PodClique: `frontend` (8 chars)
- PCSG for prefill: `prefill` (7 chars)
  - Leader PodClique: `pleader` (7 chars)
  - Worker PodClique: `pworker` (7 chars)
- PCSG for decode: `decode` (6 chars)
  - Leader PodClique: `dleader` (7 chars)
  - Worker PodClique: `dworker` (7 chars)

**Resulting pod names:**
- Frontend: `mn-disagg-0-frontend-a7b3c` (26 chars) ✅
- Prefill leader: `mn-disagg-0-prefill-0-pleader-a7b3c` (35 chars) ✅
- Prefill worker: `mn-disagg-0-prefill-0-pworker-a7b3c` (35 chars) ✅
- Decode leader: `mn-disagg-0-decode-0-dleader-a7b3c` (34 chars) ✅
- Decode worker: `mn-disagg-0-decode-0-dworker-a7b3c` (34 chars) ✅

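The resulting names and lengths can be reproduced with a few lines of Python. The plan is encoded as plain data, and `a7b3c` stands in for the Kubernetes-generated suffix, as in the list above; `planned_pod_names` is a hypothetical helper.

```python
from typing import List

PCS_NAME = "mn-disagg"
STANDALONE = ["frontend"]
PCSGS = {"prefill": ["pleader", "pworker"], "decode": ["dleader", "dworker"]}
SUFFIX = "a7b3c"  # placeholder for the Kubernetes-generated suffix


def planned_pod_names(pcs_idx: int = 0, pcsg_idx: int = 0) -> List[str]:
    """Pod names for one PCS replica and one PCSG replica of the plan above."""
    names = [f"{PCS_NAME}-{pcs_idx}-{clique}-{SUFFIX}" for clique in STANDALONE]
    for pcsg, cliques in PCSGS.items():
        names += [f"{PCS_NAME}-{pcs_idx}-{pcsg}-{pcsg_idx}-{clique}-{SUFFIX}" for clique in cliques]
    return names


for pod in planned_pod_names():
    print(f"{pod} ({len(pod)} chars)")
# mn-disagg-0-frontend-a7b3c (26 chars)
# mn-disagg-0-prefill-0-pleader-a7b3c (35 chars)
# mn-disagg-0-prefill-0-pworker-a7b3c (35 chars)
# mn-disagg-0-decode-0-dleader-a7b3c (34 chars)
# mn-disagg-0-decode-0-dworker-a7b3c (34 chars)
```
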
**Scaling headroom:** The longest name (`mn-disagg-0-prefill-0-pworker-a7b3c`) is 35 characters, leaving 28 characters of headroom. Each additional digit in a replica index adds 1 character:
- 2-digit indices for PCS and PCSG (indices up to 99): 37 chars → up to 100 PCS replicas × 100 PCSG replicas ✅
- 3-digit indices for PCS and PCSG (indices up to 999): 39 chars → up to 1,000 × 1,000 replicas ✅
- 7-digit indices for PCS and PCSG: 47 chars → millions of replicas ✅

With these name choices, you could scale to millions of replicas on both dimensions without hitting the limit. All names are well under the 63-character limit, leaving room for growth while remaining descriptive!

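To double-check the headroom figures, the sketch below widens both replica indices in the longest pod name to a given number of digits and finds the widest index that still fits in 63 characters. The helper names are illustrative, and the name components are the ones chosen above.

```python
LIMIT = 63


def longest_name_length(index_digits: int) -> int:
    """Length of the prefill-worker pod name when both indices have `index_digits` digits."""
    idx = "9" * index_digits  # largest index with that many digits
    return len(f"mn-disagg-{idx}-prefill-{idx}-pworker-a7b3c")


def max_index_digits(limit: int = LIMIT) -> int:
    """Widest index (in digits) for which the longest pod name still fits the limit."""
    digits = 1
    while longest_name_length(digits + 1) <= limit:
        digits += 1
    return digits


for d in (1, 2, 3, 7):
    print(d, longest_name_length(d))  # 35, 37, 39, 47
print("max digits:", max_index_digits())  # 15
```

Up to 15 digits per index fit within the limit, far beyond any realistic replica count.
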
To deploy a PodCliqueSet with this structure and explore the naming hierarchy through `kubectl`, continue to the [Hands-On Example](./03_hands-on-example.md).

---

## Resource Naming Reference

### Grove Resources and Their Naming

| Resource | You Name | Grove Generates | Pattern |
|----------|----------|-----------------|---------|
| **PodCliqueSet** | ✅ | - | `<your-pcs-name>` |
| **PodClique (template)** | ✅ | - | `<your-pclq-name>` (in spec.template.cliques) |
| **PCSG (template)** | ✅ | - | `<your-pcsg-name>` (in spec.template.podCliqueScalingGroups) |
| **PodClique (resource, standalone)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pclq-name>` |
| **PodClique (resource, in PCSG)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pcsg-name>-<pcsg-idx>-<pclq-name>` |
| **PCSG (resource)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pcsg-name>` |
| **Pod (standalone)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pclq-name>-<suffix>` |
| **Pod (in PCSG)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pcsg-name>-<pcsg-idx>-<pclq-name>-<suffix>` |

**You control:** PodCliqueSet name, PodClique template names, PCSG template names
**Grove generates:** All resource instances with hierarchical naming

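The generated patterns in the table can be expanded for a whole PodCliqueSet. The hypothetical helper below takes a simplified description of the spec (the input shape, including the per-PCSG replica counts in `pcsg_replicas`, is an assumption for illustration, not Grove's schema) and lists the PCSG and PodClique resource names that the table's patterns produce. Pod names would add the Kubernetes-generated suffix on top.

```python
from typing import Dict, List


def generated_resource_names(
    pcs_name: str,
    pcs_replicas: int,
    standalone: List[str],
    pcsgs: Dict[str, List[str]],
    pcsg_replicas: Dict[str, int],
) -> Dict[str, List[str]]:
    """Expand the generated-name patterns from the reference table."""
    pcsg_resources: List[str] = []
    pclq_resources: List[str] = []
    for i in range(pcs_replicas):
        # Standalone PodClique: <pcs-name>-<pcs-idx>-<pclq-name>
        pclq_resources += [f"{pcs_name}-{i}-{clique}" for clique in standalone]
        for pcsg, cliques in pcsgs.items():
            # PCSG resource: <pcs-name>-<pcs-idx>-<pcsg-name>
            pcsg_resources.append(f"{pcs_name}-{i}-{pcsg}")
            for j in range(pcsg_replicas[pcsg]):
                # PodClique in PCSG: <pcs-name>-<pcs-idx>-<pcsg-name>-<pcsg-idx>-<pclq-name>
                pclq_resources += [f"{pcs_name}-{i}-{pcsg}-{j}-{clique}" for clique in cliques]
    return {"PodCliqueScalingGroup": pcsg_resources, "PodClique": pclq_resources}


names = generated_resource_names("mn-disagg", 1, ["frontend"],
                                 {"prefill": ["pleader", "pworker"]}, {"prefill": 2})
for kind, resources in names.items():
    print(kind, resources)
```

With two prefill PCSG replicas, this yields one PCSG resource (`mn-disagg-0-prefill`) and a separate PodClique resource per PCSG replica, since the PCSG replica index is part of the PodClique name.
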
## Key Takeaways

1. **Self-Documenting Hierarchy**: Pod names encode the complete hierarchy from PodCliqueSet → PCSG (if applicable) → PodClique → Pod, making `kubectl get pods` output immediately understandable.

2. **63-Character Limit**: Kubernetes enforces a 63-character limit on resource names. Use short, meaningful names for your resources, especially the PodCliqueSet name, which appears in every generated name, and PCSG names, which appear in every name generated within their group.

3. **Unique PodClique Names**: All PodClique names must be unique within a PodCliqueSet. When you have similar roles in multiple PCSGs (e.g., leader/worker in both prefill and decode), use prefixes or abbreviations (e.g., `pleader`/`pworker` and `dleader`/`dworker`).

4. **Predictable Patterns**: The naming scheme is consistent whether you're using standalone PodCliques or PodCliqueScalingGroups, making it easy to understand your system at a glance.

5. **Planning is Key**: Before creating resources, plan your names considering the full hierarchy and potential scaling needs.

## Next Steps

Now that you understand Grove's naming scheme and best practices:

- **See it in action**: Continue to the [Hands-On Example](./03_hands-on-example.md) to deploy an example system and observe the naming hierarchy firsthand.

- **Learn programmatic discovery**: Head to the [Environment Variables guide](../environment-variables.md) to learn how to use these names programmatically for service discovery, including how Grove injects environment variables and how to construct FQDNs for pod-to-pod communication.

