-
Notifications
You must be signed in to change notification settings - Fork 35
doc: Add pod-naming and environment-variables docs #355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
doc: Add pod-naming and environment-variables docs #355
Conversation
…s in user guide, update resource requirements in samples to work on one real node Signed-off-by: Rohan Varma <rohanv@nvidia.com>
shayasoolin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, added one comment
|
Just a few minor things, otherwise looks good. |
Co-authored-by: Geoff Flarity <geoff.flarity@gmail.com> Signed-off-by: Rohan Varma <rohanv@nvidia.com>
|
Docs are quite long but it's not a bad thing. @nvrohanv would double check if there's any room to consolidate. also, keep in mind 63-char Kubernetes limit otherwise LGTM! |
renormalize
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits, but overall the changes look good to me.
| - Example: `my-service-0-model-instance` | ||
| - **Note:** This does NOT include the PCSG replica index. To construct a sibling PodClique name within the same PCSG replica, use: `$GROVE_PCSG_NAME-$GROVE_PCSG_INDEX-<pclq-template-name>` | ||
|
|
||
| **Note:** `GROVE_PCSG_TEMPLATE_NUM_PODS` represents the total number of pods defined in the PodCliqueScalingGroup template, calculated as the sum of replicas across all PodCliques in the PCSG. For example, if a PCSG has 1 leader replica and 3 worker replicas, this value would be 4. This value does not change based on scaling, so is only guaranteed to be accurate at startup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it only guaranteed to be accurate at startup if the value does not change due to scaling?
This is the sum of the number of pods across all PodCliques that belong to a particular PodCliqueScalingGroup. It is guaranteed to be accurate throughout. This value doesn't change when a PodCliqueScalingGroup scales either.
| **Note:** `GROVE_PCSG_TEMPLATE_NUM_PODS` represents the total number of pods defined in the PodCliqueScalingGroup template, calculated as the sum of replicas across all PodCliques in the PCSG. For example, if a PCSG has 1 leader replica and 3 worker replicas, this value would be 4. This value does not change based on scaling, so is only guaranteed to be accurate at startup. | |
| **Note:** `GROVE_PCSG_TEMPLATE_NUM_PODS` represents the total number of pods defined in the PodCliqueScalingGroup template, calculated as the sum of replicas across all PodCliques in the PCSG. For example, if a PCSG has 1 leader replica and 3 worker replicas, this value would be 4. This value does not change based on scaling, so is guaranteed to be accurate through the entirety of the pod's lifecycle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would change if you scale the PodCliques within a PCSG replica right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can explain that it would only change if the PodClique within PCSG replica is scaled if thats correct.
operator/samples/user-guide/naming-and-env-vars/standalone-env-vars.yaml
Outdated
Show resolved
Hide resolved
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
…adability Signed-off-by: Rohan Varma <rohanv@nvidia.com>
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
| - ✅ Good: `pleader`, `pworker` (prefill), `dleader`, `dworker` (decode) | ||
| - ❌ Avoid: `prefill-leader`, `prefill-worker`, `decode-leader`, `decode-worker` | ||
|
|
||
| 3. **Keep PodCliqueSet Names Short**: Remember that the PCS name is included in every pod name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the above example of multinode-disaggregated as PCS name is short enough as an example.. maybe mn-disagg is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed will put mn-disagg everywhere
| apiVersion: grove.io/v1alpha1 | ||
| kind: PodCliqueSet | ||
| metadata: | ||
| name: multinode-disaggregated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If mn-disagg is recommended above, let's use it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed
|
|
||
| PodClique names **must be unique within a PodCliqueSet**. Reusing generic names like `leader` and `worker` across multiple PodCliqueScalingGroups is therefore **not allowed**. | ||
|
|
||
| More importantly, this reflects **Grove's core philosophy**. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the philosophy part is more suitable to the main readme file, not the hands-on example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the uniqueness issue requires so much attention. Due to the nature of the PCS yaml, where PCLQs are all specified in the same hierarchy level, it's almost self-explanatory that names are expected to be unique. You rarely expect a list to allow repeating the same names.
So worth mentioning, but I'd consider shortening it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that it is really calling it out. Still the first reason I'm wanting to go into such detail is that when people see a name like prefill-pleader they tend to wonder why its not just prefill-leader so I wanted to go into detail about why we are requiring this. The second reason is that a good chunk of users will probably just ask AI about things these days so in general i'm putting in stuff like so that the AI picks up a clear explanation when it searches grove repo. Let me know if you have strong opinion on this.
Regarding the philosophy part I agree that main readme probably should have general philosophy. I'm thinking i can maybe keep this for now and then in a separate pr (since this one is already so big) reference the core philosophy in a previous doc and then just call it out again.
|
For next update: If I was looking to find the exact names of the various PCS components I'd probably just use labels rather than futz around with constructing a name: Grove Labels ReferenceThis document summarizes the Kubernetes labels that Grove sets on pods and resources, enabling programmatic queries via Source: Grove-Specific Labels (
|
| Label Key | Description |
|---|---|
grove.io/podclique |
The name of the PodClique the pod belongs to |
grove.io/podcliqueset-replica-index |
The replica index of the PodCliqueSet (0-based) |
grove.io/podcliquescalinggroup |
The name of the PodCliqueScalingGroup (if applicable) |
grove.io/podcliquescalinggroup-replica-index |
The replica index of the PCSG (0-based) |
grove.io/podgang |
The name of the PodGang for gang scheduling |
grove.io/base-podgang |
Base PodGang name for scaled PodGangs (beyond MinAvailable) |
grove.io/pod-template-hash |
Hash of the PodSpec (used for rolling updates) |
Standard Kubernetes Labels (app.kubernetes.io/)
| Label Key | Value | Description |
|---|---|---|
app.kubernetes.io/part-of |
PodCliqueSet name | All resources belonging to a PodCliqueSet share this label |
app.kubernetes.io/managed-by |
grove-operator |
Identifies resources managed by Grove |
app.kubernetes.io/component |
varies | Component type (see below) |
app.kubernetes.io/name |
resource name | Name of the specific resource |
Component Values (app.kubernetes.io/component)
| Value | Description |
|---|---|
pcs-podclique |
PodClique owned directly by PodCliqueSet (standalone) |
pcsg-podclique |
PodClique owned by a PodCliqueScalingGroup |
pcs-podcliquescalinggroup |
PodCliqueScalingGroup resource |
pcs-headless-service |
Headless service for a PodCliqueSet replica |
podgang |
PodGang resource |
pod-role |
Role for pods in the PodCliqueSet |
pod-role-binding |
RoleBinding for pods |
pod-service-account |
ServiceAccount for pods |
pcs-hpa |
HorizontalPodAutoscaler for scaling |
Usage Examples
Get All Pods in a PodCliqueSet
kubectl get pods -l app.kubernetes.io/part-of=my-inference-appGet Pods in a Specific PodClique
kubectl get pods -l grove.io/podclique=prefill-0-pleaderGet Pods in a PodCliqueScalingGroup
kubectl get pods -l grove.io/podcliquescalinggroup=prefillGet Pods in a Specific PCSG Replica
kubectl get pods -l grove.io/podcliquescalinggroup=prefill,grove.io/podcliquescalinggroup-replica-index=0Get Pods in a Specific PodCliqueSet Replica
kubectl get pods -l app.kubernetes.io/part-of=my-inference-app,grove.io/podcliqueset-replica-index=0Get All Grove-Managed Resources
kubectl get pods -l app.kubernetes.io/managed-by=grove-operatorGet All Standalone PodCliques (not in a PCSG)
kubectl get pclq -l app.kubernetes.io/component=pcs-podcliqueGet All PodCliques in PodCliqueScalingGroups
kubectl get pclq -l app.kubernetes.io/component=pcsg-podcliqueCombining Labels with Environment Variables
Grove also injects environment variables into pods for programmatic discovery from within containers. See the Environment Variables documentation for details on:
GROVE_PCS_NAME- PodCliqueSet nameGROVE_PCS_INDEX- PodCliqueSet replica indexGROVE_PCLQ_NAME- PodClique nameGROVE_PCSG_NAME- PodCliqueScalingGroup nameGROVE_PCSG_INDEX- PodCliqueScalingGroup replica indexGROVE_HEADLESS_SERVICE- Headless service for DNS resolutionGROVE_PCLQ_POD_INDEX- Pod index within the PodClique
Labels vs Environment Variables:
- Use labels for external queries (
kubectl, monitoring, alerting) - Use environment variables for internal pod-to-pod discovery and coordination
gflarity
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using the labels I mentioned might make it a lot easier for those who need to find/communication with different components. So I threw them in there for consideration. Regardless this is a huge improvement 👍
Yes this is great, I'll add this in as a separate pr since this one is already quite large. |
Add Pod Naming and Environment Variables Documentation
Summary
This PR adds comprehensive documentation for Grove's pod naming conventions and environment variables for pod discovery. The documentation is organized into two new user guide sections:
docs/user-guide/02_pod-and-resource-naming-conventions/) - Explains Grove's hierarchical naming scheme and best practices for naming resourcesdocs/user-guide/03_environment-variables-for-pod-discovery/) - Documents the environment variables Grove injects for pod discovery and coordination and shows examples of how to use them to construct pod fully qualified domain names (FQDN).Changes
New Documentation Structure
docs/user-guide/02_pod-and-resource-naming-conventions/01_overview.md- Prerequisites and guide overview02_naming-conventions.md- Complete guide covering:pleader/pworkeranddleader/dworker)03_hands-on-example.md- Deployable multi-node disaggregated inference example demonstrating the naming hierarchydocs/user-guide/03_environment-variables-for-pod-discovery/01_overview.md- Prerequisites and guide overview02_env_var_reference.md- Reference guide covering:GROVE_PCS_NAME,GROVE_PCLQ_NAME,GROVE_PCSG_NAME, etc.)03_hands-on-examples.md- Two worked examples with deployable YAMLs:04_common-patterns-and-takeaways.md- Practical patterns for constructing FQDNs, finding leaders, discovering peers, and key takeawaysNew Sample Files
operator/samples/user-guide/02_pod-and-resource-naming-conventions/multinode-disaggregated-with-frontend.yamloperator/samples/user-guide/03_environment-variables-for-pod-discovery/standalone-env-vars.yamloperator/samples/user-guide/03_environment-variables-for-pod-discovery/pcsg-env-vars.yamlReorganized Existing Files
Documentation Directories - Added numeric prefixes for ordering:
docs/user-guide/core-concepts/→docs/user-guide/01_core-concepts/docs/user-guide/pod-and-resource-naming-conventions/→docs/user-guide/02_pod-and-resource-naming-conventions/docs/user-guide/environment-variables-for-pod-discovery/→docs/user-guide/03_environment-variables-for-pod-discovery/Core Concepts Files - Renamed with numeric prefixes:
overview.md→01_overview.mdpcs_and_pclq_intro.md→02_pcs_and_pclq_intro.mdpcsg_intro.md→03_pcsg_intro.mdtakeaways.md→04_takeaways.mdSample YAML Directories - Reorganized to match documentation structure:
operator/samples/user-guide/concept-overview/→operator/samples/user-guide/01_core-concepts/operator/samples/user-guide/pod-and-resource-naming-conventions/→operator/samples/user-guide/02_pod-and-resource-naming-conventions/operator/samples/user-guide/environment-variables-for-pod-discovery/→operator/samples/user-guide/03_environment-variables-for-pod-discovery/All sample YAMLs updated with:
cpu: 10m,memory: 32Mi) to work on a single real node in KIND clustersUpdated References
README.mdanddocs/quickstart.mdwith links to new documentation structureTesting
All commands documented in the guides were tested end-to-end:
make kind-up FAKE_NODES=40make deploykubectl apply,kubectl get pods/pclq/pcsg,kubectl logs, andkubectl exec -- nslookupcommands