Skip to content

Conversation

@Ronkahn21
Copy link
Contributor

Summary

  • Add comprehensive pod name constraints validation for all PodClique templates during PodGangSet admission
  • Implement selective validation logic that validates PodCliques differently based on whether they belong to scaling groups
  • Improve error messages with clearer, more actionable feedback for constraint violations

Changes

  • Added pod name constraints validation to validatePodCliqueTemplates for PodCliques NOT in scaling groups (45-char limit: pgs-name + pclq-name)
  • Added pod name constraints validation to validateScalingGroupPodCliqueNames for PodCliques IN scaling groups (45-char limit: pgs-name + pcsg-name + pclq-name)
  • Updated error messages to be more user-friendly while maintaining technical accuracy

Technical Details

Pod names follow different formats:

  • PCSG pods: <pgs-name>-<pgs-index>-<pcsg-name>-<pcsg-index>-<pclq-name>-<random>
  • Non-PCSG pods: <pgs-name>-<pgs-index>-<pclq-name>-<random>

The validation ensures:

  • PodCliques NOT in scaling groups: validated with 45-character limit for pgs-name + pclq-name
  • PodCliques IN scaling groups: validated with 45-character limit for pgs-name + pcsg-name + pclq-name
  • Total pod names stay under 64 characters for annotation support

Copy link
Collaborator

@renormalize renormalize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Ronkahn21 for the PR!
Most things look good, I just have a few nits here and there.

Tests all look great as well!

@Ronkahn21 Ronkahn21 force-pushed the pod-name-validation-pgs branch from 986de2e to bc54e66 Compare July 12, 2025 16:43
- Implement validation for pod name length constraints (45 chars max)
- Add separate validation for PodClique templates not in scaling groups
- Ensure total pod names stay under 64 characters for annotation support
- Add comprehensive test coverage for validation scenarios
- Handle nil pointer cases and improve error messages

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
@Ronkahn21 Ronkahn21 force-pushed the pod-name-validation-pgs branch from bc54e66 to 58fb776 Compare July 12, 2025 16:46
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Copy link
Collaborator

@renormalize renormalize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one nit that I've added. Will merge the PR into main once it is addressed. Thanks!

Comment on lines +110 to +111
// Get all clique names that belong to scaling groups
scalingGroupCliqueNames := v.getScalingGroupCliqueNames()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be moved into v.validatePodCliqueTemplateSpec or validatePodCliqueNameConstraints, since the latter is the only place that this information is used? It would also make the function signature of validatePodCliqueTemplateSpec cleaner if it is.

@renormalize renormalize merged commit 37f2aa1 into ai-dynamo:main Jul 15, 2025
4 checks passed
@renormalize renormalize removed their assignment Jul 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants