design: Kubernetes name length enforcement for Velero-created objects (#8815) #9770

Open
kaovilai wants to merge 1 commit into velero-io:main from kaovilai:design/8815-name-length-enforcement

Conversation

@kaovilai kaovilai commented May 1, 2026

Member

Summary

  • Adds design document for issue #8815: Velero constructs Kubernetes object names from user-controlled strings without enforcing length limits, causing silent failures when names exceed 253 characters (DNS subdomain) or 63 characters (label values).
  • Audit found 12 name-length bugs across 5 categories (GenerateName prefix, deterministic Name, derived name+suffix, label value, label selector mismatch) and 3 GenerateName sites missing the CreateRetryGenerateName collision-retry wrapper.
  • Design proposes two new helper functions (GetValidGenerateName, GetValidObjectName) that delegate to a shared private getValidNameWithMaxLen, refactoring the existing GetValidName to use the same helper. All affected call sites are enumerated with before/after.
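
The helper layering described in the last bullet could look roughly like the following Go sketch. It is not taken from the design document: the signatures, the six-character hash suffix, and the five characters reserved for the random suffix that the API server appends to a `GenerateName` prefix are all assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

const (
	maxObjectNameLen    = 253 // DNS subdomain limit cited in the PR summary
	maxLabelValueLen    = 63  // label value limit cited in the PR summary
	generateNameRandLen = 5   // assumption: room reserved for the server-generated suffix
	hashSuffixLen       = 6   // assumption: hex characters of the SHA-256 digest that are kept
)

// getValidNameWithMaxLen returns name unchanged when it fits within maxLen.
// Otherwise it keeps the first maxLen-hashSuffixLen bytes and appends the first
// hashSuffixLen hex characters of the SHA-256 digest of the full name.
// Names are assumed to be ASCII, so byte slicing does not split a character.
func getValidNameWithMaxLen(name string, maxLen int) string {
	if len(name) <= maxLen {
		return name
	}
	sum := sha256.Sum256([]byte(name))
	suffix := hex.EncodeToString(sum[:])[:hashSuffixLen]
	return name[:maxLen-hashSuffixLen] + suffix
}

// GetValidName bounds a label value.
func GetValidName(label string) string {
	return getValidNameWithMaxLen(label, maxLabelValueLen)
}

// GetValidObjectName bounds a deterministic object name.
func GetValidObjectName(name string) string {
	return getValidNameWithMaxLen(name, maxObjectNameLen)
}

// GetValidGenerateName bounds a GenerateName prefix, leaving room for the random suffix.
func GetValidGenerateName(prefix string) string {
	return getValidNameWithMaxLen(prefix, maxObjectNameLen-generateNameRandLen)
}

func main() {
	short := "backup-1"
	long := strings.Repeat("a", 300)
	fmt.Println(GetValidObjectName(short) == short) // true
	fmt.Println(len(GetValidObjectName(long)))      // 253
	fmt.Println(len(GetValidGenerateName(long)))    // 248
	fmt.Println(len(GetValidName(long)))            // 63
}
```

Routing all three public helpers through one private function keeps the truncation rule in a single place, so a change to the hash length or algorithm would apply to names and label values alike.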

Design highlights

  • No behavior change for names within current limits — helpers return input unchanged.
  • SHA-256 hash suffix on truncation (same strategy as existing GetValidName) preserves uniqueness for distinct long names.
  • Retry consistency (Category F): switches VolumeSnapshot, DataUpload, and DataDownload CSI creation sites from bare crClient.Create to veleroclient.CreateRetryGenerateName, aligning with KEP 4420 intent.
  • Compatibility: no existing objects are affected — Kubernetes itself rejects names > 253 characters at admission, so any deployment hitting these bugs today gets a hard failure with no persisted object.
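
The retry-consistency item can be illustrated with a small, dependency-free Go sketch of retrying a create call on a name collision. It does not reproduce `veleroclient.CreateRetryGenerateName`; `errAlreadyExists`, `createFunc`, and `createWithRetry` are hypothetical stand-ins for the API server's AlreadyExists error, a client `Create` call on an object that sets `GenerateName`, and the wrapper itself.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists stands in for the API server's AlreadyExists error (hypothetical).
var errAlreadyExists = errors.New("already exists")

// createFunc stands in for one Create call on an object that uses GenerateName.
type createFunc func() error

// createWithRetry calls create until it succeeds, fails with an error other than
// a name collision, or maxAttempts is reached. It returns the number of attempts
// made and the last error.
func createWithRetry(create createFunc, maxAttempts int) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = create()
		if err == nil || !errors.Is(err, errAlreadyExists) {
			return attempt, err
		}
		// Collision: the server picks a new random suffix on the next attempt.
	}
	return maxAttempts, err
}

func main() {
	calls := 0
	create := func() error {
		calls++
		if calls < 3 {
			return errAlreadyExists // simulate two collisions
		}
		return nil
	}
	attempts, err := createWithRetry(create, 5)
	fmt.Println(attempts, err) // 3 <nil>
}
```

Only collisions are retried here; any other error is returned immediately so that genuine validation failures, such as an over-long name, still surface on the first attempt.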

Test plan

  • Review design document at design/8815-kubernetes-name-length-enforcement_design.md
  • Verify all 12 name-length locations and 3 retry locations are correctly identified
  • Confirm compatibility analysis for BackupRepository, cache PVC, and label selector fix
  • Approve design before implementation begins

Note

Responses generated with Claude

…-io#8815)

Audit found 12 name-length bugs across 5 categories and 3 GenerateName
sites missing CreateRetryGenerateName. Design covers new helper functions
GetValidGenerateName and GetValidObjectName, all fix sites, compatibility
analysis, and alignment with KEP 4420 retry behavior.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
@github-actions github-actions Bot added the Area/Design Design Documents label May 1, 2026
@kaovilai kaovilai added the kind/changelog-not-required PR does not require a user changelog. Often for docs, website, or build changes label May 1, 2026
@kaovilai kaovilai marked this pull request as ready for review May 1, 2026 06:22
Copilot AI review requested due to automatic review settings May 1, 2026 06:22
@github-actions github-actions Bot requested review from reasonerjt and ywk253100 May 1, 2026 06:22
codecov Bot commented May 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Copilot AI left a comment

Pull request overview

Adds a design document proposing a consistent approach for preventing Velero-created Kubernetes objects (and labels/selectors) from exceeding Kubernetes name/label length limits, including refactoring/centralizing name truncation logic and enumerating affected call sites.

Changes:

  • Introduces a design doc for issue #8815 describing name-length enforcement across multiple object/label patterns.
  • Proposes new helper APIs (GetValidGenerateName, GetValidObjectName) built on a shared implementation alongside GetValidName.
  • Catalogs impacted call sites (name-length fixes + CreateRetryGenerateName consistency) and outlines an implementation/test plan.

Comment on lines +228 to +230
Any Velero deployment that encountered these bugs received a Kubernetes API error at object creation time and the backup or restore operation failed.
No such object was ever persisted in etcd because Kubernetes itself enforces name limits at admission.
There are therefore no existing objects to migrate.

Copilot AI May 1, 2026

The “Objects that previously failed to create” compatibility claim is too broad: the Category E RestoreNameLabel selector mismatch can occur even when object creation succeeds (label value is hashed to ≤63), leaving persisted objects that are just not found by the current selector. Consider narrowing this section to only the admission-rejected cases (name/label values exceeding max length) and explicitly calling out that Category E is a lookup bug against existing objects rather than a creation failure.

Suggested change (replace the three commented lines quoted above with the following):
For the admission-rejected cases in this design—object names, `GenerateName` prefixes, or raw label values that exceed Kubernetes maximum lengths—Velero receives a Kubernetes API error at object creation time and the backup or restore operation fails.
In those cases, no such object is ever persisted in etcd because Kubernetes enforces these limits at admission, so there are no admission-rejected objects to migrate.
Category E is different: it is a lookup bug against existing objects, where creation can succeed because the persisted `RestoreNameLabel` value is already bounded (for example, hashed to fit within 63 characters) but the current selector does not match that stored value.
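
The Category E mismatch described in this comment can be reproduced with a short Go sketch. The label key, the six-character hash suffix, and the helper names `boundedLabelValue` and `matches` are hypothetical and chosen only for illustration; the point is that the selector must apply the same bounding function that was used when the label was written.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

const (
	maxLabelValueLen = 63 // label value limit cited in the PR summary
	hashSuffixLen    = 6  // assumption: hex characters of the digest that are kept
)

// boundedLabelValue returns v unchanged if it fits, otherwise a truncated
// prefix followed by a short SHA-256 suffix (hypothetical helper).
func boundedLabelValue(v string) string {
	if len(v) <= maxLabelValueLen {
		return v
	}
	sum := sha256.Sum256([]byte(v))
	return v[:maxLabelValueLen-hashSuffixLen] + hex.EncodeToString(sum[:])[:hashSuffixLen]
}

// matches reports whether an equality selector on key selects the given labels.
func matches(labels map[string]string, key, selectorValue string) bool {
	got, ok := labels[key]
	return ok && got == selectorValue
}

func main() {
	const restoreNameKey = "example.io/restore-name" // hypothetical label key
	restoreName := strings.Repeat("r", 70)           // longer than 63 characters

	// The object is created successfully because the stored value is bounded.
	stored := map[string]string{restoreNameKey: boundedLabelValue(restoreName)}

	fmt.Println(matches(stored, restoreNameKey, restoreName))                    // false: raw name in selector
	fmt.Println(matches(stored, restoreNameKey, boundedLabelValue(restoreName))) // true: same bounding applied
}
```

Because the object in this scenario is persisted, fixing the selector changes which existing objects are found, which is why the comment asks for it to be treated separately from the admission-rejected cases.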

@blackpiglet blackpiglet requested review from blackpiglet and removed request for reasonerjt and ywk253100 May 6, 2026 08:23
Development

Successfully merging this pull request may close these issues.

Design: ensure object creation does not exceed Kubernetes maximum name length.

2 participants