Skip to content

feat(bundler): add pre-flight checks to deploy.sh and post-flight to undeploy.sh#364

Open
yuanchen8911 wants to merge 1 commit intoNVIDIA:mainfrom
yuanchen8911:feat/deploy-preflight-checks
Open

feat(bundler): add pre-flight checks to deploy.sh and post-flight to undeploy.sh#364
yuanchen8911 wants to merge 1 commit intoNVIDIA:mainfrom
yuanchen8911:feat/deploy-preflight-checks

Conversation

@yuanchen8911
Copy link
Contributor

Summary

Add pre-flight safety checks to deploy.sh and post-flight verification to undeploy.sh to prevent silent deployment failures caused by stale cluster-scoped resources from a previous install/uninstall cycle.

Problem

When undeploy.sh or manual helm uninstall leaves behind stale webhooks, terminating namespaces, or unavailable API services, a subsequent deploy.sh fails silently:

  • Stale mutating webhooks (e.g., KAI scheduler admission) block all pod creation
  • Terminating namespaces prevent Helm from creating releases
  • Stale API services (e.g., custom.metrics.k8s.io from prometheus-adapter) block namespace deletion

Changes

deploy.sh pre-flight checks

  • Detects terminating namespaces that overlap with bundle components
  • Detects stale mutating/validating webhooks whose backing services no longer exist
  • Detects unavailable API services
  • Aborts with actionable fix instructions

undeploy.sh post-flight verification

  • Warns about remaining terminating namespaces
  • Warns about stale webhooks missed by orphan cleanup
  • Warns about unavailable API services
  • Reports clean/dirty state

Test plan

  • Build succeeds with updated templates
  • Generated deploy.sh contains pre-flight section
  • Generated undeploy.sh contains post-flight section
  • Pre-flight checks require jq for webhook inspection (gracefully skipped if missing)

@yuanchen8911 yuanchen8911 requested a review from a team as a code owner March 12, 2026 01:11
@yuanchen8911 yuanchen8911 added enhancement New feature or request area/recipes labels Mar 12, 2026
@yuanchen8911 yuanchen8911 force-pushed the feat/deploy-preflight-checks branch from 6676a4a to 5d75b1b Compare March 12, 2026 01:16
…undeploy.sh

deploy.sh now runs pre-flight checks before installing components:
- Detects terminating namespaces that overlap with bundle components
- Detects stale mutating/validating webhooks whose backing services
  no longer exist (blocks pod creation with fail-closed webhooks)
- Detects unavailable API services (blocks namespace deletion)
- Aborts with actionable fix instructions if issues are found

undeploy.sh now runs post-flight verification after cleanup:
- Warns about remaining terminating namespaces
- Warns about stale webhooks missed by the orphan cleanup
- Warns about unavailable API services
- Reports clean/dirty state for subsequent deploy.sh

These checks prevent silent deployment failures caused by stale
cluster-scoped resources from a previous install/uninstall cycle.
@yuanchen8911 yuanchen8911 force-pushed the feat/deploy-preflight-checks branch from 5d75b1b to 3edb19d Compare March 12, 2026 01:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant