feat: replace Helm umbrella chart with per-component bundle directories #84

Closed
mchmarny wants to merge 6 commits into main from feature/helm-bundle

@mchmarny mchmarny commented Feb 9, 2026

Summary

  • Replace single umbrella Helm chart output with per-component directories where each component gets its own namespace, values.yaml, README, and optional manifests
  • Add deploy.sh script for one-command deployment of all components in order
  • Remove Chart.yaml.tmpl template and umbrella chart generation logic
  • Update bundler orchestrator: makeUmbrellaChart() → makeHelmBundle(), collectManifestContents() → collectComponentManifests()
  • Resolve recipe and output paths to absolute in CLI for robustness against working directory changes
  • Fix bash variable scoping bug in e2e verify_helm_bundle that caused false test failures
  • Update all doc.go files, documentation, and examples to reflect per-component architecture
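
The path-resolution item above could look roughly like the following Go sketch. The `resolvePaths` helper and its parameters are hypothetical names, not taken from this PR; the sketch only shows `filepath.Abs` being applied once, up front, so that a later working-directory change does not alter where the recipe is read from or where the bundle is written.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolvePaths converts the recipe and output paths to absolute form.
// Hypothetical helper for illustration; the actual CLI code is not shown here.
func resolvePaths(recipe, output string) (string, string, error) {
	absRecipe, err := filepath.Abs(recipe)
	if err != nil {
		return "", "", fmt.Errorf("resolve recipe path: %w", err)
	}
	absOutput, err := filepath.Abs(output)
	if err != nil {
		return "", "", fmt.Errorf("resolve output path: %w", err)
	}
	return absRecipe, absOutput, nil
}

func main() {
	// Relative inputs are anchored to the working directory at call time.
	recipe, output, err := resolvePaths("recipe.yaml", "./output")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(recipe)
	fmt.Println(output)
}
```

Resolving both paths before any other work means the rest of the pipeline can treat them as fixed locations.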

New output structure

output/
├── README.md
├── deploy.sh
├── recipe.yaml
├── checksums.txt
├── cert-manager/
│   ├── values.yaml
│   └── README.md
├── gpu-operator/
│   ├── values.yaml
│   ├── README.md
│   └── manifests/
│       └── dcgm-exporter.yaml
└── network-operator/
    ├── values.yaml
    └── README.md

Test plan

  • go test -race ./pkg/bundler/... — all pass
  • go test -race ./pkg/cli/... — all pass
  • go test -race ./pkg/oci/... — all pass
  • make lint — 0 issues
  • make test — all pass
  • E2E tests — 109/109 pass
  • KWOK scheduling tests (need separate update for kwok/scripts/validate-scheduling.sh deploy function)
  • Manual: generate bundle and inspect per-component directory structure

Known follow-up work

  • kwok/scripts/validate-scheduling.sh still uses helm dependency update for the old umbrella chart approach and needs to be updated to use deploy.sh or per-component install
  • docs/user/api-reference.md has bundlers query parameter references that may need API-level changes

@mchmarny mchmarny requested review from a team as code owners February 9, 2026 23:31
Copilot AI review requested due to automatic review settings February 9, 2026 23:31
@mchmarny mchmarny self-assigned this Feb 9, 2026
@mchmarny mchmarny added the enhancement New feature or request label Feb 9, 2026

@mchmarny mchmarny force-pushed the feature/helm-bundle branch from a21ebed to b1b9a0e Compare February 9, 2026 23:46

Replace the single umbrella Helm chart output with per-component directories
where each component gets its own namespace, values.yaml, and README. Adds a
deploy.sh script for one-command deployment. This simplifies operations by
allowing independent component lifecycle management.

Output structure:
  README.md, deploy.sh, recipe.yaml, checksums.txt
  <component>/values.yaml, <component>/README.md, <component>/manifests/

- Rewrite Helm generator to emit per-component directories
- Add deploy.sh and component-README templates
- Remove Chart.yaml.tmpl (no longer needed)
- Update bundler orchestrator (makeHelmBundle, collectComponentManifests)
- Resolve recipe/output paths to absolute in CLI for robustness
- Fix e2e variable scoping bug in verify_helm_bundle
- Update all tests, doc.go files, and documentation

@mchmarny mchmarny force-pushed the feature/helm-bundle branch from b1b9a0e to 478e051 Compare February 9, 2026 23:54

Mermaid diagram still referenced "Chart.yaml + values.yaml" output
for the Helm deployer. Updated to reflect per-component layout.

Add safeJoin helper to both Helm and ArgoCD deployers that validates
constructed paths stay within the output directory (filepath.Abs +
filepath.Clean + prefix check). Refactor generateFromTemplate and
writeValuesFile to accept (baseDir, filename) so path sanitization
occurs at each filesystem operation site.

Update tests/e2e/run.sh and kwok/scripts/validate-scheduling.sh to
use per-component bundle structure (deploy.sh, component directories)
instead of stale umbrella chart references (Chart.yaml, helm lint).
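
A minimal sketch of the kind of check the safeJoin commit describes (absolute path, clean, prefix check). Only the name `safeJoin` and the three steps come from the commit message; the signature, error text, and trailing-separator handling here are assumptions, not the actual implementation.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin joins name onto baseDir and rejects any result that would land
// outside baseDir. Signature and error messages are assumptions.
func safeJoin(baseDir, name string) (string, error) {
	base, err := filepath.Abs(baseDir)
	if err != nil {
		return "", fmt.Errorf("resolve base dir: %w", err)
	}
	// Join already cleans the result; Clean is kept explicit to mirror
	// the steps listed in the commit message.
	target := filepath.Clean(filepath.Join(base, name))

	// Compare against base plus a trailing separator so that a sibling
	// such as /out-other is not accepted as being inside /out.
	prefix := base
	if !strings.HasSuffix(prefix, string(filepath.Separator)) {
		prefix += string(filepath.Separator)
	}
	if target != base && !strings.HasPrefix(target, prefix) {
		return "", fmt.Errorf("path %q escapes output directory %q", name, base)
	}
	return target, nil
}

func main() {
	for _, name := range []string{"gpu-operator/values.yaml", "../outside.yaml"} {
		p, err := safeJoin("output", name)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("ok:", p)
	}
}
```

Passing `(baseDir, filename)` separately, as the commit says generateFromTemplate and writeValuesFile now do, lets a check like this run at each write site instead of relying on callers to pre-validate paths.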

@mchmarny mchmarny force-pushed the feature/helm-bundle branch from 9d72592 to bf780b1 Compare February 10, 2026 00:33
@mchmarny mchmarny enabled auto-merge (squash) February 10, 2026 00:36

Per-component Helm installation fails when a chart contains resources
(e.g. ServiceMonitor) whose CRDs are provided by a later component
(kube-prometheus-stack). The --disable-openapi-validation flag skips
server-side schema validation, allowing charts to reference CRDs that
will be installed by subsequent components in the deployment order.

@mchmarny mchmarny marked this pull request as draft February 10, 2026 00:47
auto-merge was automatically disabled February 10, 2026 00:47

Pull request was converted to draft

…heus-stack

Per-component Helm deployment installs cert-manager, gpu-operator, and
kubeflow-trainer before kube-prometheus-stack. These charts fail when
ServiceMonitor is enabled because the CRD (monitoring.coreos.com/v1)
does not exist yet. Disable ServiceMonitor creation in base values for
all three components. Also revert --disable-openapi-validation which
does not address REST mapper resource discovery failures.

KWOK clusters validate pod scheduling, not pod readiness. The --wait
flag causes helm to block until Deployments/DaemonSets report Ready,
which never happens in KWOK since it simulates pod placement without
satisfying readiness probes. Add --no-wait flag to deploy.sh so KWOK
tests can skip readiness checks while real deployments retain --wait.
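
The wait/no-wait behavior described in this commit can be sketched as follows. deploy.sh itself is a shell script and is not shown on this page, so this Go sketch only illustrates the flag logic: the `helmArgs` function, its parameters, and the `nvidia/gpu-operator` chart reference are hypothetical, while the `helm upgrade` flags used are standard Helm CLI options.

```go
package main

import (
	"fmt"
	"strings"
)

// helmArgs builds the argument list for installing one component into its
// own namespace. Hypothetical helper; names are illustrative only.
func helmArgs(release, chart, namespace, valuesFile string, wait bool) []string {
	args := []string{
		"upgrade", "--install", release, chart,
		"--namespace", namespace,
		"--create-namespace",
		"--values", valuesFile,
	}
	// Real clusters block until resources are ready; KWOK runs skip this
	// because simulated pods never satisfy readiness probes.
	if wait {
		args = append(args, "--wait")
	}
	return args
}

func main() {
	real := helmArgs("gpu-operator", "nvidia/gpu-operator", "gpu-operator", "gpu-operator/values.yaml", true)
	kwok := helmArgs("gpu-operator", "nvidia/gpu-operator", "gpu-operator", "gpu-operator/values.yaml", false)
	fmt.Println("helm " + strings.Join(real, " "))
	fmt.Println("helm " + strings.Join(kwok, " "))
}
```

Keeping `--wait` as the default and making the opt-out explicit matches the commit's intent that only KWOK tests skip readiness checks.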

@mchmarny mchmarny closed this Feb 10, 2026

Labels

enhancement New feature or request

2 participants