Skip to content

build: deploy companion charts in parallel#6428

Merged
Ian-wang-liyang merged 3 commits into
mainfrom
6252-parallel-companion-charts
Jun 24, 2026
Merged

build: deploy companion charts in parallel#6428
Ian-wang-liyang merged 3 commits into
mainfrom
6252-parallel-companion-charts

Conversation

@Ian-wang-liyang

Copy link
Copy Markdown
Contributor

Which problem does the PR fix?

Fixes #6252.

CI scenarios with multiple companion charts (e.g. 8.10 eske = Elasticsearch + Keycloak) deployed companions serially, each with --wait, so total pre-deploy time was the sum of all companion install times (~6 min) before the main Camunda chart even started.

What's in this PR?

Companions are independent of each other (only the main chart depends on them), so they now install concurrently via errgroup:

  • New deployCompanionCharts helper fans out one goroutine per companion. g.Wait() is a barrier that keeps all companions ordered before the post-infra hooks and the main chart — preserving the maintainer-required lifecycle (post-infra runs after companions are ready, e.g. Bitnami 8.9→8.10 migration scripts). Loop position in deployer.go is unchanged, so the ordering constraint holds.
  • On the first companion failure, the shared context is cancelled, terminating in-flight siblings (verified: helm.RunCaptureStderrexecutil.RunCommandCaptureStderr uses exec.CommandContext).
  • helm repo add/update rewrite the shared repositories.yaml; a mutex guards them so concurrent companions never race on that file. No current scenario pairs two remote-repo companions (8.10 eske: Keycloak is a local chart, only Elasticsearch is remote), so contention is effectively zero today — the guard protects future ones.

Expected impact: pre-deploy time drops from sum → max (~2 min saved per multi-companion scenario, across the 8.10 identity: keycloak matrix).

Tests added in helm_test.go: parallel overlap, error-cancels-siblings, single-companion preserved, repo-registration serialized (all pass under -race).

Files changed are Go tooling under scripts/ only (no user-facing chart files), hence build: per the CI PR-title rule.

Checklist

Please make sure to follow our Contributing Guide.

Before opening the PR:

  • In the repo's root dir, run make go.update-golden-only.
  • There is no other open pull request for the same update/change.
  • Tests for charts are added (if needed).
  • In-repo documentation are updated (if needed).

After opening the PR:

  • Did you sign our CLA (Contributor License Agreement)? It will show once you open the PR.
  • Did all checks/tests pass in the PR?

🤖 Generated with Claude Code

@Ian-wang-liyang Ian-wang-liyang marked this pull request as ready for review June 22, 2026 19:44
@Ian-wang-liyang Ian-wang-liyang requested a review from a team as a code owner June 22, 2026 19:44
@Ian-wang-liyang Ian-wang-liyang requested review from bkenez and Copilot and removed request for a team June 22, 2026 19:44

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves CI deploy performance by installing companion Helm charts concurrently (instead of serially) in the camunda-deployer tooling under scripts/, reducing pre-deploy wall time for scenarios with multiple companions.

Changes:

  • Deploy companion charts in parallel using errgroup, preserving the existing lifecycle barrier before post-infra hooks and the main chart deploy.
  • Serialize helm repo add/update across concurrent companions with a mutex to avoid races on repositories.yaml.
  • Add unit tests validating parallel overlap, sibling cancellation on error, single-companion behavior, and repo op serialization; bump golang.org/x/sync to support errgroup.

Reviewed changes

Copilot reviewed 5 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
scripts/camunda-deployer/pkg/deployer/helm.go Adds deployCompanionCharts (errgroup fan-out) and mutex-guarded repo add/update in companion deploy path.
scripts/camunda-deployer/pkg/deployer/deployer.go Switches Deploy() from serial companion loop to the new parallel helper while keeping ordering before post-infra hooks.
scripts/camunda-deployer/pkg/deployer/helm_test.go Adds unit tests covering parallelism, cancellation, and repo serialization.
scripts/camunda-deployer/go.mod Adds direct dependency on golang.org/x/sync for errgroup.
scripts/camunda-deployer/go.sum Records golang.org/x/sync v0.21.0 checksums.
scripts/deploy-camunda/go.mod Bumps indirect golang.org/x/sync version.
scripts/deploy-camunda/go.sum Updates golang.org/x/sync checksums to match the new version.

Comment on lines +215 to +218
// companionRepoMu serializes helm repo add/update across concurrent companion
// deployments — those commands rewrite the shared repositories.yaml, so
// concurrent writers would race. No current scenario pairs two remote-repo
// companions, so contention is effectively zero; this guards future ones.
Comment on lines +221 to +227
// deployCompanionCharts deploys all configured companion charts concurrently as
// separate Helm releases in the same namespace. Companions are independent of
// each other (only the main chart depends on them), so they install in
// parallel. The call blocks until every companion is ready; on the first
// failure the shared context is cancelled, terminating in-flight siblings, and
// the first error is returned. A single companion behaves identically to a
// serial deploy.
Comment on lines +117 to +121
// Deploy companion charts (e.g., OpenSearch) as separate Helm releases in
// the same namespace, concurrently — they are independent of each other and
// each is deployed with --wait. deployCompanionCharts blocks until all are
// ready, which keeps them ordered before the post-infra hooks and the main
// Camunda chart below.
@Ian-wang-liyang Ian-wang-liyang requested review from eamonnmoloney and removed request for bkenez June 22, 2026 20:35

@eamonnmoloney eamonnmoloney left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crev review

PR #6428 parallelizes companion chart deployment using errgroup with a mutex guard for helm repo operations — a well-structured change with four new tests. One P1: companionRepoMu.Unlock() is a plain call rather than defer, leaving a panic-induced permanent deadlock path; fix with defer companionRepoMu.Unlock() immediately after Lock(). Two P2: the serialization test's repoUpdate stub is an instant no-op (leaving the full {helmRepoAdd + helmRepoUpdate} atomicity claim untested), and the golang.org/x/sync bump in the separate deploy-camunda module appears incidental and not required by this change.

Specialists run: correctness, adr-conformance, devils-advocate, verifier, escalation-assessor. Devil's-advocate hypotheses: 9 raised, 1 promoted.

Hypotheses by stance: adversarial-input=3 author-blind-spot=2 missing-case=3 scope-discipline=1 · by disposition: covered_by_specialist=2 dropped_low_severity=2 dropped_rubric=2 dropped_ungroundable=2 promoted=1


Escalation: AI review sufficient (score: 0.17, threshold: 0.50). Scripts-only Go change with no chart templates, values.yaml, security surfaces, or golden files touched — all hard escalation rules are clean. Dominant upward signals are first-time contributor (author_familiarity_inverse=1.0) and moderate novelty from introducing an errgroup-based concurrency primitive into CI tooling (novelty_score=0.6). The P1 mutex finding is a scoped correctness issue, not a rule violation. Composite score 0.173 is well below the 0.5 threshold.

}
return nil
}()
companionRepoMu.Unlock()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 · companionRepoMu.Unlock() is not deferred — panic in the locked section permanently deadlocks (via correctness)

The mutex is locked at line 247 with companionRepoMu.Lock() and unlocked with a plain companionRepoMu.Unlock() call at line 257, after the immediately-invoked function expression (IIFE) returns. This pattern is unsafe: if helmRepoAdd or helmRepoUpdate panics inside the IIFE, the anonymous function exits via panic rather than a normal return, and companionRepoMu.Unlock() at line 257 is never reached. The mutex remains permanently locked. Every subsequent goroutine that calls deployCompanionChart with a non-empty RepoName will block forever in companionRepoMu.Lock(), causing deployCompanionCharts (and the entire Deploy call) to hang until the parent context times out.

Go panics in os/exec-based code are rare but not impossible: nil function variables (e.g. if helmRepoAddFn is accidentally nil-set in a test or misconfigured binary), runtime out-of-memory, or stack overflow in recursive calls can all produce panics. The idiomatic Go pattern for any mutex usage is unconditional defer.

Recommendation: Replace the plain unlock with a deferred unlock:

companionRepoMu.Lock()
defer companionRepoMu.Unlock()
err := func() error {
    if err := helmRepoAdd(ctx, cc.RepoName, cc.RepoURL); err != nil {
        return fmt.Errorf("companion chart %q: repo add failed: %w", cc.ReleaseName, err)
    }
    if err := helmRepoUpdate(ctx); err != nil {
        return fmt.Errorf("companion chart %q: repo update failed: %w", cc.ReleaseName, err)
    }
    return nil
}()
if err != nil {
    return err
}

The IIFE is no longer needed once defer handles the unlock — the block can be flattened.

companionRepoMu.Unlock()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 3b41e6d.

One adjustment to the suggested placement: I moved Lock() + defer Unlock() inside the IIFE rather than at the deployCompanionChart scope. A function-level defer would hold the mutex through the subsequent helm upgrade --install --wait (lines 263+), serializing every companion deploy and defeating the parallelism this PR adds. Scoping the defer to the IIFE releases the lock the instant repo registration returns — panic-safe and still concurrent.

Verified go build ./... + go test ./pkg/deployer/... -race (all companion/repo tests pass).

@eamonnmoloney eamonnmoloney added the ai-review-sufficient Label for the crev tool. Indicates crev tool is confident for ai-review to be sufficient. label Jun 23, 2026

@eamonnmoloney eamonnmoloney left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ian-wang-liyang Ian-wang-liyang force-pushed the 6252-parallel-companion-charts branch from 3b41e6d to 154e9ac Compare June 23, 2026 17:09
@Ian-wang-liyang Ian-wang-liyang added this pull request to the merge queue Jun 24, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 24, 2026
Ian-wang-liyang and others added 3 commits June 24, 2026 10:47
CI scenarios with multiple companion charts (e.g. 8.10 eske =
Elasticsearch + Keycloak) deployed companions serially, each with --wait,
so total pre-deploy time was the sum of all companion install times before
the main Camunda chart started. Companions are independent of each other
(only the main chart depends on them), so they can install concurrently.

Fan out companion deploys via errgroup. g.Wait() acts as a barrier that
keeps all companions ordered before the post-infra hooks and the main
chart, preserving the maintainer-required lifecycle (post-infra runs after
companions are ready, e.g. Bitnami 8.9->8.10 migration). On the first
failure the shared context is cancelled, terminating in-flight siblings.

helm repo add/update rewrite the shared repositories.yaml; guard them with
a mutex so concurrent companions never race on that file. No current
scenario pairs two remote-repo companions, so contention is effectively
zero today; the guard protects future ones.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove rationale sentences from companionRepoMu, deployCompanionCharts,
and the Deploy() call site. Keep only the non-obvious behavioral
constraints (mutex protects shared file writes; barrier ordering).
Rationale lives in the PR body and commit message per AGENTS.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move companionRepoMu.Lock()/Unlock() inside the IIFE so the unlock is
deferred and runs even if helmRepoAdd/helmRepoUpdate panic, while keeping
the lock scoped to repo registration only — the long-running helm
upgrade --install --wait still runs unlocked, preserving parallelism.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Ian-wang-liyang Ian-wang-liyang force-pushed the 6252-parallel-companion-charts branch from 154e9ac to 4f5c55a Compare June 24, 2026 14:47
@Ian-wang-liyang Ian-wang-liyang added this pull request to the merge queue Jun 24, 2026
Merged via the queue into main with commit 77f3994 Jun 24, 2026
266 checks passed
@Ian-wang-liyang Ian-wang-liyang deleted the 6252-parallel-companion-charts branch June 24, 2026 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-review-sufficient Label for the crev tool. Indicates crev tool is confident for ai-review to be sufficient. tool/script

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: deploy companion charts in parallel instead of serially

3 participants