
fix(8.10): back Web Modeler restapi /tmp with a per-pod ephemeral volume #6406

Merged
eamonnmoloney merged 5 commits into main from feat/4767-persistence-scenario-8.10 on Jun 26, 2026

Conversation

@eamonnmoloney eamonnmoloney commented Jun 18, 2026

Contributor

Summary

Backs the Web Modeler restapi /tmp volume with a per-pod generic ephemeral volume instead of a shared chart-managed ReadWriteOnce PVC, and adds the component-persistence (cprst) CI scenario that exercises the persistence code paths.

Why

The restapi persistence volume backs only the container's /tmp scratch directory — pod-local and disposable (Web Modeler's durable state lives in PostgreSQL, the document store, and Elasticsearch). Backing disposable per-pod scratch with a shared, node-locked RWO PVC was the wrong primitive and the root cause of two reported failures:

  • chart-managed PVC claimName casing mismatch → restapi stuck Pending on install (SUPPORT-30069; reproduced live by cprst);
  • RWO single-attach + RollingUpdate surge → Multi-Attach deadlock on helm upgrade.
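The second failure can be illustrated with a toy model. This is a minimal sketch, not Kubernetes code: `rwoVolume`, `surgeWithSharedVolume`, and `surgeWithPerPodVolumes` are hypothetical names, and the model assumes the surge pod lands on a different node while the old pod is still running and holding the volume.

```go
package main

import "fmt"

// rwoVolume is a toy model of a ReadWriteOnce volume: it can be attached to
// one node at a time (pods on the same node may share it).
type rwoVolume struct {
	attachedNode string
}

// attach fails when the volume is already attached to a different node.
func (v *rwoVolume) attach(node string) error {
	if v.attachedNode != "" && v.attachedNode != node {
		return fmt.Errorf("multi-attach: volume is already attached to %s", v.attachedNode)
	}
	v.attachedNode = node
	return nil
}

// surgeWithSharedVolume models a RollingUpdate surge with one shared claim:
// the old pod keeps the volume attached until the new pod is ready, but the
// new pod cannot start without attaching the same volume.
func surgeWithSharedVolume(oldNode, newNode string) error {
	shared := &rwoVolume{}
	if err := shared.attach(oldNode); err != nil {
		return err
	}
	return shared.attach(newNode)
}

// surgeWithPerPodVolumes models the same surge when each pod owns its volume.
func surgeWithPerPodVolumes(oldNode, newNode string) error {
	oldVol, newVol := &rwoVolume{}, &rwoVolume{}
	if err := oldVol.attach(oldNode); err != nil {
		return err
	}
	return newVol.attach(newNode)
}

func main() {
	fmt.Println("shared, different nodes:", surgeWithSharedVolume("node-a", "node-b"))
	fmt.Println("per-pod, different nodes:", surgeWithPerPodVolumes("node-a", "node-b"))
}
```

In this model the shared volume only works when both pods land on the same node, which a scheduler does not guarantee; per-pod volumes never contend.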

What changed

  • deployment-restapi.yaml: chart-managed persistence path now renders an ephemeral.volumeClaimTemplate (per-pod PVC created/destroyed with the pod), carrying the same size/accessModes/storageClassName/selector/annotations. emptyDir default and existingClaim paths unchanged.
  • Removed the standalone persistentvolumeclaim-restapi.yaml (no shared PVC anymore).
  • Added the component-persistence (cprst, tier 2) scenario + registry snapshot regen.
  • Unit tests updated to assert the ephemeral source; dropped the now-dead shared-PVC manifest test.

This removes the shared single-attach volume entirely, so the Pending state on install and the Multi-Attach deadlock on upgrade both go away, and the deploymentStrategy: Recreate / pre-upgrade workarounds are no longer needed for the chart-managed path.
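To make the difference between the two volume sources concrete, here is a minimal sketch that prints both shapes as JSON. The structs are local stand-ins that model only a few fields of the Kubernetes Pod volume API; the volume name `tmp`, the claim name `demo-webmodeler-data`, the pod name, and the `1Gi` size are illustrative assumptions, not values taken from the chart.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local stand-ins for a subset of the Kubernetes volume API.
// JSON field names follow the Kubernetes Pod spec.
type ResourceRequests struct {
	Requests map[string]string `json:"requests"`
}

type ClaimSpec struct {
	AccessModes      []string         `json:"accessModes"`
	StorageClassName string           `json:"storageClassName,omitempty"`
	Resources        ResourceRequests `json:"resources"`
}

type ClaimTemplate struct {
	Spec ClaimSpec `json:"spec"`
}

type EphemeralSource struct {
	VolumeClaimTemplate ClaimTemplate `json:"volumeClaimTemplate"`
}

type PVCSource struct {
	ClaimName string `json:"claimName"`
}

type Volume struct {
	Name                  string           `json:"name"`
	Ephemeral             *EphemeralSource `json:"ephemeral,omitempty"`
	PersistentVolumeClaim *PVCSource       `json:"persistentVolumeClaim,omitempty"`
}

// sharedVolume is the old shape: every replica references one named claim.
func sharedVolume(claim string) Volume {
	return Volume{Name: "tmp", PersistentVolumeClaim: &PVCSource{ClaimName: claim}}
}

// ephemeralVolume is the new shape: the pod spec carries a claim template,
// and a separate claim is created for each pod and deleted with it.
func ephemeralVolume(size string) Volume {
	spec := ClaimSpec{
		AccessModes: []string{"ReadWriteOnce"},
		Resources:   ResourceRequests{Requests: map[string]string{"storage": size}},
	}
	return Volume{Name: "tmp", Ephemeral: &EphemeralSource{VolumeClaimTemplate: ClaimTemplate{Spec: spec}}}
}

// ephemeralClaimName follows the Kubernetes naming rule for generic ephemeral
// volumes: the claim is named "<pod name>-<volume name>".
func ephemeralClaimName(podName, volumeName string) string {
	return podName + "-" + volumeName
}

func main() {
	for _, v := range []Volume{sharedVolume("demo-webmodeler-data"), ephemeralVolume("1Gi")} {
		out, err := json.MarshalIndent(v, "", "  ")
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
	fmt.Println(ephemeralClaimName("demo-restapi-abc12", "tmp"))
}
```

Because the claim name is derived from the pod name, two replicas can never reference the same claim, which is what removes the single-attach contention.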

⚠️ Minimum platform versions

Generic ephemeral volumes have been GA since Kubernetes 1.23 / OpenShift 4.10 (alpha in 1.19, beta and on by default in 1.21), well within Camunda's supported matrix. No new storage requirement: a generic ephemeral volume provisions a normal PVC via the same StorageClass / dynamic-provisioning path the old shared PVC used (it is not a CSI inline volume). Any cluster that could provision the old <release>-webmodeler-data PVC can provision the per-pod ephemeral PVC.

Notes

Test plan

  • make go.test chartPath=charts/camunda-platform-8.10
  • make helm.lint chartPath=charts/camunda-platform-8.10
  • nightly cprst (install) green on 8.10

Review follow-ups (crev)

  • ADR 0092 is superseded for Web Modeler by this change. The guard message, values.yaml @param, and generated schema/README/extra-schema have been narrowed so Recreate is described only for the existingClaim-with-RWO case. The ADR amendment and possible knob removal are tracked in #6409 (chore: amend ADR 0092 and retire webModeler.persistence.deploymentStrategy after ephemeral-volume switch).
  • Upgrade/migration note: operators who previously set webModeler.persistence.enabled=true may have a <release>-webmodeler-data PVC. On GKE's default WaitForFirstConsumer storage class the mis-cased PVC was never consumed, so no disk was provisioned (at most a dangling Pending PVC object, not billed storage). helm upgrade removes the now-unreferenced template PVC; if your provisioner retained one, delete it: kubectl delete pvc <release>-webmodeler-data -n <namespace>.
  • Scope: the persistence.yaml feature file also enables Optimize persistence and orchestration extraVolumeClaimTemplates (SUPPORT-30042 / -30094). This is intentional nightly CI coverage of those known regressions under the cprst umbrella; no Optimize/orchestration template changes are part of this PR.

@github-actions github-actions Bot added version/8.10 Camunda applications/cycle version component/web-modeler labels Jun 18, 2026
@eamonnmoloney eamonnmoloney force-pushed the feat/4767-persistence-scenario-8.10 branch from 629cdf2 to ef0450d Compare June 18, 2026 12:10
@eamonnmoloney eamonnmoloney marked this pull request as ready for review June 18, 2026 12:12
@eamonnmoloney eamonnmoloney requested a review from a team as a code owner June 18, 2026 12:12
@eamonnmoloney eamonnmoloney requested review from Ian-wang-liyang and Copilot and removed request for a team and Copilot June 18, 2026 12:12
@eamonnmoloney eamonnmoloney force-pushed the feat/4767-persistence-scenario-8.10 branch from ef0450d to 858aac4 Compare June 18, 2026 12:36
Copilot AI review requested due to automatic review settings June 18, 2026 12:36
@eamonnmoloney eamonnmoloney force-pushed the feat/4767-persistence-scenario-8.10 branch from 858aac4 to 7a7e0e6 Compare June 18, 2026 12:36

Copilot AI left a comment

Contributor

Pull request overview

This PR updates the Camunda Platform Helm chart (8.10) Web Modeler restapi /tmp persistence to use a per-pod generic ephemeral volume (via ephemeral.volumeClaimTemplate) instead of a shared chart-managed PVC, and introduces a new CI scenario intended to exercise component persistence paths.

Changes:

  • Switch Web Modeler restapi /tmp from a shared PVC to a per-pod generic ephemeral volume when chart-managed persistence is enabled.
  • Remove the now-unused Web Modeler restapi standalone PVC template.
  • Add a tier-2 component-persistence (cprst) CI scenario plus corresponding registry snapshot updates, and adjust unit tests/docs to reflect the new behavior.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| charts/camunda-platform-8.10/values.yaml | Updates documentation for webModeler.persistence.deploymentStrategy given the new ephemeral default. |
| charts/camunda-platform-8.10/values.schema.json | Keeps schema descriptions in sync with updated deploymentStrategy semantics. |
| charts/camunda-platform-8.10/templates/web-modeler/deployment-restapi.yaml | Implements the ephemeral volumeClaimTemplate for /tmp when persistence is chart-managed. |
| charts/camunda-platform-8.10/templates/web-modeler/persistentvolumeclaim-restapi.yaml | Removes the shared chart-managed PVC manifest (no longer needed). |
| charts/camunda-platform-8.10/test/unit/web-modeler/persistence_test.go | Updates unit assertions to validate the new ephemeral volume source and removes the old PVC-manifest test. |
| charts/camunda-platform-8.10/test/integration/scenarios/chart-full-setup/values/features/persistence.yaml | Adds a feature layer to exercise persistence-related values in nightly CI. |
| charts/camunda-platform-8.10/test/ci/registry/scenarios/component-persistence.yaml | Adds the new component-persistence scenario definition. |
| charts/camunda-platform-8.10/test/ci/registry/manifest.yaml | Registers the new scenario (cprst) in the registry manifest. |
| charts/camunda-platform-8.10/test/ci/registry-snapshot.yaml | Regenerates the registry snapshot to include the new scenario. |
| charts/camunda-platform-8.10/README.md | Updates generated chart parameter docs for webModeler.persistence.deploymentStrategy. |

Comment on lines +194 to +197
{{- with (or .Values.camundaHub.webModeler.persistence.annotations .Values.webModeler.persistence.annotations) }}
metadata:
annotations: {{- toYaml . | nindent 18 }}
{{- end }}
Comment on lines +3 to +4
flows:
- install,upgrade-minor

@Ian-wang-liyang Ian-wang-liyang left a comment

Contributor

LGTM

@eamonnmoloney eamonnmoloney force-pushed the feat/4767-persistence-scenario-8.10 branch from 202a018 to c206ee4 Compare June 19, 2026 09:29
@eamonnmoloney eamonnmoloney enabled auto-merge June 19, 2026 09:31
@eamonnmoloney eamonnmoloney added this pull request to the merge queue Jun 19, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 19, 2026
@eamonnmoloney eamonnmoloney enabled auto-merge June 19, 2026 10:59
@eamonnmoloney eamonnmoloney added this pull request to the merge queue Jun 19, 2026
@eamonnmoloney eamonnmoloney removed this pull request from the merge queue due to a manual request Jun 19, 2026
eamonnmoloney and others added 3 commits June 26, 2026 05:20
The restapi persistence PVC backs only the pod-local /tmp scratch directory.
Backing disposable per-pod scratch with a shared chart-managed ReadWriteOnce
PVC was the root cause of the install Pending (claimName casing mismatch,
SUPPORT-30069) and the helm-upgrade Multi-Attach deadlock.

Switch the chart-managed path to a generic ephemeral volume (per-pod PVC),
remove the standalone persistentvolumeclaim-restapi.yaml, and add the
component-persistence (cprst) CI scenario that exercises it. emptyDir
default and existingClaim paths are unchanged.

Requires Kubernetes 1.23+ / OpenShift 4.10+ (generic ephemeral volumes GA).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The cprst scenario blocked the merge queue because the persistence feature
file also enabled Optimize persistence (SUPPORT-30042) and orchestration
extraVolumeClaimTemplates (SUPPORT-30094) — known-broken paths that this PR
does not fix. The Optimize migration init container hangs on PodInitializing,
failing cprst for reasons unrelated to the Web Modeler restapi /tmp fix.

Narrow the feature to webModeler.persistence only so cprst exercises the path
this PR actually changes. Optimize and orchestration coverage will be
re-enabled in the corresponding template-fix PRs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@eamonnmoloney eamonnmoloney force-pushed the feat/4767-persistence-scenario-8.10 branch from d9547f1 to f5440cd Compare June 26, 2026 04:20
@eamonnmoloney eamonnmoloney enabled auto-merge June 26, 2026 04:20
@eamonnmoloney eamonnmoloney added this pull request to the merge queue Jun 26, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 26, 2026
…sts gate

gh run rerun --failed skips cancelled jobs, causing merge queue reruns
to silently miss jobs that were cancelled when others failed first.
Drop --failed so the full workflow reruns on retry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
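The change described in this commit message amounts to dropping one flag from the command line. A minimal sketch, assuming the gate builds its `gh` arguments in Go; `rerunArgs` and the run ID `123456` are hypothetical and not taken from the repository:

```go
package main

import (
	"fmt"
	"strings"
)

// rerunArgs builds the argument list for "gh run rerun".
// With failedOnly set, --failed is appended, which per the commit message
// leaves cancelled jobs out of the rerun.
func rerunArgs(runID string, failedOnly bool) []string {
	args := []string{"run", "rerun", runID}
	if failedOnly {
		args = append(args, "--failed")
	}
	return args
}

func main() {
	// Before the fix: only failed jobs are rerun.
	fmt.Println("gh " + strings.Join(rerunArgs("123456", true), " "))
	// After the fix: the full workflow is rerun.
	fmt.Println("gh " + strings.Join(rerunArgs("123456", false), " "))
}
```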
@eamonnmoloney eamonnmoloney enabled auto-merge June 26, 2026 07:47
@eamonnmoloney eamonnmoloney added this pull request to the merge queue Jun 26, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 26, 2026
When gh run rerun returns "already running" it means another agent
(human or the API's own retry) has already started the next attempt.
Previously the gate exhausted all retry slots and exited with error,
failing the merge queue check even though the run ultimately succeeded.

Detect the error string, return ErrRerunAlreadyRunning immediately,
and treat it in Run() as a signal to proceed directly to watching the
next attempt rather than a fatal error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
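The behaviour described in this commit message can be sketched with a sentinel error. This is an illustration under assumptions, not the gate's actual code: only the name ErrRerunAlreadyRunning comes from the commit message, while `rerunFunc`, `classify`, `run`, `errExit`, and the sample CLI output string are hypothetical. The sketch also assumes `maxRetries` is at least 1.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrRerunAlreadyRunning signals that another actor already started the next attempt.
var ErrRerunAlreadyRunning = errors.New("rerun already running")

// errExit stands in for a non-zero exit from the CLI (hypothetical).
var errExit = errors.New("exit status 1")

// rerunFunc triggers a rerun and returns the CLI output (hypothetical signature).
type rerunFunc func() (output string, err error)

// classify maps a rerun result onto nil, the sentinel, or a wrapped failure.
func classify(output string, err error) error {
	if err == nil {
		return nil
	}
	if strings.Contains(output, "already running") {
		return ErrRerunAlreadyRunning
	}
	return fmt.Errorf("rerun failed: %w", err)
}

// run reports whether the caller should go on to watch the next attempt.
// The sentinel is treated like success instead of consuming retry slots.
func run(rerun rerunFunc, maxRetries int) (bool, error) {
	var lastErr error
	for attempt := 0; attempt < maxRetries; attempt++ {
		output, err := rerun()
		cerr := classify(output, err)
		if cerr == nil || errors.Is(cerr, ErrRerunAlreadyRunning) {
			return true, nil
		}
		lastErr = cerr
	}
	return false, lastErr
}

func main() {
	watch, err := run(func() (string, error) {
		// Sample output text is an assumption, not verbatim gh output.
		return "run cannot be rerun; it is already running", errExit
	}, 3)
	fmt.Println(watch, err)
}
```

Matching on the output string is brittle if the CLI wording changes, so the check is kept in one place (`classify`) where it can be adjusted.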
@eamonnmoloney eamonnmoloney enabled auto-merge June 26, 2026 09:00
@eamonnmoloney eamonnmoloney added this pull request to the merge queue Jun 26, 2026
Merged via the queue into main with commit c7f2916 Jun 26, 2026
271 checks passed
@eamonnmoloney eamonnmoloney deleted the feat/4767-persistence-scenario-8.10 branch June 26, 2026 09:43