fix(8.10): back Web Modeler restapi /tmp with a per-pod ephemeral volume #6406
Merged
Pull request overview
This PR updates the Camunda Platform Helm chart (8.10) Web Modeler restapi /tmp persistence to use a per-pod generic ephemeral volume (via ephemeral.volumeClaimTemplate) instead of a shared chart-managed PVC, and introduces a new CI scenario intended to exercise component persistence paths.
Changes:
- Switch Web Modeler `restapi` `/tmp` from a shared PVC to a per-pod generic ephemeral volume when chart-managed persistence is enabled.
- Remove the now-unused Web Modeler `restapi` standalone PVC template.
- Add a tier-2 `component-persistence` (`cprst`) CI scenario plus corresponding registry snapshot updates, and adjust unit tests/docs to reflect the new behavior.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| charts/camunda-platform-8.10/values.yaml | Updates documentation for webModeler.persistence.deploymentStrategy given the new ephemeral default. |
| charts/camunda-platform-8.10/values.schema.json | Keeps schema descriptions in sync with updated deploymentStrategy semantics. |
| charts/camunda-platform-8.10/templates/web-modeler/deployment-restapi.yaml | Implements the ephemeral volumeClaimTemplate for /tmp when persistence is chart-managed. |
| charts/camunda-platform-8.10/templates/web-modeler/persistentvolumeclaim-restapi.yaml | Removes the shared chart-managed PVC manifest (no longer needed). |
| charts/camunda-platform-8.10/test/unit/web-modeler/persistence_test.go | Updates unit assertions to validate the new ephemeral volume source and removes the old PVC-manifest test. |
| charts/camunda-platform-8.10/test/integration/scenarios/chart-full-setup/values/features/persistence.yaml | Adds a feature layer to exercise persistence-related values in nightly CI. |
| charts/camunda-platform-8.10/test/ci/registry/scenarios/component-persistence.yaml | Adds the new component-persistence scenario definition. |
| charts/camunda-platform-8.10/test/ci/registry/manifest.yaml | Registers the new scenario (cprst) in the registry manifest. |
| charts/camunda-platform-8.10/test/ci/registry-snapshot.yaml | Regenerates the registry snapshot to include the new scenario. |
| charts/camunda-platform-8.10/README.md | Updates generated chart parameter docs for webModeler.persistence.deploymentStrategy. |
Comment on lines +194 to +197:

    {{- with (or .Values.camundaHub.webModeler.persistence.annotations .Values.webModeler.persistence.annotations) }}
    metadata:
      annotations: {{- toYaml . | nindent 18 }}
    {{- end }}
Comment on lines +3 to +4:

    flows:
    - install,upgrade-minor
The restapi persistence PVC backs only the pod-local /tmp scratch directory. Backing disposable per-pod scratch with a shared chart-managed ReadWriteOnce PVC was the root cause of the install Pending (claimName casing mismatch, SUPPORT-30069) and the helm-upgrade Multi-Attach deadlock.

Switch the chart-managed path to a generic ephemeral volume (per-pod PVC), remove the standalone persistentvolumeclaim-restapi.yaml, and add the component-persistence (cprst) CI scenario that exercises it. emptyDir default and existingClaim paths are unchanged.

Requires Kubernetes 1.23+ / OpenShift 4.10+ (generic ephemeral volumes GA).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
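The Multi-Attach deadlock named in this commit message can be illustrated with a deliberately simplified Go model. It is a sketch under stated assumptions, not chart or Kubernetes code: the `volume` type, `surgeRollout`, and the node names are hypothetical, and the model assumes a single replica whose old pod is only removed once the surged new pod is ready.

```go
package main

import "fmt"

// volume models a ReadWriteOnce volume: it can be attached to one node at a time.
// (Hypothetical type for illustration only.)
type volume struct {
	attachedTo string // node currently holding the attachment, "" if none
}

// attach fails if the volume is already attached to a different node.
func (v *volume) attach(node string) error {
	if v.attachedTo != "" && v.attachedTo != node {
		return fmt.Errorf("multi-attach: volume already attached to %s", v.attachedTo)
	}
	v.attachedTo = node
	return nil
}

// surgeRollout models replacing one pod with surge: the new pod is started
// while the old pod still runs, and the old pod is removed only after the
// new pod is ready. With a shared volume both pods use the same attachment;
// with a per-pod volume the new pod gets a fresh one.
func surgeRollout(shared bool, oldNode, newNode string) error {
	oldVol := &volume{}
	_ = oldVol.attach(oldNode) // the old pod is already running

	newVol := oldVol
	if !shared {
		newVol = &volume{} // per-pod claim, created together with the new pod
	}
	if err := newVol.attach(newNode); err != nil {
		return fmt.Errorf("new pod never becomes ready, so the old pod is never removed: %w", err)
	}
	return nil
}

func main() {
	fmt.Println("shared RWO PVC, different nodes:", surgeRollout(true, "node-a", "node-b"))
	fmt.Println("per-pod ephemeral PVC:", surgeRollout(false, "node-a", "node-b"))
}
```

In this model the shared volume only blocks the rollout when the new pod lands on a different node, while a per-pod volume never contends with the old pod's attachment.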
The cprst scenario blocked the merge queue because the persistence feature file also enabled Optimize persistence (SUPPORT-30042) and orchestration extraVolumeClaimTemplates (SUPPORT-30094), both known-broken paths that this PR does not fix. The Optimize migration init container hangs on PodInitializing, failing cprst for reasons unrelated to the Web Modeler restapi /tmp fix.

Narrow the feature to webModeler.persistence only so cprst exercises the path this PR actually changes. Optimize and orchestration coverage will be re-enabled in the corresponding template-fix PRs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sts gate

gh run rerun --failed skips cancelled jobs, causing merge queue reruns to silently miss jobs that were cancelled when others failed first. Drop --failed so the full workflow reruns on retry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When gh run rerun returns "already running" it means another agent (human or the API's own retry) has already started the next attempt. Previously the gate exhausted all retry slots and exited with error, failing the merge queue check even though the run ultimately succeeded.

Detect the error string, return ErrRerunAlreadyRunning immediately, and treat it in Run() as a signal to proceed directly to watching the next attempt rather than a fatal error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Backs the Web Modeler `restapi` `/tmp` volume with a per-pod generic ephemeral volume instead of a shared chart-managed `ReadWriteOnce` PVC, and adds the `component-persistence` (`cprst`) CI scenario that exercises the persistence code paths.

Why
The restapi persistence volume backs only the container's `/tmp` scratch directory, which is pod-local and disposable (Web Modeler's durable state lives in PostgreSQL, the document store, and Elasticsearch). Backing disposable per-pod scratch with a shared, node-locked RWO PVC was the wrong primitive and the root cause of two reported failures:
- `claimName` casing mismatch → `restapi` stuck `Pending` on install (SUPPORT-30069; reproduced live by `cprst`);
- `RollingUpdate` surge → `Multi-Attach` deadlock on `helm upgrade`.

What changed
- `deployment-restapi.yaml`: the chart-managed persistence path now renders an `ephemeral.volumeClaimTemplate` (per-pod PVC created/destroyed with the pod), carrying the same `size`/`accessModes`/`storageClassName`/`selector`/`annotations`. The `emptyDir` default and `existingClaim` paths are unchanged.
- Removed `persistentvolumeclaim-restapi.yaml` (no shared PVC anymore).
- Added the `component-persistence` (`cprst`, tier 2) scenario + registry snapshot regen.

This removes the shared single-attach volume entirely, so the install `Pending`, the upgrade `Multi-Attach` deadlock, and any `deploymentStrategy: Recreate` / pre-upgrade workaround all disappear.

Generic ephemeral volumes have been GA since Kubernetes 1.23 / OpenShift 4.10 (alpha 1.19, beta-on-by-default 1.21), well within Camunda's supported matrix. No new storage requirement: a generic ephemeral volume provisions a normal PVC via the same StorageClass / dynamic-provisioning path the old shared PVC used (it is not a CSI inline volume). Any cluster that could provision the old `<release>-webmodeler-data` PVC can provision the per-pod ephemeral PVC.

Notes
- `deploymentStrategy: Recreate` mechanism (fix(8.8,8.9,8.10): allow Recreate strategy for Web Modeler persistence #6014 / ADR 0092) for Web Modeler. The same anti-pattern exists for Connectors (fix(connectors): expose opt-in Recreate deployment strategy for RWO persistence #6268), Identity (fix(identity): expose opt-in Recreate deployment strategy for RWO persistence #6269), Optimize (fix(optimize): migrate hardcoded Recreate to opt-in deployment strategy pattern #6270), and 8.8; all are candidates for the same fix. The now-redundant `webModeler.persistence.deploymentStrategy` knob + an ADR 0092 amendment are follow-ups.

Test plan
- `make go.test chartPath=charts/camunda-platform-8.10`
- `make helm.lint chartPath=charts/camunda-platform-8.10`
- `cprst` (install) green on 8.10

Review follow-ups (crev)
- `values.yaml` `@param`, and generated schema/README/extra-schema have been narrowed so `Recreate` is described only for the `existingClaim`-with-RWO case. The ADR amendment + possible knob removal are tracked in chore: amend ADR 0092 and retire webModeler.persistence.deploymentStrategy after ephemeral-volume switch #6409.
- Installs with `webModeler.persistence.enabled=true` may have a `<release>-webmodeler-data` PVC. On GKE's default `WaitForFirstConsumer` storage class the mis-cased PVC was never consumed, so no disk was provisioned (at most a dangling `Pending` PVC object, not billed storage). `helm upgrade` removes the now-unreferenced template PVC; if your provisioner retained one, delete it: `kubectl delete pvc <release>-webmodeler-data -n <namespace>`.
- The `persistence.yaml` feature file also enables Optimize persistence and orchestration `extraVolumeClaimTemplates` (SUPPORT-30042 / -30094). This is intentional nightly CI coverage of those known regressions under the `cprst` umbrella; no Optimize/orchestration template changes are part of this PR.
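
To make the three volume paths from "What changed" concrete, here is a minimal Go sketch that picks the `/tmp` volume source from a trimmed-down set of values. It is an illustration, not the chart's template logic: the `Persistence` struct, the `tmpVolume` and `renderJSON` helpers, the volume name `tmp`, and the exact conditions for each branch are assumptions, and `selector`/`annotations` are omitted for brevity. The emitted field names (`emptyDir`, `persistentVolumeClaim.claimName`, `ephemeral.volumeClaimTemplate.spec`) follow the Kubernetes Pod volume schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Persistence is a hypothetical, reduced view of webModeler.persistence.
type Persistence struct {
	Enabled          bool
	ExistingClaim    string
	Size             string
	AccessModes      []string
	StorageClassName string
}

// tmpVolume returns the Pod volume entry for /tmp as a generic map.
// Assumed branch order: disabled -> emptyDir, existingClaim -> that claim,
// otherwise a per-pod generic ephemeral volume.
func tmpVolume(p Persistence) map[string]any {
	vol := map[string]any{"name": "tmp"} // volume name is an assumption
	switch {
	case !p.Enabled:
		vol["emptyDir"] = map[string]any{}
	case p.ExistingClaim != "":
		vol["persistentVolumeClaim"] = map[string]any{"claimName": p.ExistingClaim}
	default:
		spec := map[string]any{
			"accessModes": p.AccessModes,
			"resources": map[string]any{
				"requests": map[string]any{"storage": p.Size},
			},
		}
		if p.StorageClassName != "" {
			spec["storageClassName"] = p.StorageClassName
		}
		vol["ephemeral"] = map[string]any{
			"volumeClaimTemplate": map[string]any{"spec": spec},
		}
	}
	return vol
}

// renderJSON serializes the selected volume entry for inspection.
func renderJSON(p Persistence) (string, error) {
	b, err := json.MarshalIndent(tmpVolume(p), "", "  ")
	if err != nil {
		return "", fmt.Errorf("marshal volume: %w", err)
	}
	return string(b), nil
}

func main() {
	out, err := renderJSON(Persistence{
		Enabled:     true,
		Size:        "1Gi",
		AccessModes: []string{"ReadWriteOnce"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because the ephemeral branch embeds a claim template in the Pod spec, each pod gets its own PVC, which is why no standalone PVC manifest is needed on that path.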