Summary
When multiple release-v* patch releases are triggered close together (e.g. a batched patch-release.yaml / payload bump across all maintained branches), the operator release automation reliably fails. On 2026-06-04, all 5 patch releases (v0.74.2, v0.76.1, v0.77.1, v0.78.2, v0.79.2) failed.
Symptoms
All 5 runs passed precheck → unit-tests → build-test, then failed on the two publish-images tasks:
- 9/10 publish-images-platform-{kubernetes,openshift} tasks → TaskRunTimeout at exactly 60 min
- 1/10 → InitContainerOOM (injected prepare init container, exit 137, i.e. node-level memory pressure)
Root cause
- No concurrency_limit on the Repository CR (tektoncd-operator, in plumbing), so PAC runs all matched patch PipelineRuns simultaneously.
- Each run launches 2 multi-arch ko builds (linux/amd64,arm64,s390x,ppc64le, non-amd64 via qemu emulation) → 5 runs × 2 = 10 heavy emulated builds in parallel on a 4-node (~48 vCPU / ~96 GB) cluster → CPU/memory saturation.
- The PipelineRun sets timeouts.pipeline: 3h but no per-task timeout, so each TaskRun inherits the cluster default default-timeout-minutes: 60, which is too short for a starved emulated build. Hence the uniform 1h timeouts.
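For reference, the cluster default mentioned above is normally carried by Tekton's config-defaults ConfigMap. The fragment below is only a sketch of what that setting looks like; the namespace (tekton-pipelines) is the upstream default and is an assumption here, since the actual cluster may use a different one.

```yaml
# Sketch only: where default-timeout-minutes usually lives.
# The namespace is assumed (upstream default); check the actual install.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: tekton-pipelines
data:
  # Applied to runs that do not set their own timeout.
  default-timeout-minutes: "60"
```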
Proposed fix
- plumbing: add concurrency_limit to the operator Repository CR (repositories/operator.yaml) to serialize patch releases (e.g. 1 or 2). Tracked separately in plumbing; will cross-link.
- operator: add a per-task timeouts.tasks (or a taskRunSpecs timeout) for the publish tasks in .tekton/release-patch.yaml (e.g. 90m–2h) so a single build fits within the 3h pipeline budget.
- (optional) stagger the .github/workflows/patch-release.yaml trigger; add memory requests to the run-kustomize-ko step in tekton/build-publish-images-manifests.yaml so the scheduler spreads builds across nodes.
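A minimal sketch of the first two changes, as config fragments rather than tested patches. Assumptions not taken from the files themselves: the Repository metadata and url values, the PipelineRun name, and that release-patch.yaml embeds its tasks under pipelineSpec. The sketch uses the per-PipelineTask timeout field, because timeouts.tasks in Tekton appears to bound the tasks section as a whole rather than each task individually; which field fits best depends on how release-patch.yaml is actually structured.

```yaml
# plumbing: repositories/operator.yaml (fragment; metadata and url are assumed)
apiVersion: pipelinesascode.tekton.dev/v1alpha1
kind: Repository
metadata:
  name: tektoncd-operator
  namespace: releases-operator
spec:
  url: https://github.com/tektoncd/operator
  # Run at most one matched PipelineRun at a time; the rest are queued.
  concurrency_limit: 1
---
# operator: .tekton/release-patch.yaml (fragment; name and layout are assumed)
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: release-patch
spec:
  timeouts:
    pipeline: 3h            # existing overall budget
  pipelineSpec:
    tasks:
      - name: publish-images-platform-kubernetes
        timeout: 2h         # explicit per-task limit, within the 3h budget
        # taskRef, params and workspaces stay unchanged (omitted here)
      - name: publish-images-platform-openshift
        timeout: 2h
        # taskRef, params and workspaces stay unchanged (omitted here)
```

With concurrency_limit: 1 the five patch releases would run back to back, so total wall-clock time grows, but each emulated build gets the cluster to itself.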
Evidence
- PipelineRuns release-patch-{hxmtj,l6d9v,nbjmp,v6b8q,gr2tm} in namespace releases-operator, 2026-06-04 ~13:00–14:00.
- 9/10 publish TaskRuns failed with TaskRunTimeout exactly 60 min after start; 1 failed with InitContainerOOM (prepare init container, exit 137).
Notes
This is independent of the separate versionTag CEL bug affecting the minor/initial release path (.tekton/release.yaml), which requires PAC ≥ v0.47.0 for the cel: ... .replace(...) expression to evaluate.