From d4337db5716d9e65e59d34535f5429cbb84e5c45 Mon Sep 17 00:00:00 2001
From: Jakub Vulgan
Date: Fri, 3 Jul 2026 11:58:46 +0200
Subject: [PATCH 1/2] Fix typo in build pipeline git-clone-depth parameter

Assisted-by: Claude Opus 4.6
---
 .tekton/build-pipeline.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.tekton/build-pipeline.yaml b/.tekton/build-pipeline.yaml
index 2698da88f..6d7a7cfdb 100644
--- a/.tekton/build-pipeline.yaml
+++ b/.tekton/build-pipeline.yaml
@@ -67,7 +67,7 @@ spec:
     name: prev-packages-ref
     type: string
   - default: "1"
-    descripton: Git clone depth
+    description: Git clone depth
     name: git-clone-depth
     type: string
   - name: enable-cache-proxy

From bb011012fe1e4f28acf2899572177f8decbf2c25 Mon Sep 17 00:00:00 2001
From: Jakub Vulgan
Date: Fri, 3 Jul 2026 11:59:19 +0200
Subject: [PATCH 2/2] Add on-push pipeline retry procedure to debug agent

Document how to retry failed on-push pipelines for the original commit,
using the current pipeline spec (not the archived one) to avoid stale
task bundle digests failing EC checks. Always inject the
ignore-supersession annotation to prevent snapshot supersession.

Assisted-by: Claude Opus 4.6
---
 .claude/agents/debug-package.md | 105 +++++++++++++++++++++++++++++++-
 1 file changed, 103 insertions(+), 2 deletions(-)

diff --git a/.claude/agents/debug-package.md b/.claude/agents/debug-package.md
index f5831ec8e..1b6aaae9a 100644
--- a/.claude/agents/debug-package.md
+++ b/.claude/agents/debug-package.md
@@ -182,9 +182,9 @@ Release `.status.conditions[].type == "Released"` values:
 
 **Symptoms**: No snapshot found for the commit SHA. PipelineRun may be missing too (GC'd) or may show a failed status.
 
-**Root cause**: On-push pipeline never ran (Pipelines-as-Code misconfiguration, webhook failure) or it failed (build error, resource limits). Old PipelineRuns get garbage-collected.
+**Root cause**: On-push pipeline never ran (Pipelines-as-Code misconfiguration, webhook failure) or it failed (build error, resource limits, transient image pull errors). Old PipelineRuns get garbage-collected.
 
-**Remediation**: Trigger a new build by making a new commit that touches the package file:
+**Remediation**: If the original PipelineRun failed due to a transient error (e.g. registry 503, image pull backoff), retry it for the original commit — see [Retrying a Failed On-Push Pipeline](#retrying-a-failed-on-push-pipeline) below. If there was never a PipelineRun at all, trigger a new build by making a new commit that touches the package file:
 ```bash
 hack/onboard.sh
 ```
@@ -268,6 +268,106 @@ EOF
 jq -r '.[]' pulp_pkgs.json | grep -i ""
 ```
 
+## Retrying a Failed On-Push Pipeline
+
+When an on-push PipelineRun fails due to a transient error (registry 503, image pull backoff, OOM on infra containers), you need to retry it **for the original commit**. The Konflux UI "Rerun" button does NOT work correctly — it re-resolves PaC template variables (`{{revision}}`) against the current HEAD of main, so `identify-packages` diffs the wrong commit pair and builds the wrong (or no) packages.
+
+### Why "Rerun" uses the wrong commit
+
+The on-push template (`.tekton/calunga-v2-index-main-push.yaml`) uses:
+```yaml
+- name: revision
+  value: '{{revision}}'  # resolved from push webhook payload
+- name: prev-packages-ref
+  value: 'HEAD^'  # parent of the checked-out commit
+```
+
+On rerun, PaC resolves `{{revision}}` to the latest commit on main, not the original. Since `HEAD^` is relative to the checked-out revision, the entire diff window shifts.
+
+### Correct retry procedure
+
+Extract the params and metadata from the original PipelineRun (from kubearchive if GC'd), combine with the **current** pipeline definition from the repo, and create a new run.
+
+**Important**: Do NOT reuse the `pipelineSpec` from the archived PipelineRun. It contains old task bundle references (pinned by SHA digest) that may no longer be in the EC trusted task list, causing `required_untrusted_task_found` violations. Always use the current `.tekton/build-pipeline.yaml`.
+
+```bash
+FAILED_PLR=""
+
+# 1. Fetch the archived PipelineRun
+# Try live cluster first, fall back to kubearchive
+oc get pipelinerun "$FAILED_PLR" -n calunga-tenant -o json > /tmp/archived-plr.json 2>/dev/null \
+  || kubectl ka get pipelineruns.v1.tekton.dev "$FAILED_PLR" -n calunga-tenant -o json \
+     | jq '.items[0]' > /tmp/archived-plr.json
+
+# 2. Verify the commit
+jq -r '{
+  revision: (.spec.params[] | select(.name == "revision") | .value),
+  title: .metadata.annotations["pipelinesascode.tekton.dev/sha-title"]
+}' /tmp/archived-plr.json
+
+# 3. Convert the current pipeline definition to JSON
+python3 -c "
+import yaml, json, sys
+with open('.tekton/build-pipeline.yaml') as f:
+    pipeline = yaml.safe_load(f)
+json.dump(pipeline['spec'], sys.stdout)
+" > /tmp/current-pipeline-spec.json
+
+# 4. Build the retry PipelineRun (archived params + current pipeline spec)
+jq -n \
+  --slurpfile archived /tmp/archived-plr.json \
+  --slurpfile pipelineSpec /tmp/current-pipeline-spec.json \
+'{
+  apiVersion: $archived[0].apiVersion,
+  kind: $archived[0].kind,
+  metadata: {
+    generateName: "calunga-v2-index-main-on-push-retry-",
+    namespace: $archived[0].metadata.namespace,
+    annotations: (
+      $archived[0].metadata.annotations | with_entries(
+        select(.key | test("^(build\\.appstudio|test\\.appstudio|pipelinesascode\\.tekton\\.dev/(cancel-in-progress|max-keep-runs|on-cel-expression|original-prname|repository|sha|sha-title|sha-url|event-type|branch|source-branch|source-repo-url|repo-url|url-org|url-repository|git-provider|installation-id))"))
+      )
+      # Ensure ignore-supersession is set (older PLRs predate this annotation)
+      + {"test.appstudio.openshift.io/ignore-supersession": "true"}
+    ),
+    labels: (
+      $archived[0].metadata.labels | with_entries(
+        select(.key | test("^(appstudio|pipelines\\.appstudio|pipelinesascode\\.tekton\\.dev/(original-prname|repository|sha|event-type|url-org|url-repository|cancel-in-progress))|tekton\\.dev/pipeline"))
+      )
+    )
+  },
+  spec: {
+    params: $archived[0].spec.params,
+    pipelineSpec: $pipelineSpec[0],
+    taskRunTemplate: $archived[0].spec.taskRunTemplate,
+    taskRunSpecs: $archived[0].spec.taskRunSpecs,
+    workspaces: [
+      {
+        name: "git-auth",
+        secret: {
+          secretName: "git-auth-dummy"
+        }
+      }
+    ]
+  }
+}' > /tmp/retry-plr.json
+
+# 5. Apply
+oc create -f /tmp/retry-plr.json -n calunga-tenant
+
+# 6. Verify it started
+oc get pipelinerun -n calunga-tenant -l "pipelinesascode.tekton.dev/sha=$(jq -r '.spec.params[] | select(.name == "revision") | .value' /tmp/archived-plr.json)" \
+  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[0].reason}{"\n"}{end}'
+```
+
+### Key details
+
+- **Current pipelineSpec**: The retry uses the pipeline definition from the current `.tekton/build-pipeline.yaml`, NOT the inlined spec from the archived run. Archived specs contain stale task bundle digests that fail EC trusted-task checks.
+- **git-auth-dummy**: PaC creates ephemeral `pac-gitauth-*` secrets per run — they're deleted after the run. The `git-auth-dummy` secret (empty credentials) works because the repo is public.
+- **Stripped annotations**: The `check-run-id` and `git-auth-secret` PaC annotations are intentionally removed to prevent PaC from trying to update a stale GitHub check or use a deleted secret. The remaining PaC annotations (`sha`, `repository`, `original-prname`, etc.) are kept so the Integration Service can create a Snapshot for the correct commit.
+- **ignore-supersession**: The retry always injects `test.appstudio.openshift.io/ignore-supersession: "true"`. Archived PLRs from before commit `adbf127f` won't have this annotation, and without it the new snapshot can get superseded ("Released in newer Snapshot"), requiring a manual Release CR.
+- **Alternative**: If you have webhook admin access on the GitHub repo, you can redeliver the original push webhook from Settings > Webhooks > Recent Deliveries. This is simpler but requires elevated access.
+
 ## Bulk Operations
 
 ### Find all packages missing from Pulp
@@ -374,6 +474,7 @@ done
 - **Snapshots may have multiple releases.** Always query as an array and iterate: `jq '[.items[] | select(...)]'` then `jq -c '.[]' | while read -r rel`.
 - **Timed-out releases often succeed eventually.** A release stuck at "Progressing" for 10+ minutes usually finishes — it's just slow, not broken. Check back later.
 - **PipelineRuns are garbage-collected.** After ~5 days, old PipelineRuns are deleted. The snapshot still exists and references the build, but you can't inspect the PipelineRun directly.
+- **Do NOT use the Konflux UI "Rerun" button for on-push pipelines.** It re-resolves `{{revision}}` to the latest commit on main, causing `identify-packages` to diff the wrong commits. Use the manual retry procedure instead.
 - **Batch onboarding causes "Released in newer Snapshot".** When many packages are committed in quick succession, only the latest snapshot gets auto-released. All earlier snapshots (each containing a unique package build) need manual Release CRs. This has been mitigated by adding `test.appstudio.openshift.io/ignore-supersession: "true"` to the on-push PipelineRun annotation (commit `adbf127f`), but older snapshots from before the fix may still be affected.
 - **Name normalization is real.** Always check Pulp with both dash and dot variants. Common prefixes: `backports`, `jaraco`, `zope`, `ruamel`.
 - **Release CR template requires `releasePlan: calunga`.** This references the ReleasePlan CR in the namespace. The `gracePeriodDays: 7` field controls how long the release artifacts are retained.