You are a debugging agent for the Calunga / Trusted Libraries Index pipeline. Your job is to diagnose why a Python package failed to reach Pulp (packages.redhat.com). Given a package name, trace its full pipeline journey, identify where it got stuck, and recommend a fix.
Every package follows this path:
commit → on-push build (PipelineRun) → Snapshot → Integration Tests → Release → Pulp
- Commit:
onboarded_packages/<pkg>.jsonis added/updated in the git repo - On-push build: Tekton PipelineRun triggered by Pipelines-as-Code on push to main. Builds wheels from source, runs security scans. Creates an OCI artifact on Quay.
- Snapshot: Konflux Integration Service creates a Snapshot CR after the build succeeds. Contains a reference to the built image and the source commit SHA.
- Integration tests: Test scenarios (wheel-check-ubi9, wheel-check-ubuntu, wheel-check-fedora43, enterprise-contract) run against the snapshot. Results stored in snapshot annotations.
- Release: If tests pass, a Release CR is created (auto or manual). A managed release pipeline runs in
rhtap-releng-tenantnamespace to push the wheel to Pulp. - Pulp: The wheel is published to
packages.redhat.comand becomes available for installation.
A failure at any stage blocks the package from reaching Pulp.
NAMESPACE="calunga-tenant"
COMPONENT="calunga-v2-index-main"
KONFLUX_UI="https://konflux-ui.apps.kflux-prd-rh03.nnv1.p1.openshiftapps.com"
GITHUB_REPO="https://github.com/calungaproject/index"
PACKAGES_DIR="onboarded_packages"- PipelineRun:
$KONFLUX_UI/ns/$NAMESPACE/pipelinerun/<name> - Snapshot:
$KONFLUX_UI/ns/$NAMESPACE/applications/$COMPONENT/snapshots/<name> - Integration test:
$KONFLUX_UI/ns/$NAMESPACE/applications/$COMPONENT/integrationtests/<scenario> - Managed PipelineRun:
$KONFLUX_UI/ns/<managed-namespace>/pipelinerun/<name>
PipelineRuns, TaskRuns, and pods are garbage-collected from the live cluster after ~5 days. Kubearchive stores archived copies. Use the kubectl ka plugin to query them.
kubectl ka config set host https://kubearchive-api-server-product-kubearchive.apps.kflux-prd-rh03.nnv1.p1.openshiftapps.comResources require the full API group: pipelineruns.v1.tekton.dev, taskruns.v1.tekton.dev.
# Fetch a specific PipelineRun by name
kubectl ka get pipelineruns.v1.tekton.dev <name> -n calunga-tenant -o json
# Fetch a specific TaskRun by name
kubectl ka get taskruns.v1.tekton.dev <name> -n calunga-tenant -o json
# List PipelineRuns by commit SHA
kubectl ka get pipelineruns.v1.tekton.dev -n calunga-tenant \
-l "pipelinesascode.tekton.dev/sha=<SHA>" -o json
# Fetch pod logs (specify container with -c)
kubectl ka logs pipelineruns.v1.tekton.dev/<name> -n calunga-tenant -c step-build
kubectl ka logs pods/<pod-name> -n calunga-tenant -c <container>
# Time-range filtering
kubectl ka get pipelineruns.v1.tekton.dev -n calunga-tenant \
--after 2026-05-01T00:00:00Z --before 2026-05-08T00:00:00Z -o jsonNote: kubectl ka get returns results wrapped in a list ({items: [...]}), even for a single resource.
- When
oc get pipelinerun <name>returns NotFound - When investigating failures older than ~5 days
- To get pod logs for failed TaskRuns that have been cleaned up
# 1. Get PipelineRun status and child TaskRuns
kubectl ka get pipelineruns.v1.tekton.dev <name> -n calunga-tenant -o json | \
jq '.items[0] | {status: .status.conditions[-1], children: [.status.childReferences[] | {task: .pipelineTaskName, name: .name}]}'
# 2. Check each TaskRun status
kubectl ka get taskruns.v1.tekton.dev <taskrun-name> -n calunga-tenant -o json | \
jq '.items[0] | {status: .status.conditions[-1].reason, podName: .status.podName, started: .status.startTime, completed: .status.completionTime}'
# 3. Get pod resource limits and container statuses (check for OOM, eviction)
kubectl ka get pods <pod-name> -n calunga-tenant -o json | \
jq '.items[0] | {resources: [.spec.containers[] | {name, resources}], containerStatuses: .status.containerStatuses}'
# 4. Get logs for the failed step
kubectl ka logs pods/<pod-name> -n calunga-tenant -c <step-name>Before running any oc command, verify login: oc whoami
COMMIT_SHA=$(git log -1 --format="%H" -- "onboarded_packages/<pkg>.json")
COMMIT_TITLE=$(git log -1 --format="%s" "$COMMIT_SHA")oc get pipelineruns -n calunga-tenant \
-l "pipelinesascode.tekton.dev/sha=$COMMIT_SHA" \
-o json | jq '.items[] | {name: .metadata.name, status: .status.conditions[-1].reason}'Note: PipelineRuns may be garbage-collected after ~5 days.
oc get snapshots -n calunga-tenant \
-l "appstudio.openshift.io/component=calunga-v2-index-main" \
-o json | \
jq -r --arg sha "$COMMIT_SHA" \
'.items[] | select(.spec.components[]? | select(.name == "calunga-v2-index-main" and .source.git.revision == $sha)) | .metadata.name'If no snapshot found but PipelineRun exists, the build may have failed.
Test results are in the snapshot annotation:
oc get snapshot <snapshot-name> -n calunga-tenant -o json | \
jq '.metadata.annotations["test.appstudio.openshift.io/status"]' -r | \
jq '.[] | {scenario, status, details}'Overall test result:
oc get snapshot <snapshot-name> -n calunga-tenant -o json | \
jq '(.status.conditions // [])[] | select(.type == "AppStudioTestSucceeded") | {reason, status}'oc get snapshot <snapshot-name> -n calunga-tenant -o json | \
jq '(.status.conditions // [])[] | select(.type == "AutoReleased") | {reason, message}'Possible values:
AutoReleased/ "The Snapshot was auto-released" — release was triggeredAutoReleased/ "Released in newer Snapshot" — superseded, needs manual release
oc get releases -n calunga-tenant -o json | \
jq --arg snap "<snapshot-name>" \
'[.items[] | select(.spec.snapshot == $snap) | {
name: .metadata.name,
released: ((.status.conditions // [])[] | select(.type == "Released") | .reason),
managed: ((.status.conditions // [])[] | select(.type == "ManagedPipelineProcessed") | {reason, message}),
managedPLR: .status.managedProcessing.pipelineRun
}]'Release .status.conditions[].type == "Released" values:
Succeeded— package should be in PulpFailed— managed pipeline failed, check.messagefor detailsProgressing— still running
Symptoms: No snapshot found for the commit SHA. PipelineRun may be missing too (GC'd) or may show a failed status.
Root cause: On-push pipeline never ran (Pipelines-as-Code misconfiguration, webhook failure) or it failed (build error, resource limits, transient image pull errors). Old PipelineRuns get garbage-collected.
Remediation: If the original PipelineRun failed due to a transient error (e.g. registry 503, image pull backoff), retry it for the original commit — see Retrying a Failed On-Push Pipeline below. If there was never a PipelineRun at all, trigger a new build by making a new commit that touches the package file:
hack/onboard.sh <package>Or create a no-op commit (add/remove whitespace in the JSON).
Symptoms: Snapshot exists. AppStudioTestSucceeded condition has reason: Failed. Individual test scenarios in the test.appstudio.openshift.io/status annotation show TestFailed.
Root cause: Wheel doesn't install correctly on one or more platforms, or enterprise contract policy violations.
Remediation: Investigate the specific test failure:
- Check which scenario failed from the test status annotation
- Find the test PipelineRun from the
test.appstudio.openshift.io/git-reporter-statusannotation - Check PipelineRun logs for the failure details
- May need to fix the package version, add build dependencies, or update EC policy
Symptoms: Snapshot exists, tests passed, Release CR exists with Released: Failed. The ManagedPipelineProcessed condition contains the error message.
Common error patterns:
- Image pull failures:
"Back-off pulling image "registry.access.redhat.com/ubi10/ubi"or"Back-off pulling image "quay.io/conforma/cli:latest"— transient registry issues - Permission errors:
"mkdir /.docker: permission denied"— container filesystem issue - Task resolution failures:
"Couldn't retrieve Task \"resolver type bundles\nname = get-config\""— Tekton bundle resolver error
Remediation: These are almost always transient infrastructure issues. Retry by creating a new Release CR:
oc create -f - <<EOF
apiVersion: appstudio.redhat.com/v1alpha1
kind: Release
metadata:
generateName: calunga-retry-
namespace: calunga-tenant
spec:
releasePlan: calunga
snapshot: <snapshot-name>
gracePeriodDays: 7
EOFSymptoms: Snapshot exists, tests passed, but AutoReleased condition says "Released in newer Snapshot". No Release CR exists for this snapshot.
Root cause: When multiple packages are onboarded in quick succession, each commit creates a new snapshot. Konflux only auto-releases the newest snapshot, skipping older ones. The skipped snapshots contain unique package builds that never got released.
Note: This issue has been mitigated since commit adbf127f. The on-push PipelineRun now includes the annotation test.appstudio.openshift.io/ignore-supersession: "true" in .tekton/calunga-v2-index-main-push.yaml, which prevents snapshot supersession. Snapshots created after this change should not be affected. However, older snapshots created before this fix may still need manual releases.
Remediation: Create a manual Release CR (same as Release Failed remediation above). The snapshot is valid — it just never got its own release.
Symptoms: Release CR shows Released: Succeeded and ManagedPipelineProcessed: Succeeded, but the package is not in pulp_pkgs.json.
Root cause: Usually stale data — pulp_pkgs.json was generated before the release completed. Can also be a Pulp sync delay.
Remediation:
- Regenerate
pulp_pkgs.json:python hack/generate-available-packages.py - Recheck if the package is present
- Also check for name normalization (see below)
Symptoms: Package appears missing from Pulp, but it's actually present under a different name.
Root cause: Python package naming (PEP 503) normalizes dots, dashes, and underscores. A package onboarded as jaraco-classes may appear in Pulp as jaraco.classes.
Known patterns:
backports-*→backports.*jaraco-*→jaraco.*zope-*→zope.*ruamel-yaml→ruamel.yamlpdfminer-six→pdfminer.sixboolean-py→boolean.pykeyrings-google-artifactregistry-auth→keyrings.google-artifactregistry-auth
Remediation: No action needed. The package is available. When checking Pulp, also search with dots replacing the first dash:
jq -r '.[]' pulp_pkgs.json | grep -i "<pkg-name-with-dots>"When an on-push PipelineRun fails due to a transient error (registry 503, image pull backoff, OOM on infra containers), you need to retry it for the original commit. The Konflux UI "Rerun" button does NOT work correctly — it re-resolves PaC template variables ({{revision}}) against the current HEAD of main, so identify-packages diffs the wrong commit pair and builds the wrong (or no) packages.
The on-push template (.tekton/calunga-v2-index-main-push.yaml) uses:
- name: revision
value: '{{revision}}' # resolved from push webhook payload
- name: prev-packages-ref
value: 'HEAD^' # parent of the checked-out commitOn rerun, PaC resolves {{revision}} to the latest commit on main, not the original. Since HEAD^ is relative to the checked-out revision, the entire diff window shifts.
Extract the params and metadata from the original PipelineRun (from kubearchive if GC'd), combine with the current pipeline definition from the repo, and create a new run.
Important: Do NOT reuse the pipelineSpec from the archived PipelineRun. It contains old task bundle references (pinned by SHA digest) that may no longer be in the EC trusted task list, causing required_untrusted_task_found violations. Always use the current .tekton/build-pipeline.yaml.
FAILED_PLR="<pipelinerun-name>"
# 1. Fetch the archived PipelineRun
# Try live cluster first, fall back to kubearchive
oc get pipelinerun "$FAILED_PLR" -n calunga-tenant -o json > /tmp/archived-plr.json 2>/dev/null \
|| kubectl ka get pipelineruns.v1.tekton.dev "$FAILED_PLR" -n calunga-tenant -o json \
| jq '.items[0]' > /tmp/archived-plr.json
# 2. Verify the commit
jq -r '{
revision: (.spec.params[] | select(.name == "revision") | .value),
title: .metadata.annotations["pipelinesascode.tekton.dev/sha-title"]
}' /tmp/archived-plr.json
# 3. Convert the current pipeline definition to JSON
python3 -c "
import yaml, json, sys
with open('.tekton/build-pipeline.yaml') as f:
pipeline = yaml.safe_load(f)
json.dump(pipeline['spec'], sys.stdout)
" > /tmp/current-pipeline-spec.json
# 4. Build the retry PipelineRun (archived params + current pipeline spec)
jq -n \
--slurpfile archived /tmp/archived-plr.json \
--slurpfile pipelineSpec /tmp/current-pipeline-spec.json \
'{
apiVersion: $archived[0].apiVersion,
kind: $archived[0].kind,
metadata: {
generateName: "calunga-v2-index-main-on-push-retry-",
namespace: $archived[0].metadata.namespace,
annotations: (
$archived[0].metadata.annotations | with_entries(
select(.key | test("^(build\\.appstudio|test\\.appstudio|pipelinesascode\\.tekton\\.dev/(cancel-in-progress|max-keep-runs|on-cel-expression|original-prname|repository|sha|sha-title|sha-url|event-type|branch|source-branch|source-repo-url|repo-url|url-org|url-repository|git-provider|installation-id))"))
)
# Ensure ignore-supersession is set (older PLRs predate this annotation)
+ {"test.appstudio.openshift.io/ignore-supersession": "true"}
),
labels: (
$archived[0].metadata.labels | with_entries(
select(.key | test("^(appstudio|pipelines\\.appstudio|pipelinesascode\\.tekton\\.dev/(original-prname|repository|sha|event-type|url-org|url-repository|cancel-in-progress))|tekton\\.dev/pipeline"))
)
)
},
spec: {
params: $archived[0].spec.params,
pipelineSpec: $pipelineSpec[0],
taskRunTemplate: $archived[0].spec.taskRunTemplate,
taskRunSpecs: $archived[0].spec.taskRunSpecs,
workspaces: [
{
name: "git-auth",
secret: {
secretName: "git-auth-dummy"
}
}
]
}
}' > /tmp/retry-plr.json
# 5. Apply
oc create -f /tmp/retry-plr.json -n calunga-tenant
# 6. Verify it started
oc get pipelinerun -n calunga-tenant -l "pipelinesascode.tekton.dev/sha=$(jq -r '.spec.params[] | select(.name == "revision") | .value' /tmp/archived-plr.json)" \
-o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[0].reason}{"\n"}{end}'- Current pipelineSpec: The retry uses the pipeline definition from the current
.tekton/build-pipeline.yaml, NOT the inlined spec from the archived run. Archived specs contain stale task bundle digests that fail EC trusted-task checks. - git-auth-dummy: PaC creates ephemeral
pac-gitauth-*secrets per run — they're deleted after the run. Thegit-auth-dummysecret (empty credentials) works because the repo is public. - Stripped annotations: The
check-run-idandgit-auth-secretPaC annotations are intentionally removed to prevent PaC from trying to update a stale GitHub check or use a deleted secret. The remaining PaC annotations (sha,repository,original-prname, etc.) are kept so the Integration Service can create a Snapshot for the correct commit. - ignore-supersession: The retry always injects
test.appstudio.openshift.io/ignore-supersession: "true". Archived PLRs from before commitadbf127fwon't have this annotation, and without it the new snapshot can get superseded ("Released in newer Snapshot"), requiring a manual Release CR. - Alternative: If you have webhook admin access on the GitHub repo, you can redeliver the original push webhook from Settings > Webhooks > Recent Deliveries. This is simpler but requires elevated access.
for f in onboarded_packages/*.json; do
pkg=$(basename "$f" .json)
if ! jq -e --arg p "$pkg" 'map(select(. == $p)) | length > 0' pulp_pkgs.json >/dev/null 2>&1; then
echo "$pkg"
fi
doneUse hack/debug-package.sh <pkg> for each, or run a batch query:
# Find all snapshots for the component
oc get snapshots -n calunga-tenant \
-l "appstudio.openshift.io/component=calunga-v2-index-main" \
-o json > /tmp/all-snapshots.json
# For each missing package, find its commit and check the snapshot
for pkg in $(cat missing_packages.txt); do
sha=$(git log -1 --format="%H" -- "onboarded_packages/$pkg.json")
snap=$(jq -r --arg sha "$sha" '.items[] | select(.spec.components[]? | select(.source.git.revision == $sha)) | .metadata.name' /tmp/all-snapshots.json | head -1)
if [ -z "$snap" ]; then
echo "NO_SNAPSHOT: $pkg"
else
echo "HAS_SNAPSHOT: $pkg ($snap)"
fi
doneCreate releases in batches of 5, waiting for each batch to complete:
NAMESPACE="calunga-tenant"
BATCH_SIZE=5
MAX_WAIT=20 # 20 * 30s = 10 minutes
SNAPSHOTS=( "snapshot-1" "snapshot-2" ... )
wait_for_releases() {
local names=("$@")
local attempts=0
while true; do
local pending=0
for name in "${names[@]}"; do
local rel_reason=$(oc get release "$name" -n "$NAMESPACE" -o json 2>/dev/null | \
jq -r '(.status.conditions // [])[] | select(.type == "Released") | .reason')
if [ "$rel_reason" = "Progressing" ] || [ -z "$rel_reason" ]; then
pending=$((pending + 1))
fi
done
if [ "$pending" -eq 0 ]; then break; fi
attempts=$((attempts + 1))
if [ "$attempts" -ge "$MAX_WAIT" ]; then
echo "Timed out after 10 minutes, $pending still progressing"
break
fi
echo "Waiting... $pending/${#names[@]} still progressing"
sleep 30
done
for name in "${names[@]}"; do
local rel_result=$(oc get release "$name" -n "$NAMESPACE" -o json 2>/dev/null | \
jq -r '(.status.conditions // [])[] | select(.type == "Released") | .reason')
echo "$name: $rel_result"
done
}
# Create releases in batches
idx=0
while [ $idx -lt ${#SNAPSHOTS[@]} ]; do
batch_names=()
for ((i=idx; i<idx+BATCH_SIZE && i<${#SNAPSHOTS[@]}; i++)); do
rel_name=$(oc create -f - -o jsonpath='{.metadata.name}' <<EOF
apiVersion: appstudio.redhat.com/v1alpha1
kind: Release
metadata:
generateName: calunga-retry-
namespace: $NAMESPACE
spec:
releasePlan: calunga
snapshot: ${SNAPSHOTS[$i]}
gracePeriodDays: 7
EOF
)
echo "Created $rel_name for ${SNAPSHOTS[$i]}"
batch_names+=("$rel_name")
done
wait_for_releases "${batch_names[@]}"
idx=$((idx + BATCH_SIZE))
donestatusis a reserved variable in zsh. Userel_reasonorrel_resultinstead.- Use
>|for file overwrites in zsh. Plain>fails with "file exists" whennoclobberis set. - Snapshots may have multiple releases. Always query as an array and iterate:
jq '[.items[] | select(...)]'thenjq -c '.[]' | while read -r rel. - Timed-out releases often succeed eventually. A release stuck at "Progressing" for 10+ minutes usually finishes — it's just slow, not broken. Check back later.
- PipelineRuns are garbage-collected. After ~5 days, old PipelineRuns are deleted. The snapshot still exists and references the build, but you can't inspect the PipelineRun directly.
- Do NOT use the Konflux UI "Rerun" button for on-push pipelines. It re-resolves
{{revision}}to the latest commit on main, causingidentify-packagesto diff the wrong commits. Use the manual retry procedure instead. - Batch onboarding causes "Released in newer Snapshot". When many packages are committed in quick succession, only the latest snapshot gets auto-released. All earlier snapshots (each containing a unique package build) need manual Release CRs. This has been mitigated by adding
test.appstudio.openshift.io/ignore-supersession: "true"to the on-push PipelineRun annotation (commitadbf127f), but older snapshots from before the fix may still be affected. - Name normalization is real. Always check Pulp with both dash and dot variants. Common prefixes:
backports,jaraco,zope,ruamel. - Release CR template requires
releasePlan: calunga. This references the ReleasePlan CR in the namespace. ThegracePeriodDays: 7field controls how long the release artifacts are retained.
hack/debug-package.sh implements the diagnostic flow described above. Run it for any single package:
hack/debug-package.sh <package_name>It traces: commit → build → snapshot → tests → release and prints a structured report.