
Commit 550aa74

Author: Nissan Pow (authored and committed)
fix: disable pod cleanup for argo — was deleting pods mid-workflow
The background kubectl delete pods loop was racing with the argo workflow controller: pods briefly show Succeeded before the controller reads their task results. Deleting them causes "pod deleted" errors and workflow failures. This was the root cause of all argo deployer test failures across 3 consecutive runs. Keep cleanup for airflow only (it manages pods differently) and increase interval to 120s for safety.
1 parent d3d7aac commit 550aa74

1 file changed: .github/workflows/ux-tests.yml
Lines changed: 3 additions & 2 deletions
```diff
@@ -237,13 +237,14 @@ jobs:
         run: devtools/ci/wait-airflow-api.sh
 
       - name: Clean up completed pods and start background cleanup
-        if: matrix.backend == 'argo-kubernetes' || matrix.backend == 'airflow-kubernetes'
+        if: matrix.backend == 'airflow-kubernetes'
         run: |
           kubectl delete pods --field-selector=status.phase=Succeeded --all-namespaces 2>/dev/null || true
           kubectl delete pods --field-selector=status.phase=Failed --all-namespaces 2>/dev/null || true
           # Periodically clean up completed pods during test runs to free cluster resources
+          # NOTE: Only safe for airflow — argo controller needs Succeeded pods to read task results
           while true; do
-            sleep 60
+            sleep 120
             kubectl delete pods --field-selector=status.phase=Succeeded --all-namespaces 2>/dev/null || true
             kubectl delete pods --field-selector=status.phase=Failed --all-namespaces 2>/dev/null || true
           done &
```
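An alternative to disabling cleanup for argo entirely would be to keep the loop for both backends but exclude argo-managed pods from deletion. The sketch below is hypothetical and not part of this commit; it assumes the argo controller applies its standard `workflows.argoproj.io/workflow` label to the pods it owns, and uses kubectl's `-l '!key'` selector syntax to match only pods without that label:

```yaml
# Hypothetical alternative (not in this commit): cleanup step that skips
# argo-managed pods so the workflow controller can still read task results.
- name: Clean up completed non-argo pods
  if: matrix.backend == 'airflow-kubernetes' || matrix.backend == 'argo-kubernetes'
  run: |
    while true; do
      sleep 120
      # -l '!workflows.argoproj.io/workflow' matches only pods WITHOUT the
      # label argo's controller puts on its pods (assumed label name),
      # leaving Succeeded argo pods alone until their results are consumed
      kubectl delete pods --field-selector=status.phase=Succeeded \
        -l '!workflows.argoproj.io/workflow' --all-namespaces 2>/dev/null || true
      kubectl delete pods --field-selector=status.phase=Failed \
        -l '!workflows.argoproj.io/workflow' --all-namespaces 2>/dev/null || true
    done &
```

This would preserve some resource reclamation on argo runs, at the cost of depending on the controller's labeling behavior; the commit's simpler choice of restricting cleanup to airflow avoids that coupling.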

0 commit comments

Comments
 (0)