
Commit c6119ec

MK8S-68: add retry mechanism for failed nightlies with Slack notifications
Add automatic retry of failed nightly workflows:

- Retry schedule at 06:00 (4 hours after last nightly)
- Maximum 2 retry attempts for failed jobs only
- Slack notifications to #squad-supsetup-alerts when:
  * A retry is triggered
  * Manual action is needed after max retries exhausted
- Uses scality/actions/action-retry-workflow@1.17.0
1 parent 87fd690 commit c6119ec
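The commit message places the retry run four hours after the last nightly. A small sketch that derives that gap from the two cron expressions in `crons.yaml`; `parse_simple_cron` and `gap_hours` are hypothetical helpers that only understand the `MIN HOUR * * DOW1-DOW2` shape used there, not general cron syntax.

```python
from typing import Tuple


def parse_simple_cron(expr: str) -> Tuple[int, int, range]:
    """Parse the 'MIN HOUR * * DOW1-DOW2' shape used in crons.yaml.

    Returns (minute, hour, weekdays). Any other shape is rejected.
    """
    minute, hour, day_of_month, month, day_of_week = expr.split()
    if day_of_month != "*" or month != "*":
        raise ValueError("only '*' day-of-month and month are supported")
    # "1-5" -> Monday..Friday in cron's day-of-week numbering
    first, last = (int(part) for part in day_of_week.split("-"))
    return int(minute), int(hour), range(first, last + 1)


def gap_hours(earlier: str, later: str) -> float:
    """Hours between two same-day schedules that fire on the same weekdays."""
    m1, h1, days1 = parse_simple_cron(earlier)
    m2, h2, days2 = parse_simple_cron(later)
    if days1 != days2:
        raise ValueError("schedules fire on different weekdays")
    return ((h2 * 60 + m2) - (h1 * 60 + m1)) / 60


print(gap_hours("0 2 * * 1-5", "0 6 * * 1-5"))  # 4.0
```

GitHub Actions evaluates `schedule` crons in UTC, and the figure is the gap between scheduled start times, not a guarantee that the 02:00 nightly has finished by 06:00.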


1 file changed: +60 -1 lines changed


.github/workflows/crons.yaml

Lines changed: 60 additions & 1 deletion
```diff
@@ -13,9 +13,14 @@ on:
     # Run V -1 nightly every weekday at 02:00
     - cron: "0 2 * * 1-5"
 
+    # Retry failed nightlies every weekday at 06:00 (4 hours after last nightly)
+    - cron: "0 6 * * 1-5"
+
 jobs:
   crons:
     runs-on: ubuntu-24.04
+    env:
+      MAX_RETRIES: '2'
     strategy:
       fail-fast: false
       matrix:
@@ -47,14 +52,68 @@ jobs:
             cron: "0 2 * * 1-5"
             branch: "development/131.0"
             workflow: "nightly.yaml"
+
+          # Nightly retry jobs
+          # current=132.0
+          - name: "Retry Nightly for MetalK8s 132.0"
+            cron: "0 6 * * 1-5"
+            branch: "development/132.0"
+            workflow: "nightly.yaml"
+            type: "retry"
+          # old=131.0
+          - name: "Retry Nightly for MetalK8s 131.0"
+            cron: "0 6 * * 1-5"
+            branch: "development/131.0"
+            workflow: "nightly.yaml"
+            type: "retry"
     steps:
       - name: Checkout
         if: github.event.schedule == matrix.cron
         uses: actions/checkout@v6
       - name: "Smart trigger for ${{ matrix.name }}"
-        if: github.event.schedule == matrix.cron
+        if: github.event.schedule == matrix.cron && matrix.type != 'retry'
         uses: scality/actions/actions-nightly-trigger@1.17.0
         with:
           branch: ${{ matrix.branch }}
           workflow: ${{ matrix.workflow }}
           access_token: ${{ secrets.GIT_ACCESS_TOKEN }}
+      - name: "Retry workflow for ${{ matrix.name }}"
+        id: retry
+        if: github.event.schedule == matrix.cron && matrix.type == 'retry'
+        uses: scality/actions/action-retry-workflow@1.17.0
+        with:
+          branch: ${{ matrix.branch }}
+          workflow: ${{ matrix.workflow }}
+          step-name: 'Spawn cluster with Terraform'
+          max-retries: ${{ env.MAX_RETRIES }}
+          retry-mode: 'failed-only'
+          access_token: ${{ secrets.GITHUB_TOKEN }}
+      - name: Notify Slack on retry triggered
+        if: matrix.type == 'retry' && steps.retry.outputs.was-retried == 'true'
+        uses: slackapi/slack-github-action@v1
+        with:
+          channel-id: '#squad-supsetup-alerts'
+          slack-message: |
+            Nightly retry triggered for ${{ matrix.branch }}
+            Job: ${{ matrix.name }}
+            Status: ${{ steps.retry.outputs.status }}
+            Retries performed: ${{ steps.retry.outputs.retry-count }}
+            Retried run: https://github.com/${{ github.repository }}/actions/runs/${{ steps.retry.outputs.run-id }}
+            Cron run: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
+        env:
+          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
+      - name: Notify Slack on manual action needed
+        if: matrix.type == 'retry' && steps.retry.outputs.retry-count != '' && fromJSON(steps.retry.outputs.retry-count) >= fromJSON(env.MAX_RETRIES)
+        uses: slackapi/slack-github-action@v1
+        with:
+          channel-id: '#squad-supsetup-alerts'
+          slack-message: |
+            ⚠️ MANUAL ACTION REQUIRED ⚠️
+            Nightly for ${{ matrix.branch }} failed after ${{ steps.retry.outputs.retry-count }} retry attempts
+            Job: ${{ matrix.name }}
+            Max retries (${{ env.MAX_RETRIES }}) exhausted - manual intervention needed
+            Workflow: ${{ matrix.workflow }}
+            Failed run: https://github.com/${{ github.repository }}/actions/runs/${{ steps.retry.outputs.run-id }}
+            Cron run: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
+        env:
+          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
```
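The step gating in `crons.yaml` depends only on `github.event.schedule`, `matrix.type` and the outputs of the `retry` step. A minimal Python model of those `if:` expressions, assuming every earlier step succeeds (Actions applies an implicit `success()` check unless a status function is used) and that outputs of a skipped step read as empty. `planned_steps`, `from_json_number` and the placeholder name for the 02:00 entry are hypothetical; the output names `was-retried` and `retry-count` come from the diff, but the values `action-retry-workflow` actually writes to them are not shown in this commit.

```python
import json
from typing import Dict, List

# Matrix entries transcribed from the diff. The 02:00 entry's name line is
# not visible in the diff, so its name here is a hypothetical placeholder.
NIGHTLY_131 = {"name": "<nightly 131.0 entry>", "cron": "0 2 * * 1-5",
               "branch": "development/131.0", "workflow": "nightly.yaml"}
RETRY_132 = {"name": "Retry Nightly for MetalK8s 132.0", "cron": "0 6 * * 1-5",
             "branch": "development/132.0", "workflow": "nightly.yaml",
             "type": "retry"}


def from_json_number(value: str) -> float:
    """Rough stand-in for fromJSON() applied to a numeric string."""
    return float(json.loads(value))


def planned_steps(schedule: str, entry: Dict[str, str],
                  retry_outputs: Dict[str, str],
                  max_retries: str = "2") -> List[str]:
    """Return the steps whose `if:` condition holds for one matrix entry."""
    on_schedule = schedule == entry["cron"]      # github.event.schedule == matrix.cron
    is_retry = entry.get("type", "") == "retry"  # matrix.type == 'retry'
    steps: List[str] = []
    if on_schedule:
        steps.append("Checkout")
    if on_schedule and not is_retry:
        steps.append("Smart trigger")
    # The retry step only produces outputs when it actually runs.
    outputs = retry_outputs if (on_schedule and is_retry) else {}
    if on_schedule and is_retry:
        steps.append("Retry workflow")
    if is_retry and outputs.get("was-retried", "") == "true":
        steps.append("Notify Slack on retry triggered")
    count = outputs.get("retry-count", "")
    if is_retry and count != "" and from_json_number(count) >= from_json_number(max_retries):
        steps.append("Notify Slack on manual action needed")
    return steps


print(planned_steps("0 6 * * 1-5", RETRY_132,
                    {"was-retried": "true", "retry-count": "2"}))
# ['Checkout', 'Retry workflow', 'Notify Slack on retry triggered',
#  'Notify Slack on manual action needed']
```

As modelled, a run where `retry-count` reaches `MAX_RETRIES` while `was-retried` is `'true'` posts both Slack messages; whether that combination occurs in practice depends on the action's output semantics.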
