Commit dc7264f

yoblin and claude committed
Move Slack alert after CoreWeave teardown
Teardown should not be delayed by the Slack notification: H100 costs matter more than a few seconds of alert latency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 1ef9f96 commit dc7264f

1 file changed

Lines changed: 9 additions & 9 deletions

.github/workflows/marin-canary-ferry-cw.yaml
@@ -185,15 +185,6 @@ jobs:
       # `cluster stop` only deletes Pods; NodePools survive and rely on the
       # CW autoscaler to scale down. Delete them explicitly to avoid lingering
       # H100 costs.
-      - name: Notify Slack on failure
-        if: failure() && github.event_name == 'schedule'
-        uses: slackapi/slack-github-action@v2
-        with:
-          webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
-          webhook-type: incoming-webhook
-          payload: |
-            text: ":red_circle: *GPU Canary failed*\nRun: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
-
       - name: Tear down CoreWeave cluster
         if: always()
         run: |
@@ -204,3 +195,12 @@ jobs:
           else
             echo "Keeping node pool alive (keep_nodepool=true)"
           fi
+
+      - name: Notify Slack on failure
+        if: failure() && github.event_name == 'schedule'
+        uses: slackapi/slack-github-action@v2
+        with:
+          webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
+          webhook-type: incoming-webhook
+          payload: |
+            text: ":red_circle: *GPU Canary failed*\nRun: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
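
The step ordering this commit settles on can be sketched as a stripped-down job. This is an illustration under stated assumptions, not the actual workflow: the job name `canary`, the runner, and the scripts `./run-canary.sh` and `./teardown.sh` are hypothetical placeholders, and the Slack payload is shortened. Only the `if:` conditions and the Slack action inputs are taken from the diff.

```yaml
# Sketch only: job name, runner, and script names are placeholders.
jobs:
  canary:
    runs-on: ubuntu-latest
    steps:
      - name: Run canary
        run: ./run-canary.sh        # hypothetical workload step

      # always() runs this step even if an earlier step failed,
      # so the GPU nodes are released before anything else happens.
      - name: Tear down CoreWeave cluster
        if: always()
        run: ./teardown.sh          # hypothetical; the real commands live in the workflow

      # failure() is true when any earlier step in the job has failed,
      # so the alert still fires after cleanup has finished.
      - name: Notify Slack on failure
        if: failure() && github.event_name == 'schedule'
        uses: slackapi/slack-github-action@v2
        with:
          webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
          webhook-type: incoming-webhook
          payload: |
            text: ":red_circle: *GPU Canary failed*"
```

One likely side effect of this ordering: because `failure()` considers every earlier step, a failed teardown on a scheduled run would presumably also trigger the alert, not only a failed canary.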
