Add Slack alerts for canary workflow failures #4158

Merged
yonromai merged 2 commits into main from romain/canary-slack-alerts
Mar 26, 2026

@yonromai

Summary

  • Adds a Slack notification step to both GPU (marin-canary-ferry-cw) and TPU (marin-canary-ferry) canary workflows
  • Fires only on scheduled runs that fail (not manual workflow_dispatch)
  • Posts to #oa-marin-eng via incoming webhook (SLACK_WEBHOOK_URL secret already set)
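A minimal sketch of what such a notification step might look like, not the PR's actual diff. The commit message names `slackapi/slack-github-action` and the `SLACK_WEBHOOK_URL` secret; everything else here (the cron time, job and step names, the `run_canary.sh` script, the pinned `v2.0.0` release and its `webhook`/`webhook-type` inputs, and the message text) is an assumption for illustration.

```yaml
# Hypothetical workflow excerpt; names, schedule, and message text are assumptions.
on:
  schedule:
    - cron: "0 6 * * *"      # assumed time; the PR only says the canary runs daily
  workflow_dispatch:          # manual runs are allowed but should not alert

jobs:
  canary:
    runs-on: ubuntu-latest
    steps:
      - name: Run canary      # placeholder for the real canary steps
        run: ./run_canary.sh  # hypothetical script

      - name: Notify Slack on scheduled failure
        # failure() is true when an earlier step in the job failed;
        # the event_name check excludes manual workflow_dispatch runs.
        if: failure() && github.event_name == 'schedule'
        uses: slackapi/slack-github-action@v2.0.0
        with:
          webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
          webhook-type: incoming-webhook
          payload: |
            text: "Canary failed: ${{ github.workflow }} ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
```

With an incoming webhook the destination channel is fixed when the webhook is created, so the step would not need to name `#oa-marin-eng` itself.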

Test plan

  • SLACK_WEBHOOK_URL secret added to repo
  • Webhook verified working from local curl
  • Verify alert appears in #oa-marin-eng on next canary failure (or trigger manually and check GH Actions logs for the Slack step)

🤖 Generated with Claude Code

Notify #oa-marin-eng when the daily GPU or TPU canary fails on schedule.
Uses slackapi/slack-github-action with the SLACK_WEBHOOK_URL secret.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1ef9f967d0

Comment thread .github/workflows/marin-canary-ferry-cw.yaml Outdated
Teardown should not be delayed by the Slack notification — H100 costs
matter more than a few seconds of alert latency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
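
The ordering this commit describes could be sketched as follows, assuming the workflow has a dedicated teardown step. The step names and the `run_canary.sh` and `teardown.sh` scripts are hypothetical, as are the `v2.0.0` action inputs; only the idea of tearing down before notifying comes from the commit message.

```yaml
# Hypothetical steps fragment; script names and step names are assumptions.
steps:
  - name: Run canary            # placeholder for the real canary steps
    run: ./run_canary.sh        # hypothetical script

  - name: Tear down cluster
    if: always()                # release the H100s whether or not the canary passed
    run: ./teardown.sh          # hypothetical script

  - name: Notify Slack on scheduled failure
    # Placed last so a slow or failing Slack call cannot delay teardown.
    # failure() still sees the earlier failed step even if teardown succeeds.
    if: failure() && github.event_name == 'schedule'
    uses: slackapi/slack-github-action@v2.0.0
    with:
      webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
      webhook-type: incoming-webhook
      payload: |
        text: "Canary failed: ${{ github.workflow }} (run ${{ github.run_id }})"
```

One side effect of this ordering is that a failed teardown would also trigger the alert on scheduled runs, which may or may not be desired.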
@yonromai requested review from ravwojdyla and rjpower March 26, 2026 00:42
@yonromai merged commit b35a428 into main Mar 26, 2026
41 of 42 checks passed
@yonromai deleted the romain/canary-slack-alerts branch March 26, 2026 00:46
Helw150 pushed a commit that referenced this pull request Apr 8, 2026
## Summary
- Adds a Slack notification step to both GPU (`marin-canary-ferry-cw`)
and TPU (`marin-canary-ferry`) canary workflows
- Fires only on scheduled runs that fail (not manual
`workflow_dispatch`)
- Posts to `#oa-marin-eng` via incoming webhook (`SLACK_WEBHOOK_URL`
secret already set)

## Test plan
- [x] `SLACK_WEBHOOK_URL` secret added to repo
- [x] Webhook verified working from local curl
- [ ] Verify alert appears in `#oa-marin-eng` on next canary failure (or
trigger manually and check GH Actions logs for the Slack step)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: yoblin <268258002+yoblin@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>