fix for issue #4638 -- create in-alarm variable to prevent excessive … #5409

Open
wants to merge 1 commit into
base: main
from

sTarunRaaj

…elasticsearch health alerts

Fixes

Fixes #4638

Description

This pull request adds cooldown logic to the Elasticsearch cluster healthcheck DAG to prevent redundant Slack alerts. It records the timestamp of the last alert in an Airflow Variable. If a later DAG run detects the same issue within 2 hours of that alert, the new alert is suppressed. Once the cluster recovers (status returns to green), the Variable is cleared so that future failures trigger alerts again.
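
For reviewers, the decision logic described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `InMemoryVariables` is a hypothetical stand-in for Airflow's Variable store so the snippet runs without Airflow, `should_alert` is an assumed function name, and the 2-hour window is taken from the description.

```python
from datetime import datetime, timedelta, timezone

# Assumed cooldown window, taken from the PR description.
COOLDOWN = timedelta(hours=2)


class InMemoryVariables:
    """Hypothetical stand-in for the Airflow Variable store (get/set/delete)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        # Deleting a missing key is a no-op in this sketch.
        self._data.pop(key, None)


def should_alert(status, env, store, now=None):
    """Return True if a Slack alert should be sent for this healthcheck run."""
    now = now or datetime.now(timezone.utc)
    key = f"es_last_alert_time_{env}"

    if status == "green":
        # Cluster recovered: clear the timestamp so the next failure alerts.
        store.delete(key)
        return False

    last = store.get(key)
    if last is not None and now - datetime.fromisoformat(last) < COOLDOWN:
        # Still inside the cooldown window: suppress the repeat alert.
        return False

    # First failure, or cooldown elapsed: record the time and alert.
    store.set(key, now.isoformat())
    return True
```

Passing `now` explicitly keeps the function easy to exercise in unit tests without waiting for the cooldown to elapse.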

Testing Instructions

  1. Trigger the DAG with a failing healthcheck (e.g., simulate status=red).
  2. Confirm an alert is sent and that an Airflow Variable (es_last_alert_time_<env>) is created.
  3. Trigger the DAG again within the cooldown window (2 hours, per the description) and confirm that no alert is sent.
  4. Trigger with a healthy status and confirm that the variable is deleted.
  5. Trigger again with another failure after the variable was cleared and confirm that a new alert is sent.

Checklist

  • My pull request has a descriptive title.
  • My pull request targets the main branch.
  • My commit messages follow best practices.
  • My code follows the repository’s code style.
  • I added or updated tests for the changes I made.
  • I ran the DAG locally and verified the cooldown behavior.

@sTarunRaaj sTarunRaaj requested a review from a team as a code owner April 25, 2025 17:29
@sTarunRaaj sTarunRaaj requested review from krysal and obulat and removed request for a team April 25, 2025 17:29
@openverse-bot openverse-bot added 🧱 stack: catalog Related to the catalog and Airflow DAGs 🟩 priority: low Low priority and doesn't need to be rushed ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository labels Apr 25, 2025
@openverse-bot openverse-bot moved this to 👀 Needs Review in Openverse PRs Apr 25, 2025
Development

Successfully merging this pull request may close these issues.

Use an "in-alarm" variable to prevent sending Slack alerts every 15 minutes for Elasticsearch cluster health
2 participants