AlertForge - Event-Driven Alerting Engine

AlertForge is a production-grade, event-driven alerting system demonstrating distributed systems reliability patterns. It features asynchronous ingestion, Redis-based sliding-window rate limiting, SHA-256 fingerprint deduplication, exponential backoff retries with dead-letter queue (DLQ) support, and scheduled AI-driven alert storm summarization.

Built with FastAPI, Celery, Redis, PostgreSQL, React, and Docker.

System Architecture

graph TD
    Client[React Simulator Dashboard] -->|POST /api/v1/alerts| API[FastAPI Ingestion Server]
    API -->|1. Generate Hash & Check| Dedup{Redis Deduplication}
    
    Dedup -->|Active Hash Exists| Suppress[Increment dup_count in PostgreSQL]
    Dedup -->|New Unique Hash| Store[Save Event to PostgreSQL]
    
    Store -->|2. Register Suppression Key| RedisCache[Cache Hash in Redis for 60s]
    Store -->|3. Enqueue Background Job| Queue[Redis Task Broker]
    
    Queue -->|4. Pull Job| Worker[Celery Worker]
    Worker -->|5. Verify Dispatch Rate| RateLimiter{Redis sliding-window rate limiter}
    
    RateLimiter -->|Rate Exceeded| Retry[Trigger Exponential Backoff]
    RateLimiter -->|Under Limit| Dispatch[Send Notifications]
    
    Dispatch -->|Webhook| Slack[Slack Channel]
    Dispatch -->|SMTP| Mailhog[Mailhog Email Catcher]
    
    Retry -->|Max 3 Retries Exceeded| DLQ[Save to Dead-Letter Queue Table]
    
    Scheduler[Celery Beat Scheduler] -->|Every 60s| Summarizer[AI Summarizer Task]
    Summarizer -->|Query Last 5m| DB[(PostgreSQL Database)]
    DB -->|Provide Logs| Summarizer
    Summarizer -->|6. Generate Digest Report| Gemini[Google Gemini LLM / Fallback]
    Gemini -->|Save Digest| DB
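Steps 1 and 2 of the diagram (fingerprint the alert, check Redis, register the suppression key) can be sketched as follows. This is a minimal sketch, not the code in redis_client.py. It assumes the fingerprint covers only service, event type, and severity (the fields listed under Reliability Patterns), and it merges the separate check and register steps into one atomic Redis `SET key value NX EX 60`, which prevents two identical alerts that arrive together from both passing the check. `alert_fingerprint`, `register_if_new`, the `dedup:` key prefix, and the `InMemoryTTLStore` stand-in (which mimics only redis-py's `set(..., nx=True, ex=...)` call) are hypothetical names.

```python
import hashlib
import json
import time
from typing import Callable, Dict, Optional

# Hypothetical sketch: function names and the "dedup:" key prefix are assumptions.
DEDUP_TTL_SECONDS = 60  # suppression window stated in the README


def alert_fingerprint(service: str, event_type: str, severity: str) -> str:
    """Normalize the identifying fields and hash them with SHA-256."""
    normalized = {
        "service": service.strip().lower(),
        "event_type": event_type.strip().lower(),
        "severity": severity.strip().lower(),
    }
    # sort_keys and compact separators give a stable byte string to hash
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def register_if_new(client, fingerprint: str, ttl: int = DEDUP_TTL_SECONDS) -> bool:
    """Return True for the first alert with this fingerprint in the TTL window.

    With redis-py, set(..., nx=True, ex=ttl) returns True when the key was
    created and None when it already existed, so check and registration
    happen in a single atomic command.
    """
    return bool(client.set(f"dedup:{fingerprint}", "1", nx=True, ex=ttl))


class InMemoryTTLStore:
    """Hypothetical stand-in for redis.Redis, supporting only set(nx=, ex=)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._expires_at: Dict[str, float] = {}
        self._clock = clock

    def set(self, key: str, value: str, nx: bool = False,
            ex: Optional[int] = None) -> Optional[bool]:
        # The value is ignored; this stand-in only tracks key expiry.
        now = self._clock()
        expires_at = self._expires_at.get(key)
        if nx and expires_at is not None and expires_at > now:
            return None  # key still live: redis-py also returns None here
        self._expires_at[key] = now + ex if ex is not None else float("inf")
        return True
```

When `register_if_new` returns False, the API would increment `dup_count` on the existing event instead of inserting a new row and enqueueing another job.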

Reliability Patterns Demonstrated

Pattern	Technical Implementation
Asynchronous Ingestion	The ingestion API accepts the payload, writes the event to PostgreSQL, and immediately returns HTTP 202 Accepted. Notification dispatch is offloaded to background Celery workers.
Fingerprint Deduplication	Normalizes the identifying fields of each incoming alert (service, event type, severity) and hashes them into a SHA-256 fingerprint. A Redis key with a 60-second time-to-live (TTL) marks the fingerprint as active; identical alerts that arrive while the key is live are suppressed and increment the duplicate count on the original record.
Exponential Backoff	Notifications that fail because of network timeouts or rate limits are retried with increasing delays of 2^n seconds, where n is the retry attempt.
Dead-Letter Queue (DLQ)	Tasks that still fail after the maximum of 3 retries are marked as DLQ and moved to a dedicated dead-letter table, together with the raw error log and payload context.
Sliding-Window Rate Limiting	Enforces a maximum dispatch rate (e.g., 5 notifications per channel per minute) using Redis sorted sets (ZSET), protecting downstream services from being flooded during outages.
AI Storm Summarizer	Celery Beat runs the summarizer task every 60 seconds. If a storm is active (5 or more unique alerts in the last 5 minutes), the Gemini API groups the related logs and compiles a root-cause incident report.
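The retry and dead-letter policy from the table above can be sketched as a plain loop, independent of Celery. In a Celery task the same schedule would typically be expressed with `self.retry(exc=exc, countdown=...)`; this sketch does not claim to reproduce tasks.py. It assumes retries are numbered from 1, so the delays are 2, 4, and 8 seconds, and that a hypothetical `dead_letter` callback persists the payload and error to the DLQ table. `dispatch_with_backoff`, `backoff_delay`, and `DeliveryError` are also hypothetical; `sleep` is injectable so the schedule can be checked without waiting.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical sketch; the project implements this policy inside a Celery task.
MAX_RETRIES = 3  # retry threshold stated in the README


class DeliveryError(Exception):
    """Raised by a sender on a transient failure (timeout, HTTP 429, ...)."""


def backoff_delay(retry_number: int) -> int:
    """Delay before retry n (1-based): 2 ** n seconds, i.e. 2, 4, 8."""
    return 2 ** retry_number


def dispatch_with_backoff(
    send: Callable[[Dict[str, Any]], None],
    payload: Dict[str, Any],
    dead_letter: Callable[[Dict[str, Any], str], None],
    max_retries: int = MAX_RETRIES,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try once, then retry up to max_retries times; dead-letter on exhaustion.

    Returns "SENT" on success or "DLQ" after the final retry fails.
    """
    for attempt in range(max_retries + 1):  # 1 initial try + max_retries retries
        try:
            send(payload)
            return "SENT"
        except DeliveryError as exc:
            if attempt == max_retries:
                # Retries exhausted: keep the raw error and payload for inspection
                dead_letter(payload, repr(exc))
                return "DLQ"
            sleep(backoff_delay(attempt + 1))
    raise AssertionError("unreachable: the loop always returns")
```

Injecting `sleep` and `dead_letter` keeps the policy testable. Inside a real worker, a blocking `sleep` would tie up the worker process, which is why Celery's countdown-based retry re-publishes the task for later instead.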
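The sliding-window rate limiter can be sketched with the usual sorted-set recipe: each dispatch is stored as a member scored by its timestamp, entries older than the window are trimmed with ZREMRANGEBYSCORE, and ZCARD counts what remains. This is a minimal sketch assuming a per-channel key `ratelimit:<channel>` and the README's example limit of 5 per 60 seconds; `allow_dispatch` and the `InMemorySortedSets` stand-in (which mimics only the four redis-py calls used) are hypothetical. The trim, count, and add are separate commands, so concurrent workers can briefly overshoot the limit; a production version would usually move them into a single Lua script to make the check-and-add atomic.

```python
import time
import uuid
from typing import Dict, Optional

# Hypothetical sketch; key naming and function names are assumptions.
RATE_LIMIT = 5        # example limit from the README: 5 notifications ...
WINDOW_SECONDS = 60   # ... per channel per minute


def allow_dispatch(client, channel: str, limit: int = RATE_LIMIT,
                   window_seconds: int = WINDOW_SECONDS,
                   now: Optional[float] = None) -> bool:
    """Return True and record the dispatch if the channel is under its limit."""
    now = time.time() if now is None else now
    key = f"ratelimit:{channel}"
    # Drop dispatches that have slid out of the window (score <= now - window)
    client.zremrangebyscore(key, 0, now - window_seconds)
    if client.zcard(key) >= limit:
        return False  # rejected attempts are not recorded
    # Unique member so two dispatches with the same timestamp both count
    client.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
    client.expire(key, window_seconds)  # idle keys clean themselves up
    return True


class InMemorySortedSets:
    """Hypothetical stand-in for the redis-py ZSET calls used above."""

    def __init__(self) -> None:
        self._sets: Dict[str, Dict[str, float]] = {}

    def zremrangebyscore(self, name: str, min_score: float, max_score: float) -> int:
        members = self._sets.get(name, {})
        doomed = [m for m, s in members.items() if min_score <= s <= max_score]
        for member in doomed:
            del members[member]
        return len(doomed)

    def zcard(self, name: str) -> int:
        return len(self._sets.get(name, {}))

    def zadd(self, name: str, mapping: Dict[str, float]) -> int:
        members = self._sets.setdefault(name, {})
        added = sum(1 for m in mapping if m not in members)
        members.update(mapping)
        return added

    def expire(self, name: str, seconds: int) -> bool:
        return name in self._sets  # TTLs are not modelled in this stand-in
```

With these defaults, the sixth Slack dispatch inside any 60-second span is rejected, while every other channel keeps its own independent budget.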
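The storm check that gates the summarizer can be sketched as a pure function over recently stored alerts. This assumes "unique alerts" means distinct fingerprints created in the last 300 seconds. `AlertRecord`, `in_window`, `is_storm`, and `fallback_digest` are hypothetical names, and `fallback_digest` (a per-service count) is only an illustrative stand-in for the README's "local fallback", not the project's actual fallback or its Gemini prompt.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical sketch; thresholds come from the README, names are assumptions.
STORM_THRESHOLD = 5              # "5 or more unique alerts ..."
STORM_WINDOW_SECONDS = 5 * 60    # "... in 5 minutes"


@dataclass(frozen=True)
class AlertRecord:
    fingerprint: str   # SHA-256 dedup fingerprint
    service: str
    created_at: float  # epoch seconds


def in_window(alerts: Iterable[AlertRecord], now: float,
              window: float = STORM_WINDOW_SECONDS) -> List[AlertRecord]:
    """Alerts created within the last `window` seconds."""
    return [a for a in alerts if now - window <= a.created_at <= now]


def is_storm(alerts: Iterable[AlertRecord], now: float,
             threshold: int = STORM_THRESHOLD) -> bool:
    """A storm is `threshold` or more distinct fingerprints in the window."""
    return len({a.fingerprint for a in in_window(alerts, now)}) >= threshold


def fallback_digest(alerts: Iterable[AlertRecord], now: float) -> str:
    """Plain-text digest for when no LLM is configured: alerts per service."""
    counts = Counter(a.service for a in in_window(alerts, now))
    lines = [f"{service}: {n} alert(s)" for service, n in counts.most_common()]
    return "\n".join(lines)
```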

Getting Started

Prerequisites

Docker Desktop installed and running

1. Clone and Configure

git clone https://github.com/YOUR_USERNAME/AlertForge.git
cd AlertForge
cp .env.example .env

Set the following values in your .env file. Both are optional: if either is left empty, the application falls back to a mock engine for that integration.

SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T.../B.../xxx
GEMINI_API_KEY=your_gemini_api_key
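How config.py handles these optional values is not shown in this README. A minimal sketch, assuming the .env values reach the containers as environment variables and that an unset or blank value simply switches that integration to its mock engine, might look like this. `Settings`, `load_settings`, and the `use_mock_*` properties are hypothetical names, and only the standard library is used (the real loader may use Pydantic settings instead).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical sketch of an optional-integration loader; names are assumptions.


@dataclass(frozen=True)
class Settings:
    slack_webhook_url: Optional[str]
    gemini_api_key: Optional[str]

    @property
    def use_mock_slack(self) -> bool:
        # No webhook configured: deliver through a mock notifier instead
        return self.slack_webhook_url is None

    @property
    def use_mock_summarizer(self) -> bool:
        # No API key configured: use the local fallback summarizer
        return self.gemini_api_key is None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the optional integrations, treating blank values as unset."""

    def optional(name: str) -> Optional[str]:
        value = env.get(name, "").strip()
        return value or None

    return Settings(
        slack_webhook_url=optional("SLACK_WEBHOOK_URL"),
        gemini_api_key=optional("GEMINI_API_KEY"),
    )
```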

2. Launch Services

Build the images and start the application containers:

docker compose up --build

3. Service Access Endpoints

Component	URL
React Dashboard	http://localhost:5173
API Docs (Swagger)	http://localhost:8000/docs
Mailhog Web Console	http://localhost:8025

Verification and Testing

Through the Dashboard

Deduplication Verification: Click the "Ingest Incident Alert" button several times in quick succession. Only one alert is stored in the database; each duplicate received within the 60-second suppression window increments that alert's suppression counter.
Retry and DLQ Verification: Enable the "Fail Slack Webhook" checkbox and send an alert. The worker logs in the terminal show the task retrying with exponentially increasing delays; once the 3 retries are exhausted, the alert status changes to DLQ.
Rate Limiting Verification: Click the "Simulate Outage Storm" button to trigger a simulated storm. The worker logs show the rate limiter blocking notification delivery after 5 dispatches.
AI Storm Summarization: Generate a storm, then click "Run AI Summarization" to trigger the summarizer task on demand. The compiled report appears under the AI Storm Digests tab.
Email Delivery Verification: Open the Mailhog console at http://localhost:8025 to view captured notification emails.

Running Automated Integration Tests

Execute the test suite inside the API container:

docker compose exec api pytest -v

Project Structure

AlertForge/
├── docker-compose.yml          # Multi-container service configuration
├── .env.example                # Configuration template
├── README.md                   # Systems documentation
└── backend/
    ├── Dockerfile
    ├── requirements.txt
    └── app/
        ├── main.py             # FastAPI routing and request controllers
        ├── models.py           # SQLAlchemy database schemas (Events, DLQ, Digests)
        ├── schemas.py          # Pydantic validation rules
        ├── config.py           # Settings and configuration loader
        ├── database.py         # SQLAlchemy engine and connection manager
        ├── redis_client.py     # Deduplication and rate limiter logic
        ├── worker.py           # Celery application configuration and schedule
        ├── tasks.py            # Notification worker and task runners
        ├── summarizer.py       # AI Storm Summarizer service
        └── tests/
            └── test_alerts.py  # Automated integration tests

Technologies Used

Backend API: Python, FastAPI, SQLAlchemy, Pydantic
Background Workers: Celery, Redis (broker and backend)
Database: PostgreSQL
Cache: Redis sorted sets (ZSET, rate limiting) and TTL keys (deduplication)
AI Engine: Google Gemini 1.5 Flash (with local fallback)
Notifications: Slack Webhooks, SMTP (via Mailhog)
Frontend: React, Vite, CSS
Infrastructure: Docker Compose
