AlertForge is a production-grade, event-driven alerting system demonstrating distributed systems reliability patterns. It features asynchronous ingestion, Redis-based sliding-window rate limiting, SHA-256 fingerprint deduplication, exponential backoff retries with dead-letter queue (DLQ) support, and scheduled AI-driven alert storm summarization.
Built with FastAPI, Celery, Redis, PostgreSQL, React, and Docker.
graph TD
Client[React Simulator Dashboard] -->|POST /api/v1/alerts| API[FastAPI Ingestion Server]
API -->|1. Generate Hash & Check| Dedup{Redis Deduplication}
Dedup -->|Active Hash Exists| Suppress[Increment dup_count in PostgreSQL]
Dedup -->|New Unique Hash| Store[Save Event to PostgreSQL]
Store -->|2. Register Suppression Key| RedisCache[Cache Hash in Redis for 60s]
Store -->|3. Enqueue Background Job| Queue[Redis Task Broker]
Queue -->|4. Pull Job| Worker[Celery Worker]
Worker -->|5. Verify Dispatch Rate| RateLimiter{Redis sliding-window rate limiter}
RateLimiter -->|Rate Exceeded| Retry[Trigger Exponential Backoff]
RateLimiter -->|Under Limit| Dispatch[Send Notifications]
Dispatch -->|Webhook| Slack[Slack Channel]
Dispatch -->|SMTP| Mailhog[Mailhog Email Catcher]
Retry -->|Max 3 Retries Exceeded| DLQ[Save to Dead-Letter Queue Table]
Scheduler[Celery Beat Scheduler] -->|Every 60s| Summarizer[AI Summarizer Task]
Summarizer -->|Query Last 5m| DB[(PostgreSQL Database)]
DB -->|Provide Logs| Summarizer
Summarizer -->|6. Generate Digest Report| Gemini[Google Gemini LLM / Fallback]
Gemini -->|Save Digest| DB
| Pattern | Technical Implementation |
|---|---|
| Asynchronous Ingestion | The ingestion API accepts payloads, writes them to PostgreSQL, and returns HTTP 202 Accepted immediately. Dispatch is offloaded to background Celery workers. |
| Sliding-Window Deduplication | Normalizes incoming alerts (service, event type, severity) and generates a SHA-256 fingerprint. A Redis key with a 60-second time-to-live (TTL) suppresses identical alerts and increments the duplicate count on the parent record. |
| Exponential Backoff | Notifications that fail due to network timeouts or rate limits are retried at increasing delay intervals (2^n seconds, where n is the retry attempt). |
| Dead Letter Queue (DLQ) | Failed tasks that exceed the max retry threshold (3 retries) are marked as DLQ and routed to a dedicated isolation database table with raw error logs and payload context. |
| Sliding-Window Rate Limiting | Enforces a maximum dispatch rate (e.g., 5 notifications per channel per minute) using Redis Sorted Sets (ZSET) to protect downstream services from spam during outages. |
| AI Storm Summarizer | Celery Beat runs the summarizer task every 60 seconds. If a storm is active (5 or more unique alerts in the last 5 minutes), the Gemini API groups the logs and compiles a root-cause incident report. |
- Docker Desktop installed and running
git clone https://github.com/YOUR_USERNAME/AlertForge.git
cd AlertForge
cp .env.example .env

Define configuration settings in your .env file (both settings are optional; the application runs with mock engines if empty):
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T.../B.../xxx
GEMINI_API_KEY=your_gemini_api_key

Run the application containers:

docker compose up --build

| Component | URL |
|---|---|
| React Dashboard | http://localhost:5173 |
| API Docs (Swagger) | http://localhost:8000/docs |
| Mailhog Web Console | http://localhost:8025 |
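Alerts can also be sent without the dashboard. The sketch below builds a request for the `POST /api/v1/alerts` endpoint shown in the architecture diagram. The host and port are inferred from the API docs URL, and the payload field names (`service`, `event_type`, `severity`) are assumptions; check the Swagger page for the actual schema.

```python
import json
import urllib.request

# Assumed base URL, inferred from the API docs address in the table above.
API_URL = "http://localhost:8000/api/v1/alerts"


def build_alert_request(service, event_type, severity, url=API_URL):
    """Build (but do not send) a JSON POST request for one alert.

    The field names in the payload are hypothetical.
    """
    payload = {"service": service, "event_type": event_type, "severity": severity}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(service, event_type, severity):
    """Send the alert and return the HTTP status code."""
    request = build_alert_request(service, event_type, severity)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

With the containers running, calling `send_alert("payments", "timeout", "critical")` should return 202 if the payload matches the real schema, since the ingestion API is described as responding with HTTP 202 Accepted.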
- Deduplication Verification: Click the "Ingest Incident Alert" button multiple times rapidly. A single alert is created in the database, and subsequent duplicate alerts increment the suppression counter.
- Retry and DLQ Verification: Toggle the "Fail Slack Webhook" checkbox and send an alert. Observe the worker retry with exponential delays in the terminal. After exceeding max retries, the status shifts to DLQ.
- Rate Limiting Verification: Trigger a simulated storm using the "Simulate Outage Storm" button. Check the worker terminal logs to see rate limiters blocking notification delivery after 5 dispatches.
- AI Storm Summarization: Generate a storm. Click "Run AI Summarization" to trigger the summarizer task. The compiled report will display under the AI Storm Digests tab.
- Email Delivery Verification: Open the Mailhog console at http://localhost:8025 to view captured notification emails.
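When reading the worker logs during the retry and DLQ check, it helps to know which delays to expect. The sketch below works through the 2^n schedule with a limit of 3 retries. It assumes n starts at 1 for the first retry, which this README does not specify, and the function names are hypothetical rather than taken from `tasks.py`.

```python
MAX_RETRIES = 3  # retry threshold stated in this README


def backoff_delay(attempt: int) -> int:
    """Delay in seconds before retry number `attempt` (assumed to start at 1)."""
    return 2 ** attempt


def next_action(retries_so_far: int):
    """Decide what happens after a failed dispatch.

    Returns ("retry", delay_seconds) while retries remain, else ("dlq", None).
    """
    if retries_so_far >= MAX_RETRIES:
        return ("dlq", None)  # retries exhausted: route to the DLQ table
    return ("retry", backoff_delay(retries_so_far + 1))
```

Under these assumptions a failing notification is retried after 2, 4, and 8 seconds, then moved to the DLQ on the fourth failure. If the real implementation counts from n = 0, the delays would be 1, 2, and 4 seconds instead.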
Execute the test suite inside the API container:
docker compose exec api pytest -v

AlertForge/
├── docker-compose.yml          # Multi-container service configuration
├── .env.example                # Configuration template
├── README.md                   # Systems documentation
└── backend/
    ├── Dockerfile
    ├── requirements.txt
    └── app/
        ├── main.py             # FastAPI routing and request controllers
        ├── models.py           # SQLAlchemy database schemas (Events, DLQ, Digests)
        ├── schemas.py          # Pydantic validation rules
        ├── config.py           # Settings and configuration loader
        ├── database.py         # SQLAlchemy engine and connection manager
        ├── redis_client.py     # Deduplication and rate limiter logic
        ├── worker.py           # Celery application configuration and schedule
        ├── tasks.py            # Notification worker and task runners
        ├── summarizer.py       # AI Storm Summarizer service
        └── tests/
            └── test_alerts.py  # Automated integration tests

- Backend API: Python, FastAPI, SQLAlchemy, Pydantic
- Background Workers: Celery, Redis (broker and backend)
- Database: PostgreSQL
- Cache: Redis ZSET (rate limiting) and TTL (deduplication)
- AI Engine: Google Gemini 1.5 Flash (with local fallback)
- Notifications: Slack Webhooks, SMTP (via Mailhog)
- Frontend: React, Vite, CSS
- Infrastructure: Docker Compose