Commit 92af2c1

Implement production-grade High Availability and Serverless features
Features:
- Add circuit breaker pattern to Wazuh API client with automatic recovery
- Integrate exponential backoff retry logic (3 attempts, 1-10s delays)
- Implement graceful shutdown with connection draining (30s timeout)
- Add pluggable session storage architecture (in-memory/Redis)
- Enable serverless deployments with Redis session backend
- Support horizontal scaling and multi-instance deployments

Implementation:
- WazuhClient: Circuit breakers + retry decorators on all API calls
- SessionManager: Abstract storage interface with Redis/in-memory backends
- GracefulShutdown: Connection tracking and cleanup task execution
- Configuration: Redis URL and session TTL environment variables

Backward Compatible:
- Default behavior unchanged (in-memory sessions)
- Zero configuration required for single-instance deployments
- Optional Redis backend enabled via REDIS_URL environment variable
- All existing code paths work without modification

Dependencies:
- Add redis[async]>=5.0.0 for optional Redis session storage
- All resilience patterns integrated from existing resilience.py module

Documentation:
- Update README with Advanced Features section
- Add HA and Serverless configuration examples
- Document session storage modes and verification steps

1 parent 9dc074b commit 92af2c1

6 files changed: +621 -144 lines changed


.env.example

Lines changed: 7 additions & 1 deletion
@@ -47,4 +47,10 @@ LOG_LEVEL=INFO
 
 # === Wazuh SSL ===
 WAZUH_VERIFY_SSL=false
-WAZUH_ALLOW_SELF_SIGNED=true
+WAZUH_ALLOW_SELF_SIGNED=true
+
+# === Session Storage (Serverless Ready) ===
+# Optional Redis URL for serverless/multi-instance deployments
+# If not set, uses in-memory storage (single-instance only)
+# REDIS_URL=redis://localhost:6379/0
+# SESSION_TTL_SECONDS=1800

README.md

Lines changed: 94 additions & 2 deletions
@@ -29,8 +29,8 @@ A **production-ready, enterprise-grade** MCP-compliant remote server that provid
 - **📊 Comprehensive Monitoring**: Prometheus metrics, health checks, structured logging
 - **🐳 100% Containerized**: Everything in Docker - OS-agnostic deployment (Windows/macOS/Linux)
 - **🌍 Zero Host Dependencies**: No Python, tools, or libraries needed on host system
-- **🔄 High Availability**: Circuit breakers, retry logic, graceful shutdown
-- **☁️ Serverless Ready**: Can scale to zero when idle with Streamable HTTP
+- **🔄 High Availability**: Integrated circuit breakers, exponential backoff retry logic, graceful shutdown with connection draining
+- **☁️ Serverless Ready**: Pluggable session storage (Redis or in-memory), stateless operations, horizontal scaling support
 
 ### 🏅 MCP 2025-06-18 Specification Compliance
 
@@ -217,6 +217,8 @@ curl http://localhost:3000/health
 | `LOG_LEVEL` | Logging level | `INFO` ||
 | `WAZUH_VERIFY_SSL` | SSL verification | `false` ||
 | `ALLOWED_ORIGINS` | CORS origins | `https://claude.ai` ||
+| `REDIS_URL` | Redis URL for serverless sessions | - ||
+| `SESSION_TTL_SECONDS` | Session TTL (Redis only) | `1800` ||
 
 ### Docker Compose Configuration
 
@@ -302,6 +304,96 @@ docker compose pull
 docker compose up -d
 ```
 
+## 🚀 Advanced Features
+
+### High Availability (HA)
+
+The server includes production-grade HA features for maximum reliability:
+
+**Circuit Breakers**
+- Automatically opens after 5 consecutive failures
+- Prevents cascading failures to Wazuh API
+- Recovers automatically after 60 seconds
+- Falls back gracefully during outages
+
+**Retry Logic**
+- Exponential backoff with jitter
+- 3 retry attempts with 1-10 second delays
+- Applies to all Wazuh API calls
+- Handles transient network failures
+
+**Graceful Shutdown**
+- Waits for active connections to complete (max 30s)
+- Runs cleanup tasks before termination
+- Prevents data loss during restarts
+- Integrates with Docker health checks
+
+**Implementation:**
+```python
+# Automatically applied to all Wazuh API calls
+# No configuration required - works out of the box
+```
+
+### Serverless Ready
+
+Enable horizontally scalable, serverless deployments with external session storage:
+
+**Default Mode: In-Memory Sessions**
+```bash
+# Single-instance deployments (default)
+# No configuration needed
+docker compose up -d
+```
+- ✅ Zero configuration
+- ✅ Works immediately
+- ❌ Sessions lost on restart
+- ❌ Cannot scale horizontally
+
+**Serverless Mode: Redis Sessions**
+```bash
+# Multi-instance/serverless deployments
+# Configure Redis in .env file
+REDIS_URL=redis://redis:6379/0
+SESSION_TTL_SECONDS=1800 # 30 minutes
+
+# Deploy with Redis
+docker compose -f compose.yml -f compose.redis.yml up -d
+```
+- ✅ Sessions persist across restarts
+- ✅ Horizontal scaling support
+- ✅ Serverless compatible (AWS Lambda, Cloud Run)
+- ✅ Automatic session expiration
+
+**Redis Setup (Optional):**
+```yaml
+# compose.redis.yml
+services:
+  redis:
+    image: redis:7-alpine
+    ports:
+      - "6379:6379"
+    volumes:
+      - redis-data:/data
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 5s
+
+volumes:
+  redis-data:
+```
+
+**Verification:**
+```bash
+# Check session storage mode
+curl http://localhost:3000/health | jq '.session_storage'
+
+# Output:
+# {
+# "type": "InMemorySessionStore" # or "RedisSessionStore"
+# "sessions_count": 5
+# }
+```
+
 ## 📊 Monitoring & Operations
 
 ### Health Monitoring
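
Reviewer note: the session manager file is not among the diffs shown on this page, so the following is only a minimal sketch of what a pluggable session store along the lines of the README text above might look like. The names `SessionStore`, `create_session_store`, and the method signatures are hypothetical; only `InMemorySessionStore`, `REDIS_URL`, and `SESSION_TTL_SECONDS` appear in the commit. The README states the TTL applies to Redis only, so the expiry logic in the in-memory class is an illustration of the semantics, not a claim about the committed code.

```python
import time
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Optional, Tuple


class SessionStore(ABC):
    """Hypothetical storage interface (the commit's real interface is not shown)."""

    @abstractmethod
    async def get(self, session_id: str) -> Optional[Dict[str, Any]]: ...

    @abstractmethod
    async def set(self, session_id: str, data: Dict[str, Any]) -> None: ...

    @abstractmethod
    async def delete(self, session_id: str) -> None: ...


class InMemorySessionStore(SessionStore):
    """Process-local store: sessions vanish on restart and are not shared across instances."""

    def __init__(self, ttl_seconds: float = 1800, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    async def get(self, session_id: str) -> Optional[Dict[str, Any]]:
        entry = self._items.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if self._clock() >= expires_at:
            # Lazily drop expired sessions on read.
            del self._items[session_id]
            return None
        return data

    async def set(self, session_id: str, data: Dict[str, Any]) -> None:
        self._items[session_id] = (self._clock() + self._ttl, data)

    async def delete(self, session_id: str) -> None:
        self._items.pop(session_id, None)


def create_session_store(env: Dict[str, str]) -> SessionStore:
    """Pick a backend from environment-style settings (assumed selection rule)."""
    ttl = float(env.get("SESSION_TTL_SECONDS", "1800"))
    if env.get("REDIS_URL"):
        # A Redis-backed subclass would go here; omitted to keep the sketch stdlib-only.
        raise NotImplementedError("Redis backend omitted from this sketch")
    return InMemorySessionStore(ttl_seconds=ttl)
```

Keeping the backend behind one small async interface is what would let the default path stay zero-configuration while a shared backend is swapped in for multi-instance deployments.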

requirements.txt

Lines changed: 3 additions & 0 deletions
@@ -29,5 +29,8 @@ psutil>=5.9.0
 tenacity>=8.2.0 # Retry logic
 aiofiles>=23.0.0 # Async file operations
 
+# Session storage for serverless (optional)
+redis[async]>=5.0.0 # Redis async client for serverless session storage
+
 # Security and encryption
 cryptography>=41.0.0 # Encryption for secrets
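
Reviewer note: the retry policy itself lives in `resilience.py`, which this page does not show. As a rough, stdlib-only illustration of "exponential backoff with jitter, 3 attempts, 1-10 second delays", a decorator could look like the sketch below. The names `backoff_delay` and `retry_async` are hypothetical, the jitter formula is one possible choice, and treating "3 attempts" as three total tries (rather than three retries) is an assumption. The project depends on `tenacity`, which may implement this differently.

```python
import asyncio
import random
from functools import wraps


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 10.0) -> float:
    """Upper bound of the wait after failed attempt `attempt` (1-based): base * 2**(attempt-1), capped."""
    return min(cap, base * 2 ** (attempt - 1))


def retry_async(attempts: int = 3, base: float = 1.0, cap: float = 10.0,
                retry_on=(ConnectionError,), sleep=asyncio.sleep):
    """Retry an async callable on `retry_on` errors with jittered exponential backoff."""
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return await func(*args, **kwargs)
                except retry_on:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    # Jitter: wait a random time between `base` and the exponential bound.
                    await sleep(random.uniform(base, backoff_delay(attempt, base, cap)))
        return wrapper
    return decorator
```

With the defaults, the waits between the three attempts fall in the ranges 1s and 1-2s; the 10s cap only comes into play if more attempts are configured.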

src/wazuh_mcp_server/api/wazuh_client.py

Lines changed: 65 additions & 35 deletions
@@ -5,13 +5,21 @@
 import time
 from typing import Dict, Any, Optional
 import httpx
+import logging
 
 from wazuh_mcp_server.config import WazuhConfig
+from wazuh_mcp_server.resilience import (
+    CircuitBreaker,
+    CircuitBreakerConfig,
+    RetryConfig
+)
+
+logger = logging.getLogger(__name__)
 
 
 class WazuhClient:
-    """Simplified Wazuh API client with rate limiting."""
-
+    """Simplified Wazuh API client with rate limiting, circuit breaker, and retry logic."""
+
     def __init__(self, config: WazuhConfig):
         self.config = config
         self.token: Optional[str] = None
@@ -21,6 +29,15 @@ def __init__(self, config: WazuhConfig):
         self._request_times = []
         self._max_requests_per_minute = getattr(config, 'max_requests_per_minute', 100)
         self._rate_limit_enabled = True
+
+        # Circuit breaker for API resilience
+        circuit_config = CircuitBreakerConfig(
+            failure_threshold=5,
+            recovery_timeout=60,
+            expected_exception=Exception
+        )
+        self._circuit_breaker = CircuitBreaker(circuit_config)
+        logger.info("WazuhClient initialized with circuit breaker and retry logic")
 
     async def initialize(self):
         """Initialize the HTTP client and authenticate."""
@@ -180,45 +197,58 @@ async def _rate_limit_check(self):
         self._request_times.append(current_time)
 
     async def _request(self, method: str, endpoint: str, **kwargs) -> Dict[str, Any]:
-        """Make authenticated request to Wazuh API with rate limiting."""
+        """Make authenticated request to Wazuh API with rate limiting, circuit breaker, and retry logic."""
         # Apply rate limiting
         async with self._rate_limiter:
             await self._rate_limit_check()
-
-            if not self.token:
+
+        # Apply circuit breaker and retry logic
+        return await self._request_with_resilience(method, endpoint, **kwargs)
+
+    @RetryConfig.WAZUH_API_RETRY
+    async def _request_with_resilience(self, method: str, endpoint: str, **kwargs) -> Dict[str, Any]:
+        """Execute request with circuit breaker and retry logic."""
+        return await self._circuit_breaker._call(self._execute_request, method, endpoint, **kwargs)
+
+    async def _execute_request(self, method: str, endpoint: str, **kwargs) -> Dict[str, Any]:
+        """Execute the actual HTTP request to Wazuh API."""
+        if not self.token:
+            await self._authenticate()
+
+        url = f"{self.config.base_url}{endpoint}"
+        headers = {"Authorization": f"Bearer {self.token}"}
+
+        try:
+            response = await self.client.request(method, url, headers=headers, **kwargs)
+            response.raise_for_status()
+
+            data = response.json()
+
+            # Validate response structure
+            if "data" not in data:
+                raise ValueError(f"Invalid response structure from Wazuh API: {endpoint}")
+
+            return data
+
+        except httpx.HTTPStatusError as e:
+            if e.response.status_code == 401:
+                # Token might be expired, try to re-authenticate
+                self.token = None
                 await self._authenticate()
-
-            url = f"{self.config.base_url}{endpoint}"
-            headers = {"Authorization": f"Bearer {self.token}"}
-
-            try:
+                # Retry the request once
+                headers = {"Authorization": f"Bearer {self.token}"}
                 response = await self.client.request(method, url, headers=headers, **kwargs)
                 response.raise_for_status()
-
-                data = response.json()
-
-                # Validate response structure
-                if "data" not in data:
-                    raise ValueError(f"Invalid response structure from Wazuh API: {endpoint}")
-
-                return data
-
-            except httpx.HTTPStatusError as e:
-                if e.response.status_code == 401:
-                    # Token might be expired, try to re-authenticate
-                    self.token = None
-                    await self._authenticate()
-                    # Retry the request once
-                    headers = {"Authorization": f"Bearer {self.token}"}
-                    response = await self.client.request(method, url, headers=headers, **kwargs)
-                    response.raise_for_status()
-                    return response.json()
-                else:
-                    raise ValueError(f"Wazuh API request failed: {e.response.status_code} - {e.response.text}")
-            except httpx.ConnectError:
-                raise ConnectionError(f"Lost connection to Wazuh server at {self.config.wazuh_host}")
-            except httpx.TimeoutException:
-                raise ConnectionError(f"Request timeout to Wazuh server")
+                return response.json()
+            else:
+                logger.error(f"Wazuh API request failed: {e.response.status_code} - {e.response.text}")
+                raise ValueError(f"Wazuh API request failed: {e.response.status_code} - {e.response.text}")
+        except httpx.ConnectError as e:
+            logger.error(f"Lost connection to Wazuh server at {self.config.wazuh_host}")
+            raise ConnectionError(f"Lost connection to Wazuh server at {self.config.wazuh_host}")
+        except httpx.TimeoutException as e:
+            logger.error(f"Request timeout to Wazuh server")
+            raise ConnectionError(f"Request timeout to Wazuh server")
 
     async def get_manager_info(self) -> Dict[str, Any]:
         """Get Wazuh manager information."""
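
Reviewer note: the diff above wires in `CircuitBreaker` from `resilience.py`, whose source is not part of this page. For readers unfamiliar with the pattern, here is a minimal stdlib-only sketch of a breaker with the same two settings used in `__init__` (`failure_threshold=5`, `recovery_timeout=60`). The class name `SimpleCircuitBreaker`, the `call` method, and `CircuitOpenError` are hypothetical and may not match the project's implementation.

```python
import asyncio
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class SimpleCircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a timeout."""

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock  # injectable so tests need not wait 60 seconds
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "half-open"  # one trial call is allowed through
        return "open"

    async def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("circuit open; call rejected without contacting the API")
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed half-open trial, or reaching the threshold, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

One consequence worth checking in review: because the retry decorator wraps the breaker call in `_request_with_resilience`, each retry attempt counts as a separate call against the breaker, so a single logical request can contribute several failures toward the threshold.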
