[FEATURE]: Centralized Redis configuration #1660

@crivetimihai

Summary

Add Redis connection pool and leader election settings to config.py and .env.example to eliminate hardcoded values scattered across the codebase.

Problem: Hardcoded Values

| Setting | Hardcoded Location | Current Value |
|---|---|---|
| decode_responses | llmchat_router.py:65, oauth_manager.py:66, event_service.py:203 | True |
| socket_connect_timeout | oauth_manager.py:66 | 5 |
| socket_timeout | oauth_manager.py:66 | 5 |
| leader_ttl | gateway_service.py:347 | 40 |
| max_connections | (not configured) | |
| health_check_interval | (not configured) | |

Solution: Add to config.py

Add after existing Redis settings (around line 1098):

# Redis Configuration - Performance Optimized

# Connection Pool Settings
redis_decode_responses: bool = Field(default=True, description="Return strings instead of bytes")
redis_max_connections: int = Field(default=50, description="Connection pool size per worker")
redis_socket_timeout: float = Field(default=2.0, description="Socket read/write timeout in seconds")
redis_socket_connect_timeout: float = Field(default=2.0, description="Connection timeout in seconds")
redis_retry_on_timeout: bool = Field(default=True, description="Retry commands on timeout")
redis_health_check_interval: int = Field(default=30, description="Seconds between connection health checks (0=disabled)")

# Leader Election Settings
redis_leader_ttl: int = Field(default=15, description="Leader election TTL in seconds")
redis_leader_key: str = Field(default="gateway_service_leader", description="Leader key name")
redis_leader_heartbeat_interval: int = Field(default=5, description="Seconds between leader heartbeats")
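
As a rough illustration of how call sites might consume these fields instead of hardcoding values, the sketch below maps the pool settings to keyword arguments. `RedisPoolSettings` and `redis_pool_kwargs` are hypothetical names used only for this example; the real fields would live on the existing settings class in config.py. The keyword names are assumed to match what redis-py clients and connection pools accept.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class RedisPoolSettings:
    """Hypothetical stand-in for the proposed config.py fields."""

    redis_decode_responses: bool = True
    redis_max_connections: int = 50
    redis_socket_timeout: float = 2.0
    redis_socket_connect_timeout: float = 2.0
    redis_retry_on_timeout: bool = True
    redis_health_check_interval: int = 30


def redis_pool_kwargs(settings: RedisPoolSettings) -> Dict[str, Any]:
    """Translate settings into keyword arguments for a Redis client or pool."""
    return {
        "decode_responses": settings.redis_decode_responses,
        "max_connections": settings.redis_max_connections,
        "socket_timeout": settings.redis_socket_timeout,
        "socket_connect_timeout": settings.redis_socket_connect_timeout,
        "retry_on_timeout": settings.redis_retry_on_timeout,
        "health_check_interval": settings.redis_health_check_interval,
    }


POOL_KWARGS = redis_pool_kwargs(RedisPoolSettings())
# Assumed usage at a call site (redis-py must be installed; not imported here):
#   client = redis.asyncio.Redis.from_url(redis_url, **POOL_KWARGS)
```

Building the keyword arguments in one place would let llmchat_router.py, oauth_manager.py, and event_service.py share the same values rather than each passing its own literals.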

Solution: Add to .env.example

# Redis Connection Pool - Performance Tuned
REDIS_MAX_CONNECTIONS=50
REDIS_SOCKET_TIMEOUT=2.0
REDIS_SOCKET_CONNECT_TIMEOUT=2.0
REDIS_RETRY_ON_TIMEOUT=true
REDIS_HEALTH_CHECK_INTERVAL=30
REDIS_DECODE_RESPONSES=true

# Redis Leader Election - Multi-Node Deployments
REDIS_LEADER_TTL=15
REDIS_LEADER_HEARTBEAT_INTERVAL=5
REDIS_LEADER_KEY=gateway_service_leader

Performance Optimization Rationale

| Setting | Default | Rationale |
|---|---|---|
| max_connections | 50 | Async apps need more connections; 50 handles ~500 concurrent requests per worker |
| socket_timeout | 2.0s | Fast failure detection; Redis ops should complete in <100ms |
| socket_connect_timeout | 2.0s | Quick connection failure; don't block event loop waiting |
| health_check_interval | 30s | Periodic pool health check prevents stale connections |
| leader_ttl | 15s | Faster failover (was 40s); 15s balances speed vs false positives |
| leader_heartbeat_interval | 5s | Refresh 3x before TTL expires (15/5=3); ensures leader doesn't lose lock |
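
The TTL and heartbeat arithmetic can be sketched as follows. This assumes the leader resets the key to the full TTL on every heartbeat, which the issue implies but does not spell out; `validate_leader_timing` and `failover_window` are hypothetical helpers, not existing gateway functions.

```python
from typing import Tuple


def validate_leader_timing(ttl: int, heartbeat_interval: int) -> None:
    """Reject timing values under which the leader could not keep its lock."""
    if ttl <= 0 or heartbeat_interval <= 0:
        raise ValueError("ttl and heartbeat_interval must be positive")
    if heartbeat_interval >= ttl:
        # The key would expire before (or just as) the next refresh arrives.
        raise ValueError("heartbeat_interval must be shorter than ttl")


def failover_window(ttl: int, heartbeat_interval: int) -> Tuple[int, int]:
    """Return (shortest, longest) seconds until the key expires after a crash.

    Shortest: the leader dies just before its next heartbeat, so only
    ttl - heartbeat_interval seconds remain on the key.
    Longest: the leader dies right after a refresh, so the full ttl remains.
    """
    validate_leader_timing(ttl, heartbeat_interval)
    return (ttl - heartbeat_interval, ttl)


# Proposed defaults: redis_leader_ttl=15, redis_leader_heartbeat_interval=5
PROPOSED_WINDOW = failover_window(15, 5)
```

Under that assumption, a crashed leader's key would expire between 10 and 15 seconds later with the proposed defaults, and a configuration where the heartbeat interval is not shorter than the TTL is rejected up front.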

Acceptance Criteria

  • All 9 new settings added to config.py with Field definitions
  • All settings documented in .env.example with comments
  • Settings use performance-optimized defaults
  • make check-env passes
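
The issue does not show what `make check-env` actually runs. As a purely illustrative sketch of the kind of consistency check the criteria describe, the snippet below reports settings that have no matching entry in .env.example. `env_keys`, `missing_env_keys`, and the sample text are hypothetical.

```python
from typing import Iterable, List, Set

# The nine field names proposed for config.py in this issue.
REDIS_FIELDS = [
    "redis_decode_responses",
    "redis_max_connections",
    "redis_socket_timeout",
    "redis_socket_connect_timeout",
    "redis_retry_on_timeout",
    "redis_health_check_interval",
    "redis_leader_ttl",
    "redis_leader_key",
    "redis_leader_heartbeat_interval",
]


def env_keys(env_text: str) -> Set[str]:
    """Collect variable names from KEY=VALUE lines, skipping comments and blanks."""
    keys = set()
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, _value = line.partition("=")
        if sep:
            keys.add(name.strip())
    return keys


def missing_env_keys(field_names: Iterable[str], env_text: str) -> List[str]:
    """Return upper-cased setting names that are absent from the env text."""
    expected = {name.upper() for name in field_names}
    return sorted(expected - env_keys(env_text))


# Sample input: a partial .env.example that documents only two settings.
PARTIAL_ENV = "# Redis Leader Election\nREDIS_LEADER_TTL=15\nREDIS_LEADER_KEY=gateway_service_leader\n"
MISSING = missing_env_keys(REDIS_FIELDS, PARTIAL_ENV)
```

For the partial sample, `MISSING` lists the seven undocumented variables; an empty list would mean every proposed setting has an entry.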

Labels

  • SHOULD (P2: Important but not vital; high-value items that are not crucial for the immediate release)
  • enhancement (New feature or request)
  • performance (Performance related items)
