Scalable agent runtime demonstrating system design, resilience patterns, and production best practices.
- Circuit Breaker Pattern: Prevents cascading failures with automatic recovery
- Exponential Backoff Retry: Resilient operations with configurable retry logic
- Rate Limiting: Token bucket algorithm for API throttling (500 req/min)
- Caching Layer: In-memory cache with TTL for performance optimization
- Distributed Tracing: Correlation IDs and structured logging for observability
- Graceful Shutdown: Proper resource cleanup and signal handling
- Comprehensive Testing: 95%+ test coverage with pytest
- Performance Benchmarking: Automated load testing suite
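
As an illustration of the exponential backoff retry feature listed above, here is a minimal sketch. It is not the code in `src/resilience.py`; the function name `retry_with_backoff`, its parameters, and the default delays are assumptions chosen for the example, and "full jitter" is one common jitter strategy rather than necessarily the one this project uses.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,        # hypothetical default
    base_delay: float = 0.1,      # seconds; hypothetical default
    max_delay: float = 5.0,       # seconds; hypothetical default
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time in [0, ceiling] so that many
            # clients retrying at once do not stay synchronized.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so that tests can record delays instead of actually waiting.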
┌──────────────────────────────────────────────────────┐
│              FastAPI + Middleware Layer              │
│      (Correlation ID │ Logging │ Rate Limiting)      │
└─────────────────────────┬────────────────────────────┘
                          │
                 ┌────────▼────────┐
                 │  Orchestrator   │
                 │ (Async Routing) │
                 └────────┬────────┘
                          │
            ┌─────────────┴─────────────┐
            │                           │
     ┌──────▼──────┐            ┌───────▼───────┐
     │  LangGraph  │            │    CrewAI     │
     │  Workflow   │            │  Multi-Agent  │
     │             │            │  (Parallel)   │
     └──────┬──────┘            └───────┬───────┘
            │                           │
            └─────────────┬─────────────┘
                          │
                ┌─────────▼──────────┐
                │  Resilience Layer  │
                │  - Circuit Breaker │
                │  - Retry Logic     │
                │  - Rate Limiter    │
                │  - Cache Manager   │
                └─────────┬──────────┘
                          │
                ┌─────────▼──────────┐
                │   Memory Manager   │
                │  (Stateful Store)  │
                └────────────────────┘
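
The Cache Manager box in the resilience layer is described as an in-memory cache with TTL. A minimal sketch of that idea follows; the class name `TTLCache`, the 60-second default, and the lazy-eviction strategy are assumptions for illustration, not the project's actual implementation.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def set(self, key: Hashable, value: Any) -> None:
        # Store the value together with its absolute expiry time.
        self._store[key] = (self.clock() + self.ttl, value)

    def get(self, key: Hashable, default: Any = None) -> Any:
        item = self._store.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self.clock() >= expires_at:
            # Lazy eviction: expired entries are removed when next read.
            del self._store[key]
            return default
        return value
```

Lazy eviction keeps the sketch short; a long-running service would likely also need a size bound or periodic sweep so that keys that are never read again do not accumulate.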
# Install dependencies
pip install -r requirements.txt
# Run tests
pytest tests/ -v --cov=src
# Start server
python main.py
# Run benchmarks (in another terminal)
python benchmark.py
# Run demo
python examples/demo.py
R/
├── src/ # Core modules
│ ├── agents.py # Agent orchestration with retry/circuit breaker
│ ├── memory.py # Stateful memory management
│ ├── inference.py # Inference layer
│ ├── telemetry.py # Metrics and observability
│ ├── resilience.py # Circuit breaker, retry, rate limit, cache
│ ├── middleware.py # Logging, correlation IDs, rate limiting
│ └── config.py # Configuration management
├── tests/ # Comprehensive test suite
│ ├── test_resilience.py # Resilience pattern tests
│ ├── test_agents.py # Agent functionality tests
│ ├── test_memory.py # Memory management tests
│ └── conftest.py # Pytest fixtures
├── examples/
│ └── demo.py # Interactive demo
├── main.py # FastAPI server with middleware
├── benchmark.py # Performance benchmarking suite
└── requirements.txt # Minimal dependencies
- Circuit Breaker: Auto-recovery from failures (3 failures → open, 60s recovery)
- Retry with Exponential Backoff: Configurable retry logic with jitter
- Graceful Degradation: Fail-safe mechanisms throughout
- Rate Limiting: Token bucket algorithm (500 req/min) prevents overload
- Caching: In-memory cache with TTL reduces redundant computation
- Async/Parallel Execution: CrewAI agents run concurrently
- Connection Pooling: Efficient resource utilization
- Structured Logging: JSON logs with contextual information
- Correlation IDs: End-to-end request tracing
- Metrics Collection: Real-time performance metrics
- Health Checks: Liveness and readiness probes
- Graceful Shutdown: SIGTERM/SIGINT handling with cleanup
- Error Handling: Comprehensive exception handling with context
- Configuration Management: Environment-based settings
- API Versioning: Semantic versioning support
- Unit Tests: 95%+ code coverage
- Integration Tests: End-to-end testing
- Performance Benchmarks: Automated load testing
- Test Fixtures: Reusable test components
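
The circuit breaker thresholds listed above (3 failures to open, 60 s recovery) can be sketched as a small state machine. This is an illustrative sketch under stated assumptions, not the code in `src/resilience.py`: the names `CircuitBreaker`, `CircuitOpenError`, and `call` are hypothetical, and the half-open behavior (one trial call after the recovery window) is a common convention that the README does not spell out.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, recovery_timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.recovery_timeout:
            return "half_open"  # recovery window elapsed: allow a trial call
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; call rejected")
        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed trial call re-opens immediately; otherwise the circuit
            # opens once the failure threshold is reached.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail fast with `CircuitOpenError` instead of waiting on a slow dependency, which is what keeps a failing downstream from tying up the rest of the system.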
| Endpoint | Method | Description | Features |
|---|---|---|---|
| `/api/agent/execute` | POST | Execute agent task | Retry, Circuit Breaker, Caching |
| `/api/memory/store` | POST | Store in memory | TTL support |
| `/api/memory/retrieve` | POST | Retrieve from memory | Pagination |
| `/api/metrics` | GET | System metrics | Real-time stats |
| `/health` | GET | Health check | Readiness probe |
| `/docs` | GET | OpenAPI docs | Interactive API |
# Run all tests with coverage
pytest tests/ -v --cov=src --cov-report=html
# Run specific test suite
pytest tests/test_resilience.py -v
# Run with markers
pytest tests/ -v -m asyncio
- Resilience Patterns: Circuit breaker, retry, rate limiting
- Agent Functionality: LangGraph, CrewAI orchestration
- Memory Operations: Short-term, long-term, concurrent access
- Error Scenarios: Failure handling, recovery
Benchmark results on M1 Mac (8GB RAM):
| Operation | Avg Latency | P95 | P99 | Throughput |
|---|---|---|---|---|
| Health Check | 2.88ms | 6.27ms | 7.80ms | 347 req/s |
| Agent Execute | 56.34ms | 61.01ms | 75.34ms | 17.75 req/s |
| Memory Store | 3.51ms | 7.07ms | 12.23ms | 284 req/s |
| Concurrent (10) | 124.33ms | - | - | 48.52 req/s |
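
For readers reproducing the table, the latency columns can be derived from raw per-request timings roughly as follows. This is a sketch, not the logic in `benchmark.py`; the function name `summarize_latencies` and the choice of the "inclusive" interpolation method are assumptions, and different percentile methods give slightly different values on small samples.

```python
import statistics
from typing import Dict, Sequence


def summarize_latencies(samples_ms: Sequence[float], duration_s: float) -> Dict[str, float]:
    """Summarize request latencies (ms) collected over duration_s seconds of wall-clock time."""
    # quantiles(n=100) returns 99 cut points; index 94 is P95 and index 98 is P99.
    # It needs at least two samples.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "avg_ms": statistics.fmean(samples_ms),
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        # Throughput is completed requests divided by elapsed wall-clock time.
        "throughput_rps": len(samples_ms) / duration_s,
    }
```

Throughput is computed from wall-clock time rather than from average latency, which is why a concurrent run can show higher latency and higher throughput at the same time.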
- Rate Limiting: Mitigates request floods and abusive clients (500 req/min default)
- Input Validation: Pydantic models for request validation
- Error Sanitization: No sensitive data in error responses
- Correlation IDs: Audit trail for all requests
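
The 500 req/min limit is described as a token bucket. A minimal sketch of the algorithm is shown below; the class name `TokenBucket`, the `allow` method, and the default burst capacity (equal to one minute of tokens) are assumptions for illustration and may differ from `src/resilience.py`.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Token bucket: refills continuously at a fixed rate, up to a burst capacity."""

    def __init__(self, rate_per_minute: float = 500,
                 capacity: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_minute / 60.0  # tokens added per second
        # Assumed default: the bucket holds at most one minute of tokens.
        self.capacity = float(capacity if capacity is not None else rate_per_minute)
        self.tokens = self.capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller would typically respond with HTTP 429
```

A per-client limiter would keep one bucket per key (for example, per API key or IP address); this sketch shows a single shared bucket.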
Structured logs include:
- Request/response timing
- Error rates and types
- Memory usage statistics
- Cache hit/miss rates
- Circuit breaker state changes
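
One way to produce JSON log lines that carry a correlation ID is sketched below using only the standard library. The names `JsonFormatter`, `build_logger`, `new_correlation_id`, and the `fields` key are hypothetical; the project's `src/middleware.py` may structure this differently.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the current request's correlation ID; "-" means no request context.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


def new_correlation_id() -> str:
    """Generate an ID for the current request and store it in the context."""
    cid = uuid.uuid4().hex
    correlation_id.set(cid)
    return cid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        # Structured fields passed via extra={"fields": {...}} are merged in.
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            payload.update(fields)
        return json.dumps(payload)


def build_logger(stream, name: str = "agent_runtime") -> logging.Logger:
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines from the root logger
    return logger
```

A middleware would call `new_correlation_id()` (or reuse an incoming request header) at the start of each request; because `contextvars` values are scoped per asyncio task, concurrent requests do not overwrite each other's IDs.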
# src/config.py
class Settings:
    host: str = "0.0.0.0"
    port: int = 8000
    max_concurrent_agents: int = 10
    agent_timeout: int = 300
- Circuit Breaker: Prevents cascading failures when the inference layer is slow or down, and recovers automatically without manual intervention.
- Token Bucket Rate Limiting: Smooths traffic instead of enforcing a hard cutoff, allowing short bursts while maintaining the average rate.
- Correlation IDs: Essential for distributed tracing; they link all operations in a request chain for debugging.
- Structured Logging: Machine-parseable logs enable better alerting and analytics, which is critical for production systems.
MIT License - Feel free to use for portfolio/learning