# Multimodal RAG Agent

A production-grade Retrieval-Augmented Generation (RAG) system with advanced agentic capabilities, built with LangGraph, Anthropic Claude, and a modern Python stack.
## Features

- Multi-Modal Document Processing: PDF, images (OCR), tables, markdown, HTML, and code
- Hybrid Search: Dense (semantic) + sparse (BM25) retrieval with re-ranking
- Agentic Workflows: LangGraph-powered state machine with planning, tool use, and self-reflection
- Production-Ready API: FastAPI with async endpoints, rate limiting, and CORS
- Advanced Retrieval: Query transformation, HyDE, multi-query generation
- Full Observability: LangSmith tracing, Prometheus metrics, structured logging
- Evaluation Framework: RAGAS metrics and quality benchmarking
- Scalable Architecture: Redis caching, vector DB (Qdrant/ChromaDB), Docker deployment
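Hybrid search needs a way to merge the dense and sparse rankings before re-ranking. One common choice is Reciprocal Rank Fusion (RRF); the sketch below is illustrative of the general technique, not this repository's actual implementation.

```python
# Illustrative sketch of Reciprocal Rank Fusion (RRF), a common way to
# combine dense (semantic) and sparse (BM25) rankings into one list.
# This is not this project's code -- just the general technique.

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears
    in; k=60 is the constant suggested in the original RRF paper.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # ranked by embedding similarity
sparse = ["doc_b", "doc_d", "doc_a"]  # ranked by BM25
fused = rrf_fuse([dense, sparse])
# doc_b wins: it ranks high in both lists.
```

Documents that appear near the top of both lists beat documents that dominate only one, which is why RRF is a robust default when the two scoring scales are not comparable.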
## Table of Contents

- Architecture
- Quick Start
- Installation
- Configuration
- Usage
- API Documentation
- Development
- Testing
- Deployment
- Troubleshooting
## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                        FastAPI REST API                         │
│  ┌──────────┬──────────┬──────────┬──────────┬─────────────┐    │
│  │  Query   │  Ingest  │  Health  │ Metrics  │  Streaming  │    │
│  └──────────┴──────────┴──────────┴──────────┴─────────────┘    │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                      RAG Agent (LangGraph)                      │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Planner │ Retriever │ Re-Ranker │ Generator │ Reflector   │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
        │                       │                       │
        ▼                       ▼                       ▼
┌────────────────┐   ┌──────────────────┐   ┌────────────────┐
│ Tool Registry  │   │  Hybrid Search   │   │   LLM Client   │
│ - Retriever    │   │ - Dense (Vec)    │   │  (Claude 3.5)  │
│ - Calculator   │   │ - Sparse (BM25)  │   └────────────────┘
│ - Web Search   │   │ - Re-ranking     │
│ - Code Exec    │   └──────────────────┘
└────────────────┘             │
                               ▼
                   ┌───────────────────────┐
                   │    Vector Database    │
                   │  - Qdrant / ChromaDB  │
                   └───────────────────────┘
```
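The agent nodes in the diagram form a simple loop: plan, retrieve, re-rank, generate, then reflect, retrying when the reflection score is too low. The sketch below mirrors that control flow in plain Python with stand-in node bodies; the real implementation lives in `src/agents/` and uses LangGraph, so all names and scores here are illustrative.

```python
# Hedged sketch of the agent control flow shown above, in plain Python
# rather than LangGraph. Node names mirror the diagram; the bodies are
# stand-ins, not the project's real implementations.

from dataclasses import dataclass, field

@dataclass
class AgentState:
    query: str
    plan: str = ""
    documents: list = field(default_factory=list)
    answer: str = ""
    reflection_score: int = 0

def planner(s): s.plan = f"retrieve then answer: {s.query}"; return s
def retriever(s): s.documents = [f"doc about {s.query}"]; return s
def reranker(s): s.documents = s.documents[:5]; return s
def generator(s): s.answer = f"Answer based on {len(s.documents)} docs"; return s
def reflector(s): s.reflection_score = 8; return s  # stand-in quality score

def run_agent(query: str, min_score: int = 7, max_rounds: int = 2) -> AgentState:
    state = AgentState(query=query)
    for _ in range(max_rounds):
        for node in (planner, retriever, reranker, generator, reflector):
            state = node(state)
        if state.reflection_score >= min_score:  # self-reflection gate
            break
    return state

result = run_agent("What is machine learning?")
```

In the real system each node is a LangGraph node over a shared state, and the reflection gate is a conditional edge back into retrieval rather than a Python loop.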
## Quick Start

- Clone and navigate:

```bash
git clone <repository-url>
cd multimodal-rag-agent
```

- Create environment file:

```bash
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY
```

- Start with Docker:

```bash
docker-compose -f docker/docker-compose.yml up -d
```

- Verify it's running:

```bash
curl http://localhost:8000/api/v1/health
```

- Test a query:

```bash
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is machine learning?"}'
```

## Installation

### Prerequisites

- Python 3.10+
- Docker & Docker Compose (for containerized deployment)
- Tesseract OCR (for image processing)
- Anthropic API key
- Create virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Install Tesseract (for OCR):

```bash
# macOS
brew install tesseract

# Ubuntu/Debian
sudo apt-get install tesseract-ocr

# Windows
# Download from: https://github.com/UB-Mannheim/tesseract/wiki
```

- Set up environment:

```bash
cp .env.example .env
# Edit .env with your configuration
```

- Start services:

```bash
# Option 1: Use Docker for dependencies only
docker-compose -f docker/docker-compose.yml up -d qdrant redis

# Option 2: Install locally (Qdrant, Redis)
```

## Configuration

Edit the `.env` file with your configuration:
```bash
# LLM Configuration
ANTHROPIC_API_KEY=your_api_key_here
LLM_MODEL=claude-3-5-sonnet-20241022
LLM_TEMPERATURE=0.0
LLM_MAX_TOKENS=4096

# Vector Database (choose one)
QDRANT_URL=http://localhost:6333
# OR
CHROMA_PERSIST_DIR=./data/chroma
VECTOR_DB_TYPE=qdrant  # or 'chroma'

# Redis Cache
REDIS_URL=redis://localhost:6379/0

# Observability
LANGSMITH_API_KEY=your_langsmith_key  # Optional
LANGSMITH_TRACING=false
LOG_LEVEL=INFO

# Retrieval Configuration
RETRIEVAL_TOP_K=10
RERANK_TOP_K=5
CHUNK_SIZE=512
CHUNK_OVERLAP=50
```

## Usage

### Running the Server

Development mode:
```bash
make run
# or
uvicorn src.api.main:app --reload
```

Production mode:

```bash
make run-prod
# or
uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --workers 4
```

### Ingesting Documents

Ingest a directory:

```bash
python scripts/ingest_documents.py /path/to/documents --collection my_docs
```

Via API:
```bash
# Single document
curl -X POST http://localhost:8000/api/v1/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your document text here",
    "metadata": {"source": "api", "type": "text"}
  }'

# Batch from directory
curl -X POST http://localhost:8000/api/v1/ingest/batch \
  -H "Content-Type: application/json" \
  -d '{
    "directory": "/path/to/documents",
    "recursive": true
  }'

# File upload
curl -X POST http://localhost:8000/api/v1/ingest/upload \
  -F "[email protected]"
```

### Querying

Python SDK:
```python
import asyncio

from src.agents.rag_agent import create_rag_agent
from src.retrieval.vector_store import get_retriever

async def query_rag():
    retriever = await get_retriever()
    agent = await create_rag_agent(retriever)

    result = await agent.run(
        query="What is machine learning?",
    )

    print(f"Answer: {result['answer']}")
    print(f"Documents: {len(result['documents'])}")
    print(f"Reflection Score: {result['reflection']['score']}/10")

asyncio.run(query_rag())
```

API call:
```bash
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Explain neural networks",
    "top_k": 5,
    "use_reflection": true
  }'
```

Streaming response:

```bash
curl -X POST http://localhost:8000/api/v1/query/stream \
  -H "Content-Type: application/json" \
  -d '{"query": "What is deep learning?"}' \
  --no-buffer
```

## API Documentation

Once the server is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/query` | POST | Query the RAG system |
| `/api/v1/query/stream` | POST | Stream query results |
| `/api/v1/ingest` | POST | Ingest single document |
| `/api/v1/ingest/batch` | POST | Batch ingest from directory |
| `/api/v1/ingest/upload` | POST | Upload and ingest file |
| `/api/v1/health` | GET | Health check with dependencies |
| `/api/v1/health/readiness` | GET | Kubernetes readiness probe |
| `/api/v1/health/liveness` | GET | Kubernetes liveness probe |
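For quick experiments, the query endpoint can also be called from Python with only the standard library. This is a hedged sketch: the endpoint path and request fields come from the table above, but the client itself (`build_query_payload`, `query_rag`) is illustrative and not part of the project.

```python
# Minimal stdlib-only client sketch for the /api/v1/query endpoint.
# The payload builder is separated out so it can be inspected without
# a running server; query_rag() needs the API up on localhost:8000.

import json
import urllib.request

BASE_URL = "http://localhost:8000"

def build_query_payload(query: str, top_k: int = 5,
                        use_reflection: bool = True) -> dict:
    """Build the JSON body expected by POST /api/v1/query."""
    return {"query": query, "top_k": top_k, "use_reflection": use_reflection}

def query_rag(query: str, **kwargs) -> dict:
    payload = build_query_payload(query, **kwargs)
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # requires a running server
        return json.load(resp)

payload = build_query_payload("Explain neural networks", top_k=5)
```

In production you would likely prefer an async client with timeouts and retries; this version only shows the request shape.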
## Development

Project structure:

```
multimodal-rag-agent/
├── src/
│   ├── agents/          # Agent workflows and tools
│   ├── api/             # FastAPI application
│   ├── database/        # Database models and interfaces
│   ├── evaluation/      # Metrics and benchmarking
│   ├── generation/      # LLM client and prompts
│   ├── ingestion/       # Document loaders and chunking
│   ├── observability/   # Logging, tracing, metrics
│   ├── retrieval/       # Embeddings, vector store, search
│   └── utils/           # Config, cache, exceptions
├── tests/               # Unit, integration, evaluation tests
├── scripts/             # CLI scripts for operations
├── configs/             # Configuration files
└── docker/              # Docker and deployment files
```
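The `ingestion/` package handles document loaders and chunking. As a hedged, character-based sketch of what fixed-size chunking with the default `CHUNK_SIZE=512` / `CHUNK_OVERLAP=50` settings means (real loaders typically split on tokens or sentence boundaries rather than raw characters):

```python
# Illustrative fixed-size chunking with overlap, matching the
# CHUNK_SIZE / CHUNK_OVERLAP settings from the configuration section.
# Character-based for simplicity; not the project's actual chunker.

def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # advance by chunk_size minus the overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 1000, chunk_size=512, overlap=50)
# Each chunk shares its last 50 characters with the start of the next one,
# so a sentence cut at a boundary still appears whole in one chunk.
```

Overlap trades a little index size for retrieval robustness at chunk boundaries; shrinking `CHUNK_SIZE` (as the troubleshooting section suggests for memory pressure) increases the number of chunks per document.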
Linting and formatting:

```bash
make lint    # Check code quality
make format  # Auto-fix issues
```

Type checking:

```bash
# Ruff includes basic type checking
make lint
```

## Testing

Run all tests:
```bash
make test
```

Run with coverage:

```bash
make test-cov
```

Run specific test suites:

```bash
make test-unit         # Unit tests only
make test-integration  # Integration tests only
```

Individual test file:

```bash
pytest tests/unit/test_retrieval.py -v
```

Run quality evaluation:

```bash
python scripts/evaluate_rag.py configs/evaluation/sample_queries.json
```

Run performance benchmark:

```bash
python scripts/benchmark.py --num-queries 20
```

## Deployment

Build image:
```bash
make docker-build
```

Start all services:

```bash
make docker-up
```

View logs:

```bash
make docker-logs
```

Stop services:

```bash
make docker-down
```

### Production Checklist

- Set `ENVIRONMENT=production` in `.env`
- Configure proper `ANTHROPIC_API_KEY`
- Set up persistent volumes for data
- Configure CORS origins
- Set up Prometheus and Grafana
- Enable LangSmith tracing
- Configure rate limiting
- Set up health checks
- Configure SSL/TLS
- Set up log aggregation
Example Kubernetes manifest:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rag-agent
  template:
    metadata:
      labels:
        app: rag-agent
    spec:
      containers:
        - name: rag-agent
          image: multimodal-rag-agent:latest
          ports:
            - containerPort: 8000
          env:
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: rag-secrets
                  key: anthropic-api-key
          livenessProbe:
            httpGet:
              path: /api/v1/health/liveness
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /api/v1/health/readiness
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
```

## Troubleshooting

### Common Issues

1. "ANTHROPIC_API_KEY not found"
```bash
# Ensure .env file exists and contains your API key
cp .env.example .env
# Edit .env and add ANTHROPIC_API_KEY=your_key
```

2. "Connection refused to Qdrant/Redis"

```bash
# Start dependencies with Docker
docker-compose -f docker/docker-compose.yml up -d qdrant redis
# Or check if services are running
docker ps
```

3. "Module not found" errors

```bash
# Ensure you're in the virtual environment
source venv/bin/activate
# Reinstall dependencies
pip install -r requirements.txt
```

4. OCR not working

```bash
# Install Tesseract OCR
brew install tesseract                # macOS
sudo apt-get install tesseract-ocr    # Ubuntu
```

5. Out of memory errors

```bash
# Reduce batch size in configuration
CHUNK_SIZE=256
RETRIEVAL_TOP_K=5
```

### Debug Mode

Enable debug logging:
```bash
export LOG_LEVEL=DEBUG
python -m src.api.main
```

### Health Checks

```bash
# Check system health
curl http://localhost:8000/api/v1/health
```

Expected response:

```json
{
  "status": "healthy",
  "version": "0.1.0",
  "dependencies": {
    "llm": "configured",
    "vector_db": "healthy",
    "redis": "healthy"
  }
}
```

## Monitoring

### Prometheus

Available at: http://localhost:9090

Key metrics:

- `rag_queries_total` - Total RAG queries
- `rag_query_duration_seconds` - Query latency
- `rag_reflection_score` - Answer quality scores
- `llm_requests_total` - LLM API calls
- `cache_hits_total` / `cache_misses_total` - Cache performance

### Grafana

Available at: http://localhost:3000 (admin/admin)
Import dashboards for:
- RAG query performance
- LLM usage and costs
- System resource utilization
## Contributing

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
## License

MIT License - see the LICENSE file for details.

## Acknowledgments

- LangChain - LLM framework
- LangGraph - Agentic workflows
- Anthropic - Claude LLM
- Qdrant - Vector database
- FastAPI - Web framework
## Support

- GitHub Issues: create an issue
- Documentation: see the `/docs` endpoint when the server is running
- Email: [email protected]
Built with ❤️ using Python, LangGraph, and Claude