Multi-Modal RAG Agent with Agentic Workflows

A production-grade Retrieval-Augmented Generation (RAG) system with advanced agentic capabilities, built with LangGraph, Anthropic Claude, and a modern Python stack.

🎯 Features

  • Multi-Modal Document Processing: PDF, images (OCR), tables, markdown, HTML, and code
  • Hybrid Search: Dense (semantic) + sparse (BM25) retrieval with re-ranking (see the fusion sketch just after this list)
  • Agentic Workflows: LangGraph-powered state machine with planning, tool use, and self-reflection
  • Production-Ready API: FastAPI with async endpoints, rate limiting, and CORS
  • Advanced Retrieval: Query transformation, HyDE, multi-query generation
  • Full Observability: LangSmith tracing, Prometheus metrics, structured logging
  • Evaluation Framework: RAGAS metrics and quality benchmarking
  • Scalable Architecture: Redis caching, vector DB (Qdrant/ChromaDB), Docker deployment
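
To make the hybrid-search idea concrete, here is a minimal, self-contained sketch that fuses dense and sparse result lists with reciprocal rank fusion (RRF). It is only an illustration; the repository's actual fusion strategy and re-ranker may differ.

from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a dense (vector) ranking with a sparse (BM25) ranking.
dense = ["doc3", "doc1", "doc7"]
sparse = ["doc1", "doc4", "doc3"]
print(reciprocal_rank_fusion([dense, sparse]))  # doc1 and doc3 rise to the top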

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        FastAPI REST API                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚  β”‚  Query   β”‚  Ingest  β”‚  Health  β”‚  Metrics β”‚  Streaming β”‚    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    RAG Agent (LangGraph)                         β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚  Planner β†’ Retriever β†’ Re-Ranker β†’ Generator β†’ Reflector β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚                  β”‚                  β”‚
           β–Ό                  β–Ό                  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Tool Registry β”‚  β”‚  Hybrid Search   β”‚  β”‚  LLM Client    β”‚
β”‚  - Retriever   β”‚  β”‚  - Dense (Vec)   β”‚  β”‚  (Claude 3.5)  β”‚
β”‚  - Calculator  β”‚  β”‚  - Sparse (BM25) β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚  - Web Search  β”‚  β”‚  - Re-ranking    β”‚
β”‚  - Code Exec   β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β”‚
                             β–Ό
                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                  β”‚  Vector Database      β”‚
                  β”‚  - Qdrant / ChromaDB  β”‚
                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸš€ Quick Start

5-Minute Setup

  1. Clone and navigate:
git clone <repository-url>
cd multimodal-rag-agent
  2. Create the environment file:
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY
  3. Start with Docker:
docker-compose -f docker/docker-compose.yml up -d
  4. Verify it's running:
curl http://localhost:8000/api/v1/health
  5. Test a query:
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is machine learning?"}'

πŸ“¦ Installation

Prerequisites

  • Python 3.10+
  • Docker & Docker Compose (for containerized deployment)
  • Tesseract OCR (for image processing)
  • Anthropic API key

Local Development Setup

  1. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  2. Install dependencies:
pip install -r requirements.txt
  3. Install Tesseract (for OCR):
# macOS
brew install tesseract

# Ubuntu/Debian
sudo apt-get install tesseract-ocr

# Windows
# Download from: https://github.com/UB-Mannheim/tesseract/wiki
  4. Set up the environment:
cp .env.example .env
# Edit .env with your configuration
  5. Start services:
# Option 1: Use Docker for dependencies only
docker-compose -f docker/docker-compose.yml up -d qdrant redis

# Option 2: Install Qdrant and Redis locally

βš™οΈ Configuration

Environment Variables

Edit .env file with your configuration:

# LLM Configuration
ANTHROPIC_API_KEY=your_api_key_here
LLM_MODEL=claude-3-5-sonnet-20241022
LLM_TEMPERATURE=0.0
LLM_MAX_TOKENS=4096

# Vector Database (choose one)
QDRANT_URL=http://localhost:6333
# OR
CHROMA_PERSIST_DIR=./data/chroma
VECTOR_DB_TYPE=qdrant  # or 'chroma'

# Redis Cache
REDIS_URL=redis://localhost:6379/0

# Observability
LANGSMITH_API_KEY=your_langsmith_key  # Optional
LANGSMITH_TRACING=false
LOG_LEVEL=INFO

# Retrieval Configuration
RETRIEVAL_TOP_K=10
RERANK_TOP_K=5
CHUNK_SIZE=512
CHUNK_OVERLAP=50
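
For orientation, here is a minimal sketch of how these variables could be loaded into typed settings with pydantic-settings. The repository's actual src/utils/config module may be structured differently.

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Field names map to the env vars above, case-insensitively.
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    anthropic_api_key: str
    llm_model: str = "claude-3-5-sonnet-20241022"
    llm_temperature: float = 0.0
    llm_max_tokens: int = 4096
    vector_db_type: str = "qdrant"
    qdrant_url: str = "http://localhost:6333"
    redis_url: str = "redis://localhost:6379/0"
    retrieval_top_k: int = 10
    rerank_top_k: int = 5
    chunk_size: int = 512
    chunk_overlap: int = 50

settings = Settings()  # reads .env, falling back to the process environment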

πŸ“š Usage

Starting the Application

Development mode:

make run
# or
uvicorn src.api.main:app --reload

Production mode:

make run-prod
# or
uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --workers 4

Document Ingestion

Ingest a directory:

python scripts/ingest_documents.py /path/to/documents --collection my_docs

Via API:

# Single document
curl -X POST http://localhost:8000/api/v1/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your document text here",
    "metadata": {"source": "api", "type": "text"}
  }'

# Batch from directory
curl -X POST http://localhost:8000/api/v1/ingest/batch \
  -H "Content-Type: application/json" \
  -d '{
    "directory": "/path/to/documents",
    "recursive": true
  }'

# File upload
curl -X POST http://localhost:8000/api/v1/ingest/upload \
  -F "[email protected]"

Querying the RAG System

Python SDK:

import asyncio
from src.agents.rag_agent import create_rag_agent
from src.retrieval.vector_store import get_retriever

async def query_rag():
    retriever = await get_retriever()
    agent = await create_rag_agent(retriever)

    result = await agent.run(
        query="What is machine learning?",
    )

    print(f"Answer: {result['answer']}")
    print(f"Documents: {len(result['documents'])}")
    print(f"Reflection Score: {result['reflection']['score']}/10")

asyncio.run(query_rag())

API Call:

curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Explain neural networks",
    "top_k": 5,
    "use_reflection": true
  }'

Streaming Response:

curl -X POST http://localhost:8000/api/v1/query/stream \
  -H "Content-Type: application/json" \
  -d '{"query": "What is deep learning?"}' \
  --no-buffer
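
If you prefer consuming the stream from Python, here is a small httpx sketch. The exact event format of the /query/stream endpoint is not specified above, so the line-by-line handling is an assumption.

import httpx

with httpx.stream(
    "POST",
    "http://localhost:8000/api/v1/query/stream",
    json={"query": "What is deep learning?"},
    timeout=None,  # allow long-running generation
) as response:
    for line in response.iter_lines():  # assumes newline-delimited chunks
        if line:
            print(line)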

πŸ“– API Documentation

Once the server is running, visit the interactive API documentation: FastAPI serves Swagger UI at http://localhost:8000/docs and ReDoc at http://localhost:8000/redoc by default.

Main Endpoints

Endpoint                    Method  Description
/api/v1/query               POST    Query the RAG system
/api/v1/query/stream        POST    Stream query results
/api/v1/ingest              POST    Ingest a single document
/api/v1/ingest/batch        POST    Batch ingest from a directory
/api/v1/ingest/upload       POST    Upload and ingest a file
/api/v1/health              GET     Health check with dependency status
/api/v1/health/readiness    GET     Kubernetes readiness probe
/api/v1/health/liveness     GET     Kubernetes liveness probe

πŸ› οΈ Development

Project Structure

multimodal-rag-agent/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ agents/           # Agent workflows and tools
β”‚   β”œβ”€β”€ api/              # FastAPI application
β”‚   β”œβ”€β”€ database/         # Database models and interfaces
β”‚   β”œβ”€β”€ evaluation/       # Metrics and benchmarking
β”‚   β”œβ”€β”€ generation/       # LLM client and prompts
β”‚   β”œβ”€β”€ ingestion/        # Document loaders and chunking
β”‚   β”œβ”€β”€ observability/    # Logging, tracing, metrics
β”‚   β”œβ”€β”€ retrieval/        # Embeddings, vector store, search
β”‚   └── utils/            # Config, cache, exceptions
β”œβ”€β”€ tests/                # Unit, integration, evaluation tests
β”œβ”€β”€ scripts/              # CLI scripts for operations
β”œβ”€β”€ configs/              # Configuration files
└── docker/               # Docker and deployment files

Code Quality

Linting and formatting:

make lint    # Check code quality
make format  # Auto-fix issues

Type checking:

# Ruff's lint rules catch some annotation issues (it is not a full type checker)
make lint

πŸ§ͺ Testing

Run all tests:

make test

Run with coverage:

make test-cov

Run specific test suites:

make test-unit         # Unit tests only
make test-integration  # Integration tests only

Individual test file:

pytest tests/unit/test_retrieval.py -v

Evaluation

Run quality evaluation:

python scripts/evaluate_rag.py configs/evaluation/sample_queries.json

Run performance benchmark:

python scripts/benchmark.py --num-queries 20
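
scripts/benchmark.py itself is not shown here; as a rough idea of what a latency benchmark against the API looks like, a self-contained asyncio + httpx sketch:

import asyncio
import statistics
import time

import httpx

async def benchmark(num_queries: int = 20) -> None:
    url = "http://localhost:8000/api/v1/query"
    latencies = []
    async with httpx.AsyncClient(timeout=120) as client:
        for i in range(num_queries):
            start = time.perf_counter()
            r = await client.post(url, json={"query": f"benchmark query {i}"})
            r.raise_for_status()
            latencies.append(time.perf_counter() - start)
    p95 = sorted(latencies)[int(0.95 * len(latencies))]
    print(f"mean={statistics.mean(latencies):.2f}s p95={p95:.2f}s")

asyncio.run(benchmark())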

🚒 Deployment

Docker Deployment

Build image:

make docker-build

Start all services:

make docker-up

View logs:

make docker-logs

Stop services:

make docker-down

Production Checklist

  • Set ENVIRONMENT=production in .env
  • Configure proper ANTHROPIC_API_KEY
  • Set up persistent volumes for data
  • Configure CORS origins
  • Set up Prometheus and Grafana
  • Enable LangSmith tracing
  • Configure rate limiting
  • Set up health checks
  • Configure SSL/TLS
  • Set up log aggregation

Kubernetes Deployment

Example Kubernetes manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rag-agent
  template:
    metadata:
      labels:
        app: rag-agent
    spec:
      containers:
      - name: rag-agent
        image: multimodal-rag-agent:latest
        ports:
        - containerPort: 8000
        env:
        - name: ANTHROPIC_API_KEY
          valueFrom:
            secretKeyRef:
              name: rag-secrets
              key: anthropic-api-key
        livenessProbe:
          httpGet:
            path: /api/v1/health/liveness
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /api/v1/health/readiness
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5

πŸ” Troubleshooting

Common Issues

1. "ANTHROPIC_API_KEY not found"

# Ensure .env file exists and contains your API key
cp .env.example .env
# Edit .env and add ANTHROPIC_API_KEY=your_key

2. "Connection refused to Qdrant/Redis"

# Start dependencies with Docker
docker-compose -f docker/docker-compose.yml up -d qdrant redis

# Or check if services are running
docker ps

3. "Module not found" errors

# Ensure you're in the virtual environment
source venv/bin/activate

# Reinstall dependencies
pip install -r requirements.txt

4. OCR not working

# Install Tesseract OCR
brew install tesseract  # macOS
sudo apt-get install tesseract-ocr  # Ubuntu

5. Out of memory errors

# Reduce chunk size and retrieval depth in .env
CHUNK_SIZE=256
RETRIEVAL_TOP_K=5

Debug Mode

Enable debug logging:

export LOG_LEVEL=DEBUG
python -m src.api.main

Health Check

# Check system health
curl http://localhost:8000/api/v1/health

# Expected response:
{
  "status": "healthy",
  "version": "0.1.0",
  "dependencies": {
    "llm": "configured",
    "vector_db": "healthy",
    "redis": "healthy"
  }
}

πŸ“Š Monitoring

Prometheus Metrics

Available at: http://localhost:9090

Key metrics:

  • rag_queries_total - Total RAG queries
  • rag_query_duration_seconds - Query latency
  • rag_reflection_score - Answer quality scores
  • llm_requests_total - LLM API calls
  • cache_hits_total / cache_misses_total - Cache performance
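
A hedged sketch of how metrics like these are typically declared with prometheus_client; the repository's src/observability module may define them differently. Note that the client library appends the _total suffix to Counter names at exposition time.

from prometheus_client import Counter, Histogram, start_http_server

# Exposed as rag_queries_total; prometheus_client adds the _total suffix.
RAG_QUERIES = Counter("rag_queries", "Total RAG queries")
QUERY_DURATION = Histogram("rag_query_duration_seconds", "Query latency in seconds")

@QUERY_DURATION.time()  # records each call's duration in the histogram
def handle_query(query: str) -> str:
    RAG_QUERIES.inc()
    return f"answer to {query}"

start_http_server(8001)  # expose /metrics on :8001 for Prometheus to scrape
handle_query("What is machine learning?")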

Grafana Dashboards

Available at: http://localhost:3000 (admin/admin)

Import dashboards for:

  • RAG query performance
  • LLM usage and costs
  • System resource utilization

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“„ License

MIT License - see LICENSE file for details

πŸ™ Acknowledgments

πŸ“ž Support


Built with ❀️ using Python, LangGraph, and Claude
