RAG System

A production-grade, multi-provider Retrieval-Augmented Generation (RAG) system with comprehensive multi-modal capabilities.

Overview

This system implements a sophisticated RAG pipeline that combines text and visual information retrieval with state-of-the-art language models. It supports multiple LLM providers and includes comprehensive evaluation frameworks.

Key Features

  • Multi-Provider Architecture: Seamless integration with OpenAI, Anthropic Claude, Google Gemini, DeepSeek, and Ollama (a minimal factory sketch follows this list)
  • Multi-Modal Processing: Handles both text and images from PDF documents
  • Visual RAG: Automated image captioning enables text-based retrieval of visual concepts
  • Hybrid Retrieval: Vector similarity search with metadata filtering
  • Production-Ready: Comprehensive testing, evaluation, and monitoring capabilities
  • Cost Tracking: Built-in token usage and cost estimation across providers
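
The multi-provider design amounts to a thin factory over interchangeable clients. The sketch below shows that pattern in minimal form; the class and function names are illustrative assumptions, not the repository's actual API.

# Hypothetical provider factory; class names are illustrative,
# not the repository's actual API.
import os

class OpenAIClient:
    def generate(self, prompt: str) -> str:
        raise NotImplementedError  # real code would call the OpenAI SDK

class AnthropicClient:
    def generate(self, prompt: str) -> str:
        raise NotImplementedError  # real code would call the Anthropic SDK

_PROVIDERS = {"openai": OpenAIClient, "anthropic": AnthropicClient}

def get_llm_client(provider: str | None = None):
    """Return a client for the configured provider (LLM_PROVIDER by default)."""
    name = provider or os.getenv("LLM_PROVIDER", "openai")
    if name not in _PROVIDERS:
        raise ValueError(f"Unknown LLM provider: {name}")
    return _PROVIDERS[name]()

Keeping the provider lookup behind one function means the rest of the pipeline never branches on the provider name.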

Architecture

The system follows a modular pipeline architecture:

PDF Documents → Ingestion → Vector Storage → Retrieval → Generation → Answer
                    ↓            ↓              ↓           ↓
                 Text/Images  ChromaDB    Hybrid Search  Multi-LLM

For detailed architecture documentation, see ARCHITECTURE.md.

Project Structure

production_rag/
├── config.py              # Central configuration
├── main.py               # CLI entry point
├── ingestion/            # Document processing
│   ├── extractors.py    # PDF text/image extraction
│   ├── processors.py    # Chunking and captioning
│   └── storage.py       # Vector database operations
├── retrieval/           # Search and ranking
│   └── search.py       # Hybrid retrieval implementation
├── generation/          # Answer generation
│   └── rag.py          # Multi-provider LLM clients
└── tests/              # Testing and evaluation
    ├── test_*.py       # Unit tests
    ├── benchmark_diverse_questions.py
    └── ragas_benchmark.py

Installation

Prerequisites

  • Python 3.10 or higher
  • pip package manager
  • Virtual environment (recommended)

Setup Steps

  1. Create Virtual Environment:

    python -m venv .venv
  2. Activate Environment:

    # Windows
    .venv\Scripts\Activate
    
    # Linux/Mac
    source .venv/bin/activate
  3. Install Dependencies:

    cd production_rag
    pip install -r requirements.txt
  4. Configure Environment:

    cp .env.example .env

Edit .env and add your API keys (a quick verification snippet follows these steps):

    LLM_PROVIDER=openai
    OPENAI_API_KEY=your_key_here
    ANTHROPIC_API_KEY=your_key_here
    GOOGLE_API_KEY=your_key_here
    DEEPSEEK_API_KEY=your_key_here
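
To verify the keys are picked up, a quick check can help. This sketch assumes the project loads .env with python-dotenv, a common pattern; the actual loading logic lives in config.py.

# Minimal check that the active provider's key is set.
# Assumes python-dotenv; the real loading code lives in config.py.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
provider = os.getenv("LLM_PROVIDER", "openai")
key_var = f"{provider.upper()}_API_KEY"
if not os.getenv(key_var):
    raise SystemExit(f"{key_var} is not set; add it to .env")
print(f"Using provider: {provider}")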
    

Usage

All commands should be run from the repository root directory.

Document Ingestion

Process PDF documents into the vector database:

python -m production_rag.main ingest [OPTIONS]

Options:

  • --reset: Clear existing database before ingestion
  • --pdf PATH: Process specific PDF file

Example:

python -m production_rag.main ingest --reset

This will:

  1. Extract text and images from PDFs in basic_rag/data/
  2. Generate captions for images using the configured VLM
  3. Chunk text into semantic units (a simplified chunking sketch follows this list)
  4. Create embeddings and store in ChromaDB
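
Step 3 above is essentially a sliding window over the extracted text, driven by the CHUNK_SIZE and CHUNK_OVERLAP settings described under Configuration. A simplified sketch; the real implementation in ingestion/processors.py may split on semantic boundaries instead:

# Simplified sliding-window chunker. The real code in
# ingestion/processors.py may split on semantic boundaries instead.
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

With the defaults, consecutive chunks share 200 characters of context, which helps retrieval when an answer spans a chunk boundary.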

Querying

Ask questions to the RAG system:

python -m production_rag.main query "Your question here"

Example:

python -m production_rag.main query "What is the Mastra framework?"

The system will (see the sketch after these steps):

  1. Retrieve relevant context from the vector store
  2. Assemble context with metadata
  3. Generate an answer using the configured LLM
  4. Display the answer with source citations and cost metrics
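
Conceptually, steps 1 and 2 reduce to a few calls. A hedged sketch against chromadb's query API; the collection name, source field, and prompt wording are assumptions:

# Conceptual query flow. The collection name "documents" and the
# prompt template are assumptions, not the repository's actual values.
import chromadb

client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_collection("documents")

question = "What is the Mastra framework?"
results = collection.query(query_texts=[question], n_results=5)

# Assemble retrieved chunks and their sources into one context block.
context = "\n\n".join(
    f"[{meta.get('source', 'unknown')}] {doc}"
    for doc, meta in zip(results["documents"][0], results["metadatas"][0])
)
prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
# The prompt then goes to the configured LLM (step 3), and the answer is
# shown with citations and cost metrics (step 4).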

Configuration

Provider Selection

Edit config.py to change the default provider:

LLM_PROVIDER = "openai"  # Options: openai, anthropic, google, deepseek, ollama

Or set via environment variable:

export LLM_PROVIDER=deepseek

Model Configuration

Each provider has configurable models in config.py:

MODELS = {
    "openai": {
        "text": "gpt-4o-mini",
        "vision": "gpt-4o-mini",
        "embedding": "text-embedding-3-small"
    },
    # ... other providers
}

Retrieval Parameters

Adjust retrieval behavior in config.py:

TOP_K_RETRIEVAL = 10  # Initial retrieval count
TOP_K_FINAL = 5       # Final context size
CHUNK_SIZE = 1000     # Text chunk size
CHUNK_OVERLAP = 200   # Overlap between chunks
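
The two TOP_K values implement a retrieve-then-trim pattern: cast a wide net of TOP_K_RETRIEVAL candidates, optionally filter on metadata, and keep TOP_K_FINAL for the prompt. A hedged illustration using chromadb's where-filter for the metadata side of hybrid retrieval; the collection name and "type" field are assumptions:

# Hedged retrieve-then-trim sketch; the collection name and the
# "type" metadata field are assumptions.
import chromadb

TOP_K_RETRIEVAL = 10
TOP_K_FINAL = 5

client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_collection("documents")

results = collection.query(
    query_texts=["What is the Mastra framework?"],
    n_results=TOP_K_RETRIEVAL,     # wide initial retrieval
    where={"type": "text"},        # metadata filtering (the hybrid part)
)
context_chunks = results["documents"][0][:TOP_K_FINAL]  # final context size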

Testing and Evaluation

See TESTING.md for comprehensive testing documentation including:

  • Unit tests
  • Integration benchmarks
  • RAGAS evaluation framework
  • Performance metrics

Documentation

  • ARCHITECTURE.md: Detailed architecture documentation
  • TESTING.md: Testing and evaluation documentation

Performance

Typical performance metrics:

  • Ingestion: ~2-5 seconds per PDF page (with image captioning)
  • Query: 1-3 seconds end-to-end (depending on provider)
  • Cost: $0.001-$0.01 per query (varies by provider and model; see the cost arithmetic below)
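
The cost figure is simple token arithmetic. An illustrative calculation with placeholder rates; real per-token prices vary by provider and change over time:

# Illustrative token-cost arithmetic; the rates are placeholders,
# not current provider pricing.
PRICE_PER_1K_INPUT = 0.00015   # USD per 1K input tokens (hypothetical)
PRICE_PER_1K_OUTPUT = 0.0006   # USD per 1K output tokens (hypothetical)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

print(f"${estimate_cost(6000, 800):.5f}")  # -> $0.00138 for this example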

License

This project is for educational and research purposes.
