Documentation Index

Welcome to the Document Search RAG System documentation.

📖 Documentation Overview

Core Documentation

  1. Project Structure

    • Complete architecture overview
    • Module responsibilities
    • Design principles
    • Extension points
  2. Enhanced Features Guide

    • Advanced capabilities
    • Docling and Chonkie integration details
    • Performance optimization
    • Benchmarks and best practices
  3. API Reference

    • Python API documentation
    • CLI commands
    • Configuration options

Setup Guides

  1. Qdrant Local Setup

    • Installing Qdrant locally
    • Docker setup
    • Configuration tips
  2. Qdrant Quick Start

    • Getting started with vector search
    • Basic operations
    • Example queries

Tutorials

  1. Processing Large Document Collections

    • Handling 1000+ documents
    • Batch processing strategies
    • Memory optimization
  2. Custom Document Processors

    • Extending the document processor
    • Adding new file formats
    • Custom chunking strategies

🏗️ System Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Document Corpus (1000+ docs)            │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                    Docling Parser                            │
│  • Text, Table & Image Extraction                            │
│  • Multi-format Support                                      │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                    Chonkie Chunker                           │
│  • Semantic & Token-based Chunking                           │
│  • Configurable Overlap                                      │
└────────────┬───────────────────────────┬────────────────────┘
              │                           │
              ▼                           ▼
┌──────────────────────────┐  ┌──────────────────────────────┐
│    SQLite/PostgreSQL     │  │      Qdrant Vector DB        │
│  • Document Metadata     │  │  • Vector Embeddings         │
│  • Tables & Images       │  │  • Similarity Search         │
└──────────────────────────┘  └──────────────────────────────┘
              │                           │
              └───────────┬───────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                    RAG System                                │
│  • Query Processing                                          │
│  • Context Retrieval                                         │
│  • Answer Generation                                         │
└─────────────────────────────────────────────────────────────┘
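The flow in the diagram can be sketched end to end with the standard library alone. This is a toy illustration, not the project's implementation: `chunk_tokens`, `embed`, `build_index`, and `retrieve` are hypothetical names, a bag-of-words counter stands in for a real embedding model, an in-memory list stands in for Qdrant, and plain strings stand in for Docling output. Answer generation is left out.

```python
import math
import re
import sqlite3
from collections import Counter


def chunk_tokens(text, chunk_size=8, overlap=2):
    """Split text into whitespace-token chunks that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy embedding: word counts (an assumed stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(docs):
    """Store chunk metadata in SQLite and vectors in a list (Qdrant stand-in)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, doc TEXT, text TEXT)")
    vectors = []
    for name, text in docs.items():
        for chunk in chunk_tokens(text):
            cur = db.execute(
                "INSERT INTO chunks (doc, text) VALUES (?, ?)", (name, chunk)
            )
            vectors.append((cur.lastrowid, embed(chunk)))
    return db, vectors


def retrieve(db, vectors, query, top_k=2):
    """Rank chunks by similarity, then look up their metadata by id."""
    q = embed(query)
    ranked = sorted(vectors, key=lambda item: cosine(q, item[1]), reverse=True)
    return [
        db.execute("SELECT doc, text FROM chunks WHERE id = ?", (cid,)).fetchone()
        for cid, _ in ranked[:top_k]
    ]
```

The split mirrors the two storage boxes in the diagram: the vector side only ranks chunk ids, and the relational side turns those ids back into document names and text for the RAG step.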

📊 Performance Metrics

Component            Metric          Performance
Document Processing  Speed           500-1000 docs/hour
Chunking             Throughput      10,000 chunks/min
Vector Indexing      Speed           50,000 vectors/min
Search               Latency         < 500 ms
RAG Generation       Response Time   2-5 seconds
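
As a rough guide to what these figures mean for a full ingestion run, the sketch below turns them into per-stage time estimates. The throughput constants are copied from the table; `stage_minutes` is a hypothetical helper, and the default of 20 chunks per document is an assumption made only for illustration.

```python
# Throughput figures taken from the metrics table above.
DOCS_PER_HOUR_LOW = 500
DOCS_PER_HOUR_HIGH = 1000
CHUNKS_PER_MIN = 10_000
VECTORS_PER_MIN = 50_000


def stage_minutes(num_docs, chunks_per_doc=20):
    """Estimate minutes spent per pipeline stage for num_docs documents.

    chunks_per_doc is an assumed average, not a measured value.
    """
    chunks = num_docs * chunks_per_doc
    return {
        "parsing_best": num_docs / DOCS_PER_HOUR_HIGH * 60,
        "parsing_worst": num_docs / DOCS_PER_HOUR_LOW * 60,
        "chunking": chunks / CHUNKS_PER_MIN,
        "indexing": chunks / VECTORS_PER_MIN,
    }


if __name__ == "__main__":
    for stage, minutes in stage_minutes(1000).items():
        print(f"{stage}: {minutes:.1f} min")
```

Under these assumptions, 1000 documents take 60 to 120 minutes to parse, about 2 minutes to chunk, and under a minute to index, so document parsing dominates ingestion time.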

🔧 Configuration Files

  • .env - Environment variables
  • src/config/settings.py - Application settings
  • docker-compose.yml - Docker services
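
To illustrate how these files typically fit together, here is a minimal sketch of a settings module that reads environment variables with fallbacks. It does not reproduce the contents of `src/config/settings.py`: the `Settings` class, `load_settings`, the variable names (`QDRANT_URL`, `DATABASE_URL`, `CHUNK_SIZE`, `CHUNK_OVERLAP`), and the default values are all hypothetical. It also assumes the values from `.env` have already been exported into the process environment, for example by Docker Compose or a dotenv loader.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    qdrant_url: str
    database_url: str
    chunk_size: int
    chunk_overlap: int


def load_settings(env=os.environ):
    """Build Settings from a mapping of environment variables.

    All variable names and defaults here are illustrative assumptions.
    """
    settings = Settings(
        # 6333 is Qdrant's default HTTP port.
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        database_url=env.get("DATABASE_URL", "sqlite:///documents.db"),
        chunk_size=int(env.get("CHUNK_SIZE", "512")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "64")),
    )
    # Fail early on a combination that would make chunking loop or stall.
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings
```

Passing the mapping as an argument keeps the function easy to test: a plain dict can stand in for `os.environ`.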

📝 Contributing

See our Contributing Guide for information on:

  • Code style
  • Testing requirements
  • PR process
  • Development setup

📄 License

This project is licensed under the MIT License.