Welcome to the Document Search RAG System documentation.
-
- Complete architecture overview
- Module responsibilities
- Design principles
- Extension points
-
- Advanced capabilities
- Docling and Chonkie integration details
- Performance optimization
- Benchmarks and best practices
-
- Python API documentation
- CLI commands
- Configuration options
-
- Installing Qdrant locally
- Docker setup
- Configuration tips
-
- Getting started with vector search
- Basic operations
- Example queries
- Processing Large Document Collections
- Handling 1000+ documents
- Batch processing strategies
- Memory optimization
-
- Extending the document processor
- Adding new file formats
- Custom chunking strategies
┌─────────────────────────────────────────────────────────────┐
│                Document Corpus (1000+ docs)                 │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                       Docling Parser                        │
│  • Text, Table & Image Extraction                           │
│  • Multi-format Support                                     │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                       Chonkie Chunker                       │
│  • Semantic & Token-based Chunking                          │
│  • Configurable Overlap                                     │
└────────────┬───────────────────────────┬────────────────────┘
             │                           │
             ▼                           ▼
┌──────────────────────────┐   ┌──────────────────────────────┐
│    SQLite/PostgreSQL     │   │       Qdrant Vector DB       │
│  • Document Metadata     │   │  • Vector Embeddings         │
│  • Tables & Images       │   │  • Similarity Search         │
└──────────────────────────┘   └──────────────────────────────┘
             │                           │
             └───────────┬───────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                         RAG System                          │
│  • Query Processing                                         │
│  • Context Retrieval                                        │
│  • Answer Generation                                        │
└─────────────────────────────────────────────────────────────┘
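
The flow in the diagram can be sketched end to end in a few lines of standard-library Python. This is only an illustration under stated assumptions: `chunk_text`, `embed`, `ingest`, and `retrieve` are hypothetical stand-ins, not the Docling, Chonkie, or Qdrant APIs. A whitespace tokenizer replaces the chunker, a bag-of-words count replaces the embedding model, an in-memory list replaces the vector database, and the answer-generation step is omitted.

```python
import sqlite3
from math import sqrt


def chunk_text(text, size=8, overlap=2):
    """Token-based chunking with overlap (toy stand-in for the chunking stage)."""
    tokens = text.split()
    if not tokens:
        return []
    step = size - overlap
    stop = max(len(tokens) - overlap, 1)
    return [" ".join(tokens[i:i + size]) for i in range(0, stop, step)]


def embed(text):
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(value * b.get(word, 0) for word, value in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ingest(docs, db, index):
    """Store per-document metadata in SQLite and chunk vectors in the index."""
    for doc_id, text in docs.items():
        db.execute("INSERT INTO documents (id, n_chars) VALUES (?, ?)", (doc_id, len(text)))
        for chunk in chunk_text(text):
            index.append((doc_id, chunk, embed(chunk)))


def retrieve(query, index, k=1):
    """Return the k chunks most similar to the query (context retrieval step)."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[2]), reverse=True)
    return [(doc_id, chunk) for doc_id, chunk, _ in ranked[:k]]


db = sqlite3.connect(":memory:")  # metadata store
db.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, n_chars INTEGER)")
index = []  # in-memory stand-in for the vector database

docs = {
    "a.txt": "Qdrant stores vector embeddings and serves similarity search",
    "b.txt": "SQLite keeps document metadata such as tables and images",
}
ingest(docs, db, index)
top = retrieve("where are vector embeddings stored", index)
print(top)
```

The split mirrors the diagram: document-level metadata goes to the relational store, chunk vectors go to the similarity index, and the query path only touches the index before handing the retrieved context to generation.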
- Main README - Getting started
- Configuration Guide - Environment setup
- API Examples - Code examples
- Troubleshooting - Common issues
| Component | Metric | Performance |
|---|---|---|
| Document Processing | Speed | 500-1000 docs/hour |
| Chunking | Throughput | 10,000 chunks/min |
| Vector Indexing | Speed | 50,000 vectors/min |
| Search | Latency | < 500 ms |
| RAG Generation | Response Time | 2-5 seconds |
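
As a rough worked example, the figures in the table can be combined into a back-of-the-envelope ingestion estimate for a 1000-document corpus. The value of `chunks_per_doc` is a hypothetical assumption (the table does not state it), and the sketch assumes the stages run one after another rather than overlapping.

```python
def stage_minutes(items, per_minute):
    """Minutes needed to push `items` through a stage at a given rate."""
    return items / per_minute


n_docs = 1000
chunks_per_doc = 50  # hypothetical; depends on document length and chunk size

# Document processing: 500-1000 docs/hour, converted to minutes.
parse_min = (n_docs / 1000 * 60, n_docs / 500 * 60)

# Chunking at 10,000 chunks/min and indexing at 50,000 vectors/min.
n_chunks = n_docs * chunks_per_doc
chunk_min = stage_minutes(n_chunks, 10_000)
index_min = stage_minutes(n_chunks, 50_000)

total_min = tuple(p + chunk_min + index_min for p in parse_min)
print(f"parse: {parse_min[0]:.0f}-{parse_min[1]:.0f} min")
print(f"chunk: {chunk_min:.0f} min, index: {index_min:.0f} min")
print(f"total: {total_min[0]:.0f}-{total_min[1]:.0f} min")
```

Under these assumptions document processing dominates the total, so that stage is the first place to look when ingestion is too slow.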
- `.env` - Environment variables
- `src/config/settings.py` - Application settings
- `docker-compose.yml` - Docker services
See our Contributing Guide for information on:
- Code style
- Testing requirements
- PR process
- Development setup
This project is licensed under the MIT License.