Problem: ModuleNotFoundError: No module named 'src.document_processor'
Solution: Updated all example files to use new project structure:
src.document_processor→src.processing.document_processorsrc.enhanced_rag→src.retrieval.enhanced_ragsrc.document_search→src.retrieval.base_ragsrc.database_manager→src.storage.database_manager
Files Fixed:
- ✅
examples/process_documents.py - ✅
examples/query_enhanced_rag.py - ✅
examples/use_local_qdrant.py - ✅
tests/test_document_search.py - ✅
src/__init__.py
Problem: TypeError: DocumentSearchRAG.__init__() got an unexpected keyword argument 'collection_name'
Root Cause: EnhancedDocumentRAG was trying to pass individual parameters to DocumentSearchRAG, but the base class only accepts config_path.
Solution: Changed EnhancedDocumentRAG.__init__() to match base class signature:
Before:
def __init__(
self,
collection_name: str = "enhanced_documents",
embedding_model: str = "...",
qdrant_url: str = "localhost",
qdrant_port: int = 6333,
openai_model: str = "gpt-4",
...
):
super().__init__(
collection_name=collection_name, # ❌ Base doesn't accept this
embedding_model=embedding_model,
...
)After:
def __init__(
self,
config_path: str = "config.yaml", # ✅ Base accepts this
chunk_size: int = 512,
use_semantic_chunking: bool = True,
embedding_model: str = "..."
):
super().__init__(config_path=config_path) # ✅ Works!Problem: pytest: error: unrecognized arguments: --cov=src
Solution: Installed pytest-cov package
Result:
- ✅ All 13 tests passing
- ✅ 82% coverage on base_rag.py
- ✅ HTML coverage report generated
- Import paths fixed across all files
- Enhanced RAG initialization fixed
- Tests running successfully (13/13 passing)
- Coverage reporting working
- Example scripts ready to run
from src.retrieval.base_rag import DocumentSearchRAG, Document
rag = DocumentSearchRAG()
doc = Document(id="1", title="Test", content="Hello world")
rag.index_documents([doc])
results = rag.search("hello")from src.retrieval.enhanced_rag import EnhancedDocumentRAG
# Initialize (requires OPENAI_API_KEY in .env)
rag = EnhancedDocumentRAG()
# Process real documents (PDF, DOCX, etc.)
stats = rag.process_and_index_documents("test_documents/")
# Search with filters
results = rag.enhanced_search(
"machine learning",
category_filter="AI/ML",
include_tables=True
)# Activate environment
source .venv/bin/activate
# Process documents
python examples/process_documents.py test_documents/
# Query system
python examples/query_enhanced_rag.py
# Test local Qdrant
python examples/use_local_qdrant.pypytest # Run all tests
pytest -v # Verbose output
pytest --no-cov # Skip coverage (faster)
open htmlcov/index.html # View coverage reportdocument_search/
├── src/
│ ├── __init__.py ✅ Fixed imports
│ ├── config/
│ │ └── settings.py
│ ├── core/
│ │ ├── pipeline.py
│ │ └── rag_system.py
│ ├── processing/
│ │ └── document_processor.py ← Docling + Chonkie
│ ├── retrieval/
│ │ ├── base_rag.py ← DocumentSearchRAG
│ │ └── enhanced_rag.py ← EnhancedDocumentRAG ✅ Fixed
│ └── storage/
│ ├── database_manager.py
│ └── vector_store.py
├── examples/
│ ├── process_documents.py ✅ Fixed imports
│ ├── query_enhanced_rag.py ✅ Fixed imports
│ └── use_local_qdrant.py ✅ Fixed imports
├── tests/
│ └── test_document_search.py ✅ Fixed imports
├── test_documents/ ← Sample documents
├── qdrant_db/ ← Local vector storage
├── config.yaml ← Configuration
├── .env ← API keys (create from sample.env)
└── pyproject.toml
- API Key Required: Both RAG systems need
OPENAI_API_KEYin.envfile - Config File: Settings are in
config.yaml(collection name, model, etc.) - Local Qdrant: Vector database stored in
./qdrant_db/(no server needed) - Test Documents: Use files in
test_documents/for testing
- ✅ All errors fixed - system is ready to use!
- Add your OpenAI API key to
.envfile - Run example scripts to see the system in action
- Process your own documents
- Build your RAG application!
🎉 Everything is working!