# local-rag

A local Retrieval-Augmented Generation (RAG) system using ChromaDB and Ollama, with three operational modes for different use cases.
## Project Structure

```
local-rag/
├── config.py              # Configuration settings with triple-mode support
├── process_docs.py        # Script to process and index documents into ChromaDB
├── rag_query.py           # Triple-mode query interface (QA, Summary, Extract)
├── extract_documents.py   # Systematic document extraction using map-reduce
├── check_db.py            # Script to check the database contents
├── debug_db.py            # Database debugging utilities
├── test_rag.py            # Test script for RAG functionality
├── documents/             # Folder containing documents to be indexed
├── chroma_db/             # ChromaDB vector database storage
├── other/                 # Development/experimental features (web interface, etc.)
│   ├── web_rag.py         # Flask-based web server (in development)
│   ├── test_web.py        # Web server testing script
│   ├── start_server.sh    # Script to start the web server
│   ├── stop_server.sh     # Script to stop the web server
│   └── NETWORK_ACCESS.md  # Network access documentation
├── pixi.toml              # Pixi dependency configuration
└── pixi.lock              # Pixi lock file
```
## Prerequisites: Ollama

This RAG system requires Ollama to be installed and running on your machine. Ollama provides the local LLM capabilities.

**Installation:**

- macOS/Linux:

  ```bash
  curl -fsSL https://ollama.com/install.sh | sh
  ```

- Windows: download the installer from [ollama.com](https://ollama.com)

**Start Ollama:**

```bash
ollama serve
```

**Required models:**

The system uses the following models by default (configured in `config.py`):

- LLM: `llama3.1:8b` (text generation)
- Embeddings: `nomic-embed-text` (document embeddings)

Pull these models before running the RAG system:

```bash
ollama pull llama3.1:8b
ollama pull nomic-embed-text
```

**Verify Ollama is running:**

```bash
ollama list  # Should show the installed models
```

## Dependencies

The project uses Pixi for dependency management. All dependencies are already configured in `pixi.toml`.
## Indexing Documents

To index documents into the vector database:

```bash
pixi run python process_docs.py
```

This will process all documents in the `documents/` folder and create embeddings in ChromaDB.

**Configuration:** The system uses optimized chunking parameters:

- Chunk size: 512 tokens (for semantic coherence)
- Chunk overlap: 128 tokens (25% overlap for context preservation)
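The sliding-window effect of these two parameters can be sketched in a few lines. This is a simplified illustration over whitespace tokens; `process_docs.py` most likely uses a library text splitter with token-aware counting instead.

```python
# Simplified sketch of 512/128 sliding-window chunking.
# Each new chunk starts chunk_size - overlap tokens after the previous one,
# so consecutive chunks share `overlap` tokens of context.

def chunk_tokens(tokens, chunk_size=512, overlap=128):
    """Split a token list into chunks of chunk_size with `overlap` tokens shared."""
    step = chunk_size - overlap  # 384 new tokens per chunk
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Example: a 1000-token document yields 3 overlapping chunks.
chunks = chunk_tokens([f"tok{i}" for i in range(1000)])
```

The 25% overlap means a sentence falling near a chunk boundary still appears intact in at least one chunk.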
## Operational Modes

The RAG system supports three operational modes, each optimized for a different use case.

### QA Mode

For specific questions requiring precise answers from the most relevant documents.

**Parameters:**

- Retrieves the top-5 most relevant chunks
- Uses MMR (Maximum Marginal Relevance) for diversity
- Low temperature (0.1) for factual responses

**Usage:**

```bash
# Interactive mode
pixi run python rag_query.py

# Single question
pixi run python rag_query.py "What are nicotine pouches?"

# Without sources
pixi run python rag_query.py --no-sources "What health effects are documented?"
```

### Summary Mode

For broad questions requiring comprehensive analysis across many documents.
**Parameters:**

- Retrieves the top-50 most relevant chunks
- Uses similarity search for maximum relevance
- Higher temperature (0.3) for synthesized responses

**Usage:**

```bash
# Interactive mode
pixi run python rag_query.py --mode summary

# Single question
pixi run python rag_query.py --mode summary "Summarize all health effects research"

# Save output to file
pixi run python rag_query.py --mode summary "Overview of nicotine pouch research" > summary.txt
```

### Extract Mode

For extracting structured information from ALL documents systematically using a map-reduce pattern. This mode processes every document individually, then combines the results.
**Parameters:**

- Zero temperature (0.0) for consistent extraction
- Processes documents in batches
- Map-reduce approach for comprehensive coverage

**Usage:**

```bash
# Test with a limited number of documents first
pixi run python extract_documents.py "List all chemicals mentioned" --max-docs 5

# Extract from all documents and save to JSON
pixi run python extract_documents.py "List all chemicals mentioned" -o chemicals.json

# Extract metadata
pixi run python extract_documents.py "Extract: title, authors, publication year, journal" -o metadata.json

# Classify papers
pixi run python extract_documents.py "Classify each paper as: Review Article, Original Research, or Meta-Analysis" -o paper_types.json

# Find specific mentions
pixi run python extract_documents.py "Which papers mention ONP? List paper title and what they say about ONP" -o onp_papers.json

# Quiet mode (minimal output)
pixi run python extract_documents.py "Extract study designs and sample sizes" -o studies.json -q
```

⏱️ **Performance note:** Extract mode processes each document through the LLM sequentially. For 100 documents, expect 30-60 minutes of processing time. Always test with `--max-docs 5` first.
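The map-reduce flow can be sketched as follows. Here `call_llm`, the prompt wording, and the return shape are illustrative stand-ins, not the script's actual code:

```python
# Sketch of the map-reduce pattern behind extract_documents.py.
# `call_llm` is a hypothetical stand-in for the real Ollama call;
# the actual prompts and batching logic in the script may differ.

def map_reduce_extract(documents, question, call_llm, batch_size=10):
    """Map: run the extraction prompt on each document. Reduce: combine answers."""
    per_document = []
    for start in range(0, len(documents), batch_size):
        for doc in documents[start:start + batch_size]:
            answer = call_llm(
                f"From the document below, {question}.\n"
                "If the information is absent, reply 'Not mentioned'.\n\n"
                f"{doc}"
            )
            per_document.append(answer)
    # Reduce step: a final LLM pass combines the per-document answers.
    combined = call_llm(
        "Combine and deduplicate these per-document extractions:\n"
        + "\n".join(per_document)
    )
    return {"per_document": per_document, "combined": combined}
```

Because the map step visits every document (not just retrieved chunks), runtime grows linearly with corpus size, which is why `--max-docs` matters for testing.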
Extract Mode Features:
- ✅ Processes every document systematically (not just retrieved chunks)
- ✅ Real-time progress tracking with ETA
- ✅ Handles missing information gracefully ("Not mentioned")
- ✅ Can infer/classify based on content (e.g., paper types)
- ✅ Saves individual extractions + combined results to JSON
- ✅ Deduplicates and structures final output
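The deduplication step listed above could look like the order-preserving helper below. This is a sketch: the function name is ours, and the real script may instead deduplicate inside the reduce prompt.

```python
# Order-preserving deduplication of extracted items, as a final cleanup
# pass might do before writing the combined JSON (illustrative sketch).

def dedupe_preserve_order(items):
    """Drop repeated items (case-insensitive, whitespace-trimmed), keeping first-seen order."""
    seen = set()
    out = []
    for item in items:
        key = item.strip().lower()
        if key and key not in seen:
            seen.add(key)
            out.append(item)
    return out
```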
## Interactive Commands

In interactive mode (`pixi run python rag_query.py`), you can use these commands:

```
# Switch modes
mode qa       # Switch to QA mode
mode summary  # Switch to Summary mode
mode extract  # Shows extract_documents.py usage

# Toggle source display
sources on    # Show source documents
sources off   # Hide source documents

# Exit
quit          # or 'exit' or 'q'
```

## Checking the Database

To view the documents and chunks stored in the database:

```bash
pixi run python check_db.py
```

## Testing

To run tests on the RAG functionality:

```bash
pixi run python test_rag.py
```

## Configuration

Edit `config.py` to modify system behavior. The configuration includes mode-specific settings:
- Document paths: where to find documents to index
- Chunk size: 512 tokens (optimized for semantic coherence)
- Chunk overlap: 128 tokens (25% overlap)
- Embedding model: `nomic-embed-text`
- LLM model: `llama3.1:8b`
**QA Mode:**

- `RETRIEVAL_K`: 5 chunks
- `RETRIEVAL_SEARCH_TYPE`: `"mmr"` (Maximum Marginal Relevance)
- `TEMPERATURE`: 0.1 (factual)

**Summary Mode:**

- `RETRIEVAL_K`: 50 chunks
- `RETRIEVAL_SEARCH_TYPE`: `"similarity"` (relevance-focused)
- `TEMPERATURE`: 0.3 (synthesized)

**Extract Mode:**

- `TEMPERATURE`: 0.0 (consistent)
- `BATCH_SIZE`: 10 documents per batch
- Custom prompts for map-reduce extraction
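Grouped together, these per-mode settings might be expressed as a dictionary like the one below. This is illustrative only; the actual variable layout in `config.py` may differ.

```python
# Hypothetical grouping of the per-mode settings described above.
# Values mirror the documented defaults; the name MODE_SETTINGS is ours.
MODE_SETTINGS = {
    "qa":      {"RETRIEVAL_K": 5,  "RETRIEVAL_SEARCH_TYPE": "mmr",        "TEMPERATURE": 0.1},
    "summary": {"RETRIEVAL_K": 50, "RETRIEVAL_SEARCH_TYPE": "similarity", "TEMPERATURE": 0.3},
    "extract": {"TEMPERATURE": 0.0, "BATCH_SIZE": 10},
}
```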
Change the default mode:

```python
DEFAULT_MODE = "qa"  # or "summary" or "extract"
```

## Which Mode Should I Use?

| Task | Recommended Mode | Example |
|---|---|---|
| Quick fact lookup | QA Mode | "What is the nicotine content in ZYN?" |
| Literature review | Summary Mode | "Summarize all health effects research" |
| Data extraction | Extract Mode | "List all chemicals in each paper" |
| Finding papers | Extract Mode | "Which papers mention oral nicotine pouches?" |
| Classification | Extract Mode | "Classify papers as review or original research" |
| Metadata extraction | Extract Mode | "Extract: title, authors, year, journal" |
## Key Files

- `config.py` - Central configuration with triple-mode support
- `process_docs.py` - Document processing and indexing pipeline
- `rag_query.py` - Triple-mode RAG interface (QA, Summary, Extract-aware)
- `extract_documents.py` - Systematic extraction with map-reduce (new)
- `check_db.py` - Database inspection utilities
- `debug_db.py` - Database debugging tools
- `test_rag.py` - Testing suite for the RAG system
## Example Workflows

**Extract Mode:**

```bash
# Extract chemicals with health effects
pixi run python extract_documents.py "For each paper, list: 1) Chemicals mentioned, 2) Associated health effects" -o chemicals_effects.json

# Find papers matching specific criteria
pixi run python extract_documents.py "List papers that discuss cardiovascular effects. Include paper title and main findings" -o cardio_papers.json

# Extract study characteristics
pixi run python extract_documents.py "Extract: Study type (RCT/observational/review), Sample size, Population studied, Main outcome" -o study_design.json

# Comparative analysis
pixi run python extract_documents.py "Compare nicotine concentrations reported across papers" -o nicotine_comparison.json
```

**Summary Mode:**

```bash
# Comprehensive topic overview
pixi run python rag_query.py --mode summary "What are the main health concerns with nicotine pouches?"

# Research trends
pixi run python rag_query.py --mode summary "What research methodologies are commonly used?"

# Synthesize findings
pixi run python rag_query.py --mode summary "Summarize contradictory findings across studies"
```

**QA Mode:**

```bash
# Specific factual questions
pixi run python rag_query.py "What is the FDA's stance on nicotine pouches?"

# Quick lookups
pixi run python rag_query.py "What brands are mentioned most frequently?"

# Definition queries
pixi run python rag_query.py "What is snus?"
```

## Experimental Features

The `other/` directory contains experimental and development features:
- Web interface (in development): a Flask-based web server providing a browser interface for the RAG system
- See `other/NETWORK_ACCESS.md` for network configuration details
- Use `other/start_server.sh` and `other/stop_server.sh` for server management
## Chunking Strategy

The system uses 512-token chunks (reduced from 1000) for:
- More precise semantic matching
- Better retrieval accuracy
- Reduced noise in results
## Mode Performance

| Mode | Speed | Coverage | Best For |
|---|---|---|---|
| QA | ⚡ Fast (seconds) | 5 chunks | Specific questions |
| Summary | 🏃 Medium (seconds) | 50 chunks | Broad overviews |
| Extract | 🐌 Slow (30-60 min for 100 docs) | All documents | Systematic extraction |
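The 30-60 minute figure for Extract mode works out to roughly 18-36 seconds per document. A quick helper for estimating a full run (hypothetical, not part of the repo):

```python
# Back-of-the-envelope ETA for an Extract run, using the ~30-60 min
# per 100 documents figure from the table above (18-36 s per document).

def extract_eta_minutes(n_docs, sec_per_doc_range=(18, 36)):
    """Return (low, high) estimated minutes to extract from n_docs documents."""
    lo, hi = sec_per_doc_range
    return n_docs * lo / 60, n_docs * hi / 60
```

For example, a 500-document corpus would land somewhere between 2.5 and 5 hours, which is why running overnight is suggested in Troubleshooting below.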
## Notes

- The system uses the default "langchain" collection in ChromaDB
- Documents are automatically chunked with the optimized parameters (512/128)
- Ensure Ollama is running before starting any RAG operations
- Extract mode processes documents sequentially; use `--max-docs` for testing
- All modes support source citation display (toggle with `--sources` / `--no-sources`)
- JSON output from Extract mode is structured for programmatic processing
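The "ensure Ollama is running" check can also be done programmatically: Ollama's HTTP API listens on port 11434 by default, and `/api/tags` (the endpoint behind `ollama list`) is cheap to probe. A minimal sketch, with a helper name of our choosing:

```python
# Probe the local Ollama server before starting RAG operations.
# Assumes the default Ollama port (11434); adjust base_url if you changed it.
import urllib.request
import urllib.error


def ollama_is_running(base_url="http://127.0.0.1:11434"):
    """Return True if the Ollama HTTP API responds at base_url."""
    try:
        # /api/tags lists installed models and is inexpensive to call.
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    print("Ollama running:", ollama_is_running())
```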
## Troubleshooting

**Extract mode is slow:**

- This is expected behavior: it processes every document through the LLM
- Use `--max-docs 5` to test before full runs
- Consider running overnight for large document sets

**Out-of-context errors:**

- Reduce `MAX_CONTEXT_LENGTH` in `config.py`
- Reduce the chunk size or retrieval count

**Poor retrieval quality:**

- Try a different mode (Summary Mode for broader coverage)
- Adjust the temperature settings in `config.py`
- Rephrase your query to be more specific

**Database issues:**

- Run `check_db.py` to verify contents
- Reprocess documents with `process_docs.py` if needed