softrag is a minimalist local-first Retrieval-Augmented Generation (RAG) library that uses SQLite with sqlite-vec for efficient storage of documents, embeddings, and cache in a single .db file.
- Local storage: All data is kept in a single SQLite database file
- Pluggable RAG: Inject your own embedding and chat models via dependency injection
- Multi-format support: Ingests Markdown, DOCX, PDF, plain text files, and web pages
- Hybrid retrieval: Combines semantic (vector) search and keyword (FTS5) search
- No external services: Storage requires no cloud services; everything lives in the local database file
- Lightweight: A minimalist design with little runtime overhead
pip install softrag
The library requires the following dependencies:
- sqlite-vec: Vector similarity search in SQLite
- trafilatura: Web content extraction
- langchain-text-splitters: Text chunking (RecursiveCharacterTextSplitter)
- llama-index: Document readers for various file formats
- pymupdf: PDF processing (via llama-index)
These are automatically installed with the package.
- Python 3.12+
- SQLite with extension loading support
- Access to embedding and chat models (OpenAI, Hugging Face, Ollama, etc.)
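The "extension loading support" requirement can be checked before installing anything else. The following is a minimal sketch, not part of softrag: `check_sqlite_environment` is a hypothetical helper name, and it assumes the `sqlite_vec` Python package exposes `sqlite_vec.load(conn)` for loading the extension.

```python
import sqlite3


def check_sqlite_environment() -> dict:
    """Report whether this Python build can load SQLite extensions.

    Hypothetical helper for illustration; not part of softrag.
    """
    report = {
        "sqlite_version": sqlite3.sqlite_version,
        "can_load_extensions": False,
        "sqlite_vec_loaded": False,
    }
    conn = sqlite3.connect(":memory:")
    try:
        # Some Python builds are compiled without extension loading,
        # in which case this method does not exist at all.
        if not hasattr(conn, "enable_load_extension"):
            return report
        conn.enable_load_extension(True)
        report["can_load_extensions"] = True
        try:
            import sqlite_vec  # assumed to be installed alongside softrag

            sqlite_vec.load(conn)
            report["sqlite_vec_loaded"] = True
        except Exception:
            # Package missing or extension failed to load; flag stays False.
            pass
    finally:
        conn.close()
    return report


if __name__ == "__main__":
    print(check_sqlite_environment())
```

If `can_load_extensions` comes back `False`, the Python interpreter itself was built without extension support, and reinstalling sqlite-vec alone will not help.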
from softrag import Rag
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
# Initialize models
chat_model = ChatOpenAI(model="gpt-4o")
embed_model = OpenAIEmbeddings(model="text-embedding-3-small")
# Create RAG instance
rag = Rag(embed_model=embed_model, chat_model=chat_model, db_path="my_knowledge.db")
# Add content
rag.add_file("document.pdf")
rag.add_web("https://example.com/article")
# Query
answer = rag.query("What is the main topic discussed?")
print(answer)
Initialize a new RAG instance.
Parameters:
- embed_model: Model implementing .embed_query(text) -> List[float]
- chat_model: Model implementing .invoke(prompt) -> str
- db_path: Path to the SQLite database file (created if it doesn't exist)
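To make the two interfaces concrete, here is a minimal sketch of stand-in models that satisfy the method signatures listed above. `ToyEmbeddings` and `EchoChat` are hypothetical names, the embeddings are deterministic but carry no semantic meaning, and whether softrag needs anything beyond these two methods (for example for streaming) is not confirmed here.

```python
import hashlib
from typing import List


class ToyEmbeddings:
    """Hypothetical stand-in exposing .embed_query(text) -> List[float].

    The vectors are deterministic but NOT semantically meaningful;
    this is only useful for wiring tests, not for real retrieval.
    """

    def __init__(self, dim: int = 1536) -> None:
        self.dim = dim

    def embed_query(self, text: str) -> List[float]:
        # Stretch a SHA-256 digest into `dim` floats in the range [0, 1].
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [digest[i % len(digest)] / 255.0 for i in range(self.dim)]


class EchoChat:
    """Hypothetical stand-in exposing .invoke(prompt) -> str."""

    def invoke(self, prompt: str) -> str:
        return f"[echo] received {len(prompt)} characters"


if __name__ == "__main__":
    vector = ToyEmbeddings().embed_query("hello")
    print(len(vector))
    print(EchoChat().invoke("hello"))
```

Objects like these illustrate the shape of the dependency-injection contract: any class with the same two methods should, in principle, be accepted in place of a LangChain model.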
Add file content to the knowledge base.
Parameters:
- data: File path (str/Path), bytes, or file-like object
- metadata: Optional dictionary with additional metadata
Supported formats:
- PDF files
- DOCX documents
- Markdown files
- Plain text files
- Any format supported by UnstructuredReader
Example:
from pathlib import Path
rag.add_file("research.pdf", metadata={"author": "John Doe", "year": 2024})
rag.add_file(Path("notes.md"))
Extract and add web page content.
Parameters:
- url: Web page URL
- metadata: Optional dictionary (URL is automatically added)
Example:
rag.add_web("https://arxiv.org/abs/2301.00001", metadata={"type": "paper"})
Query the knowledge base with context-augmented generation.
Parameters:
- question: The question to answer
- top_k: Number of most relevant chunks to retrieve (default: 5)
- stream: If True, returns a generator yielding response chunks
Returns:
- String response (if stream=False)
- Generator yielding chunks (if stream=True)
Example:
# Standard query
answer = rag.query("What are the key findings?", top_k=3)
# Streaming query
for chunk in rag.query("Explain the methodology", stream=True):
    print(chunk, end="", flush=True)
The default chunking uses RecursiveCharacterTextSplitter with 400-character chunks and a 100-character overlap. You can customize this:
# Custom delimiter-based chunking
rag._set_splitter("\n\n") # Split on double newlines
# Custom function
def my_chunker(text):
    # Split on "---" separators and drop empty pieces
    return [chunk.strip() for chunk in text.split("---") if chunk.strip()]
rag._set_splitter(my_chunker)
# Back to default
rag._set_splitter(None)
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
chat_model = ChatOpenAI(model="gpt-4o", temperature=0.1)
embed_model = OpenAIEmbeddings(model="text-embedding-3-small")
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFacePipeline
embed_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
chat_model = HuggingFacePipeline.from_model_id(
    model_id="microsoft/DialoGPT-medium",
    task="text-generation",
)
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings
chat_model = Ollama(model="llama2")
embed_model = OllamaEmbeddings(model="llama2")
The SQLite database contains three main components:
- documents: Stores text chunks and metadata (JSON)
- docs_fts: FTS5 virtual table for keyword search
- embeddings: Vector embeddings using sqlite-vec (1536 dimensions by default)
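Because everything lives in one SQLite file, the components above can be inspected with the standard library alone. This is a small sketch under that assumption; `list_tables` is a hypothetical helper, and the exact set of tables a real softrag database contains (including any shadow tables created by FTS5 or sqlite-vec) is not confirmed here.

```python
import sqlite3
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the names of all tables recorded in the database schema.

    Hypothetical helper for illustration. Virtual tables are also listed
    with type 'table' in sqlite_master, so they show up here too.
    Note: connecting to a path that does not exist creates an empty file.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    print(list_tables("my_knowledge.db"))
```

Reading the schema this way does not require loading sqlite-vec, since it only queries SQLite's own catalog table.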
- Document Processing: Files are parsed and text is extracted
- Chunking: Text is split into manageable chunks (default: 400 chars with 100 overlap)
- Embedding: Each chunk is converted to a vector embedding
- Storage: Chunks, embeddings, and metadata are stored in SQLite
- Retrieval: Queries use hybrid search (keyword + semantic similarity)
- Generation: Retrieved chunks provide context for the language model
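The ingestion half of the steps above (chunking, then storing each distinct chunk once) can be sketched in plain Python. This is a simplified illustration, not softrag's implementation: it uses a fixed sliding window instead of RecursiveCharacterTextSplitter, and `chunk_text` and `dedupe` are hypothetical names. The 400/100 defaults and the SHA-256 keying mirror the values this README states.

```python
import hashlib
from typing import Dict, List


def chunk_text(text: str, size: int = 400, overlap: int = 100) -> List[str]:
    """Split text into fixed-size windows that overlap by `overlap` characters.

    Simplified sliding window; the real splitter also respects separators.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once a window has reached the end of the text.
        if start + size >= len(text):
            break
    return chunks


def dedupe(chunks: List[str]) -> Dict[str, str]:
    """Map SHA-256 hex digest -> chunk, keeping the first copy of each chunk."""
    unique: Dict[str, str] = {}
    for chunk in chunks:
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        unique.setdefault(key, chunk)
    return unique


if __name__ == "__main__":
    pieces = chunk_text("lorem ipsum " * 100)
    print(len(pieces), "chunks,", len(dedupe(pieces)), "unique")
```

With a 400-character window and 100-character overlap, each new chunk starts 300 characters after the previous one, so sentences that straddle a boundary appear in both neighbors.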
- Deduplication: Chunks are deduplicated using SHA-256 hashes
- Hybrid Search: Combines FTS5 keyword search with vector similarity
- Optimized Storage: Uses WAL mode and optimized page size (32KB)
- Embedding Dimensions: Optimized for 1536-dimensional embeddings (OpenAI compatible)
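One common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below shows that technique as an illustration of what "hybrid search" can mean; this README does not say how softrag actually merges its FTS5 and vector results, so treat the function name, the `k = 60` constant, and the approach itself as assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]],
    k: int = 60,
    top_k: int = 5,
) -> List[Hashable]:
    """Merge several ranked lists of document ids into one list.

    Each id scores 1 / (k + rank) per list it appears in (rank starts at 1);
    ids ranked well by multiple lists accumulate the highest totals.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    ordered = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    return ordered[:top_k]


if __name__ == "__main__":
    keyword_hits = [1, 2, 3]  # hypothetical ids from a keyword search
    vector_hits = [3, 1, 4]   # hypothetical ids from a vector search
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

RRF only needs rank positions, which sidesteps the problem that keyword scores and vector distances are on unrelated scales.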
Common issues and solutions:
- sqlite-vec not found: Ensure sqlite-vec is properly installed
- Unsupported file format: Check if the file type is supported or use UnstructuredReader
- Empty results: Verify documents were added successfully and embeddings are working
- Model compatibility: Ensure your models implement the required interfaces
- Batch Processing: Add multiple files before querying for better performance
- Metadata Usage: Include relevant metadata for better document organization
- Chunk Size: Adjust chunk size based on your content type and model context window
- Model Selection: Choose embedding models compatible with your use case
- Database Backup: Regularly back up your .db file, as it contains all your data
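For the backup recommendation, Python's standard `sqlite3` module offers an online backup API that copies a database consistently even while it is in use. The sketch below assumes nothing about softrag beyond the database being a regular SQLite file; `backup_database` and the file names are hypothetical.

```python
import sqlite3


def backup_database(src_path: str, dest_path: str) -> None:
    """Copy the SQLite database at src_path into dest_path.

    Uses sqlite3.Connection.backup (Python 3.7+), which produces a
    consistent snapshot rather than a raw file copy.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)
    finally:
        dest.close()
        src.close()


if __name__ == "__main__":
    backup_database("my_knowledge.db", "my_knowledge.backup.db")
```

This is generally safer than copying the file by hand when the database runs in WAL mode, because recent writes may still sit in the separate `-wal` file.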
EmbedFn = Callable[[str], List[float]]
ChatFn = Callable[[str, Sequence[str]], str]
Chunker = Union[str, Callable[[str], List[str]], None]
FileInput = Union[str, Path, bytes, bytearray, IO[bytes], IO[str]]
- _retrieve(query, k): Retrieve the k most relevant chunks
- _persist(text, metadata): Store text chunks with embeddings
- _extract_file(data): Extract text from various file formats
- _extract_web(url): Extract text from web pages
- _ensure_db(): Initialize the database and sqlite-vec
- _create_schema(): Create the database tables