A document question-answering system that searches your documents and answers questions using their content.
This system processes documents, stores them in a searchable format, and uses a language model to answer questions based on the retrieved content. It supports PDF, DOCX, and TXT files.
- Document loading for PDF, DOCX, and TXT files
- Automatic text chunking with overlap
- Vector-based search using ChromaDB
- Support for multiple language model providers (Ollama, OpenAI, Groq)
- Conversation memory for context across multiple questions
- Automatic indexing of new documents
The system follows a Retrieval Augmented Generation (RAG) architecture with conversation history management. The complete system flow is as follows:
- User Interaction: Users initiate queries through a session interface
- Semantic Search: The system performs semantic search against the indexed knowledge base (ChromaDB)
- Context Retrieval: Relevant document chunks are retrieved based on semantic similarity
- Conversation History: The last 10 conversation exchanges for the current session are fetched
- Final Prompt Construction: The system combines the following (see the sketch after this list):
  - System Prompt
  - Conversation History (last 10 exchanges)
  - Retrieved Context
  - User Query
- LLM Processing: The assembled prompt is sent to the Large Language Model
- Response Generation: The LLM generates a response based on the provided context
- History Storage: The conversation (query and response) is stored for future context
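The prompt templates live in src/prompt/ and are not reproduced here; the following is a minimal sketch of the assembly order described above, with function and variable names that are illustrative assumptions rather than the repository's API:

```python
# Illustrative sketch of the final prompt assembly; names are assumptions.
SYSTEM_PROMPT = "Answer the question using only the provided context."

def build_prompt(history: list[tuple[str, str]], context_chunks: list[str], query: str) -> str:
    # Keep only the last 10 (question, answer) exchanges, as described above
    recent = history[-10:]
    history_text = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in recent)
    context_text = "\n\n".join(context_chunks)
    return (
        f"{SYSTEM_PROMPT}\n\n"
        f"Conversation history:\n{history_text}\n\n"
        f"Context:\n{context_text}\n\n"
        f"Question: {query}\nAnswer:"
    )
```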
The document ingestion process follows these steps:
- Load: Raw documents (PDF, DOCX, TXT) are loaded from the data directory
- Split: Documents are split into smaller chunks with overlap for better context preservation (a minimal splitter sketch follows this list)
- Embed: Text chunks are converted into numerical vector embeddings using sentence transformers
- Store: Embeddings and metadata are stored in ChromaDB for efficient semantic search
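The actual chunker lives in src/ingest/ and may differ; as a minimal sketch of character-based splitting with overlap, using the 500/50 defaults mentioned later in this README:

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` characters."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```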
- Python 3.9 or higher
- Ollama (if using Ollama provider) - must be running locally on port 11434
- Clone the repository:
```bash
git clone <repository-url>
cd rag-from-scratch
```
- Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
Ollama: Make sure Ollama is installed and running. Edit src/main.py:
```python
llm = OllamaProvider(model="llama3")
```
OpenAI: Set your API key and update src/main.py:
```bash
export OPENAI_API_KEY="your-api-key-here"
```
```python
from provider.openai_provider import OpenAIProvider
llm = OpenAIProvider(model="gpt-4o-mini")
```
Groq: Install the package, set your API key, and update src/main.py:
```bash
pip install groq
export GROQ_API_KEY="your-api-key-here"
```
```python
from provider.groq_provider import GroqProvider
llm = GroqProvider(model="llama3.1:8b")
```
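The provider classes configured above live in src/provider/ and are not reproduced here. As a rough sketch of what an Ollama-backed provider might look like (the class shape and method name are assumptions; the endpoint is Ollama's standard local generate API, and requests is assumed to be installed):

```python
import requests

class OllamaProvider:
    """Illustrative sketch only; the repository's actual class may differ."""

    def __init__(self, model: str = "llama3", host: str = "http://localhost:11434"):
        self.model = model
        self.host = host

    def generate(self, prompt: str) -> str:
        # Ollama's local generate endpoint; stream=False returns a single JSON object
        response = requests.post(
            f"{self.host}/api/generate",
            json={"model": self.model, "prompt": prompt, "stream": False},
            timeout=120,
        )
        response.raise_for_status()
        return response.json()["response"]
```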
- Place your documents in the `data/` folder
- Run the application:
```bash
python src/main.py
```
- Ask questions when prompted. Type `exit` or `quit` to end.
```
rag-from-scratch/
├── data/               # Place your documents here
├── src/
│   ├── ingest/         # Document loading and chunking
│   ├── vectorstore/    # ChromaDB integration
│   ├── provider/       # LLM providers (Ollama, OpenAI, Groq)
│   ├── memory/         # Conversation memory
│   ├── prompt/         # Prompt templates
│   └── main.py         # Main entry point
├── storage/            # ChromaDB data storage
└── requirements.txt
```
- Document Processing: Files in the `data/` folder are loaded and split into chunks of 500 characters with a 50-character overlap.
- Indexing: Text chunks are converted to embeddings and stored in ChromaDB with metadata about their source file.
- Search: When you ask a question, the system searches for the 4 most relevant document chunks using semantic similarity (see the sketch below).
- Answer Generation: The retrieved chunks are combined with your question and sent to the language model, which generates an answer based on the provided context.
- Memory: The conversation history is maintained to provide context for follow-up questions.
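As a sketch of the search step using ChromaDB's Python client (the collection name, storage path, and use of ChromaDB's default embedding function are assumptions; the real integration lives in src/vectorstore/):

```python
import chromadb

# Persistent client pointed at the storage/ directory used by this project
client = chromadb.PersistentClient(path="storage")
collection = client.get_or_create_collection("documents")  # collection name is an assumption

def search(question: str, k: int = 4) -> list[str]:
    """Return the k most relevant document chunks for the question."""
    results = collection.query(query_texts=[question], n_results=k)
    return results["documents"][0]
```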
- `chromadb` - Vector database for storing embeddings
- `sentence-transformers` - Embedding model (all-MiniLM-L6-v2)
- `PyPDF2` - PDF file reading
- `python-docx` - DOCX file reading
- `openai` - OpenAI API client (optional)
- Python 3.9+
- Ollama (if using Ollama provider)
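The loaders in src/ingest/ are not shown here; a minimal loader for the three supported formats, built on the libraries listed above, might look like this (the function name and structure are assumptions):

```python
from pathlib import Path

from PyPDF2 import PdfReader
from docx import Document

def load_document(path: str) -> str:
    """Return the plain text of a PDF, DOCX, or TXT file."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        reader = PdfReader(path)
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if suffix == ".docx":
        return "\n".join(p.text for p in Document(path).paragraphs)
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8")
    raise ValueError(f"Unsupported file type: {suffix}")
```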


