A locally hosted RAG (Retrieval-Augmented Generation) system for intelligent document research. Upload your own private documents, ask questions in plain English, and get AI-powered answers with source citations—all running privately on your machine.
Obviously, no current laptop has the GPU power to run top-of-the-line models and techniques across a full RAG pipeline. This project aims to allow evaluating multiple small models and features together to find an optimal pipeline for a given set of documents that remains reasonably usable on a local computer.
- Question Answering: Local LLM for answering questions about your documents
- Document Formats: Supports txt, md, pdf, docx, pptx, xlsx, html
- Chat History: Session-based conversation memory persists across restarts
- Local Deployment: Can run entirely on your machine with no external API dependencies
- Web Interface: ChatGPT-like interface with document management
- Contextual Retrieval: Document context embedded with chunks for improved accuracy
- Structure Preservation: Maintains document hierarchy including headings, sections, and tables
- Hybrid Search: BM25 keyword matching combined with vector semantic search
- Reranking: Cross-encoder model for result refinement, using a small model loaded in memory (see the sketch after this list)
- Async Processing: Background document indexing via Celery task queue
- Progress Tracking: Real-time status updates for document uploads
- Source Attribution: Results include document excerpts and metadata
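The hybrid search and reranking steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: it assumes the `rank_bm25`, `sentence-transformers`, and `numpy` packages, uses a toy in-memory corpus, and fuses the two rankings with reciprocal rank fusion (RRF).

```python
# Minimal sketch of hybrid search + reranking; not the project's actual code.
# Assumes: pip install rank_bm25 sentence-transformers numpy
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder, SentenceTransformer

chunks = [
    "Celery workers index uploaded documents in the background.",
    "ChromaDB stores one embedding per document chunk.",
    "Hybrid search combines BM25 keyword matching with vector search.",
]
query = "How are documents indexed?"

# Keyword side: BM25 over whitespace-tokenized chunks.
bm25_scores = BM25Okapi([c.lower().split() for c in chunks]).get_scores(query.lower().split())
bm25_ranking = list(np.argsort(-bm25_scores))

# Semantic side: cosine similarity of normalized dense embeddings.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
sims = encoder.encode(chunks, normalize_embeddings=True) @ encoder.encode(query, normalize_embeddings=True)
vec_ranking = list(np.argsort(-sims))

# Fuse the two rankings with reciprocal rank fusion (RRF).
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, idx in enumerate(ranking):
            scores[idx] = scores.get(idx, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

candidates = rrf([bm25_ranking, vec_ranking])

# Rerank the fused candidates with a small cross-encoder.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, chunks[i]) for i in candidates])
for i, s in sorted(zip(candidates, scores), key=lambda t: -t[1]):
    print(f"{s:.3f}  {chunks[i]}")
```

RRF is used in this sketch because it fuses the two rankings without having to normalize BM25 and cosine scores onto a common scale.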
- Frontend: SvelteKit + Tailwind CSS
- Backend: Python + FastAPI
- Vector Database: ChromaDB
- AI Models: Ollama (local LLM and embeddings; see the sketch after this list)
- Document Processing: Docling + LlamaIndex
- Task Queue: Celery + Redis
- Package Management: uv
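For a rough idea of how the vector store and embedding model interact, the sketch below writes chunks to ChromaDB using embeddings from Ollama's nomic-embed-text. It is not the project's actual code; the collection name and storage path are made up for the example.

```python
# Sketch of the ChromaDB + Ollama embedding flow; names are illustrative only.
# Assumes: pip install chromadb ollama, with Ollama running locally.
import chromadb
import ollama

def embed(text: str) -> list[float]:
    # nomic-embed-text is the default embedding model in this stack.
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

client = chromadb.PersistentClient(path="./chroma_demo")  # hypothetical path
collection = client.get_or_create_collection("docs")      # hypothetical name

chunks = ["Reranking uses a small cross-encoder.", "Redis brokers Celery tasks."]
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=[embed(c) for c in chunks],
)

results = collection.query(query_embeddings=[embed("What brokers tasks?")], n_results=1)
print(results["documents"][0][0])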
┌────────────────────────────────────┐
│ External: Ollama (Host Machine) │
End User │ host.docker.internal:11434 │
(browser) │ - LLM: gemma3:4b │
│ │ - Embeddings: nomic-embed-text │
│ └────────────────────────────────────┘
│ │
│ Public Network (host) │
--------------│------------------------------│-------------------
│ Private Network (Docker) │
▼ │
┌─────────────────┐ ┌──────────────────────┐
│ WebApp │ HTTP │ RAG Server │
│ (SvelteKit) │◄─────────►│ (FastAPI) │
│ Port: 8000 │ │ Port: 8001 │
└─────────────────┘ │ │
│ ┌────────────────┐ │
│ │ Docling │ │
│ │ + LlamaIndex │ │
│ │ + Hybrid Search│ │
│ │ + Reranking │ │
│ └────────────────┘ │
└──────────┬───────────┘
│
│
┌─────────────┬───────────────────┼─────────┐
│ │ │ │
┌────▼─────┐ ┌────▼────┐ ┌──────▼──────┐ │
│ ChromaDB │ │ Redis │ │ Celery │ │
│ (Vector │ │(Message │ │ Worker │ │
│ DB) │ │ Broker) │ │ (Async │ │
└──────────┘ └─────────┘ │ Processing) │ │
└──────┬──────┘ │
│ │
┌───────────────────┐
│ Shared Volume │
│ /tmp/shared │
│ (File Transfer) │
└───────────────────┘
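The async indexing path in the diagram (Redis as broker, a Celery worker picking up files from the shared volume) could be wired roughly as follows. The task name, broker URLs, and progress payload are assumptions for illustration, not the project's actual code.

```python
# Hypothetical sketch of the background indexing task; not the project's code.
# Assumes: pip install celery redis, with Redis reachable at the service name "redis".
from celery import Celery

app = Celery("rag_tasks", broker="redis://redis:6379/0", backend="redis://redis:6379/1")

@app.task(bind=True)
def index_document(self, shared_path: str) -> dict:
    """Parse a file dropped into the shared volume and index its chunks."""
    # 1. Parse with Docling / LlamaIndex (omitted here).
    # 2. Embed chunks via Ollama and write them to ChromaDB.
    # 3. Report progress so the WebApp can show real-time status updates.
    self.update_state(state="PROGRESS", meta={"file": shared_path, "pct": 50})
    return {"file": shared_path, "status": "indexed"}

# The RAG server would enqueue an upload roughly like this:
# index_document.delay("/tmp/shared/report.pdf")
```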
- Docker container host (Docker Desktop, OrbStack, or Podman)
- Ollama for running AI models locally
- Python 3.13 and uv package manager
# Install Docker alternative (faster than Docker Desktop)
brew install orbstack
# Install uv package manager
brew install uv
# Install Ollama (recommended method)
curl https://ollama.ai/install.sh | sh
# Pull required models
ollama pull gemma3:4b # Inference model
ollama pull nomic-embed-text # Embedding model

git clone [email protected]:gittycat/rag-docling.git
cd rag-docling

# Create secrets directory
mkdir -p secrets

# Add Ollama configuration if needed
echo "OLLAMA_HOST=http://host.docker.internal:11434" > secrets/ollama_config.env

docker compose up

Open the WebApp at http://localhost:8000
- Open http://localhost:8000
- Navigate to the Admin section
- Click "Upload Documents"
- Select files (PDF, DOCX, TXT, MD, etc.)
- Monitor progress in real-time
- Use the main page to query your documents (or query the API directly, as sketched below)
- View Documents: See all indexed documents in the Admin section
- Delete Documents: Remove documents you no longer need
- Clear Chat: Start a fresh conversation anytime
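If you prefer scripted access over the web UI, you can query the RAG server directly. The endpoint path and payload below are hypothetical placeholders, and the example assumes port 8001 is published to the host; see DEVELOPMENT.md for the actual API.

```python
# Hypothetical example of querying the RAG server over HTTP.
# The /query path and payload shape are assumptions; check DEVELOPMENT.md for the real API.
import requests

resp = requests.post(
    "http://localhost:8001/query",  # RAG server port from the diagram
    json={"question": "What does the Q3 report say about revenue?"},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```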
Basic configuration is handled through environment variables in docker-compose.yml. For most users, the defaults work well.
- Models: Change LLM or embedding models (default: gemma3:4b, nomic-embed-text)
- Retrieval: Adjust the number of results returned (default: 10)
- Features: Enable/disable hybrid search, contextual retrieval, reranking
For detailed configuration options, see DEVELOPMENT.md.
Most of the extra documentation is intended for development with Claude Code. It includes more in-depth research on techniques to improve accuracy. I will probably migrate some of this info to Claude Skills in the future.
- DEVELOPMENT.md: API documentation, configuration details, troubleshooting
- CLAUDE.md: Project guide for Claude Code development
- docs/: Additional guides and implementation details
TODO: Detail the roadmap.
Overall, the immediate goal is to improve evaluation (making it easier to compare features and models), then to migrate the application online, which will require security and data privacy improvements.
MIT License