dxuxlxa/Local-RAG

Dead simple local RAG API.

Forked from https://github.com/fadawkas/RAG-System_Kuasar, heavily improved and tuned for my use case. The original compose file had three services:

  • The API (Python image)
  • Chroma DB
  • Ollama

Improvements in this fork

  • Ability to ingest web URLs in addition to PDFs
  • Segmented dev and prod workflows (compose & deployment.yaml)
  • Cleaned-up compose file

I initially converted the compose file to a production-ready Kubernetes manifest. Then, for local development, I removed the Ollama and Python services from the compose file (opting to use the ones already installed locally) and kept only the ChromaDB service, with persistence enabled. I also switched the package manager to Poetry; the pip implementation and the three-service compose file are still available in this repo's earlier commits.

Overview

This system provides a FastAPI-based backend for document processing, vector storage, and question answering using the RAG pattern. It allows you to:

  1. Upload PDF documents for processing and vector storage
  2. Process web content from URLs
  3. Ask questions against your stored knowledge base

Architecture

The system consists of:

  • FastAPI Service: Handles API endpoints for document upload, web content processing, and question answering
  • ChromaDB: Vector database for storing and retrieving document embeddings
  • Ollama: Local LLM provider for both text generation and embeddings
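
To make the flow concrete, here is a minimal sketch of the retrieve-then-generate loop using the chromadb and ollama Python clients. The collection name ("docs") is a made-up placeholder, and the model names are the defaults from the Configuration section below; the real implementation lives in main.py.

import chromadb
import ollama

# ChromaDB from the compose file; "docs" is a hypothetical collection name.
chroma = chromadb.HttpClient(host="localhost", port=8000)
collection = chroma.get_or_create_collection("docs")

def ask(question: str) -> str:
    # Embed the question with the embeddings model.
    q_emb = ollama.embeddings(model="mxbai-embed-large", prompt=question)["embedding"]
    # Retrieve the most similar stored chunks.
    hits = collection.query(query_embeddings=[q_emb], n_results=4)
    context = "\n\n".join(hits["documents"][0])
    # Generate an answer grounded in the retrieved context.
    answer = ollama.generate(
        model="DeepseekCoderV2",
        prompt=f"Answer using only this context:\n{context}\n\nQuestion: {question}",
    )
    return answer["response"]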

Prerequisites

  • Docker and Docker Compose (for ChromaDB)
  • Python 3.8+
  • miniconda3
  • Poetry (the pip-based setup lives in earlier commits)
  • Ollama installed locally with your models of choice; you will need two:
    • one for text generation
    • one for embeddings
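
For example, assuming the defaults from the Configuration section below (the model name set in main.py must match whatever tag you actually pull):

    ollama pull deepseek-coder-v2
    ollama pull mxbai-embed-large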

Installation

  1. Clone the repository
  2. Start ChromaDB with Docker:
    docker-compose up -d vector-store
  3. Set up the Python environment: conda init, then conda activate, followed by poetry install

Once the FastAPI app is running, the API is available at http://localhost:5003.
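
To sanity-check that ChromaDB is reachable before starting the app, a minimal sketch with the chromadb client (port 8000, per the compose file):

import chromadb

# heartbeat() returns a timestamp when the server is reachable.
client = chromadb.HttpClient(host="localhost", port=8000)
print(client.heartbeat())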

API Endpoints

Upload PDF Document

POST /upload/

Uploads and processes a PDF file, storing its content in the vector database.
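
For example, with the requests library; this assumes the endpoint reads a multipart field named "file", so check main.py for the actual field name:

import requests

# Upload a local PDF as multipart/form-data ("file" is an assumed field name).
with open("document.pdf", "rb") as f:
    resp = requests.post("http://localhost:5003/upload/", files={"file": f})
print(resp.json())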

Process Web Content

POST /upload_web/

Processes content from web URLs and stores it in the vector database. Request body example:

{
  "urls": ["https://example.com/article"]
}
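
The same request from Python:

import requests

# Submit one or more URLs to be fetched, chunked, and embedded.
resp = requests.post(
    "http://localhost:5003/upload_web/",
    json={"urls": ["https://example.com/article"]},
)
print(resp.json())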

Ask Questions

POST /question/

Asks a question against the stored document base. Request body example:

{
  "question": "What is RAG?"
}
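
And from Python (the exact response shape depends on main.py, so printing the raw JSON is the safe default):

import requests

# Ask a question against everything ingested so far.
resp = requests.post(
    "http://localhost:5003/question/",
    json={"question": "What is RAG?"},
)
print(resp.json())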

Configuration

The main configuration parameters are defined at the top of main.py:

  • generativeModelName: The Ollama model used for text generation (default: "DeepseekCoderV2")
  • embeddingsModelName: The Ollama model used for embeddings (default: "mxbai-embed-large")
  • UPLOAD_DIR: Directory for storing uploaded PDF files (default: "uploaded_docs")
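
As a sketch, the top of main.py amounts to something like this (names from the list above; the values are whatever you set):

# Configuration constants at the top of main.py (defaults shown).
generativeModelName = "DeepseekCoderV2"    # Ollama model for text generation
embeddingsModelName = "mxbai-embed-large"  # Ollama model for embeddings
UPLOAD_DIR = "uploaded_docs"               # where uploaded PDFs are stored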

Port Configuration

  • The FastAPI service runs on port 5003
  • ChromaDB runs on port 8000 (exposed from Docker)

For LangSmith tracing (optional), get an API key from the LangSmith site and set the following environment variables:

  • LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
  • LANGSMITH_TRACING=true
  • LANGSMITH_API_KEY="YOUR-API-KEY"
  • LANGSMITH_PROJECT="rag-api"
  • USER_AGENT="FirstRag/1.0 (Linux; Python 3.11)"
