Forked from https://github.com/fadawkas/RAG-System_Kuasar, heavily improved and tuned for my use case. The compose file initially had three services:
- The API (Python image)
- ChromaDB
- Ollama

Changes in this fork:
- Ability to upload web URLs as well
- Segmented dev and prod workflows (compose & deployment.yaml)
- Cleaned-up compose file
I initially converted the compose file to a production-ready Kubernetes manifest. Then, for local development, I removed the Ollama and Python services from the compose file (opting to use the ones already installed locally) and kept only the ChromaDB service, with persistence enabled. I opted to use Poetry as the package manager; the pip-based implementation with the three-service compose file is available in this repo's earlier commits.
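For reference, a minimal sketch of what the trimmed compose file could look like. The service name `vector-store` matches the startup command later in this README; the image tag, volume name, and container data path are illustrative assumptions:

```yaml
# Sketch: ChromaDB-only compose file with persistence enabled.
# Service name "vector-store" matches the startup command in this README;
# the image tag, volume name, and data path are assumptions.
services:
  vector-store:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"                  # ChromaDB exposed on port 8000
    volumes:
      - chroma-data:/chroma/chroma   # keep embeddings across restarts

volumes:
  chroma-data:
```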
This system provides a FastAPI-based backend for document processing, vector storage, and question answering using the RAG pattern. It allows you to:
- Upload PDF documents for processing and vector storage
- Process web content from URLs
- Ask questions against your stored knowledge base
The system consists of:
- FastAPI Service: Handles API endpoints for document upload, web content processing, and question answering
- ChromaDB: Vector database for storing and retrieving document embeddings
- Ollama: Local LLM provider for both text generation and embeddings
- Docker and Docker Compose (for ChromaDB)
- Python 3.8+ with Poetry (the pip-based setup lives in earlier commits)
- Miniconda3
- Ollama installed locally; you will need two models (see the pull commands below):
  - one for text generation
  - one for embeddings
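Using the default model names from the configuration section, the pulls would look roughly like this (a sketch; adjust the tags to whatever models your local Ollama registry actually serves):

```bash
# Pull one generation model and one embeddings model.
# Names mirror the defaults in main.py; adjust to your local Ollama tags.
ollama pull DeepseekCoderV2      # text generation
ollama pull mxbai-embed-large    # embeddings
```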
- Clone the repository
- Start ChromaDB with Docker:
```bash
docker-compose up -d vector-store
```
- Initialize conda (`conda init`), activate your environment, then install dependencies with Poetry:
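A sketch of those commands, assuming a conda environment named `rag` (the environment name is an assumption; use your own):

```bash
conda init            # one-time shell setup; restart your shell afterwards
conda activate rag    # "rag" is an assumed environment name
poetry install        # install dependencies from pyproject.toml
```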
The API will be available at http://localhost:5003.
POST /upload/
Uploads and processes a PDF file, storing its content in the vector database.
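A sketch of calling it with curl, assuming the endpoint takes a multipart form field named `file` (the field name is an assumption, not confirmed here):

```bash
# Upload a local PDF for processing and vector storage.
curl -X POST http://localhost:5003/upload/ \
  -F "file=@document.pdf"
```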
POST /upload_web/
Processes content from web URLs and stores it in the vector database. Request body example:
```json
{
  "urls": ["https://example.com/article"]
}
```
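For example, with curl:

```bash
# Index the content of one or more web pages.
curl -X POST http://localhost:5003/upload_web/ \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com/article"]}'
```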
POST /question/
Asks a question against the stored document base. Request body example:
```json
{
  "question": "What is RAG?"
}
```
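For example, with curl:

```bash
# Ask a question against the stored knowledge base.
curl -X POST http://localhost:5003/question/ \
  -H "Content-Type: application/json" \
  -d '{"question": "What is RAG?"}'
```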
The main configuration parameters are defined at the top of main.py:
- generativeModelName: The Ollama model used for text generation (default: "DeepseekCoderV2")
- embeddingsModelName: The Ollama model used for embeddings (default: "mxbai-embed-large")
- UPLOAD_DIR: Directory for storing uploaded PDF files (default: "uploaded_docs")
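For orientation, the corresponding block at the top of main.py would look roughly like this (an illustrative sketch, not a verbatim copy of the file):

```python
# Illustrative sketch of the configuration block at the top of main.py.
generativeModelName = "DeepseekCoderV2"    # Ollama model for text generation
embeddingsModelName = "mxbai-embed-large"  # Ollama model for embeddings
UPLOAD_DIR = "uploaded_docs"               # directory for uploaded PDFs
```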
- The FastAPI service runs on port 5003
- ChromaDB runs on port 8000 (exposed from Docker)
The following environment variables are used (LangSmith tracing plus the user agent for web requests):
- LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
- LANGSMITH_TRACING=true
- LANGSMITH_API_KEY="YOUR-API-KEY"
- LANGSMITH_PROJECT="rag-api"
- USER_AGENT="FirstRag/1.0 (Linux; Python 3.11)"