A High-Performance, Multi-Tenant RAG Platform.
RAG Factory is a robust, production-ready Retrieval-Augmented Generation (RAG) system designed for scalability and precision. It features a microservices architecture separating the RAG Engine (Python/FastAPI) from the Platform API (Node.js/NestJS), ensuring modularity and performance.
The system is model-agnostic, supporting Ollama, Google Gemini, and OpenAI, allowing flexibility between local privacy-focused models and powerful cloud-based LLMs.
🧠 Advanced RAG Engine:
- Hybrid Search: Combines semantic search (embeddings) with lexical search (keyword scoring) to improve recall over either approach alone (see the fusion sketch after this list).
- Context Expansion: Automatically retrieves neighboring chunks to provide better context to the LLM.
- Anti-Hallucination: Strict validation logic ensures answers are grounded in the provided documents.
- Multi-Model Support: Seamlessly switch between Ollama (local), Gemini, and OpenAI.
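For intuition, here is a weighted-fusion sketch of hybrid scoring. The `hybrid_score` helper, the 0.7/0.3 weights, and the chunk dictionary fields are illustrative assumptions, not the engine's actual scoring code.

```python
# Illustrative only: fuse vector similarity and keyword scores per chunk.
# The weights and field names are assumptions, not RAG Factory's real code.
def hybrid_score(vector_score: float, keyword_score: float, alpha: float = 0.7) -> float:
    """Weighted fusion of semantic and lexical relevance."""
    return alpha * vector_score + (1 - alpha) * keyword_score

def rank_chunks(chunks: list[dict]) -> list[dict]:
    # Each chunk is assumed to already carry both scores.
    return sorted(
        chunks,
        key=lambda c: hybrid_score(c["vector_score"], c["keyword_score"]),
        reverse=True,
    )
```

Weighting toward the semantic score favors paraphrases, while the lexical term still rewards exact keyword hits.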
🏗️ Scalable Architecture:
- Multi-Tenancy: Built-in support for multiple workspaces and isolated document sets.
- Async Ingestion: Celery + Redis pipeline for processing large documents without blocking the API (see the sketch after this list).
- Microservices:
  - apps/engine: Python core for RAG logic.
  - apps/platform: NestJS API for management and orchestration.
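To illustrate how async ingestion stays off the request path, here is a minimal Celery sketch. The task name, broker URL, and chunking logic are assumptions and do not reflect the actual worker.py.

```python
# Minimal Celery sketch (Redis broker). Names and chunking are illustrative,
# not the contents of apps/engine/worker.py.
from celery import Celery

app = Celery("engine", broker="redis://localhost:6379/0")

def chunk_text(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking, purely for illustration.
    return [text[i:i + size] for i in range(0, len(text), size)]

@app.task(name="ingest_document")  # hypothetical task name
def ingest_document(workspace_id: str, text: str) -> int:
    """Chunk the document; embedding and Qdrant upsert would follow here."""
    chunks = chunk_text(text)
    # ...embed each chunk and upsert it into the workspace's collection
    return len(chunks)
```

The API can enqueue work with `ingest_document.delay(...)` and return immediately, while the worker processes the document in the background.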
Authentication is handled via Service Tokens. Clients must first generate a token to interact with the API. This token scopes access to specific resources.
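For illustration, attaching a service token looks roughly like the sketch below; the route is a placeholder, so check the Swagger docs at http://localhost:3000/api for the real endpoints.

```python
# Sketch only: a session that sends the service token on every request.
# The route is a placeholder; see the Swagger docs for the real API.
import requests

session = requests.Session()
session.headers["Authorization"] = "Bearer your_service_token"

# Any platform call now carries the token, e.g. listing workspaces:
resp = session.get("http://localhost:3000/workspaces", timeout=30)  # hypothetical route
print(resp.status_code)
```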
Data is organized into Workspaces. A workspace acts as an isolated container for documents. Ingestion and retrieval are strictly scoped to a workspace, ensuring data privacy and multi-tenancy support.
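Conceptually, every ingestion or query call names the workspace it targets; the route and payload below are assumptions used only to show the scoping.

```python
# Hypothetical workspace-scoped query; route and payload are assumptions.
import requests

PLATFORM_URL = "http://localhost:3000"
HEADERS = {"Authorization": "Bearer your_service_token"}

result = requests.post(
    f"{PLATFORM_URL}/workspaces/my-workspace/query",  # hypothetical route
    json={"question": "What are the renewal terms in the contract?"},
    headers=HEADERS,
    timeout=60,
)
print(result.json())
```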
The query engine uses a multi-stage pipeline (sketched in code after the list):
- Hybrid Retrieval: Fetches documents using both vector similarity and keyword matching.
- Re-ranking: Re-orders results to prioritize the most relevant chunks.
- Synthesis: The LLM generates an answer based exclusively on the retrieved context.
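The shape of those three stages can be sketched as a single function. The retriever, reranker, and llm objects are duck-typed placeholders and the prompt wording is an assumption, not the implementation in rag_service.py.

```python
# Pipeline shape only; retriever, reranker, and llm are placeholders,
# not the objects used in apps/engine/services/rag_service.py.
def answer_query(question: str, retriever, reranker, llm, top_k: int = 5) -> str:
    # 1. Hybrid retrieval: vector similarity + keyword matching.
    candidates = retriever.hybrid_search(question, limit=top_k * 4)
    # 2. Re-ranking: keep only the most relevant chunks.
    best = reranker.rank(question, candidates)[:top_k]
    # 3. Synthesis: the LLM answers strictly from the retrieved context.
    context = "\n\n".join(chunk.text for chunk in best)
    prompt = (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.generate(prompt)
```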
Prerequisites:
- Docker & Docker Compose
- Git
```bash
git clone https://github.com/mdiniz97/rag-factory.git
cd rag-factory
```
Create a .env file in apps/engine (or use the provided example):
```bash
cp apps/engine/.env.example apps/engine/.env
```
Model Configuration (apps/engine/.env):
You can configure the provider by setting LLM_PROVIDER and EMBEDDING_PROVIDER.
Example for Gemini:
```
GOOGLE_API_KEY=your_gemini_api_key
LLM_PROVIDER=gemini
EMBEDDING_PROVIDER=gemini
```
Example for OpenAI:
```
OPENAI_API_KEY=your_openai_api_key
LLM_PROVIDER=openai
EMBEDDING_PROVIDER=openai
```
Example for Ollama (Local):
```
LLM_PROVIDER=ollama
EMBEDDING_PROVIDER=ollama
OLLAMA_BASE_URL=http://host.docker.internal:11434
```
We provide three Docker Compose configurations to suit different needs:
| File | Purpose |
|---|---|
| docker-compose.yml | Standard Development. Runs the full stack (Engine, Platform, Worker) locally. |
| docker-compose.infra.yml | Infrastructure Only. Runs only the databases (Postgres, Redis, Qdrant, MinIO). Useful if you want to run the apps locally for debugging. |
| docker-compose.full.yml | Production/Full. Similar to standard but can be extended for production deployments with additional services if needed. |
Start the standard environment:
```bash
docker-compose up --build -d
```
The services will be available at:
- Platform API: http://localhost:3000
- API Documentation (Swagger): http://localhost:3000/api
- RAG Engine: http://localhost:8000
- Qdrant UI: http://localhost:6333/dashboard
```
rag-factory/
├── apps/
│ ├── engine/ # Python RAG Core (FastAPI + LangChain)
│ │ ├── api/ # API Routes
│ │ ├── core/ # Factories & Config
│ │ ├── services/ # RAG Logic (rag_service.py)
│ │ └── worker.py # Celery Worker for Ingestion
│ │
│ └── platform/ # Node.js Management API (NestJS)
│ ├── src/
│ │ ├── ingestion/ # Ingestion Orchestration
│ │ ├── query/ # Query Proxy
│ │ └── workspaces/ # Workspace Management
│ └── prisma/ # Database Schema
│
├── docker-compose.yml # Main Docker Compose
└── README.md # You are here
```
- LLM & Embeddings: Google Gemini, OpenAI, Ollama
- Vector DB: Qdrant
- Backend: Python (FastAPI), Node.js (NestJS)
- Queue: Redis + Celery
- Storage: MinIO (S3 compatible object storage)
- Database: PostgreSQL
This project is licensed under the MIT License.