🏭 RAG Factory

A High-Performance, Multi-Tenant RAG Platform.


📖 Overview

RAG Factory is a robust, production-ready Retrieval-Augmented Generation (RAG) system designed for scalability and precision. It features a microservices architecture separating the RAG Engine (Python/FastAPI) from the Platform API (Node.js/NestJS), ensuring modularity and performance.

The system is model-agnostic, supporting Ollama, Google Gemini, and OpenAI, allowing flexibility between local privacy-focused models and powerful cloud-based LLMs.

✨ Key Features

  • 🧠 Advanced RAG Engine:

    • Hybrid Search: Combines semantic search (embeddings) with lexical search (keyword scoring) for superior recall.
    • Context Expansion: Automatically retrieves neighboring chunks to provide better context to the LLM.
    • Anti-Hallucination: Strict validation logic ensures answers are grounded in the provided documents.
    • Multi-Model Support: Seamlessly switch between Ollama (local), Gemini, and OpenAI.
  • 🏗️ Scalable Architecture:

    • Multi-Tenancy: Built-in support for multiple workspaces and isolated document sets.
    • Async Ingestion: Celery + Redis pipeline for processing large documents without blocking (see the sketch after this list).
    • Microservices:
      • apps/engine: Python core for RAG logic.
      • apps/platform: NestJS API for management and orchestration.
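
To make the async ingestion path concrete, here is a minimal sketch of what a Celery task on that pipeline could look like. The broker URL, task name, and chunking helper are illustrative assumptions, not the actual contents of apps/engine/worker.py:

# Hypothetical sketch of an async ingestion task (Celery + Redis).
from celery import Celery

app = Celery("engine", broker="redis://localhost:6379/0")  # broker URL assumed

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Naive fixed-size chunker with overlap; illustrative only.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

@app.task(name="ingest_document")  # task name is an assumption
def ingest_document(workspace_id: str, text: str) -> int:
    # A real pipeline would embed each chunk and upsert it into Qdrant,
    # scoped to workspace_id; here we just report the chunk count.
    return len(chunk_text(text))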

💡 Core Concepts

1. Service Tokens

Authentication is handled via Service Tokens. Clients must first generate a token to interact with the API. This token scopes access to specific resources.
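
As a hedged illustration of that flow from a client's point of view, the sketch below assumes a /auth/service-tokens route and a {"token": ...} response shape; consult the Swagger docs at /api for the real contract:

# Illustrative client flow; the route and response shape are assumptions.
import requests

BASE = "http://localhost:3000"
resp = requests.post(f"{BASE}/auth/service-tokens", json={"name": "demo-client"})
resp.raise_for_status()
token = resp.json()["token"]  # assumed response field

# Subsequent requests carry the token as a Bearer credential.
headers = {"Authorization": f"Bearer {token}"}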

2. Workspaces

Data is organized into Workspaces. A workspace acts as an isolated container for documents. Ingestion and retrieval are strictly scoped to a workspace, ensuring data privacy and multi-tenancy support.
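
Continuing the illustrative client above, creating a workspace might look like this (the /workspaces route and payload shape are likewise assumptions):

# Hypothetical workspace creation, scoped by the service token.
import requests

BASE = "http://localhost:3000"
headers = {"Authorization": "Bearer <service-token>"}  # from the previous step
resp = requests.post(f"{BASE}/workspaces", json={"name": "support-docs"},
                     headers=headers)
workspace_id = resp.json()["id"]  # assumed response field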

3. Query & Retrieval

The query engine uses a multi-stage pipeline (sketched in code after the list):

  1. Hybrid Retrieval: Fetches documents using both vector similarity and keyword matching.
  2. Re-ranking: Re-orders results to prioritize the most relevant chunks.
  3. Synthesis: The LLM generates an answer based exclusively on the retrieved context.
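
The sketch below is a self-contained toy version of stages 1 and 2: it blends a vector-similarity score with a crude keyword-overlap score and sorts best-first. The 0.7/0.3 weighting and function names are assumptions for illustration, not the logic in rag_service.py:

# Illustrative hybrid scoring: blend vector similarity with keyword overlap.

def keyword_score(query: str, chunk: str) -> float:
    # Fraction of query terms that appear in the chunk (crude lexical signal).
    terms = set(query.lower().split())
    hits = sum(1 for t in terms if t in chunk.lower())
    return hits / max(len(terms), 1)

def hybrid_rank(query, chunks, vector_scores, alpha=0.7):
    # Blend semantic and lexical scores, then sort best-first.
    scored = []
    for chunk, vec in zip(chunks, vector_scores):
        score = alpha * vec + (1 - alpha) * keyword_score(query, chunk)
        scored.append((score, chunk))
    return [c for _, c in sorted(scored, reverse=True)]

chunks = ["Qdrant stores vectors.", "Celery runs async jobs.", "RAG grounds answers."]
vector_scores = [0.82, 0.31, 0.77]   # cosine similarities from the vector DB
top = hybrid_rank("how are answers grounded", chunks, vector_scores)
print(top[0])  # -> "RAG grounds answers."

In the real pipeline, stage 3 then passes only the top-ranked chunks to the configured LLM, which answers exclusively from that context.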

🚀 Getting Started

Prerequisites

  • Docker & Docker Compose
  • Git

1. Clone the Repository

git clone https://github.com/mdiniz97/rag-factory.git
cd rag-factory

2. Configuration

Create a .env file in apps/engine by copying the provided example:

cp apps/engine/.env.example apps/engine/.env

Model Configuration (apps/engine/.env):

You can configure the provider by setting LLM_PROVIDER and EMBEDDING_PROVIDER; a sketch of how the engine might consume these variables follows the examples.

Example for Gemini:

GOOGLE_API_KEY=your_gemini_api_key
LLM_PROVIDER=gemini
EMBEDDING_PROVIDER=gemini

Example for OpenAI:

OPENAI_API_KEY=your_openai_api_key
LLM_PROVIDER=openai
EMBEDDING_PROVIDER=openai

Example for Ollama (Local):

LLM_PROVIDER=ollama
EMBEDDING_PROVIDER=ollama
OLLAMA_BASE_URL=http://host.docker.internal:11434
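
Since the engine is described as FastAPI + LangChain, the switch these variables drive could look roughly like the factory below; the package and model names are assumptions, not a copy of the factories in apps/engine/core:

# Hypothetical provider factory keyed off LLM_PROVIDER; model defaults
# and package choices are assumptions based on common LangChain usage.
import os

def make_llm():
    provider = os.getenv("LLM_PROVIDER", "ollama")
    if provider == "gemini":
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(model="gemini-1.5-flash")
    if provider == "openai":
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model="gpt-4o-mini")
    from langchain_ollama import ChatOllama
    return ChatOllama(model="llama3",
                      base_url=os.getenv("OLLAMA_BASE_URL",
                                         "http://localhost:11434"))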

3. Run with Docker

We provide three Docker Compose configurations to suit different needs:

  • docker-compose.yml: Standard development. Runs the full stack (Engine, Platform, Worker) locally.
  • docker-compose.infra.yml: Infrastructure only. Runs just the databases (Postgres, Redis, Qdrant, MinIO); useful when you want to run the apps on your host for debugging.
  • docker-compose.full.yml: Production/full. The standard stack, intended as a base to extend with additional services for production deployments.

Start the standard environment:

docker-compose up --build -d

The services will be available at:

  • Platform API: http://localhost:3000
  • API Documentation (Swagger): http://localhost:3000/api
  • RAG Engine: http://localhost:8000
  • Qdrant UI: http://localhost:6333/dashboard
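
A quick way to verify the stack is up is to probe those URLs; the engine's /docs path below is FastAPI's default Swagger route and is an assumption about this deployment:

# Minimal smoke test for the running stack.
import requests

for name, url in [
    ("Platform API docs", "http://localhost:3000/api"),
    ("RAG Engine docs", "http://localhost:8000/docs"),  # FastAPI default, assumed
    ("Qdrant dashboard", "http://localhost:6333/dashboard"),
]:
    status = requests.get(url, timeout=5).status_code
    print(f"{name}: HTTP {status}")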

📂 Project Structure

rag-factory/
├── apps/
│   ├── engine/             # Python RAG Core (FastAPI + LangChain)
│   │   ├── api/            # API Routes
│   │   ├── core/           # Factories & Config
│   │   ├── services/       # RAG Logic (rag_service.py)
│   │   └── worker.py       # Celery Worker for Ingestion
│   │
│   └── platform/           # Node.js Management API (NestJS)
│       ├── src/
│       │   ├── ingestion/  # Ingestion Orchestration
│       │   ├── query/      # Query Proxy
│       │   └── workspaces/ # Workspace Management
│       └── prisma/         # Database Schema
│
├── docker-compose.yml      # Main Docker Compose
└── README.md               # You are here

🛠️ Tech Stack

  • LLM & Embeddings: Google Gemini, OpenAI, Ollama
  • Vector DB: Qdrant
  • Backend: Python (FastAPI), Node.js (NestJS)
  • Queue: Redis + Celery
  • Storage: MinIO (S3 compatible object storage)
  • Database: PostgreSQL

📄 License

This project is licensed under the MIT License.
