🏭 RAG Factory

A High-Performance, Multi-Tenant RAG Platform.


📖 Overview

RAG Factory is a robust, production-ready Retrieval-Augmented Generation (RAG) system designed for scalability and precision. It features a microservices architecture separating the RAG Engine (Python/FastAPI) from the Platform API (Node.js/NestJS), ensuring modularity and performance.

The system is model-agnostic, supporting Ollama, Google Gemini, and OpenAI, allowing flexibility between local privacy-focused models and powerful cloud-based LLMs.

✨ Key Features

  • 🧠 Advanced RAG Engine:

    • Hybrid Search: Combines semantic search (embeddings) with lexical search (keyword scoring) for superior recall.
    • Context Expansion: Automatically retrieves neighboring chunks to provide better context to the LLM.
    • Anti-Hallucination: Strict validation logic ensures answers are grounded in the provided documents.
    • Multi-Model Support: Seamlessly switch between Ollama (local), Gemini, and OpenAI.
  • 🏗️ Scalable Architecture:

    • Multi-Tenancy: Built-in support for multiple workspaces and isolated document sets.
    • Async Ingestion: Celery + Redis pipeline for processing large documents without blocking (see the sketch after this list).
    • Microservices:
      • apps/engine: Python core for RAG logic.
      • apps/platform: NestJS API for management and orchestration.
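
To make the async ingestion path concrete, here is a minimal sketch of what a Celery task on that pipeline could look like. The broker URL, task name, and chunking helper are illustrative assumptions, not the actual contents of apps/engine/worker.py:

# Hypothetical sketch of an async ingestion task (Celery + Redis).
from celery import Celery

app = Celery("engine", broker="redis://localhost:6379/0")  # broker URL assumed

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Naive fixed-size chunker with overlap; illustrative only.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

@app.task(name="ingest_document")  # task name is an assumption
def ingest_document(workspace_id: str, text: str) -> int:
    # A real pipeline would embed each chunk and upsert it into Qdrant,
    # scoped to workspace_id; here we just report the chunk count.
    return len(chunk_text(text))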

💡 Core Concepts

1. Service Tokens

Authentication is handled via Service Tokens. Clients must first generate a token to interact with the API. This token scopes access to specific resources.
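
As a hedged illustration of that flow from a client's point of view, the sketch below assumes a /auth/service-tokens route and a {"token": ...} response shape; consult the Swagger docs at /api for the real contract:

# Illustrative client flow; the route and response shape are assumptions.
import requests

BASE = "http://localhost:3000"
resp = requests.post(f"{BASE}/auth/service-tokens", json={"name": "demo-client"})
resp.raise_for_status()
token = resp.json()["token"]  # assumed response field

# Subsequent requests carry the token as a Bearer credential.
headers = {"Authorization": f"Bearer {token}"}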

2. Workspaces

Data is organized into Workspaces. A workspace acts as an isolated container for documents. Ingestion and retrieval are strictly scoped to a workspace, ensuring data privacy and multi-tenancy support.
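
Continuing the illustrative client above, creating a workspace might look like this (the /workspaces route and payload shape are likewise assumptions):

# Hypothetical workspace creation, scoped by the service token.
import requests

BASE = "http://localhost:3000"
headers = {"Authorization": "Bearer <service-token>"}  # from the previous step
resp = requests.post(f"{BASE}/workspaces", json={"name": "support-docs"},
                     headers=headers)
workspace_id = resp.json()["id"]  # assumed response field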

3. Query & Retrieval

The query engine uses a multi-stage pipeline (sketched in code after the list):

  1. Hybrid Retrieval: Fetches documents using both vector similarity and keyword matching.
  2. Re-ranking: Re-orders results to prioritize the most relevant chunks.
  3. Synthesis: The LLM generates an answer based exclusively on the retrieved context.
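
The sketch below is a self-contained toy version of stages 1 and 2: it blends a vector-similarity score with a crude keyword-overlap score and sorts best-first. The 0.7/0.3 weighting and function names are assumptions for illustration, not the logic in rag_service.py:

# Illustrative hybrid scoring: blend vector similarity with keyword overlap.

def keyword_score(query: str, chunk: str) -> float:
    # Fraction of query terms that appear in the chunk (crude lexical signal).
    terms = set(query.lower().split())
    hits = sum(1 for t in terms if t in chunk.lower())
    return hits / max(len(terms), 1)

def hybrid_rank(query, chunks, vector_scores, alpha=0.7):
    # Blend semantic and lexical scores, then sort best-first.
    scored = []
    for chunk, vec in zip(chunks, vector_scores):
        score = alpha * vec + (1 - alpha) * keyword_score(query, chunk)
        scored.append((score, chunk))
    return [c for _, c in sorted(scored, reverse=True)]

chunks = ["Qdrant stores vectors.", "Celery runs async jobs.", "RAG grounds answers."]
vector_scores = [0.82, 0.31, 0.77]   # cosine similarities from the vector DB
top = hybrid_rank("how are answers grounded", chunks, vector_scores)
print(top[0])  # -> "RAG grounds answers."

In the real pipeline, stage 3 then passes only the top-ranked chunks to the configured LLM, which answers exclusively from that context.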

🚀 Getting Started

Prerequisites

  • Docker & Docker Compose
  • Git

1. Clone the Repository

git clone https://github.com/mdiniz97/rag-factory.git
cd rag-factory

2. Configuration

Create a .env file in apps/engine by copying the provided example:

cp apps/engine/.env.example apps/engine/.env

Model Configuration (apps/engine/.env):

You can configure the provider by setting LLM_PROVIDER and EMBEDDING_PROVIDER; a sketch of how the engine might consume these variables follows the examples.

Example for Gemini:

GOOGLE_API_KEY=your_gemini_api_key
LLM_PROVIDER=gemini
EMBEDDING_PROVIDER=gemini

Example for OpenAI:

OPENAI_API_KEY=your_openai_api_key
LLM_PROVIDER=openai
EMBEDDING_PROVIDER=openai

Example for Ollama (Local):

LLM_PROVIDER=ollama
EMBEDDING_PROVIDER=ollama
OLLAMA_BASE_URL=http://host.docker.internal:11434
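
Since the engine is described as FastAPI + LangChain, the switch these variables drive could look roughly like the factory below; the package and model names are assumptions, not a copy of the factories in apps/engine/core:

# Hypothetical provider factory keyed off LLM_PROVIDER; model defaults
# and package choices are assumptions based on common LangChain usage.
import os

def make_llm():
    provider = os.getenv("LLM_PROVIDER", "ollama")
    if provider == "gemini":
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(model="gemini-1.5-flash")
    if provider == "openai":
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model="gpt-4o-mini")
    from langchain_ollama import ChatOllama
    return ChatOllama(model="llama3",
                      base_url=os.getenv("OLLAMA_BASE_URL",
                                         "http://localhost:11434"))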

3. Run with Docker

We provide three Docker Compose configurations to suit different needs:

  • docker-compose.yml: Standard development. Runs the full stack (Engine, Platform, Worker) locally.
  • docker-compose.infra.yml: Infrastructure only. Runs just the databases (Postgres, Redis, Qdrant, MinIO); useful when you want to run the apps on your host for debugging.
  • docker-compose.full.yml: Production/full. The standard stack, intended as a base to extend with additional services for production deployments.

Start the standard environment:

docker-compose up --build -d

The services will be available at:

  • Platform API: http://localhost:3000
  • API Documentation (Swagger): http://localhost:3000/api
  • RAG Engine: http://localhost:8000
  • Qdrant UI: http://localhost:6333/dashboard
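
A quick way to verify the stack is up is to probe those URLs; the engine's /docs path below is FastAPI's default Swagger route and is an assumption about this deployment:

# Minimal smoke test for the running stack.
import requests

for name, url in [
    ("Platform API docs", "http://localhost:3000/api"),
    ("RAG Engine docs", "http://localhost:8000/docs"),  # FastAPI default, assumed
    ("Qdrant dashboard", "http://localhost:6333/dashboard"),
]:
    status = requests.get(url, timeout=5).status_code
    print(f"{name}: HTTP {status}")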

📂 Project Structure

rag-factory/
├── apps/
│   ├── engine/             # Python RAG Core (FastAPI + LangChain)
│   │   ├── api/            # API Routes
│   │   ├── core/           # Factories & Config
│   │   ├── services/       # RAG Logic (rag_service.py)
│   │   └── worker.py       # Celery Worker for Ingestion
│   │
│   └── platform/           # Node.js Management API (NestJS)
│       ├── src/
│       │   ├── ingestion/  # Ingestion Orchestration
│       │   ├── query/      # Query Proxy
│       │   └── workspaces/ # Workspace Management
│       └── prisma/         # Database Schema
│
├── docker-compose.yml      # Main Docker Compose
└── README.md               # You are here

🛠️ Tech Stack

  • LLM & Embeddings: Google Gemini, OpenAI, Ollama
  • Vector DB: Qdrant
  • Backend: Python (FastAPI), Node.js (NestJS)
  • Queue: Redis + Celery
  • Storage: MinIO (S3 compatible object storage)
  • Database: PostgreSQL

📄 License

This project is licensed under the MIT License.
