A Dockerized Retrieval-Augmented Generation (RAG) system optimized for CPU-only environments and capable of indexing and querying large-scale document collections (100k+ documents). This system integrates a modular API, local LLM serving via Ollama, and optional evaluation via RAGAS metrics. Ideal for research or educational deployment, especially in resource-constrained setups.
- Upload and index large-scale document collections (>100k documents, ~5k characters each)
- Perform semantic search with contextual answer generation
- CPU-only compatible (≤16GB RAM, no GPU needed)
- Modular microservices: FastAPI, embedding service, LLM wrapper, ChromaDB
- Local LLM inference via Ollama (e.g., Mistral, LLaMA 2)
- RAGAS-ready pipeline for evaluating answer quality and context precision
- Designed for extensibility, benchmarking, and privacy-preserving applications
- Docker v20.10+
- Docker Compose v1.27+
- CPU-only machine (≥8GB RAM recommended)
- (Optional) `OPENAI_API_KEY` set for metric computation:

```bash
export OPENAI_API_KEY=your_key
```

Clone the repository and move into the API directory:

```bash
git clone https://github.com/ckranon/emerging-topics-rag.git
cd emerging-topics-rag/rag-api
```

Located in `rag-api/`, the core RAG pipeline includes:
- `api/` — FastAPI endpoints for document upload and generation
- `embedding/` — Embedding server using SentenceTransformers (see the sketch below)
- `ollama/` — Local LLM runner using Ollama
- `vector_store/` — Persistent ChromaDB vector index
- `test_api.py` — Basic integration test script; reports average response time
- `compute_metrics.py` — Computes RAGAS metrics from the results generated by `test_api.py`
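To give a feel for how the embedding service fits into the pipeline, here is a minimal, illustrative sketch of what an endpoint like `embedding/embed_server.py` could look like; the route name, model choice, and response shape are assumptions and may differ from the actual implementation.

```python
# Hypothetical sketch of an embedding endpoint; the real embed_server.py may differ.
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
# Small CPU-friendly model; the repository may use a different one.
model = SentenceTransformer("all-MiniLM-L6-v2")

class EmbedRequest(BaseModel):
    texts: list[str]

@app.post("/embed")
def embed(req: EmbedRequest):
    # Encode each text into a dense vector and return plain lists for JSON serialization.
    vectors = model.encode(req.texts, convert_to_numpy=True)
    return {"embeddings": [v.tolist() for v in vectors]}
```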
Health check:

```bash
curl http://localhost:8000/
```

Response:

```json
{"message":"RAG API is running successfully"}
```
`POST /upload` uploads documents and indexes them into the vector store.

```bash
curl -X POST http://localhost:8000/upload \
  -H "Content-Type: application/json" \
  -d '{"texts":["Document 1 text...", "Document 2 text..."]}'
```

Response:

```json
{"message":"Vector index successfully created","nodes_count":123}
```
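The same upload can be scripted from Python. This is a minimal sketch using `requests`; the endpoint and payload shape are taken from the curl example above, and the timeout value is only a suggestion.

```python
import requests

# Index a small batch of documents through the /upload endpoint.
resp = requests.post(
    "http://localhost:8000/upload",
    json={"texts": ["Document 1 text...", "Document 2 text..."]},
    timeout=300,  # indexing large batches can take a while on CPU
)
resp.raise_for_status()
print(resp.json())  # e.g. {"message": "...", "nodes_count": 123}
```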
`POST /generate` generates an answer based on the user query and retrieved document context.

```bash
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"new_message":{"role":"user","content":"What is the capital of France?"}}'
```

Response:

```json
{
  "generated_text": "The capital of France is Paris.",
  "contexts": ["Paris is the capital of France. It is known for the Eiffel Tower."]
}
```
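Generation can be scripted the same way. A minimal `requests` sketch mirroring the curl call above; field names follow the example response and are otherwise assumptions.

```python
import requests

# Ask a question; the API returns the generated answer plus the retrieved contexts.
resp = requests.post(
    "http://localhost:8000/generate",
    json={"new_message": {"role": "user", "content": "What is the capital of France?"}},
    timeout=120,  # CPU-only generation can be slow
)
resp.raise_for_status()
result = resp.json()
print(result["generated_text"])
for ctx in result["contexts"]:
    print("-", ctx)
```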
To build and run all services:

```bash
docker-compose up --build
```

Services launched:

- `api` — FastAPI service for user interaction
- `embedding` — Generates document embeddings
- `ollama` — Runs a local LLM via `start.sh` (edit this script to change the model)
Run the integration test:

```bash
python test_api.py
```

Then compute RAGAS metrics (requires `OPENAI_API_KEY`):

```bash
export OPENAI_API_KEY=your_key
python compute_metrics.py
```
⚠️ Due to runtime and API constraints, metric computation may time out.
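For reference, RAGAS evaluation over stored generation results looks roughly like the sketch below. This is not the repository's `compute_metrics.py`; the metric selection, column names, and data are illustrative, and column requirements vary between RAGAS versions.

```python
# Hypothetical sketch of a RAGAS evaluation run; compute_metrics.py may differ.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

# Example results as produced by the RAG pipeline (e.g. collected by test_api.py).
records = {
    "question": ["What is the capital of France?"],
    "answer": ["The capital of France is Paris."],
    "contexts": [["Paris is the capital of France."]],
    # Some RAGAS versions require a reference answer for context_precision.
    "ground_truth": ["Paris is the capital of France."],
}

dataset = Dataset.from_dict(records)
# evaluate() calls the OpenAI API under the hood, so OPENAI_API_KEY must be set.
scores = evaluate(dataset, metrics=[faithfulness, answer_relevancy, context_precision])
print(scores)
```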
We compared two chunking strategies:
- Semantic Chunking — Splitting based on semantic boundaries (embedding similarity)
- Sentence Window Chunking — Fixed-size overlapping windows
Result:
Inconclusive; `compute_metrics.py` timed out.
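To make the comparison concrete, the following is an illustrative sketch of sentence-window chunking (fixed-size, overlapping windows). It is not the repository's implementation; semantic chunking would instead split where the embedding similarity between adjacent sentences drops.

```python
def sentence_window_chunks(sentences: list[str], window: int = 3, overlap: int = 1) -> list[str]:
    """Group sentences into fixed-size windows that overlap by `overlap` sentences."""
    step = max(window - overlap, 1)
    chunks = []
    for start in range(0, len(sentences), step):
        chunk = " ".join(sentences[start:start + window])
        if chunk:
            chunks.append(chunk)
        if start + window >= len(sentences):
            break
    return chunks

# Example: 5 sentences, windows of 3 with a 1-sentence overlap.
sents = ["S1.", "S2.", "S3.", "S4.", "S5."]
print(sentence_window_chunks(sents))  # ['S1. S2. S3.', 'S3. S4. S5.']
```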
We explored different LLMs:
- DeepSeek-R1:1.5b (reasoning-focused, open-weight)
- Qwen2.5:0.5b (BASELINE)
Result:
Inconclusive; `compute_metrics.py` timed out.
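Either model can be queried directly through Ollama's HTTP API once it has been pulled. A minimal sketch, assuming the default Ollama port 11434 and that the model tag is already available locally:

```python
import requests

# Query a locally served model through Ollama's generate endpoint.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5:0.5b",  # or "deepseek-r1:1.5b"
        "prompt": "What is the capital of France?",
        "stream": False,  # return a single JSON object instead of a stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```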
We compared two LLM serving backends:
- Ollama — Seamless local inference with minimal setup
- Hugging Face TGI — Scalable backend for multi-GPU serving
Result: Ollama replaced TGI because TGI could not pull the baseline models.
Instead of relying on Hugging Face TGI, we implemented persistent storage using ChromaDB (`chroma.db`).
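For context, persisting and querying a Chroma collection on disk looks roughly like the sketch below (using the `chromadb` client). The path, collection name, and placeholder embeddings are assumptions; the repository's `vector_store/` setup may differ.

```python
import chromadb

# Open (or create) a persistent index stored on disk.
client = chromadb.PersistentClient(path="rag-api/vector_store/chroma.db")
collection = client.get_or_create_collection("documents")

# Add documents with precomputed embeddings (placeholder vectors here).
collection.add(
    ids=["doc-1", "doc-2"],
    documents=["Document 1 text...", "Document 2 text..."],
    embeddings=[[0.1, 0.2, 0.3], [0.2, 0.1, 0.0]],
)

# Query by embedding; returns the closest stored documents.
results = collection.query(query_embeddings=[[0.1, 0.2, 0.3]], n_results=1)
print(results["documents"])
```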
Although the pipeline stores generation outputs for downstream evaluation, RAGAS metric computation consistently timed out during execution due to:
- API response delays from OpenAI
As a result, we deliver a baseline model with only qualitative improvement insights and no definitive RAGAS scores.
```
emerging-topics-rag/
├── .gitignore
├── README.md                  # This file
├── compute_metrics.py         # Metric computation using RAGAS (OpenAI required)
└── rag-api/
    ├── api/
    │   ├── api_rag.py
    │   ├── Dockerfile
    │   └── requirements.txt
    ├── embedding/
    │   ├── embed_server.py
    │   ├── Dockerfile
    │   └── requirements.txt
    ├── ollama/
    │   ├── start.sh
    │   └── Dockerfile
    ├── vector_store/
    │   └── chroma.db          # Persistent ChromaDB index
    ├── docker-compose.yaml
    └── test_api.py
```
- Research Prototypes — Test chunking and RAG strategies
- Private Knowledge Retrieval — Deploy local document Q&A systems
- Teaching Tool — Understand full-stack RAG pipelines
- Baseline Model Benchmarks — Evaluate low-resource model performance
- RAGAS Metrics Unavailable — Due to OpenAI API timeout issues
- No GPU Support — CPU-only by design; not optimized for high-scale workloads
- Manual Chunking Trade-offs — Semantic methods improve results but increase complexity
- Ollama Model Limitation — Must manually ensure models are pulled and accessible
We welcome contributions!
- Fork the repository
- Create a new feature branch
- Commit your changes
- Open a pull request
If you find a bug or have a feature request, feel free to open an Issue.
This project is licensed under the MIT License. You are free to use, modify, and distribute the code for academic or commercial purposes.