paramcodes/rag-agent-api

Atlas API

RAG API service with RAPTOR hierarchical indexing, multi-stage search, and LLM-powered answers.

Ingests documents (PDFs, images, URLs, code), builds a RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) index, and answers queries using a multi-stage search pipeline (vector search + BM25, fused with RRF, then reranked).

Architecture

                   ┌──────────────┐
                   │   Client     │
                   └──────┬───────┘
                          │
                  ┌───────▼───────┐
                  │   Fastify     │
                  │   (Bun)       │
                  └───┬───┬───┬───┘
                      │   │   │
         ┌────────────┘   │   └────────────┐
         ▼                ▼                ▼
   ┌──────────┐    ┌──────────┐    ┌──────────────┐
   │ Docling  │    │  Gemini  │    │  Jina AI     │
   │ Service  │    │  2.0     │    │  Embed +     │
   │ (Python) │    │  Flash   │    │  Rerank      │
   └────┬─────┘    └────┬─────┘    └──────┬───────┘
        │               │                 │
        ▼               ▼                 ▼
   ┌──────────────────────────────────────────┐
   │         PostgreSQL + pgvector            │
   │  (HNSW index + BM25 full-text search)    │
   └──────────────────────────────────────────┘
        │
        ▼
   ┌──────────┐
   │  RAPTOR  │
   │ Service  │
   │ (Python) │
   └──────────┘

Features

  • Multi-format ingestion — PDFs (via IBM Docling), images (Gemini Vision), URLs (Cheerio), code files
  • RAPTOR indexing — Recursive clustering (UMAP + HDBSCAN) with LLM summarization builds a hierarchical knowledge tree
  • Late chunking — Cross-chunk attention context via Jina embeddings
  • Multi-stage search — Vector similarity (pgvector) + BM25 (Postgres FTS) → RRF fusion → Jina reranker
  • LLM answers — Gemini 2.0 Flash generates answers with source citations
  • Full observability — Prometheus metrics, Pino structured logging, request ID propagation
  • Docker Compose — One command to spin up all services
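
The RAPTOR indexing bullet above can be pictured with a small sketch. This is not the code in raptor-service/ (which is Python and uses UMAP + HDBSCAN with Gemini summaries); it is a hedged TypeScript illustration of the recursive shape only. The names `TreeNode`, `buildTree`, `pairUp`, and `joinTexts` are hypothetical, clustering and summarization are passed in as plain functions, and the toy stand-ins simply pair neighbours and join their text.

```typescript
// One node of the hierarchy: level 0 is a leaf chunk, higher levels are summaries.
interface TreeNode {
  text: string;
  level: number;
  children: TreeNode[];
}

// Hypothetical pluggable steps. A real service would cluster embeddings
// and call an LLM (asynchronously) to summarize each cluster.
type Cluster = (nodes: TreeNode[]) => TreeNode[][];
type Summarize = (texts: string[]) => string;

// Repeatedly cluster the current layer and summarize each cluster into a
// parent node, until a single root remains or no further reduction happens.
// Returns every node at every level, since search can run over all of them.
function buildTree(
  chunks: string[],
  cluster: Cluster,
  summarize: Summarize,
  maxLevels = 5,
): TreeNode[] {
  let layer: TreeNode[] = chunks.map((text) => ({ text, level: 0, children: [] }));
  const all: TreeNode[] = [...layer];
  for (let level = 1; level <= maxLevels && layer.length > 1; level++) {
    const groups = cluster(layer).filter((group) => group.length > 0);
    // Stop if clustering no longer reduces the number of nodes.
    if (groups.length >= layer.length) break;
    layer = groups.map((group) => ({
      text: summarize(group.map((node) => node.text)),
      level,
      children: group,
    }));
    all.push(...layer);
  }
  return all;
}

// Toy stand-ins so the sketch runs without any external service.
const pairUp: Cluster = (nodes) => {
  const groups: TreeNode[][] = [];
  for (let i = 0; i < nodes.length; i += 2) groups.push(nodes.slice(i, i + 2));
  return groups;
};
const joinTexts: Summarize = (texts) => texts.join(" + ");

const nodes = buildTree(["a", "b", "c", "d"], pairUp, joinTexts);
const root = nodes[nodes.length - 1];
```

With four leaf chunks and pairwise grouping, the sketch produces two level-1 summaries and one level-2 root, seven nodes in total.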

Services

Container     Port   Purpose
Atlas API     3001   Fastify HTTP server (Bun)
Docling       8000   PDF parsing (Python/FastAPI)
RAPTOR        8001   Hierarchical tree builder (Python/FastAPI)
PostgreSQL    5432   pgvector + BM25 full-text search
Neo4j         7474   Graph DB (provisioned, integration TBD)
Prometheus    9090   Metrics collection
Grafana       3030   Metrics dashboards

Quick start

# 1. Install JS dependencies
bun install

# 2. Install Python service dependencies
pip install -r docling-service/requirements.txt
pip install -r raptor-service/requirements.txt

# 3. Set up environment
cp .env.example .env
# Edit .env with your API keys (JINA_API_KEY, GEMINI_API_KEY)

# 4. Start infrastructure (PostgreSQL, Neo4j, Prometheus, Grafana)
docker compose up -d postgres neo4j prometheus grafana

# 5. Run database migrations
bunx prisma migrate deploy

# 6. Start Python microservices (each in its own terminal, or in the background with &)
python docling-service/main.py &
python raptor-service/main.py &

# 7. Start the API server
bun run dev

Using Docker Compose (all services)

docker compose up --build

API

Ingestion

Method   Route           Description
POST     /ingest/pdf     Upload PDF → Docling parses → returns chunks
POST     /ingest/url     { url } → fetch + extract content
POST     /ingest/image   Upload image → Gemini describes
POST     /ingest/code    { content, filepath } → split into blocks
POST     /ingest/full    Upload file → full pipeline: ingest → embed → RAPTOR → store
POST     /test/embed     { texts } → test Jina embeddings
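
As a rough illustration of the two JSON ingestion routes above, the sketch below builds the arguments for a `fetch` call. The body fields (`url`, `content`, `filepath`) come from the route list; the helper name `toFetchArgs`, the types, the sample values, and the base URL (the default PORT on localhost) are assumptions made for the example. The upload routes are left out because their form field names are not documented here.

```typescript
// Request shapes for the JSON ingestion routes listed above.
type IngestRequest =
  | { route: "/ingest/url"; body: { url: string } }
  | { route: "/ingest/code"; body: { content: string; filepath: string } };

interface FetchArgs {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: turns a typed request into fetch(url, init) arguments.
function toFetchArgs(baseUrl: string, request: IngestRequest): FetchArgs {
  return {
    url: new URL(request.route, baseUrl).href,
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(request.body),
    },
  };
}

const urlIngest = toFetchArgs("http://localhost:3001", {
  route: "/ingest/url",
  body: { url: "https://example.com/article" },
});

const codeIngest = toFetchArgs("http://localhost:3001", {
  route: "/ingest/code",
  body: { content: "export const answer = 42;", filepath: "src/answer.ts" },
});

// To send one of them against a running server:
//   await fetch(urlIngest.url, urlIngest.init);
```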

Query

Method   Route    Description
POST     /query   { query } → multi-stage search → Gemini answer
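
A minimal client sketch for the route above, assuming the server runs on the default PORT (3001) on localhost. The response shape is not documented in this README, so it is typed as `unknown`. The helper names `queryBody` and `ask`, and the client-side trimming and empty-query check, are choices made for the example rather than documented server behaviour.

```typescript
// Serializes the { query } body, rejecting empty input on the client side.
function queryBody(query: string): string {
  const trimmed = query.trim();
  if (trimmed.length === 0) throw new Error("query must not be empty");
  return JSON.stringify({ query: trimmed });
}

// Sends POST /query and returns the parsed JSON response as-is.
async function ask(query: string, baseUrl = "http://localhost:3001"): Promise<unknown> {
  const response = await fetch(new URL("/query", baseUrl).href, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: queryBody(query),
  });
  if (!response.ok) throw new Error(`POST /query failed with HTTP ${response.status}`);
  return response.json();
}

const body = queryBody("  What is RAPTOR indexing?  ");

// Against a running server:
//   const answer = await ask("What is RAPTOR indexing?");
```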

Health & Metrics

Method   Route      Description
GET      /health    Service health check
GET      /metrics   Prometheus metrics

Query flow

POST /query { "query": "What is RAPTOR indexing?" }
  1. Embed query via Jina (retrieval.query)
  2. Vector search — cosine similarity on all tree nodes via pgvector HNSW index
  3. BM25 search — PostgreSQL full-text search on leaf nodes
  4. RRF fusion — Reciprocal Rank Fusion (k=60) combines results
  5. Rerank — Top 20 → Jina reranker → keep top 5
  6. Answer — Gemini 2.0 Flash generates answer with citations
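
Step 4 can be sketched as follows. Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every result list it appears in, with 1-based ranks and k = 60 as stated above. This is an illustrative version, not the code in src/fusion/; the function name `rrfFuse`, the id-only input shape, and the sample node ids are assumptions.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

// Reciprocal Rank Fusion over any number of ranked id lists (best first).
function rrfFuse(lists: string[][], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical hits from the vector and BM25 stages.
const vectorHits = ["n3", "n1", "n7"];
const bm25Hits = ["n1", "n9", "n3"];

const fused = rrfFuse([vectorHits, bm25Hits]);

// Step 5 would then pass the top 20 fused ids to the reranker.
const rerankCandidates = fused.slice(0, 20).map((result) => result.id);
```

Here n1 (ranks 2 and 1) edges out n3 (ranks 1 and 3), and both outrank n9 and n7, which each appear in only one list. Because RRF uses ranks rather than raw scores, the cosine and BM25 scores never need to be normalized against each other.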

Configuration

Variable         Default                                         Description
PORT             3001                                            HTTP server port
DATABASE_URL     postgresql://atlas:atlas@localhost:5432/atlas   Postgres connection string
JINA_API_KEY     (none)                                          Jina AI API key (embeddings + reranker)
GEMINI_API_KEY   (none)                                          Google Gemini API key
NEO4J_URI        bolt://localhost:7687                           Neo4j connection URI
DOCLING_URL      http://localhost:8000                           Docling service URL
RAPTOR_URL       http://localhost:8001                           RAPTOR service URL
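
One way to read the variables above into a typed object is sketched below. It is not the contents of src/config.ts; the `AtlasConfig` interface and `loadConfig` function are hypothetical, and treating the two API keys as required is an assumption based on their having no default. The defaults are copied from the list above.

```typescript
interface AtlasConfig {
  port: number;
  databaseUrl: string;
  jinaApiKey: string;
  geminiApiKey: string;
  neo4jUri: string;
  doclingUrl: string;
  raptorUrl: string;
}

type Env = Record<string, string | undefined>;

// Reads configuration from an env-like object, applying the documented defaults.
function loadConfig(env: Env): AtlasConfig {
  // Assumption: variables without a default are required.
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing required environment variable: ${name}`);
    return value;
  };

  const port = Number(env.PORT ?? "3001");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    port,
    databaseUrl: env.DATABASE_URL ?? "postgresql://atlas:atlas@localhost:5432/atlas",
    jinaApiKey: required("JINA_API_KEY"),
    geminiApiKey: required("GEMINI_API_KEY"),
    neo4jUri: env.NEO4J_URI ?? "bolt://localhost:7687",
    doclingUrl: env.DOCLING_URL ?? "http://localhost:8000",
    raptorUrl: env.RAPTOR_URL ?? "http://localhost:8001",
  };
}

// Placeholder keys for illustration; a server would pass process.env instead.
const config = loadConfig({ JINA_API_KEY: "jina-test", GEMINI_API_KEY: "gemini-test" });
```

Failing at startup on a missing key tends to be easier to diagnose than a later authentication error from an upstream API.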

Tech stack

Project structure

src/
├── agent/          # LLM answer generation
├── api/            # HTTP route handlers
├── embeddings/     # Jina embeddings + late chunking
├── fusion/         # Reciprocal Rank Fusion (RRF)
├── ingestion/      # Document parsers (PDF, image, URL, code)
├── observability/  # Logging, metrics, request ID
├── raptor/         # RAPTOR tree client + storage
├── reranker/       # Jina reranker integration
├── retrieval/      # Vector + BM25 search
├── storage/        # Prisma client singleton
├── config.ts       # Environment configuration
├── index.ts        # Server entry point
└── types.ts        # Shared type definitions

docling-service/    # Python: IBM Docling PDF parser
raptor-service/     # Python: UMAP + HDBSCAN + Gemini summarization
prisma/             # Schema, migrations, pgvector SQL

Scripts

bun run dev      # Start with hot reload
bun run build    # Build for production
bun run start    # Run production build

License

MIT
