This document provides a detailed breakdown of the ACL Anthology RAG system architecture, including its major components, design patterns, and data flow.
The system is designed as a modular Retrieval-Augmented Generation (RAG) pipeline. It decouples the offline ingestion of data from the online retrieval of information, allowing for scalable and efficient searching.
```mermaid
graph TD
    subgraph "Frontend"
        UI[React UI]
    end
    subgraph "Backend API"
        API[FastAPI Router]
        QP[Query Processor]
        Agg[Result Aggregator]
    end
    subgraph "Core Logic"
        Ref[LLM Reformulator]
        Emb[Embedding Service]
        Ret[Retrieval Pipeline]
    end
    subgraph "Data Storage"
        Qdrant[Qdrant Vector DB]
        Raw[Raw ACL Data]
    end
    UI <--> API
    API --> QP
    QP --> Ref
    Ref --> Emb
    Emb --> Ret
    Ret <--> Qdrant
    Ret --> Agg
    Agg --> API
```
- Tech Stack: React 19, Vite, TailwindCSS, shadcn/ui.
- Responsibilities:
- Accepts user input (Natural Language or Paper ID).
- Displays retrieved papers with metadata (Title, Abstract, Year, Authors).
- Visualizes the search process (loading states, query expansion results).
- Key Components:
- `SearchInterface`: Main input component handling mode switching.
- `ResultCard`: Displays individual paper details.
- Tech Stack: FastAPI, Python 3.12.
- Responsibilities:
- Exposes REST endpoints for search.
- Handles request validation and error management.
- Orchestrates the retrieval pipeline.
- Key Modules:
- `api/app.py`: FastAPI application entrypoint (Vercel-friendly).
- `api/src/api/routes.py`: Defines endpoints like `POST /api/search` (SSE) and `GET /api/paper/{paper_id}`.
- Role: Interprets the raw user input.
- Logic:
- If input is a Paper ID (e.g., `2023.acl-long.1`), the pipeline looks up the paper in the Qdrant payload and uses its title/abstract as the basis for generating search queries.
- If input is Natural Language, it is passed directly to the next stage.
- Design Pattern: Strategy Pattern (handles different input types uniformly).
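The strategy selection can be sketched as a small dispatcher that classifies the raw input. The regex below matches IDs shaped like `2023.acl-long.1`; both the pattern and the function name are illustrative assumptions:

```python
import re

# Anthology IDs look like "2023.acl-long.1" (year.venue-track.number);
# the exact pattern used by the real pipeline may differ.
PAPER_ID_RE = re.compile(r"^\d{4}\.[\w-]+\.\d+$")

def classify_input(user_input: str) -> str:
    """Return the strategy key: 'paper_id' or 'natural_language'."""
    return "paper_id" if PAPER_ID_RE.match(user_input.strip()) else "natural_language"
```

The router can then map each key to a handler (ID lookup vs. direct pass-through), which is what makes the two input types uniform downstream.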
- Role: Extracts structured filters (year/authors/etc.), a remaining semantic query, and performs permissive relevance gating.
- Notes:
- Runs before retrieval and can return early for clearly irrelevant queries.
- Role: Expands the semantic query into multiple search queries.
- Logic:
- Uses Groq LLM and returns a JSON list of query strings.
- Improves recall by covering different facets of the information need.
- Configuration: Adjustable number of generated queries (default: 3).
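The parsing side of the reformulator can be sketched as follows, assuming the Groq call returns a JSON array of strings as described above (function and variable names are illustrative):

```python
import json

def parse_query_expansion(llm_output: str, n_queries: int = 3) -> list[str]:
    """Parse the LLM's JSON response into at most n_queries search strings.

    Falls back to an empty list if the model returned malformed JSON.
    """
    try:
        queries = json.loads(llm_output)
    except json.JSONDecodeError:
        return []
    return [q for q in queries if isinstance(q, str)][:n_queries]

# A well-formed response is assumed to look like this:
raw = ('["Neural Machine Translation state of the art", '
       '"Low-resource language translation", '
       '"Transformer based translation models"]')
```

Capping the list at `n_queries` is how the "default: 3" configuration would be enforced even if the model over-generates.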
- Role: Converts text into dense vectors.
- Provider: Fireworks embeddings via LangChain.
- Model: Defaults to `nomic-ai/nomic-embed-text-v1.5`.
- Usage:
- Offline: Embeds all paper abstracts.
- Online: Embeds the reformulated search queries.
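Both the offline path (all abstracts) and the online path (a handful of reformulated queries) can share one batching helper before calling the embedding provider. The sketch below is illustrative; the batch size of 64 is an assumption, not a documented setting:

```python
def batched(texts: list[str], batch_size: int = 64) -> list[list[str]]:
    """Split texts into fixed-size batches before sending them to the embedding API."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
```

Reusing the same helper (and the same model) for indexing and querying is what keeps the two vector spaces consistent.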
- Role: Executes the search against the vector database.
- Logic:
- Performs nearest neighbor search for each of the reformulated queries.
- Retrieves `k` candidates for each query.
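In production the nearest neighbor search runs inside Qdrant; the pure-Python sketch below only illustrates the semantics of one top-`k` cosine search per reformulated query (all names are illustrative, and real abstracts have 768-dim vectors rather than the toy 2-dim ones shown):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the paper IDs of the k nearest neighbors by cosine similarity."""
    ranked = sorted(index, key=lambda pid: cosine(query_vec, index[pid]), reverse=True)
    return ranked[:k]
```

Running this once per reformulated query yields several candidate lists, which is exactly what the aggregator below consumes.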
- Role: Merges results from multiple queries into a single ranked list.
- Algorithm: Reciprocal Rank Fusion (RRF).
- Logic:
- Assigns scores based on the rank of a paper in each result set.
- Favors papers that appear in multiple result sets or rank highly in single sets.
- Deduplicates papers.
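RRF is compact enough to show in full. A minimal sketch, assuming the conventional constant `k = 60` (the constant actually used by the aggregator is not documented here):

```python
def rrf_fuse(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, paper_id in enumerate(results, start=1):
            scores[paper_id] = scores.get(paper_id, 0.0) + 1.0 / (k + rank)
    # Sorting the score map by fused score both ranks and deduplicates papers.
    return sorted(scores, key=scores.get, reverse=True)
```

A paper appearing in several lists accumulates several reciprocal-rank terms, which is why cross-query agreement outranks a single high placement.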
- Role: Streams a markdown response grounded in the retrieved results.
- Notes:
- Uses bracketed numeric citations like `[1]`, `[2]` aligned to the ranked results.
- Tech Stack: Qdrant (Dockerized).
- Role: Stores abstract embeddings and metadata (Title, URL, Year).
- Schema:
- `vector`: 768-dim float array.
- `payload`: JSON object with paper metadata.
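A sketch of how a point for this schema might be assembled before upsert. Qdrant point IDs must be integers or UUIDs, so the Anthology ID is kept in the payload here; the `paper_id` key (and any payload key beyond Title, URL, Year) is an assumption:

```python
def make_point(point_id: int, paper_id: str, vector: list[float],
               title: str, url: str, year: int) -> dict:
    """Build a Qdrant-style point: a 768-dim vector plus a metadata payload."""
    assert len(vector) == 768, "vector must match the collection dimensionality"
    return {
        "id": point_id,
        "vector": vector,
        "payload": {"paper_id": paper_id, "title": title, "url": url, "year": year},
    }
```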
While RAG is typically used to generate a final answer, here the "Generation" step primarily drives Query Expansion, and the final output is the retrieved documents themselves. This pattern is often called RAG for Retrieval.
Used to combine results from the multiple generated queries without needing complex re-ranking models. It provides a robust way to fuse rankings.
- Ingestion is strictly offline.
- Retrieval is strictly read-only online.
- Shared Components (like Embedding) are reused to ensure consistency between indexing and querying.
- Download: The `acl-anthology` library fetches metadata.
- Clean: Text is normalized (Unicode normalization, whitespace stripping).
- Embed: Batches of abstracts are sent to the embedding model.
- Upsert: Vectors + Payload are pushed to Qdrant.
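The Clean step above can be sketched in a few lines; the choice of NFKC as the normalization form is an assumption (the source only says "unicode normalization"):

```python
import unicodedata

def clean_text(raw: str) -> str:
    """Normalize Unicode (NFKC) and collapse runs of whitespace to single spaces."""
    normalized = unicodedata.normalize("NFKC", raw)
    return " ".join(normalized.split())
```

NFKC is a sensible default for scraped abstracts because it folds compatibility characters (e.g. the `ﬁ` ligature) into their plain equivalents before embedding.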
- Input: User provides "Machine Translation".
- Parse filters: LLM extracts structured filters + semantic search intent.
- Reformulate: LLM generates:
- "Neural Machine Translation state of the art"
- "Low-resource language translation"
- "Transformer based translation models"
- Embed: All strings are embedded.
- Search: Qdrant runs multiple searches (one per query).
- Fuse: Hybrid RRF + score fusion combines the ranked lists.
- Return: Results + response are streamed to the UI via SSE.
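The "Hybrid RRF + score fusion" step is not specified in detail; one plausible sketch blends RRF rank evidence with per-list max-normalized similarity scores. The `alpha` weighting, the normalization scheme, and all names below are assumptions:

```python
def hybrid_fuse(result_lists: list[list[tuple[str, float]]],
                k: int = 60, alpha: float = 0.5) -> list[str]:
    """Blend RRF rank scores with normalized similarity scores.

    Each result list holds (paper_id, similarity) pairs, best first.
    """
    fused: dict[str, float] = {}
    for results in result_lists:
        max_score = max((s for _, s in results), default=0.0)
        if max_score <= 0.0:
            max_score = 1.0  # avoid division issues on degenerate score lists
        for rank, (pid, score) in enumerate(results, start=1):
            contribution = alpha * (1.0 / (k + rank)) + (1 - alpha) * (score / max_score)
            fused[pid] = fused.get(pid, 0.0) + contribution
    return sorted(fused, key=fused.get, reverse=True)
```

Setting `alpha = 1.0` recovers plain RRF, so the blend can be tuned without changing the pipeline shape.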
- New Embedding Models: Change `EMBEDDING_MODEL` in config.
- Different LLMs: Switch providers in `api/src/llm/`.
- Hybrid Search: Add keyword-based search (BM25) to Qdrant and fuse with dense vectors.