The ACL Anthology hosts tens of thousands of NLP research papers. Traditional keyword-based search often fails to capture the semantic nuance of research queries, struggling with synonyms, paraphrases, and conceptual similarity.
ACL Anthology RAG bridges this gap by implementing a semantic retrieval system. It moves beyond simple keyword matching to understand the meaning behind a query, allowing researchers to discover relevant work even when they don't know the exact terminology.
- Semantic Search: Uses dense vector embeddings to find conceptually similar papers.
- Query Reformulation: Leverages LLMs to expand user queries into multiple search vectors, improving recall.
- Dual Query Modes: Supports both natural language questions and "Paper as Query" (using a paper ID to find related work).
- Unified Pipeline: A consistent architecture for handling different types of inputs.
- Modern Stack: FastAPI, LangChain, Fireworks (embeddings), Groq (LLM), Qdrant, React.
- Interactive UI: Clean, responsive interface with real-time streaming responses and inline citations.
The system follows a Retrieval-Augmented Generation (RAG) pattern, though currently focused on the retrieval aspect.
Note: Fireworks is used only for embedding generation. LLM-based query reformulation uses Groq.
The web interface provides:
- Real-time streaming of search responses as they're generated
- Inline citations with hover previews showing paper details, authors, and PDF links
- Monitoring panel displaying query processing pipeline, filters, and timing information
- Clean, responsive design optimized for research workflows
Best for: Exploratory research, topic discovery.
- Input: "How do I improve low-resource translation?"
- Process: The system expands this into multiple semantic queries (e.g., "low-resource NMT techniques", "data augmentation for translation").
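The reformulation step above can be sketched as follows. This is a minimal illustration of the parsing half only: the actual pipeline prompts Groq for the expansions, which we simulate with a fixed string, and `parse_reformulations` is a hypothetical helper name, not the project's API.

```python
def parse_reformulations(llm_response: str, original: str, max_queries: int = 4) -> list[str]:
    """Parse an LLM's list-formatted query expansions, keeping the original query first."""
    queries = [original]
    for line in llm_response.splitlines():
        # Strip list markers ("1.", "-", "*") and surrounding quotes/whitespace.
        q = line.strip().strip("-*0123456789. ").strip('"')
        if q and q.lower() not in (s.lower() for s in queries):
            queries.append(q)
    return queries[:max_queries]


# Simulated LLM response to the query "How do I improve low-resource translation?"
response = """1. low-resource NMT techniques
2. data augmentation for machine translation
3. transfer learning for low-resource languages"""

print(parse_reformulations(response, "How do I improve low-resource translation?"))
```

Each returned query is then embedded and searched independently; the results are fused downstream.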
Best for: Finding related work, literature review.
- Input: `2023.acl-long.412`
- Process: The system fetches the abstract of the specified paper and uses it as a semantic proxy to find other papers in the same research neighborhood.
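A minimal sketch of how both input types can flow through one resolution step, which is what makes the unified pipeline possible. The in-memory `PAPERS` map and the `resolve_query` name are illustrative stand-ins, not the project's actual metadata store or API.

```python
# Toy metadata store; in the real system, metadata comes from the ACL Anthology.
PAPERS = {
    "2023.acl-long.412": {
        "title": "Example Paper",
        "abstract": "We study data augmentation for low-resource machine translation.",
    },
}

def resolve_query(user_input: str) -> str:
    """Return the text that gets embedded for retrieval.

    If the input matches a known Anthology ID, the paper's abstract serves as
    a semantic proxy; otherwise the input is treated as a free-text question.
    """
    candidate = user_input.strip()
    if candidate in PAPERS:
        return PAPERS[candidate]["abstract"]
    return candidate

print(resolve_query("2023.acl-long.412"))
print(resolve_query("How do I improve low-resource translation?"))
```

After this step, both modes look identical to the rest of the pipeline: a piece of text to embed and search.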
- Download: Metadata and abstracts are fetched from the ACL Anthology.
- Preprocess: Text is cleaned, normalized, and formatted.
- Embed: Dense vectors are generated for each abstract.
- Index: Vectors are stored in Qdrant for fast retrieval.
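The offline steps above can be mimicked end to end with a toy in-memory index. In production the embeddings come from the Fireworks API and the vectors live in Qdrant; here a bag-of-words "embedding" and a Python list stand in for both, purely to show the ingest/search shape.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a dense embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

index = []  # (paper_id, vector) pairs; Qdrant plays this role in the real system

def ingest(paper_id: str, abstract: str) -> None:
    index.append((paper_id, embed(abstract)))

def search(query: str, k: int = 3) -> list[str]:
    qv = embed(query)
    ranked = sorted(index, key=lambda p: cosine(qv, p[1]), reverse=True)
    return [pid for pid, _ in ranked[:k]]

ingest("P1", "neural machine translation for low-resource languages")
ingest("P2", "sentiment analysis of product reviews")
print(search("low-resource translation"))  # P1 ranks first
```

Swapping the toy `embed` for an API-backed model and the list for a Qdrant collection gives the real pipeline without changing this overall structure.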
- Receive: User input (text or ID) is received.
- Reformulate: LLM generates multiple search queries.
- Retrieve: Vector search finds candidate papers for each query.
- Aggregate: Reciprocal Rank Fusion (RRF) combines and ranks results.
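The aggregation step can be sketched as a straightforward Reciprocal Rank Fusion: each reformulated query produces its own ranked list, and a paper's fused score is the sum of `1 / (k + rank)` over every list it appears in. `k = 60` is the conventional constant from the original RRF formulation; the project's actual constant and tie-breaking may differ.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists into one ranking via RRF."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, paper_id in enumerate(ranking, start=1):
            scores[paper_id] = scores.get(paper_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A paper ranked well by several queries beats one ranked first by only one.
fused = rrf([
    ["A", "B", "C"],   # results for reformulated query 1
    ["B", "A", "D"],   # results for reformulated query 2
    ["B", "C", "A"],   # results for reformulated query 3
])
print(fused)  # ['B', 'A', 'C', 'D']
```

RRF needs only ranks, not raw similarity scores, which makes it robust when fusing result lists whose score scales are not comparable.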
- Python 3.12+
- Node.js 20+
- Access to a Qdrant endpoint (local install or Qdrant Cloud)
1. Clone the repository

   ```sh
   git clone https://github.com/nnayz/acl-anthology-rag.git
   cd acl-anthology-rag
   ```

2. Configure environment

   Copy `.env.example` to `.env` in `api/` and fill in your API keys (Groq/Fireworks):

   ```sh
   cp api/.env.example api/.env
   ```

   Then set the Qdrant connection in `api/.env`:
   - Set `QDRANT_ENDPOINT` to your running instance (e.g., `http://localhost:6333` for a local install, or your Qdrant Cloud URL).
   - Optionally set `QDRANT_API_KEY` if using Qdrant Cloud.

3. Run the backend

   ```sh
   cd api
   uv sync
   uv run fastapi dev app.py
   ```

4. Run the frontend

   ```sh
   cd client
   npm install
   npm run dev
   ```
See Installation Guide for detailed setup.
```
.
├── api/                 # Backend (FastAPI)
│   ├── src/
│   │   ├── ingestion/   # Data processing pipeline
│   │   ├── retrieval/   # Search logic & ranking
│   │   ├── llm/         # LLM integration
│   │   └── vectorstore/ # Qdrant interface
├── client/              # Frontend (React)
├── docs/                # Documentation
```
- Architecture: Deep dive into system components and design.
- Installation: Detailed setup and troubleshooting.
- Configuration: Environment variables and settings.
- Usage: How to use the system effectively.
- Workflows: Detailed offline and online pipeline steps.
- Evaluation: Offline evaluation pipeline, metrics, and reports.
- Operates on abstracts only (no full-text indexing).
- Requires external services: Groq (LLM for reformulation) and a Qdrant endpoint.
- Embedding generation via Fireworks API; throughput and cost depend on API limits.
- Limited quantitative evaluation included (primarily qualitative relevance checks).
- No cross-encoder re-ranking; ranking relies on dense similarity and RRF fusion.
- Not productionized (no auth, observability, or autoscaling in scope).
- Add hybrid retrieval (dense + BM25) and cross-encoder re-ranking.
- Incorporate full-text indexing and PDF parsing.
- Add user feedback loops and relevance learning.
- Enhance UI with advanced filters, export options, and saved searches.
- Batch/streaming ingestion pipeline with resumability and monitoring.
- Expand evaluation with curated benchmarks and inter-annotator agreement.
MIT License
