This project implements a Retrieval-Augmented Generation (RAG) system using FastAPI.
Users can:
- Upload PDF documents
- Ask questions grounded strictly in document content
- Receive context-aware answers using semantic search + LLM
The system performs:
- PDF text extraction
- Text chunking
- Embedding generation (Sentence Transformers)
- Vector storage using ChromaDB
- Similarity retrieval
- Context-grounded response generation using Groq LLM
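The chunking step above can be sketched in a few lines of pure Python. This is illustrative only; the chunk size and overlap values are assumptions, not the project's actual settings:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks for embedding.

    Overlap preserves context that would otherwise be cut at chunk
    boundaries, which improves retrieval quality.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks
```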
The entire system is containerized using Docker for reproducible deployment.
User → FastAPI → Chunking → Embeddings → ChromaDB → Retrieve → LLM → Response
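The "Retrieve" step in the flow above ranks stored chunks by similarity to the query embedding. ChromaDB handles this internally; the following pure-Python sketch is only meant to illustrate the idea behind cosine-similarity retrieval:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_emb, chunk_embs, chunks, top_k=2):
    """Return the top_k chunks most similar to the query embedding."""
    scored = sorted(
        zip(chunks, chunk_embs),
        key=lambda pair: cosine_similarity(query_emb, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:top_k]]
```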
- FastAPI
- Uvicorn
- Sentence Transformers
- ChromaDB (vector database)
- Groq LLM API
- Docker
Install Docker
Download and install Docker Desktop from the official Docker website.
- Verify the installation from the terminal/cmd:
```bash
docker --version
```
- Build the image:
```bash
docker build -t rag-api .
```
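The build step assumes a `Dockerfile` at the repo root. A hedged sketch of what it might look like, inferred from the commands in this README (the Python version and file layout are assumptions):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```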
- Run the container:
```bash
export GROQ_API_KEY="your_api_key_here"
docker run -p 8000:8000 -e GROQ_API_KEY=$GROQ_API_KEY rag-api
```
- Open http://localhost:8000/docs in your browser for the interactive API docs.
- Create a virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
- Run the server:
```bash
export GROQ_API_KEY="your_api_key_here"
uvicorn app:app --reload
```
- Open http://localhost:8000/docs
- `POST /documents`: Upload and index a PDF.
- `POST /chat`: Ask questions grounded in the uploaded document.
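Hedged example requests against both endpoints, assuming the server is running locally on port 8000. The multipart field name `file` and the JSON body shape are assumptions; check the interactive docs at `/docs` for the actual request schemas:

```bash
# Upload and index a PDF (field name "file" is an assumption)
curl -X POST http://localhost:8000/documents -F "file=@mydoc.pdf"

# Ask a question about the indexed document (body shape is an assumption)
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"question": "What is this document about?"}'
```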
This project requires the following environment variable:
- `GROQ_API_KEY`
Do NOT hardcode it in the source code.
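One common pattern for loading the key safely at startup is to read it from the environment and fail fast if it is missing. A sketch (the actual handling in the app may differ):

```python
import os

def get_groq_api_key() -> str:
    """Read the Groq API key from the environment; fail fast if missing."""
    key = os.environ.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set. Export it before starting the server."
        )
    return key
```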
The application is fully Dockerized and ready for deployment to:
- Google Cloud Run
- AWS ECS
- Azure Container Apps
- Any container-based platform
- Retrieval-Augmented Generation (RAG)
- Semantic Search
- Vector Databases
- Embedding-based Similarity
- Backend API Design
- Containerization with Docker