Production-grade Retrieval-Augmented Generation with hallucination risk scoring and adaptive LLM routing.
Features • Architecture • Quickstart • How It Works • API
Most RAG systems give you an answer and ask you to take it on faith. TrustRAG gives you the answer and tells you how much to trust it.
Every response includes a confidence score computed from three independent signals — retrieval quality, source consistency, and answer grounding — collapsed into a single hallucination risk label (Low / Medium / High). Under the hood, it combines FAISS semantic search with BM25 keyword search for stronger retrieval, and silently falls back from Groq (LLaMA3) to ZhipuAI (GLM-4) if the primary LLM is unavailable.
Screenshots: Query & Answer view; Hallucination Risk & Sources view.

| Feature | Description |
|---|---|
| 🔍 Hybrid Retrieval | FAISS vector search + BM25 keyword search merged for higher recall |
| 🛡️ Trust Scoring | Per-response confidence from retrieval distance, source agreement, and answer grounding |
| ⚠️ Hallucination Risk | Low / Medium / High label on every response |
| 🔀 Adaptive LLM Routing | Groq (LLaMA3-8b) primary → ZhipuAI (GLM-4-Flash) automatic fallback |
| 📄 PDF Ingestion | PyMuPDF extraction + LangChain chunking with configurable size and overlap |
| ⚡ Lazy Loading | Embedding model loads on first use, so there is no startup delay |
| 🌐 REST API | Clean FastAPI backend with /upload and /query endpoints |
| 🖥️ Streamlit UI | One-page interface for uploading docs and asking questions |

┌──────────────────────────────────────────────────────┐
│                  Streamlit Frontend                  │
│     Sidebar: PDF upload  │  Main: query + output     │
└─────────────────┬────────────────────────────────────┘
                  │  HTTP REST (port 8001)
┌─────────────────▼────────────────────────────────────┐
│                   FastAPI Backend                    │
│                                                      │
│    /api/upload               /api/query              │
│         │                         │                  │
│    PDF Parser                Retriever               │
│    (PyMuPDF)                 FAISS + BM25            │
│         │                         │                  │
│    Chunker                   Generator               │
│    (LangChain)               LLM Router              │
│         │                         │                  │
│    VectorStore               Trust Evaluator         │
│    FAISS + texts             confidence + risk label │
└──────────────────────────────────────────────────────┘

Ingestion
  PDF → PyMuPDF → plain text → LangChain splitter (700 chars / 150 overlap)
      → all-MiniLM-L6-v2 embeddings → FAISS index
      → tokenized docs → BM25 index

Query
  question → FAISS top-k (semantic) ──┐
           → BM25 top-k (keyword)   ──┴─► merge + dedup → top 3 chunks
                                                 │
                                         Groq (LLaMA3-8b)
                                       (or ZhipuAI fallback)
                                                 │
                                        answer + trust score

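
The 700-character / 150-overlap chunking step in the ingestion flow can be pictured as a plain sliding window. The sketch below is a simplified stand-in and not the LangChain RecursiveCharacterTextSplitter the project uses, which additionally prefers to break on separators such as paragraphs and sentences. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=700, overlap=150):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: each chunk starts
    (chunk_size - overlap) characters after the previous one.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the default settings, consecutive chunks share 150 characters, so a sentence that straddles a chunk boundary is still seen whole in at least one chunk when it is shorter than the overlap.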
TrustRAG/
├── app/
│ ├── api/
│ │ └── routes.py # /upload, /query, health check
│ ├── db/
│ │ └── vector_store.py # FAISS index + sentence-transformer embeddings
│ ├── models/
│ │ ├── request_models.py # QueryRequest (Pydantic)
│ │ └── response_models.py # QueryResponse (Pydantic)
│ ├── services/
│ │ ├── generator.py # Prompt construction + LLMRouter call
│ │ ├── llm_router.py # Groq primary → ZhipuAI fallback
│ │ ├── rag_service.py # Pipeline: retrieve → generate → evaluate
│ │ ├── retrieval.py # Hybrid FAISS + BM25 retriever
│ │ └── trust.py # Confidence scoring + risk classification
│ ├── utils/
│ │ ├── chunking.py # RecursiveCharacterTextSplitter wrapper
│ │ └── pdf_parser.py # PyMuPDF bytes → text
│ ├── config.py # Env var loading
│ └── main.py # FastAPI app + CORS
├── frontend/
│ └── app.py # Streamlit UI
├── data/
│ ├── raw/ # Uploaded PDFs
│ └── processed/ # Extracted text
├── .env.example # ← copy to .env and fill in keys
├── requirements.txt
└── README.md
- Python 3.10+
- A free Groq API key (primary LLM)
- A free ZhipuAI API key (fallback LLM)

git clone https://github.com/yourusername/TrustRAG.git
cd TrustRAG
python -m venv venv && source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env

Edit .env:

GROQ_API_KEY=your_groq_key_here
ZAI_API_KEY=your_zhipuai_key_here

Run from the project root, not from inside app/:

uvicorn app.main:app --host 127.0.0.1 --port 8001

Open a second terminal:

streamlit run frontend/app.py

Visit http://localhost:8501

- Upload PDFs via the left sidebar → click Process Files
- Wait for the ✅ confirmation (first run downloads the embedding model, ~90 MB)
- Ask a question in the main panel → click Get Answer
- Read the response:
  - 📌 Answer: generated strictly from your document context
  - 📊 Confidence: 0–100% score based on three independent signals
  - ⚠️ Risk: 🟢 Low / 🟡 Medium / 🔴 High
  - 📚 Sources Used: exact chunks retrieved from your PDF

Rather than relying on vector search alone, TrustRAG runs two independent retrieval passes and merges the results:
- FAISS (semantic): finds chunks whose meaning is close to the query, using L2 distance between embeddings (IndexFlatL2)
- BM25 (keyword): finds chunks that share exact terms with the query

Results are merged, deduplicated, and the top 3 chunks are passed to the LLM. Combining the two passes is intended to give higher recall than either method alone, particularly on queries that mix conceptual and factual language.
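
The merge-and-deduplicate step can be sketched in a few lines. The README does not say how the two ranked lists are combined, so this example assumes a simple round-robin interleave and treats chunks as duplicates when their normalized text matches. The function name `merge_and_dedup` and the sample chunks are hypothetical.

```python
from itertools import zip_longest


def merge_and_dedup(semantic_hits, keyword_hits, top_k=3):
    """Interleave two ranked chunk lists, drop duplicates, keep top_k.

    Assumption: round-robin interleaving, semantic hit first at each rank.
    The project's actual merge strategy may differ.
    """
    merged, seen = [], set()
    for pair in zip_longest(semantic_hits, keyword_hits):
        for chunk in pair:
            if chunk is None:  # one list was shorter than the other
                continue
            key = chunk.strip().lower()
            if key in seen:  # same chunk found by both retrievers
                continue
            seen.add(key)
            merged.append(chunk)
            if len(merged) == top_k:
                return merged
    return merged


semantic = [
    "Revenue grew 22% year over year.",
    "Data center demand drove growth.",
    "Gaming revenue was flat.",
]
keyword = [
    "Data center demand drove growth.",
    "Q4 revenue was reported in February.",
]
top_chunks = merge_and_dedup(semantic, keyword)
```

Here the chunk found by both retrievers appears only once, and the third slot goes to the keyword-only hit rather than to the third semantic hit.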
Every response is scored on three orthogonal dimensions:
| Signal | Weight | Method |
|---|---|---|
| Retrieval Confidence | 40% | FAISS L2 distance mapped via e^(−d/50). Distance 0 → 1.0; large distance → ~0. Measures how semantically close the retrieved context is to the query. |
| Source Agreement | 30% | Pairwise text overlap across retrieved chunks. High overlap = consistent, non-contradictory evidence. |
| Answer Grounding | 30% | Fraction of the first 10 answer tokens that appear in the retrieved context. A grounded answer reuses source vocabulary; a hallucinated one doesn't. |
confidence = 0.4 × retrieval_conf + 0.3 × source_agreement + 0.3 × answer_grounding
confidence > 0.75 → 🟢 Low Risk
confidence > 0.50 → 🟡 Medium Risk
confidence ≤ 0.50 → 🔴 High Risk
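
The weighting and thresholds above can be sketched as follows. The weights, the e^(−d/50) mapping, the 10-token window, and the risk cut-offs come from this README; everything else is an assumption made for illustration: whitespace tokenization, Jaccard similarity as the "pairwise text overlap" measure, and all function names.

```python
import math
from itertools import combinations


def retrieval_confidence(l2_distance):
    """Map a FAISS L2 distance to (0, 1]: distance 0 gives 1.0."""
    return math.exp(-l2_distance / 50)


def source_agreement(chunks):
    """Mean pairwise Jaccard overlap of chunk token sets (assumed measure)."""
    token_sets = [set(c.lower().split()) for c in chunks]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        return 1.0  # assumption: a single chunk cannot disagree with itself
    scores = [len(a & b) / len(a | b) if (a | b) else 0.0 for a, b in pairs]
    return sum(scores) / len(scores)


def answer_grounding(answer, chunks, n_tokens=10):
    """Fraction of the first n_tokens answer tokens found in the context."""
    context = set(" ".join(chunks).lower().split())
    tokens = answer.lower().split()[:n_tokens]
    if not tokens:
        return 0.0
    return sum(t in context for t in tokens) / len(tokens)


def risk_label(confidence):
    """Apply the README thresholds to a confidence value."""
    if confidence > 0.75:
        return "Low"
    if confidence > 0.50:
        return "Medium"
    return "High"


def trust_score(l2_distance, chunks, answer):
    """Combine the three signals with the 0.4 / 0.3 / 0.3 weights."""
    confidence = (
        0.4 * retrieval_confidence(l2_distance)
        + 0.3 * source_agreement(chunks)
        + 0.3 * answer_grounding(answer, chunks)
    )
    return confidence, risk_label(confidence)
```

Because the thresholds are strict inequalities, a confidence of exactly 0.75 is labeled Medium and exactly 0.50 is labeled High.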

Query
  │
  ▼
Groq ──── success ──► answer
  │
fail (timeout / rate-limit / bad key)
  │
  ▼
ZhipuAI ── success ──► answer
  │
fail
  │
  ▼
Error message

To add a third provider, implement call_xxx(prompt) in llm_router.py and chain it in generate().
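
A minimal sketch of this fallback chain, assuming each provider is wrapped in a function that takes a prompt and either returns text or raises. The real signatures in llm_router.py are not shown in this README, so `generate`, the provider list argument, and the two stub functions below are hypothetical; the stubs only simulate a failing primary and a working fallback.

```python
def generate(prompt, providers):
    """Try each (name, call) provider in order; return the first success.

    If every provider raises, return an error message instead of raising.
    """
    errors = []
    for name, call in providers:
        try:
            return {"provider": name, "answer": call(prompt)}
        except Exception as exc:  # timeout, rate limit, bad key, ...
            errors.append(f"{name}: {exc}")
    return {
        "provider": None,
        "answer": "All LLM providers failed: " + "; ".join(errors),
    }


def call_groq(prompt):
    """Stub standing in for the primary provider; always fails here."""
    raise TimeoutError("simulated timeout")


def call_zhipuai(prompt):
    """Stub standing in for the fallback provider."""
    return f"stub answer to: {prompt}"


result = generate(
    "What is RAG?",
    [("groq", call_groq), ("zhipuai", call_zhipuai)],
)
```

Under this design, adding a third provider amounts to appending one more `(name, call)` pair to the list.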

Health check.

Response

{ "status": "RAG API is running" }

Ingest one or more PDF files (/api/upload).

Request: multipart/form-data, field name `files`

Response

{
"message": "Files uploaded and processed successfully",
"files": ["nvidia_annual_report.pdf"]
}

Query the knowledge base (/api/query).

Request

{ "query": "What was NVIDIA's revenue in Q4 2024?" }

Response

{
"answer": "NVIDIA's Q4 2024 revenue was $22.1 billion, representing...",
"confidence": 0.63,
"sources": ["...chunk 1...", "...chunk 2...", "...chunk 3..."],
"hallucination_risk": "Medium"
}

| Symptom | Fix |
|---|---|
| `address already in use` on port 8001 | `lsof -ti :8001 \| xargs kill -9` |
| `Cannot connect to backend` in Streamlit | Confirm uvicorn is running on port 8001 |
| Upload times out on first run | Normal: the embedding model is downloading (~90 MB). Wait and retry. |
| `No documents in knowledge base` | Upload and process PDFs before querying |
| Answer quality is poor | Ask more specific questions; large PDFs need targeted queries |
| `ModuleNotFoundError` | Activate your venv: `source venv/bin/activate` |


| Layer | Technology |
|---|---|
| Backend | FastAPI + Uvicorn |
| Frontend | Streamlit |
| Vector Search | FAISS IndexFlatL2 |
| Keyword Search | BM25Okapi (rank-bm25) |
| Embeddings | all-MiniLM-L6-v2 (sentence-transformers) |
| PDF Parsing | PyMuPDF (fitz) |
| Text Chunking | LangChain RecursiveCharacterTextSplitter |
| Primary LLM | Groq (llama3-8b-8192) |
| Fallback LLM | ZhipuAI (glm-4-flash) |
| Validation | Pydantic v2 |

MIT © Bhaumik Patel

