
bhaumik611/TrustRAG


TrustRAG

Production-grade Retrieval-Augmented Generation with hallucination risk scoring and adaptive LLM routing.




Overview

Most RAG systems give you an answer and ask you to take it on faith. TrustRAG gives you the answer and tells you how much to trust it.

Every response includes a confidence score computed from three signals (retrieval quality, source consistency, and answer grounding), which is then collapsed into a single hallucination risk label (Low / Medium / High). Under the hood, TrustRAG combines FAISS semantic search with BM25 keyword search for stronger retrieval, and silently falls back from Groq (LLaMA3) to ZhipuAI (GLM-4) if the primary LLM is unavailable.


Screenshots

(Screenshots: "Query & Answer" and "Hallucination Risk & Sources")

Features

🔍 Hybrid Retrieval: FAISS vector search and BM25 keyword search, merged for higher recall
🛡️ Trust Scoring: per-response confidence from retrieval distance, source agreement, and answer grounding
⚠️ Hallucination Risk: Low / Medium / High label on every response
🔀 Adaptive LLM Routing: Groq (LLaMA3-8b) as primary, with automatic fallback to ZhipuAI (GLM-4-Flash)
📄 PDF Ingestion: PyMuPDF extraction plus LangChain chunking with configurable size and overlap
Lazy Loading: the embedding model loads on first use, so the server starts without waiting for it
🌐 REST API: FastAPI backend with /api/upload and /api/query endpoints
🖥️ Streamlit UI: one-page interface for uploading documents and asking questions
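Lazy loading can be sketched as a small wrapper that defers building an expensive object until the first call. This is an assumed pattern, not the code in vector_store.py; `LazyResource` is a hypothetical name, and the double-checked lock is there so two concurrent first requests load the model only once.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyResource(Generic[T]):
    """Build an expensive object (e.g. an embedding model) on first access only."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._value: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Fast path: skip the lock once the value exists.
        if self._value is None:
            with self._lock:
                # Re-check inside the lock so concurrent callers load only once.
                if self._value is None:
                    self._value = self._loader()
        return self._value
```

With sentence-transformers installed, a usage along the lines of `embedder = LazyResource(lambda: SentenceTransformer("all-MiniLM-L6-v2"))` would defer the ~90 MB model download until the first `embedder.get()` call.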

Architecture

┌──────────────────────────────────────────────────────┐
│                   Streamlit Frontend                  │
│    Sidebar: PDF upload    │    Main: query + output   │
└─────────────────┬────────────────────────────────────┘
                  │  HTTP REST  (port 8001)
┌─────────────────▼────────────────────────────────────┐
│                    FastAPI Backend                    │
│                                                       │
│   /api/upload                    /api/query           │
│        │                              │               │
│   PDF Parser                      Retriever           │
│   (PyMuPDF)                   FAISS + BM25            │
│        │                              │               │
│    Chunker                        Generator           │
│  (LangChain)                     LLM Router           │
│        │                              │               │
│   VectorStore                  Trust Evaluator        │
│  FAISS + texts              confidence + risk label   │
└──────────────────────────────────────────────────────┘

Data Flow

Ingestion

PDF → PyMuPDF → plain text → LangChain splitter (700 chars / 150 overlap)
    → all-MiniLM-L6-v2 embeddings → FAISS index
    → tokenized docs               → BM25 index
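To make the "700 chars / 150 overlap" numbers concrete, here is a simplified fixed-window chunker. It is a stand-in, not LangChain's RecursiveCharacterTextSplitter (which prefers to split on paragraph, line, and word boundaries); it only illustrates how consecutive chunks share their overlap. `chunk_text` is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 700, overlap: int = 150) -> List[str]:
    """Fixed-size character windows; consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # each new window starts 550 chars after the last
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

A 1,000-character text yields two chunks, text[0:700] and text[550:1000], which share the 150 characters at positions 550 to 699.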

Query

question → FAISS top-k (semantic) ──┐
         → BM25  top-k (keyword)  ──┴─► merge + dedup → top 3 chunks
                                                              │
                                                    Groq (LLaMA3-8b)
                                                     (or ZhipuAI fallback)
                                                              │
                                                       answer + trust score

Project Structure

TrustRAG/
├── app/
│   ├── api/
│   │   └── routes.py            # /upload, /query, health check
│   ├── db/
│   │   └── vector_store.py      # FAISS index + sentence-transformer embeddings
│   ├── models/
│   │   ├── request_models.py    # QueryRequest (Pydantic)
│   │   └── response_models.py   # QueryResponse (Pydantic)
│   ├── services/
│   │   ├── generator.py         # Prompt construction + LLMRouter call
│   │   ├── llm_router.py        # Groq primary → ZhipuAI fallback
│   │   ├── rag_service.py       # Pipeline: retrieve → generate → evaluate
│   │   ├── retrieval.py         # Hybrid FAISS + BM25 retriever
│   │   └── trust.py             # Confidence scoring + risk classification
│   ├── utils/
│   │   ├── chunking.py          # RecursiveCharacterTextSplitter wrapper
│   │   └── pdf_parser.py        # PyMuPDF bytes → text
│   ├── config.py                # Env var loading
│   └── main.py                  # FastAPI app + CORS
├── frontend/
│   └── app.py                   # Streamlit UI
├── data/
│   ├── raw/                     # Uploaded PDFs
│   └── processed/               # Extracted text
├── .env.example                 # ← copy to .env and fill in keys
├── requirements.txt
└── README.md

Quickstart

Prerequisites

1. Clone & install

git clone https://github.com/bhaumik611/TrustRAG.git
cd TrustRAG
python -m venv venv && source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt

2. Configure environment

cp .env.example .env

Edit .env:

GROQ_API_KEY=your_groq_key_here
ZAI_API_KEY=your_zhipuai_key_here

3. Start the backend

Run from the project root — not from inside app/

uvicorn app.main:app --host 127.0.0.1 --port 8001

4. Start the frontend

Open a second terminal:

streamlit run frontend/app.py

Visit http://localhost:8501
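Step 2 places the API keys in .env, and config.py loads them. The sketch below shows one way such loading might look, assuming the variables are already in the process environment (for example after python-dotenv's `load_dotenv()`, which this project may or may not use). `Settings` and `load_settings` are hypothetical names, and requiring at least one key is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    groq_api_key: Optional[str]
    zai_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Treat unset and empty values the same way: as missing.
    groq = env.get("GROQ_API_KEY") or None
    zai = env.get("ZAI_API_KEY") or None
    if groq is None and zai is None:
        raise RuntimeError("Set GROQ_API_KEY and/or ZAI_API_KEY in .env")
    return Settings(groq_api_key=groq, zai_api_key=zai)
```

Passing an explicit mapping instead of `os.environ` keeps the function easy to test.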


Usage

  1. Upload PDFs via the left sidebar → click Process Files
  2. Wait for the ✅ confirmation (first run downloads the embedding model, ~90 MB)
  3. Ask a question in the main panel → click Get Answer
  4. Read the response:
📌 Answer: generated from the retrieved document context only
📊 Confidence: 0–100% score combining three signals (retrieval, source agreement, grounding)
⚠️ Risk: 🟢 Low / 🟡 Medium / 🔴 High
📚 Sources Used: the exact chunks retrieved from your PDF

How It Works

Hybrid Retrieval

Rather than relying on vector search alone, TrustRAG runs two independent retrieval passes and merges the results:

  • FAISS (semantic): finds chunks whose meaning is close to the query, using L2 distance between sentence embeddings (IndexFlatL2)
  • BM25 (keyword): finds chunks that share exact terms with the query

Results are merged and deduplicated, and the top 3 chunks are passed to the LLM. Combining the two helps on queries that mix conceptual and exact-term language, where either method alone can miss a relevant chunk.
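The merge-and-dedup step can be sketched as follows, assuming each retriever returns a ranked list of chunk texts. Interleaving the two lists is an assumption (retrieval.py may simply concatenate them), and `merge_results` is a hypothetical name.

```python
from typing import List, Sequence


def merge_results(semantic: Sequence[str], keyword: Sequence[str], top_n: int = 3) -> List[str]:
    """Interleave two ranked chunk lists, drop duplicates, keep the first top_n."""
    merged: List[str] = []
    seen = set()
    # Alternate between the lists so each method contributes its best hits early.
    for i in range(max(len(semantic), len(keyword))):
        for ranked in (semantic, keyword):
            if i < len(ranked) and ranked[i] not in seen:
                seen.add(ranked[i])
                merged.append(ranked[i])
    return merged[:top_n]
```

For semantic hits ["A", "B", "C"] and keyword hits ["B", "D", "A"], the merged order is A, B, D, C, and the top 3 passed to the LLM are A, B, D.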

Trust Scoring

Every response is scored on three weighted signals:

Retrieval Confidence (weight 40%): the FAISS L2 distance d (IndexFlatL2 reports squared L2) is mapped via e^(−d/50), so d = 0 gives 1.0 and large distances approach 0. Measures how semantically close the retrieved context is to the query.
Source Agreement (weight 30%): pairwise text overlap across the retrieved chunks. High overlap is treated as a sign of consistent evidence, though it does not by itself rule out contradictions.
Answer Grounding (weight 30%): fraction of the first 10 answer tokens that also appear in the retrieved context. A grounded answer tends to reuse source vocabulary; a hallucinated one usually does not.
confidence = 0.4 × retrieval_conf + 0.3 × source_agreement + 0.3 × answer_grounding

confidence > 0.75  →  🟢 Low Risk
confidence > 0.50  →  🟡 Medium Risk
confidence ≤ 0.50  →  🔴 High Risk
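The scoring rules above can be sketched as plain functions. The weights, the e^(−d/50) mapping, the first-10-token window, and the thresholds come from this README; the tokenizer, averaging the distances, and Jaccard as the "pairwise overlap" measure are assumptions, so trust.py may differ in those details.

```python
import math
import re
from itertools import combinations
from typing import List, Sequence


def _tokens(text: str) -> List[str]:
    # Lowercased word tokens; the real tokenizer in trust.py may differ.
    return re.findall(r"\w+", text.lower())


def retrieval_confidence(distances: Sequence[float]) -> float:
    # Map each FAISS distance d to e^(-d/50) and average them (averaging is assumed).
    if not distances:
        return 0.0
    return sum(math.exp(-d / 50) for d in distances) / len(distances)


def source_agreement(chunks: Sequence[str]) -> float:
    # Mean pairwise Jaccard overlap of token sets (Jaccard is an assumed measure).
    sets = [set(_tokens(c)) for c in chunks]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0  # fewer than two chunks: nothing to agree with
    return sum(len(a & b) / len(a | b) if a | b else 0.0 for a, b in pairs) / len(pairs)


def answer_grounding(answer: str, chunks: Sequence[str], n: int = 10) -> float:
    # Fraction of the first n answer tokens that also occur in the retrieved context.
    head = _tokens(answer)[:n]
    if not head:
        return 0.0
    context = set(_tokens(" ".join(chunks)))
    return sum(tok in context for tok in head) / len(head)


def confidence(retrieval_conf: float, agreement: float, grounding: float) -> float:
    return 0.4 * retrieval_conf + 0.3 * agreement + 0.3 * grounding


def risk_label(score: float) -> str:
    if score > 0.75:
        return "Low"
    if score > 0.50:
        return "Medium"
    return "High"
```

As a worked example, signals of 0.9, 0.5, and 0.4 give 0.36 + 0.15 + 0.12 = 0.63, which `risk_label` classifies as "Medium", the same pairing as the sample /api/query response.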

LLM Routing

Query
  │
  ▼
Groq  ──── success ──► answer
  │
  fail (timeout / rate-limit / bad key)
  │
  ▼
ZhipuAI ── success ──► answer
  │
  fail
  │
  ▼
Error message

To add a third provider, implement call_xxx(prompt) in llm_router.py and chain it in generate().
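The fallback chain in the diagram can be sketched as a loop over provider callables, so adding a third provider means appending one more entry. This is not the code in llm_router.py; `generate_with_fallback` is a hypothetical name, and in practice the callables would wrap functions such as call_groq or call_zhipuai (names assumed, following the call_xxx convention above).

```python
from typing import Callable, List, Sequence, Tuple

# (provider name, function that takes a prompt and returns the answer text)
Provider = Tuple[str, Callable[[str], str]]


def generate_with_fallback(prompt: str, providers: Sequence[Provider]) -> str:
    errors: List[str] = []
    for name, call in providers:
        try:
            return call(prompt)  # first provider that succeeds wins
        except Exception as exc:  # timeout, rate limit, bad key, ...
            errors.append(f"{name}: {exc}")
    # Every provider failed: return an error message, as in the diagram.
    return "All LLM providers failed: " + "; ".join(errors)
```

Catching broad exceptions keeps the router provider-agnostic; a stricter version might retry only on timeouts and rate limits and fail fast on a bad key.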


API Reference

GET /api/

Health check.

Response

{ "status": "RAG API is running" }

POST /api/upload

Ingest one or more PDF files.

Request: multipart/form-data, field name files

Response

{
  "message": "Files uploaded and processed successfully",
  "files": ["nvidia_annual_report.pdf"]
}
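A client can call /api/upload with only the standard library by building the multipart body by hand. This is a sketch: `build_upload_request` is a hypothetical helper, it assumes the backend runs at http://127.0.0.1:8001 as in the Quickstart, and it does not escape quotes in file names.

```python
import urllib.request
import uuid
from typing import Iterable, Tuple


def build_upload_request(base_url: str, files: Iterable[Tuple[str, bytes]]) -> urllib.request.Request:
    """Build a POST to /api/upload with each PDF sent under the form field 'files'."""
    boundary = uuid.uuid4().hex
    body = bytearray()
    for filename, data in files:
        body += f"--{boundary}\r\n".encode()
        body += (
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        ).encode()
        body += data + b"\r\n"
    body += f"--{boundary}--\r\n".encode()  # closing boundary
    return urllib.request.Request(
        f"{base_url}/api/upload",
        data=bytes(body),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(req, timeout=300)`; a long timeout is reasonable because the first upload may trigger the embedding model download.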

POST /api/query

Query the knowledge base.

Request

{ "query": "What was NVIDIA's revenue in Q4 2024?" }

Response

{
  "answer": "NVIDIA's Q4 2024 revenue was $22.1 billion, representing...",
  "confidence": 0.63,
  "sources": ["...chunk 1...", "...chunk 2...", "...chunk 3..."],
  "hallucination_risk": "Medium"
}
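A minimal standard-library client for /api/query might look like the sketch below. The request and response field names come from the example above; `build_query_request` and `ask` are hypothetical helpers, and the backend address is assumed to match the Quickstart.

```python
import json
import urllib.request


def build_query_request(base_url: str, question: str) -> urllib.request.Request:
    """Build the JSON POST that /api/query expects: {"query": "..."}."""
    return urllib.request.Request(
        f"{base_url}/api/query",
        data=json.dumps({"query": question}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, question: str, timeout: float = 60.0) -> dict:
    # Returns a dict with answer, confidence, sources, hallucination_risk.
    with urllib.request.urlopen(build_query_request(base_url, question), timeout=timeout) as resp:
        return json.load(resp)
```

With the backend running, `ask("http://127.0.0.1:8001", "What was NVIDIA's revenue in Q4 2024?")["hallucination_risk"]` would return the risk label for that answer.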

Troubleshooting

address already in use on port 8001: free the port with lsof -ti :8001 | xargs kill -9 (macOS/Linux)
Cannot connect to backend in Streamlit: confirm uvicorn is running on port 8001
Upload times out on first run: expected, because the embedding model (~90 MB) is downloading. Wait and retry.
No documents in knowledge base: upload and process PDFs before querying
Answer quality is poor: ask more specific questions; large PDFs need targeted queries
ModuleNotFoundError: activate your venv (source venv/bin/activate) and start uvicorn from the project root
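For the first two symptoms, a quick check of whether anything is listening on port 8001 can help. This sketch only shows that some process accepts TCP connections on the port, not that it is the TrustRAG backend; `port_in_use` is a hypothetical helper.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return s.connect_ex((host, port)) == 0
```

`port_in_use(8001)` returning False while Streamlit reports a connection error suggests uvicorn is not running; True before starting uvicorn suggests a stale process is holding the port.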

Tech Stack

Backend: FastAPI + Uvicorn
Frontend: Streamlit
Vector Search: FAISS IndexFlatL2
Keyword Search: BM25Okapi (rank-bm25)
Embeddings: all-MiniLM-L6-v2 (sentence-transformers)
PDF Parsing: PyMuPDF (fitz)
Text Chunking: LangChain RecursiveCharacterTextSplitter
Primary LLM: Groq, llama3-8b-8192
Fallback LLM: ZhipuAI, glm-4-flash
Validation: Pydantic v2

License

MIT © Bhaumik Patel

About

Trust-aware RAG pipeline with hybrid retrieval (FAISS + BM25), hallucination risk scoring, and adaptive LLM routing (Groq → ZhipuAI fallback).
