TrustRAG

Production-grade Retrieval-Augmented Generation with hallucination risk scoring and adaptive LLM routing.

Features • Architecture • Quickstart • How It Works • API

Overview

Most RAG systems give you an answer and ask you to take it on faith. TrustRAG gives you the answer and tells you how much to trust it.

Every response includes a confidence score computed from three independent signals — retrieval quality, source consistency, and answer grounding — collapsed into a single hallucination risk label (Low / Medium / High). Under the hood, it combines FAISS semantic search with BM25 keyword search for stronger retrieval, and silently falls back from Groq (LLaMA3) to ZhipuAI (GLM-4) if the primary LLM is unavailable.

Screenshots

Query & Answer	Hallucination Risk & Sources

Features


🔍 Hybrid Retrieval	FAISS vector search + BM25 keyword search merged for higher recall
🛡️ Trust Scoring	Per-response confidence from retrieval distance, source agreement, and answer grounding
⚠️ Hallucination Risk	Low / Medium / High label on every response
🔀 Adaptive LLM Routing	Groq (LLaMA3-8b) primary → ZhipuAI (GLM-4-Flash) automatic fallback
📄 PDF Ingestion	PyMuPDF extraction + LangChain chunking with configurable size and overlap
⚡ Lazy Loading	Embedding model loads on first use, so the backend starts without waiting for it
🌐 REST API	FastAPI backend with `/api/upload` and `/api/query` endpoints
🖥️ Streamlit UI	One-page interface for uploading docs and asking questions

Architecture

┌──────────────────────────────────────────────────────┐
│                   Streamlit Frontend                  │
│    Sidebar: PDF upload    │    Main: query + output   │
└─────────────────┬────────────────────────────────────┘
                  │  HTTP REST  (port 8001)
┌─────────────────▼────────────────────────────────────┐
│                    FastAPI Backend                    │
│                                                       │
│   /api/upload                    /api/query           │
│        │                              │               │
│   PDF Parser                      Retriever           │
│   (PyMuPDF)                   FAISS + BM25            │
│        │                              │               │
│    Chunker                        Generator           │
│  (LangChain)                     LLM Router           │
│        │                              │               │
│   VectorStore                  Trust Evaluator        │
│  FAISS + texts              confidence + risk label   │
└──────────────────────────────────────────────────────┘

Data Flow

Ingestion

PDF → PyMuPDF → plain text → LangChain splitter (700 chars / 150 overlap)
    → all-MiniLM-L6-v2 embeddings → FAISS index
    → tokenized docs               → BM25 index
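The effect of the 700 / 150 chunk settings can be illustrated with a plain sliding window. This is a simplified stand-in, not LangChain's `RecursiveCharacterTextSplitter`, which prefers to break on paragraph, line, and word separators before falling back to raw characters. `sliding_chunks` is a hypothetical helper.

```python
def sliding_chunks(text: str, size: int = 700, overlap: int = 150) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified stand-in for RecursiveCharacterTextSplitter: it cuts at exact
    character offsets instead of preferring paragraph/sentence boundaries.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # each window starts 550 chars after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


parts = sliding_chunks("x" * 1500)
print([len(p) for p in parts])  # → [700, 700, 400]
```

The 150-character overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of some duplicated text in the index.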

Query

question → FAISS top-k (semantic) ──┐
         → BM25  top-k (keyword)  ──┴─► merge + dedup → top 3 chunks
                                                              │
                                                    Groq (LLaMA3-8b)
                                                     (or ZhipuAI fallback)
                                                              │
                                                       answer + trust score

Project Structure

TrustRAG/
├── app/
│   ├── api/
│   │   └── routes.py            # /upload, /query, health check
│   ├── db/
│   │   └── vector_store.py      # FAISS index + sentence-transformer embeddings
│   ├── models/
│   │   ├── request_models.py    # QueryRequest (Pydantic)
│   │   └── response_models.py   # QueryResponse (Pydantic)
│   ├── services/
│   │   ├── generator.py         # Prompt construction + LLMRouter call
│   │   ├── llm_router.py        # Groq primary → ZhipuAI fallback
│   │   ├── rag_service.py       # Pipeline: retrieve → generate → evaluate
│   │   ├── retrieval.py         # Hybrid FAISS + BM25 retriever
│   │   └── trust.py             # Confidence scoring + risk classification
│   ├── utils/
│   │   ├── chunking.py          # RecursiveCharacterTextSplitter wrapper
│   │   └── pdf_parser.py        # PyMuPDF bytes → text
│   ├── config.py                # Env var loading
│   └── main.py                  # FastAPI app + CORS
├── frontend/
│   └── app.py                   # Streamlit UI
├── data/
│   ├── raw/                     # Uploaded PDFs
│   └── processed/               # Extracted text
├── .env.example                 # ← copy to .env and fill in keys
├── requirements.txt
└── README.md

Quickstart

Prerequisites

Python 3.10+
A free Groq API key (primary LLM)
A free ZhipuAI API key (fallback LLM)

1. Clone & install

git clone https://github.com/yourusername/TrustRAG.git
cd TrustRAG
python -m venv venv && source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt

2. Configure environment

cp .env.example .env

Edit .env:

GROQ_API_KEY=your_groq_key_here
ZAI_API_KEY=your_zhipuai_key_here

3. Start the backend

Run from the project root — not from inside app/

uvicorn app.main:app --host 127.0.0.1 --port 8001

4. Start the frontend

Open a second terminal:

streamlit run frontend/app.py

Visit http://localhost:8501

Usage

Upload PDFs via the left sidebar → click Process Files
Wait for the ✅ confirmation (first run downloads the embedding model, ~90 MB)
Ask a question in the main panel → click Get Answer
Read the response:

📌 Answer: generated from the chunks retrieved from your documents
📊 Confidence: 0–100% score combining three signals
⚠️ Risk: 🟢 Low  /  🟡 Medium  /  🔴 High
📚 Sources Used: the exact chunks retrieved from your PDFs

How It Works

Hybrid Retrieval

Rather than relying on vector search alone, TrustRAG runs two independent retrieval passes and merges the results:

FAISS (semantic): finds chunks whose meaning is close to the query, ranked by L2 distance between embeddings (`IndexFlatL2`)
BM25 (keyword): finds chunks that share exact terms with the query

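Results are merged, deduplicated, and the top-3 chunks are passed to the LLM. The combination helps on queries that mix conceptual and factual language: BM25 matches exact terms such as names and figures that embeddings can blur, while FAISS matches paraphrases that share few words with the query.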
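The merge-and-dedup step can be sketched as follows. Assumptions: each retriever returns a ranked list of chunk texts, and the merge interleaves the two lists (semantic first) while dropping duplicates; the actual ordering in `retrieval.py` may differ. `merge_results` is a hypothetical name.

```python
from itertools import zip_longest


def merge_results(semantic: list[str], keyword: list[str], k: int = 3) -> list[str]:
    """Interleave two ranked chunk lists, drop duplicates, keep the first k."""
    merged: list[str] = []
    seen: set[str] = set()
    # Alternate FAISS and BM25 hits so both retrievers contribute near the top.
    for pair in zip_longest(semantic, keyword):
        for chunk in pair:
            if chunk is not None and chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged[:k]


faiss_hits = ["chunk A", "chunk B", "chunk C"]
bm25_hits = ["chunk D", "chunk A"]
print(merge_results(faiss_hits, bm25_hits))  # → ['chunk A', 'chunk D', 'chunk B']
```

Interleaving by rank sidesteps the fact that FAISS distances and BM25 scores live on different scales, so they cannot be summed directly. Reciprocal rank fusion is a common, more principled alternative for the same problem.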

Trust Scoring

Every response is scored on three separate signals:

Signal	Weight	Method
Retrieval Confidence	40%	FAISS L2 distance mapped via `e^(−d/50)`. Distance 0 → 1.0; large distance → ~0. Measures how semantically close the retrieved context is to the query.
Source Agreement	30%	Pairwise text overlap across retrieved chunks. High overlap suggests the chunks corroborate each other; low overlap may mean scattered or conflicting evidence.
Answer Grounding	30%	Fraction of the first 10 answer tokens that appear in the retrieved context. A grounded answer reuses source vocabulary; a hallucinated one doesn't.
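The three signals in the table can be sketched in plain Python. Assumptions: retrieval confidence averages `e^(−d/50)` over the retrieved chunks, source agreement is the mean pairwise Jaccard overlap of word sets, and tokens are lowercase whitespace-split words. `trust.py` may aggregate or tokenize differently, and all function names here are hypothetical.

```python
import math
from itertools import combinations


def _words(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, split on whitespace (punctuation stays attached).
    return text.lower().split()


def retrieval_confidence(distances: list[float]) -> float:
    """Map FAISS L2 distances to (0, 1] via e^(-d/50), averaged over chunks."""
    if not distances:
        return 0.0
    return sum(math.exp(-d / 50) for d in distances) / len(distances)


def _jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def source_agreement(chunks: list[str]) -> float:
    """Mean pairwise Jaccard overlap between the chunks' word sets."""
    pairs = list(combinations([set(_words(c)) for c in chunks], 2))
    if not pairs:
        return 0.0  # fewer than two chunks: nothing to corroborate
    return sum(_jaccard(a, b) for a, b in pairs) / len(pairs)


def answer_grounding(answer: str, context: str, n: int = 10) -> float:
    """Fraction of the first n answer tokens that also occur in the context."""
    head = _words(answer)[:n]
    if not head:
        return 0.0
    vocab = set(_words(context))
    return sum(tok in vocab for tok in head) / len(head)


print(answer_grounding("Revenue was high", "revenue was low"))  # 2 of 3 tokens found
```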

confidence = 0.4 × retrieval_conf + 0.3 × source_agreement + 0.3 × answer_grounding

confidence > 0.75  →  🟢 Low Risk
confidence > 0.50  →  🟡 Medium Risk
confidence ≤ 0.50  →  🔴 High Risk
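The weighted sum and thresholds translate directly into code. A minimal sketch, assuming the three signal values are already computed in [0, 1]; `trust_score` and `risk_label` are hypothetical names, not necessarily those used in `trust.py`.

```python
def trust_score(retrieval_conf: float, source_agreement: float,
                answer_grounding: float) -> float:
    """Weighted combination: 40% retrieval, 30% agreement, 30% grounding."""
    return 0.4 * retrieval_conf + 0.3 * source_agreement + 0.3 * answer_grounding


def risk_label(confidence: float) -> str:
    """Map a confidence in [0, 1] to the hallucination risk label."""
    if confidence > 0.75:
        return "Low"
    if confidence > 0.50:
        return "Medium"
    return "High"


score = trust_score(0.9, 0.5, 0.4)  # 0.36 + 0.15 + 0.12
print(round(score, 2), risk_label(score))  # → 0.63 Medium
```

Because the comparisons are strict, a score of exactly 0.75 is labeled Medium and exactly 0.50 is labeled High, matching the table above.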

LLM Routing

Query
  │
  ▼
Groq  ──── success ──► answer
  │
  fail (timeout / rate-limit / bad key)
  │
  ▼
ZhipuAI ── success ──► answer
  │
  fail
  │
  ▼
Error message

To add a third provider, implement call_xxx(prompt) in llm_router.py and chain it in generate().
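The fallback chain in the diagram can be sketched without tying it to any vendor SDK. Assumptions: each provider is a callable that takes a prompt and returns text, raising an exception on timeout, rate limit, or a bad key. The real provider calls in `llm_router.py` are not shown in this README, so the two providers below are hypothetical stubs; adding a third provider amounts to appending one more `(name, callable)` pair.

```python
from collections.abc import Callable

Provider = Callable[[str], str]


def generate_with_fallback(prompt: str, providers: list[tuple[str, Provider]]) -> str:
    """Try each provider in order; return the first successful answer."""
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # timeout, rate limit, bad key, ...
            errors.append(f"{name}: {exc}")
    return "Error: all LLM providers failed (" + "; ".join(errors) + ")"


def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")  # hypothetical stub standing in for Groq


def backup(prompt: str) -> str:
    return f"answer to: {prompt}"  # hypothetical stub standing in for ZhipuAI


print(generate_with_fallback("What is RAG?", [("groq", flaky_primary), ("zhipuai", backup)]))
# → answer to: What is RAG?
```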

API Reference

`GET /api/`

Health check.

Response

{ "status": "RAG API is running" }

`POST /api/upload`

Ingest one or more PDF files.

Request: `multipart/form-data`, field name `files` (one part per PDF)

Response

{
  "message": "Files uploaded and processed successfully",
  "files": ["nvidia_annual_report.pdf"]
}
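To call this endpoint from Python without extra dependencies, the multipart body can be built by hand with the standard library. This is a sketch: the base URL assumes the uvicorn command from the Quickstart, and `build_upload_request` is a hypothetical helper.

```python
import urllib.request
import uuid


def build_upload_request(files: dict[str, bytes],
                         base_url: str = "http://127.0.0.1:8001") -> urllib.request.Request:
    """Build a multipart/form-data POST to /api/upload, one "files" part per PDF."""
    boundary = uuid.uuid4().hex
    body = bytearray()
    for filename, data in files.items():
        body += f"--{boundary}\r\n".encode()
        body += (f'Content-Disposition: form-data; name="files"; '
                 f'filename="{filename}"\r\n').encode()
        body += b"Content-Type: application/pdf\r\n\r\n"
        body += data + b"\r\n"
    body += f"--{boundary}--\r\n".encode()  # closing boundary
    return urllib.request.Request(
        f"{base_url}/api/upload",
        data=bytes(body),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Usage (requires the backend to be running):
# with open("nvidia_annual_report.pdf", "rb") as f:
#     req = build_upload_request({"nvidia_annual_report.pdf": f.read()})
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```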

`POST /api/query`

Query the knowledge base.

Request

{ "query": "What was NVIDIA's revenue in Q4 2024?" }

Response

{
  "answer": "NVIDIA's Q4 2024 revenue was $22.1 billion, representing...",
  "confidence": 0.63,
  "sources": ["...chunk 1...", "...chunk 2...", "...chunk 3..."],
  "hallucination_risk": "Medium"
}
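A matching standard-library client for `/api/query` might look like the sketch below. The base URL assumes the Quickstart defaults; `build_query_request` and `ask` are hypothetical helpers, and the response keys are the ones shown in the example above.

```python
import json
import urllib.request


def build_query_request(question: str,
                        base_url: str = "http://127.0.0.1:8001") -> urllib.request.Request:
    """Build a JSON POST to /api/query with the body {"query": question}."""
    return urllib.request.Request(
        f"{base_url}/api/query",
        data=json.dumps({"query": question}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://127.0.0.1:8001") -> dict:
    """Send the query and return the decoded JSON response (needs a running backend)."""
    with urllib.request.urlopen(build_query_request(question, base_url), timeout=60) as resp:
        return json.load(resp)


# result = ask("What was NVIDIA's revenue in Q4 2024?")
# print(result["hallucination_risk"], result["confidence"])
```

A client can use `hallucination_risk` to decide how to present an answer, for example always showing the `sources` alongside answers labeled High.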

Troubleshooting

Symptom	Fix
`address already in use` on port 8001	`lsof -ti :8001 | xargs kill -9`
`Cannot connect to backend` in Streamlit	Confirm uvicorn is running on port 8001
Upload times out on first run	Normal — embedding model downloading (~90 MB). Wait and retry.
`No documents in knowledge base`	Upload and process PDFs before querying
Answer quality is poor	Ask more specific questions; large PDFs need targeted queries
`ModuleNotFoundError`	Activate your venv: `source venv/bin/activate`

Tech Stack


Backend	FastAPI + Uvicorn
Frontend	Streamlit
Vector Search	FAISS `IndexFlatL2`
Keyword Search	BM25Okapi (`rank-bm25`)
Embeddings	`all-MiniLM-L6-v2` (sentence-transformers)
PDF Parsing	PyMuPDF (`fitz`)
Text Chunking	LangChain `RecursiveCharacterTextSplitter`
Primary LLM	Groq — `llama3-8b-8192`
Fallback LLM	ZhipuAI — `glm-4-flash`
Validation	Pydantic v2
