RAG_Insurance_Control/README.md at main · nayefroqaya/RAG_Insurance_Control

| Step | Function / File | Purpose |
| --- | --- | --- |
| API entry | `app.py` → `/ask` | Receives the user question and calls the RAG pipeline |
| Main pipeline | `ask_question()` | Orchestrates the full flow from question to answer |
| Load documents | `load_documents()` | Reads the JSON dataset from `data/car_insurance_dataset.json` |
| Chunking | `chunk_text()`, `chunk_documents()` | Splits documents into retrieval-friendly chunks while preserving metadata |
| System initialization | `initialize_system()` | Loads documents, creates chunks, and builds retrieval indexes |
| Keyword retrieval | `build_bm25_index()`, `search_bm25()` | Supports lexical retrieval for exact words like “Premium”, “Germany”, or “track” |
| Vector retrieval | `build_vector_index()`, `search_vector()` | Supports semantic retrieval using embeddings and FAISS |
| Retrieval planning | `plan_retrieval()` | Lightweight agentic step that infers intent, entities, and which document types should be prioritized |
| Hybrid retrieval | `retrieve_candidates()` | Expands the query, runs BM25 and vector search, and merges candidates |
| Reranking | `rerank_candidates()` | Reorders candidates using metadata-aware scoring such as plan, region, audience, and document type |
| Evidence selection | `select_context()` | Builds a balanced evidence bundle instead of sending plain top-k chunks |
| Prompt building | `build_context_text()` | Creates the grounded prompt using selected evidence, user facts, and recent history |
| Answer generation | `generate_answer()` | Sends the final prompt to the LLM and returns answer plus sources |
| Conversation memory | `get_user_memory()`, `update_user_memory()`, `set_user_fact()`, `get_user_facts()` | Stores lightweight state per `user_id` for follow-up questions |
| Frontend | `streamlit_app.py` | Provides a chat-style user interface on top of the FastAPI backend |

The most important functions in the project

ask_question()

This is the main orchestrator of the whole system. It updates user memory, plans retrieval, retrieves evidence, reranks results, selects context, generates the answer, and stores the assistant response.

plan_retrieval()

This is the lightweight agentic component. It analyzes the question and decides:

- the likely intent
- relevant entities such as plan, region, or audience
- which document types should be prioritized

This helps retrieval go beyond simple similarity search.
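To make the planning step concrete, here is a minimal rule-based sketch. It is not the project's implementation: the function name `plan_retrieval_sketch`, the vocabularies, the intent labels, and the document type names are all assumptions chosen for illustration.

```python
import re

# Hypothetical vocabularies; the real project may use different values.
PLANS = ("basic", "standard", "premium")
REGIONS = ("germany", "france", "spain")
AUDIENCES = {"young driver": "young_driver", "senior": "senior"}


def plan_retrieval_sketch(question: str) -> dict:
    """Infer a rough intent, entities, and preferred document types."""
    text = question.lower()
    words = set(re.findall(r"[a-z]+", text))

    # Entity detection by simple vocabulary lookup.
    entities = {
        "plan": next((p for p in PLANS if p in words), None),
        "region": next((r for r in REGIONS if r in words), None),
        "audience": next(
            (label for phrase, label in AUDIENCES.items() if phrase in text), None
        ),
    }

    # Intent detection from trigger words (labels are assumptions).
    if words & {"cover", "covers", "covered", "coverage"}:
        intent = "coverage_check"
    elif words & {"price", "cost", "costs"}:
        intent = "pricing"
    else:
        intent = "general"

    # Document types to prioritize; the type names are assumptions.
    doc_types = ["general_rule"]
    if entities["plan"]:
        doc_types.append("plan_rule")
    if entities["region"]:
        doc_types.append("region_rule")
    if entities["audience"]:
        doc_types.append("audience_rule")
    if intent == "coverage_check":
        doc_types.append("exception")

    return {"intent": intent, "entities": entities, "doc_types": doc_types}
```

The returned dictionary can then drive query expansion and reranking, for example by appending the detected plan and region to the search query.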

retrieve_candidates()

This function performs hybrid retrieval by combining:

- BM25 keyword search
- vector search with embeddings

It also expands the query using the output of the retrieval planner.
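The README does not say how the BM25 and vector results are merged. One common option is reciprocal rank fusion, sketched below as an illustration only; the function name `merge_ranked_lists` and the constant `k=60` are assumptions, not taken from the project.

```python
from collections import defaultdict


def merge_ranked_lists(bm25_ids, vector_ids, k=60):
    """Fuse two ranked lists of chunk ids with reciprocal rank fusion.

    Each list is ordered best-first. A chunk earns 1 / (k + rank) from
    every list it appears in, so chunks found by both retrievers rise.
    """
    scores = defaultdict(float)
    for ranked in (bm25_ids, vector_ids):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

Rank-based fusion avoids having to normalize BM25 scores and embedding distances onto a common scale, which is why it is a convenient default for hybrid retrieval.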

rerank_candidates()

This is where metadata-aware reasoning happens. Retrieved chunks are boosted if they match:

- the requested plan
- region
- audience
- important document types such as exceptions

This makes the system more reliable than plain top-k retrieval.
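The boosting idea can be sketched as follows. This is a hypothetical illustration: the candidate layout (a `score` plus a `metadata` dict), the boost weights, and the name `rerank_sketch` are assumptions rather than the project's actual scoring.

```python
# Hypothetical boost weights; the real values are not given in this README.
FIELD_BOOSTS = {"plan": 0.3, "region": 0.2, "audience": 0.2}
EXCEPTION_BOOST = 0.25


def rerank_sketch(candidates, requested):
    """Re-score candidates by adding a boost for each matching metadata field.

    candidates: list of dicts with a base 'score' and a 'metadata' dict.
    requested:  dict with optional 'plan', 'region', and 'audience' values.
    """
    rescored = []
    for cand in candidates:
        meta = cand.get("metadata", {})
        score = cand["score"]
        for field, boost in FIELD_BOOSTS.items():
            wanted = requested.get(field)
            # Only boost when the user actually asked for this field.
            if wanted is not None and meta.get(field) == wanted:
                score += boost
        # Exceptions often override defaults, so they get an extra boost.
        if meta.get("doc_type") == "exception":
            score += EXCEPTION_BOOST
        rescored.append({**cand, "rerank_score": score})
    return sorted(rescored, key=lambda c: c["rerank_score"], reverse=True)
```

With these weights, a lower-scored chunk that matches the requested plan and region and is an exception can overtake a higher-scored chunk that only matches the region.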

select_context()

This function is one of the strongest parts of the system. Instead of sending the top few chunks blindly, it tries to assemble a balanced evidence set such as:

- one general rule
- one plan-specific rule
- one exception
- one region or audience rule

This is how the system answers complex questions involving defaults, conditions, and overrides.
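A slot-filling approach is one way to build such a bundle. The sketch below is an assumption about how this could work, not the project's code: the `doc_type` values, the slot order, and the name `select_context_sketch` are hypothetical.

```python
# Each slot lists the document types that can fill it (names are assumptions).
DEFAULT_SLOTS = (
    {"general_rule"},
    {"plan_rule"},
    {"exception"},
    {"region_rule", "audience_rule"},
)


def select_context_sketch(ranked, slots=DEFAULT_SLOTS, max_chunks=4):
    """Pick one best-ranked chunk per slot, then top up with leftovers.

    ranked: list of dicts with 'id' and 'doc_type', ordered best-first.
    """
    chosen, used = [], set()
    # First pass: fill each slot with the best-ranked chunk of a matching type.
    for accepted_types in slots:
        for chunk in ranked:
            if chunk["doc_type"] in accepted_types and chunk["id"] not in used:
                chosen.append(chunk)
                used.add(chunk["id"])
                break
    # Second pass: if slots stayed empty, fill up with the best remaining chunks.
    for chunk in ranked:
        if len(chosen) >= max_chunks:
            break
        if chunk["id"] not in used:
            chosen.append(chunk)
            used.add(chunk["id"])
    return chosen[:max_chunks]
```

Compared with plain top-k, this keeps a second plan-specific chunk from crowding out the only exception in the candidate list.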

generate_answer()

This function builds the grounded prompt and sends it to the LLM. It ensures the final answer is based only on the selected evidence and returns the supporting source document ids.

initialize_system()

This sets up the retrieval system at startup by:

- loading the dataset
- chunking documents
- building the BM25 index
- building the FAISS vector index

Without this step, retrieval would not work.
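The chunking part of this startup sequence can be illustrated with a minimal sketch. The word-window strategy, the overlap, the document layout (`id`, `text`, `metadata`), and the `_sketch` function names are assumptions; the project's `chunk_text()` and `chunk_documents()` may work differently.

```python
def chunk_text_sketch(text, max_words=50, overlap=10):
    """Split text into overlapping word windows."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


def chunk_documents_sketch(documents, max_words=50, overlap=10):
    """Chunk every document and copy its metadata onto each chunk."""
    chunks = []
    for doc in documents:
        pieces = chunk_text_sketch(doc["text"], max_words, overlap)
        for i, piece in enumerate(pieces):
            chunks.append({
                "chunk_id": f"{doc['id']}#{i}",
                "doc_id": doc["id"],
                "text": piece,
                # Copy so later edits to a chunk do not mutate the source doc.
                "metadata": dict(doc.get("metadata", {})),
            })
    return chunks
```

Keeping `doc_id` and `metadata` on every chunk is what lets later steps rerank by plan or region and report source document ids.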

Memory functions

The memory helpers `get_user_memory()`, `update_user_memory()`, `set_user_fact()`, and `get_user_facts()` allow the assistant to support basic multi-turn conversations by remembering facts like:

- the user’s region
- selected plan
- audience type such as young driver
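A minimal in-memory version of these helpers might look like the sketch below. The function names come from this README, but the signatures, the storage layout, and the `MAX_HISTORY` cap are assumptions for illustration.

```python
from collections import defaultdict

MAX_HISTORY = 6  # hypothetical cap on stored messages per user

# One record per user_id, created on first access.
_memory = defaultdict(lambda: {"facts": {}, "history": []})


def get_user_memory(user_id):
    """Return the full memory record for a user."""
    return _memory[user_id]


def update_user_memory(user_id, role, content):
    """Append a message and keep only the most recent MAX_HISTORY entries."""
    history = _memory[user_id]["history"]
    history.append({"role": role, "content": content})
    del history[:-MAX_HISTORY]


def set_user_fact(user_id, key, value):
    """Remember a stable fact such as region, plan, or audience type."""
    _memory[user_id]["facts"][key] = value


def get_user_facts(user_id):
    """Return a copy of the stored facts for a user."""
    return dict(_memory[user_id]["facts"])
```

Because the state lives in a module-level dictionary, it is lost on restart and is not shared between worker processes; that trade-off fits the "lightweight state" described above.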

KB: the knowledge base is the JSON dataset plus the processed chunks.

- BM25 index → keyword search
- FAISS index → semantic search

Prompt Engineering:

prompt = f""" You are a car insurance assistant.

Answer only from the provided evidence. If the evidence is incomplete, clearly say what is missing.

User question: {question}

User facts: {facts_text}

Recent conversation: {history_text}

Evidence: {evidence_text} """

| Part | Source |
| --- | --- |
| Question | user input |
| User facts | memory |
| Conversation | previous turns |
| Evidence | retrieved chunks |
| Instructions | system rules |
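As an illustration of how these parts could be assembled into the prompt, here is a small sketch. The function name `build_prompt_sketch`, the input shapes, and the line formatting of each section are assumptions; only the instruction wording and section labels follow the template in this README.

```python
def build_prompt_sketch(question, facts, history, evidence):
    """Assemble the grounded prompt from its four dynamic parts.

    facts:    dict of user facts, e.g. {"region": "Germany"}
    history:  list of (role, content) tuples for recent turns
    evidence: list of dicts with 'doc_id' and 'text'
    """
    # Render each part as plain text; fall back to "none" when empty.
    facts_text = "\n".join(f"- {k}: {v}" for k, v in facts.items()) or "none"
    history_text = "\n".join(f"{role}: {content}" for role, content in history) or "none"
    evidence_text = "\n".join(f"[{c['doc_id']}] {c['text']}" for c in evidence) or "none"

    return (
        "You are a car insurance assistant.\n\n"
        "Answer only from the provided evidence. "
        "If the evidence is incomplete, clearly say what is missing.\n\n"
        f"User question: {question}\n\n"
        f"User facts:\n{facts_text}\n\n"
        f"Recent conversation:\n{history_text}\n\n"
        f"Evidence:\n{evidence_text}\n"
    )
```

Prefixing each evidence line with its document id makes it easier for the model to cite sources and for the caller to return them alongside the answer.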

LLM: llama3

response = requests.post( "http://localhost:11434/api/generate", json={ "model": "llama3", "prompt": prompt, "stream": False } )
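The call above returns a response object that still has to be checked and parsed. The sketch below wraps it with a timeout and error handling, assuming a local Ollama server; with `"stream": False`, Ollama returns a single JSON object whose `response` field holds the generated text. The function name `generate_with_ollama`, the `post` parameter, and the 60-second timeout are assumptions added for illustration and testability.

```python
OLLAMA_URL = "http://localhost:11434/api/generate"


def generate_with_ollama(prompt, model="llama3", post=None, timeout=60):
    """Send a prompt to a local Ollama server and return the generated text.

    post: optional callable with the same interface as requests.post,
          so the function can be exercised without a running server.
    """
    if post is None:
        import requests  # third-party; imported lazily so a stub can replace it
        post = requests.post

    response = post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=timeout,
    )
    # Raise on HTTP errors instead of silently parsing an error body.
    response.raise_for_status()
    # Non-streaming responses carry the full answer in the "response" field.
    return response.json()["response"].strip()
```

In normal use, `generate_with_ollama(prompt)` is enough; passing a stub for `post` is only meant for tests.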

Knowledge Base
↓
data/car_insurance_dataset.json
↓
chunk_documents()
↓
Indexes (BM25 + FAISS)

User Question
↓
plan_retrieval() ← (agentic step)
↓
retrieve_candidates()
↓
rerank_candidates()
↓
select_context()
↓
build_context_text() ← PROMPT
↓
generate_answer() ← LLM (Llama3)
↓
Final Answer
