Skip to content

Latest commit

 

History

History
159 lines (119 loc) · 6.76 KB

File metadata and controls

159 lines (119 loc) · 6.76 KB
Step Function / File Purpose
API entry app.py/ask Receives the user question and calls the RAG pipeline
Main pipeline ask_question() Orchestrates the full flow from question to answer
Load documents load_documents() Reads the JSON dataset from data/car_insurance_dataset.json
Chunking chunk_text(), chunk_documents() Splits documents into retrieval-friendly chunks while preserving metadata
System initialization initialize_system() Loads documents, creates chunks, and builds retrieval indexes
Keyword retrieval build_bm25_index(), search_bm25() Supports lexical retrieval for exact words like “Premium”, “Germany”, or “track”
Vector retrieval build_vector_index(), search_vector() Supports semantic retrieval using embeddings and FAISS
Retrieval planning plan_retrieval() Lightweight agentic step that infers intent, entities, and which document types should be prioritized
Hybrid retrieval retrieve_candidates() Expands the query, runs BM25 and vector search, and merges candidates
Reranking rerank_candidates() Reorders candidates using metadata-aware scoring such as plan, region, audience, and document type
Evidence selection select_context() Builds a balanced evidence bundle instead of sending plain top-k chunks
Prompt building build_context_text() Creates the grounded prompt using selected evidence, user facts, and recent history
Answer generation generate_answer() Sends the final prompt to the LLM and returns answer plus sources
Conversation memory get_user_memory(), update_user_memory(), set_user_fact(), get_user_facts() Stores lightweight state per user_id for follow-up questions
Frontend streamlit_app.py Provides a chat-style user interface on top of the FastAPI backend

The most important functions in the project

ask_question()

This is the main orchestrator of the whole system. It updates user memory, plans retrieval, retrieves evidence, reranks results, selects context, generates the answer, and stores the assistant response.

plan_retrieval()

This is the lightweight agentic component. It analyzes the question and decides:

the likely intent relevant entities such as plan, region, or audience which document types should be prioritized

This helps retrieval go beyond simple similarity search.

retrieve_candidates()

This function performs hybrid retrieval by combining:

BM25 keyword search vector search with embeddings

It also expands the query using the output of the retrieval planner.

rerank_candidates()

This is where metadata-aware reasoning happens. Retrieved chunks are boosted if they match:

the requested plan region audience important document types such as exceptions

This makes the system more reliable than plain top-k retrieval.

select_context()

This function is one of the strongest parts of the system. Instead of sending the top few chunks blindly, it tries to assemble a balanced evidence set such as:

one general rule one plan-specific rule one exception one region or audience rule

This is how the system answers complex questions involving defaults, conditions, and overrides.

generate_answer()

This function builds the grounded prompt and sends it to the LLM. It ensures the final answer is based only on the selected evidence and returns the supporting source document ids.

initialize_system()

This sets up the retrieval system at startup by:

loading the dataset chunking documents building the BM25 index building the FAISS vector index

Without this step, retrieval would not work.

Memory functions

The memory helpers:

get_user_memory()

update_user_memory()

set_user_fact()

get_user_facts()

allow the assistant to support basic multi-turn conversations by remembering facts like:

the user’s region selected plan audience type such as young driver

KB : knowledge base = your JSON dataset + processed chunks

BM25 index → keyword search FAISS index → semantic search

Prompt Engineering :

prompt = f""" You are a car insurance assistant.

Answer only from the provided evidence. If the evidence is incomplete, clearly say what is missing.

User question: {question}

User facts: {facts_text}

Recent conversation: {history_text}

Evidence: {evidence_text} """

Part Source
Question user input
User facts memory
Conversation previous turns
Evidence retrieved chunks
Instructions system rules

LLM llama3

response = requests.post( "http://localhost:11434/api/generate", json={ "model": "llama3", "prompt": prompt, "stream": False } )

Knowledge Base ↓ data/car_insurance_dataset.json ↓ chunk_documents() ↓ Indexes (BM25 + FAISS)

User Question ↓ plan_retrieval() ← (agentic step) ↓ retrieve_candidates() ↓ rerank_candidates() ↓ select_context() ↓ build_context_text() ← PROMPT ↓ generate_answer() ← LLM (Llama3) ↓ Final Answer