| Step | Function / File | Purpose |
|---|---|---|
| API entry | app.py → /ask |
Receives the user question and calls the RAG pipeline |
| Main pipeline | ask_question() |
Orchestrates the full flow from question to answer |
| Load documents | load_documents() |
Reads the JSON dataset from data/car_insurance_dataset.json |
| Chunking | chunk_text(), chunk_documents() |
Splits documents into retrieval-friendly chunks while preserving metadata |
| System initialization | initialize_system() |
Loads documents, creates chunks, and builds retrieval indexes |
| Keyword retrieval | build_bm25_index(), search_bm25() |
Supports lexical retrieval for exact words like “Premium”, “Germany”, or “track” |
| Vector retrieval | build_vector_index(), search_vector() |
Supports semantic retrieval using embeddings and FAISS |
| Retrieval planning | plan_retrieval() |
Lightweight agentic step that infers intent, entities, and which document types should be prioritized |
| Hybrid retrieval | retrieve_candidates() |
Expands the query, runs BM25 and vector search, and merges candidates |
| Reranking | rerank_candidates() |
Reorders candidates using metadata-aware scoring such as plan, region, audience, and document type |
| Evidence selection | select_context() |
Builds a balanced evidence bundle instead of sending plain top-k chunks |
| Prompt building | build_context_text() |
Creates the grounded prompt using selected evidence, user facts, and recent history |
| Answer generation | generate_answer() |
Sends the final prompt to the LLM and returns answer plus sources |
| Conversation memory | get_user_memory(), update_user_memory(), set_user_fact(), get_user_facts() |
Stores lightweight state per user_id for follow-up questions |
| Frontend | streamlit_app.py |
Provides a chat-style user interface on top of the FastAPI backend |
ask_question()
This is the main orchestrator of the whole system. It updates user memory, plans retrieval, retrieves evidence, reranks results, selects context, generates the answer, and stores the assistant response.
This is the lightweight agentic component. It analyzes the question and decides:
the likely intent relevant entities such as plan, region, or audience which document types should be prioritized
This helps retrieval go beyond simple similarity search.
This function performs hybrid retrieval by combining:
BM25 keyword search vector search with embeddings
It also expands the query using the output of the retrieval planner.
This is where metadata-aware reasoning happens. Retrieved chunks are boosted if they match:
the requested plan region audience important document types such as exceptions
This makes the system more reliable than plain top-k retrieval.
This function is one of the strongest parts of the system. Instead of sending the top few chunks blindly, it tries to assemble a balanced evidence set such as:
one general rule one plan-specific rule one exception one region or audience rule
This is how the system answers complex questions involving defaults, conditions, and overrides.
This function builds the grounded prompt and sends it to the LLM. It ensures the final answer is based only on the selected evidence and returns the supporting source document ids.
This sets up the retrieval system at startup by:
loading the dataset chunking documents building the BM25 index building the FAISS vector index
Without this step, retrieval would not work.
Memory functions
The memory helpers:
allow the assistant to support basic multi-turn conversations by remembering facts like:
the user’s region selected plan audience type such as young driver
BM25 index → keyword search FAISS index → semantic search
prompt = f""" You are a car insurance assistant.
Answer only from the provided evidence. If the evidence is incomplete, clearly say what is missing.
User question: {question}
User facts: {facts_text}
Recent conversation: {history_text}
Evidence: {evidence_text} """
| Part | Source |
|---|---|
| Question | user input |
| User facts | memory |
| Conversation | previous turns |
| Evidence | retrieved chunks |
| Instructions | system rules |
response = requests.post( "http://localhost:11434/api/generate", json={ "model": "llama3", "prompt": prompt, "stream": False } )
Knowledge Base ↓ data/car_insurance_dataset.json ↓ chunk_documents() ↓ Indexes (BM25 + FAISS)
User Question ↓ plan_retrieval() ← (agentic step) ↓ retrieve_candidates() ↓ rerank_candidates() ↓ select_context() ↓ build_context_text() ← PROMPT ↓ generate_answer() ← LLM (Llama3) ↓ Final Answer