UBC-MDS
diff --git a/‎README.md‎
Lines changed: 46 additions & 0 deletions b/‎README.md‎
Lines changed: 46 additions & 0 deletions
diff --git a/‎app/app.py‎
Lines changed: 10 additions & 11 deletions b/‎app/app.py‎
Lines changed: 10 additions & 11 deletions
@@ -68,6 +68,23 @@ flowchart LR
     output --> app["App\nHTML"]
 ```
 
+### Hybrid
+
+We will merge the above two into a hybrid retriever where we can give weights to the outputs of both retrievers, combining semantic similarity (FAISS) with keyword-based relevance (BM25) to produce a more robust and balanced ranking of documents. By changing the weights, we can control the trade-off between contextual understanding and exact term matching, and we landed on equal weights for our project.
+
+```mermaid
+flowchart LR
+    query["query"] --> sem["FAISS retriever"]
+    query --> bm["BM25 retriever"]
+    sem --> semtop["Top-k semantic"]
+    bm --> bmtop["Top-k BM25"]
+    semtop --->|50% weight| comb["Combined Output docs"]
+    bmtop --->|50% weight| comb
+    comb --> output["Output JSON\n(content + metadata)"]
+    output --> app["App\nHTML"]
+    metadata --->|metadata like image url| output
+```
+
 ## Setup
 
 1. Clone the repository using HTTP
@@ -131,6 +148,35 @@ The app will automatically use the full local index if available, otherwise fall
 
 Evaluation and exploration can be [generated here](./notebooks/milestone1_evaluate_retrieval.ipynb) and are [summarised here](./results/milestone1_discussion.md). We can see a few cases where BM25 was doing better, while in some FAISS was. We cannot compare the scores between them as they are on different scales, but we are able to see how they prioritise items. We have not implemented scoring based on other factors such as popularity or rating, and only rank the products based on their retrieval score. 
 
+## RAG and LLM Integration
+
+We will query the online hosted LLMs through huggingface api and thus we have the liberty to select somewhat heavier and powerful models, and we ended up selecting [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) with a `max_token` limit of 512 tokens. Since our app is small scale, we expect to have enough free-tier API calls available, and the LLM performance was quite good over many iterations we tested.
+
+We tested 2 kinds of retriever for the RAG- a fully [semantic](#faiss) (FAISS) retriever and a [hybrid](#hybrid) (BM25+semantic) retriever with equal weights. We discovered the hybrid to work very well in this case and it is the sole RAG retriever for this implementation. In the future we can implement a slider to control the ratio of weights.
+
+Both semantic and hybrid can be explored in this [notebook](./notebooks/milestone2_rag.ipynb) with different prompts and parameters. The `rag_pipeline` object returns a tuple, where the second item will return the context retrieved from the retriever, so both can be tested simultaneously. The input `verbose=True` can also print the entire context which is being sent to the LLM, after each step, for more clear exploration.
+
+Here is the basic workflow:
+
+```mermaid
+flowchart LR
+    reviews["Retriever\n(Hybrid or Semantic)"] --> docs["Top k\nDocuments"]
+    docs --> embeddings["Create\nPage Context"]
+    embeddings --> similar["Prompt"]
+    sys_pro["SYSTEM Prompt"] --> similar
+    query(["User"]) --> qembed["Query"]
+    qembed --> reviews
+    qembed --> similar
+    similar --> response["LLM response"] --> output["Output JSON\n(content + metadata)"]
+    metadata[("Metadata")] --->|metadata like image_url| output["Output JSON\n(llm output + page_content)"]
+```
+#### LLM Evaluation
+
+Similar to Search function, some metrics and exploration can be [generated here](./notebooks/milestone2_evaluate_rag.ipynb) and are [summarised here](./results/milestone2_discussion.md). We found that while LLM was slightly unpredictable, for most simple searches it did pretty well. We tried to depend as less as possible on the output formatting to avoid breaking of code in edge cases, eg. when the LLM does not return the `parent_asin` numbers.
+
+> **Disclaimer**
+> LLM-based pipelines may occasionally produce inaccurate or unexpected results. Since this application handles food and recipe-related queries, any guidance on cooking, storage, or handling should be independently verified before use. Prompting should be done carefully to avoid hallucinations.
+
 ## Authors
 
 - Sarisha Das
 
@@ -15,7 +15,7 @@
 from src.semantic import load_vector_store
 from src.rag_pipeline import run_rag
 from src.bm25 import load
-from src.hybrid import load_hybrid_retriever
+from src.hybrid import HybridRetriever
 
 from dotenv import load_dotenv
 load_dotenv()
@@ -36,6 +36,8 @@
 FEEDBACK_CSV = ROOT / "results" / "feedback.csv"
 FEEDBACK_CSV.parent.mkdir(parents=True, exist_ok=True)
 
+TOP_K = 5
+
 HF_TOKEN = os.getenv('HF_TOKEN')
 
 from datasets import load_dataset
@@ -125,16 +127,14 @@ def semantic_search(query: str, top_k: int = 3) -> list[dict]:
     results = enrich_search_results(vector_store, query, top_k, HF_DATASET['full'])
     return results
 
-@st.cache_resource
-def load_hybrid_retriever_cached():
-    return load_hybrid_retriever(
-        bm25_index_path=ROOT_FOLDER / "data" / "processed" / "tokenisation" / "bm25_index_mini.pkl",
-        faiss_store_path=ROOT_FOLDER / "data" / "processed" / "embeddings",
-        k=5,
+hybrid_retriever = HybridRetriever(
+        bm25_retriever=retriever,
+        semantic_store=vector_store,
+        k=TOP_K,
+        bm25_weight=0.5,
+        semantic_weight=0.5,
     )
 
-hybrid_retriever = load_hybrid_retriever_cached()
-
 def llm_retriever(query: str, top_k: int = 5):
     retriever = hybrid_retriever
     answer, docs = run_rag(retriever, query=query, hf_dataset=HF_DATASET['full'])
@@ -256,8 +256,6 @@ def render_results(results: list[dict], mode: str, query: str) -> None:
     unsafe_allow_html=True,
 )
 
-TOP_K = 5
-
 # ─── Search bar ───────────────────────────────────────────────────────────────
 query = st.text_input(
     "Search for a product or describe what you're looking for",
@@ -321,6 +319,7 @@ def render_results(results: list[dict], mode: str, query: str) -> None:
         )
     else:
         st.markdown(f"#### 🤖 AI Answer — *\"{st.session_state.last_query}\"*")
+        st.caption("⚠️ AI responses may contain errors - please verify before relying on them.")
         html_response = markdown.markdown(
             st.session_state.llm_result,
             extensions=["tables", "fenced_code", "nl2br"],