UBC-MDS
diff --git a/‎README.md‎
Lines changed: 12 additions & 7 deletions b/‎README.md‎
Lines changed: 12 additions & 7 deletions
@@ -4,8 +4,9 @@ This project builds an information retrieval system over the [Amazon Reviews 202
 
 - **BM25** — a classical keyword-based retrieval method using term frequency and inverse document frequency. This project uses LangChain BM25 retriever.
 - **Semantic Search** — dense vector retrieval using sentence embeddings (`all-MiniLM-L6-v2`) and a FAISS index
+- **Hybrid Search** — combines BM25 and Semantic Search using Reciprocal Rank Fusion (RRF) to produce a more robust and balanced ranking of documents
 
-The app allows users to switch between BM25 and Semantic search modes, view top-3 product results with reviews, and provide relevance feedback (👍/👎) which is recorded in a CSV file. Both the BM25 index (603,274 products) and the FAISS semantic index (20,000 products, smaller due to computing constraints) are hosted on HuggingFace and loaded automatically by the app.
+The app allows users to switch between BM25 and Semantic search modes, view top-3 product results with reviews, and provide relevance feedback (👍/👎) which is recorded in a CSV file. An **AI Assistant tab** powered by a Hybrid RAG pipeline (BM25 + Semantic + Llama-3-8B) is also available for natural language product recommendations. Both the BM25 index (603,274 products) and the FAISS semantic index (20,000 products, smaller due to computing constraints) are hosted on HuggingFace and loaded automatically by the app.
 
 **Live App:** [🥕🧀 Grocery & Gourmet Food Search](https://huggingface.co/spaces/rishadaz/amazon_retriever)
 
@@ -26,7 +27,7 @@ We utilise HuggingFace `datasets` package which enable us to use arrow-like SQL
 
 ## Retrieval Workflow
 
-We use a sparse method (BM25) and a dense method (FAISS). For more details please check notebooks related to [bm25](./notebooks/milestone1_bm25.ipynb) and [faiss](./notebooks/milestone1_semantic.ipynb). 
+We use a sparse method (BM25), a dense method (FAISS), and a hybrid combination of the two. For more details please check notebooks related to [bm25](./notebooks/milestone1_bm25.ipynb) and [faiss](./notebooks/milestone1_semantic.ipynb). 
 
 ### BM25
 
@@ -70,7 +71,7 @@ flowchart LR
 
 ### Hybrid
 
-We will merge the above two into a hybrid retriever where we can give weights to the outputs of both retrievers, combining semantic similarity (FAISS) with keyword-based relevance (BM25) to produce a more robust and balanced ranking of documents. By changing the weights, we can control the trade-off between contextual understanding and exact term matching, and we landed on equal weights for our project.
+We merge BM25 and FAISS into a hybrid retriever using **Reciprocal Rank Fusion (RRF)**, which combines semantic similarity with keyword-based relevance to produce a more robust and balanced ranking. RRF assigns each document a score based on its rank position in each retriever's results, then sums these scores with configurable weights — allowing us to control the trade-off between contextual understanding and exact term matching. The hybrid retriever is implemented in [`src/hybrid.py`](./src/hybrid.py) and is the retriever used by the AI Assistant tab.
 
 ```mermaid
 flowchart LR
@@ -146,13 +147,17 @@ The app will automatically use the full local index if available, otherwise fall
 
 ## Evalutation
 
-Evaluation and exploration can be [generated here](./notebooks/milestone1_evaluate_retrieval.ipynb) and are [summarised here](./results/milestone1_discussion.md). We can see a few cases where BM25 was doing better, while in some FAISS was. We cannot compare the scores between them as they are on different scales, but we are able to see how they prioritise items. We have not implemented scoring based on other factors such as popularity or rating, and only rank the products based on their retrieval score. 
+Evaluation and exploration can be [generated here](./notebooks/milestone1_evaluate_retrieval.ipynb) and are [summarized here](./results/milestone1_discussion.md). We can see a few cases where BM25 was doing better, while in some FAISS was. We cannot compare the scores between them as they are on different scales, but we are able to see how they prioritize items. We have not implemented scoring based on other factors such as popularity or rating, and only rank the products based on their retrieval score. 
+
+RAG pipeline evaluation is [generated here](./notebooks/milestone2_evaluate_rag.ipynb) and [summarized here](./results/milestone2_discussion.md).
 
 ## RAG and LLM Integration
 
-We will query the online hosted LLMs through huggingface api and thus we have the liberty to select somewhat heavier and powerful models, and we ended up selecting [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) with a `max_token` limit of 512 tokens. Since our app is small scale, we expect to have enough free-tier API calls available, and the LLM performance was quite good over many iterations we tested.
+We query the online hosted LLMs through the HuggingFace API and selected [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) with a `max_token` limit of 512 tokens. Since our app is small scale, we expect to have enough free-tier API calls available, and the LLM performance was quite good over many iterations we tested.
+ 
+The **AI Assistant tab** in the app exposes the full RAG pipeline to the user — enter a grocery query and receive AI-generated product recommendations along with recipe ideas and storage tips, grounded in the retrieved product reviews and metadata.
 
-We tested 2 kinds of retriever for the RAG- a fully [semantic](#faiss) (FAISS) retriever and a [hybrid](#hybrid) (BM25+semantic) retriever with equal weights. We discovered the hybrid to work very well in this case and it is the sole RAG retriever for this implementation. In the future we can implement a slider to control the ratio of weights.
+We tested 2 kinds of retriever for the RAG - a fully [semantic](#faiss) (FAISS) retriever and a [hybrid](#hybrid) (BM25 + semantic) retriever with equal weights. We found the hybrid retriever to work better overall and it is the sole RAG retriever used in the AI Assistant tab. In the future we can implement a slider to control the ratio of weights used in the hybrid retriever.
 
 Both semantic and hybrid can be explored in this [notebook](./notebooks/milestone2_rag.ipynb) with different prompts and parameters. The `rag_pipeline` object returns a tuple, where the second item will return the context retrieved from the retriever, so both can be tested simultaneously. The input `verbose=True` can also print the entire context which is being sent to the LLM, after each step, for more clear exploration.
 
@@ -172,7 +177,7 @@ flowchart LR
 ```
 #### LLM Evaluation
 
-Similar to Search function, some metrics and exploration can be [generated here](./notebooks/milestone2_evaluate_rag.ipynb) and are [summarised here](./results/milestone2_discussion.md). We found that while LLM was slightly unpredictable, for most simple searches it did pretty well. We tried to depend as less as possible on the output formatting to avoid breaking of code in edge cases, eg. when the LLM does not return the `parent_asin` numbers.
+Similar to the Search function, some metrics and exploration can be [generated here](./notebooks/milestone2_evaluate_rag.ipynb) and are [summarized here](./results/milestone2_discussion.md). We found that while the LLM was slightly unpredictable, for most simple grocery queries it performed well. We tried to depend as little as possible on the output formatting to avoid breaking of code in edge cases, e.g. when the LLM does not return the `parent_asin` numbers.
 
 > **Disclaimer**
 > LLM-based pipelines may occasionally produce inaccurate or unexpected results. Since this application handles food and recipe-related queries, any guidance on cooking, storage, or handling should be independently verified before use. Prompting should be done carefully to avoid hallucinations.