
Commit f4dfdd4

Merge pull request #31 from UBC-MDS/update_m2_discussion
Update m2 discussion and readme
2 parents 036c1d9 + b7bc91d commit f4dfdd4

2 files changed

Lines changed: 37 additions & 25 deletions


README.md

Lines changed: 12 additions & 7 deletions
@@ -4,8 +4,9 @@ This project builds an information retrieval system over the [Amazon Reviews 202
 
 - **BM25** — a classical keyword-based retrieval method using term frequency and inverse document frequency. This project uses LangChain BM25 retriever.
 - **Semantic Search** — dense vector retrieval using sentence embeddings (`all-MiniLM-L6-v2`) and a FAISS index
+- **Hybrid Search** — combines BM25 and Semantic Search using Reciprocal Rank Fusion (RRF) to produce a more robust and balanced ranking of documents
 
-The app allows users to switch between BM25 and Semantic search modes, view top-3 product results with reviews, and provide relevance feedback (👍/👎) which is recorded in a CSV file. Both the BM25 index (603,274 products) and the FAISS semantic index (20,000 products, smaller due to computing constraints) are hosted on HuggingFace and loaded automatically by the app.
+The app allows users to switch between BM25 and Semantic search modes, view top-3 product results with reviews, and provide relevance feedback (👍/👎) which is recorded in a CSV file. An **AI Assistant tab** powered by a Hybrid RAG pipeline (BM25 + Semantic + Llama-3-8B) is also available for natural language product recommendations. Both the BM25 index (603,274 products) and the FAISS semantic index (20,000 products, smaller due to computing constraints) are hosted on HuggingFace and loaded automatically by the app.
 
 **Live App:** [🥕🧀 Grocery & Gourmet Food Search](https://huggingface.co/spaces/rishadaz/amazon_retriever)
 
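The README states that relevance feedback (👍/👎) is recorded in a CSV file. The app's logging code and column layout are not part of this diff, so the following is only a minimal Python sketch under assumptions: the function name `record_feedback` and the columns `timestamp`, `mode`, `query`, `parent_asin`, and `feedback` are hypothetical. It appends one row per click and writes a header only when the file is new or empty.

```python
import csv
import os
from datetime import datetime, timezone

# Hypothetical column layout; the app's real CSV schema is not shown in this diff.
FIELDS = ["timestamp", "mode", "query", "parent_asin", "feedback"]


def record_feedback(path: str, mode: str, query: str, parent_asin: str, thumbs_up: bool) -> None:
    """Append one thumbs-up/down judgement to a CSV file (illustrative sketch)."""
    # Write the header only if the file does not exist yet or is empty.
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" is what the csv module expects for files it writes.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "mode": mode,                 # e.g. "BM25" or "Semantic"
            "query": query,
            "parent_asin": parent_asin,   # product the user rated
            "feedback": "up" if thumbs_up else "down",
        })
```

Opening the file in append mode means earlier feedback is never overwritten, so the CSV accumulates judgements across sessions that can later feed an offline relevance evaluation.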
@@ -26,7 +27,7 @@ We utilise HuggingFace `datasets` package which enable us to use arrow-like SQL
 
 ## Retrieval Workflow
 
-We use a sparse method (BM25) and a dense method (FAISS). For more details please check notebooks related to [bm25](./notebooks/milestone1_bm25.ipynb) and [faiss](./notebooks/milestone1_semantic.ipynb).
+We use a sparse method (BM25), a dense method (FAISS), and a hybrid combination of the two. For more details please check notebooks related to [bm25](./notebooks/milestone1_bm25.ipynb) and [faiss](./notebooks/milestone1_semantic.ipynb).
 
 ### BM25
 
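The sparse BM25 method referenced in this hunk can be illustrated with a from-scratch Okapi BM25 scorer in Python. This is not LangChain's code: the whitespace tokenizer, the smoothed IDF `ln(1 + (N - n + 0.5) / (n + 0.5))`, and the defaults `k1=1.5`, `b=0.75` are assumptions chosen for illustration, and the library the project uses may compute IDF and normalization differently.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25 (illustrative only)."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = (sum(len(tokens) for tokens in tokenized) / n_docs) or 1.0

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)  # term frequency within this document
        # Length normalization: longer-than-average documents get a larger
        # denominator below, which lowers their score.
        length_norm = 1 - b + b * len(tokens) / avgdl
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms (small df) get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term-frequency component controlled by k1.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores
```

For example, scoring `"dark chocolate"` against `["organic dark chocolate bar", "green tea bags", "dark roast coffee beans"]` gives the first product the highest score (it matches both terms), the coffee a smaller positive score (it matches only "dark"), and the tea zero.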
@@ -70,7 +71,7 @@ flowchart LR
 
 ### Hybrid
 
-We will merge the above two into a hybrid retriever where we can give weights to the outputs of both retrievers, combining semantic similarity (FAISS) with keyword-based relevance (BM25) to produce a more robust and balanced ranking of documents. By changing the weights, we can control the trade-off between contextual understanding and exact term matching, and we landed on equal weights for our project.
+We merge BM25 and FAISS into a hybrid retriever using **Reciprocal Rank Fusion (RRF)**, which combines semantic similarity with keyword-based relevance to produce a more robust and balanced ranking. RRF assigns each document a score based on its rank position in each retriever's results, then sums these scores with configurable weights — allowing us to control the trade-off between contextual understanding and exact term matching. The hybrid retriever is implemented in [`src/hybrid.py`](./src/hybrid.py) and is the retriever used by the AI Assistant tab.
 
 ```mermaid
 flowchart LR
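Weighted Reciprocal Rank Fusion, as described in the updated Hybrid section, can be sketched in a few lines of Python. Each retriever contributes `weight / (k + rank)` for every document it returns, and the contributions are summed per document. The function name `weighted_rrf` and the constant `k=60` (a common default for RRF) are assumptions; the actual implementation in `src/hybrid.py` is not shown in this diff and may differ.

```python
from collections import defaultdict


def weighted_rrf(rankings: list[list[str]], weights: list[float], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with weighted Reciprocal Rank Fusion."""
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    fused: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        # Ranks start at 1; a document missing from a list gets nothing from it.
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += weight / (k + rank)
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

With equal weights, `weighted_rrf([["A", "B", "C"], ["B", "D", "A"]], [0.5, 0.5])` ranks the documents B, A, D, C: B sits near the top of both lists, while D and C each appear in only one. Because only rank positions enter the formula, the BM25 and FAISS scores never need to be placed on a common scale, which sidesteps the incomparable-scores issue mentioned in the Evaluation section.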
@@ -146,13 +147,17 @@ The app will automatically use the full local index if available, otherwise fall
 
 ## Evalutation
 
-Evaluation and exploration can be [generated here](./notebooks/milestone1_evaluate_retrieval.ipynb) and are [summarised here](./results/milestone1_discussion.md). We can see a few cases where BM25 was doing better, while in some FAISS was. We cannot compare the scores between them as they are on different scales, but we are able to see how they prioritise items. We have not implemented scoring based on other factors such as popularity or rating, and only rank the products based on their retrieval score.
+Evaluation and exploration can be [generated here](./notebooks/milestone1_evaluate_retrieval.ipynb) and are [summarized here](./results/milestone1_discussion.md). We can see a few cases where BM25 was doing better, while in some FAISS was. We cannot compare the scores between them as they are on different scales, but we are able to see how they prioritize items. We have not implemented scoring based on other factors such as popularity or rating, and only rank the products based on their retrieval score.
+
+RAG pipeline evaluation is [generated here](./notebooks/milestone2_evaluate_rag.ipynb) and [summarized here](./results/milestone2_discussion.md).
 
 ## RAG and LLM Integration
 
-We will query the online hosted LLMs through huggingface api and thus we have the liberty to select somewhat heavier and powerful models, and we ended up selecting [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) with a `max_token` limit of 512 tokens. Since our app is small scale, we expect to have enough free-tier API calls available, and the LLM performance was quite good over many iterations we tested.
+We query the online hosted LLMs through the HuggingFace API and selected [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) with a `max_token` limit of 512 tokens. Since our app is small scale, we expect to have enough free-tier API calls available, and the LLM performance was quite good over many iterations we tested.
+
+The **AI Assistant tab** in the app exposes the full RAG pipeline to the user — enter a grocery query and receive AI-generated product recommendations along with recipe ideas and storage tips, grounded in the retrieved product reviews and metadata.
 
-We tested 2 kinds of retriever for the RAG- a fully [semantic](#faiss) (FAISS) retriever and a [hybrid](#hybrid) (BM25+semantic) retriever with equal weights. We discovered the hybrid to work very well in this case and it is the sole RAG retriever for this implementation. In the future we can implement a slider to control the ratio of weights.
+We tested 2 kinds of retriever for the RAG - a fully [semantic](#faiss) (FAISS) retriever and a [hybrid](#hybrid) (BM25 + semantic) retriever with equal weights. We found the hybrid retriever to work better overall and it is the sole RAG retriever used in the AI Assistant tab. In the future we can implement a slider to control the ratio of weights used in the hybrid retriever.
 
 Both semantic and hybrid can be explored in this [notebook](./notebooks/milestone2_rag.ipynb) with different prompts and parameters. The `rag_pipeline` object returns a tuple, where the second item will return the context retrieved from the retriever, so both can be tested simultaneously. The input `verbose=True` can also print the entire context which is being sent to the LLM, after each step, for more clear exploration.
 
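The README describes `rag_pipeline` as returning a tuple whose second item is the retrieved context, with `verbose=True` printing what is sent to the LLM. The following Python sketch mirrors that contract using plain callables so it runs offline. The prompt text, the document fields (`parent_asin`, `title`, `review`), the helper `build_context`, and the exact signature are hypothetical and are not taken from the project's notebook. In the app, `llm` would wrap a call to `meta-llama/Meta-Llama-3-8B-Instruct` through the HuggingFace API with the 512-token limit.

```python
from typing import Callable

# Hypothetical prompt template; the project's real prompt is not shown in this diff.
PROMPT_TEMPLATE = (
    "You are a grocery shopping assistant. Using only the products below, "
    "recommend items for the request and cite each product's parent_asin.\n\n"
    "Products:\n{context}\n\nRequest: {query}\nAnswer:"
)


def build_context(docs: list[dict]) -> str:
    """Flatten retrieved products (hypothetical fields) into one prompt-ready block."""
    return "\n".join(
        f"[{doc['parent_asin']}] {doc['title']}: {doc['review']}" for doc in docs
    )


def rag_pipeline(
    query: str,
    retriever: Callable[[str], list[dict]],
    llm: Callable[[str], str],
    verbose: bool = False,
) -> tuple[str, str]:
    """Retrieve, build a grounded prompt, call the LLM; return (answer, context)."""
    docs = retriever(query)  # e.g. a hybrid BM25 + FAISS retriever
    context = build_context(docs)
    prompt = PROMPT_TEMPLATE.format(context=context, query=query)
    if verbose:
        print(prompt)  # show exactly what is sent to the LLM
    answer = llm(prompt)
    return answer, context


# Offline usage with stub components and placeholder product data.
if __name__ == "__main__":
    fake_docs = [{"parent_asin": "B0EXAMPLE1", "title": "Dark chocolate bar",
                  "review": "Rich and not too sweet."}]
    answer, context = rag_pipeline(
        "a snack for a chocolate lover",
        retriever=lambda q: fake_docs,
        llm=lambda p: "Try B0EXAMPLE1.",
        verbose=True,
    )
    print(answer)
```

Returning the context alongside the answer is what lets the same call be used both in the app and in evaluation, where the retrieved products can be checked independently of what the LLM wrote.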
@@ -172,7 +177,7 @@ flowchart LR
 ```
 #### LLM Evaluation
 
-Similar to Search function, some metrics and exploration can be [generated here](./notebooks/milestone2_evaluate_rag.ipynb) and are [summarised here](./results/milestone2_discussion.md). We found that while LLM was slightly unpredictable, for most simple searches it did pretty well. We tried to depend as less as possible on the output formatting to avoid breaking of code in edge cases, eg. when the LLM does not return the `parent_asin` numbers.
+Similar to the Search function, some metrics and exploration can be [generated here](./notebooks/milestone2_evaluate_rag.ipynb) and are [summarized here](./results/milestone2_discussion.md). We found that while the LLM was slightly unpredictable, for most simple grocery queries it performed well. We tried to depend as little as possible on the output formatting to avoid breaking of code in edge cases, e.g. when the LLM does not return the `parent_asin` numbers.
 
 > **Disclaimer**
 > LLM-based pipelines may occasionally produce inaccurate or unexpected results. Since this application handles food and recipe-related queries, any guidance on cooking, storage, or handling should be independently verified before use. Prompting should be done carefully to avoid hallucinations.
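The LLM Evaluation paragraph notes that the code avoids depending on the LLM's output format, for example when the model omits `parent_asin` values. One way to get that robustness, sketched here in Python under assumptions (the function name `products_to_show` and the 10-character uppercase alphanumeric ID pattern are hypothetical), is to accept only IDs that the retriever actually returned and to fall back to the retriever's own ranking when none are found.

```python
import re

# Assumed ID shape: 10 uppercase letters/digits, matched as a whole word.
ASIN_PATTERN = re.compile(r"\b[A-Z0-9]{10}\b")


def products_to_show(llm_output: str, retrieved_asins: list[str]) -> list[str]:
    """Pick product IDs to display without trusting the LLM's formatting."""
    known = set(retrieved_asins)
    mentioned: list[str] = []
    # Uppercase first so lowercase IDs written by the model still match.
    for candidate in ASIN_PATTERN.findall(llm_output.upper()):
        # Keep only IDs we actually retrieved (drops hallucinated IDs), once each.
        if candidate in known and candidate not in mentioned:
            mentioned.append(candidate)
    # Fallback: if the LLM cited nothing we retrieved, use the retriever's ranking.
    return mentioned or list(retrieved_asins)
```

Any ten-character word in the answer that is not a retrieved ID is simply ignored, so the display logic never breaks on free-form text and never links to a product that was not part of the grounding context.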
