README.md
Lines changed: 12 additions & 7 deletions
@@ -4,8 +4,9 @@ This project builds an information retrieval system over the [Amazon Reviews 202
- **BM25**: a classical keyword-based retrieval method using term frequency and inverse document frequency. This project uses the LangChain BM25 retriever.
- **Semantic Search**: dense vector retrieval using sentence embeddings (`all-MiniLM-L6-v2`) and a FAISS index
+
- **Hybrid Search**: combines BM25 and Semantic Search using Reciprocal Rank Fusion (RRF) to produce a more robust and balanced ranking of documents
-
The app allows users to switch between BM25 and Semantic search modes, view top-3 product results with reviews, and provide relevance feedback (👍/👎) which is recorded in a CSV file. Both the BM25 index (603,274 products) and the FAISS semantic index (20,000 products, smaller due to computing constraints) are hosted on HuggingFace and loaded automatically by the app.
+
The app allows users to switch between BM25 and Semantic search modes, view top-3 product results with reviews, and provide relevance feedback (👍/👎) which is recorded in a CSV file. An **AI Assistant tab** powered by a Hybrid RAG pipeline (BM25 + Semantic + Llama-3-8B) is also available for natural language product recommendations. Both the BM25 index (603,274 products) and the FAISS semantic index (20,000 products, smaller due to computing constraints) are hosted on HuggingFace and loaded automatically by the app.
@@ -26,7 +27,7 @@ We utilise HuggingFace `datasets` package which enable us to use arrow-like SQL
## Retrieval Workflow
-
We use a sparse method (BM25) and a dense method (FAISS). For more details please check notebooks related to [bm25](./notebooks/milestone1_bm25.ipynb) and [faiss](./notebooks/milestone1_semantic.ipynb).
+
We use a sparse method (BM25), a dense method (FAISS), and a hybrid combination of the two. For more details please check notebooks related to [bm25](./notebooks/milestone1_bm25.ipynb) and [faiss](./notebooks/milestone1_semantic.ipynb).
### BM25
@@ -70,7 +71,7 @@ flowchart LR
### Hybrid
-
We will merge the above two into a hybrid retriever where we can give weights to the outputs of both retrievers, combining semantic similarity (FAISS) with keyword-based relevance (BM25) to produce a more robust and balanced ranking of documents. By changing the weights, we can control the trade-off between contextual understanding and exact term matching, and we landed on equal weights for our project.
+
We merge BM25 and FAISS into a hybrid retriever using **Reciprocal Rank Fusion (RRF)**, which combines semantic similarity with keyword-based relevance to produce a more robust and balanced ranking. RRF assigns each document a score based on its rank position in each retriever's results, then sums these scores with configurable weights, which lets us control the trade-off between contextual understanding and exact term matching. The hybrid retriever is implemented in [`src/hybrid.py`](./src/hybrid.py) and is the retriever used by the AI Assistant tab.
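As a rough illustration of weighted RRF scoring, the sketch below fuses two ranked lists of product ids. It is not the code in `src/hybrid.py`: the function name `reciprocal_rank_fusion`, the smoothing constant `k=60`, and the sample ids are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse ranked lists of document ids with weighted RRF.

    rankings: list of ranked lists (best first), one per retriever.
    weights:  one weight per ranking; defaults to equal weights.
    k:        smoothing constant (60 is a commonly used value, assumed here).
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need exactly one weight per ranking")

    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        # Ranks start at 1, so the top hit contributes weight / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)

    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers (ids are made up).
bm25_hits = ["B001", "B002", "B003"]
faiss_hits = ["B003", "B001", "B004"]
fused = reciprocal_rank_fusion([bm25_hits, faiss_hits], weights=[0.5, 0.5])
print(fused)
```

Because only rank positions enter the score, the two retrievers' raw scores never need to be on the same scale. Raising one weight shifts the fused order toward that retriever.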
```mermaid
flowchart LR
@@ -146,13 +147,17 @@ The app will automatically use the full local index if available, otherwise fall
## Evaluation
-
Evaluation and exploration can be [generated here](./notebooks/milestone1_evaluate_retrieval.ipynb) and are [summarised here](./results/milestone1_discussion.md). We can see a few cases where BM25 was doing better, while in some FAISS was. We cannot compare the scores between them as they are on different scales, but we are able to see how they prioritise items. We have not implemented scoring based on other factors such as popularity or rating, and only rank the products based on their retrieval score.
+
Evaluation and exploration can be [generated here](./notebooks/milestone1_evaluate_retrieval.ipynb) and are [summarized here](./results/milestone1_discussion.md). BM25 performed better in some cases and FAISS in others. Their scores are on different scales and cannot be compared directly, but we can still see how each retriever prioritizes items. We have not implemented scoring based on other factors such as popularity or rating, and rank products only by their retrieval score.
+
+
RAG pipeline evaluation is [generated here](./notebooks/milestone2_evaluate_rag.ipynb) and [summarized here](./results/milestone2_discussion.md).
## RAG and LLM Integration
-
We will query the online hosted LLMs through huggingface api and thus we have the liberty to select somewhat heavier and powerful models, and we ended up selecting [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) with a `max_token` limit of 512 tokens. Since our app is small scale, we expect to have enough free-tier API calls available, and the LLM performance was quite good over many iterations we tested.
+
We query the online hosted LLMs through the HuggingFace API and selected [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) with a `max_token` limit of 512 tokens. Since our app is small scale, we expect to have enough free-tier API calls available, and the LLM performance was quite good over many iterations we tested.
+
+
The **AI Assistant tab** in the app exposes the full RAG pipeline to the user — enter a grocery query and receive AI-generated product recommendations along with recipe ideas and storage tips, grounded in the retrieved product reviews and metadata.
-
We tested 2 kinds of retriever for the RAG- a fully [semantic](#faiss) (FAISS) retriever and a [hybrid](#hybrid) (BM25+semantic) retriever with equal weights. We discovered the hybrid to work very well in this case and it is the sole RAG retriever for this implementation. In the future we can implement a slider to control the ratio of weights.
+
We tested two kinds of retriever for RAG: a fully [semantic](#faiss) (FAISS) retriever and a [hybrid](#hybrid) (BM25 + semantic) retriever with equal weights. We found the hybrid retriever to work better overall, and it is the sole RAG retriever used in the AI Assistant tab. In the future we can implement a slider to control the ratio of weights used in the hybrid retriever.
Both semantic and hybrid can be explored in this [notebook](./notebooks/milestone2_rag.ipynb) with different prompts and parameters. The `rag_pipeline` object returns a tuple whose second item is the context retrieved from the retriever, so both can be tested simultaneously. Passing `verbose=True` also prints the entire context sent to the LLM after each step, for clearer exploration.
@@ -172,7 +177,7 @@ flowchart LR
```
#### LLM Evaluation
-
Similar to Search function, some metrics and exploration can be [generated here](./notebooks/milestone2_evaluate_rag.ipynb) and are [summarised here](./results/milestone2_discussion.md). We found that while LLM was slightly unpredictable, for most simple searches it did pretty well. We tried to depend as less as possible on the output formatting to avoid breaking of code in edge cases, eg. when the LLM does not return the `parent_asin` numbers.
+
Similar to the Search function, some metrics and exploration can be [generated here](./notebooks/milestone2_evaluate_rag.ipynb) and are [summarized here](./results/milestone2_discussion.md). We found that while the LLM was slightly unpredictable, it performed well for most simple grocery queries. We tried to depend as little as possible on the output formatting so that the code does not break in edge cases, e.g. when the LLM does not return the `parent_asin` numbers.
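One way to avoid depending on the LLM's output format is to treat any `parent_asin` values in the response as optional and fall back to the retriever's own ranking when none appear. The sketch below is a hypothetical illustration, not the project's code: the helper names, the 10-character id pattern, and the sample ids are all assumptions.

```python
import re

# Assumption: parent_asin values are 10-character uppercase alphanumeric ids.
ASIN_PATTERN = re.compile(r"\b[A-Z0-9]{10}\b")


def extract_known_asins(llm_text, retrieved_asins):
    """Return retrieved ids that the LLM mentioned, in order of first mention."""
    allowed = set(retrieved_asins)
    mentioned = []
    for candidate in ASIN_PATTERN.findall(llm_text or ""):
        # Ignore ids that were not in the retrieved context and skip repeats.
        if candidate in allowed and candidate not in mentioned:
            mentioned.append(candidate)
    return mentioned


def choose_products(llm_text, retrieved_asins, top_n=3):
    """Prefer ids cited by the LLM; otherwise keep the retriever's order."""
    mentioned = extract_known_asins(llm_text, retrieved_asins)
    return (mentioned or list(retrieved_asins))[:top_n]


# Hypothetical retrieved ids and LLM responses.
retrieved = ["B0ABCDE123", "B0ZZZZZ999", "B000000001"]
cited = choose_products("I recommend B0ZZZZZ999, then B0ABCDE123.", retrieved)
fallback = choose_products("Try any extra virgin olive oil.", retrieved)
print(cited, fallback)
```

Filtering against the retrieved ids also discards any id the LLM invents, so a malformed or hallucinated response degrades to the plain retrieval ranking instead of raising an error.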
> **Disclaimer**
> LLM-based pipelines may occasionally produce inaccurate or unexpected results. Since this application handles food and recipe-related queries, any guidance on cooking, storage, or handling should be independently verified before use. Prompting should be done carefully to avoid hallucinations.