You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
+46Lines changed: 46 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -68,6 +68,23 @@ flowchart LR
68
68
output --> app["App\nHTML"]
69
69
```
70
70
71
+
### Hybrid
72
+
73
+
We will merge the above two into a hybrid retriever where we can give weights to the outputs of both retrievers, combining semantic similarity (FAISS) with keyword-based relevance (BM25) to produce a more robust and balanced ranking of documents. By changing the weights, we can control the trade-off between contextual understanding and exact term matching, and we landed on equal weights for our project.
@@ -131,6 +148,35 @@ The app will automatically use the full local index if available, otherwise fall
131
148
132
149
Evaluation and exploration can be [generated here](./notebooks/milestone1_evaluate_retrieval.ipynb) and are [summarised here](./results/milestone1_discussion.md). We can see a few cases where BM25 was doing better, while in some FAISS was. We cannot compare the scores between them as they are on different scales, but we are able to see how they prioritise items. We have not implemented scoring based on other factors such as popularity or rating, and only rank the products based on their retrieval score.
133
150
151
+
## RAG and LLM Integration
152
+
153
+
We will query the online hosted LLMs through huggingface api and thus we have the liberty to select somewhat heavier and powerful models, and we ended up selecting [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) with a `max_token` limit of 512 tokens. Since our app is small scale, we expect to have enough free-tier API calls available, and the LLM performance was quite good over many iterations we tested.
154
+
155
+
We tested 2 kinds of retriever for the RAG- a fully [semantic](#faiss) (FAISS) retriever and a [hybrid](#hybrid) (BM25+semantic) retriever with equal weights. We discovered the hybrid to work very well in this case and it is the sole RAG retriever for this implementation. In the future we can implement a slider to control the ratio of weights.
156
+
157
+
Both semantic and hybrid can be explored in this [notebook](./notebooks/milestone2_rag.ipynb) with different prompts and parameters. The `rag_pipeline` object returns a tuple, where the second item will return the context retrieved from the retriever, so both can be tested simultaneously. The input `verbose=True` can also print the entire context which is being sent to the LLM, after each step, for more clear exploration.
158
+
159
+
Here is the basic workflow:
160
+
161
+
```mermaid
162
+
flowchart LR
163
+
reviews["Retriever\n(Hybrid or Semantic)"] --> docs["Top k\nDocuments"]
164
+
docs --> embeddings["Create\nPage Context"]
165
+
embeddings --> similar["Prompt"]
166
+
sys_pro["SYSTEM Prompt"] --> similar
167
+
query(["User"]) --> qembed["Query"]
168
+
qembed --> reviews
169
+
qembed --> similar
170
+
similar --> response["LLM response"] --> output["Output JSON\n(content + metadata)"]
171
+
metadata[("Metadata")] --->|metadata like image_url| output["Output JSON\n(llm output + page_content)"]
172
+
```
173
+
#### LLM Evaluation
174
+
175
+
Similar to Search function, some metrics and exploration can be [generated here](./notebooks/milestone2_evaluate_rag.ipynb) and are [summarised here](./results/milestone2_discussion.md). We found that while LLM was slightly unpredictable, for most simple searches it did pretty well. We tried to depend as less as possible on the output formatting to avoid breaking of code in edge cases, eg. when the LLM does not return the `parent_asin` numbers.
176
+
177
+
> **Disclaimer**
178
+
> LLM-based pipelines may occasionally produce inaccurate or unexpected results. Since this application handles food and recipe-related queries, any guidance on cooking, storage, or handling should be independently verified before use. Prompting should be done carefully to avoid hallucinations.
0 commit comments