Amazon Product Query Assistant

This project explores how to build a smart Amazon product query assistant using multiple information retrieval and generation methods. It compares BM25 keyword retrieval, semantic search with embeddings, and a Hybrid RAG pipeline on the Amazon Reviews 2023 dataset. The system supports both retrieval-only search and retrieval-augmented generation (RAG), and presents the results through an interactive Streamlit app where users can explore product results or receive LLM-generated answers based on retrieved Amazon product metadata and reviews.

The web interface can now be accessed at https://yhouyang02-dsci-575-project-yuhengo-mkcchoy.share.connect.posit.cloud.

Project maintainers

Repository structure

DSCI_575_project_yuhengo_mkcchoy
  ├── app/                     # Streamlit app code
  ├── bm25_index/              # BM25 retriever artifacts
  ├── data/                    # Raw and processed data (downloaded separately)
  ├── notebooks/               # Jupyter notebooks for experimentation
  ├── results/                 # Result discussion and analysis
  ├── semantic_index/          # Semantic retriever artifacts
  ├── src/                     # Source code for data processing and retrieval
  ├── environment.yml          # Conda environment specification
  └── README.md                # Project overview and instructions

Get started

To run the app locally, follow these steps:

Clone the repository and navigate to the project directory.

git clone https://github.com/UBC-MDS/DSCI_575_project_yuhengo_mkcchoy.git
cd DSCI_575_project_yuhengo_mkcchoy

Create and activate the `conda` environment.

conda env create -f environment.yml
conda activate dsci-575-mkc-yho

To use the RAG mode, create a Hugging Face access token by following the Hugging Face instructions. When creating a new token:
- Choose "Fine-grained" for the token type.
- Check all permissions under "Repositories" and "Inference".

Add the API token to a `.env` file in the project directory. Do not commit this file to any remote repository. Replace `<your-api-token>` with the token value you just created and run the following in your terminal. The LLM-powered RAG mode will not work without this token.
```
echo "HUGGINGFACEHUB_API_TOKEN=<your-api-token>" > .env
```
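For reference, here is a minimal sketch of how a Python process could read that token back. It uses only the standard library; the app itself may load the file differently (for example with a dotenv package), and the helper names `load_env_file` and `get_hf_token` are hypothetical, not taken from this repository.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(path=".env"):
    """Return the token from the environment, falling back to the .env file."""
    name = "HUGGINGFACEHUB_API_TOKEN"
    token = os.environ.get(name) or load_env_file(path).get(name)
    if not token:
        raise RuntimeError(f"{name} is not set; RAG mode needs it.")
    return token
```

Calling `get_hf_token()` from the project directory should return the value written by the `echo` command above, or raise an error if the file is missing.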
Start the Streamlit app. The app should open in your default web browser at http://localhost:8501. If it does not open automatically, you can navigate to that URL manually. It can take a few minutes to load the full app.
```
streamlit run app/app.py
```
Enter product-related queries in the input box and click the "Search" button. The results may be limited since our test models are built on a subset of the full dataset. For better results, you can try queries related to the "Appliances" category, such as "quiet dishwasher stainless steel".
To stop the app, press Ctrl+C in the terminal where the Streamlit app is running.
Developers who want to rebuild the retrievers with a different sample size or adjust the build process can modify src/build_artifacts.py and run the following command. This rebuilds the BM25 and semantic retrievers and saves the new artifacts in their respective folders.
```
python src/build_artifacts.py
```
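To give a feel for what the BM25 side of that build produces, here is a small pure-Python sketch of Okapi BM25 scoring. It is an illustration under assumptions, not the code in src/build_artifacts.py: the class name `TinyBM25`, the whitespace tokenizer, and the parameter values `k1=1.5` and `b=0.75` are all choices made for this example, and the project may rely on a library instead.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real pipelines often do more)."""
    return text.lower().split()


class TinyBM25:
    """Illustrative Okapi BM25 index over a small list of documents."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in self.doc_tokens]
        self.avgdl = sum(self.doc_len) / len(docs)
        self.tf = [Counter(t) for t in self.doc_tokens]
        # Document frequency: number of documents containing each term.
        df = Counter(term for toks in self.doc_tokens for term in set(toks))
        n = len(docs)
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query, i):
        """BM25 score of document i for the query."""
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * self.doc_len[i] / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=3):
        """Indices of the k best-scoring documents, dropping zero-score ones."""
        scored = [(self.score(query, i), i) for i in range(len(self.doc_tokens))]
        ranked = sorted(scored, key=lambda p: (-p[0], p[1]))
        return [i for s, i in ranked[:k] if s > 0]


docs = [
    "quiet dishwasher stainless steel",
    "loud blender",
    "stainless steel fridge",
]
index = TinyBM25(docs)
best = index.top_k("quiet dishwasher stainless steel", k=2)
```

Here `best` lists the dishwasher document first because it matches every query term, followed by the fridge document, which matches only two.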

RAG workflow diagram

flowchart LR
    A[User Query] --> B{Retriever}
    B --> C[Semantic Retriever<br/>FAISS + Embeddings]
    B --> D[Hybrid Retriever<br/>BM25 + Semantic + RRF]

    C --> E[Top-k Documents]
    D --> E

    E --> F[build_context]
    F --> G[Prompt Template]
    G --> H[Hugging Face LLM]
    H --> I[Generated Answer]

    J[Amazon product docs<br/>reviews + metadata] --> C
    J --> D
    K[.env API key] --> H
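
The hybrid branch and the context-building step in the diagram can be sketched as follows. This is a minimal illustration, not the repository's implementation: `rrf_fuse` and `PROMPT_TEMPLATE` are hypothetical names, the signature given to `build_context` is assumed (only its name appears in the diagram), and `k=60` is a commonly used RRF constant rather than a value confirmed by this project.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over every ranking a doc appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def build_context(doc_ids, docs, max_docs=3):
    """Join the top documents into a numbered context string (assumed signature)."""
    parts = [f"[{i}] {docs[d]}" for i, d in enumerate(doc_ids[:max_docs], start=1)]
    return "\n".join(parts)


# Hypothetical prompt wording; the real template may differ.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\nAnswer:"
)

docs = {
    "a": "Dishwasher, stainless steel, rated quiet by reviewers.",
    "b": "Compact dishwasher with a low-noise motor.",
    "c": "Stainless steel refrigerator.",
}
bm25_ranking = ["a", "b", "c"]      # illustrative keyword ranking
semantic_ranking = ["b", "c", "a"]  # illustrative embedding ranking

fused = rrf_fuse([bm25_ranking, semantic_ranking])
context = build_context(fused, docs, max_docs=2)
prompt = PROMPT_TEMPLATE.format(context=context, question="Which dishwasher is quiet?")
```

Document `b` ranks first after fusion because it sits near the top of both lists, which is the behavior RRF is meant to reward. The resulting `prompt` is what would be sent to the LLM.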
