Commit bf835a5

Merge branch 'main' of github.com:UBC-MDS/DSCI_575_project_jchuang_esteki
2 parents e47f4ba + 7f13ff3 commit bf835a5

1 file changed

Lines changed: 5 additions & 6 deletions

README.md

@@ -42,7 +42,7 @@ We use the **Amazon Reviews 2023** dataset from UC San Diego's McAuley Lab, cont
 
 The project uses two primary files from the Books category:
 
-**Reviews File: `Books.jsonl.gz`** - Contains 11.7 million user-written reviews. Each line is a JSON object with fields: rating (1-5 stars), title (review headline), text (full review), timestamp, verified_purchase, helpful_vote, and parent_asin (links to product metadata).
+**Reviews File: `Books.jsonl.gz`** - Contains 11.7 million user-written reviews. Each line is a JSON object with fields:
 
 ```
 Books.jsonl.gz (Reviews File)
@@ -55,7 +55,7 @@ Books.jsonl.gz (Reviews File)
 └── parent_asin [KEY] Product identifier (links to metadata)
 ```
 
-**Metadata File: `meta_Books.jsonl.gz`** - Contains 3.1 million product records. Each product has: asin (unique ID), parent_asin (for variants), title (book name), description, price, images, features, main_category, average_rating, and store information.
+**Metadata File: `meta_Books.jsonl.gz`** - Contains 3.1 million product records. Each product has:
 
 ```
 meta_Books.jsonl.gz (Metadata File)
@@ -125,22 +125,21 @@ streamlit run app/app.py
 
 Try these example queries with the app running:
 
+#### Easy Queries (BM25 works well):
 ```
-Easy Queries (BM25 works well):
 mystery novel
 cookbook recipes
 science fiction space
 ```
 
+#### Medium Queries (Semantic works well):
 ```
-Medium Queries (Semantic works well):
 book to help with anxiety
 guide for first time parents
 story about finding yourself
 ```
-
+#### Complex Queries (Both struggle):
 ```
-Complex Queries (Both struggle):
 best book to learn machine learning with no math background
 historical fiction set in world war 2 from a female perspective
 self help book for overcoming procrastination and building better habits
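
The README text in this diff describes both data files as gzip-compressed JSON Lines, linked through `parent_asin`. The following is a minimal sketch of how such files could be read and joined; it is not taken from the project's code. The helper names `iter_jsonl_gz` and `attach_book_titles`, the `book_title` key, and the sample records are assumptions made for illustration, and only the field names `rating`, `title`, and `parent_asin` come from the README.

```python
import gzip
import io
import json


def iter_jsonl_gz(source):
    """Yield one dict per non-empty line of a gzip-compressed JSON Lines file.

    `source` may be a path (e.g. "Books.jsonl.gz") or a binary file object.
    """
    with gzip.open(source, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def attach_book_titles(reviews, metadata):
    """Add the product title to each review by matching on parent_asin.

    Hypothetical helper: builds an in-memory lookup from the metadata,
    which may need a lot of RAM for the full 3.1 million records.
    """
    titles = {item["parent_asin"]: item.get("title") for item in metadata}
    for review in reviews:
        yield {**review, "book_title": titles.get(review["parent_asin"])}


def _gz_bytes(records):
    """Build a small in-memory .jsonl.gz stand-in so the sketch runs without the dataset."""
    raw = "\n".join(json.dumps(record) for record in records).encode("utf-8")
    return io.BytesIO(gzip.compress(raw))


# Invented sample records; real files carry the additional fields listed in the README.
sample_reviews = _gz_bytes(
    [{"rating": 5, "title": "Loved it", "parent_asin": "B000EXAMPLE"}]
)
sample_meta = _gz_bytes(
    [{"parent_asin": "B000EXAMPLE", "title": "An Example Book"}]
)

joined = list(
    attach_book_titles(iter_jsonl_gz(sample_reviews), iter_jsonl_gz(sample_meta))
)
print(joined[0]["book_title"])
```

With the real dataset, the in-memory samples would be replaced by paths to `Books.jsonl.gz` and `meta_Books.jsonl.gz`. Streaming the reviews line by line keeps memory use bounded on the 11.7 million review file, while the metadata lookup is the part that grows with the number of products.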
