README.md (5 additions, 6 deletions)
~~~diff
@@ -42,7 +42,7 @@ We use the **Amazon Reviews 2023** dataset from UC San Diego's McAuley Lab, cont
 
 The project uses two primary files from the Books category:
 
-**Reviews File: `Books.jsonl.gz`** - Contains 11.7 million user-written reviews. Each line is a JSON object with fields: rating (1-5 stars), title (review headline), text (full review), timestamp, verified_purchase, helpful_vote, and parent_asin (links to product metadata).
+**Reviews File: `Books.jsonl.gz`** - Contains 11.7 million user-written reviews. Each line is a JSON object with fields:
 
 ```
 Books.jsonl.gz (Reviews File)
~~~
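The removed line in this hunk lists the review fields inline. Because the reviews file is gzipped JSON Lines with about 11.7 million entries, it is usually read as a stream rather than loaded whole. Below is a minimal standard-library sketch of that; `iter_reviews` and `REVIEW_FIELDS` are hypothetical names, and the field list is copied from the removed README sentence rather than checked against the dataset itself.

```python
import gzip
import json
import os
import tempfile
from typing import Dict, Iterator, Optional

# Review fields as listed in the pre-change README text for Books.jsonl.gz
# (assumption: these are the only fields the app needs).
REVIEW_FIELDS = (
    "rating",             # 1-5 stars
    "title",              # review headline
    "text",               # full review body
    "timestamp",
    "verified_purchase",
    "helpful_vote",
    "parent_asin",        # key linking to meta_Books.jsonl.gz
)


def iter_reviews(path: str, limit: Optional[int] = None) -> Iterator[Dict[str, object]]:
    """Yield one dict per review from a gzipped JSON Lines file.

    Lines are decoded one at a time, so the whole file is never held in
    memory. Only REVIEW_FIELDS are kept; absent keys come back as None.
    Blank lines are skipped, and `limit` caps how many reviews are yielded.
    """
    yielded = 0
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if limit is not None and yielded >= limit:
                break
            if not line.strip():
                continue
            record = json.loads(line)
            yield {key: record.get(key) for key in REVIEW_FIELDS}
            yielded += 1


if __name__ == "__main__":
    # Made-up two-record sample standing in for the real file.
    sample = [
        {"rating": 5.0, "title": "Loved it", "text": "Great mystery.",
         "parent_asin": "SAMPLE-1", "helpful_vote": 3},
        {"rating": 2.0, "title": "Too slow", "text": "Dragged on.",
         "parent_asin": "SAMPLE-2", "verified_purchase": False},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "Books_sample.jsonl.gz")
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            for rec in sample:
                fh.write(json.dumps(rec) + "\n")
        for review in iter_reviews(path, limit=1):
            print(review["parent_asin"], review["rating"], review["title"])
```

Projecting each record down to a fixed field tuple keeps downstream code independent of any extra keys the real files may carry.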
~~~diff
@@ -55,7 +55,7 @@ Books.jsonl.gz (Reviews File)
 └── parent_asin [KEY] Product identifier (links to metadata)
 ```
 
-**Metadata File: `meta_Books.jsonl.gz`** - Contains 3.1 million product records. Each product has: asin (unique ID), parent_asin (for variants), title (book name), description, price, images, features, main_category, average_rating, and store information.
+**Metadata File: `meta_Books.jsonl.gz`** - Contains 3.1 million product records. Each product has:
 
 ```
 meta_Books.jsonl.gz (Metadata File)
~~~
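`parent_asin` is the key that links each review to its product record. One way to attach book metadata to reviews is sketched here, assuming the needed metadata fits in memory as a dict. `index_metadata` and `join_reviews` are hypothetical helpers, and the output key `book_title` is an illustrative choice that avoids clashing with the review's own `title` (its headline).

```python
from typing import Dict, Iterable, Iterator


def index_metadata(products: Iterable[dict]) -> Dict[str, dict]:
    """Map parent_asin -> metadata record; the first record seen for a key wins."""
    index: Dict[str, dict] = {}
    for product in products:
        key = product.get("parent_asin")
        if key is not None and key not in index:
            index[key] = product
    return index


def join_reviews(reviews: Iterable[dict], meta_index: Dict[str, dict]) -> Iterator[dict]:
    """Yield each review with the book's title added under `book_title`.

    Reviews whose parent_asin has no metadata record get book_title=None.
    """
    for review in reviews:
        product = meta_index.get(review.get("parent_asin"), {})
        yield {**review, "book_title": product.get("title")}


if __name__ == "__main__":
    # Made-up records; real metadata rows carry many more fields.
    meta = [{"parent_asin": "SAMPLE-1", "title": "An Example Mystery"}]
    reviews = [
        {"parent_asin": "SAMPLE-1", "title": "Loved it", "rating": 5.0},
        {"parent_asin": "SAMPLE-9", "title": "No match", "rating": 3.0},
    ]
    for row in join_reviews(reviews, index_metadata(meta)):
        print(row["book_title"], "|", row["title"], row["rating"])
```

Keeping only the first record per `parent_asin` is an assumption about how variant records should be collapsed. With 3.1 million products, trimming each record to the fields you actually use before indexing keeps the dict smaller.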
~~~diff
@@ -125,22 +125,21 @@ streamlit run app/app.py
 
 Try these example queries with the app running:
 
+#### Easy Queries (BM25 works well):
 ```
-Easy Queries (BM25 works well):
 mystery novel
 cookbook recipes
 science fiction space
 ```
 
+#### Medium Queries (Semantic works well):
 ```
-Medium Queries (Semantic works well):
 book to help with anxiety
 guide for first time parents
 story about finding yourself
 ```
-
+#### Complex Queries (Both struggle):
 ```
-Complex Queries (Both struggle):
 best book to learn machine learning with no math background
 historical fiction set in world war 2 from a female perspective
 self help book for overcoming procrastination and building better habits
~~~
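The "Easy" versus "Medium" split comes down to exact term matching: BM25 scores a document only on query words it literally contains. As a rough illustration (not the project's own BM25 code, which this diff does not show), here is a minimal Okapi BM25 scorer in pure Python. The `BM25` class, the `tokenize` helper (lowercase, no stemming or stopword removal), and the `k1=1.5`, `b=0.75` defaults are assumptions; those defaults are common textbook values.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Minimal Okapi BM25 over an in-memory list of documents."""

    def __init__(self, docs: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_tfs = [Counter(tokenize(d)) for d in docs]   # term counts per doc
        self.doc_lens = [sum(tf.values()) for tf in self.doc_tfs]
        self.n_docs = len(docs)
        self.avgdl = sum(self.doc_lens) / self.n_docs if self.n_docs else 0.0
        # Document frequency: how many docs contain each term at least once.
        self.df = Counter(term for tf in self.doc_tfs for term in tf)

    def idf(self, term: str) -> float:
        # IDF with +1 inside the log so it never goes negative.
        n = self.df.get(term, 0)
        return math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1.0)

    def score(self, query: str, i: int) -> float:
        tf, dl = self.doc_tfs[i], self.doc_lens[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue  # exact-term model: absent words add nothing
            norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def top_k(self, query: str, k: int = 3) -> List[Tuple[int, float]]:
        scores = [(i, self.score(query, i)) for i in range(self.n_docs)]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:k]


if __name__ == "__main__":
    # Made-up book blurbs for illustration only.
    docs = [
        "A classic mystery novel set in a small village",
        "Easy cookbook recipes for weeknight dinners",
        "Calm mind: a practical guide to managing worry",
    ]
    bm25 = BM25(docs)
    print(bm25.top_k("mystery novel", k=1))
    print(bm25.top_k("book to help with anxiety", k=3))
```

In the demo, "mystery novel" matches the first blurb on both terms. "book to help with anxiety" never matches "worry" (and "book" does not match "cookbook" under this tokenizer), so the third blurb scores only through the incidental word "to". That vocabulary mismatch is the gap an embedding-based semantic search is meant to cover.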