A Reddit scraper created for the Buy From EU project. It scrapes product mentions from the subreddit based on search terms, distills the data using an LLM, and formats the results. It has been written to be accessible to users of all skill levels. Feel free to reach out if you have any questions.
Required:
- Mistral API key, paid tier
- Reddit developer project credentials
- A Reddit account
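A minimal sketch of gathering the credentials listed above before any scraping starts, so a missing value fails early with a readable message. The environment variable names are hypothetical; the notebook may read its credentials differently (for example, pasted into a cell or stored as Colab secrets).

```python
import os

# Hypothetical environment variable names, mapped to the keyword names used below.
REQUIRED_VARS = {
    "mistral_api_key": "MISTRAL_API_KEY",
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "username": "REDDIT_USERNAME",
    "password": "REDDIT_PASSWORD",
}


def load_credentials(env=os.environ):
    """Collect every required credential, raising if any of them is missing or empty."""
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {key: env[var] for key, var in REQUIRED_VARS.items()}
```

If the notebook uses PRAW (an assumption), the Reddit values map onto the `client_id`, `client_secret`, `username` and `password` arguments of `praw.Reddit`, which also requires a `user_agent` string.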
How to:
- To run this in your own environment, download the notebook and run it locally.
- Otherwise, use the Colab notebook here.
Approximate cost: under 5 EUR in LLM (Mistral API) costs.
Todo: Data formatting needs improvement. For example, list values are stored as strings, some items are misformatted, and no item in the European column should also appear in the American column.
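One possible approach to the formatting issues above, as a sketch only: parse list values that were stored as strings with `ast.literal_eval` (falling back to comma splitting), then drop any American item that already appears in the European column, ignoring case. The function names are hypothetical and it assumes each row holds one European and one American value.

```python
import ast


def parse_list_string(value):
    """Turn a stringified list such as "['A', 'B']" (or "A, B") into a real list of strings."""
    if value is None:
        return []
    if isinstance(value, list):
        return [str(item).strip() for item in value if str(item).strip()]
    text = str(value).strip()
    if text.startswith("[") and text.endswith("]"):
        try:
            parsed = ast.literal_eval(text)  # safely evaluates Python literals only
        except (ValueError, SyntaxError):
            parsed = None
        if isinstance(parsed, list):
            return [str(item).strip() for item in parsed if str(item).strip()]
    # Fall back to treating the value as comma-separated text.
    return [part.strip() for part in text.split(",") if part.strip()]


def clean_row(european, american):
    """Normalise both columns and remove American items that also appear as European."""
    eu_items = parse_list_string(european)
    eu_keys = {item.casefold() for item in eu_items}
    us_items = [
        item for item in parse_list_string(american) if item.casefold() not in eu_keys
    ]
    return eu_items, us_items
```

Applied row by row (for example with a loop or a pandas `apply`), this keeps the two columns disjoint while preserving the original spelling of each European item.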
Notes:
- If you re-run the notebook to collect data from newly shared posts, adjust the timeframe in the code to avoid scraping and processing duplicate information.
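A sketch of the timeframe adjustment described above: keep only posts created after the previous run. It assumes each post is a dict with a `created_utc` value in Unix seconds (PRAW submissions expose an attribute of the same name); the function names and the cutoff date are hypothetical.

```python
from datetime import datetime, timezone


def is_new_post(created_utc, since):
    """Return True if a post created at `created_utc` (Unix seconds, UTC) is newer than `since`."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc) > since


def filter_new_posts(posts, since):
    """Keep only the posts created after the timezone-aware datetime `since`."""
    return [post for post in posts if is_new_post(post["created_utc"], since)]
```

Setting `since` to the date of the last collection means only posts shared after that date are sent to the LLM, which also keeps the API cost down.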