
Jliezed/oc_project_2_BookToScrape



OC - PROJECT N°2 - BOOK TO SCRAPE

From the Books to Scrape website, extract the product information from every product page and save it to CSV files using the Requests, Beautiful Soup, csv and re libraries.

[Image: web scraping illustration, photo by Emile Perron on Unsplash]

About The Project

The scraping logic works as follows:

  • Get the list of all category links

  • For each category link:

    • Parse the links to its product pages
    • If there is a "Next" page, go to that page and parse its product page links
  • Create a CSV file (one per category)

  • For each product page:

    • Parse the product information
    • Insert the product information into the CSV file
    • Save the image of the book
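As a rough illustration of the first steps of this flow (collecting the category links and following "Next" pages), here is a minimal sketch built on Beautiful Soup and `urllib.parse.urljoin`. The CSS selectors (`div.side_categories`, `article.product_pod h3 a`, `li.next a`) are assumptions about the Books to Scrape markup, and the function names are hypothetical, not taken from the repository's `main.py`. `urljoin` is used because the site's links are relative (for example `page-2.html`).

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def parse_category_links(home_html, home_url):
    """Return absolute URLs of every category listed in the sidebar."""
    soup = BeautifulSoup(home_html, "html.parser")
    # Assumed markup: categories are nested inside <div class="side_categories">;
    # the outer "Books" link is skipped because it is only nested once.
    return [
        urljoin(home_url, a["href"])
        for a in soup.select("div.side_categories ul li ul li a")
    ]


def parse_category_page(page_html, page_url):
    """Return (product_urls, next_page_url or None) for one category page."""
    soup = BeautifulSoup(page_html, "html.parser")
    product_urls = [
        urljoin(page_url, a["href"])
        for a in soup.select("article.product_pod h3 a")
    ]
    # Assumed markup: the pager's "next" button is <li class="next"><a href=...>
    next_link = soup.select_one("li.next a")
    next_url = urljoin(page_url, next_link["href"]) if next_link else None
    return product_urls, next_url


def collect_product_urls(category_url, fetch):
    """Follow "next" links until the last page and gather every product URL.

    `fetch` is any callable that takes a URL and returns its HTML, e.g.
    lambda url: requests.get(url, timeout=10).content
    """
    product_urls = []
    page_url = category_url
    while page_url is not None:
        urls, page_url = parse_category_page(fetch(page_url), page_url)
        product_urls.extend(urls)
    return product_urls
```

Passing `fetch` in as a parameter keeps the parsing logic testable without network access; in a real run it would wrap `requests.get`.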


Built With

  • Python
  • Requests
  • Beautiful Soup

Getting Started

You will need to install the Requests and Beautiful Soup libraries.

Prerequisites

Install the following Python libraries before cloning the repo:

  • Requests
    pip install requests
  • Beautiful Soup (published on PyPI as beautifulsoup4)
    pip install beautifulsoup4

Installation & Running the script

  1. Clone the repo
    git clone https://github.com/Jliezed/oc_project_2_BookToScrape.git

Create and activate a virtual environment

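  1. Go to your project directory
    cd oc_project_2_BookToScrape
  2. Make sure the venv module is available (it is part of the Python standard library since Python 3.3, so no pip install is needed; some Linux distributions ship it separately, e.g. the python3-venv package on Debian/Ubuntu)
    python -m venv --help
  3. Create a virtual environment
    python -m venv env
  4. Activate the virtual environment (macOS/Linux; on Windows run env\Scripts\activate instead)
    source env/bin/activate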

  5. Install the packages listed in requirements.txt
    pip install -r requirements.txt
  6. Run the script from the terminal
    python main.py


Outputs

You will get a separate CSV file for each category, with one row per product page containing:

  • product_page_url
  • universal_product_code (upc)
  • title
  • price_including_tax
  • price_excluding_tax
  • number_available
  • product_description
  • category
  • review_rating
  • image_url
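Several of these fields come off the product page as raw text and need some cleanup before being written out. The sketch below shows one way to do that with the standard-library re and csv modules. It assumes the raw values look like the ones on Books to Scrape (a price such as "£51.77", an availability string such as "In stock (22 available)", and a `star-rating Three` CSS class); the helper names and the exact CSV column headers are hypothetical, not the repository's actual code.

```python
import csv
import re

# Hypothetical column headers, following the order of the Outputs list
FIELDNAMES = [
    "product_page_url",
    "universal_product_code",
    "title",
    "price_including_tax",
    "price_excluding_tax",
    "number_available",
    "product_description",
    "category",
    "review_rating",
    "image_url",
]

# Assumed: the rating is encoded as a word in the CSS class, e.g. "star-rating Three"
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_price(text):
    """'£51.77' -> 51.77; only the digits are kept, the currency symbol is ignored."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return float(match.group())


def parse_number_available(text):
    """'In stock (22 available)' -> 22; any other text -> 0."""
    match = re.search(r"\((\d+) available\)", text)
    return int(match.group(1)) if match else 0


def parse_review_rating(css_classes):
    """['star-rating', 'Three'] -> 3, e.g. from Beautiful Soup's tag["class"] list."""
    for name in css_classes:
        if name in RATING_WORDS:
            return RATING_WORDS[name]
    return 0  # no rating word found


def write_category_csv(path, rows):
    """Write one CSV file per category; each row is a dict keyed by FIELDNAMES."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
```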

The script also saves the book's image from each product page.
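A minimal sketch of the image-saving step, assuming each image is named after the book title (the repository's actual naming scheme is not described here). Titles can contain characters such as ":" or "/" that are not valid in file names, so the hypothetical `image_path` helper replaces them with underscores. The download is injected as a `fetch_bytes` callable, for example `lambda url: requests.get(url, timeout=10).content`, so the helper can be tried without network access.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def image_path(title, image_url, folder):
    """Build a filesystem-safe path such as images/A_Light_in_the_Attic.jpg."""
    # Replace every run of characters other than letters, digits, "_" or "-"
    # with a single underscore, and cap the length of the file name
    safe_title = re.sub(r"[^\w-]+", "_", title).strip("_")[:100] or "untitled"
    # Reuse the extension from the image URL, defaulting to .jpg
    suffix = Path(urlparse(image_url).path).suffix or ".jpg"
    return Path(folder) / f"{safe_title}{suffix}"


def save_image(image_url, title, folder, fetch_bytes):
    """Download the image with fetch_bytes and write it under folder."""
    path = image_path(title, image_url, folder)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(fetch_bytes(image_url))
    return path
```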
