FNXDOOM/Gpu_Scraper


GPU Price Tracker

A Scrapy-based web scraping project that collects graphics card (GPU) data from Amazon and displays it in a simple HTML interface.

Features

  • Scrapes GPU product information from Amazon
  • Extracts product name, price, user rating, memory size, GPU model, and specifications
  • Saves data to CSV files for easy analysis
  • Includes a simple HTML frontend to view the collected data
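As an illustration of the field extraction listed above, the following is a minimal sketch of how a price and a memory size might be pulled out of scraped text. The helper names `parse_price` and `parse_memory_gb` and the sample strings are hypothetical; the actual spider may extract these fields differently.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Return the first number in a displayed price such as '₹32,999.00'.

    Hypothetical helper: thousands separators are stripped, and None is
    returned when the text contains no digits at all.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group(0).replace(",", ""))


def parse_memory_gb(title: str) -> Optional[int]:
    """Find a memory size such as '8GB' or '12 GB' in a product title."""
    match = re.search(r"\b(\d+)\s*GB\b", title, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Sample inputs are made up for illustration only.
    print(parse_price("₹32,999.00"))
    print(parse_memory_gb("Example RTX 4060 8GB GDDR6 Graphics Card"))
```

Keeping the parsing in small pure functions makes it easy to test without sending any requests.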

Project Structure

webScraper/
├── gpu_scraper/            # Scrapy project directory
│   ├── spiders/            # Spider implementations
│   │   └── amazon_gpu.py   # Amazon GPU spider
│   ├── items.py            # Item definitions
│   ├── middlewares.py      # Spider and downloader middlewares
│   ├── pipelines.py        # Item pipelines
│   └── settings.py         # Project settings
├── output/                 # Output directory for scraped data
│   └── amazon_gpu_prices.csv
├── index.html              # Frontend HTML interface
└── scrapy.cfg              # Scrapy configuration file
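For orientation, an item definition in items.py could look roughly like the sketch below. Scrapy accepts dataclass objects as items, so this example uses only the standard library. The class name `GpuItem` and every field name are assumptions derived from the Features list, not the repository's actual definitions.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class GpuItem:
    """Hypothetical item holding the fields named in the Features list."""

    name: str
    price: Optional[float] = None        # numeric price, currency not stored
    rating: Optional[float] = None       # user rating, e.g. out of 5
    memory_gb: Optional[int] = None      # memory size in gigabytes
    gpu_model: Optional[str] = None      # e.g. a chipset name from the listing
    specifications: Dict[str, str] = field(default_factory=dict)


if __name__ == "__main__":
    # Made-up values, shown only to illustrate the shape of one record.
    item = GpuItem(name="Example GPU", price=32999.0, memory_gb=8)
    print(asdict(item))
```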

Installation

  1. Clone this repository:

    git clone https://github.com/FNXDOOM/Gpu_Scraper.git
    cd Gpu_Scraper
    
  2. Create and activate a virtual environment:

    python -m venv venv
    # On Windows
    venv\Scripts\activate
    # On macOS/Linux
    source venv/bin/activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Install Playwright browsers (required for JavaScript rendering):

    playwright install
    

Usage

Running the Scraper

To run the Amazon GPU spider from the project root (the directory that contains scrapy.cfg):

cd webScraper
scrapy crawl amazon_search_product

The scraped data will be saved to output/amazon_gpu_prices.csv.
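Once the CSV exists, it can be inspected with the standard library. The sketch below loads the file and lists the cheapest rows; the column name `price` is an assumption, so adjust it to match the header actually written by the spider.

```python
import csv
from pathlib import Path


def load_rows(csv_path):
    """Read the scraped CSV into a list of dicts keyed by column header."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def cheapest(rows, price_column="price", limit=5):
    """Return up to `limit` rows with the lowest numeric price.

    Rows whose price is missing or not a number are skipped.
    """
    priced = []
    for row in rows:
        try:
            priced.append((float(row[price_column]), row))
        except (KeyError, TypeError, ValueError):
            continue
    priced.sort(key=lambda pair: pair[0])
    return [row for _, row in priced[:limit]]


if __name__ == "__main__":
    path = Path("output/amazon_gpu_prices.csv")
    if path.exists():
        for row in cheapest(load_rows(path)):
            print(row)
```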

Viewing the Data

To view the scraped data in the web interface:

  1. Start a local web server from the directory that contains index.html:

    python -m http.server 8000
    
  2. Open a web browser and navigate to:

    http://localhost:8000/index.html
    

Configuration

You can modify the scraper settings in gpu_scraper/settings.py:

  • USER_AGENT: Browser user agent string
  • DOWNLOAD_DELAY: Delay between requests (in seconds)
  • CONCURRENT_REQUESTS_PER_DOMAIN: Maximum concurrent requests per domain
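A conservative configuration might look like the excerpt below. The three setting names are standard Scrapy settings; the values are illustrative assumptions, not the repository's defaults.

```python
# gpu_scraper/settings.py (excerpt); every value here is an illustrative assumption

# Identify the client with a browser-like user agent string.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

# Wait this many seconds between consecutive requests to the same site.
DOWNLOAD_DELAY = 2

# Send at most one request at a time to any single domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1
```

A longer delay and lower concurrency slow the crawl but reduce the load placed on the target site.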

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Scrapy - The web scraping framework used
  • Playwright - Used for JavaScript rendering

About

A project to scrape current GPU prices and specifications from amazon.in.
