A web scraping project that collects graphics card data from e-commerce websites and displays it in a user-friendly interface.
- Scrapes GPU product information from Amazon
- Extracts product name, price, user rating, memory size, GPU model, and specifications
- Saves data to CSV files for easy analysis
- Includes a simple HTML frontend to view the collected data
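As a rough illustration of the extracted fields and the CSV output, the sketch below models one scraped product as a dataclass and serializes it with the standard `csv` module. The class name `GpuItem`, the helper `items_to_csv`, and all field names are assumptions derived from the feature list above; the project's actual `items.py` and CSV column names may differ.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class GpuItem:
    # Hypothetical field names mirroring the feature list;
    # the real definitions live in gpu_scraper/items.py and may differ.
    name: str
    price: float
    rating: float
    memory_size: str
    gpu_model: str
    specifications: str


def items_to_csv(items):
    """Serialize GpuItem objects to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(GpuItem)],
        lineterminator="\n",
    )
    writer.writeheader()
    for item in items:
        writer.writerow(asdict(item))
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = GpuItem("Example GPU", 499.99, 4.5, "12 GB", "RTX 4070", "PCIe 4.0")
    print(items_to_csv([sample]))
```

Scrapy can also work with dataclass-based items, so a structure like this could in principle be yielded directly from a spider.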
webScraper/
├── gpu_scraper/ # Scrapy project directory
│ ├── spiders/ # Spider implementations
│ │ └── amazon_gpu.py # Amazon GPU spider
│ ├── items.py # Item definitions
│ ├── middlewares.py # Spider and downloader middlewares
│ ├── pipelines.py # Item pipelines
│ └── settings.py # Project settings
├── output/ # Output directory for scraped data
│ └── amazon_gpu_prices.csv
├── index.html # Frontend HTML interface
└── scrapy.cfg # Scrapy configuration file
- Clone this repository:
    git clone https://github.com/yourusername/gpu-price-tracker.git
    cd gpu-price-tracker
- Create and activate a virtual environment:
    python -m venv venv
    # On Windows
    venv\Scripts\activate
    # On macOS/Linux
    source venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- Install Playwright browsers (required for JavaScript rendering):
    playwright install
To run the Amazon GPU spider:
cd webScraper
scrapy crawl amazon_search_product
The scraped data will be saved to output/amazon_gpu_prices.csv.
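Once the CSV exists, it can be inspected outside the web interface as well. The following is a minimal sketch that lists the cheapest rows using only the standard library. It assumes a column named `price` holding values such as `$1,299.99`; the actual column name and price format depend on the spider and pipeline, so adjust `price_column` and `parse_price` (both hypothetical names) as needed.

```python
import csv
from pathlib import Path


def parse_price(text):
    """Convert a price string such as "$1,299.99" to a float, or None if unparseable."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def cheapest_rows(csv_path, limit=5, price_column="price"):
    """Return up to `limit` rows with the lowest parseable price."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    # Pair each row with its parsed price and drop rows without a usable price.
    priced = [(parse_price(row.get(price_column) or ""), row) for row in rows]
    priced = [(price, row) for price, row in priced if price is not None]
    priced.sort(key=lambda pair: pair[0])
    return [row for _, row in priced[:limit]]


if __name__ == "__main__":
    csv_file = Path("output/amazon_gpu_prices.csv")
    if csv_file.exists():
        for row in cheapest_rows(csv_file):
            print(row)
```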
To view the scraped data in the web interface:
- Start a local web server:
    python -m http.server 8000
- Open a web browser and navigate to:
    http://localhost:8000/index.html
You can modify the scraper settings in gpu_scraper/settings.py:
- USER_AGENT: Browser user agent string
- DOWNLOAD_DELAY: Delay between requests (in seconds)
- CONCURRENT_REQUESTS_PER_DOMAIN: Maximum concurrent requests per domain
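For reference, these settings are plain Python assignments in `gpu_scraper/settings.py`. The values below are illustrative assumptions, not the project's actual configuration; a longer delay and lower concurrency are generally gentler on the target site.

```python
# gpu_scraper/settings.py (excerpt; illustrative values only)

# Hypothetical user agent string; replace with the one you want to send.
USER_AGENT = "Mozilla/5.0 (compatible; gpu-price-tracker)"

# Wait this many seconds between consecutive requests to the same site.
DOWNLOAD_DELAY = 2.0

# Allow at most this many simultaneous requests per domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 2
```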
This project is licensed under the MIT License - see the LICENSE file for details.
- Scrapy - The web scraping framework used
- Playwright - Used for JavaScript rendering