Final Project: Parallel Web Scraping System

Author: Wenlue Jeff Zhong

Project Date: 05/23/2024

Overview

This project demonstrates the implementation of a parallel web scraping system designed to scrape product data from an e-commerce demo site. The system includes three versions:

Sequential Implementation
Parallel Implementation using Channels
Work-Stealing Implementation

Usage

To run the benchmark script and reproduce the results, use the following command:

python benchmark/benchmark.py

This script will execute each implementation (sequential, parallel channel-based, and work-stealing) 10 times, calculate the average execution time, and generate performance analysis.

Specific Implementation Execution

For examining specific implementations:

Sequential Implementation:

./scraper --seq

Parallel Implementation using Channels:

./scraper --chan --threads <numThreads>

Work-Stealing Implementation:

./scraper --workstealing --threads <numThreads>

Final Report

The final report summarizing the results and analysis can be found at:

benchmark/final_report.pdf

Project Structure

main.go: The main entry point for the program.
sequential/sequential.go: Contains the sequential implementation.
parallel/parallel.go: Contains the parallel implementation using channels.
workstealing/workstealing.go: Contains the work-stealing implementation.
util/util.go: Contains utility functions for downloading images and saving to CSV.
benchmark/benchmark.py: The benchmark script for running and analyzing the implementations.
benchmark/speedup_graph.png: The generated speedup graph.
benchmark/benchmark_results.txt: The raw results of the benchmark runs.
benchmark/final_report.pdf: The final report in PDF format.

System Requirements

Go: Ensure you have Go installed on your system.
Python: For running the benchmark script and generating graphs.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
proj3		proj3
.DS_Store		.DS_Store
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Final Project: Parallel Web Scraping System

Overview

Usage

Specific Implementation Execution

Final Report

Project Structure

System Requirements

About

Uh oh!

Releases

Packages

Uh oh!

Languages

jeffwzhong1994/Parallel_Programming_final_project

Folders and files

Latest commit

History

Repository files navigation

Final Project: Parallel Web Scraping System

Overview

Usage

Specific Implementation Execution

Final Report

Project Structure

System Requirements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages