A simple web scraping project built for educational purposes.
It demonstrates how to extract content from a website and save the data into various formats including CSV, PDF, HTML, JPG, and TXT.
This project was created as part of a college assignment to practice web scraping and data extraction using Python.
It includes:
- A Python script (`scrab.py`) that performs the scraping.
- Output files demonstrating different formats:
  - `.csv` for structured tabular data
  - `.pdf` for document-style output
  - `.html` for saving the web page structure
  - `.jpg` for saving images
  - `.txt` for plain text extraction
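
As a rough illustration of the HTML, TXT, and CSV outputs listed above, here is a minimal sketch. It is not the contents of `scrab.py`: the function names, the choice of page links as the tabular data, and the `https://example.com` URL are all assumptions made for the example.

```python
from pathlib import Path

import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page and return its HTML as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_links(html: str) -> list:
    """Collect the text and target of every <a href=...> tag.

    Links are a hypothetical choice of "structured data" for this sketch.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"text": a.get_text(strip=True), "href": a["href"]}
        for a in soup.find_all("a", href=True)
    ]


def save_outputs(html: str, out_dir: str = ".") -> None:
    """Write the HTML, plain-text, and CSV outputs into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    soup = BeautifulSoup(html, "html.parser")

    # Raw page source, saved unchanged
    (out / "scraped_html.html").write_text(html, encoding="utf-8")

    # Visible text only, one text node per line
    text = soup.get_text(separator="\n", strip=True)
    (out / "scraped_text.txt").write_text(text, encoding="utf-8")

    # Tabular data via pandas; index=False keeps row numbers out of the file
    table = pd.DataFrame(extract_links(html), columns=["text", "href"])
    table.to_csv(out / "scraped_data.csv", index=False)


if __name__ == "__main__":
    # example.com is a placeholder, not the site scraped by this project
    save_outputs(fetch_html("https://example.com"))
```

Keeping the download (`fetch_html`) separate from the parsing and saving steps means those steps can be tried on a saved HTML string without any network access.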
Built using:
- Python 3
- BeautifulSoup4
- Requests
- Pandas (for CSV handling)
- fpdf or similar library (for PDF generation)
| File | Purpose |
|---|---|
| `scrab.py` | Main Python script for web scraping |
| `scraped_data.csv` | Output of scraped data in CSV format |
| `scraped_document.pdf` | Output of scraped data in PDF format |
| `scraped_html.html` | Full saved HTML of the scraped page |
| `scraped_image.jpg` | Example of an extracted image |
| `scraped_text.txt` | Plain text extracted from the page |
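
An image file such as `scraped_image.jpg` could be produced roughly as follows. This is a sketch under assumptions: the helper names, the "first `<img>` on the page" rule, and the placeholder URL are hypothetical and not taken from `scrab.py`.

```python
from typing import Optional
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def first_image_url(html: str, base_url: str) -> Optional[str]:
    """Return the absolute URL of the first <img src=...>, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    img = soup.find("img", src=True)
    # src is often relative ("/img/a.jpg"), so resolve it against the page URL
    return urljoin(base_url, img["src"]) if img else None


def download_image(url: str, path: str = "scraped_image.jpg") -> None:
    """Save the image bytes at url to path.

    The bytes are written as-is; the .jpg name is not checked against
    the actual image format.
    """
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    with open(path, "wb") as f:  # binary mode: image bytes, not text
        f.write(response.content)


if __name__ == "__main__":
    page_url = "https://example.com"  # placeholder page, not the project's target
    page = requests.get(page_url, timeout=10)
    page.raise_for_status()
    image_url = first_image_url(page.text, page_url)
    if image_url:
        download_image(image_url)
```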
- Clone the repository

      git clone https://github.com/Noursalem2005/Web-Scraping-Project.git
      cd Web-Scraping-Project