
This project contains two web scraping methods: scraping a web page and scraping a web page table for analysis purposes.


Kokolipa/Web_scrapping


Web Scraping

Project description:

This project outlines two scraping methods:

  1. Scraping a web page (identifying HTML elements and extracting their content) to collect every article headline and teaser from the Mars news page.
    • Creating a JSON representation of the data (title_preview.txt) to make it easy to share with others.
  2. Scraping a table from the Mars data web page for analysis.
    • Extracting the table from that page and using Pandas and Matplotlib to summarise the results and answer the following questions:
      • How many months are there on Mars?
      • How many Martian days (sols, not Earth days) of data does the scraped dataset contain?
      • What are the coldest and the warmest months on Mars (at the location of Curiosity)?
      • Which months have the lowest and the highest atmospheric pressure on Mars?
      • About how many terrestrial (Earth) days exist in a Martian year?
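The first method (locating HTML elements, extracting their text, and serialising the result) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the inline `SAMPLE_HTML`, the class names `list_text`, `content_title` and `article_teaser_body`, and the helper `extract_articles` are all assumptions made for the example.

```python
import json

from bs4 import BeautifulSoup

# Stand-in HTML so the sketch runs offline.
# The class names are assumptions about the page markup, not confirmed by this README.
SAMPLE_HTML = """
<div class="list_text">
  <div class="content_title">First headline</div>
  <div class="article_teaser_body">First teaser text.</div>
</div>
<div class="list_text">
  <div class="content_title">Second headline</div>
  <div class="article_teaser_body">Second teaser text.</div>
</div>
"""


def extract_articles(html):
    """Return a list of {'title', 'preview'} dicts, one per article block (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    articles = []
    for block in soup.find_all("div", class_="list_text"):
        title = block.find("div", class_="content_title")
        preview = block.find("div", class_="article_teaser_body")
        if title is None or preview is None:
            continue  # skip blocks that lack either element
        articles.append({
            "title": title.get_text(strip=True),
            "preview": preview.get_text(strip=True),
        })
    return articles


articles = extract_articles(SAMPLE_HTML)

# Serialise to a JSON string that can be written to a text file and shared.
as_json = json.dumps(articles, indent=2)
```

In a live run the HTML would come from the browser session (for example Splinter's `browser.html`) instead of `SAMPLE_HTML`, and `as_json` would be written to a file such as title_preview.txt.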

Libraries used:

  1. Splinter
  2. BeautifulSoup
  3. Pandas
  4. NumPy
  5. Matplotlib
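As a rough illustration of how BeautifulSoup and Pandas can work together for the table scrape, the sketch below turns an HTML table into a typed DataFrame. The inline `SAMPLE_TABLE`, its column names, its placeholder values, and the helper `table_to_dataframe` are assumptions for the example, not taken from the project notebooks.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Stand-in table so the sketch runs offline.
# Column names and values are placeholders, not real measurements.
SAMPLE_TABLE = """
<table>
  <tr><th>id</th><th>terrestrial_date</th><th>sol</th><th>month</th><th>min_temp</th><th>pressure</th></tr>
  <tr><td>1</td><td>2020-01-01</td><td>10</td><td>1</td><td>-75.0</td><td>850.0</td></tr>
  <tr><td>2</td><td>2020-01-02</td><td>11</td><td>1</td><td>-77.0</td><td>852.0</td></tr>
</table>
"""


def table_to_dataframe(html):
    """Parse the first <table> in `html` into a typed DataFrame (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")

    # Header cells give the column names.
    header = [th.get_text(strip=True) for th in table.find_all("th")]

    # Every row that contains <td> cells is a data row.
    rows = [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find_all("tr")
        if tr.find("td") is not None
    ]

    df = pd.DataFrame(rows, columns=header)

    # Scraped cells are strings; convert them before any analysis.
    df["terrestrial_date"] = pd.to_datetime(df["terrestrial_date"])
    df = df.astype(
        {"id": int, "sol": int, "month": int, "min_temp": float, "pressure": float}
    )
    return df


df = table_to_dataframe(SAMPLE_TABLE)
```

Converting the columns up front matters because text scraped from table cells would otherwise sort and aggregate as strings.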

Analysis Images

Average temperature by month
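A monthly average of this kind could be computed with a Pandas group-by, which also yields the coldest and warmest months and the month and sol counts. The sketch below is illustrative only: the column names `sol`, `month` and `min_temp` are assumptions about the scraped table, and the rows are made up so the example runs on its own.

```python
import pandas as pd

# Made-up rows for illustration only; not measurements from the scraped dataset.
df = pd.DataFrame({
    "sol": [1, 2, 3, 4, 5, 6],
    "month": [1, 1, 2, 2, 3, 3],
    "min_temp": [-70.0, -72.0, -80.0, -82.0, -65.0, -67.0],
})

# Mean minimum temperature for each Martian month.
avg_temp = df.groupby("month")["min_temp"].mean()

# Months with the lowest and highest average temperature.
coldest_month = avg_temp.idxmin()
warmest_month = avg_temp.idxmax()

# Number of distinct months and distinct Martian days (sols) in the data.
n_months = df["month"].nunique()
n_sols = df["sol"].nunique()
```

The same pattern applies to average pressure by swapping the column, and `avg_temp.plot(kind="bar")` would draw the series with Matplotlib.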


Coldest and warmest months at Curiosity's location


Average pressure by month


Terrestrial (Earth) days in a Martian year: visual representation


Folder structure

.
├── SurfsUp
│   ├── Images
│   │   ├── Fig_1.png
│   │   ├── Fig_2.png
│   │   ├── Fig_3.png
│   │   └── Fig_4.png
│   ├── mars_scrapping
│   │   ├── part_1_mars_news.ipynb
│   │   └── part_2_mars_weather.ipynb
│   └── output
│       ├── df.csv
│       └── title_preview.txt
├── .gitignore
└── README.md
