allan-gadelha/ibm-python-project-data-engineering-practice

Practice Code from the IBM Course "Python Project for Data Engineering" on Coursera

This repository contains practice code written during IBM's "Python Project for Data Engineering" course on Coursera. The code is organized by course topic: ETL processes, web scraping, database access, a practice project, and the final project.

Organization

ETL Process Practice

The ETL Process Practice directory contains practice code for ETL processes. It includes the following components:

  • data/: Directory containing all data used for the ETL process.
  • log/: Directory containing the log file.
    • log_file.txt: Text file that records log messages from the ETL run.
  • output/: Directory containing the output file generated by the "etl_practice.py" script.
    • transformed_data.csv: CSV file containing the transformed data.
  • script/: Directory containing Python script(s) for performing the ETL process.
    • etl_practice.py: Python script for ETL process practice.
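
The layout above (input files in data/, a log file, and a transformed CSV in output/) maps onto the usual extract, transform, load steps. The following is a minimal sketch of that flow, not the contents of etl_practice.py: it uses only the standard library, and the column names (name, height_in, height_m), the inches-to-metres conversion, and all function names are assumptions made for illustration.

```python
import csv
import glob
import os
from datetime import datetime


def log_progress(message, log_path):
    """Append a timestamped message to the log file."""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{timestamp},{message}\n")


def extract(data_dir):
    """Read every CSV file in data_dir into one list of row dicts."""
    rows = []
    for path in sorted(glob.glob(os.path.join(data_dir, "*.csv"))):
        with open(path, newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def transform(rows):
    """Convert the hypothetical height_in column from inches to metres."""
    return [
        {
            "name": row["name"],
            "height_m": round(float(row["height_in"]) * 0.0254, 2),
        }
        for row in rows
    ]


def load(rows, target_path):
    """Write the transformed rows to a single CSV file."""
    with open(target_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "height_m"])
        writer.writeheader()
        writer.writerows(rows)


def run_etl(data_dir, target_path, log_path):
    """Run extract, transform, and load in order, logging start and end."""
    log_progress("ETL job started", log_path)
    rows = transform(extract(data_dir))
    load(rows, target_path)
    log_progress("ETL job ended", log_path)
    return rows
```

Calling run_etl("data", "output/transformed_data.csv", "log/log_file.txt") from the directory root would mirror the structure listed above, provided those directories already exist.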

Web Scraping Practice

The Web Scraping Practice directory contains practice code for Web Scraping processes. It includes the following components:

  • output/: Directory containing the output file generated by the "webscraping_movies.py" script.
    • Movies.db: Database with all extracted data.
    • top_50_films.csv: CSV with all the extracted data.
  • script/: Directory containing Python script(s) for performing the Web Scraping process.
    • webscraping_movies.py: Python script for web scraping practice.
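
To illustrate the scrape-then-store pattern behind these outputs, here is a minimal sketch that parses an HTML table with BeautifulSoup and writes the rows to SQLite. It is not the code in webscraping_movies.py: the inline sample HTML, the table and column names (top_films, film_rank, film, year), and the function names are all assumptions, and the sample is parsed from a string so that no network access is needed.

```python
import sqlite3

from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page; the real script's source is not shown here.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><th>Rank</th><th>Film</th><th>Year</th></tr>
    <tr><td>1</td><td>Example Film A</td><td>1999</td></tr>
    <tr><td>2</td><td>Example Film B</td><td>2005</td></tr>
  </tbody>
</table>
"""


def parse_films(html, limit=50):
    """Return up to `limit` (rank, film, year) tuples from the table rows."""
    soup = BeautifulSoup(html, "html.parser")
    films = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 3:
            continue  # header rows use <th>, so they have no <td> cells
        films.append(
            (
                int(cells[0].get_text(strip=True)),
                cells[1].get_text(strip=True),
                int(cells[2].get_text(strip=True)),
            )
        )
        if len(films) == limit:
            break
    return films


def save_to_sqlite(films, connection):
    """Store the parsed rows in a table named top_films (an assumed name)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS top_films "
        "(film_rank INTEGER, film TEXT, year INTEGER)"
    )
    connection.executemany("INSERT INTO top_films VALUES (?, ?, ?)", films)
    connection.commit()


films = parse_films(SAMPLE_HTML)
connection = sqlite3.connect(":memory:")  # a file path such as output/Movies.db would persist it
save_to_sqlite(films, connection)
```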

Accessing Databases using Python Practice

The Accessing Databases with Python Practice directory contains practice code for creating and querying a database from Python. It includes the following components:

  • data/: Directory containing the CSV file whose contents are loaded into the database.
  • database/: Directory containing the created database.
  • scripts/: Directory containing the Python script(s) that load and query the database.
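
As a rough illustration of loading a CSV file into a database and querying it, the sketch below uses pandas together with the standard sqlite3 module. The sample data, the table name people, and the helper names are assumptions for illustration; the repository's own script may be structured differently.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content standing in for the file in data/.
CSV_TEXT = "id,name,city\n1,Ana,Lisbon\n2,Ben,Oslo\n"


def load_csv_to_table(csv_source, connection, table_name):
    """Read a CSV source into a DataFrame and write it to a SQLite table."""
    frame = pd.read_csv(csv_source)
    # if_exists="replace" drops and recreates the table on every run.
    frame.to_sql(table_name, connection, if_exists="replace", index=False)
    return len(frame)


def run_query(query, connection):
    """Run a SQL query and return the result as a DataFrame."""
    return pd.read_sql(query, connection)


connection = sqlite3.connect(":memory:")  # use a path under database/ to keep the file
rows_loaded = load_csv_to_table(io.StringIO(CSV_TEXT), connection, "people")
count = run_query("SELECT COUNT(*) AS n FROM people", connection)
print(rows_loaded, int(count.loc[0, "n"]))
```

Passing a file path instead of the io.StringIO object to load_csv_to_table works the same way, since pd.read_csv accepts either.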

Practice Project

The Practice Project directory contains practice code for the Practice Project. It includes the following components:

  • database/: Directory containing the SQLite database file 'World_Economies.db'.
  • logs/: Directory containing the project's log file.
  • output/: Directory containing the output JSON file 'Countries_by_GDP.json'.
  • scripts/: Directory containing the Python script(s) for the project.

Final Project

The Final Project directory contains practice code for the Final Project of the course. It includes the following components:

  • data/: Directory containing the input CSV file used by the project.
  • database/: Directory containing the SQLite database file 'Banks.db'.
  • log/: Directory containing the project's log file.
  • output/: Directory containing the output CSV file 'Largest_banks_data.csv'.
  • script/: Directory containing the Python script(s) for the project.

Requirements

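The scripts use the following Python modules. pandas and BeautifulSoup are third-party packages that must be installed; glob, xml, datetime, and sqlite3 are part of the Python standard library and need no installation:

  • pandas
  • BeautifulSoup (installed as beautifulsoup4, imported as bs4)
  • glob
  • xml
  • datetime
  • sqlite3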
