This repository contains the practice code developed during IBM's "Python Project for Data Engineering" course on Coursera. The course covers a range of essential data engineering topics using Python.
The ETL Process Practice directory contains practice code for ETL processes. It includes the following components:
- data/: Directory containing all the data used in the ETL process.
- log/: Directory containing the log file.
  - log_file.txt: Text file used for logging.
- output/: Directory containing the output file generated by the "etl_practice.py" script.
  - transformed_data.csv: CSV file with all the transformed data.
- script/: Directory containing the Python script(s) that perform the ETL process.
  - etl_practice.py: Python script for the ETL process practice.
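The extract, transform, and load steps can be sketched with the standard library modules listed in the requirements (glob, xml, datetime). This is a minimal sketch, not the contents of etl_practice.py: it assumes the files in data/ are CSV, JSON Lines, and XML files with hypothetical `name` and `height` fields, and the inches-to-metres conversion is a hypothetical transform rule. The `log_progress` helper plays the role of log_file.txt by appending one timestamped line per stage.

```python
import csv
import glob
import json
import os
import tempfile
import xml.etree.ElementTree as ET
from datetime import datetime


def log_progress(message, log_file):
    """Append a timestamped message to the log file."""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(log_file, "a") as f:
        f.write(f"{timestamp},{message}\n")


def extract(data_dir):
    """Collect records from every CSV, JSON Lines, and XML file in data_dir."""
    records = []
    for path in sorted(glob.glob(os.path.join(data_dir, "*.csv"))):
        with open(path, newline="") as f:
            records.extend(csv.DictReader(f))
    for path in sorted(glob.glob(os.path.join(data_dir, "*.json"))):
        with open(path) as f:
            # Assumption: one JSON object per line (JSON Lines)
            records.extend(json.loads(line) for line in f if line.strip())
    for path in sorted(glob.glob(os.path.join(data_dir, "*.xml"))):
        # Assumption: each child of the root element is one record
        for element in ET.parse(path).getroot():
            records.append({child.tag: child.text for child in element})
    return records


def transform(records):
    """Hypothetical rule: convert height from inches to metres (2 decimals)."""
    return [
        {"name": r["name"], "height_m": round(float(r["height"]) * 0.0254, 2)}
        for r in records
    ]


def load(records, target_file):
    """Write the transformed records to a CSV file."""
    with open(target_file, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "height_m"])
        writer.writeheader()
        writer.writerows(records)


def demo():
    """Run the pipeline on small made-up files in a temporary directory."""
    work = tempfile.mkdtemp()
    log_file = os.path.join(work, "log_file.txt")
    with open(os.path.join(work, "a.csv"), "w") as f:
        f.write("name,height\nalex,70\n")
    with open(os.path.join(work, "b.json"), "w") as f:
        f.write('{"name": "sam", "height": 65}\n')
    with open(os.path.join(work, "c.xml"), "w") as f:
        f.write("<people><person><name>kim</name><height>72</height></person></people>")

    log_progress("ETL job started", log_file)
    rows = transform(extract(work))
    load(rows, os.path.join(work, "transformed_data.csv"))
    log_progress("ETL job ended", log_file)
    return rows, log_file


if __name__ == "__main__":
    rows, _ = demo()
    print(rows)
```

Keeping each stage in its own function makes it easy to log before and after every step, which is the pattern a log file like log_file.txt usually records.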
The Web Scraping Practice directory contains practice code for web scraping. It includes the following components:
- output/: Directory containing the output files generated by the "webscraping_movies.py" script.
  - Movies.db: Database with all the extracted data.
  - top_50_films.csv: CSV file with all the extracted data.
- script/: Directory containing the Python script(s) that perform the web scraping process.
  - webscraping_movies.py: Python script for the web scraping practice.
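A table-scraping step of this kind, keeping only the first rows and saving them to both a CSV file and a database, could look like the sketch below. To run without extra packages it swaps in the standard library's html.parser instead of BeautifulSoup; with BeautifulSoup the row loop would typically iterate over `soup.find_all("tr")`. The page layout (rank, title, year in the first three cells), the `Top_50` table name, and the column names are hypothetical, and the HTML is an inline sample rather than the page the script actually scrapes.

```python
import csv
import os
import sqlite3
import tempfile
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of the <td> cells in each <tr>, skipping header-only rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._in_cell = False
            self._row[-1] = self._row[-1].strip()
        elif tag == "tr" and self._row is not None:
            if self._row:  # rows made only of <th> cells collect no <td> text
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:  # text inside nested tags such as <a> is included
            self._row[-1] += data


def scrape_top_films(html, limit=50):
    """Return the first `limit` rows as dicts (hypothetical column layout)."""
    parser = TableRowParser()
    parser.feed(html)
    return [
        {"position": row[0], "film": row[1], "year": row[2]}
        for row in parser.rows[:limit]
    ]


def save(rows, csv_path, db_path, table="Top_50"):
    """Write rows to a CSV file and to a SQLite table (replacing any old table)."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["position", "film", "year"])
        writer.writeheader()
        writer.writerows(rows)
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(f"DROP TABLE IF EXISTS {table}")
            conn.execute(f"CREATE TABLE {table} (position TEXT, film TEXT, year TEXT)")
            conn.executemany(
                f"INSERT INTO {table} VALUES (:position, :film, :year)", rows
            )
    finally:
        conn.close()


# Made-up sample page standing in for the real one
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Film</th><th>Year</th></tr>
  <tr><td>1</td><td><a href="#">Film A</a></td><td>1990</td></tr>
  <tr><td>2</td><td>Film B</td><td>1985</td></tr>
  <tr><td>3</td><td>Film C</td><td>2001</td></tr>
</table>
"""

if __name__ == "__main__":
    top = scrape_top_films(SAMPLE_HTML, limit=2)
    work = tempfile.mkdtemp()
    save(top, os.path.join(work, "top_50_films.csv"), os.path.join(work, "Movies.db"))
    print(top)
```

The parser does not handle nested tables; for real pages BeautifulSoup is usually the more robust choice.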
The Accessing Databases with Python Practice directory contains practice code for accessing databases with Python. It includes the following components:
- data/: Directory containing the CSV file whose contents are loaded into the database.
- database/: Directory containing the created database.
- scripts/: Directory containing the Python script(s) that load and query the database.
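Loading a CSV file into a database table and querying it can be sketched with the sqlite3 and csv modules. This is a minimal sketch, not the repository's script: the header-less CSV, the `STAFF` table, its columns, and the sample rows are all hypothetical.

```python
import csv
import os
import sqlite3
import tempfile


def load_csv_to_table(conn, csv_path, table, columns):
    """(Re)create `table` and insert every row of a header-less CSV file."""
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # commit on success, roll back on error
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"CREATE TABLE {table} ({column_list})")
        with open(csv_path, newline="") as f:
            conn.executemany(
                f"INSERT INTO {table} VALUES ({placeholders})", csv.reader(f)
            )


def run_query(conn, sql, params=()):
    """Execute a query and return all result rows as tuples."""
    return conn.execute(sql, params).fetchall()


def demo():
    """Load a made-up CSV into a temporary database and query it."""
    work = tempfile.mkdtemp()
    csv_path = os.path.join(work, "staff.csv")
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(
            [["1", "Ana", "Lisbon"], ["2", "Ben", "Toronto"], ["3", "Chen", "Toronto"]]
        )
    conn = sqlite3.connect(os.path.join(work, "STAFF.db"))
    try:
        load_csv_to_table(conn, csv_path, "STAFF", ["ID", "NAME", "CITY"])
        total = run_query(conn, "SELECT COUNT(*) FROM STAFF")[0][0]
        in_toronto = run_query(
            conn, "SELECT NAME FROM STAFF WHERE CITY = ? ORDER BY NAME", ("Toronto",)
        )
    finally:
        conn.close()
    return total, in_toronto


if __name__ == "__main__":
    print(demo())
```

With pandas, the same load is commonly written as `pd.read_csv(...)` followed by `DataFrame.to_sql(table, conn, if_exists="replace", index=False)`, and queries as `pd.read_sql(query, conn)`. Passing values through `?` placeholders rather than string formatting keeps user input out of the SQL text.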
The Practice Project directory contains the code for the course's practice project. It includes the following components:
- database/: Directory containing the SQLite database file 'World_Economies.db'.
- logs/: Directory containing the project's log.
- output/: Directory containing the output JSON file 'Countries_by_GDP.json'.
- scripts/: Directory containing the Python script(s) for the project.
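The transform and load stages behind outputs like 'Countries_by_GDP.json' and 'World_Economies.db' might be sketched as follows. The assumptions here are hypothetical: the extracted GDP values are strings in millions of USD with thousands separators, they are converted to billions rounded to two decimals, and the column names, table name, threshold, and sample rows are invented for illustration.

```python
import json
import os
import sqlite3
import tempfile


def to_billions(rows):
    """Convert GDP strings in millions of USD (with commas) to billions, 2 decimals."""
    return [
        {
            "Country": row["Country"],
            "GDP_USD_billions": round(
                float(row["GDP_USD_millions"].replace(",", "")) / 1000, 2
            ),
        }
        for row in rows
    ]


def save_json(rows, path):
    """Write the records to a JSON file as a list of objects."""
    with open(path, "w") as f:
        json.dump(rows, f, indent=2)


def load_and_query(rows, db_path, table, min_gdp):
    """Store rows in SQLite, then return countries with GDP >= min_gdp (billions)."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(f"DROP TABLE IF EXISTS {table}")
            conn.execute(f"CREATE TABLE {table} (Country TEXT, GDP_USD_billions REAL)")
            conn.executemany(
                f"INSERT INTO {table} VALUES (:Country, :GDP_USD_billions)", rows
            )
        return conn.execute(
            f"SELECT Country FROM {table} WHERE GDP_USD_billions >= ? "
            "ORDER BY GDP_USD_billions DESC",
            (min_gdp,),
        ).fetchall()
    finally:
        conn.close()


# Made-up sample rows standing in for the extracted table
SAMPLE = [
    {"Country": "Country A", "GDP_USD_millions": "150,500"},
    {"Country": "Country B", "GDP_USD_millions": "2,000"},
    {"Country": "Country C", "GDP_USD_millions": "1,234,560"},
]

if __name__ == "__main__":
    work = tempfile.mkdtemp()
    rows = to_billions(SAMPLE)
    save_json(rows, os.path.join(work, "Countries_by_GDP.json"))
    db_path = os.path.join(work, "World_Economies.db")
    print(load_and_query(rows, db_path, "Countries_by_GDP", 100))
```

Because the rows are dicts, the same list can be passed unchanged to `json.dump` and, through sqlite3's named `:Country` placeholders, to `executemany`.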
The Final Project directory contains the code for the course's final project. It includes the following components:
- data/: Directory containing the CSV file used by the project.
- database/: Directory containing the SQLite database file 'Banks.db'.
- log/: Directory containing the project's log.
- output/: Directory containing the output CSV file 'Largest_banks_data.csv'.
- script/: Directory containing the Python script(s) for the project.
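If the CSV file in data/ holds currency exchange rates (an assumption, as is its Currency,Rate layout) and the bank data lists market capitalisation in billions of USD, the conversion step before writing a file like 'Largest_banks_data.csv' could be sketched as below. The file names, column names, banks, and rates are all made up for illustration.

```python
import csv
import os
import tempfile


def read_exchange_rates(csv_path):
    """Read a Currency,Rate CSV into a {currency: rate} dict (assumed layout)."""
    with open(csv_path, newline="") as f:
        return {row["Currency"]: float(row["Rate"]) for row in csv.DictReader(f)}


def add_currency_columns(banks, rates):
    """Add one market-cap column per currency, rounded to 2 decimals."""
    converted = []
    for bank in banks:
        row = dict(bank)  # copy so the input rows are left unchanged
        for currency, rate in rates.items():
            row[f"MC_{currency}_Billion"] = round(bank["MC_USD_Billion"] * rate, 2)
        converted.append(row)
    return converted


def save_csv(rows, path):
    """Write the converted rows, using the first row's keys as the header."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


def demo():
    """Convert made-up bank data using made-up rates in a temporary directory."""
    work = tempfile.mkdtemp()
    rates_path = os.path.join(work, "exchange_rates.csv")  # hypothetical file name
    with open(rates_path, "w", newline="") as f:
        f.write("Currency,Rate\nGBP,0.8\nEUR,0.93\n")
    banks = [
        {"Name": "Bank A", "MC_USD_Billion": 100.0},
        {"Name": "Bank B", "MC_USD_Billion": 50.0},
    ]
    rows = add_currency_columns(banks, read_exchange_rates(rates_path))
    save_csv(rows, os.path.join(work, "Largest_banks_data.csv"))
    return rows


if __name__ == "__main__":
    print(demo())
```

Rounding after the multiplication hides floating-point noise such as `100 * 0.93` not being exactly 93.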
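Make sure the following third-party Python packages are installed:
- pandas
- BeautifulSoup (installed as beautifulsoup4, imported as bs4)

The scripts also use these modules from the Python standard library, which need no separate installation:
- glob
- xml
- datetime
- sqlite3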
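To confirm these are importable before running the scripts, a small helper built on `importlib.util.find_spec` could be used. The module list below is an assumption based on the requirements (bs4 is the import name of BeautifulSoup); any missing third-party package can then be installed with pip, for example `pip install pandas beautifulsoup4`.

```python
import importlib.util

# Import names to check; "bs4" is the import name of the beautifulsoup4 package
REQUIRED_MODULES = ("pandas", "bs4", "glob", "xml", "datetime", "sqlite3")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

`find_spec` only locates a module without importing it, so the check stays fast and free of side effects.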