Kaggle Data Fetch and PostgreSQL Population

This script downloads a zipped dataset from a Kaggle endpoint, extracts it in memory, and loads the extracted data into a PostgreSQL database.

Prerequisites

  • Python 3.x installed on your machine.
  • PostgreSQL server installed and running locally or accessible remotely.
  • Necessary Python packages installed (requests, pandas, sqlalchemy). Note that SQLAlchemy also needs a PostgreSQL driver such as psycopg2 to connect to the database.

Setup

  1. Clone the repository or download the script file.

  2. Install the required Python dependencies using pip:

    pip install requests pandas sqlalchemy
  3. Ensure the PostgreSQL server is running and configure the connection parameters (DB_USER, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME) in the script to match your setup (see the sketch below).
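
A minimal sketch of what the configuration block inside populate.py might look like; every value below is a placeholder for illustration, not the script's actual defaults:

    # Connection parameters -- replace the placeholder values with your own setup.
    DB_USER = "postgres"
    DB_PASSWORD = "your_password"
    DB_HOST = "localhost"
    DB_PORT = "5432"
    DB_NAME = "Email_db"

    # Kaggle endpoint for the zipped dataset (placeholder URL).
    KAGGLE_ZIP_URL = "https://<kaggle-endpoint>/<dataset>.zip"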

Usage

  1. Run the script populate.py using Python:

    python populate.py
  2. The script will:

    • Fetch the data from the specified Kaggle endpoint (KAGGLE_ZIP_URL).
    • Extract the data from the downloaded zip file into memory.
    • Populate a PostgreSQL database named Email_db (change this to match your database name) with the extracted data, which is assumed to be in CSV format.

Script Explanation

  • fetch_and_extract_data(): Downloads the zip file from KAGGLE_ZIP_URL, extracts its contents in memory, and returns the extracted data.

  • populate_db(data): Establishes a connection to the PostgreSQL database using SQLAlchemy, reads the extracted data (assumed to be CSV), and populates a table named ClientData (replace with your desired table name) with it. A sketch of both functions follows below.
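
A simplified sketch of how these two functions could be structured; it assumes the connection parameters and KAGGLE_ZIP_URL from the Configuration section are defined at module level, and the actual populate.py may differ in its details:

    import io
    import zipfile

    import pandas as pd
    import requests
    from sqlalchemy import create_engine


    def fetch_and_extract_data():
        # Download the zip archive and keep its bytes in memory.
        response = requests.get(KAGGLE_ZIP_URL)
        response.raise_for_status()

        # Open the archive from memory and return the contents of the first file.
        with zipfile.ZipFile(io.BytesIO(response.content)) as archive:
            first_file = archive.namelist()[0]
            return archive.read(first_file)


    def populate_db(data):
        # Build a SQLAlchemy engine from the connection parameters.
        engine = create_engine(
            f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"
        )

        # Parse the extracted bytes as CSV and write them to the ClientData table.
        df = pd.read_csv(io.BytesIO(data))
        df.to_sql("ClientData", engine, if_exists="replace", index=False)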

Configuration

  • Database Connection Parameters (DB_USER, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME): Modify these variables in the script to match your PostgreSQL connection details (see the sketch after this list).

  • KAGGLE_ZIP_URL: Replace this variable with the Kaggle endpoint URL you want to fetch the data from.
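
As an alternative to formatting the connection string by hand, the parameters can be assembled with SQLAlchemy's URL.create helper; this is a sketch with placeholder values, not necessarily how the script builds its connection:

    from sqlalchemy import create_engine
    from sqlalchemy.engine import URL

    # Placeholder values -- substitute your own connection details.
    url = URL.create(
        drivername="postgresql",
        username="postgres",
        password="your_password",
        host="localhost",
        port=5432,
        database="Email_db",
    )
    engine = create_engine(url)

Using URL.create avoids problems with special characters in the password that manual string formatting would not escape.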

Notes

  • Ensure the PostgreSQL user (DB_USER) has appropriate privileges to create and modify tables in the specified database (DB_NAME).

  • This script assumes the fetched data is in CSV format. Adjust the pd.read_csv() call if your data is in a different format (see the example below).
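
For example, if the extracted file were JSON rather than CSV, the parsing step could be swapped out roughly as follows; parse_extracted_bytes is a hypothetical helper shown for illustration, not a function in the script:

    import io

    import pandas as pd


    def parse_extracted_bytes(data, file_format="csv"):
        # Pick the pandas reader that matches the extracted file's format.
        buffer = io.BytesIO(data)
        if file_format == "csv":
            return pd.read_csv(buffer)
        if file_format == "json":
            return pd.read_json(buffer)
        raise ValueError(f"Unsupported format: {file_format}")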

Error Handling

  • The script includes basic error handling for exceptions raised during data fetching, extraction, and database population; errors are printed to the console with descriptive messages. A generic sketch of this pattern is shown below.
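
The pattern likely resembles the sketch below, which reuses the fetch_and_extract_data() and populate_db() functions described above; the script's actual messages and exception types may differ:

    def main():
        # Fetch and extract the data, reporting any download or extraction failure.
        try:
            data = fetch_and_extract_data()
        except Exception as exc:
            print(f"Failed to fetch or extract data: {exc}")
            return

        # Load the data into PostgreSQL, reporting any database failure.
        try:
            populate_db(data)
        except Exception as exc:
            print(f"Failed to populate the database: {exc}")
            return

        print("Database populated successfully.")


    if __name__ == "__main__":
        main()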
