This script fetches data from a Kaggle endpoint, extracts it from a zip file, and populates a PostgreSQL database with the extracted data.
- Python 3.x installed on your machine.
- PostgreSQL server installed and running locally or accessible remotely.
- Necessary Python packages installed (
requests
,pandas
,sqlalchemy
).
-
Clone the repository or download the script file.
-
Install the required Python dependencies using pip:
pip install requests pandas sqlalchemy
-
Ensure PostgreSQL server is running and configure connection parameters (
DB_USER
,DB_PASSWORD
,DB_HOST
,DB_PORT
,DB_NAME
) in the script according to your setup.
-
Run the script
populate.py
using Python:python populate.py
-
The script will:
- Fetch the data from the Kaggle endpoint specified (
KAGGLE_ZIP_URL
). - Extract the data from the downloaded zip file into memory.
- Populate a PostgreSQL database named
Email_db
(change as per your database name) with the extracted data assuming it's in CSV format.
- Fetch the data from the Kaggle endpoint specified (
-
fetch_and_extract_data(): Fetches the data from
KAGGLE_ZIP_URL
, extracts it into memory from the zip file, and returns it. -
populate_db(data): Establishes a connection to the PostgreSQL database using SQLAlchemy, reads the extracted data (assuming CSV format), and populates a table named
ClientData
(replace with your desired table name) with the data.
-
Database Connection Parameters (
DB_USER
,DB_PASSWORD
,DB_HOST
,DB_PORT
,DB_NAME
): Modify these variables in the script to match your PostgreSQL database connection details. -
KAGGLE_ZIP_URL: Replace this variable with the actual Kaggle endpoint URL from where you want to fetch the data.
-
Ensure the PostgreSQL user (
DB_USER
) has appropriate privileges to create and modify tables in the specified database (DB_NAME
). -
This script assumes the fetched data is in CSV format. Adjust the
pd.read_csv()
function if your data format is different.
- The script includes basic error handling to manage exceptions during data fetching, extraction, and database population. Errors will be printed to the console with appropriate messages.