Hugging Face Dataset Exporter

This script exports data from a Postgres database to a Hugging Face dataset in Parquet format.

Setup

  1. Install dependencies:

    Navigate to this directory and install the required Python packages.

    pip install -r requirements.txt
  2. Set environment variables:

    The script uses environment variables for database credentials. You can set them in your shell or use a .env file.

    export DB_USER="your_db_user"
    export DB_PASSWORD="your_db_password"
    export DB_HOST="localhost"
    export DB_PORT="5432"
    export DB_NAME="your_db_name"
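
As an illustration of how these variables might be consumed, the following is a minimal sketch that assembles a Postgres connection URL from them. The function name build_postgres_url and the URL-based approach are assumptions made for this example; export.py may read the variables differently.

```python
import os
from urllib.parse import quote_plus

# The five variables listed in the setup step above.
REQUIRED_VARS = ["DB_USER", "DB_PASSWORD", "DB_HOST", "DB_PORT", "DB_NAME"]


def build_postgres_url(env=None):
    """Assemble a Postgres connection URL from the DB_* variables.

    Hypothetical helper: not taken from export.py.
    """
    env = os.environ if env is None else env

    # Fail early with a clear message if any variable is unset or empty.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))

    # Percent-encode credentials so characters such as '@' or ':' do not
    # break the URL.
    user = quote_plus(env["DB_USER"])
    password = quote_plus(env["DB_PASSWORD"])
    return (
        f"postgresql://{user}:{password}"
        f"@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
    )
```

Calling build_postgres_url() with no arguments reads from the current process environment, so it raises a KeyError naming any variable that was not exported.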

Usage

Run the script from the root of the repository:

python export.py

The script creates a directory at the output path and writes the dataset there in Parquet format. If --output_dir is not provided, it saves to a directory named dataset in the current working directory.
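
The default described above could be implemented with argparse along these lines. This is a sketch under assumptions: only the --output_dir flag and its dataset default come from this README, while the function names and help text are hypothetical.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse command-line options (hypothetical; export.py may differ)."""
    parser = argparse.ArgumentParser(
        description="Export Postgres data to a Parquet dataset."
    )
    parser.add_argument(
        "--output_dir",
        default="dataset",  # default named in the README
        help="Directory to write the Parquet files to.",
    )
    return parser.parse_args(argv)


def resolve_output_dir(argv=None):
    """Return the absolute output path, relative to the working directory."""
    args = parse_args(argv)
    return Path(args.output_dir).resolve()
```

With this sketch, running without arguments resolves to a dataset directory under the current working directory, while passing --output_dir followed by a path selects that location instead.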