Load data in the Singer format to any file type you wish.

taliawsmiley/target-universal-file

target-universal-file

target-universal-file is a Singer target, built with the Meltano SDK, that loads data to any file system (local, GCP, AWS, etc.) in any format (.csv, .json, .parquet, .xlsx, etc.).

Installation

Install from PyPI:

pipx install target-universal-file

Install from GitHub:

pipx install git+https://github.com/taliawsmiley/target-universal-file.git@main

Add to your Meltano project from the Meltano Hub:

meltano add loader target-universal-file

Configuration

Accepted Config Options

| Setting | Required | Default | Description |
|---|---|---|---|
| protocol | True | None | The protocol to connect to the file system. See: Protocols. |
| file_type | True | None | The file type to use when writing data. See: File Types. |
| path | True | None | The path on the file system where data will be written. |
| file_name_format | True | {stream_name}.{file_type} | The format for how to store data. {stream_name} will be replaced with the name of the stream and {file_type} will be replaced with the file type. |
| protocol_options | False | None | Extended options for the protocol specified in the protocol config. Provide this value as an object with key-value pairs as described by the protocol. See: Protocols. |
| file_type_options | False | None | Extended options for the file type specified in the file_type config. Provide this value as an object with key-value pairs as described by the file type. See: File Types. |
| add_record_metadata | False | None | Whether to add metadata fields to records. |
| load_method | False | TargetLoadMethods.APPEND_ONLY | The method to use when loading data into the destination. append-only will always write all input records, whether or not a record already exists. upsert will update existing records and insert new records. overwrite will delete all existing records and insert all input records. |
| batch_size_rows | False | None | Maximum number of rows in each batch. |
| validate_records | False | 1 | Whether to validate the schema of the incoming streams. |
| stream_maps | False | None | Config object for stream maps capability. For more information check out Stream Maps. |
| stream_map_config | False | None | User-defined config values to be used within map expressions. |
| faker_config | False | None | Config for the Faker instance variable fake used within map expressions. Only applicable if the plugin specifies faker as an additional dependency (through the singer-sdk faker extra or directly). |
| faker_config.seed | False | None | Value to seed the Faker generator for deterministic output: https://faker.readthedocs.io/en/master/#seeding-the-generator |
| faker_config.locale | False | None | One or more LCID locale strings to produce localized output for: https://faker.readthedocs.io/en/master/#localization |
| flattening_enabled | False | None | 'True' to enable schema flattening and automatically expand nested properties. |
| flattening_max_depth | False | None | The max depth to flatten schemas. |
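As a minimal sketch of how the required settings fit together, the following Python snippet writes a config file for a local CSV load. The destination `./output` and the file name `config.json` are illustrative assumptions, not values the target requires.

```python
import json
from pathlib import Path

# Illustrative values only: "./output" is a hypothetical destination folder.
config = {
    "protocol": "local",
    "file_type": "csv",
    "path": "./output",
    "file_name_format": "{stream_name}.{file_type}",
    "load_method": "append-only",
}


def write_config(destination: Path) -> Path:
    """Serialize the config dict to JSON so it can be passed via --config."""
    destination.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return destination


if __name__ == "__main__":
    print(write_config(Path("config.json")))
```

The resulting file can then be passed to the target with `--config`.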

A full list of supported settings and capabilities for target-universal-file is available by running:

target-universal-file --about

Environment Variables

target-universal-file will automatically import a .env file if --config=ENV is provided.
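Plugins built with the Meltano SDK conventionally read each setting from an environment variable named after the plugin (uppercased, hyphens replaced by underscores) followed by the setting name. Assuming that convention applies to this target, the sketch below generates candidate .env lines. The helper names `env_var_name` and `dotenv_lines` are hypothetical and not part of the target.

```python
def env_var_name(plugin_name: str, setting: str) -> str:
    """Build the conventional variable name, e.g. TARGET_UNIVERSAL_FILE_PATH."""
    prefix = plugin_name.upper().replace("-", "_")
    return f"{prefix}_{setting.upper().replace('-', '_')}"


def dotenv_lines(plugin_name: str, settings: dict[str, str]) -> list[str]:
    """Render settings as KEY=value lines suitable for a .env file."""
    return [f"{env_var_name(plugin_name, key)}={value}" for key, value in settings.items()]


if __name__ == "__main__":
    # Illustrative values; "./output" is a hypothetical destination.
    settings = {"protocol": "local", "file_type": "csv", "path": "./output"}
    print("\n".join(dotenv_lines("target-universal-file", settings)))
```

Running `target-universal-file --about` remains the authoritative way to confirm which settings the target accepts.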

Protocols

target-universal-file uses fsspec to easily connect to a wide variety of external file systems. Determine the file system to connect to by specifying a protocol in target configuration.

The supported protocols are:

  1. local: For writing data to a local file.
  2. gcs: For connecting to a bucket on Google Cloud.
  3. s3: For connecting to a bucket on Amazon Web Services.

Local

Protocol: local
Description: For writing data to a local file.
Protocol Options: N/A

Local paths can be either relative (./folder) or absolute (/folder).
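To illustrate how a local path and the file_name_format setting could combine into an output location, here is a small sketch. It assumes relative paths resolve against the current working directory and that the placeholders are filled with plain string substitution; the target's actual implementation may differ, and `output_file` is a hypothetical helper.

```python
from pathlib import Path


def output_file(
    path: str,
    stream_name: str,
    file_type: str,
    file_name_format: str = "{stream_name}.{file_type}",
) -> Path:
    """Expand the file name placeholders and join them onto an absolute path."""
    file_name = file_name_format.format(stream_name=stream_name, file_type=file_type)
    # resolve() turns a relative path such as ./folder into an absolute one.
    return Path(path).resolve() / file_name


if __name__ == "__main__":
    print(output_file("./folder", "users", "csv"))
    print(output_file("/folder", "users", "csv"))
```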

GCS

Protocol: gcs
Description: For connecting to a bucket on Google Cloud.
Protocol Options: token

The suggested method of authenticating to GCS is to log in with gcloud and copy the provided credentials, such as from ~/.config/gcloud/application_default_credentials.json.

Steps to authenticate:

  1. Install the gcloud CLI.
  2. Run gcloud auth application-default login.
  3. Copy the output file from ~/.config/gcloud/application_default_credentials.json and provide it as an environment variable to protocol_options.token.
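The steps above can be sketched in Python as follows. The snippet reads the credentials file produced by gcloud and places its parsed contents under protocol_options.token. Passing the credentials as a parsed object is an assumption (gcsfs accepts a credentials dictionary as its token, but how this target forwards the value is not confirmed here), and the bucket path `my-bucket/exports` is hypothetical.

```python
import json
from pathlib import Path

# Location mentioned in the steps above.
DEFAULT_ADC_PATH = (
    Path.home() / ".config" / "gcloud" / "application_default_credentials.json"
)


def gcs_config(bucket_path: str, credentials_file: Path = DEFAULT_ADC_PATH) -> dict:
    """Build a target config whose token holds the gcloud credentials."""
    token = json.loads(credentials_file.read_text(encoding="utf-8"))
    return {
        "protocol": "gcs",
        "file_type": "jsonl",
        "path": bucket_path,
        # Assumption: the target passes this object through to gcsfs unchanged.
        "protocol_options": {"token": token},
    }


if __name__ == "__main__":
    # "my-bucket/exports" is a hypothetical bucket path.
    config = gcs_config("my-bucket/exports")
    print(sorted(config))
```

Because the resulting config contains secrets, it should be kept out of version control.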

Other authentication options are available, as described in the gcsfs documentation.

S3

Protocol: s3
Description: For connecting to a bucket on Amazon Web Services.
Protocol Options: key, secret

Connecting to an S3 bucket anonymously is not supported.
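Since anonymous access is not supported, both key and secret need to be supplied. A minimal sketch, assuming the credentials live in the standard AWS environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` (that source, the helper `s3_config`, and the bucket path are choices made for this example, not requirements of the target):

```python
import os
from collections.abc import Mapping


def s3_config(bucket_path: str, environ: Mapping[str, str] = os.environ) -> dict:
    """Build a target config for S3, failing early if credentials are missing."""
    key = environ.get("AWS_ACCESS_KEY_ID")
    secret = environ.get("AWS_SECRET_ACCESS_KEY")
    if not key or not secret:
        raise ValueError("Both key and secret are required; anonymous access is not supported.")
    return {
        "protocol": "s3",
        "file_type": "parquet",
        "path": bucket_path,
        "protocol_options": {"key": key, "secret": secret},
    }


if __name__ == "__main__":
    # "my-bucket/exports" is a hypothetical bucket path.
    try:
        print(sorted(s3_config("my-bucket/exports")))
    except ValueError as error:
        print(error)
```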

File Types

target-universal-file supports a variety of data formats. Determine the data format to use by specifying a file_type in target configuration.

The supported file types are:

  1. csv: For Comma-Separated Value files.
  2. jsonl: For JSON Lines files.
  3. parquet: For Apache Parquet files.
  4. xlsx: For Microsoft Excel files.
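To make the difference between the text-based formats concrete, the sketch below renders the same two records as csv and as jsonl using only the Python standard library. It is an illustration of the formats themselves, not the target's writer; parquet and xlsx are omitted because they need third-party libraries. The sample records are made up.

```python
import csv
import io
import json

# Made-up sample records for illustration.
records = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]


def to_csv(rows: list[dict]) -> str:
    """Render rows as CSV text with a header taken from the first row's keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_jsonl(rows: list[dict]) -> str:
    """Render rows as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


if __name__ == "__main__":
    print(to_csv(records))
    print(to_jsonl(records))
```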

Usage Example

target-universal-file --version
target-universal-file --help
# Test using the "smoke test" sample tap from Meltano
tap-smoke-test | target-universal-file --config /path/to/target-universal-file-config.json
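If no tap is at hand, a few Singer messages can be generated by a short script and piped into the target instead. The sketch below emits one SCHEMA message followed by RECORD messages, following the Singer message format; the stream name `users`, its fields, and the script name in the usage note are made up for illustration.

```python
import json
import sys
from collections.abc import Iterator


def singer_messages(stream: str, rows: list[dict]) -> Iterator[dict]:
    """Yield a SCHEMA message for the stream, then one RECORD message per row."""
    yield {
        "type": "SCHEMA",
        "stream": stream,
        "schema": {
            "type": "object",
            "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
        },
        "key_properties": ["id"],
    }
    for row in rows:
        yield {"type": "RECORD", "stream": stream, "record": row}


if __name__ == "__main__":
    sample_rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
    for message in singer_messages("users", sample_rows):
        # Singer messages are newline-delimited JSON on stdout.
        sys.stdout.write(json.dumps(message) + "\n")
```

Saved under a hypothetical name such as `emit_users.py`, its output could be piped in place of the tap: `python emit_users.py | target-universal-file --config /path/to/target-universal-file-config.json`.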
