target-universal-file is a Singer target, built with the Meltano SDK, that loads data to any file system (local, GCP, AWS, etc.) in any format (.csv, .json, .parquet, .xlsx, etc.).
Install from PyPI:
pipx install target-universal-file
Install from GitHub:
pipx install git+https://github.com/taliawsmiley/target-universal-file.git@main
Add to your Meltano project from the Meltano Hub:
meltano add loader target-universal-file
Setting | Required | Default | Description |
---|---|---|---|
protocol | True | None | The protocol to connect to the file system. See: Protocols. |
file_type | True | None | The file type to use when writing data. See: File Types. |
path | True | None | The path on the file system where data will be written. |
file_name_format | True | {stream_name}.{file_type} | The format used to name output files. {stream_name} will be replaced with the name of the stream and {file_type} will be replaced with the file type. |
protocol_options | False | None | Extended options for the protocol specified in the protocol config. Provide this value as an object with key-value pairs as described by the protocol. See: Protocols. |
file_type_options | False | None | Extended options for the file type specified in the file_type config. Provide this value as an object with key-value pairs as described by the file type. See: File Types. |
add_record_metadata | False | None | Whether to add metadata fields to records. |
load_method | False | TargetLoadMethods.APPEND_ONLY | The method to use when loading data into the destination. append-only will always write all input records, whether or not a record already exists. upsert will update existing records and insert new records. overwrite will delete all existing records and insert all input records. |
batch_size_rows | False | None | Maximum number of rows in each batch. |
validate_records | False | 1 | Whether to validate the schema of the incoming streams. |
stream_maps | False | None | Config object for stream maps capability. For more information check out Stream Maps. |
stream_map_config | False | None | User-defined config values to be used within map expressions. |
faker_config | False | None | Config for the Faker instance variable fake used within map expressions. Only applicable if the plugin specifies faker as an additional dependency (through the singer-sdk faker extra or directly). |
faker_config.seed | False | None | Value to seed the Faker generator for deterministic output: https://faker.readthedocs.io/en/master/#seeding-the-generator |
faker_config.locale | False | None | One or more LCID locale strings to produce localized output for: https://faker.readthedocs.io/en/master/#localization |
flattening_enabled | False | None | 'True' to enable schema flattening and automatically expand nested properties. |
flattening_max_depth | False | None | The max depth to flatten schemas. |
A full list of supported settings and capabilities for target-universal-file is available by running:
target-universal-file --about
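As a minimal sketch of the required settings from the table above, the following builds a configuration object and previews the file name that the default file_name_format would produce. The values (`./output`, `users`) and the helper `preview_file_name` are hypothetical, and the target's own substitution logic may differ from Python's `str.format`.

```python
import json

# Required settings from the settings table; the values are illustrative only.
config = {
    "protocol": "local",
    "file_type": "csv",
    "path": "./output",  # hypothetical output directory
    "file_name_format": "{stream_name}.{file_type}",
}


def preview_file_name(config: dict, stream_name: str) -> str:
    """Approximate how file_name_format is expanded for one stream.

    Assumption: the placeholders behave like str.format fields.
    """
    return config["file_name_format"].format(
        stream_name=stream_name,
        file_type=config["file_type"],
    )


# Serialize the config; the JSON text can be saved to a file and passed with --config.
config_json = json.dumps(config, indent=2)
print(config_json)
print(preview_file_name(config, "users"))
```

Saving the printed JSON to a file gives a config that can be passed to the target with `--config`.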
target-universal-file will automatically import a .env file if --config=ENV is provided.
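The sketch below shows how the lines of such a .env file might be generated. It assumes the usual Singer SDK naming convention, in which each setting maps to an environment variable named after the plugin and the setting, upper-cased with dashes replaced by underscores. The helpers `env_var_name` and `render_dotenv` are hypothetical; `target-universal-file --about` is the place to confirm the exact variable names.

```python
PLUGIN_NAME = "target-universal-file"


def env_var_name(setting: str, plugin_name: str = PLUGIN_NAME) -> str:
    """Build an environment variable name for a setting.

    Assumption: <PLUGIN>_<SETTING>, upper-cased, with dashes turned into underscores.
    """
    prefix = plugin_name.upper().replace("-", "_")
    return f"{prefix}_{setting.upper().replace('-', '_')}"


def render_dotenv(settings: dict[str, str]) -> str:
    """Render settings as KEY=value lines suitable for a .env file."""
    return "\n".join(f"{env_var_name(name)}={value}" for name, value in settings.items())


# Illustrative values only.
print(render_dotenv({"protocol": "local", "file_type": "csv", "path": "./output"}))
```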
target-universal-file uses fsspec to easily connect to a wide variety of external file systems. Determine the file system to connect to by specifying a protocol in target configuration.
The supported protocols are:
- local: For writing data to a local file.
- gcs: For connecting to a bucket on Google Cloud.
- s3: For connecting to a bucket on Amazon Web Services.
Protocol: local
Description: For writing data to a local file.
Protocol Options: N/A
Local paths support both relative (./folder) and absolute (/folder) references.
Protocol: gcs
Description: For connecting to a bucket on Google Cloud.
Protocol Options: token
The suggested method of authenticating to GCS is to log in with gcloud and copy the provided credentials, such as from ~/.config/gcloud/application_default_credentials.json.
Steps to authenticate:
- Install the gcloud CLI.
- Run gcloud auth application-default login.
- Copy the output file from ~/.config/gcloud/application_default_credentials.json and provide it as an environment variable to protocol_options.token.
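A configuration for the gcs protocol could be assembled roughly as follows. This is a sketch under stated assumptions: the environment variable name `GCS_TOKEN`, the helper `build_gcs_config`, and the example path are all hypothetical, and the exact form of the token value the target accepts should be checked against the gcsfs documentation.

```python
import os
from collections.abc import Mapping


def build_gcs_config(path: str, env: Mapping[str, str] = os.environ) -> dict:
    """Build a gcs config, reading the credential from the environment.

    GCS_TOKEN is a hypothetical variable name holding the copied credentials.
    """
    token = env.get("GCS_TOKEN")
    if not token:
        raise ValueError("GCS_TOKEN is not set; protocol_options.token needs a credential.")
    return {
        "protocol": "gcs",
        "file_type": "parquet",  # any supported file type works here
        "path": path,
        "file_name_format": "{stream_name}.{file_type}",
        "protocol_options": {"token": token},
    }


# Example with an explicit mapping instead of the real environment.
example = build_gcs_config("my-bucket/exports", env={"GCS_TOKEN": "<credentials>"})
print(example["protocol_options"].keys())
```

Reading the token from the environment keeps the credential out of any committed config file.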
Other options are available for authentication as described by the gcsfs documentation.
Protocol: s3
Description: For connecting to a bucket on Amazon Web Services.
Protocol Options: key, secret
Connecting to an S3 bucket anonymously is not supported.
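Because anonymous access is not supported, both key and secret have to be present. The sketch below builds an s3 configuration and fails early when either is missing. The helper `build_s3_config` and the example path are hypothetical, and reading the values from the standard AWS variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` is an assumption about where the credentials live.

```python
import os
from collections.abc import Mapping


def build_s3_config(path: str, env: Mapping[str, str] = os.environ) -> dict:
    """Build an s3 config with the key and secret protocol options."""
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if not key or not secret:
        # Anonymous S3 connections are not supported, so refuse to continue.
        raise ValueError("Both key and secret are required for the s3 protocol.")
    return {
        "protocol": "s3",
        "file_type": "csv",  # any supported file type works here
        "path": path,
        "file_name_format": "{stream_name}.{file_type}",
        "protocol_options": {"key": key, "secret": secret},
    }


# Example with an explicit mapping instead of the real environment.
example = build_s3_config(
    "my-bucket/exports",
    env={"AWS_ACCESS_KEY_ID": "<key>", "AWS_SECRET_ACCESS_KEY": "<secret>"},
)
print(sorted(example["protocol_options"]))
```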
target-universal-file supports a variety of data formats. Determine the data format to use by specifying a file_type in target configuration.
The supported file types are:
- csv: For Comma-Separated Value files.
- jsonl: For JSON Lines files.
- parquet: For Apache Parquet files.
- xlsx: For Microsoft Excel files.
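Before running the target, a configuration can be checked against the protocols and file types listed in this document. The following is a hedged sketch of such a pre-flight check. The helper `check_config` is hypothetical and is not part of the target, which performs its own validation.

```python
# Values taken from the supported protocol and file type lists in this document.
SUPPORTED_PROTOCOLS = {"local", "gcs", "s3"}
SUPPORTED_FILE_TYPES = {"csv", "jsonl", "parquet", "xlsx"}
# file_name_format is omitted here because it has a default value.
REQUIRED_SETTINGS = ("protocol", "file_type", "path")


def check_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = [
        f"missing required setting: {name}"
        for name in REQUIRED_SETTINGS
        if name not in config
    ]
    protocol = config.get("protocol")
    if protocol is not None and protocol not in SUPPORTED_PROTOCOLS:
        problems.append(f"unsupported protocol: {protocol}")
    file_type = config.get("file_type")
    if file_type is not None and file_type not in SUPPORTED_FILE_TYPES:
        problems.append(f"unsupported file_type: {file_type}")
    return problems


print(check_config({"protocol": "local", "file_type": "jsonl", "path": "./output"}))
print(check_config({"protocol": "ftp", "file_type": "jsonl"}))
```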
target-universal-file --version
target-universal-file --help
# Test using the "smoke test" sample tap from Meltano
tap-smoke-test | target-universal-file --config /path/to/target-universal-file-config.json