description
FTP/SFTP Data Connector Documentation

FTP/SFTP Data Connector

FTP (File Transfer Protocol) and SFTP (SSH File Transfer Protocol) are network protocols for transferring files between a client and a server. FTP sends data and credentials unencrypted, while SFTP encrypts the transfer over SSH.

The FTP/SFTP Data Connector enables federated and accelerated SQL queries across files in supported formats stored on FTP/SFTP servers.

datasets:
  - from: sftp://remote-sftp-server.com/path/to/folder/
    name: my_dataset
    params:
      file_format: csv
      sftp_port: 22
      sftp_user: my-sftp-user
      sftp_pass: ${secrets:my_sftp_password}

Configuration

from

The from field takes one of two forms: ftp://<host>/<path> or sftp://<host>/<path>, where <host> is the host to connect to and <path> is the path to the file or directory to read from.

If a folder is provided, all child files will be loaded.
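
As a rough illustration of the two path forms, the sketch below defines one dataset from a single file and one from a folder. The file name orders.csv and the dataset names are hypothetical placeholders chosen for this example; the host, user, and secret names follow the sample at the top of this page.

```yaml
datasets:
  # Single file: only this file is read (orders.csv is a hypothetical name).
  - from: sftp://remote-sftp-server.com/path/to/folder/orders.csv
    name: single_file_dataset
    params:
      file_format: csv
      sftp_user: my-sftp-user
      sftp_pass: ${secrets:my_sftp_password}

  # Folder: all child files under the path are loaded.
  - from: sftp://remote-sftp-server.com/path/to/folder/
    name: folder_dataset
    params:
      file_format: csv
      sftp_user: my-sftp-user
      sftp_pass: ${secrets:my_sftp_password}
```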

name

The dataset name. This will be used as the table name within Spice.

Example:

datasets:
  - from: sftp://remote-sftp-server.com/path/to/folder/
    name: cool_dataset
    params: ...

SELECT COUNT(*) FROM cool_dataset;
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+

params

FTP

| Parameter Name | Description |
| --- | --- |
| `file_format` | Specifies the data file format. Required if the format cannot be inferred from the `from` path. See Object Store File Formats. |
| `ftp_port` | Optional. Specifies the port of the FTP server. Default is 21. E.g. `ftp_port: 21` |
| `ftp_user` | The username for the FTP server. E.g. `ftp_user: my-ftp-user` |
| `ftp_pass` | The password for the FTP server. Use the secret replacement syntax to load the password from a secret store, e.g. `${secrets:my_ftp_pass}`. |
| `client_timeout` | Optional. Specifies the timeout for the FTP connection. E.g. `client_timeout: 30s`. When not set, no timeout is configured for the FTP client. |
| `hive_partitioning_enabled` | Optional. Enables hive-style partitioning based on the folder structure. Defaults to `false`. |
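
To show how the optional FTP parameters combine, here is a minimal sketch of a dataset on a server that listens on a non-default port, with a connection timeout set. The port 2121 and the dataset name are assumptions made for illustration; the 30s timeout reuses the value from the parameter description.

```yaml
datasets:
  - from: ftp://remote-ftp-server.com/path/to/folder/
    name: ftp_with_timeout   # hypothetical dataset name
    params:
      file_format: csv
      ftp_port: 2121         # assumed non-default port; omit to use the default of 21
      ftp_user: my-ftp-user
      ftp_pass: ${secrets:my_ftp_password}
      client_timeout: 30s    # without this, no timeout is configured
```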

SFTP

| Parameter Name | Description |
| --- | --- |
| `file_format` | Specifies the data file format. Required if the format cannot be inferred from the `from` path. See Object Store File Formats. |
| `sftp_port` | Optional. Specifies the port of the SFTP server. Default is 22. E.g. `sftp_port: 22` |
| `sftp_user` | The username for the SFTP server. E.g. `sftp_user: my-sftp-user` |
| `sftp_pass` | The password for the SFTP server. Use the secret replacement syntax to load the password from a secret store, e.g. `${secrets:my_sftp_pass}`. |
| `client_timeout` | Optional. Specifies the timeout for the SFTP connection. E.g. `client_timeout: 30s`. When not set, no timeout is configured for the SFTP client. |
| `hive_partitioning_enabled` | Optional. Enables hive-style partitioning based on the folder structure. Defaults to `false`. |
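
Hive-style partitioning encodes column values in folder names as key=value segments. The sketch below assumes a hypothetical layout partitioned by year and month; the folder names, file names, and dataset name are illustrative only, and the partition keys would be expected to surface as queryable columns.

```yaml
# Assumed folder layout on the server (hypothetical):
#   /path/to/folder/year=2024/month=01/data.csv
#   /path/to/folder/year=2024/month=02/data.csv
datasets:
  - from: sftp://remote-sftp-server.com/path/to/folder/
    name: partitioned_dataset   # hypothetical dataset name
    params:
      file_format: csv
      sftp_user: my-sftp-user
      sftp_pass: ${secrets:my_sftp_password}
      hive_partitioning_enabled: true   # defaults to false when omitted
```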

Examples

Connecting to FTP

- from: ftp://remote-ftp-server.com/path/to/folder/
  name: my_dataset
  params:
    file_format: csv
    ftp_user: my-ftp-user
    ftp_pass: ${secrets:my_ftp_password}
    hive_partitioning_enabled: false

Connecting to SFTP

- from: sftp://remote-sftp-server.com/path/to/folder/
  name: my_dataset
  params:
    file_format: csv
    sftp_port: 22
    sftp_user: my-sftp-user
    sftp_pass: ${secrets:my_sftp_password}
    hive_partitioning_enabled: false