| description | Learn how to use Data Connectors to query external data. |
|---|---|
| icon | database |
Data Connectors provide connections to databases, data warehouses, and data lakes for federated SQL queries and data replication.
Supported Data Connectors include:
| Name | Description | Protocol/Format |
|---|---|---|
| databricks (mode: delta_lake) | Databricks | S3/Delta Lake |
| delta_lake | Delta Lake | Delta Lake |
| dremio | Dremio | Arrow Flight |
| duckdb | DuckDB | Embedded |
| github | GitHub | GitHub API |
| postgres | PostgreSQL | |
| s3 | S3 | Parquet, CSV |
| mysql | MySQL | |
| graphql | GraphQL | JSON |
| databricks (mode: spark_connect) | Databricks | Spark Connect |
| flightsql | FlightSQL | Arrow Flight SQL |
| mssql | Microsoft SQL Server | Tabular Data Stream (TDS) |
| snowflake | Snowflake | Arrow |
| spark | Spark | Spark Connect |
| spice.ai | Spice.ai | Arrow Flight |
| iceberg | Apache Iceberg | Parquet |
| abfs | Azure BlobFS | Parquet, CSV |
| clickhouse | Clickhouse | |
| debezium | Debezium CDC | Kafka + JSON |
| dynamodb | DynamoDB | |
| ftp, sftp | FTP/SFTP | Parquet, CSV |
| http, https | HTTP(s) | Parquet, CSV |
| sharepoint | Microsoft SharePoint | Unstructured UTF-8 documents |
For data connectors that are object store compatible, if a folder is provided, the file format must be specified with params.file_format.
If a file is provided, the file format will be inferred, and params.file_format is unnecessary.
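As a rough sketch of the two cases, the spicepod fragment below defines one dataset from a folder and one from a single file. The bucket, paths, and dataset names are hypothetical; only the from, name, and params.file_format fields come from the text above.

```yaml
datasets:
  # Folder: the format cannot be inferred, so file_format is required
  - from: s3://example-bucket/events/
    name: events
    params:
      file_format: parquet

  # Single file: the format is inferred, so file_format is omitted
  - from: s3://example-bucket/reports/summary.csv
    name: summary
```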
File formats currently supported are:
| Name | Parameter | Supported | Is Document Format |
|---|---|---|---|
| Apache Parquet | file_format: parquet | ✅ | ❌ |
| CSV | file_format: csv | ✅ | ❌ |
| Apache Iceberg | file_format: iceberg | Roadmap | ❌ |
| JSON | file_format: json | Roadmap | ❌ |
| Microsoft Excel | file_format: xlsx | Roadmap | ❌ |
| Markdown | file_format: md | ✅ | ✅ |
| Text | file_format: txt | ✅ | ✅ |
| PDF | file_format: pdf | Alpha | ✅ |
| Microsoft Word | file_format: docx | Alpha | ✅ |
File formats support additional parameters in params (such as csv_has_header), described in File Formats.
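For illustration, a format-specific parameter sits alongside file_format under params. The fragment below is a sketch only: the bucket, path, and dataset name are hypothetical, and it assumes csv_has_header takes a boolean value.

```yaml
datasets:
  - from: s3://example-bucket/exports/
    name: exports
    params:
      file_format: csv
      # Assumption: a boolean indicating that the first row holds column names
      csv_has_header: true
```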
If a format is a document format, each file will be treated as a document, as per document support below.
{% hint style="info" %}
Note: Document formats in Alpha (e.g. pdf, docx) may not parse all structure or text from the underlying documents correctly.
{% endhint %}
Spice follows PostgreSQL conventions for identifier handling: unquoted identifiers are normalized to lowercase. This applies to both the from field in dataset definitions and the name field used for SQL queries.
To reference a table or schema with mixed-case or uppercase characters in the from field, wrap each case-sensitive part in double quotes:
```yaml
datasets:
  # Without quoting: "ActionExecutions" is lowercased to "actionexecutions"
  - from: postgres:my_schema.ActionExecutions
    name: action_executions

  # With quoting: case is preserved for the table name
  - from: postgres:my_schema."ActionExecutions"
    name: action_executions

  # Quote each part individually as needed
  - from: postgres:"MySchema"."ActionExecutions"
    name: action_executions
```

Each dotted part of the identifier is treated independently; quote only the parts that require case preservation. For example, postgres:my_schema."ActionExecutions" preserves the case of ActionExecutions while my_schema is normalized to lowercase.
This applies to all federated database connectors where the from field references a table identifier (e.g. postgres, mysql, snowflake, databricks, clickhouse, mssql, duckdb, dremio, flightsql, spark, mongodb, oracle). Connectors that interpret from as a file path (e.g. s3, delta_lake, ftp, abfs) do not apply identifier normalization.
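The contrast between the two connector families might look like the following sketch. The table, bucket, and path names are hypothetical, and the mysql from syntax is assumed to mirror the postgres examples in this section.

```yaml
datasets:
  # Table identifier: quote the part whose case must be preserved
  - from: mysql:"Orders"
    name: orders

  # File path: used as written, with no identifier normalization
  - from: s3://example-bucket/Reports/Q1.parquet
    name: q1_report
```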
The name field controls the table name used in Spice SQL queries and follows the same lowercase normalization. To preserve case in the dataset name, wrap the value in double quotes. In YAML, use single quotes around the double-quoted value:
```yaml
datasets:
  - from: postgres:my_schema."ActionExecutions"
    name: '"ActionExecutions"'
```

```sql
-- Query using the preserved-case name
SELECT * FROM "ActionExecutions";
```

If you don't need to preserve case in queries, a lowercase name works without quoting:
```yaml
datasets:
  - from: postgres:my_schema."ActionExecutions"
    name: action_executions
```

```sql
SELECT * FROM action_executions;
```

Dataset name quoting works regardless of connector type.