Works with v1.0+
The File Data Connector creates datasets from files, so you can query locally accessible data in formats such as CSV, Parquet, and Markdown.
- Spice.ai CLI installed (see Getting Started)
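Before starting, you can confirm that the commands these steps call are on your `PATH`. The sketch below uses a hypothetical `check_prereqs` helper built on the shell's `command -v`. `spice` comes from the Spice.ai CLI, and `curl` is used by the download steps. It only checks that the commands exist, not which versions are installed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report any of the given commands that are not on PATH.
# Returns 0 when all are found, 1 otherwise.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# spice: Spice.ai CLI; curl: used to download the sample files
check_prereqs spice curl && echo "spice and curl found"
```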
Follow these steps to use local Parquet files as a dataset.
Download a sample Parquet file using the following command:
```bash
curl https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2024-01.parquet -o yellow_tripdata_2024-01.parquet
```

Create a spicepod.yaml file to define your dataset:

```bash
cat <<EOF > spicepod.yaml
version: v1
kind: Spicepod
name: file_recipe
datasets:
  - name: yellow_taxis
    from: file://yellow_tripdata_2024-01.parquet
EOF
```

Run the Spice runtime to load the dataset:
```bash
spice run
```

Open a new terminal and start the SQL REPL:

```bash
spice sql
```

Then execute a query on the yellow_taxis dataset:

```sql
select avg(passenger_count) from yellow_taxis;
```

You should see the following output:
```
sql> select avg(passenger_count) from yellow_taxis;
+-----------------------------------+
| avg(yellow_taxis.passenger_count) |
+-----------------------------------+
| 1.3392808966805005                |
+-----------------------------------+
Time: 0.0253585 seconds. 1 rows.
```
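If the query returns an error instead of this result, check whether the download is really a Parquet file. Without `-f`, `curl` saves an HTTP error response under the requested file name. Parquet files start and end with the 4-byte magic string `PAR1`, so you can do a quick check from the shell. `is_parquet` is a hypothetical helper, and it is a heuristic rather than full validation of the file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether a file looks like Parquet by checking
# the 4-byte "PAR1" magic string at the start and at the end of the file.
is_parquet() {
  local f="$1"
  [ -f "$f" ] || return 1
  # At least 12 bytes: header magic, 4-byte footer length, footer magic.
  (( $(wc -c < "$f") >= 12 )) || return 1
  [ "$(head -c 4 "$f")" = "PAR1" ] && [ "$(tail -c 4 "$f")" = "PAR1" ]
}

if is_parquet yellow_tripdata_2024-01.parquet; then
  echo "yellow_tripdata_2024-01.parquet looks like a Parquet file"
else
  echo "yellow_tripdata_2024-01.parquet is missing or not Parquet; try re-downloading with curl -fL" >&2
fi
```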
Stop the Spice runtime and close the SQL REPL when done.
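To point the recipe at a different local file, you can generate the spicepod.yaml instead of typing the heredoc. Below is a minimal sketch using a hypothetical `write_file_spicepod` helper. It only reproduces the fields this recipe uses: the `from` value, passed through unchanged, and the optional `file_format` param. By default it overwrites spicepod.yaml in the current directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a single-dataset Spicepod for the File Data Connector.
# Usage: write_file_spicepod <pod_name> <dataset_name> <from_value> [file_format] [output_file]
write_file_spicepod() {
  local pod="$1" dataset="$2" from="$3" format="${4:-}" out="${5:-spicepod.yaml}"
  {
    printf 'version: v1\nkind: Spicepod\nname: %s\n' "$pod"
    printf 'datasets:\n  - name: %s\n    from: %s\n' "$dataset" "$from"
    # Only emit params when a format is given, matching the Parquet spicepod, which sets none.
    if [ -n "$format" ]; then
      printf '    params:\n      file_format: %s\n' "$format"
    fi
  } > "$out"
}

# Regenerates the Parquet spicepod.yaml used above
write_file_spicepod file_recipe yellow_taxis file://yellow_tripdata_2024-01.parquet
```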
Remove the created files:
```bash
# Remove the spicepod.yaml
rm spicepod.yaml
# Remove the Parquet file
rm yellow_tripdata_2024-01.parquet
```

Follow these steps to use local Markdown files as a dataset.
Download sample Markdown files using the following script:
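```bash
# Data connector pages in the spiceai/docs repository
base_url="https://raw.githubusercontent.com/spiceai/docs/refs/heads/trunk/website/docs/components/data-connectors"
files=(
  "clickhouse.md"
  "databricks.md"
  "debezium.md"
  "delta-lake.md"
)
# -f makes curl fail on HTTP errors instead of saving the error page as a .md file
for file in "${files[@]}"; do
  curl -fO "$base_url/$file"
done
```

Create a spicepod.yaml file to define your dataset: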
```bash
cat <<EOF > spicepod.yaml
version: v1
kind: Spicepod
name: file_recipe_markdown
datasets:
  - name: docs
    from: file:./
    params:
      file_format: md
EOF
```

Run the Spice runtime to load the dataset:
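```bash
spice run
```

Open a new terminal and start the SQL REPL:

```bash
spice sql
```

Then execute a query on the docs dataset:

```sql
select location from docs;
```

Expected output. The paths depend on where you run the recipe. Other Markdown files already in the directory, such as README.md here, are listed too: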
```
+---------------------------------------------+
| location                                    |
+---------------------------------------------+
| Users/lukim/dev/cookbook/file/debezium.md   |
| Users/lukim/dev/cookbook/file/databricks.md |
| Users/lukim/dev/cookbook/file/README.md     |
| Users/lukim/dev/cookbook/file/clickhouse.md |
| Users/lukim/dev/cookbook/file/delta-lake.md |
+---------------------------------------------+
```
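Because `from: file:./` points the docs dataset at a directory, its rows come from whatever Markdown files that directory holds, not from a fixed list. A directory listing gives a rough preview of that set. The sketch assumes the connector reads only `*.md` files directly inside the directory. Whether it also descends into subdirectories is not covered here, so treat `list_markdown_files` (a hypothetical helper) as an approximation and check the File Data Connector documentation for the exact rules.

```bash
#!/usr/bin/env bash
# Hypothetical preview: list the Markdown files directly inside a directory, sorted by name.
# Assumption: only top-level *.md files are picked up; subdirectories are ignored here.
list_markdown_files() {
  local dir="${1:-.}"
  find "$dir" -maxdepth 1 -type f -name '*.md' -exec basename {} \; | LC_ALL=C sort
}

list_markdown_files .
```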
Stop the Spice runtime and close the SQL REPL when done.
Remove the created files:
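```bash
# Remove the spicepod.yaml
rm spicepod.yaml
# Remove only the downloaded Markdown files (rm *.md would also delete other files such as README.md)
rm clickhouse.md databricks.md debezium.md delta-lake.md
```

For more information, see the File Data Connector documentation.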