Commit d5c4423

ewgenius and sgrebnov authored
Add Snowflake DML recipe for HTTP API ingestion pipeline (#389)
* Add Snowflake DML recipe for HTTP API ingestion pipeline
* Update snowflake/dml/README.md

  Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
* update snowflake dml cookbook
* cleanup
* cleanup spicepod example and instructions

---------

Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
1 parent 8928f71 commit d5c4423

2 files changed

Lines changed: 203 additions & 0 deletions

snowflake/dml/README.md

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
1+
# Snowflake DML — HTTP API Ingestion Pipeline
2+
3+
Works with `v2.0+`
4+
5+
> This recipe demonstrates how to build a mini ingestion pipeline with Spice: fetch data from a public HTTP API ([TVMaze](https://www.tvmaze.com/api)), transform it with SQL, and write it into a writable Snowflake table using **Snowflake DML** (`INSERT`).
6+
7+
## Prerequisites
8+
9+
- Spice `v2.0+` ([Install Spice](https://docs.spiceai.org/getting-started/installation))
10+
- [A Snowflake account](https://signup.snowflake.com/)
11+
12+
## Step 1. Create the destination table in Snowflake
13+
14+
Sign in to your Snowflake account, open a worksheet, and run:
15+
16+
```sql
17+
CREATE DATABASE IF NOT EXISTS SPICE_DEMO;
18+
USE DATABASE SPICE_DEMO;
19+
USE SCHEMA PUBLIC;
20+
21+
CREATE TABLE IF NOT EXISTS TV_SHOWS (
22+
id INTEGER NOT NULL,
23+
name VARCHAR(255),
24+
type VARCHAR(100),
25+
language VARCHAR(100),
26+
status VARCHAR(100),
27+
runtime INTEGER,
28+
premiered DATE,
29+
ended DATE,
30+
rating_average FLOAT,
31+
PRIMARY KEY (id)
32+
);
33+
```
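
The columns above correspond to fields that the worker SQL in `spicepod.yaml` extracts from each TVMaze show object. As a rough illustration of that mapping, the sketch below flattens one show into a row in the same column order. The helper name `flatten_show` and the sample record are hypothetical and used only for illustration; the field names (`id`, `name`, `rating.average`, and so on) are the ones referenced by the worker SQL.

```python
from datetime import date


def flatten_show(show: dict) -> tuple:
    """Map one TVMaze-style show object to the TV_SHOWS column order."""

    def to_date(value):
        # Missing or null dates stay NULL, mirroring CAST(... AS DATE) on a null string.
        return date.fromisoformat(value) if value else None

    # "rating" is a nested object; tolerate it being absent or null.
    rating = show.get("rating") or {}
    return (
        show["id"],                     # id (NOT NULL)
        show.get("name"),               # name
        show.get("type"),               # type
        show.get("language"),           # language
        show.get("status"),             # status
        show.get("runtime"),            # runtime
        to_date(show.get("premiered")), # premiered
        to_date(show.get("ended")),     # ended
        rating.get("average"),          # rating_average
    )


# Hypothetical sample record (not real TVMaze data).
sample = {
    "id": 1,
    "name": "Example Show",
    "type": "Scripted",
    "language": "English",
    "status": "Running",
    "runtime": 60,
    "premiered": "2020-01-15",
    "ended": None,
    "rating": {"average": 7.5},
}

row = flatten_show(sample)
print(row)
```

Only `id` is required here, matching the `NOT NULL` constraint; every other field may be absent and becomes `None`.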
34+
35+
## Step 2. Configure Snowflake credentials
36+
37+
```bash
38+
spice login snowflake -a <account-identifier> -u <username> -p <password>
39+
```
40+
41+
This creates a `.env` file containing the credentials, which `spicepod.yaml` reads through its `${secrets:...}` references:
42+
43+
```bash
44+
SPICE_SNOWFLAKE_ACCOUNT=<account-identifier>
45+
SPICE_SNOWFLAKE_USERNAME=<username>
46+
SPICE_SNOWFLAKE_PASSWORD=<password>
47+
```
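
Before starting the runtime, it can help to confirm that the `.env` file defines all three variables. The following is a minimal sketch, not part of Spice: the helper names `parse_env` and `missing_keys` are hypothetical, and the parser handles only simple `KEY=value` lines.

```python
REQUIRED_KEYS = (
    "SPICE_SNOWFLAKE_ACCOUNT",
    "SPICE_SNOWFLAKE_USERNAME",
    "SPICE_SNOWFLAKE_PASSWORD",
)


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str) -> list:
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


# Illustrative input with the password line left out on purpose.
example = "SPICE_SNOWFLAKE_ACCOUNT=my-account\nSPICE_SNOWFLAKE_USERNAME=me\n"
print(missing_keys(example))
```

To check a real file, pass its contents instead of `example`, for instance `missing_keys(open(".env").read())`. The sketch reports only key names, so secret values are never printed.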
48+
49+
## Step 3. Start Spice
50+
51+
```bash
52+
spice run
53+
```
54+
55+
Run the command from the directory that contains `spicepod.yaml`. Expected output:
56+
57+
```
58+
Spice.ai runtime starting...
59+
...
60+
2026-05-10T22:03:07.236786Z INFO runtime::init::dataset: Dataset tv_shows initializing...
61+
2026-05-10T22:03:07.236915Z INFO runtime::init::dataset: Dataset tvmaze_shows_raw initializing...
62+
2026-05-10T22:03:07.237083Z INFO runtime::init::worker: Loading worker [ingest_tvmaze_shows]...
63+
2026-05-10T22:03:07.237202Z INFO runtime::init::worker: Worker [ingest_tvmaze_shows] loaded, ready for use
64+
2026-05-10T22:03:07.237660Z INFO runtime::init::worker: Scheduler for worker [ingest_tvmaze_shows] created successfully
65+
2026-05-10T22:03:07.246890Z INFO runtime::init::dataset: Dataset tvmaze_shows_raw registered (https://api.tvmaze.com/shows), results cache enabled. duration_ms=0
66+
2026-05-10T22:03:07.250581Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090
67+
2026-05-10T22:03:10.209492Z INFO runtime::init::dataset: Dataset tv_shows_ro registered (snowflake:SPICE_DEMO.PUBLIC."TV_SHOWS"), results cache enabled. duration_ms=0
68+
2026-05-10T22:03:11.936902Z INFO runtime::init::dataset: Dataset tv_shows registered (snowflake:SPICE_DEMO.PUBLIC."TV_SHOWS"), results cache enabled. duration_ms=1865
69+
2026-05-10T22:03:12.039156Z INFO runtime: All components are loaded. Spice runtime is ready!
70+
2026-05-10T22:03:37.238925Z INFO runtime::init::dataset: Dataset load summary (after 30s): 4/4 ready, 0 unhealthy, 0 still initializing.
71+
```
72+
73+
## Step 4. Wait for the ingestion worker to run
74+
75+
The `ingest_tvmaze_shows` worker runs automatically on its cron schedule (`* * * * *`, once a minute). Inspect its run history in the SQL REPL:
76+
77+
```bash
78+
spice sql
79+
```
80+
81+
```sql
82+
SELECT task, start_time, end_time
83+
FROM runtime.task_history
84+
WHERE task = 'scheduled_worker'
85+
ORDER BY start_time DESC
86+
LIMIT 5;
87+
```
88+
89+
```
90+
+------------------+--------------------------------+--------------------------------+
91+
| task | start_time | end_time |
92+
| varchar | timestamp[ns] (UTC) | timestamp[ns] (UTC) |
93+
+------------------+--------------------------------+--------------------------------+
94+
| scheduled_worker | 2026-05-10T22:02:00.776514814Z | 2026-05-10T22:02:01.188614794Z |
95+
| scheduled_worker | 2026-05-10T22:01:00.410603700Z | 2026-05-10T22:01:00.776281757Z |
96+
| scheduled_worker | 2026-05-10T22:00:00.002615777Z | 2026-05-10T22:00:02.409577328Z |
97+
+------------------+--------------------------------+--------------------------------+
98+
99+
Time: 0.014705856 seconds. 3 rows.
100+
```
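
A similar run-history check can be scripted instead of typed into the REPL. The sketch below assumes the runtime accepts SQL as a plain-text `POST` body at `/v1/sql` on the HTTP address from the startup log (`127.0.0.1:8090`) and responds with a JSON array of row objects. The endpoint path, the response shape, and the helper names `build_sql_request`, `run_sql`, and `failed_runs` are assumptions; verify them against the HTTP API documentation for your Spice version.

```python
import json
import urllib.request

# Address taken from the startup log; the "/v1/sql" path is an assumption.
SPICE_HTTP = "http://127.0.0.1:8090"

HISTORY_SQL = (
    "SELECT task, start_time, end_time, error_message "
    "FROM runtime.task_history "
    "WHERE task = 'scheduled_worker' "
    "ORDER BY start_time DESC LIMIT 5"
)


def build_sql_request(sql: str, base_url: str = SPICE_HTTP) -> urllib.request.Request:
    """Build (but do not send) a POST request with the SQL text as its body."""
    return urllib.request.Request(
        url=f"{base_url}/v1/sql",
        data=sql.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def run_sql(sql: str, base_url: str = SPICE_HTTP):
    """Send the query and decode the JSON response; needs a running runtime."""
    request = build_sql_request(sql, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def failed_runs(rows: list) -> list:
    """Keep rows with a non-empty error_message (assumes rows are dicts)."""
    return [row for row in rows if row.get("error_message")]
```

With the runtime from Step 3 running, `failed_runs(run_sql(HISTORY_SQL))` would list any worker runs that recorded an error.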
101+
102+
## Step 5. Query the data
103+
104+
Verify that the rows landed in Snowflake:
105+
106+
```sql
107+
SELECT count(*) FROM tv_shows;
108+
```
109+
110+
```
111+
+----------+
112+
| count(*) |
113+
| int64 |
114+
+----------+
115+
| 240 |
116+
+----------+
117+
118+
Time: 0.582791267 seconds. 1 rows.
119+
```
120+
121+
> On subsequent runs, the worker inserts only shows whose `id` is not already in `tv_shows` (the `NOT IN` filter in the worker SQL), so the count grows incrementally as TVMaze adds new shows.
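
The insert-only-new-rows behavior can be reproduced locally to see why repeated runs do not create duplicates. The sketch below uses Python's built-in `sqlite3` as a stand-in for Snowflake. The `staged_shows` table, the `run_ingest` helper, and the sample rows are hypothetical; only the `NOT IN` filter mirrors the worker SQL. No uniqueness constraint is declared on purpose: Snowflake documents `PRIMARY KEY` on standard tables as informational rather than enforced, so the filter is what prevents duplicates.

```python
import sqlite3

# Same shape as the worker's statement: insert only ids not yet in the target.
INSERT_NEW = """
INSERT INTO tv_shows (id, name)
SELECT id, name
FROM staged_shows
WHERE id NOT IN (SELECT id FROM tv_shows)
"""


def count_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT count(*) FROM tv_shows").fetchone()[0]


def run_ingest(conn: sqlite3.Connection) -> int:
    """Run one ingestion pass and return how many rows it added."""
    before = count_rows(conn)
    conn.execute(INSERT_NEW)
    conn.commit()
    return count_rows(conn) - before


conn = sqlite3.connect(":memory:")
# No PRIMARY KEY or UNIQUE here: only the NOT IN filter guards against duplicates.
conn.execute("CREATE TABLE tv_shows (id INTEGER NOT NULL, name TEXT)")
conn.execute("CREATE TABLE staged_shows (id INTEGER NOT NULL, name TEXT)")
conn.executemany(
    "INSERT INTO staged_shows VALUES (?, ?)",
    [(1, "Show A"), (2, "Show B")],
)

first = run_ingest(conn)   # both staged rows are new
second = run_ingest(conn)  # nothing new, so nothing is inserted
conn.execute("INSERT INTO staged_shows VALUES (3, 'Show C')")
third = run_ingest(conn)   # only the newly staged row is added
total = count_rows(conn)
print(first, second, third, total)
```

Because `id` is declared `NOT NULL`, the `NOT IN` subquery never returns a `NULL`, which would otherwise make the filter reject every row.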
122+
123+
Top-rated shows:
124+
125+
```sql
126+
SELECT "NAME" AS show_name, "STATUS" AS status, "LANGUAGE" AS language, "RATING_AVERAGE" AS rating_average
127+
FROM tv_shows
128+
WHERE "RATING_AVERAGE" IS NOT NULL
129+
ORDER BY "RATING_AVERAGE" DESC
130+
LIMIT 10;
131+
```
132+
133+
```
134+
+----------------------+---------+----------+----------------+
135+
| show_name | status | language | rating_average |
136+
| varchar | varchar | varchar | float64 |
137+
+----------------------+---------+----------+----------------+
138+
| Breaking Bad | Ended | English | 9.2 |
139+
| Game of Thrones | Ended | English | 8.9 |
140+
| Firefly | Ended | English | 8.9 |
141+
| The Wire | Ended | English | 8.9 |
142+
| Stargate Atlantis | Ended | English | 8.8 |
143+
| Death Note | Ended | Japanese | 8.8 |
144+
| Stargate SG-1 | Ended | English | 8.8 |
145+
| Rick and Morty | Running | English | 8.8 |
146+
| Person of Interest | Ended | English | 8.8 |
147+
| Battlestar Galactica | Ended | English | 8.7 |
148+
+----------------------+---------+----------+----------------+
149+
150+
Time: 0.678913675 seconds. 10 rows.
151+
```
152+
153+
## Learn More
154+
155+
- [Snowflake Data Connector](https://spiceai.org/docs/components/data-connectors/snowflake)
156+
- [HTTP(s) Data Connector](https://docs.spiceai.org/components/data-connectors/https)
157+
- [Spice Workers](https://docs.spiceai.org/components/workers)
158+
- [TVMaze API](https://www.tvmaze.com/api)

snowflake/dml/spicepod.yaml

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
1+
version: v1
2+
kind: Spicepod
3+
name: snowflake-dml
4+
5+
datasets:
6+
  # Source: TVMaze public HTTP API — paginated list of shows
7+
  - from: https://api.tvmaze.com/shows
8+
    name: tvmaze_shows_raw
9+
    params:
10+
      file_format: json
11+
      client_timeout: 30s
12+
      pagination: enabled
13+
      pagination_query_params: "page={page}"
14+
      pagination_page_size: "250"
15+
      pagination_max_pages: "2" # ~500 shows; adjust as needed
16+
17+
  # Destination: writable Snowflake table
18+
  - from: snowflake:SPICE_DEMO.PUBLIC."TV_SHOWS"
19+
    name: tv_shows
20+
    access: read_write
21+
    params:
22+
      snowflake_role: accountadmin
23+
      snowflake_warehouse: COMPUTE_WH
24+
      snowflake_username: ${secrets:SPICE_SNOWFLAKE_USERNAME}
25+
      snowflake_account: ${secrets:SPICE_SNOWFLAKE_ACCOUNT}
26+
      snowflake_password: ${secrets:SPICE_SNOWFLAKE_PASSWORD}
27+
28+
workers:
29+
  - name: ingest_tvmaze_shows
30+
    description: "Periodic pipeline: fetch new TVMaze shows from HTTP API and insert into Snowflake"
31+
    cron: "* * * * *"
32+
    sql: |
33+
      INSERT INTO tv_shows ("ID", "NAME", "TYPE", "LANGUAGE", "STATUS", "RUNTIME", "PREMIERED", "ENDED", "RATING_AVERAGE")
34+
      SELECT
35+
        json_get_int(content, 'id') AS "ID",
36+
        json_get_str(content, 'name') AS "NAME",
37+
        json_get_str(content, 'type') AS "TYPE",
38+
        json_get_str(content, 'language') AS "LANGUAGE",
39+
        json_get_str(content, 'status') AS "STATUS",
40+
        json_get_int(content, 'runtime') AS "RUNTIME",
41+
        CAST(json_get_str(content, 'premiered') AS DATE) AS "PREMIERED",
42+
        CAST(json_get_str(content, 'ended') AS DATE) AS "ENDED",
43+
        json_get_float(json_get(content, 'rating'), 'average') AS "RATING_AVERAGE"
44+
      FROM tvmaze_shows_raw
45+
      WHERE json_get_int(content, 'id') NOT IN (SELECT "ID" FROM tv_shows);
