
Commit 0d713dc

Promote dynamic-tables-guidance with v1.1.1 audit pass
Re-staged with v1.1.1 holistic prompt that adds stopping-point markers, correct INSTRUCTIONS.md sub-flow cross-refs, and drops invalid tool snowflake_object_search.
1 parent 49dd772 commit 0d713dc

5 files changed

Lines changed: 675 additions & 0 deletions


Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
Snowflake Skills License
2+
3+
© 2026 Snowflake Inc. All rights reserved.
4+
5+
LICENSE: Use of these materials (including all code, prompts, assets, files, and other components of these skills (collectively, “Skills”)) is governed by your agreement with Snowflake for the Service. If no separate agreement exists, use is governed by Snowflake’s Terms of Service (available at: https://www.snowflake.com/en/legal/terms-of-service/).
6+
7+
Your applicable agreement is referred to as the "Agreement." "Service" is as defined in the Agreement.
8+
9+
ADDITIONAL RESTRICTIONS: Notwithstanding anything in the Agreement to the contrary, you may not:
10+
11+
* Extract from the Service or retain copies of the Skills outside use with the Service;
12+
* Reproduce or copy the Skills, except for temporary copies created automatically during authorized use of the Service;
13+
* Create derivative works based on the Skills;
14+
* Distribute, sublicense, or transfer the Skills to any third party;
15+
* Make, offer to sell, sell, or import any inventions embodied in the Skills; nor,
16+
* Reverse engineer, decompile, or disassemble the Skills.
17+
18+
The receipt, viewing, or possession of the Skills does not convey or imply any license or right beyond those expressly granted above.
19+
20+
Snowflake retains all rights, title, and interest in the Skills, including all copyrights, trademarks, patents, and all other applicable intellectual property rights.
21+
22+
THE SKILLS ARE PROVIDED “AS IS,” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SKILLS OR THE USE OR OTHER DEALINGS IN THE SKILLS.
Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
1+
---
2+
name: dynamic-tables-guidance
3+
title: Dynamic Tables Guidance
4+
summary: Decide when to use Dynamic Tables vs MVs, streams+tasks, or dbt, and design production-ready DT pipelines.
5+
description: "Use when choosing between Dynamic Tables and alternatives (materialized views, streams+tasks, dbt), designing multi-layer DT pipelines, debugging FULL-refresh fallback, or hardening DTs for production. Covers comparison matrices, decision flowcharts, common pitfalls, monitoring queries, and hybrid DT+task patterns. Triggers: dynamic tables guidance, when to use DT, DT vs MV, DT vs streams tasks, DT vs dbt, DT pitfalls, DT best practices, DT pipeline design, target lag, downstream lag, full refresh fallback."
6+
tools:
7+
- snowflake_sql_execute
8+
- Bash
9+
- Read
10+
- Write
11+
- Edit
12+
- Glob
13+
- Grep
14+
prompt: Should I use Dynamic Tables or streams+tasks for my CDC pipeline?
15+
language: en
16+
status: Published
17+
author: Snowflake Solutions Team
18+
type: snowflake
19+
---
20+
21+
# Dynamic Tables Guidance
22+
23+
## Overview
24+
25+
Dynamic Tables (DTs) are declarative, auto-refreshing materialized queries. You write a `SELECT`, set a `TARGET_LAG`, and Snowflake schedules refreshes so the results stay within that lag of the base tables. This skill helps you decide when DTs are the right tool, design multi-layer pipelines, and avoid the failure modes that commonly surface in production.
26+
27+
Use this skill when picking between DTs, materialized views, streams+tasks, or dbt — or when an existing DT pipeline is misbehaving (full-refresh fallback, lag drift, runaway cost).
28+
29+
## Quick Decision Flowchart
30+
31+
```
32+
Need to transform data in Snowflake?
33+
├─ Single table, accelerate queries? → Materialized View
34+
├─ Multi-step SQL pipeline, fresh data? → Dynamic Tables
35+
├─ Stream-static joins / append-only? → Custom Incremental DTs (PrPr)
36+
├─ Cross-warehouse portability or dbt tests? → dbt models
37+
├─ Procedural logic, IF/ELSE, API calls? → Streams + Tasks
38+
└─ Sub-15-second latency? → Streams + Tasks
39+
```
40+
41+
## Comparison Matrix
42+
43+
| Dimension | Dynamic Tables | Materialized Views | Streams + Tasks | dbt |
44+
|---|---|---|---|---|
45+
| Refresh | Target lag (15s+) | Auto, near-real-time | Manual schedule/trigger | Batch (`dbt run`) |
46+
| SQL support | Full SELECT, JOINs, windows | Single table only | Full + procedural | Full + Jinja |
47+
| Chaining | `TARGET_LAG = DOWNSTREAM` | No | Manual DAG | Ref graph |
48+
| Incremental | Built-in for supported ops | Auto | You write it | Manual `is_incremental` |
49+
| Side effects | None | None | Email, API, externals | None |
50+
| Cost | Your warehouse | Serverless | Your warehouse | Your warehouse |
51+
52+
**Rule of thumb:** Start with DTs. Reach for streams+tasks only when you need procedural logic, side effects, or sub-15s latency. Use dbt when you need its testing framework or cross-warehouse portability.
53+
54+
## Pipeline Pattern: Bronze → Silver → Gold
55+
56+
```sql
57+
-- Bronze: parse raw VARIANT
58+
CREATE DYNAMIC TABLE bronze_events
59+
TARGET_LAG = DOWNSTREAM
60+
WAREHOUSE = pipeline_wh
61+
AS SELECT
62+
record_content:event_id::STRING AS event_id,
63+
record_content:event_type::STRING AS event_type,
64+
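record_content:timestamp::TIMESTAMP_NTZ AS event_ts, record_content:payload AS payload -- payload is carried through for the silver join; assumes the raw record has a payload object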
65+
FROM raw_events_topic;
66+
67+
-- Silver: business logic + joins
68+
CREATE DYNAMIC TABLE silver_purchases
69+
TARGET_LAG = DOWNSTREAM
70+
WAREHOUSE = pipeline_wh
71+
AS SELECT e.event_id, e.event_ts, p.product_name, p.category
72+
FROM bronze_events e
73+
JOIN products p ON e.payload:product_id::STRING = p.product_id
74+
WHERE e.event_type = 'purchase';
75+
76+
-- Gold: only the leaf has a time-based lag
77+
CREATE DYNAMIC TABLE gold_hourly_sales
78+
TARGET_LAG = '5 minutes'
79+
WAREHOUSE = pipeline_wh
80+
AS SELECT DATE_TRUNC('hour', event_ts) AS sales_hour, category,
81+
COUNT(*) AS order_count
82+
FROM silver_purchases
83+
GROUP BY 1, 2;
84+
```
85+
86+
**Key rule:** Only the leaf DT has a time-based `TARGET_LAG`. Intermediates use `DOWNSTREAM` so Snowflake derives their lag from the leaf.
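
As a hedged sketch of how this rule can be applied to tables that already exist, the statements below move an intermediate table to `DOWNSTREAM` and keep the time-based lag on the leaf only. They reuse the table names from the pipeline above; the `'10 minutes'` value is purely illustrative and not a recommendation.

```sql
-- Intermediate layer: let Snowflake derive the lag from the tables that read it.
ALTER DYNAMIC TABLE silver_purchases SET TARGET_LAG = DOWNSTREAM;

-- Leaf layer: the only place where a time-based lag is set.
ALTER DYNAMIC TABLE gold_hourly_sales SET TARGET_LAG = '10 minutes';

-- Optional: request a refresh now instead of waiting for the scheduler.
ALTER DYNAMIC TABLE gold_hourly_sales REFRESH;
```

Changing the lag with `ALTER` avoids a `CREATE OR REPLACE`, so the table's existing data is kept.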
87+
88+
## Monitoring Essentials
89+
90+
```sql
91+
-- Health check
92+
SELECT name, scheduling_state, last_completed_refresh_state,
93+
refresh_mode, time_within_target_lag_ratio
94+
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLES())
95+
ORDER BY time_within_target_lag_ratio ASC;
96+
97+
-- Recent refresh history
98+
SELECT name, state, refresh_action,
99+
DATEDIFF('second', refresh_start_time, refresh_end_time) AS duration_sec
100+
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(
101+
NAME_PREFIX => '<db>.<schema>'))
102+
ORDER BY refresh_start_time DESC LIMIT 20;
103+
104+
-- Errors only
105+
SELECT name, state, state_message
106+
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(
107+
NAME_PREFIX => '<db>.<schema>', ERROR_ONLY => TRUE));
108+
```
109+
110+
For account-wide DT cost, query `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` filtered to refresh queries.
111+
112+
## Common Mistakes
113+
114+
- **`SELECT *` in DT definition** — breaks incremental refresh on schema changes. Always use explicit column lists.
115+
- **Time-based lag on every layer** — only the leaf should have a time-based lag. Use `TARGET_LAG = DOWNSTREAM` on intermediates.
116+
- **Change tracking off on base tables** — DTs require `CHANGE_TRACKING = TRUE` for incremental refresh. Check with `SHOW TABLES`.
117+
- **Falling back to FULL refresh silently** — check `refresh_mode_reason` if `refresh_mode` shows `FULL` when you expected `INCREMENTAL`.
118+
- **Missing PRIMARY KEY RELY** — without it, `INSERT OVERWRITE` reprocesses everything downstream and you lose incremental chains.
119+
- **DISTINCT/UNION fanout** — these operators force full refresh. Refactor with `QUALIFY ROW_NUMBER()` or `UNION ALL` where possible.
120+
- **Sharing one warehouse with interactive queries** — DT refreshes will compete with user queries. Use a dedicated warehouse.
121+
- **No `INITIALIZATION_WAREHOUSE` for big initial loads** — first refresh on large DTs can OOM a small warehouse. Set a larger init WH, then unset.
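
Several of the checks above can be run directly. The following is a minimal sketch that assumes the `products`, `bronze_events`, and `silver_purchases` objects from the pipeline example; the deduplication key in step 3 (`event_id`) is an assumption about what makes a row unique.

```sql
-- 1. Is change tracking enabled on a base table? Inspect the change_tracking column.
SHOW TABLES LIKE 'products';

-- Enable it if that column shows OFF.
ALTER TABLE products SET CHANGE_TRACKING = TRUE;

-- 2. Which refresh mode did Snowflake choose, and why?
SHOW DYNAMIC TABLES LIKE 'silver_purchases';
SELECT "name", "refresh_mode", "refresh_mode_reason"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));

-- 3. Deduplicate with QUALIFY instead of DISTINCT (assumes event_id identifies a row).
SELECT event_id, event_type, event_ts
FROM bronze_events
QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY event_ts DESC) = 1;
```

The `RESULT_SCAN` query must run immediately after the `SHOW` command in the same session, because `LAST_QUERY_ID()` refers to the most recent statement.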
122+
123+
## Production Readiness Checklist
124+
125+
- [ ] Explicit column lists (no `SELECT *`)
126+
- [ ] `CHANGE_TRACKING = TRUE` on all base tables
127+
- [ ] Intermediates use `TARGET_LAG = DOWNSTREAM`
128+
- [ ] Leaf target lag ≥ all upstream lags
129+
- [ ] `refresh_mode` is `INCREMENTAL` (verify `refresh_mode_reason`)
130+
- [ ] Dedicated warehouse for DT refreshes
131+
- [ ] Monitoring that `time_within_target_lag_ratio` stays above 0.95
132+
- [ ] Alerting on refresh failures
133+
- [ ] `INITIALIZATION_WAREHOUSE` set for large initial loads
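
One way to back the lag-ratio item with a query is sketched below. It reuses the `INFORMATION_SCHEMA.DYNAMIC_TABLES()` columns from the health check above and treats 0.95 as the threshold named in the checklist; how the result is wired into alerting is left open.

```sql
-- List dynamic tables that met their target lag less than 95% of the time.
SELECT name, scheduling_state, last_completed_refresh_state,
       refresh_mode, time_within_target_lag_ratio
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLES())
WHERE time_within_target_lag_ratio < 0.95
ORDER BY time_within_target_lag_ratio ASC;
```

An empty result means every dynamic table visible in the current database is within the threshold.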
134+
135+
## Hybrid Pattern: DT + Task for Side Effects
136+
137+
```sql
138+
CREATE STREAM gold_metrics_stream ON DYNAMIC TABLE gold_metrics;
139+
140+
CREATE TASK notify_on_refresh
141+
WAREHOUSE = ops_wh
142+
WHEN SYSTEM$STREAM_HAS_DATA('gold_metrics_stream')
143+
AS
144+
BEGIN
145+
LET change_count INT := (SELECT COUNT(*) FROM gold_metrics_stream);
146+
CALL SYSTEM$SEND_EMAIL('my_email_int', 'team@co.com', 'Metrics Updated',
146+
:change_count || ' rows changed'); -- 'my_email_int' is a placeholder for your notification integration name
148+
CREATE OR REPLACE TEMP TABLE _consume AS SELECT * FROM gold_metrics_stream;
149+
END;
150+
```
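
Snowflake creates tasks in a suspended state, so the task above does nothing until it is resumed. A short sketch of the follow-up statements, using the task name from the example:

```sql
-- Activate the task so it can fire when the stream has data.
ALTER TASK notify_on_refresh RESUME;

-- Confirm the task state (look at the "state" column).
SHOW TASKS LIKE 'notify_on_refresh';

-- Pause notifications later without dropping the task.
ALTER TASK notify_on_refresh SUSPEND;
```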
151+
152+
## Workflow
153+
154+
1. **Assess fit** — run the decision flowchart. If DTs aren't the right tool, stop here.
155+
2. **Pick refresh mode**: `AUTO` (default), `INCREMENTAL` (force, fail if ineligible), `FULL`, or `CUSTOM_INCREMENTAL` (Private Preview, PrPr).
156+
3. **Design layers** — Bronze→Silver→Gold with `DOWNSTREAM` on intermediates.
157+
4. **Harden** — apply the production checklist.
158+
159+
⚠️ STOPPING POINT: Before running `CREATE OR REPLACE DYNAMIC TABLE` against existing pipelines, show the user the planned DDL and confirm. Replacing a DT triggers a full reload and may invalidate downstream incremental chains.
160+
161+
⚠️ STOPPING POINT: Before applying `ALTER DYNAMIC TABLE ... SUSPEND` or `DROP DYNAMIC TABLE`, confirm with the user — downstream DTs depending on the target will stop refreshing.
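
Before either stopping point, it can help to capture the current definition so the user can compare it with the planned DDL. The sketch below assumes the `gold_hourly_sales` table from the pipeline example; substitute the table being changed.

```sql
-- Capture the existing definition to show alongside the proposed DDL.
SELECT GET_DDL('DYNAMIC_TABLE', 'gold_hourly_sales');

-- Suspending is reversible, unlike DROP: refreshes stop until the table is resumed.
ALTER DYNAMIC TABLE gold_hourly_sales SUSPEND;
ALTER DYNAMIC TABLE gold_hourly_sales RESUME;
```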
162+
163+
## Stopping Points
164+
165+
- Workflow Step 4 — confirm DDL before `CREATE OR REPLACE DYNAMIC TABLE` on existing pipelines
166+
- Workflow Step 4 — confirm before `ALTER ... SUSPEND` or `DROP DYNAMIC TABLE`
167+
168+
## References
169+
170+
- `references/pitfalls-and-pks.md` — full pitfall deep-dive plus `PRIMARY KEY RELY`, `IMMUTABLE WHERE`, `BACKFILL FROM`
171+
- `references/custom-incremental.md` — Custom Incremental DTs (PrPr) syntax and patterns
172+
- `references/dcm-for-dts.md` — Database Change Management for git-native DT deployment
173+
- Built-in Cortex Code skill: `dynamic-tables`
Lines changed: 176 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,176 @@
1+
# Custom Incremental Dynamic Tables (Private Preview)
2+
3+
Custom incremental DTs let you define refresh logic using **imperative DML** (MERGE or INSERT INTO) instead of a declarative SELECT. This unlocks patterns that standard DTs can't express efficiently.
4+
5+
**When to use:** Standard DTs should always be your first choice. Use custom incremental only when:
6+
- You need **stream-static joins** (fact stream + dimension snapshot)
7+
- You need **append-only pipelines** (only process inserts, ignore updates/deletes)
8+
- You need **user-defined semantics** (audit deletes, soft-delete, running aggregates)
9+
10+
## Syntax
11+
12+
```sql
13+
CREATE OR REPLACE DYNAMIC TABLE my_dt (
14+
col1 TYPE, col2 TYPE -- explicit columns required
15+
)
16+
TARGET_LAG = '5 minutes'
17+
WAREHOUSE = my_wh
18+
REFRESH_MODE = CUSTOM_INCREMENTAL
19+
[ BACKFILL FROM existing_table ]
20+
REFRESH USING (
21+
-- MERGE INTO SELF or INSERT INTO SELF
22+
);
23+
```
24+
25+
Key concepts:
26+
- `SELF` references the DT being created (you cannot use the DT's name)
27+
- `CHANGES(INFORMATION => { DEFAULT | APPEND_ONLY })` consumes changes since last refresh
28+
- Tables outside `CHANGES()` are read as static snapshots at refresh time
29+
- Explicit column schema is required (no `AS SELECT` inference)
30+
31+
## Pattern: Stream-Static Join (Append-Only)
32+
33+
Enrich new events with current dimension data. Only new events are processed — dimension changes don't trigger reprocessing.
34+
35+
```sql
36+
CREATE OR REPLACE DYNAMIC TABLE enriched_clicks (
37+
click_id INT, user_id INT, page_title STRING,
38+
section STRING, click_ts TIMESTAMP
39+
)
40+
TARGET_LAG = DOWNSTREAM
41+
WAREHOUSE = my_wh
42+
REFRESH USING (
43+
INSERT INTO SELF
44+
SELECT c.click_id, c.user_id, p.page_title, p.section, c.click_ts
45+
FROM clicks CHANGES(INFORMATION => APPEND_ONLY) AS c
46+
LEFT OUTER JOIN pages AS p ON c.page_id = p.page_id
47+
);
48+
```
49+
50+
## Pattern: Stream-Static Join (MERGE with Updates/Deletes)
51+
52+
When the fact table has updates and deletes, use MERGE with `ROW_NUMBER()` dedup:
53+
54+
```sql
55+
CREATE OR REPLACE DYNAMIC TABLE enriched_inventory (
56+
sku_id INT, product_name STRING, category STRING,
57+
warehouse_name STRING, region STRING, qty_on_hand INT
58+
)
59+
TARGET_LAG = DOWNSTREAM
60+
WAREHOUSE = my_wh
61+
REFRESH USING (
62+
MERGE INTO SELF AS tgt
63+
USING (
64+
SELECT sku_id, product_name, category, warehouse_name, region,
65+
qty_on_hand, action
66+
FROM (
67+
SELECT s.sku_id, p.product_name, p.category,
68+
w.warehouse_name, w.region, s.qty_on_hand,
69+
s.METADATA$ACTION AS action,
70+
ROW_NUMBER() OVER (
71+
PARTITION BY s.sku_id
72+
ORDER BY CASE s.METADATA$ACTION WHEN 'INSERT' THEN 0 ELSE 1 END
73+
) AS rn
74+
FROM stock CHANGES(INFORMATION => DEFAULT) AS s
75+
LEFT OUTER JOIN products AS p ON s.product_id = p.product_id
76+
LEFT OUTER JOIN warehouses AS w ON s.warehouse_id = w.warehouse_id
77+
)
78+
WHERE rn = 1
79+
) AS src
80+
ON tgt.sku_id = src.sku_id
81+
WHEN MATCHED AND src.action = 'DELETE' THEN DELETE
82+
WHEN MATCHED AND src.action = 'INSERT' THEN
83+
UPDATE SET tgt.product_name = src.product_name,
84+
tgt.category = src.category,
85+
tgt.warehouse_name = src.warehouse_name,
86+
tgt.region = src.region,
87+
tgt.qty_on_hand = src.qty_on_hand
88+
WHEN NOT MATCHED AND src.action = 'INSERT' THEN
89+
INSERT (sku_id, product_name, category, warehouse_name, region, qty_on_hand)
90+
VALUES (src.sku_id, src.product_name, src.category, src.warehouse_name,
91+
src.region, src.qty_on_hand)
92+
);
93+
```
94+
95+
## Example: Stream-Static Join End-to-End
96+
97+
A complete walkthrough showing how a stream-static join works in practice. Scenario: an IoT pipeline where sensor readings (high-volume, append-only) are enriched with device metadata (low-volume, rarely changes).
98+
99+
```sql
100+
-- 1. Setup: fact table (append-only sensor readings) + dimension table (device registry)
101+
CREATE TABLE sensor_readings (
102+
reading_id INT AUTOINCREMENT,
103+
device_id INT,
104+
temperature FLOAT,
105+
humidity FLOAT,
106+
reading_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
107+
);
108+
ALTER TABLE sensor_readings SET CHANGE_TRACKING = TRUE;
109+
110+
CREATE TABLE devices (
111+
device_id INT PRIMARY KEY,
112+
device_name STRING,
113+
location STRING,
114+
floor INT
115+
);
116+
117+
-- 2. Custom incremental DT: enrich readings with device info
118+
-- - sensor_readings is the STREAM side (CHANGES(INFORMATION => APPEND_ONLY))
119+
-- - devices is the STATIC side (read in full at each refresh, changes ignored)
120+
CREATE OR REPLACE DYNAMIC TABLE enriched_readings (
121+
reading_id INT,
122+
device_id INT,
123+
device_name STRING,
124+
location STRING,
125+
floor INT,
126+
temperature FLOAT,
127+
humidity FLOAT,
128+
reading_ts TIMESTAMP
129+
)
130+
TARGET_LAG = '1 minute'
131+
WAREHOUSE = iot_wh
132+
REFRESH USING (
133+
INSERT INTO SELF
134+
SELECT
135+
r.reading_id, r.device_id,
136+
d.device_name, d.location, d.floor,
137+
r.temperature, r.humidity, r.reading_ts
138+
FROM sensor_readings CHANGES(INFORMATION => APPEND_ONLY) AS r
139+
LEFT OUTER JOIN devices AS d ON r.device_id = d.device_id
140+
);
141+
```
142+
143+
**What happens at each refresh:**
144+
1. `CHANGES(INFORMATION => APPEND_ONLY)` returns only the sensor readings added since the last refresh
145+
2. Each new reading is joined to the **current** device metadata (static snapshot)
146+
3. Results are appended to the DT — previously enriched rows are never touched
147+
4. If a device name changes in `devices`, old readings keep the old name — only new readings pick up the update
148+
149+
**Why this matters:** A standard DT would reprocess ALL readings whenever a device name changes (since it depends on `devices`). The custom incremental version only processes new readings, making it orders of magnitude cheaper for high-volume fact tables with slowly-changing dimensions.
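
The refresh behavior described above can be exercised with a few statements. This is a sketch only: the sample values are made up, and it assumes that `ALTER DYNAMIC TABLE ... REFRESH` works for custom incremental DTs the same way it does for standard ones, which the preview may not guarantee.

```sql
-- Seed one device and one reading, then refresh.
INSERT INTO devices (device_id, device_name, location, floor)
VALUES (1, 'sensor-a', 'lab', 2);
INSERT INTO sensor_readings (device_id, temperature, humidity)
VALUES (1, 21.5, 40.0);
ALTER DYNAMIC TABLE enriched_readings REFRESH;  -- assumption: manual refresh is supported

-- Rename the device, add a second reading, and refresh again.
UPDATE devices SET device_name = 'sensor-a-renamed' WHERE device_id = 1;
INSERT INTO sensor_readings (device_id, temperature, humidity)
VALUES (1, 22.0, 41.0);
ALTER DYNAMIC TABLE enriched_readings REFRESH;

-- Inspect which device name each enriched row carries.
SELECT reading_id, device_name, reading_ts
FROM enriched_readings
ORDER BY reading_id;
```

If the steps above hold, the first reading keeps the original device name and only the second reading shows the renamed device.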
150+
151+
---
152+
153+
## Pattern: Audit Deletes Log
154+
155+
Append-only log of every deletion from a source table:
156+
157+
```sql
158+
CREATE OR REPLACE DYNAMIC TABLE deletions_log (id INT, name STRING, email STRING)
159+
TARGET_LAG = DOWNSTREAM
160+
WAREHOUSE = my_wh
161+
INITIALIZE = ON_SCHEDULE
162+
REFRESH USING (
163+
INSERT INTO SELF
164+
SELECT id, name, email -- explicit columns matching the DT schema; avoids pulling in METADATA$ columns
165+
FROM users CHANGES(INFORMATION => DEFAULT)
166+
WHERE NOT METADATA$ISUPDATE AND METADATA$ACTION = 'DELETE'
167+
);
168+
```
169+
170+
## Limitations (PrPr)
171+
172+
- No cloning or replication
173+
- No DCM/dbt integration yet
174+
- No data governance policies on custom incremental DTs
175+
- No CREATE OR ALTER — must use CREATE OR REPLACE
176+
- Correctness is the user's responsibility: Snowflake does not enforce the delayed-view semantics that standard DTs provide
