Stage dynamic-tables-guidance from Snowflake-Solutions

jdanielmyers · jdanielmyers · commit 622bc38e5b5d · 2026-05-20T15:10:06.000-05:00
diff --git a/skills/dynamic-tables-guidance/LICENSE b/skills/dynamic-tables-guidance/LICENSE
@@ -0,0 +1,22 @@
+Snowflake Skills License 
+
+© 2026 Snowflake Inc. All rights reserved.
+
+LICENSE: Use of these materials (including all code, prompts, assets, files, and other components of these skills (collectively, “Skills”)) is governed by your agreement with Snowflake for the Service. If no separate agreement exists, use is governed by Snowflake’s Terms of Service (available at: https://www.snowflake.com/en/legal/terms-of-service/). 
+
+Your applicable agreement is referred to as the "Agreement." "Service" is as defined in the Agreement.
+
+ADDITIONAL RESTRICTIONS: Notwithstanding anything in the Agreement to the contrary, you may not:
+
+* Extract from the Service or retain copies of the Skills outside use with the Service;
+* Reproduce or copy the Skills, except for temporary copies created automatically during authorized use of the Service;
+* Create derivative works based on the Skills; 
+* Distribute, sublicense, or transfer the Skills to any third party;
+* Make, offer to sell, sell, or import any inventions embodied in the Skills; nor, 
+* Reverse engineer, decompile, or disassemble the Skills. 
+
+The receipt, viewing, or possession of the Skills does not convey or imply any license or right beyond those expressly granted above.
+
+Snowflake retains all rights, title, and interest in the Skills, including all copyrights, trademarks, patents, and all other applicable intellectual property rights.
+
+THE SKILLS ARE PROVIDED “AS IS,” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SKILLS OR THE USE OR OTHER DEALINGS IN THE SKILLS.
diff --git a/skills/dynamic-tables-guidance/SKILL.md b/skills/dynamic-tables-guidance/SKILL.md
@@ -0,0 +1,117 @@
+---
+name: dynamic-tables-guidance
+title: Design Dynamic Tables
+summary: Decide when Dynamic Tables fit, design the pipeline, and ship it production-ready without the usual full-refresh traps.
+description: "Use when you need to decide between Dynamic Tables, materialized views, streams+tasks, or dbt for a Snowflake pipeline, design a multi-layer DT DAG, debug a DT that fell back to FULL refresh, or harden a DT pipeline for production. Triggers: dynamic tables, DT design, DT vs MV, DT vs streams tasks, DT vs dbt, DT pitfalls, DT best practices, target lag, downstream lag, refresh mode, INCREMENTAL, FULL refresh, IMMUTABLE WHERE, BACKFILL FROM, primary key RELY, DT monitoring, DT pipeline."
+tools:
+  - snowflake_sql_execute
+  - snowflake_object_search
+  - Read
+  - Write
+  - Edit
+  - Grep
+prompt: Help me design a Dynamic Tables pipeline for my bronze/silver/gold workflow and avoid the full-refresh trap.
+language: en
+status: Published
+author: Snowflake Solutions Team
+type: snowflake
+---
+
+# Design Dynamic Tables
+
+## Overview
+
+Dynamic Tables (DTs) are declarative, auto-refreshing materialized queries. You write a `SELECT`, set a `TARGET_LAG`, and Snowflake keeps the results fresh. With the default `REFRESH_MODE = AUTO`, Snowflake chooses INCREMENTAL or FULL refresh when the DT is created. This skill helps you choose DTs over the alternatives, design a clean DAG, and ship it without the common gotchas.
+
+## Quick Decision
+
+| Need | Use |
+|------|-----|
+| Single-table query acceleration | Materialized View |
+| Multi-step SQL pipeline, continuous freshness | **Dynamic Tables** |
+| Stream-static joins, append-only patterns | Custom Incremental DTs (PrPr) |
+| Cross-warehouse portability, `dbt test` | dbt models |
+| Procedural logic, IF/ELSE, API calls, notifications | Streams + Tasks |
+| Sub-15-second latency | Streams + Tasks |
+
+DTs win when the work is pure SQL transforms inside Snowflake and you want self-orchestration. Reach for streams+tasks only when you hit procedural logic, side effects, or sub-15s latency.
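+
+The table can be read as a routing function. The sketch below is a hypothetical helper (`PipelineNeeds`, `choose_engine`, and the flag names are illustrative, not a Snowflake API). It checks the streams+tasks blockers first, as the paragraph above does; the order among the remaining rows is an assumption.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class PipelineNeeds:
+    # Hypothetical flags mirroring the rows of the Quick Decision table
+    procedural_logic: bool = False            # IF/ELSE, API calls, notifications
+    min_latency_seconds: float = 60.0         # tightest freshness the consumer needs
+    single_table_acceleration: bool = False   # one-table query speedup only
+    needs_dbt_portability: bool = False       # cross-warehouse portability, dbt test
+    stream_static_or_append_only: bool = False
+
+
+def choose_engine(needs: PipelineNeeds) -> str:
+    """Route a pipeline using the Quick Decision table, blockers first."""
+    # Procedural logic or sub-15s latency rule out DTs outright
+    if needs.procedural_logic or needs.min_latency_seconds < 15:
+        return "Streams + Tasks"
+    if needs.needs_dbt_portability:
+        return "dbt models"
+    if needs.single_table_acceleration:
+        return "Materialized View"
+    if needs.stream_static_or_append_only:
+        return "Custom Incremental DTs (PrPr)"
+    # Default: multi-step SQL pipeline with continuous freshness
+    return "Dynamic Tables"
+
+
+print(choose_engine(PipelineNeeds()))                       # Dynamic Tables
+print(choose_engine(PipelineNeeds(min_latency_seconds=5)))  # Streams + Tasks
+```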
+
+## Pipeline Pattern: Bronze → Silver → Gold
+
+```sql
+-- Intermediate layers: TARGET_LAG = DOWNSTREAM
+CREATE DYNAMIC TABLE bronze_events
+  TARGET_LAG = DOWNSTREAM
+  WAREHOUSE = pipeline_wh
+  AS
+    SELECT record_content:event_id::STRING AS event_id,
+           record_content:event_type::STRING AS event_type,
+           record_content:timestamp::TIMESTAMP_NTZ AS event_ts
+    FROM raw_events;
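+
+-- Silver layer: filter and clean; still DOWNSTREAM
+-- ('purchase' is an illustrative event_type value)
+CREATE DYNAMIC TABLE silver_purchases
+  TARGET_LAG = DOWNSTREAM
+  WAREHOUSE = pipeline_wh
+  AS
+    SELECT event_id, event_ts
+    FROM bronze_events
+    WHERE event_type = 'purchase';
+
+-- Leaf (gold) layer: the only DT with a time-based lag
+CREATE DYNAMIC TABLE gold_hourly_sales
+  TARGET_LAG = '5 minutes'
+  WAREHOUSE = pipeline_wh
+  AS
+    SELECT DATE_TRUNC('hour', event_ts) AS sales_hour,
+           COUNT(*) AS order_count
+    FROM silver_purchases
+    GROUP BY 1;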
+```
+
+Rule: only the leaf DT gets a time-based `TARGET_LAG`; everything upstream uses `DOWNSTREAM`. Use a dedicated warehouse to isolate refresh cost from interactive queries.
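+
+The lag rule is easy to check mechanically before you deploy. A minimal sketch, assuming you describe the DAG yourself as a Python dict mapping each DT name to its lag (seconds, or `DOWNSTREAM`) and its upstream names; `validate_lags` and the dict shape are hypothetical, and the example is a simplified two-DT chain.
+
+```python
+DOWNSTREAM = "DOWNSTREAM"
+
+
+def validate_lags(dag):
+    """Check the lag rule on {dt_name: {"lag": DOWNSTREAM or seconds, "upstream": [names]}}.
+    Upstream names that are not keys (base tables) are ignored. Returns problem strings."""
+    consumers = {name: [] for name in dag}
+    for name, spec in dag.items():
+        for up in spec["upstream"]:
+            if up in dag:
+                consumers[up].append(name)
+
+    def upstream_time_lags(name, seen):
+        # Collect numeric lags of every DT upstream of `name`, transitively
+        lags = []
+        for up in dag[name]["upstream"]:
+            if up in dag and up not in seen:
+                seen.add(up)
+                if dag[up]["lag"] != DOWNSTREAM:
+                    lags.append(dag[up]["lag"])
+                lags.extend(upstream_time_lags(up, seen))
+        return lags
+
+    problems = []
+    for name, spec in dag.items():
+        lag, is_leaf = spec["lag"], not consumers[name]
+        if is_leaf and lag == DOWNSTREAM:
+            problems.append(f"{name}: leaf DT needs a time-based TARGET_LAG")
+        if not is_leaf and lag != DOWNSTREAM:
+            problems.append(f"{name}: intermediate DT should use DOWNSTREAM")
+        if lag != DOWNSTREAM:
+            upstream = upstream_time_lags(name, set())
+            if upstream and lag < max(upstream):
+                problems.append(f"{name}: {lag}s is tighter than upstream {max(upstream)}s")
+    return problems
+
+
+good = {
+    "bronze_events": {"lag": DOWNSTREAM, "upstream": ["raw_events"]},
+    "gold_hourly_sales": {"lag": 300, "upstream": ["bronze_events"]},
+}
+bad = {
+    "bronze_events": {"lag": 600, "upstream": ["raw_events"]},
+    "gold_hourly_sales": {"lag": 300, "upstream": ["bronze_events"]},
+}
+print(validate_lags(good))  # []
+print(validate_lags(bad))
+```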
+
+## Monitoring
+
+```sql
+SELECT name, scheduling_state, last_completed_refresh_state,
+       refresh_mode, time_within_target_lag_ratio
+FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLES())
+ORDER BY time_within_target_lag_ratio ASC;
+
+SELECT name, state, state_message, refresh_action
+FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(
+  NAME_PREFIX => '<db>.<schema>', ERROR_ONLY => TRUE
+))
+ORDER BY refresh_start_time DESC LIMIT 10;
+```
+
+Alert when `time_within_target_lag_ratio < 0.95` or refresh failures appear in `SNOWFLAKE.ACCOUNT_USAGE.DYNAMIC_TABLE_REFRESH_HISTORY`.
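+
+The alert condition can be evaluated on rows already fetched with the queries above. A sketch under assumptions: rows arrive as dicts keyed by lowercase column names, and `FAILED` and `UPSTREAM_FAILED` are the failure states worth paging on (verify the state values in the refresh-history documentation). Fetching the rows is left out, and `dts_to_alert` is a hypothetical helper.
+
+```python
+LAG_RATIO_THRESHOLD = 0.95
+# Assumed failure states; confirm against the refresh-history docs for your account
+FAILED_STATES = {"FAILED", "UPSTREAM_FAILED"}
+
+
+def dts_to_alert(dt_rows, refresh_rows):
+    """dt_rows: dicts with DYNAMIC_TABLES() columns (name, time_within_target_lag_ratio).
+    refresh_rows: dicts with refresh-history columns (name, state).
+    Returns the sorted names that should page someone."""
+    flagged = set()
+    for row in dt_rows:
+        ratio = row.get("time_within_target_lag_ratio")
+        if ratio is not None and ratio < LAG_RATIO_THRESHOLD:
+            flagged.add(row["name"])
+    for row in refresh_rows:
+        if row.get("state") in FAILED_STATES:
+            flagged.add(row["name"])
+    return sorted(flagged)
+
+
+dt_rows = [
+    {"name": "GOLD_HOURLY_SALES", "time_within_target_lag_ratio": 0.90},
+    {"name": "BRONZE_EVENTS", "time_within_target_lag_ratio": 1.0},
+]
+refresh_rows = [{"name": "BRONZE_EVENTS", "state": "SUCCEEDED"}]
+print(dts_to_alert(dt_rows, refresh_rows))  # ['GOLD_HOURLY_SALES']
+```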
+
+## Production Checklist
+
+- Explicit column lists (no `SELECT *`; upstream column additions break incremental refresh)
+- Change tracking enabled on base tables
+- Intermediates use `TARGET_LAG = DOWNSTREAM`; leaf lag ≥ all upstream lags
+- `refresh_mode = INCREMENTAL` confirmed in `SHOW DYNAMIC TABLES` output (check `refresh_mode_reason` if FULL)
+- Dedicated refresh warehouse; `INITIALIZATION_WAREHOUSE` for big first loads
+- `IMMUTABLE WHERE` on partitions that never change (compliance, cost)
+- `PRIMARY KEY ... RELY` set so downstream DTs stay incremental
+- Failure alerting wired up
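+
+Two checklist items can be audited automatically. This sketch assumes each DT is described by a dict whose `refresh_mode` and `refresh_mode_reason` come from `SHOW DYNAMIC TABLES` output and whose `definition` is the DT's SQL text; `audit_dts` is hypothetical, and the regex is deliberately naive (it flags `SELECT * EXCLUDE` too, and misses `SELECT DISTINCT *`).
+
+```python
+import re
+
+# Naive check: flags `SELECT *` (including `SELECT * EXCLUDE ...`) in a definition
+SELECT_STAR = re.compile(r"\bSELECT\s+\*", re.IGNORECASE)
+
+
+def audit_dts(dts):
+    """dts: dicts with name, definition, refresh_mode, refresh_mode_reason.
+    Returns (name, finding) pairs for two checklist items."""
+    findings = []
+    for dt in dts:
+        if SELECT_STAR.search(dt["definition"]):
+            findings.append((dt["name"], "uses SELECT *; list columns explicitly"))
+        if dt["refresh_mode"] != "INCREMENTAL":
+            reason = dt.get("refresh_mode_reason") or "no reason reported"
+            findings.append((dt["name"], f"refresh_mode={dt['refresh_mode']}: {reason}"))
+    return findings
+
+
+dts = [
+    {"name": "BRONZE_EVENTS", "definition": "SELECT event_id, event_ts FROM raw_events",
+     "refresh_mode": "INCREMENTAL", "refresh_mode_reason": None},
+    {"name": "LEGACY_DT", "definition": "select * from raw_events",
+     "refresh_mode": "FULL", "refresh_mode_reason": None},
+]
+for name, finding in audit_dts(dts):
+    print(name, "->", finding)
+```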
+
+## Common Mistakes
+
+- **`SELECT *` everywhere.** Schema drift forces FULL refresh. Always list columns.
+- **Time-based lag on every layer.** Causes redundant refreshes. Only the leaf gets a time lag; intermediates use `DOWNSTREAM`.
+- **Leaf lag tighter than upstream.** Snowflake can't honor it. Leaf lag must be ≥ max upstream lag.
+- **Forgetting change tracking.** Without it, refreshes go FULL. Enable it on base tables explicitly, or let Snowflake enable it when the first DT is created (this works only if the creating role owns the base table).
+- **No `PRIMARY KEY RELY`.** Causes `INSERT OVERWRITE` reprocessing and breaks incremental-after-full chains downstream.
+- **`DISTINCT` over wide rows.** Triggers fanout and FULL refresh. Pre-aggregate or use `QUALIFY ROW_NUMBER()`.
+- **Misusing `IMMUTABLE WHERE`.** It freezes rows; if upstream rows in that range change later, results drift silently.
+- **Treating DTs as a streaming engine.** Minimum target lag is 15 seconds (preview). Use streams+tasks for sub-15-second pipelines.
+- **Calling external functions with side effects.** Not supported inside a DT. Move the side effect into a task that reads a stream on the leaf DT.
+
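+The `IMMUTABLE WHERE` drift is easier to see in a toy model. This is a simplified pure-Python illustration of the behavior the bullet describes, not Snowflake's implementation: result rows for days matching the predicate (`day < immutable_before`, an illustrative cutoff) are kept once materialized, so a late correction to the base table never reaches them. `daily_totals` and `refresh` are hypothetical names.
+
+```python
+from collections import defaultdict
+
+
+def daily_totals(base_rows):
+    """The DT's query: SUM(amount) GROUP BY day."""
+    totals = defaultdict(int)
+    for day, amount in base_rows:
+        totals[day] += amount
+    return dict(totals)
+
+
+def refresh(previous, base_rows, immutable_before):
+    """Simplified model of IMMUTABLE WHERE (day < immutable_before): result rows
+    already materialized for matching days are kept, never recomputed."""
+    fresh = daily_totals(base_rows)
+    result = {}
+    for day in previous.keys() | fresh.keys():
+        if day < immutable_before and day in previous:
+            result[day] = previous[day]   # frozen
+        elif day in fresh:
+            result[day] = fresh[day]      # recomputed from the base rows
+    return result
+
+
+base = [(1, 10), (2, 20), (3, 30)]
+dt = refresh({}, base, immutable_before=3)   # first refresh computes everything
+base += [(1, 5), (3, 1)]                     # late correction to day 1, new row for day 3
+dt = refresh(dt, base, immutable_before=3)
+print(dt[1], daily_totals(base)[1])          # 10 15  <- silent drift on day 1
+print(dt[3])                                 # 31
+```
+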
+## Workflow
+
+1. Use the decision table to confirm DTs fit. If they don't, stop and route to the alternative it names.
+2. Map layers (Bronze/Silver/Gold), pick `TARGET_LAG` per layer, assign warehouses.
+3. Apply the production checklist. Verify `refresh_mode = INCREMENTAL` after first refresh.
+4. For stream-static joins or append-only patterns, see `references/custom-incremental.md`.
+5. For git-native deployment via DCM, see `references/dcm-for-dts.md`.
+
+## References
+
+- `references/pitfalls-and-pks.md` — full pitfalls list, `PRIMARY KEY RELY`, `IMMUTABLE WHERE`, `BACKFILL FROM`
+- `references/custom-incremental.md` — Custom Incremental DTs (PrPr) syntax and patterns
+- `references/dcm-for-dts.md` — DCM `DEFINE DYNAMIC TABLE` workflow
diff --git a/skills/dynamic-tables-guidance/references/custom-incremental.md b/skills/dynamic-tables-guidance/references/custom-incremental.md
@@ -0,0 +1,176 @@
+# Custom Incremental Dynamic Tables (Private Preview)
+
+Custom incremental DTs let you define refresh logic using **imperative DML** (MERGE or INSERT INTO) instead of a declarative SELECT. This unlocks patterns that standard DTs can't express efficiently.
+
+**When to use:** Standard DTs should always be your first choice. Use custom incremental only when:
+- You need **stream-static joins** (fact stream + dimension snapshot)
+- You need **append-only pipelines** (only process inserts, ignore updates/deletes)
+- You need **user-defined semantics** (audit deletes, soft-delete, running aggregates)
+
+## Syntax
+
+```sql
+CREATE OR REPLACE DYNAMIC TABLE my_dt (
+  col1 TYPE, col2 TYPE  -- explicit columns required
+)
+  TARGET_LAG = '5 minutes'
+  WAREHOUSE = my_wh
+  REFRESH_MODE = CUSTOM_INCREMENTAL
+  [ BACKFILL FROM existing_table ]
+  REFRESH USING (
+    -- MERGE INTO SELF or INSERT INTO SELF
+  );
+```
+
+Key concepts:
+- `SELF` references the DT being created (you cannot use the DT's name)
+- `CHANGES(INFORMATION => { DEFAULT | APPEND_ONLY })` consumes changes since last refresh
+- Tables outside `CHANGES()` are read as static snapshots at refresh time
+- Explicit column schema is required (no `AS SELECT` inference)
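+
+A small model helps when choosing between the two `CHANGES` modes. The sketch follows the documented behavior of standard versus append-only streams, simplified: it ignores `METADATA$ROW_ID` and truncates, and `default_changes`/`append_only_changes` are hypothetical names. In the example, the refresh window applies: insert `c`, delete `c`, update `b` to `b2`, delete `a`.
+
+```python
+def default_changes(before, after):
+    """Net delta between two snapshots ({row_id: value}), modelled on
+    INFORMATION => DEFAULT. Tuples are (METADATA$ACTION, METADATA$ISUPDATE, value);
+    an update becomes a DELETE of the old value plus an INSERT of the new one.
+    Rows inserted and deleted inside the window are in neither snapshot, so they never appear."""
+    changes = []
+    for row_id in sorted(before.keys() | after.keys()):
+        old, new = before.get(row_id), after.get(row_id)
+        if old == new:
+            continue  # unchanged rows produce no change records
+        is_update = old is not None and new is not None
+        if old is not None:
+            changes.append(("DELETE", is_update, old))
+        if new is not None:
+            changes.append(("INSERT", is_update, new))
+    return changes
+
+
+def append_only_changes(ops):
+    """Modelled on INFORMATION => APPEND_ONLY: every row inserted in the window,
+    even if a later operation in the same window deleted it."""
+    return [("INSERT", False, value) for action, value in ops if action == "INSERT"]
+
+
+before = {1: "a", 2: "b"}
+ops = [("INSERT", "c"), ("DELETE", "c"), ("UPDATE", "b2"), ("DELETE", "a")]
+after = {2: "b2"}
+print(default_changes(before, after))
+# [('DELETE', False, 'a'), ('DELETE', True, 'b'), ('INSERT', True, 'b2')]
+print(append_only_changes(ops))
+# [('INSERT', False, 'c')]
+```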
+
+## Pattern: Stream-Static Join (Append-Only)
+
+Enrich new events with current dimension data. Only new events are processed — dimension changes don't trigger reprocessing.
+
+```sql
+CREATE OR REPLACE DYNAMIC TABLE enriched_clicks (
+  click_id INT, user_id INT, page_title STRING,
+  section STRING, click_ts TIMESTAMP
+)
+  TARGET_LAG = DOWNSTREAM
+  WAREHOUSE = my_wh
+  REFRESH USING (
+    INSERT INTO SELF
+    SELECT c.click_id, c.user_id, p.page_title, p.section, c.click_ts
+    FROM clicks CHANGES(INFORMATION => APPEND_ONLY) AS c
+    LEFT OUTER JOIN pages AS p ON c.page_id = p.page_id
+  );
+```
+
+## Pattern: Stream-Static Join (MERGE with Updates/Deletes)
+
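+When the fact table has updates and deletes, use MERGE with `ROW_NUMBER()` dedup. With `INFORMATION => DEFAULT`, an update arrives as a DELETE/INSERT pair for the same key (both with `METADATA$ISUPDATE = TRUE`), so ordering INSERT first keeps the new row image: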
+
+```sql
+CREATE OR REPLACE DYNAMIC TABLE enriched_inventory (
+  sku_id INT, product_name STRING, category STRING,
+  warehouse_name STRING, region STRING, qty_on_hand INT
+)
+  TARGET_LAG = DOWNSTREAM
+  WAREHOUSE = my_wh
+  REFRESH USING (
+    MERGE INTO SELF AS tgt
+    USING (
+      SELECT sku_id, product_name, category, warehouse_name, region,
+             qty_on_hand, action
+      FROM (
+        SELECT s.sku_id, p.product_name, p.category,
+               w.warehouse_name, w.region, s.qty_on_hand,
+               s.METADATA$ACTION AS action,
+               ROW_NUMBER() OVER (
+                 PARTITION BY s.sku_id
+                 ORDER BY CASE s.METADATA$ACTION WHEN 'INSERT' THEN 0 ELSE 1 END
+               ) AS rn
+        FROM stock CHANGES(INFORMATION => DEFAULT) AS s
+        LEFT OUTER JOIN products AS p ON s.product_id = p.product_id
+        LEFT OUTER JOIN warehouses AS w ON s.warehouse_id = w.warehouse_id
+      )
+      WHERE rn = 1
+    ) AS src
+    ON tgt.sku_id = src.sku_id
+    WHEN MATCHED AND src.action = 'DELETE' THEN DELETE
+    WHEN MATCHED AND src.action = 'INSERT' THEN
+      UPDATE SET tgt.product_name = src.product_name,
+                 tgt.category = src.category,
+                 tgt.warehouse_name = src.warehouse_name,
+                 tgt.region = src.region,
+                 tgt.qty_on_hand = src.qty_on_hand
+    WHEN NOT MATCHED AND src.action = 'INSERT' THEN
+      INSERT (sku_id, product_name, category, warehouse_name, region, qty_on_hand)
+      VALUES (src.sku_id, src.product_name, src.category, src.warehouse_name,
+              src.region, src.qty_on_hand)
+  );
+```
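+
+To see why the INSERT-first ordering matters, here is a pure-Python model of the `MERGE` above, narrowed to `sku_id` and `qty_on_hand`. The helper names are hypothetical, and change rows are assumed to be dicts whose `action` field holds `METADATA$ACTION`.
+
+```python
+def pick_one_per_key(change_rows):
+    """Mirror ROW_NUMBER() OVER (PARTITION BY sku_id ORDER BY INSERT first) = 1."""
+    chosen = {}
+    # sorted() is stable, so INSERT rows come first and keep their original order
+    for row in sorted(change_rows, key=lambda r: 0 if r["action"] == "INSERT" else 1):
+        chosen.setdefault(row["sku_id"], row)
+    return list(chosen.values())
+
+
+def merge_into(target, change_rows):
+    """Apply the MERGE branches to target ({sku_id: qty_on_hand})."""
+    for src in pick_one_per_key(change_rows):
+        key = src["sku_id"]
+        if src["action"] == "DELETE":
+            target.pop(key, None)              # MATCHED -> DELETE; no-op if unmatched
+        else:
+            target[key] = src["qty_on_hand"]   # MATCHED -> UPDATE, NOT MATCHED -> INSERT
+    return target
+
+
+target = {1: 10, 2: 5}
+changes = [
+    {"sku_id": 1, "action": "DELETE", "qty_on_hand": 10},  # update, old image
+    {"sku_id": 1, "action": "INSERT", "qty_on_hand": 7},   # update, new image
+    {"sku_id": 2, "action": "DELETE", "qty_on_hand": 5},   # real delete
+    {"sku_id": 3, "action": "INSERT", "qty_on_hand": 4},   # new SKU
+]
+print(merge_into(target, changes))  # {1: 7, 3: 4}
+```
+
+Without the ordering, SKU 1 could resolve to its DELETE row and be removed even though it still exists in `stock`.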
+
+## Example: Stream-Static Join End-to-End
+
+A complete walkthrough showing how a stream-static join works in practice. Scenario: an IoT pipeline where sensor readings (high-volume, append-only) are enriched with device metadata (low-volume, rarely changes).
+
+```sql
+-- 1. Setup: fact table (append-only sensor readings) + dimension table (device registry)
+CREATE TABLE sensor_readings (
+  reading_id INT AUTOINCREMENT,
+  device_id INT,
+  temperature FLOAT,
+  humidity FLOAT,
+  reading_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
+);
+ALTER TABLE sensor_readings SET CHANGE_TRACKING = TRUE;
+
+CREATE TABLE devices (
+  device_id INT PRIMARY KEY,
+  device_name STRING,
+  location STRING,
+  floor INT
+);
+
+-- 2. Custom incremental DT: enrich readings with device info
+--    - sensor_readings is the STREAM side (CHANGES => APPEND_ONLY)
+--    - devices is the STATIC side (read in full at each refresh, changes ignored)
+CREATE OR REPLACE DYNAMIC TABLE enriched_readings (
+  reading_id INT,
+  device_id INT,
+  device_name STRING,
+  location STRING,
+  floor INT,
+  temperature FLOAT,
+  humidity FLOAT,
+  reading_ts TIMESTAMP
+)
+  TARGET_LAG = '1 minute'
+  WAREHOUSE = iot_wh
+  REFRESH USING (
+    INSERT INTO SELF
+    SELECT
+      r.reading_id, r.device_id,
+      d.device_name, d.location, d.floor,
+      r.temperature, r.humidity, r.reading_ts
+    FROM sensor_readings CHANGES(INFORMATION => APPEND_ONLY) AS r
+    LEFT OUTER JOIN devices AS d ON r.device_id = d.device_id
+  );
+```
+
+**What happens at each refresh:**
+1. `CHANGES(INFORMATION => APPEND_ONLY)` returns only new sensor readings since the last refresh
+2. Each new reading is joined to the **current** device metadata (static snapshot)
+3. Results are appended to the DT — previously enriched rows are never touched
+4. If a device name changes in `devices`, old readings keep the old name — only new readings pick up the update
+
+**Why this matters:** A standard DT keeps its join consistent with `devices`, so a device-name change forces it to recompute every reading joined to that device, not just the new ones. The custom incremental version only processes new readings, which can make it far cheaper for high-volume fact tables with slowly changing dimensions.
+
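+A toy cost model makes the difference concrete. It counts rows computed per refresh under the assumption that a consistent join must recompute every existing reading of a renamed device; it is not Snowflake's actual refresh algorithm, and the function names, device names, and row counts are made up.
+
+```python
+def stream_static_refresh(dt_rows, new_readings, devices):
+    """Append-only: enrich only the new readings with the current device names."""
+    enriched = [(rid, dev, devices.get(dev)) for rid, dev in new_readings]
+    return dt_rows + enriched, len(enriched)
+
+
+def consistent_join_refresh(old_readings, new_readings, devices, changed_devices):
+    """Toy model of a join kept consistent with `devices`: new readings plus every
+    existing reading of a changed device have to be recomputed."""
+    recomputed = [r for r in old_readings if r[1] in changed_devices] + new_readings
+    rows = [(rid, dev, devices.get(dev)) for rid, dev in old_readings + new_readings]
+    return rows, len(recomputed)
+
+
+old = [(i, 1 + i % 2) for i in range(2000)]   # 1,000 readings each for devices 1 and 2
+devices = {1: "boiler", 2: "chiller"}
+dt_rows, _ = stream_static_refresh([], old, devices)
+
+devices[1] = "boiler-a"                       # rename one device
+new = [(2000 + i, 1) for i in range(10)]      # 10 new readings from device 1
+
+ss_rows, ss_cost = stream_static_refresh(dt_rows, new, devices)
+_, consistent_cost = consistent_join_refresh(old, new, devices, changed_devices={1})
+print(ss_cost, consistent_cost)               # 10 1010
+print(ss_rows[0], ss_rows[-1])                # (0, 1, 'boiler') (2009, 1, 'boiler-a')
+```
+
+The first printed row also shows the trade-off from step 4: the old reading keeps the old device name.
+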
+---
+
+## Pattern: Audit Deletes Log
+
+Append-only log of every deletion from a source table:
+
+```sql
+CREATE OR REPLACE DYNAMIC TABLE deletions_log (id INT, name STRING, email STRING)
+  TARGET_LAG = DOWNSTREAM
+  WAREHOUSE = my_wh
+  INITIALIZE = ON_SCHEDULE
+  REFRESH USING (
+    INSERT INTO SELF
+    SELECT id, name, email
+    FROM users CHANGES(INFORMATION => DEFAULT)
+    WHERE NOT METADATA$ISUPDATE AND METADATA$ACTION = 'DELETE'
+  );
+```
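+
+The `WHERE` clause above keeps only true deletes, because an update also emits a `DELETE` row (with `METADATA$ISUPDATE = TRUE`). A pure-Python sketch of that filter, assuming CHANGES rows are dicts keyed by column name; `deletions_from_changes` and the sample-row builder `change` are hypothetical helpers. Dropping every `METADATA$` column makes each row match the DT's three-column schema (`id, name, email`).
+
+```python
+def deletions_from_changes(change_rows):
+    """Keep true deletes only (METADATA$ACTION = 'DELETE' AND NOT METADATA$ISUPDATE),
+    then drop every METADATA$ column so rows match the log's (id, name, email)."""
+    return [
+        {k: v for k, v in row.items() if not k.startswith("METADATA$")}
+        for row in change_rows
+        if row["METADATA$ACTION"] == "DELETE" and not row["METADATA$ISUPDATE"]
+    ]
+
+
+def change(action, is_update, row_id, name):
+    # Hypothetical helper that builds one CHANGES-style row
+    return {"id": row_id, "name": name, "email": f"{name}@example.com",
+            "METADATA$ACTION": action, "METADATA$ISUPDATE": is_update,
+            "METADATA$ROW_ID": f"r{row_id}"}
+
+
+changes = [
+    change("DELETE", True, 1, "ana"), change("INSERT", True, 1, "ana2"),  # update of id 1
+    change("DELETE", False, 2, "bo"),                                     # real delete
+    change("INSERT", False, 3, "cy"),                                     # new user
+]
+print(deletions_from_changes(changes))
+# [{'id': 2, 'name': 'bo', 'email': 'bo@example.com'}]
+```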
+
+## Limitations (PrPr)
+
+- No cloning or replication
+- No DCM/dbt integration yet
+- No data governance policies on custom incremental DTs
+- No CREATE OR ALTER — must use CREATE OR REPLACE
+- Correctness is the user's responsibility; Snowflake does not guarantee the delayed-view semantics that standard DTs provide
diff --git a/skills/dynamic-tables-guidance/references/dcm-for-dts.md b/skills/dynamic-tables-guidance/references/dcm-for-dts.md
@@ -0,0 +1,70 @@
+# DCM for Dynamic Tables
+
+DCM (Database Change Management) provides **git-native infrastructure-as-code** for DT pipelines. Define your DTs declaratively, version them in git, and deploy with `snow dcm plan` → `snow dcm deploy`.
+
+## Why DCM for DTs
+
+- **Version controlled** — DT definitions live in git alongside your other infrastructure
+- **Repeatable deployments** — same definitions deploy to dev/staging/prod via templating
+- **Schema evolution** — change a DT definition, redeploy, DCM handles the diff
+- **Full pipeline IaC** — database, schema, warehouses, tables, DTs, roles, and grants in one project
+
+## DCM DT Syntax (DEFINE)
+
+```sql
+DEFINE DYNAMIC TABLE {{ database }}.{{ schema }}.BRONZE_EVENTS
+TARGET_LAG = DOWNSTREAM
+WAREHOUSE = {{ database }}_DT_WH
+AS
+  SELECT
+    record_content:event_id::STRING AS event_id,
+    record_content:event_type::STRING AS event_type,
+    record_content:user_id::STRING AS user_id,
+    record_content:timestamp::TIMESTAMP_NTZ AS event_ts,
+    record_content:payload AS payload
+  FROM {{ database }}.{{ schema }}.RAW_EVENTS_TOPIC;
+
+-- SILVER_PURCHASES is not shown; it must be defined elsewhere in the project
+DEFINE DYNAMIC TABLE {{ database }}.{{ schema }}.GOLD_HOURLY_SALES
+TARGET_LAG = '5 minutes'
+WAREHOUSE = {{ database }}_DT_WH
+AS
+  SELECT
+    DATE_TRUNC('hour', event_ts) AS sales_hour,
+    category,
+    COUNT(DISTINCT event_id) AS order_count,
+    SUM(line_total) AS revenue
+  FROM {{ database }}.{{ schema }}.SILVER_PURCHASES
+  GROUP BY 1, 2;
+```
+
+## DCM Manifest (manifest.yml)
+
+```yaml
+manifest_version: 2
+type: DCM_PROJECT
+default_target: 'DEV'
+targets:
+  DEV:
+    project_name: '{{DATABASE}}.{{SCHEMA}}.MY_PROJECT'
+    project_owner: SYSADMIN
+    templating_config: 'DEV'
+templating:
+  defaults:
+    database: 'MY_DB'
+    schema: 'PUBLIC'
+  configurations:
+    DEV:
+      database: 'MY_DB_DEV'
+    PROD:
+      database: 'MY_DB_PROD'
+```
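+
+DCM renders the templates itself; this sketch only illustrates the assumed resolution order (a named configuration overrides `defaults`) so you can predict object names per target. It uses a regex stand-in rather than Jinja, and `resolve_config`/`render` are hypothetical helpers.
+
+```python
+import re
+
+
+def resolve_config(templating, config_name):
+    """Overlay one named configuration on the defaults (assumed precedence)."""
+    values = dict(templating["defaults"])
+    values.update(templating["configurations"].get(config_name, {}))
+    return values
+
+
+def render(sql, values):
+    """Replace {{ name }} placeholders; a regex stand-in for real Jinja rendering."""
+    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: values[m.group(1)], sql)
+
+
+# Mirrors the `templating` section of manifest.yml above
+templating = {
+    "defaults": {"database": "MY_DB", "schema": "PUBLIC"},
+    "configurations": {"DEV": {"database": "MY_DB_DEV"},
+                       "PROD": {"database": "MY_DB_PROD"}},
+}
+sql = ("DEFINE DYNAMIC TABLE {{ database }}.{{ schema }}.BRONZE_EVENTS\n"
+       "WAREHOUSE = {{ database }}_DT_WH")
+print(render(sql, resolve_config(templating, "PROD")))
+# DEFINE DYNAMIC TABLE MY_DB_PROD.PUBLIC.BRONZE_EVENTS
+# WAREHOUSE = MY_DB_PROD_DT_WH
+```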
+
+## DCM Workflow
+
+```bash
+snow dcm raw-analyze dcm/ -c <connection>
+snow dcm plan dcm/ -c <connection> --save-output
+snow dcm deploy dcm/ -c <connection> --alias "v1-initial"
+```
+
+**Tip:** Put all DT definitions in a single `dynamic_tables.sql` file within `dcm/definitions/`. DCM processes all `.sql` files in that directory. Use Jinja templating (`{{ database }}`, `{{ schema }}`) for environment portability.
diff --git a/skills/dynamic-tables-guidance/references/pitfalls-and-pks.md b/skills/dynamic-tables-guidance/references/pitfalls-and-pks.md