Snowflake-Labs
diff --git a/skills/dynamic-tables-guidance/LICENSE b/skills/dynamic-tables-guidance/LICENSE
Lines changed: 22 additions & 0 deletions
diff --git a/skills/dynamic-tables-guidance/SKILL.md b/skills/dynamic-tables-guidance/SKILL.md
Lines changed: 173 additions & 0 deletions
diff --git a/skills/dynamic-tables-guidance/references/custom-incremental.md b/skills/dynamic-tables-guidance/references/custom-incremental.md
Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,22 @@
+Snowflake Skills License 
+
+© 2026 Snowflake Inc. All rights reserved.
+
+LICENSE: Use of these materials (including all code, prompts, assets, files, and other components of these skills (collectively, “Skills”)) is governed by your agreement with Snowflake for the Service. If no separate agreement exists, use is governed by Snowflake’s Terms of Service (available at: https://www.snowflake.com/en/legal/terms-of-service/). 
+
+Your applicable agreement is referred to as the "Agreement." "Service" is as defined in the Agreement.
+
+ADDITIONAL RESTRICTIONS: Notwithstanding anything in the Agreement to the contrary, you may not:
+
+* Extract from the Service or retain copies of the Skills outside use with the Service;
+* Reproduce or copy the Skills, except for temporary copies created automatically during authorized use of the Service;
+* Create derivative works based on the Skills; 
+* Distribute, sublicense, or transfer the Skills to any third party;
+* Make, offer to sell, sell, or import any inventions embodied in the Skills; nor, 
+* Reverse engineer, decompile, or disassemble the Skills. 
+
+The receipt, viewing, or possession of the Skills does not convey or imply any license or right beyond those expressly granted above.
+
+Snowflake retains all rights, title, and interest in the Skills, including all copyrights, trademarks, patents, and all other applicable intellectual property rights.
+
+THE SKILLS ARE PROVIDED “AS IS,” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SKILLS OR THE USE OR OTHER DEALINGS IN THE SKILLS.
@@ -0,0 +1,173 @@
+---
+name: dynamic-tables-guidance
+title: Dynamic Tables Guidance
+summary: Decide when to use Dynamic Tables vs MVs, streams+tasks, or dbt, and design production-ready DT pipelines.
+description: "Use when choosing between Dynamic Tables and alternatives (materialized views, streams+tasks, dbt), designing multi-layer DT pipelines, debugging FULL-refresh fallback, or hardening DTs for production. Covers comparison matrices, decision flowcharts, common pitfalls, monitoring queries, and hybrid DT+task patterns. Triggers: dynamic tables guidance, when to use DT, DT vs MV, DT vs streams tasks, DT vs dbt, DT pitfalls, DT best practices, DT pipeline design, target lag, downstream lag, full refresh fallback."
+tools:
+  - snowflake_sql_execute
+  - Bash
+  - Read
+  - Write
+  - Edit
+  - Glob
+  - Grep
+prompt: Should I use Dynamic Tables or streams+tasks for my CDC pipeline?
+language: en
+status: Published
+author: Snowflake Solutions Team
+type: snowflake
+---
+
+# Dynamic Tables Guidance
+
+## Overview
+
+Dynamic Tables (DTs) are declarative, auto-refreshing materialized queries. You write a `SELECT`, set a `TARGET_LAG`, and Snowflake schedules refreshes so the results stay within that lag of the base tables. This skill helps you decide when DTs are the right tool, design multi-layer pipelines, and avoid the failure modes that commonly surface in production.
+
+Use this skill when picking between DTs, materialized views, streams+tasks, or dbt — or when an existing DT pipeline is misbehaving (full-refresh fallback, lag drift, runaway cost).
+
+## Quick Decision Flowchart
+
+```
+Need to transform data in Snowflake?
+  ├─ Single table, accelerate queries?           → Materialized View
+  ├─ Multi-step SQL pipeline, fresh data?        → Dynamic Tables
+  ├─ Stream-static joins / append-only?          → Custom Incremental DTs (PrPr)
+  ├─ Cross-warehouse portability or dbt tests?   → dbt models
+  ├─ Procedural logic, IF/ELSE, API calls?       → Streams + Tasks
+  └─ Sub-15-second latency?                      → Streams + Tasks
+```
+
+## Comparison Matrix
+
+| Dimension | Dynamic Tables | Materialized Views | Streams + Tasks | dbt |
+|---|---|---|---|---|
+| Refresh | Target lag (15s+) | Auto, near-real-time | Manual schedule/trigger | Batch (`dbt run`) |
+| SQL support | Full SELECT, JOINs, windows | Single table only | Full + procedural | Full + Jinja |
+| Chaining | `TARGET_LAG = DOWNSTREAM` | No | Manual DAG | Ref graph |
+| Incremental | Built-in for supported ops | Auto | You write it | Manual `is_incremental` |
+| Side effects | None | None | Email, API, externals | None |
+| Cost | Your warehouse | Serverless | Your warehouse | Your warehouse |
+
+**Rule of thumb:** Start with DTs. Reach for streams+tasks only when you need procedural logic, side effects, or sub-15s latency. Use dbt when you need its testing framework or cross-warehouse portability.
+
+## Pipeline Pattern: Bronze → Silver → Gold
+
+```sql
+-- Bronze: parse raw VARIANT
+CREATE DYNAMIC TABLE bronze_events
+  TARGET_LAG = DOWNSTREAM
+  WAREHOUSE = pipeline_wh
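+  AS SELECT
+    record_content:event_id::STRING AS event_id,
+    record_content:event_type::STRING AS event_type,
+    record_content:timestamp::TIMESTAMP_NTZ AS event_ts,
+    -- Assumption: product fields live under record_content:payload;
+    -- kept as VARIANT because silver_purchases reads payload:product_id
+    record_content:payload AS payload
+  FROM raw_events_topic;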
+
+-- Silver: business logic + joins
+CREATE DYNAMIC TABLE silver_purchases
+  TARGET_LAG = DOWNSTREAM
+  WAREHOUSE = pipeline_wh
+  AS SELECT e.event_id, e.event_ts, p.product_name, p.category
+     FROM bronze_events e
+     JOIN products p ON e.payload:product_id::STRING = p.product_id
+     WHERE e.event_type = 'purchase';
+
+-- Gold: only the leaf has a time-based lag
+CREATE DYNAMIC TABLE gold_hourly_sales
+  TARGET_LAG = '5 minutes'
+  WAREHOUSE = pipeline_wh
+  AS SELECT DATE_TRUNC('hour', event_ts) AS sales_hour, category,
+            COUNT(*) AS order_count
+     FROM silver_purchases
+     GROUP BY 1, 2;
+```
+
+**Key rule:** Only the leaf DT has a time-based `TARGET_LAG`. Intermediates use `DOWNSTREAM` so Snowflake derives their lag from the leaf.
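+
+As a rough illustration of the rule above, an intermediate that was created with a time-based lag can be switched over in place. This is a sketch that reuses the table names from the example pipeline; the `'10 minutes'` value is arbitrary.
+
+```sql
+-- Move an intermediate DT off a time-based lag so it follows its consumers
+ALTER DYNAMIC TABLE silver_purchases SET TARGET_LAG = DOWNSTREAM;
+
+-- Tighten or relax freshness for the whole chain by changing only the leaf
+ALTER DYNAMIC TABLE gold_hourly_sales SET TARGET_LAG = '10 minutes';
+```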
+
+## Monitoring Essentials
+
+```sql
+-- Health check
+SELECT name, scheduling_state, last_completed_refresh_state,
+       refresh_mode, time_within_target_lag_ratio
+FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLES())
+ORDER BY time_within_target_lag_ratio ASC;
+
+-- Recent refresh history
+SELECT name, state, refresh_action,
+       DATEDIFF('second', refresh_start_time, refresh_end_time) AS duration_sec
+FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(
+  NAME_PREFIX => '<db>.<schema>'))
+ORDER BY refresh_start_time DESC LIMIT 20;
+
+-- Errors only
+SELECT name, state, state_message
+FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(
+  NAME_PREFIX => '<db>.<schema>', ERROR_ONLY => TRUE));
+```
+
+For account-wide DT cost, query `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` filtered to refresh queries.
+
+## Common Mistakes
+
+- **`SELECT *` in DT definition** — breaks incremental refresh on schema changes. Always use explicit column lists.
+- **Time-based lag on every layer** — only the leaf should have a time-based lag. Use `TARGET_LAG = DOWNSTREAM` on intermediates.
+- **Change tracking off on base tables** — DTs require `CHANGE_TRACKING = TRUE` for incremental refresh. Check with `SHOW TABLES`.
+- **Falling back to FULL refresh silently** — check `refresh_mode_reason` if `refresh_mode` shows `FULL` when you expected `INCREMENTAL`.
+- **Missing PRIMARY KEY RELY** — without it, `INSERT OVERWRITE` reprocesses everything downstream and you lose incremental chains.
+- **DISTINCT/UNION fanout** — these operators force full refresh. Refactor with `QUALIFY ROW_NUMBER()` or `UNION ALL` where possible.
+- **Sharing one warehouse with interactive queries** — DT refreshes will compete with user queries. Use a dedicated warehouse.
+- **No `INITIALIZATION_WAREHOUSE` for big initial loads** — first refresh on large DTs can OOM a small warehouse. Set a larger init WH, then unset.
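+
+Several of the checks above can be sketched in SQL. The statements below reuse names from the pipeline example (`products`, `bronze_events`, `silver_purchases`) purely for illustration, and the dedup query assumes that keeping the latest row per `event_id` is the intended semantics.
+
+```sql
+-- Enable change tracking on a base table, then confirm it in the SHOW output
+ALTER TABLE products SET CHANGE_TRACKING = TRUE;
+SHOW TABLES LIKE 'products';
+SELECT "name", "change_tracking"
+FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
+
+-- See which refresh mode a DT actually got, and why
+SHOW DYNAMIC TABLES LIKE 'silver_purchases';
+SELECT "name", "refresh_mode", "refresh_mode_reason"
+FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
+
+-- Dedup with QUALIFY instead of DISTINCT (latest row per event_id)
+SELECT event_id, event_type, event_ts
+FROM bronze_events
+QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY event_ts DESC) = 1;
+```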
+
+## Production Readiness Checklist
+
+- [ ] Explicit column lists (no `SELECT *`)
+- [ ] `CHANGE_TRACKING = TRUE` on all base tables
+- [ ] Intermediates use `TARGET_LAG = DOWNSTREAM`
+- [ ] Leaf target lag ≥ all upstream lags
+- [ ] `refresh_mode` is `INCREMENTAL` (verify `refresh_mode_reason`)
+- [ ] Dedicated warehouse for DT refreshes
+- [ ] Monitoring confirms `time_within_target_lag_ratio` stays above 0.95
+- [ ] Alerting on refresh failures
+- [ ] `INITIALIZATION_WAREHOUSE` set for large initial loads
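+
+One possible way to cover the "alerting on refresh failures" item is a Snowflake alert over the refresh history function. This is a minimal sketch: the alert name `dt_refresh_failed`, the warehouse `ops_wh`, and the notification integration `my_email_int` are placeholders, and it assumes the alert is created in the database that holds the DTs so that `INFORMATION_SCHEMA` resolves there.
+
+```sql
+-- Placeholder names: dt_refresh_failed, ops_wh, my_email_int
+CREATE OR REPLACE ALERT dt_refresh_failed
+  WAREHOUSE = ops_wh
+  SCHEDULE = '5 MINUTE'
+  IF (EXISTS (
+    SELECT 1
+    FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(ERROR_ONLY => TRUE))
+    WHERE refresh_start_time >= DATEADD('minute', -5, CURRENT_TIMESTAMP())
+  ))
+  THEN CALL SYSTEM$SEND_EMAIL(
+    'my_email_int', 'team@co.com',
+    'Dynamic table refresh failed',
+    'At least one dynamic table refresh failed in the last 5 minutes.');
+
+-- Alerts are created suspended; resume to start evaluating the condition
+ALTER ALERT dt_refresh_failed RESUME;
+```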
+
+## Hybrid Pattern: DT + Task for Side Effects
+
+```sql
+CREATE STREAM gold_metrics_stream ON DYNAMIC TABLE gold_metrics;
+
+CREATE TASK notify_on_refresh
+  WAREHOUSE = ops_wh
+  WHEN SYSTEM$STREAM_HAS_DATA('gold_metrics_stream')
+AS
+BEGIN
+  LET change_count INT := (SELECT COUNT(*) FROM gold_metrics_stream);
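+  -- 'my_email_int' is a placeholder notification integration; SYSTEM$SEND_EMAIL takes its name first
+  CALL SYSTEM$SEND_EMAIL('my_email_int', 'team@co.com', 'Metrics Updated',
+    :change_count || ' rows changed');
+  -- Consume the stream so its offset advances and the task does not re-fire on the same changes
+  CREATE OR REPLACE TEMP TABLE _consume AS SELECT * FROM gold_metrics_stream;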
+END;
+```
+
+## Workflow
+
+1. **Assess fit** — run the decision flowchart. If DTs aren't the right tool, stop here.
+2. **Pick refresh mode** — `AUTO` (default), `INCREMENTAL` (force, fail if ineligible), `FULL`, or `CUSTOM_INCREMENTAL` (PrPr).
+3. **Design layers** — Bronze→Silver→Gold with `DOWNSTREAM` on intermediates.
+4. **Harden** — apply the production checklist.
+
+⚠️ STOPPING POINT: Before running `CREATE OR REPLACE DYNAMIC TABLE` against existing pipelines, show the user the planned DDL and confirm. Replacing a DT triggers a full reload and may invalidate downstream incremental chains.
+
+⚠️ STOPPING POINT: Before applying `ALTER DYNAMIC TABLE ... SUSPEND` or `DROP DYNAMIC TABLE`, confirm with the user — downstream DTs depending on the target will stop refreshing.
+
+## Stopping Points
+
+- Workflow Step 4 — confirm DDL before `CREATE OR REPLACE DYNAMIC TABLE` on existing pipelines
+- Workflow Step 4 — confirm before `ALTER ... SUSPEND` or `DROP DYNAMIC TABLE`
+
+## References
+
+- `references/pitfalls-and-pks.md` — full pitfall deep-dive plus `PRIMARY KEY RELY`, `IMMUTABLE WHERE`, `BACKFILL FROM`
+- `references/custom-incremental.md` — Custom Incremental DTs (PrPr) syntax and patterns
+- `references/dcm-for-dts.md` — Database Change Management for git-native DT deployment
+- Built-in Cortex Code skill: `dynamic-tables`
@@ -0,0 +1,176 @@
+# Custom Incremental Dynamic Tables (Private Preview)
+
+Custom incremental DTs let you define refresh logic using **imperative DML** (MERGE or INSERT INTO) instead of a declarative SELECT. This unlocks patterns that standard DTs can't express efficiently.
+
+**When to use:** Standard DTs should always be your first choice. Use custom incremental only when:
+- You need **stream-static joins** (fact stream + dimension snapshot)
+- You need **append-only pipelines** (only process inserts, ignore updates/deletes)
+- You need **user-defined semantics** (audit deletes, soft-delete, running aggregates)
+
+## Syntax
+
+```sql
+CREATE OR REPLACE DYNAMIC TABLE my_dt (
+  col1 TYPE, col2 TYPE  -- explicit columns required
+)
+  TARGET_LAG = '5 minutes'
+  WAREHOUSE = my_wh
+  REFRESH_MODE = CUSTOM_INCREMENTAL
+  [ BACKFILL FROM existing_table ]
+  REFRESH USING (
+    -- MERGE INTO SELF or INSERT INTO SELF
+  );
+```
+
+Key concepts:
+- `SELF` references the DT being created (you cannot use the DT's name)
+- `CHANGES(INFORMATION => { DEFAULT | APPEND_ONLY })` consumes changes since last refresh
+- Tables outside `CHANGES()` are read as static snapshots at refresh time
+- Explicit column schema is required (no `AS SELECT` inference)
+
+## Pattern: Stream-Static Join (Append-Only)
+
+Enrich new events with current dimension data. Only new events are processed — dimension changes don't trigger reprocessing.
+
+```sql
+CREATE OR REPLACE DYNAMIC TABLE enriched_clicks (
+  click_id INT, user_id INT, page_title STRING,
+  section STRING, click_ts TIMESTAMP
+)
+  TARGET_LAG = DOWNSTREAM
+  WAREHOUSE = my_wh
+  REFRESH USING (
+    INSERT INTO SELF
+    SELECT c.click_id, c.user_id, p.page_title, p.section, c.click_ts
+    FROM clicks CHANGES(INFORMATION => APPEND_ONLY) AS c
+    LEFT OUTER JOIN pages AS p ON c.page_id = p.page_id
+  );
+```
+
+## Pattern: Stream-Static Join (MERGE with Updates/Deletes)
+
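+When the fact table has updates and deletes, use MERGE. An update appears in the `CHANGES` output as a `DELETE`/`INSERT` pair for the same key, so the `ROW_NUMBER()` step keeps one row per `sku_id` and prefers the `INSERT` row: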
+
+```sql
+CREATE OR REPLACE DYNAMIC TABLE enriched_inventory (
+  sku_id INT, product_name STRING, category STRING,
+  warehouse_name STRING, region STRING, qty_on_hand INT
+)
+  TARGET_LAG = DOWNSTREAM
+  WAREHOUSE = my_wh
+  REFRESH USING (
+    MERGE INTO SELF AS tgt
+    USING (
+      SELECT sku_id, product_name, category, warehouse_name, region,
+             qty_on_hand, action
+      FROM (
+        SELECT s.sku_id, p.product_name, p.category,
+               w.warehouse_name, w.region, s.qty_on_hand,
+               s.METADATA$ACTION AS action,
+               ROW_NUMBER() OVER (
+                 PARTITION BY s.sku_id
+                 ORDER BY CASE s.METADATA$ACTION WHEN 'INSERT' THEN 0 ELSE 1 END
+               ) AS rn
+        FROM stock CHANGES(INFORMATION => DEFAULT) AS s
+        LEFT OUTER JOIN products AS p ON s.product_id = p.product_id
+        LEFT OUTER JOIN warehouses AS w ON s.warehouse_id = w.warehouse_id
+      )
+      WHERE rn = 1
+    ) AS src
+    ON tgt.sku_id = src.sku_id
+    WHEN MATCHED AND src.action = 'DELETE' THEN DELETE
+    WHEN MATCHED AND src.action = 'INSERT' THEN
+      UPDATE SET tgt.product_name = src.product_name,
+                 tgt.category = src.category,
+                 tgt.warehouse_name = src.warehouse_name,
+                 tgt.region = src.region,
+                 tgt.qty_on_hand = src.qty_on_hand
+    WHEN NOT MATCHED AND src.action = 'INSERT' THEN
+      INSERT (sku_id, product_name, category, warehouse_name, region, qty_on_hand)
+      VALUES (src.sku_id, src.product_name, src.category, src.warehouse_name,
+              src.region, src.qty_on_hand)
+  );
+```
+
+## Example: Stream-Static Join End-to-End
+
+A complete walkthrough showing how a stream-static join works in practice. Scenario: an IoT pipeline where sensor readings (high-volume, append-only) are enriched with device metadata (low-volume, rarely changes).
+
+```sql
+-- 1. Setup: fact table (append-only sensor readings) + dimension table (device registry)
+CREATE TABLE sensor_readings (
+  reading_id INT AUTOINCREMENT,
+  device_id INT,
+  temperature FLOAT,
+  humidity FLOAT,
+  reading_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
+);
+ALTER TABLE sensor_readings SET CHANGE_TRACKING = TRUE;
+
+CREATE TABLE devices (
+  device_id INT PRIMARY KEY,
+  device_name STRING,
+  location STRING,
+  floor INT
+);
+
+-- 2. Custom incremental DT: enrich readings with device info
+--    - sensor_readings is the STREAM side (CHANGES => APPEND_ONLY)
+--    - devices is the STATIC side (read in full at each refresh, changes ignored)
+CREATE OR REPLACE DYNAMIC TABLE enriched_readings (
+  reading_id INT,
+  device_id INT,
+  device_name STRING,
+  location STRING,
+  floor INT,
+  temperature FLOAT,
+  humidity FLOAT,
+  reading_ts TIMESTAMP
+)
+  TARGET_LAG = '1 minute'
+  WAREHOUSE = iot_wh
+  REFRESH USING (
+    INSERT INTO SELF
+    SELECT
+      r.reading_id, r.device_id,
+      d.device_name, d.location, d.floor,
+      r.temperature, r.humidity, r.reading_ts
+    FROM sensor_readings CHANGES(INFORMATION => APPEND_ONLY) AS r
+    LEFT OUTER JOIN devices AS d ON r.device_id = d.device_id
+  );
+```
+
+**What happens at each refresh:**
+1. `CHANGES(INFORMATION => APPEND_ONLY)` returns only the sensor readings inserted since the last refresh
+2. Each new reading is joined to the **current** device metadata (static snapshot)
+3. Results are appended to the DT — previously enriched rows are never touched
+4. If a device name changes in `devices`, old readings keep the old name — only new readings pick up the update
+
+**Why this matters:** A standard DT also depends on `devices`, so a device rename would force it to recompute the historical readings joined to that device. The custom incremental version only processes new readings, which can make it far cheaper for high-volume fact tables with slowly changing dimensions.
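+
+To see the behavior described above, a short manual walkthrough might look like the following. It is a sketch only: the sample values are made up, and it assumes that `ALTER DYNAMIC TABLE ... REFRESH` can be used to trigger a custom incremental DT on demand during the preview.
+
+```sql
+-- Seed the dimension and land a first batch of readings
+INSERT INTO devices VALUES (1, 'thermo-a', 'Building 1', 2);
+INSERT INTO sensor_readings (device_id, temperature, humidity)
+  VALUES (1, 21.5, 40.0);
+ALTER DYNAMIC TABLE enriched_readings REFRESH;
+
+-- Rename the device, then land a second batch
+UPDATE devices SET device_name = 'thermo-a-v2' WHERE device_id = 1;
+INSERT INTO sensor_readings (device_id, temperature, humidity)
+  VALUES (1, 22.1, 41.0);
+ALTER DYNAMIC TABLE enriched_readings REFRESH;
+
+-- Compare device_name across the two readings
+SELECT reading_id, device_name, temperature
+FROM enriched_readings
+ORDER BY reading_id;
+```
+
+If the refresh behaves as described, the first reading would be expected to keep the original device name while the second carries the new one.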
+
+---
+
+## Pattern: Audit Deletes Log
+
+Append-only log of every deletion from a source table:
+
+```sql
+CREATE OR REPLACE DYNAMIC TABLE deletions_log (id INT, name STRING, email STRING)
+  TARGET_LAG = DOWNSTREAM
+  WAREHOUSE = my_wh
+  INITIALIZE = ON_SCHEDULE
+  REFRESH USING (
+    INSERT INTO SELF
+    SELECT id, name, email  -- explicit list matches the DT schema and keeps METADATA$ columns out
+    FROM users CHANGES(INFORMATION => DEFAULT)
+    WHERE NOT METADATA$ISUPDATE AND METADATA$ACTION = 'DELETE'
+  );
+```
+
+## Limitations (PrPr)
+
+- No cloning or replication
+- No DCM/dbt integration yet
+- No data governance policies on custom incremental DTs
+- No CREATE OR ALTER — must use CREATE OR REPLACE
+- Correctness is the user's responsibility (Snowflake does not guarantee delayed-view semantics for custom refresh logic)
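+
+Because correctness is left to the author, a periodic reconciliation query can help catch refresh logic that drops or duplicates rows. The sketch below is one possible check against the `enriched_readings` example; it assumes `reading_id` uniquely identifies a reading and that the DT has been refreshed recently.
+
+```sql
+-- Readings present in the source but missing from the DT
+SELECT COUNT(*) AS missing_rows
+FROM (
+  SELECT reading_id FROM sensor_readings
+  MINUS
+  SELECT reading_id FROM enriched_readings
+);
+
+-- Duplicate keys in the DT, which would suggest a change was inserted twice
+SELECT reading_id, COUNT(*) AS copies
+FROM enriched_readings
+GROUP BY reading_id
+HAVING COUNT(*) > 1;
+```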