Skip to content

Snowflake-Labs/automated-intelligence-dev-day-2026-hol

Repository files navigation

Build an End-to-End AI Application on Snowflake

Overview

In this hands-on lab, you'll build a complete AI-powered retail analytics platform entirely within Snowflake — no external infrastructure required. Using Snowflake CoCo as your AI-assisted development environment, you'll work through the full data lifecycle: stream real-time orders via Snowpipe Streaming, MERGE them into production tables with Gen2 Warehouses, transform them through a 3-tier Dynamic Tables pipeline, and serve them with Interactive Tables for low-latency point lookups.

You'll build analytical models with dbt, monitor data quality with Data Metric Functions, and create custom CoCo skills for reusable workflows. Tie it all together with Snowflake CoWork — a conversational AI interface where a Cortex Agent orchestrates Cortex Analyst and Agentic Search to answer "what happened" and "why" from both structured and unstructured data. Finally, evaluate your agent with ground-truth datasets, implement row-level security, and expose your agent as a managed MCP server for external AI clients.

What You'll Learn

  • Accelerate development with Snowflake CoCo (AI-assisted SQL, deployment, and data exploration)
  • Stream real-time data with Snowpipe Streaming and transform with Dynamic Tables
  • Serve low-latency queries with Interactive Tables and Gen2 Warehouses
  • Build analytical models with dbt
  • Monitor data quality automatically with Data Metric Functions
  • Create and query managed Iceberg V3 tables (deletion vectors, row lineage) (optional)
  • Create custom CoCo skills for reusable team workflows
  • Build a Cortex Agent with Cortex Analyst (semantic view + verified queries) and Agentic Search (multi-index Cortex Search)
  • Evaluate agent quality with ground-truth datasets and LLM judges
  • Expose agents as managed MCP servers for external AI clients
  • Implement transparent row-level security with Row Access Policies

What You'll Build

A production-grade AI-powered retail analytics platform on Snowflake — from raw data to conversational AI insights, entirely within a single platform. You'll create dynamic transformation pipelines, interactive low-latency tables, dbt analytical models, a Cortex Agent that answers questions across structured and unstructured data, row-level security that works transparently through AI, and an MCP server that exposes your agent to external clients.

Prerequisites

  • Access to a Snowflake account
  • Python 3.8+ installed locally
  • Git installed locally
  • Basic familiarity with SQL and command-line tools

Setup

Install Snowflake CLI

The Snowflake CLI (snow) lets you run SQL, deploy apps, and manage Snowflake objects from your terminal.

macOS (using Homebrew):

If you don't have Homebrew installed, first install it by opening Terminal and running:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Then install Snowflake CLI:

brew install snowflake-cli

Windows:

pip install snowflake-cli

Linux:

pip install snowflake-cli

Verify the installation:

snow --version

You should see output like Snowflake CLI version: 3.x.x.

Install Snowflake CoCo

Snowflake CoCo is an AI-powered coding assistant that runs in your terminal. It helps you write SQL, build pipelines, deploy apps, and explore your data using natural language prompts.

macOS (using Homebrew):

brew install cortex

Windows / Linux:

pip install cortex

Verify the installation:

cortex --version

Configure Snowflake Connection

Before you can run any commands against Snowflake, you need to configure a connection. This tells the CLI which Snowflake account to connect to and how to authenticate.

Run the interactive connection wizard:

snow connection add

You'll be prompted for the following values (enter them one at a time):

Prompt What to enter Example
Connection name A short name for this connection hol
Account identifier Your Snowflake account URL (without .snowflakecomputing.com) myorg-myaccount
User Your Snowflake username jsmith
Password Your Snowflake password (hidden)
Role ACCOUNTADMIN ACCOUNTADMIN
Warehouse Leave blank (will be created by setup) (leave empty)
Database Leave blank (will be created by setup) (leave empty)

Tip: Your account identifier is the part before .snowflakecomputing.com in your Snowflake URL. For example, if you log in at https://myorg-myaccount.snowflakecomputing.com, your account identifier is myorg-myaccount.

Test that your connection works:

snow connection test -c hol

You should see Status: OK.

Clone the Lab Repository

git clone https://github.com/Snowflake-Labs/automated-intelligence-dev-day-2026-hol.git
cd automated-intelligence-dev-day-2026-hol

Run Infrastructure Setup

Launch Snowflake CoCo and verify your connection:

cortex

What to expect: CoCo will start an interactive session in your terminal. You'll see your active connection, role, and warehouse displayed. You can type natural language prompts and CoCo will translate them into SQL or actions.

Then run the core infrastructure script (this takes ~10-15 minutes):

snow sql -f setup.sql -c hol

This creates the database, schemas, warehouses, tables, Dynamic Tables pipeline, Interactive Tables, Cortex Search Services, Semantic View, seed data (10M orders, 25M order items, 2M customers), and Row Access Policy.


Gen2 Warehouse: Optima Indexing

Demonstrate Gen2's Optima Indexing — automatic partition pruning without explicit clustering keys:

Prompt CoCo:

"Run a point lookup for customer_id 5000 on the Gen2 warehouse"

Open the query profile in Snowsight to see partition pruning — only a fraction of partitions scanned despite no explicit clustering key. This is Gen2's Optima Indexing in action.


Dynamic Tables Pipeline

Prompt CoCo:

"Show me the Dynamic Tables pipeline status — names, target lag, last refresh time, and row counts for each tier"

CoCo displays the 3-tier pipeline:

  • Tier 1 (1-min lag): enriched_orders (10M rows), enriched_order_items (25M rows)
  • Tier 2 (DOWNSTREAM): fact_orders (25M rows)
  • Tier 3 (DOWNSTREAM): daily_business_metrics (118 rows), product_performance_metrics (4 rows)

Note: Row counts shown are for the default data_scale = '10M'. If you chose '50M' in setup.sql, expect ~50M enriched_orders, ~161M order_items, and 365 daily metrics rows.

Explore Results

Ask CoCo:

"Show me a sample of the daily business metrics — top 5 days by revenue"

Expected: Top-5 days are in September 2025 (back-to-season peak), each with ~$183M revenue and ~117K orders.


Interactive Tables

Point Lookups

Run queries in Snowsight to observe sub-second latency:

USE WAREHOUSE hol_interactive_wh;
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- Point lookup by customer ID
SELECT * FROM dash_automated_intelligence_db.interactive.customer_order_analytics
WHERE customer_id = 1;

-- Point lookup by order ID
-- Run: SELECT order_id FROM dash_automated_intelligence_db.interactive.order_lookup LIMIT 5; to get UUIDs to use
SELECT * FROM dash_automated_intelligence_db.interactive.order_lookup
WHERE order_id = '<any-order-uuid>';

Concurrency Load Test

Prompt CoCo:

"Run the interactive tables load test at interactive/load_test.py"

This fires 200 concurrent sessions (1000 queries total) against both Interactive and Standard warehouses, then compares P50/P90/P99 latencies. You should see notably lower latency and higher throughput on the Interactive warehouse.

Run the load test a second time to observe the effect of warm caches. Results may vary depending on account, region, and data scale.


Data Quality

The setup script injected ~200 NULL values into orders.total_amount and order_items.quantity, plus ~150 NULLs into order_items.product_name. DMFs detect the first two — but there's a gap.

Discover the Gap

Prompt CoCo:

"Check the data quality monitoring results and show me which columns have NULL violations"

CoCo shows that TOTAL_AMOUNT (200 NULLs) and QUANTITY (200 NULLs) have violations — but product_name NULLs are going undetected.

"Are there any NULL values in order_items.product_name? Is that column being monitored?"

CoCo finds ~150 NULLs and reveals the DMF is mis-attached to product_category instead of product_name.

Fix the Coverage

"Fix the DMF — remove the NULL check from product_category and add it to product_name instead"

ALTER TABLE order_items DROP DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT ON (product_category);
ALTER TABLE order_items ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT ON (product_name);

This demonstrates the real-world workflow: monitor, discover gaps, fix coverage.


dbt Analytics

Prerequisites: dbt-core and dbt-snowflake must be installed (pip install dbt-snowflake). CoCo will handle this for you if not already installed.

Prompt CoCo:

"Install dbt dependencies and build all models in the dbt-analytics project"

CoCo runs dbt deps then dbt build to create all staging views and mart tables (9+ models). CoCo automatically injects your active Snowflake connection into the dbt profile — no manual configuration needed.

Expected output: 71 tests pass, 1 warning (the source_not_null_raw_orders_total_amount test detects the 200 NULLs we injected for the Data Quality exercise — this is working as designed).

Explore Results

"Show me the customer lifetime value segments — how many customers are in each value tier?"


CoCo Custom Skill

Create a reusable skill that automates table profiling:

Prompt CoCo:

"Create a custom CoCo skill called 'profile-table' that takes a table name, counts rows, checks for NULL columns, shows distinct value counts, and flags potential data quality issues"

CoCo creates .cortex/skills/profile-table/SKILL.md with the skill definition, triggers, and step-by-step instructions.

Note: After creating the skill, restart CoCo (type /quit then cortex) for the new skill to become active.

Test It

"$profile-table DASH_AUTOMATED_INTELLIGENCE_DB.RAW.ORDERS"

This demonstrates how teams package repeatable workflows as shareable CoCo skills.


Snowflake CoWork

Create the Agent

Prompt CoCo:

"Run snowflake-cowork/create_agent.sql to create the Business Insights Agent"

Test Agent Routing

Open the Snowflake CoWork interface in Snowsight: navigate to AI & ML → CoWork (or search "CoWork" in the global search bar). Select the BUSINESS_INSIGHTS_AGENT agent. Then try each question to demonstrate different tool routing:

Question Tools Used
"Show me monthly revenue trend from June to September 2025" Cortex Analyst (text-to-SQL)
"Which month had the lowest revenue, and what do customer reviews say about that period?" Cortex Analyst + Agentic Search
"Find reviews mentioning wrong size with a rating below 3" Agentic Search (filtered)
"Why are customers returning ski boots?" Agentic Search (reviews + tickets)
"What is our total revenue and customer count by state?" Cortex Analyst (text-to-SQL)
"What are the top complaint themes in support tickets?" Agentic Search (filter + AI_AGG)
"How many reviews mention sizing issues, and which products are most affected?" Agentic Search (search + breakdown)

This is the capstone moment — the agent routes across structured data (text-to-SQL) and unstructured data (Cortex Search) to answer "what happened" and "why."


Security and Governance

The Row Access Policy and WEST_COAST_MANAGER role were created by setup.sql. Demonstrate how row-level security transparently filters data based on role:

As ACCOUNTADMIN (full access):

USE ROLE ACCOUNTADMIN;
USE WAREHOUSE HOL_WH;
SELECT c.state, SUM(o.total_amount) AS total_revenue, COUNT(DISTINCT c.customer_id) AS customer_count
FROM dash_automated_intelligence_db.raw.orders o
JOIN dash_automated_intelligence_db.raw.customers c ON o.customer_id = c.customer_id
GROUP BY c.state ORDER BY total_revenue DESC;

Result: all 10 states visible.

As WEST_COAST_MANAGER (restricted):

USE ROLE WEST_COAST_MANAGER;
USE WAREHOUSE HOL_WH;
SELECT c.state, SUM(o.total_amount) AS total_revenue, COUNT(DISTINCT c.customer_id) AS customer_count
FROM dash_automated_intelligence_db.raw.orders o
JOIN dash_automated_intelligence_db.raw.customers c ON o.customer_id = c.customer_id
GROUP BY c.state ORDER BY total_revenue DESC;

Result: only CA, OR, WA appear — the Row Access Policy transparently filters data.

Key insight: Same query, same tables — different results based on who's asking. Row-level security enforces data boundaries without changing application logic.

Verify Through CoWork

Switch to the WEST_COAST_MANAGER role in Snowsight, then open CoWork and ask the agent:

"What is our total revenue and customer count by state?"

The agent returns results for only CA, OR, and WA — the Row Access Policy filters data transparently, even through AI-generated SQL.


Streamlit Dashboard

Prerequisites: The dbt models must be built first (previous section). The dashboard queries DBT_ANALYTICS and DBT_STAGING tables.

Prompt CoCo:

"Deploy the Streamlit dashboard to Snowflake"

CoCo runs snow streamlit deploy from the streamlit-dashboard/ directory. Once deployed, open the app URL in Snowsight to explore:

  • Summary — Revenue KPIs, order trends, customer counts
  • Customer & Product Analytics — Lifetime value segments, product performance
  • Pipeline Health — Dynamic Tables refresh status, data freshness monitoring

Agent Evaluation

The evaluation dataset (7 questions + ground truth) was created by setup.sql. Run the evaluation in Snowsight:

Run via Snowsight UI

  1. Switch to the ACCOUNTADMIN role in Snowsight (top-left role selector)
  2. Navigate to AI and ML > Agents > BUSINESS_INSIGHTS_AGENT > Evaluations tab
  3. Click New evaluation run, name it (e.g. hol-eval-run-1), click Next
  4. Select Create new dataset from table
  5. Under Source table, set Database and schema to DASH_AUTOMATED_INTELLIGENCE_DB.SEMANTIC, then select AGENT_EVALUATION_DATA
  6. Under New dataset location, keep DASH_AUTOMATED_INTELLIGENCE_DB.SEMANTIC
  7. Set Dataset name: hol_eval_dataset
  8. Click Next
  9. Under Define metrics, confirm Input query = INPUT_QUERY
  10. Toggle on Answer Correctness, set Expected answer = GROUND_TRUTH
  11. Toggle on Logical Consistency
  12. Click Create — evaluation starts automatically (~3 min)

Interpret Results

  • Answer Correctness — Did the agent's response match ground truth? Scored 0-1 per question.
  • Logical Consistency — Were planning steps, tool calls, and response internally consistent? (Reference-free.)
  • Per-question drill-down — Select any row to see the full thread: planning, tool invocations, response generation.

Improve Scores (Stretch)

If questions score low on logical consistency:

  1. Click a low-scoring row and view Thread details
  2. Look for vague reasoning about tool selection in the Planning step
  3. Update the agent's instructions to be more explicit
  4. Recreate the agent and re-run the evaluation

MCP Server

Expose the Business Insights Agent as a managed MCP server:

Prompt CoCo:

"Create a Snowflake-managed MCP server that exposes our Business Insights Agent, semantic view, and customer feedback search as tools"

CoCo creates the MCP server:

Note: CoCo generates tool names (like revenue-analytics, customer-feedback-search) based on your prompt. Your names may differ — what matters is that the type and identifier point to the correct objects.

CREATE MCP SERVER business_insights_mcp
  FROM SPECIFICATION $$
    tools:
      - name: "business-insights-agent"
        type: "CORTEX_AGENT_RUN"
        identifier: "DASH_AUTOMATED_INTELLIGENCE_DB.SEMANTIC.BUSINESS_INSIGHTS_AGENT"
        description: "AI agent that answers business questions using structured data and customer feedback"
        title: "Business Insights Agent"

      - name: "revenue-analytics"
        type: "CORTEX_ANALYST_MESSAGE"
        identifier: "DASH_AUTOMATED_INTELLIGENCE_DB.SEMANTIC.BUSINESS_ANALYTICS_SEMANTIC"
        description: "Text-to-SQL for revenue, orders, customers, and product metrics"
        title: "Revenue Analytics"

      - name: "customer-feedback-search"
        type: "CORTEX_SEARCH_SERVICE_QUERY"
        identifier: "DASH_AUTOMATED_INTELLIGENCE_DB.RAW.CUSTOMER_FEEDBACK_SEARCH"
        description: "Search across product reviews and support tickets"
        title: "Customer Feedback Search"
  $$;

Connect from CoCo

cortex mcp add business-insights https://<account_url>/api/v2/databases/DASH_AUTOMATED_INTELLIGENCE_DB/schemas/SEMANTIC/mcp-servers/BUSINESS-INSIGHTS-MCP --type http

Now any MCP-compatible client (CoCo, Claude Desktop, custom apps) can discover and call these tools via the standard MCP protocol.


Optional: Iceberg V3 Features

Note: This section is optional. It demonstrates Iceberg V3 capabilities (deletion vectors, default values) using CoCo-generated SQL. No other sections depend on it.

Note: CoCo may take a few attempts to generate correct SQL for Iceberg V3 features (these are newer APIs). If you see "error executing SQL," let CoCo retry — it will self-correct and the end result will work.

Create a Managed Iceberg Table

Prompt CoCo:

"Create a managed Iceberg table from RAW.ORDERS with clustering by year and month, then query it to show partition pruning"

CoCo creates the table with CATALOG='SNOWFLAKE' (no external volume needed) and demonstrates partition pruning on filtered queries.

Explore V3: Deletion Vectors

"Create an Iceberg V3 table from RAW.ORDERS (ICEBERG_VERSION=3) with merge-on-read enabled, insert 1000 rows, then update 10 of them to demonstrate deletion vectors"

CoCo creates a V3 table with ENABLE_ICEBERG_MERGE_ON_READ = TRUE, inserts data, then runs an UPDATE that uses deletion vectors instead of full file rewrites.

Explore V3: Default Values

"Add a new column 'priority' with default value 'STANDARD' to the V3 table and show that existing rows get the default without a backfill"

This demonstrates V3 schema evolution without rewriting data files.


Optional: Streaming Ingestion

Note: This section is optional. The setup.sql script already loads all 10M orders directly. This section demonstrates how you would stream data in production using the Snowpipe Streaming Python SDK.

Generate RSA Key Pair

Generate keys for Snowpipe Streaming authentication:

# Generate private key (unencrypted PEM)
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt

# Generate public key
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

# Upload public key to your Snowflake user (replace <your-username>)
snow sql -q "ALTER USER <your-username> SET RSA_PUBLIC_KEY='$(grep -v -- '-----' rsa_key.pub | tr -d '\n')'" -c hol

# Verify
snow sql -q "DESC USER <your-username>" -c hol | grep RSA_PUBLIC_KEY_FP

Stream Data

cd snowpipe-streaming-python
pip install -r requirements.txt

# Copy and configure profile
cp profile.json.template profile.json

Edit profile.json and set your account, user, private_key (contents of rsa_key.p8), and role.

# Stream 10,000 orders
python src/automated_intelligence_streaming.py 10000

Verify Data Landed

SELECT COUNT(*) FROM dash_automated_intelligence_db.staging.orders_staging;
SELECT COUNT(*) FROM dash_automated_intelligence_db.staging.order_items_staging;

You should see 10,000 orders and ~50,000 order items in staging.

Merge into Production

Use CoCo to merge the streamed data:

"Switch to the Gen2 warehouse, check how many rows are in staging, then merge them into RAW and show me the results"


Cleanup

To remove all objects created during this lab:

snow sql -f cleanup.sql -c hol

Resources

Documentation:


License

This project is licensed under the Apache License, Version 2.0.

About

75-minute hands-on lab: Automated Intelligence with Snowflake

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors