Commit efb0470

Promote manage-external-lineage with v1.1.1 audit pass
Re-staged with v1.1.1 holistic prompt that adds stopping-point markers, correct INSTRUCTIONS.md sub-flow cross-refs, and drops invalid tool snowflake_object_search.
1 parent be318e8 commit efb0470

10 files changed

Lines changed: 1211 additions & 0 deletions

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
Snowflake Skills License
2+
3+
© 2026 Snowflake Inc. All rights reserved.
4+
5+
LICENSE: Use of these materials (including all code, prompts, assets, files, and other components of these skills (collectively, “Skills”)) is governed by your agreement with Snowflake for the Service. If no separate agreement exists, use is governed by Snowflake’s Terms of Service (available at: https://www.snowflake.com/en/legal/terms-of-service/).
6+
7+
Your applicable agreement is referred to as the "Agreement." "Service" is as defined in the Agreement.
8+
9+
ADDITIONAL RESTRICTIONS: Notwithstanding anything in the Agreement to the contrary, you may not:
10+
11+
* Extract from the Service or retain copies of the Skills outside use with the Service;
12+
* Reproduce or copy the Skills, except for temporary copies created automatically during authorized use of the Service;
13+
* Create derivative works based on the Skills;
14+
* Distribute, sublicense, or transfer the Skills to any third party;
15+
* Make, offer to sell, sell, or import any inventions embodied in the Skills; nor,
16+
* Reverse engineer, decompile, or disassemble the Skills.
17+
18+
The receipt, viewing, or possession of the Skills does not convey or imply any license or right beyond those expressly granted above.
19+
20+
Snowflake retains all rights, title, and interest in the Skills, including all copyrights, trademarks, patents, and all other applicable intellectual property rights.
21+
22+
THE SKILLS ARE PROVIDED “AS IS,” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SKILLS OR THE USE OR OTHER DEALINGS IN THE SKILLS.
Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
1+
---
2+
name: manage-external-lineage
3+
title: Manage External Lineage
4+
summary: Send OpenLineage COMPLETE events to connect external systems to Snowflake's lineage graph, and delete external lineage links that are no longer needed.
5+
description: |
6+
Use when you need to surface external systems (Postgres, MySQL, S3, Kafka, etc.) in Snowflake's lineage view, send OpenLineage COMPLETE events via REST, or remove existing external lineage links. Triggers: external lineage, openlineage event, send lineage, establish lineage, delete lineage, connect postgres to snowflake lineage, connect mysql to snowflake lineage, connect s3 to snowflake lineage, document data pipeline, lineage api, ingest lineage.
7+
tools:
8+
- snowflake_sql_execute
9+
- Bash
10+
- Read
11+
- Write
12+
- Edit
13+
prompt: Create a COMPLETE external lineage event from postgres://prod-db:5432 public.orders into MYDB.PUBLIC.ORDERS.
14+
language: en
15+
status: Published
16+
author: Snowflake Solutions Team
17+
type: snowflake
18+
---
19+
20+
# Manage External Lineage
21+
22+
## Overview
23+
24+
Snowflake's lineage graph natively tracks objects inside the account. To show data flowing in from (or out to) external systems — Postgres, MySQL, S3, Kafka, DB2, Trino, etc. — you POST OpenLineage `COMPLETE` events to the external lineage REST endpoint. Once accepted, those external nodes appear in Snowsight under **Catalog → Database Explorer → [Table] → Lineage**.
25+
26+
This skill helps you:
27+
28+
- Build a valid OpenLineage payload
29+
- Send it using your existing Snowflake connection (no token juggling)
30+
- Delete external lineage relationships when sources are retired
31+
32+
## Prerequisites
33+
34+
- `INGEST LINEAGE` privilege on the account (and `DELETE LINEAGE` for deletes)
35+
- An active `cortex` connection, OR a Programmatic Access Token (PAT)
36+
- Python deps: `requests`, `snowflake-connector-python`
37+
38+
## Workflow
39+
40+
### Step 1: Verify privileges and target
41+
42+
```sql
43+
SHOW GRANTS ON ACCOUNT;
44+
-- Look for INGEST LINEAGE granted to your role
45+
DESCRIBE TABLE <db>.<schema>.<table>;
46+
```
47+
48+
If missing: `GRANT INGEST LINEAGE ON ACCOUNT TO ROLE <role>;`
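
The grant check above can also be done programmatically. The following is a minimal sketch, assuming the `SHOW GRANTS ON ACCOUNT` result has been fetched as a list of dictionaries keyed by the column names `privilege`, `granted_on`, and `grantee_name` (for example via a dictionary cursor). The helper name `has_account_privilege` and the sample rows are hypothetical.

```python
def has_account_privilege(rows, role, privilege="INGEST LINEAGE"):
    """Return True if `role` holds `privilege` on the account.

    `rows` is assumed to be an iterable of dicts, one per SHOW GRANTS row.
    """
    wanted_role = role.upper()
    wanted_privilege = privilege.upper()
    for row in rows:
        if (
            str(row.get("privilege", "")).upper() == wanted_privilege
            and str(row.get("granted_on", "")).upper() == "ACCOUNT"
            and str(row.get("grantee_name", "")).upper() == wanted_role
        ):
            return True
    return False


# Hypothetical rows standing in for a real SHOW GRANTS ON ACCOUNT result.
sample_rows = [
    {"privilege": "INGEST LINEAGE", "granted_on": "ACCOUNT", "grantee_name": "ETL_ROLE"},
    {"privilege": "CREATE DATABASE", "granted_on": "ACCOUNT", "grantee_name": "SYSADMIN"},
]

print(has_account_privilege(sample_rows, "etl_role"))
print(has_account_privilege(sample_rows, "etl_role", "DELETE LINEAGE"))
```

Passing `"DELETE LINEAGE"` as the privilege covers the additional requirement for Step 5.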
49+
50+
### Step 2: Build the payload
51+
52+
```json
53+
{
54+
"eventType": "COMPLETE",
55+
"eventTime": "2026-02-20T19:00:00.000Z",
56+
"job": {"namespace": "external-etl", "name": "orders_pipeline"},
57+
"run": {"runId": "f47ac10b-58cc-4372-a567-0e02b2c3d479"},
58+
"producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
59+
"schemaURL": "https://openlineage.io/spec/0-0-1/OpenLineage.json",
60+
"inputs": [
61+
{"namespace": "postgres://prod-db:5432", "name": "public.orders"}
62+
],
63+
"outputs": [
64+
{"namespace": "snowflake://<ORG>-<ACCOUNT>", "name": "<DB>.<SCHEMA>.<TABLE>"}
65+
]
66+
}
67+
```
68+
69+
Rules:
70+
- `eventType` must be `COMPLETE` — other types are ignored.
71+
- Taken together, `inputs` and `outputs` must include at least one Snowflake object and at least one external object.
72+
- Do NOT include `facets` for external objects — they render as "External Node" by default.
73+
- See `namespace_conventions.md` for per-source namespace formats.
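
The rules above can be checked before the payload is shown to the user. This is a minimal sketch, not part of the skill's bundled scripts: the function name `validate_payload` is hypothetical, and it assumes a dataset counts as a Snowflake object when its namespace starts with `snowflake://`.

```python
SNOWFLAKE_PREFIX = "snowflake://"


def validate_payload(payload):
    """Return a list of rule violations; an empty list means all checks pass."""
    problems = []

    # Rule: only COMPLETE events are processed.
    if payload.get("eventType") != "COMPLETE":
        problems.append("eventType must be COMPLETE")

    # Rule: inputs and outputs together must mix Snowflake and external objects.
    datasets = payload.get("inputs", []) + payload.get("outputs", [])
    is_snowflake = [
        d.get("namespace", "").startswith(SNOWFLAKE_PREFIX) for d in datasets
    ]
    if not (any(is_snowflake) and not all(is_snowflake)):
        problems.append("inputs/outputs must mix Snowflake and external objects")

    # Rule: external objects must not carry facets.
    for dataset, snowflake in zip(datasets, is_snowflake):
        if not snowflake and "facets" in dataset:
            problems.append(f"external dataset {dataset.get('name')!r} has facets")

    return problems


payload = {
    "eventType": "COMPLETE",
    "eventTime": "2026-02-20T19:00:00.000Z",
    "job": {"namespace": "external-etl", "name": "orders_pipeline"},
    "run": {"runId": "f47ac10b-58cc-4372-a567-0e02b2c3d479"},
    "inputs": [{"namespace": "postgres://prod-db:5432", "name": "public.orders"}],
    "outputs": [{"namespace": "snowflake://MYORG-MYACCOUNT", "name": "MYDB.PUBLIC.ORDERS"}],
}

print(validate_payload(payload))
```

The checks only cover the rules listed here; they do not validate the payload against the OpenLineage schema.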
74+
75+
⚠️ STOPPING POINT: Show the payload to the user and wait for confirmation before sending.
76+
77+
### Step 3: Send the event
78+
79+
Preferred — use your Cortex Code connection:
80+
81+
```bash
82+
SNOWFLAKE_CONNECTION_NAME=<connection> python <SKILL_DIR>/send_lineage_via_connection.py -p payload.json
83+
```
84+
85+
Or generate + send in one go:
86+
87+
```bash
88+
<SKILL_DIR>/generate_payload.sh -a <ACCOUNT> -o <DB>.<SCHEMA>.<TABLE> \
89+
-i 'postgres://host:5432::db.schema.source' -f /tmp/payload.json
90+
SNOWFLAKE_CONNECTION_NAME=<connection> python <SKILL_DIR>/send_lineage_via_connection.py -p /tmp/payload.json
91+
```
92+
93+
PAT alternative: `<SKILL_DIR>/send_lineage.sh -a <ACCOUNT> -t token.txt -p payload.json`
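
For readers who want to see what a PAT-based send amounts to, here is a rough sketch using only the Python standard library. It rests on two assumptions that the bundled scripts may not share: that events are POSTed to the same `/api/v2/lineage/external-lineage` path that Step 5 uses for DELETE, and that the PAT is passed as a bearer token. The function name `build_post_request` is hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request


def build_post_request(account, token, payload):
    """Build (but do not send) a POST request carrying an OpenLineage event."""
    # Assumed endpoint: same path as the DELETE call shown in Step 5.
    url = f"https://{account}.snowflakecomputing.com/api/v2/lineage/external-lineage"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_post_request("MYORG-MYACCOUNT", "<PAT>", {"eventType": "COMPLETE"})
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send the request; it is omitted here
# so that the sketch has no side effects.
```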
94+
95+
### Step 4: Verify
96+
97+
Open Snowsight → Catalog → Database Explorer → your table → Lineage tab. Allow 1–2 minutes for propagation.
98+
99+
### Step 5: Delete external lineage (optional)
100+
101+
⚠️ STOPPING POINT: Confirm the source/target before sending DELETE. The endpoint always returns HTTP 200 — verify removal in Snowsight.
102+
103+
```bash
104+
curl --globoff -X DELETE \
105+
-H "Authorization: Bearer $API_KEY" \
106+
"https://<ACCOUNT>.snowflakecomputing.com/api/v2/lineage/external-lineage?sourceNamespace=<NS>&sourceName=<NAME>&sourceDatasetType=External%20Node&targetName=<DB>.<SCHEMA>.<TABLE>&targetDatasetType=TABLE"
107+
```
108+
109+
Delete scopes:
110+
- Source + target → break that one link
111+
- Source only → break all downstream from that source
112+
- Target only → strip the target from the graph
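
The three delete scopes map onto which query parameters are present. The sketch below builds the DELETE URL for each scope using the parameter names from the curl example. It assumes the source is the external node and the target is a Snowflake table, as in that example, and that the endpoint accepts standard percent-encoding of the namespace. The function name `build_delete_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_PATH = "/api/v2/lineage/external-lineage"


def build_delete_url(account, source=None, target=None):
    """Build the DELETE URL for one of the three scopes.

    source: (namespace, name) tuple for the external node, or None
    target: fully qualified Snowflake table name, or None
    """
    if source is None and target is None:
        raise ValueError("provide a source, a target, or both")

    params = {}
    if source is not None:
        namespace, name = source
        params["sourceNamespace"] = namespace
        params["sourceName"] = name
        params["sourceDatasetType"] = "External Node"
    if target is not None:
        params["targetName"] = target
        params["targetDatasetType"] = "TABLE"

    # quote_via=quote encodes the space as %20 (the default would emit '+').
    query = urlencode(params, quote_via=quote)
    return f"https://{account}.snowflakecomputing.com{BASE_PATH}?{query}"


source = ("postgres://prod-db:5432", "public.orders")
print(build_delete_url("MYORG-MYACCOUNT", source=source, target="MYDB.PUBLIC.ORDERS"))
print(build_delete_url("MYORG-MYACCOUNT", source=source))
print(build_delete_url("MYORG-MYACCOUNT", target="MYDB.PUBLIC.ORDERS"))
```

Building the URL in code avoids hand-editing a long query string, which is where mismatched source and target values tend to creep in.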
113+
114+
## Common Mistakes
115+
116+
- Using `eventType` other than `COMPLETE` — silently dropped.
117+
- Underscores in the account URL — use hyphens (`ORG-ACCOUNT`).
118+
- Forgetting `--globoff` on curl — it mangles `External%20Node`.
119+
- Including `facets` on external nodes — breaks the "External Node" rendering.
120+
- Treating DELETE's `200` as success — always verify in Snowsight.
121+
- Mismatched delete direction — if external was the INPUT on create, it must be the source on delete.
122+
- Assuming matching is case-insensitive: namespaces and names are case-sensitive.
123+
124+
## Limitations
125+
126+
- 1-year retention, 10,000 events per account
127+
- 1000-char max FQN
128+
- No column-level lineage
129+
- External lineage isn't returned by `GET_LINEAGE`
130+
131+
## Stopping Points
132+
133+
- Step 2 — wait for payload review before sending
134+
- Step 5 — confirm targets before DELETE
135+
- Step 5 — verify in Snowsight (HTTP 200 does not confirm deletion)
136+
137+
## Reference files
138+
139+
- `namespace_conventions.md` — namespace formats per source
140+
- `token_setup.md` — creating a PAT
141+
- `troubleshooting.md` — 401 / 403 / 404 fixes
142+
- `send_lineage_via_connection.py` — recommended sender
143+
- `send_lineage.sh` — PAT-based sender
144+
- `generate_payload.sh` — payload builder
Lines changed: 111 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,111 @@
1+
#!/bin/bash
2+
set -e
3+
4+
usage() {
5+
echo "Usage: $0 -a ACCOUNT -o OUTPUT_TABLE -i INPUT [-i INPUT...] [-j JOB_NAME] [-n JOB_NAMESPACE] [-f OUTPUT_FILE]"
6+
echo ""
7+
echo "Generate OpenLineage payload JSON for external lineage"
8+
echo ""
9+
echo "Required:"
10+
echo " -a ACCOUNT Snowflake account (ORG-ACCOUNT format)"
11+
echo " -o OUTPUT Output table (DATABASE.SCHEMA.TABLE)"
12+
echo " -i INPUT Input source (namespace::name format, can repeat; at least one)"
13+
echo ""
14+
echo "Optional:"
15+
echo " -j JOB_NAME Job name (default: auto-generated)"
16+
echo " -n JOB_NAMESPACE Job namespace (default: external-etl)"
17+
echo " -f OUTPUT_FILE Output file (default: stdout)"
18+
echo " -h Show this help"
19+
echo ""
20+
echo "Examples:"
21+
echo " # Single input"
22+
echo " $0 -a MYORG-MYACCOUNT -o DB.SCHEMA.TABLE \\"
23+
echo " -i 'postgres://host:5432::db.schema.table'"
24+
echo ""
25+
echo " # Multiple inputs"
26+
echo " $0 -a MYORG-MYACCOUNT -o DB.SCHEMA.TABLE \\"
27+
echo " -i 'postgres://host:5432::public.users' \\"
28+
echo " -i 's3://bucket::path/to/file.parquet' \\"
29+
echo " -f payload.json"
30+
exit 1
31+
}
32+
33+
INPUTS=()
34+
JOB_NAMESPACE="external-etl"
35+
JOB_NAME=""
36+
OUTPUT_FILE=""
37+
38+
while getopts "a:o:i:j:n:f:h" opt; do
39+
case $opt in
40+
a) ACCOUNT="$OPTARG" ;;
41+
o) OUTPUT_TABLE="$OPTARG" ;;
42+
i) INPUTS+=("$OPTARG") ;;
43+
j) JOB_NAME="$OPTARG" ;;
44+
n) JOB_NAMESPACE="$OPTARG" ;;
45+
f) OUTPUT_FILE="$OPTARG" ;;
46+
h) usage ;;
47+
*) usage ;;
48+
esac
49+
done
50+
51+
if [[ -z "$ACCOUNT" || -z "$OUTPUT_TABLE" ]]; then
52+
echo "Error: -a ACCOUNT and -o OUTPUT_TABLE are required"
53+
usage
54+
fi
55+
56+
if [[ ${#INPUTS[@]} -eq 0 ]]; then
57+
echo "Error: At least one -i INPUT is required"
58+
usage
59+
fi
60+
61+
if [[ -z "$JOB_NAME" ]]; then
62+
TABLE_NAME=$(echo "$OUTPUT_TABLE" | tr '.' '_' | tr '[:upper:]' '[:lower:]')
63+
JOB_NAME="${TABLE_NAME}_pipeline"
64+
fi
65+
66+
RUN_ID=$(uuidgen | tr '[:upper:]' '[:lower:]')
67+
EVENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S.000Z")
68+
69+
INPUT_JSON=""
70+
for input in "${INPUTS[@]}"; do
71+
NAMESPACE="${input%%::*}"   # everything before the first '::'
72+
NAME="${input#*::}"         # everything after the first '::'
73+
74+
if [[ -n "$INPUT_JSON" ]]; then
75+
INPUT_JSON="$INPUT_JSON,"
76+
fi
77+
INPUT_JSON="$INPUT_JSON
78+
{\"namespace\": \"$NAMESPACE\", \"name\": \"$NAME\"}"
79+
done
80+
81+
PAYLOAD=$(cat <<EOF
82+
{
83+
"eventType": "COMPLETE",
84+
"eventTime": "$EVENT_TIME",
85+
"job": {
86+
"namespace": "$JOB_NAMESPACE",
87+
"name": "$JOB_NAME"
88+
},
89+
"run": {
90+
"runId": "$RUN_ID"
91+
},
92+
"producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
93+
"schemaURL": "https://openlineage.io/spec/0-0-1/OpenLineage.json",
94+
"inputs": [$INPUT_JSON
95+
],
96+
"outputs": [
97+
{
98+
"namespace": "snowflake://$ACCOUNT",
99+
"name": "$OUTPUT_TABLE"
100+
}
101+
]
102+
}
103+
EOF
104+
)
105+
106+
if [[ -n "$OUTPUT_FILE" ]]; then
107+
echo "$PAYLOAD" > "$OUTPUT_FILE"
108+
echo "Payload written to: $OUTPUT_FILE"
109+
else
110+
echo "$PAYLOAD"
111+
fi
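
The `-i` argument packs a namespace and a name into one string separated by `::`. The following Python sketch shows one way to parse that format. The function name `split_input` is hypothetical and not part of the script. Splitting on the first `::`, rather than counting single colons, keeps namespaces intact whether or not they contain a port.

```python
def split_input(spec):
    """Split 'namespace::name' into an OpenLineage dataset dict."""
    namespace, separator, name = spec.partition("::")
    if not separator or not namespace or not name:
        raise ValueError(f"expected namespace::name, got {spec!r}")
    return {"namespace": namespace, "name": name}


for spec in (
    "postgres://host:5432::db.schema.table",
    "s3://bucket::path/to/file.parquet",
):
    print(split_input(spec))
```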
Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
1+
# OpenLineage Namespace Naming Conventions
2+
3+
Datasets and jobs have their own namespaces. Dataset namespaces are derived from datasources, job namespaces from schedulers.
4+
5+
## Dataset Namespace & Name Format
6+
7+
| Source | Type | Namespace Format | Name Format |
8+
|--------|------|------------------|-------------|
9+
| **Snowflake** | Warehouse | `snowflake://{org name}-{account name}` or `snowflake://{account-locator}(.{region})(.{cloud})` | `{database}.{schema}.{table}` |
10+
| **Postgres** | Warehouse | `postgres://{host}:{port}` | `{database}.{schema}.{table}` |
11+
| **MySQL** | Warehouse | `mysql://{host}:{port}` | `{database}.{table}` |
12+
| **MSSQL** | Warehouse | `mssql://{host}:{port}` | `{database}.{schema}.{table}` |
13+
| **Oracle** | Warehouse | `oracle://{host}:{port}` | `{serviceName}.{schema}.{table}` |
14+
| **BigQuery** | Warehouse | `bigquery` | `{project id}.{dataset name}.{table name}` |
15+
| **Redshift** | Warehouse | `redshift://{cluster_identifier}.{region_name}:{port}` | `{database}.{schema}.{table}` |
16+
| **Athena** | Warehouse | `awsathena://athena.{region_name}.amazonaws.com` | `{catalog}.{database}.{table}` |
17+
| **AWS Glue** | Data catalog | `arn:aws:glue:{region}:{account id}` | `table/{database name}/{table name}` |
18+
| **Hive** | Warehouse | `hive://{host}:{port}` | `{database}.{table}` |
19+
| **Trino** | Warehouse | `trino://{host}:{port}` | `{catalog}.{schema}.{table}` |
20+
| **Cassandra** | Warehouse | `cassandra://{host}:{port}` | `{keyspace}.{table}` |
21+
| **DB2** | Warehouse | `db2://{host}:{port}` | `{database}.{schema}.{table}` |
22+
| **Teradata** | Warehouse | `teradata://{host}:{port}` | `{database}.{table}` |
23+
| **Azure Synapse** | Warehouse | `sqlserver://{host}:{port}` | `{schema}.{table}` |
24+
| **Azure Cosmos DB** | Warehouse | `azurecosmos://{host}/dbs/{database}` | `colls/{table}` |
25+
| **Azure Data Explorer** | Warehouse | `azurekusto://{host}.kusto.windows.net` | `{database}/{table}` |
26+
| **Spanner** | Warehouse | `spanner://{projectId}:{instanceId}` | `{database}.{schema}.{table}` |
27+
| **S3** | Blob Storage | `s3://{bucket name}` | `{object key}` |
28+
| **GCS** | Blob Storage | `gs://{bucket name}` | `{object key}` |
29+
| **ABFSS** | Data Lake | `abfss://{container}@{service}.dfs.core.windows.net` | `{path}` |
30+
| **WASBS** | Blob Storage | `wasbs://{container}@{service}.dfs.core.windows.net` | `{object key}` |
31+
| **HDFS** | Distributed FS | `hdfs://{namenode host}:{namenode port}` | `{path}` |
32+
| **DBFS** | Distributed FS | `dbfs://{workspace name}` | `{path}` |
33+
| **Kafka** | Event Streaming | `kafka://{bootstrap server host}:{port}` | `{topic}` |
34+
| **PubSub** | Event Streaming | `pubsub` | `topic:{projectId}:{topicId}` or `subscription:{projectId}:{subscriptionId}` |
35+
| **Local File** | File System | `file` | `{path}` |
36+
| **Remote File** | File System | `file://{host}` | `{path}` |
37+
38+
## Snowflake Namespace - Important Notes
39+
40+
Snowflake has two namespace formats:
41+
1. **Preferred:** `snowflake://{org name}-{account name}` (e.g., `snowflake://MYORG-MYACCOUNT`)
42+
2. **Legacy:** `snowflake://{account-locator}.{region}.{cloud}` (e.g., `snowflake://xy12345.us-east-1.aws`)
43+
44+
**Warning:** Using legacy account locator format creates dataset IDs that won't match IDs created with the org-account format. If you switch formats later, existing lineage nodes won't connect to new ones. Use the org-account format when possible.
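
To keep the two formats from being mixed by accident, a small helper can build the preferred namespace and flag likely legacy ones. This is only a heuristic sketch under a stated assumption: an org-account identifier always contains a hyphen before any dot, while an account locator does not. Both function names are hypothetical.

```python
def snowflake_namespace(org, account):
    """Build the preferred org-account namespace."""
    return f"snowflake://{org}-{account}"


def looks_like_legacy_locator(namespace):
    """Heuristic: the first dot-separated label of a locator has no hyphen."""
    host = namespace.removeprefix("snowflake://")
    first_label = host.split(".", 1)[0]
    return "-" not in first_label


print(snowflake_namespace("MYORG", "MYACCOUNT"))
print(looks_like_legacy_locator("snowflake://MYORG-MYACCOUNT"))
print(looks_like_legacy_locator("snowflake://xy12345.us-east-1.aws"))
```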
45+
46+
## Job Namespace & Name Format
47+
48+
| Scheduler | Name Format | Example |
49+
|-----------|-------------|---------|
50+
| Airflow task | `{dag_id}.{task_id}` | `orders_etl.count_orders` |
51+
| Spark job | `{appName}.{command}.{table}` | `my_app.execute_insert_into_hive_table.mydb_mytable` |
52+
| SQL | `{schema}.{table}` | `gx.validate_datasets` |
53+
| Debezium | `{topic.prefix}.{taskId}` | `inventory.0` |
54+
55+
## Run ID Format
56+
Runs use client-generated UUIDs (e.g., `f47ac10b-58cc-4372-a567-0e02b2c3d479`)
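
A minimal sketch of generating these values in Python, assuming a random (version 4) UUID is acceptable for `runId` and that `eventTime` should be a UTC timestamp with millisecond precision, matching the shape of the example payloads in this commit. The function names are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def new_run_id():
    """Return a lowercase, hyphenated random UUID string."""
    return str(uuid.uuid4())


def event_time_now():
    """Return the current UTC time, e.g. 2026-02-20T19:00:00.000Z."""
    now = datetime.now(timezone.utc)
    millis = now.microsecond // 1000
    return now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


print(new_run_id())
print(event_time_now())
```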
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
[project]
2+
name = "manage-external-lineage"
3+
version = "0.1.0"
4+
description = "Send and manage OpenLineage external lineage events for Snowflake"
5+
requires-python = ">=3.11"
6+
dependencies = [
7+
"requests>=2.32.0",
8+
"snowflake-connector-python>=3.6.0",
9+
]
