
Commit 97a0a17

committed
Stage manage-external-lineage from Snowflake-Solutions
1 parent be318e8 commit 97a0a17

10 files changed

Lines changed: 1210 additions & 0 deletions


Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
Snowflake Skills License
2+
3+
© 2026 Snowflake Inc. All rights reserved.
4+
5+
LICENSE: Use of these materials (including all code, prompts, assets, files, and other components of these skills (collectively, “Skills”)) is governed by your agreement with Snowflake for the Service. If no separate agreement exists, use is governed by Snowflake’s Terms of Service (available at: https://www.snowflake.com/en/legal/terms-of-service/).
6+
7+
Your applicable agreement is referred to as the "Agreement." "Service" is as defined in the Agreement.
8+
9+
ADDITIONAL RESTRICTIONS: Notwithstanding anything in the Agreement to the contrary, you may not:
10+
11+
* Extract from the Service or retain copies of the Skills outside use with the Service;
12+
* Reproduce or copy the Skills, except for temporary copies created automatically during authorized use of the Service;
13+
* Create derivative works based on the Skills;
14+
* Distribute, sublicense, or transfer the Skills to any third party;
15+
* Make, offer to sell, sell, or import any inventions embodied in the Skills; nor,
16+
* Reverse engineer, decompile, or disassemble the Skills.
17+
18+
The receipt, viewing, or possession of the Skills does not convey or imply any license or right beyond those expressly granted above.
19+
20+
Snowflake retains all rights, title, and interest in the Skills, including all copyrights, trademarks, patents, and all other applicable intellectual property rights.
21+
22+
THE SKILLS ARE PROVIDED “AS IS,” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SKILLS OR THE USE OR OTHER DEALINGS IN THE SKILLS.
Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
1+
---
2+
name: manage-external-lineage
3+
title: Manage External Lineage
4+
summary: Create and delete OpenLineage events to connect external systems to Snowflake's lineage graph.
5+
description: "Use when you need to connect external data sources (Postgres, MySQL, S3, Kafka, etc.) to Snowflake's lineage graph via the OpenLineage REST API, or when you need to delete external lineage relationships. Triggers: external lineage, openlineage event, send lineage, establish lineage, delete lineage, create lineage event, connect postgres to snowflake lineage, connect mysql to snowflake lineage, connect s3 to snowflake lineage, track data flow, document data pipeline, lineage api, ingest lineage."
6+
tools:
7+
- snowflake_sql_execute
8+
- snowflake_object_search
9+
- Bash
10+
- Read
11+
- Write
12+
- Edit
13+
prompt: Create an external lineage event linking my Postgres source table to a Snowflake table.
14+
language: en
15+
status: Published
16+
author: Snowflake Solutions Team
17+
type: snowflake
18+
---
19+
20+
# Manage External Lineage
21+
22+
## Overview
23+
24+
This skill creates and deletes OpenLineage `COMPLETE` events through Snowflake's external lineage REST API so external systems (Postgres, MySQL, S3, Kafka, DB2, Trino, etc.) appear in Snowsight's lineage graph alongside Snowflake objects. Use it to document cross-platform pipelines, show upstream sources feeding Snowflake tables, or show downstream destinations Snowflake feeds.
25+
26+
## Prerequisites
27+
28+
- `INGEST LINEAGE` privilege on the account (and `DELETE LINEAGE` for deletes).
29+
- Active Snowflake connection in your `cortex` session, OR a Programmatic Access Token (PAT) / JWT.
30+
- Python deps: `requests`, `snowflake-connector-python`.
31+
32+
## Workflow
33+
34+
### 1. Verify privileges
35+
36+
```sql
37+
SHOW GRANTS ON ACCOUNT;
38+
-- GRANT INGEST LINEAGE ON ACCOUNT TO ROLE <role_name>;
39+
```
40+
41+
### 2. Verify the Snowflake target exists
42+
43+
```sql
44+
DESCRIBE TABLE <database>.<schema>.<table_name>;
45+
```
46+
47+
### 3. Build the payload
48+
49+
```json
50+
{
51+
"eventType": "COMPLETE",
52+
"eventTime": "<ISO8601>",
53+
"job": {"namespace": "<job_namespace>", "name": "<job_name>"},
54+
"run": {"runId": "<UUID>"},
55+
"producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
56+
"schemaURL": "https://openlineage.io/spec/0-0-1/OpenLineage.json",
57+
"inputs": [{"namespace": "<source_ns>", "name": "<source_object>"}],
58+
"outputs": [{"namespace": "snowflake://<ORG>-<ACCOUNT>", "name": "<DB>.<SCHEMA>.<TABLE>"}]
59+
}
60+
```
61+
62+
Stop and show the payload to the user before sending.
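The payload can also be assembled programmatically. The following is a minimal sketch, not part of the skill itself: the function name `build_complete_event` and its parameters are hypothetical, and the `producer` and `schemaURL` values are copied from the template above. It generates a fresh run UUID and a UTC timestamp in the same `.000Z` format.

```python
import json
import uuid
from datetime import datetime, timezone


def build_complete_event(job_namespace, job_name, inputs, org, account, table_fqn):
    """Return an OpenLineage COMPLETE event as a plain dict.

    `inputs` is a list of (namespace, name) pairs for the external sources.
    Hypothetical helper: the field layout follows the template in this step.
    """
    event_time = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "eventType": "COMPLETE",
        "eventTime": event_time,
        "job": {"namespace": job_namespace, "name": job_name},
        "run": {"runId": str(uuid.uuid4())},
        "producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
        "schemaURL": "https://openlineage.io/spec/0-0-1/OpenLineage.json",
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": f"snowflake://{org}-{account}", "name": table_fqn}],
    }


if __name__ == "__main__":
    event = build_complete_event(
        "external-etl",
        "customer_data_pipeline",
        [("postgres://prod-db.example.com:5432", "public.customer_signups")],
        "MYORG",
        "MYACCOUNT",
        "DB.SCHEMA.TABLE",
    )
    # Print the payload so it can be reviewed before anything is sent.
    print(json.dumps(event, indent=2))
```

Printing the payload rather than sending it keeps the review step above intact.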
63+
64+
### 4. Send the event
65+
66+
Recommended (uses your active `cortex` connection, no token wrangling):
67+
68+
```bash
69+
SNOWFLAKE_CONNECTION_NAME=<connection> \
70+
python <SKILL_DIR>/send_lineage_via_connection.py -p payload.json
71+
```
72+
73+
PAT/JWT alternative:
74+
75+
```bash
76+
<SKILL_DIR>/send_lineage.sh -a <ACCOUNT> -t /path/to/token.txt -p payload.json
77+
```
78+
79+
### 5. Verify in Snowsight
80+
81+
In Snowsight, open Catalog → Database Explorer → your table → **Lineage** tab. The new edge may take 1–2 minutes to appear.
82+
83+
## Deleting external lineage
84+
85+
| Scenario | Params | Effect |
86+
|---|---|---|
87+
| Break specific edge | source + target | Removes that edge only |
88+
| Break all downstream | source only | Removes source → all targets |
89+
| Remove from graph | target only | Removes target regardless of source |
90+
91+
```bash
92+
curl --globoff -X DELETE \
93+
-H "Authorization: Bearer $API_KEY" \
94+
"https://<ACCOUNT>.snowflakecomputing.com/api/v2/lineage/external-lineage?sourceNamespace=<SRC_NS>&sourceName=<SRC>&sourceDatasetType=External%20Node&targetName=<DB>.<SCHEMA>.<TABLE>&targetDatasetType=TABLE"
95+
```
96+
97+
DELETE always returns HTTP 200, even when nothing matched, so confirm the result in Snowsight.
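The three scenarios in the table differ only in which query parameters are present. The sketch below builds the DELETE URL for each one. It rests on several assumptions: `build_delete_url` is a hypothetical helper, the parameter names are taken from the curl example above, it is assumed that `targetDatasetType` accompanies `targetName` in the target-only case, and namespaces are percent-encoded on the assumption that the API decodes them in the usual way.

```python
from urllib.parse import quote, urlencode


def build_delete_url(account, source_namespace=None, source_name=None, target_name=None):
    """Build the external-lineage DELETE URL for one of the three scenarios.

    Hypothetical helper; parameter names mirror the curl example.
    """
    if (source_namespace is None) != (source_name is None):
        raise ValueError("sourceNamespace and sourceName must be given together")
    has_source = source_namespace is not None
    if not has_source and target_name is None:
        raise ValueError("provide a source, a target, or both")

    params = {}
    if has_source:
        params["sourceNamespace"] = source_namespace
        params["sourceName"] = source_name
        params["sourceDatasetType"] = "External Node"
    if target_name is not None:
        params["targetName"] = target_name
        params["targetDatasetType"] = "TABLE"

    # quote_via=quote encodes the space as %20; the default, quote_plus, would emit '+'.
    query = urlencode(params, quote_via=quote)
    return f"https://{account}.snowflakecomputing.com/api/v2/lineage/external-lineage?{query}"


if __name__ == "__main__":
    # Break one specific edge: source and target are both given.
    print(build_delete_url("MYORG-MYACCOUNT", "postgres://host:5432", "public.users", "DB.SCHEMA.TABLE"))
    # Break all downstream edges of a source: source only.
    print(build_delete_url("MYORG-MYACCOUNT", "postgres://host:5432", "public.users"))
```

Building the query with `urlencode` avoids hand-encoding `External%20Node` and makes the scenario explicit in the call.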
98+
99+
## Example: Postgres + MySQL → Snowflake
100+
101+
```json
102+
{
103+
"eventType": "COMPLETE",
104+
"eventTime": "2026-02-20T19:00:00.000Z",
105+
"job": {"namespace": "external-etl", "name": "customer_data_pipeline"},
106+
"run": {"runId": "f47ac10b-58cc-4372-a567-0e02b2c3d479"},
107+
"producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
108+
"schemaURL": "https://openlineage.io/spec/0-0-1/OpenLineage.json",
109+
"inputs": [
110+
{"namespace": "postgres://prod-db.example.com:5432", "name": "public.customer_signups"},
111+
{"namespace": "mysql://warehouse.example.com:3306", "name": "raw.customer_raw"}
112+
],
113+
"outputs": [
114+
{"namespace": "snowflake://<ORG>-<ACCOUNT>", "name": "<DB>.<SCHEMA>.<TABLE>"}
115+
]
116+
}
117+
```
118+
119+
## Common Mistakes
120+
121+
- **Wrong `eventType`.** Only `COMPLETE` is processed; `START`, `RUNNING`, `FAIL` are silently ignored.
122+
- **Including `facets` on external objects.** Omit them — externals render as "External Node" automatically.
123+
- **Underscores in the account identifier.** Use `ORG-ACCOUNT`, not `ORG_ACCOUNT`.
124+
- **Not using `--globoff` with curl.** Without it, curl re-encodes `External%20Node` and the DELETE matches nothing.
125+
- **Trusting HTTP 200 from DELETE.** It always returns 200; verify in Snowsight.
126+
- **Case-mismatched namespaces.** Namespace and name values are case-sensitive; a typo creates a new orphan node.
127+
- **Mismatched delete direction.** If lineage was created with the external object as `inputs`, the delete `source` must be that same external node.
128+
- **Expecting external nodes in `GET_LINEAGE`.** They only appear in the Snowsight UI.
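Several of these mistakes can be caught before the event is sent. The following is a rough pre-flight check, assuming the payload is already loaded as a Python dict; `check_payload` is a hypothetical name, and the checks cover only the `eventType`, `facets`, and underscore items from the list above.

```python
def check_payload(event):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if event.get("eventType") != "COMPLETE":
        problems.append("eventType must be COMPLETE; other types are ignored")

    prefix = "snowflake://"
    for side in ("inputs", "outputs"):
        for dataset in event.get(side, []):
            namespace = dataset.get("namespace", "")
            if namespace.startswith(prefix):
                # Snowflake datasets: the account identifier should use hyphens.
                if "_" in namespace[len(prefix):]:
                    problems.append(f"{side}: account identifier contains '_': {namespace}")
            elif "facets" in dataset:
                # External datasets: facets should be omitted.
                problems.append(f"{side}: external dataset {dataset.get('name')!r} carries facets")
    return problems


if __name__ == "__main__":
    sample = {
        "eventType": "START",
        "inputs": [{"namespace": "postgres://host:5432", "name": "public.users", "facets": {}}],
        "outputs": [{"namespace": "snowflake://MYORG_MYACCOUNT", "name": "DB.SCHEMA.TABLE"}],
    }
    for problem in check_payload(sample):
        print(problem)
```

The check cannot detect case mismatches or a wrong delete direction, since those depend on what was sent earlier.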
129+
130+
## Limits
131+
132+
- 1-year retention
133+
- 10,000 events per account
134+
- 1,000-character maximum for a fully qualified name (FQN)
135+
- No column-level lineage
136+
137+
## Reference files
138+
139+
- `namespace_conventions.md` — namespace formats per source type
140+
- `token_setup.md` — PAT setup
141+
- `troubleshooting.md` — 401/403/404 fixes
142+
- `send_lineage_via_connection.py` — recommended sender
143+
- `send_lineage.sh`, `generate_payload.sh` — PAT-based alternatives
Lines changed: 111 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,111 @@
1+
#!/bin/bash
2+
set -e
3+
4+
usage() {
5+
echo "Usage: $0 -a ACCOUNT -o OUTPUT_TABLE [-i INPUT...]"
6+
echo ""
7+
echo "Generate OpenLineage payload JSON for external lineage"
8+
echo ""
9+
echo "Required:"
10+
echo " -a ACCOUNT Snowflake account (ORG-ACCOUNT format)"
11+
echo " -o OUTPUT Output table (DATABASE.SCHEMA.TABLE)"
12+
echo ""
13+
echo "Optional:"
14+
echo " -i INPUT Input source (namespace::name format, can repeat)"
15+
echo " -j JOB_NAME Job name (default: auto-generated)"
16+
echo " -n JOB_NAMESPACE Job namespace (default: external-etl)"
17+
echo " -f OUTPUT_FILE Output file (default: stdout)"
18+
echo " -h Show this help"
19+
echo ""
20+
echo "Examples:"
21+
echo " # Single input"
22+
echo " $0 -a MYORG-MYACCOUNT -o DB.SCHEMA.TABLE \\"
23+
echo " -i 'postgres://host:5432::db.schema.table'"
24+
echo ""
25+
echo " # Multiple inputs"
26+
echo " $0 -a MYORG-MYACCOUNT -o DB.SCHEMA.TABLE \\"
27+
echo " -i 'postgres://host:5432::public.users' \\"
28+
echo " -i 's3://bucket::path/to/file.parquet' \\"
29+
echo " -f payload.json"
30+
exit 1
31+
}
32+
33+
INPUTS=()
34+
JOB_NAMESPACE="external-etl"
35+
JOB_NAME=""
36+
OUTPUT_FILE=""
37+
38+
while getopts "a:o:i:j:n:f:h" opt; do
39+
case $opt in
40+
a) ACCOUNT="$OPTARG" ;;
41+
o) OUTPUT_TABLE="$OPTARG" ;;
42+
i) INPUTS+=("$OPTARG") ;;
43+
j) JOB_NAME="$OPTARG" ;;
44+
n) JOB_NAMESPACE="$OPTARG" ;;
45+
f) OUTPUT_FILE="$OPTARG" ;;
46+
h) usage ;;
47+
*) usage ;;
48+
esac
49+
done
50+
51+
if [[ -z "$ACCOUNT" || -z "$OUTPUT_TABLE" ]]; then
52+
echo "Error: -a ACCOUNT and -o OUTPUT_TABLE are required"
53+
usage
54+
fi
55+
56+
if [[ ${#INPUTS[@]} -eq 0 ]]; then
57+
echo "Error: At least one -i INPUT is required"
58+
usage
59+
fi
60+
61+
if [[ -z "$JOB_NAME" ]]; then
62+
TABLE_NAME=$(echo "$OUTPUT_TABLE" | tr '.' '_' | tr '[:upper:]' '[:lower:]')
63+
JOB_NAME="${TABLE_NAME}_pipeline"
64+
fi
65+
66+
RUN_ID=$(uuidgen | tr '[:upper:]' '[:lower:]')
67+
EVENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S.000Z")
68+
69+
INPUT_JSON=""
70+
for input in "${INPUTS[@]}"; do
71+
NAMESPACE="${input%%::*}"  # everything before the first '::' separator
72+
NAME="${input#*::}"  # everything after the first '::' separator
73+
74+
if [[ -n "$INPUT_JSON" ]]; then
75+
INPUT_JSON="$INPUT_JSON,"
76+
fi
77+
INPUT_JSON="$INPUT_JSON
78+
{\"namespace\": \"$NAMESPACE\", \"name\": \"$NAME\"}"
79+
done
80+
81+
PAYLOAD=$(cat <<EOF
82+
{
83+
"eventType": "COMPLETE",
84+
"eventTime": "$EVENT_TIME",
85+
"job": {
86+
"namespace": "$JOB_NAMESPACE",
87+
"name": "$JOB_NAME"
88+
},
89+
"run": {
90+
"runId": "$RUN_ID"
91+
},
92+
"producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
93+
"schemaURL": "https://openlineage.io/spec/0-0-1/OpenLineage.json",
94+
"inputs": [$INPUT_JSON
95+
],
96+
"outputs": [
97+
{
98+
"namespace": "snowflake://$ACCOUNT",
99+
"name": "$OUTPUT_TABLE"
100+
}
101+
]
102+
}
103+
EOF
104+
)
105+
106+
if [[ -n "$OUTPUT_FILE" ]]; then
107+
echo "$PAYLOAD" > "$OUTPUT_FILE"
108+
echo "Payload written to: $OUTPUT_FILE"
109+
else
110+
echo "$PAYLOAD"
111+
fi
Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
1+
# OpenLineage Namespace Naming Conventions
2+
3+
Datasets and jobs each have their own namespaces: dataset namespaces are derived from the data source, and job namespaces from the scheduler.
4+
5+
## Dataset Namespace & Name Format
6+
7+
| Source | Type | Namespace Format | Name Format |
8+
|--------|------|------------------|-------------|
9+
| **Snowflake** | Warehouse | `snowflake://{org name}-{account name}` or `snowflake://{account-locator}(.{region})(.{cloud})` | `{database}.{schema}.{table}` |
10+
| **Postgres** | Warehouse | `postgres://{host}:{port}` | `{database}.{schema}.{table}` |
11+
| **MySQL** | Warehouse | `mysql://{host}:{port}` | `{database}.{table}` |
12+
| **MSSQL** | Warehouse | `mssql://{host}:{port}` | `{database}.{schema}.{table}` |
13+
| **Oracle** | Warehouse | `oracle://{host}:{port}` | `{serviceName}.{schema}.{table}` |
14+
| **BigQuery** | Warehouse | `bigquery` | `{project id}.{dataset name}.{table name}` |
15+
| **Redshift** | Warehouse | `redshift://{cluster_identifier}.{region_name}:{port}` | `{database}.{schema}.{table}` |
16+
| **Athena** | Warehouse | `awsathena://athena.{region_name}.amazonaws.com` | `{catalog}.{database}.{table}` |
17+
| **AWS Glue** | Data catalog | `arn:aws:glue:{region}:{account id}` | `table/{database name}/{table name}` |
18+
| **Hive** | Warehouse | `hive://{host}:{port}` | `{database}.{table}` |
19+
| **Trino** | Warehouse | `trino://{host}:{port}` | `{catalog}.{schema}.{table}` |
20+
| **Cassandra** | Warehouse | `cassandra://{host}:{port}` | `{keyspace}.{table}` |
21+
| **DB2** | Warehouse | `db2://{host}:{port}` | `{database}.{schema}.{table}` |
22+
| **Teradata** | Warehouse | `teradata://{host}:{port}` | `{database}.{table}` |
23+
| **Azure Synapse** | Warehouse | `sqlserver://{host}:{port}` | `{schema}.{table}` |
24+
| **Azure Cosmos DB** | Warehouse | `azurecosmos://{host}/dbs/{database}` | `colls/{table}` |
25+
| **Azure Data Explorer** | Warehouse | `azurekusto://{host}.kusto.windows.net` | `{database}/{table}` |
26+
| **Spanner** | Warehouse | `spanner://{projectId}:{instanceId}` | `{database}.{schema}.{table}` |
27+
| **S3** | Blob Storage | `s3://{bucket name}` | `{object key}` |
28+
| **GCS** | Blob Storage | `gs://{bucket name}` | `{object key}` |
29+
| **ABFSS** | Data Lake | `abfss://{container}@{service}.dfs.core.windows.net` | `{path}` |
30+
| **WASBS** | Blob Storage | `wasbs://{container}@{service}.dfs.core.windows.net` | `{object key}` |
31+
| **HDFS** | Distributed FS | `hdfs://{namenode host}:{namenode port}` | `{path}` |
32+
| **DBFS** | Distributed FS | `dbfs://{workspace name}` | `{path}` |
33+
| **Kafka** | Event Streaming | `kafka://{bootstrap server host}:{port}` | `{topic}` |
34+
| **PubSub** | Event Streaming | `pubsub` | `topic:{projectId}:{topicId}` or `subscription:{projectId}:{subscriptionId}` |
35+
| **Local File** | File System | `file` | `{path}` |
36+
| **Remote File** | File System | `file://{host}` | `{path}` |
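As an illustration of how the table maps to payload entries, the sketch below formats dataset references for four of the source types. The `NAMESPACE_TEMPLATES` dict and the `dataset_ref` helper are hypothetical, and the placeholder names (`bucket`, `key`, `host`) are shortened versions of the ones in the table.

```python
# (namespace template, name template) per source type, copied from the table above.
NAMESPACE_TEMPLATES = {
    "postgres": ("postgres://{host}:{port}", "{database}.{schema}.{table}"),
    "mysql": ("mysql://{host}:{port}", "{database}.{table}"),
    "s3": ("s3://{bucket}", "{key}"),
    "kafka": ("kafka://{host}:{port}", "{topic}"),
}


def dataset_ref(source, **parts):
    """Return a {"namespace", "name"} dict for an OpenLineage inputs/outputs entry.

    Raises KeyError if the source type is unknown or a required part is missing.
    """
    namespace_template, name_template = NAMESPACE_TEMPLATES[source]
    return {
        "namespace": namespace_template.format(**parts),
        "name": name_template.format(**parts),
    }


if __name__ == "__main__":
    print(dataset_ref("postgres", host="prod-db.example.com", port=5432,
                      database="app", schema="public", table="customer_signups"))
    print(dataset_ref("s3", bucket="my-bucket", key="path/to/file.parquet"))
```

Keeping the formats in one table-driven structure makes it harder for two pipelines to spell the same source differently, which would otherwise create separate nodes.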
37+
38+
## Snowflake Namespace - Important Notes
39+
40+
Snowflake has two namespace formats:
41+
1. **Preferred:** `snowflake://{org name}-{account name}` (e.g., `snowflake://MYORG-MYACCOUNT`)
42+
2. **Legacy:** `snowflake://{account-locator}.{region}.{cloud}` (e.g., `snowflake://xy12345.us-east-1.aws`)
43+
44+
**Warning:** Using legacy account locator format creates dataset IDs that won't match IDs created with the org-account format. If you switch formats later, existing lineage nodes won't connect to new ones. Use the org-account format when possible.
45+
46+
## Job Namespace & Name Format
47+
48+
| Scheduler | Name Format | Example |
49+
|-----------|-------------|---------|
50+
| Airflow task | `{dag_id}.{task_id}` | `orders_etl.count_orders` |
51+
| Spark job | `{appName}.{command}.{table}` | `my_app.execute_insert_into_hive_table.mydb_mytable` |
52+
| SQL | `{schema}.{table}` | `gx.validate_datasets` |
53+
| Debezium | `{topic.prefix}.{taskId}` | `inventory.0` |
54+
55+
## Run ID Format
56+
Runs use client-generated UUIDs (e.g., `f47ac10b-58cc-4372-a567-0e02b2c3d479`).
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
[project]
2+
name = "manage-external-lineage"
3+
version = "0.1.0"
4+
description = "Send and manage OpenLineage external lineage events for Snowflake"
5+
requires-python = ">=3.11"
6+
dependencies = [
7+
"requests>=2.32.0",
8+
"snowflake-connector-python>=3.6.0",
9+
]
