| 1 | +--- |
| 2 | +name: manage-external-lineage |
| 3 | +title: Manage External Lineage |
| 4 | +summary: Send OpenLineage COMPLETE events to connect external systems to Snowflake's lineage graph, and delete external lineage links that are no longer needed. |
| 5 | +description: | |
| 6 | + Use when you need to surface external systems (Postgres, MySQL, S3, Kafka, etc.) in Snowflake's lineage view, send OpenLineage COMPLETE events via REST, or remove existing external lineage links. Triggers: external lineage, openlineage event, send lineage, establish lineage, delete lineage, connect postgres to snowflake lineage, connect mysql to snowflake lineage, connect s3 to snowflake lineage, document data pipeline, lineage api, ingest lineage. |
| 7 | +tools: |
| 8 | + - snowflake_sql_execute |
| 9 | + - Bash |
| 10 | + - Read |
| 11 | + - Write |
| 12 | + - Edit |
| 13 | +prompt: Create a COMPLETE external lineage event from postgres://prod-db:5432 public.orders into MYDB.PUBLIC.ORDERS. |
| 14 | +language: en |
| 15 | +status: Published |
| 16 | +author: Snowflake Solutions Team |
| 17 | +type: snowflake |
| 18 | +--- |
| 19 | + |
| 20 | +# Manage External Lineage |
| 21 | + |
| 22 | +## Overview |
| 23 | + |
| 24 | +Snowflake's lineage graph natively tracks objects inside the account. To show data flowing in from (or out to) external systems — Postgres, MySQL, S3, Kafka, DB2, Trino, etc. — you POST OpenLineage `COMPLETE` events to the external lineage REST endpoint. Once accepted, those external nodes appear in Snowsight under **Catalog → Database Explorer → [Table] → Lineage**. |
| 25 | + |
| 26 | +This skill helps you: |
| 27 | + |
| 28 | +- Build a valid OpenLineage payload |
| 29 | +- Send it using your existing Snowflake connection (no token juggling) |
| 30 | +- Delete external lineage relationships when sources are retired |
| 31 | + |
| 32 | +## Prerequisites |
| 33 | + |
| 34 | +- `INGEST LINEAGE` privilege on the account (and `DELETE LINEAGE` for deletes) |
| 35 | +- An active `cortex` connection, OR a Programmatic Access Token (PAT) |
| 36 | +- Python deps: `requests`, `snowflake-connector-python` |
| 37 | + |
| 38 | +## Workflow |
| 39 | + |
| 40 | +### Step 1: Verify privileges and target |
| 41 | + |
| 42 | +```sql |
| 43 | +SHOW GRANTS ON ACCOUNT; |
| 44 | +-- Look for INGEST LINEAGE granted to your role |
| 45 | +DESCRIBE TABLE <db>.<schema>.<table>; |
| 46 | +``` |
| 47 | + |
| 48 | +If missing: `GRANT INGEST LINEAGE ON ACCOUNT TO ROLE <role>;` |
| 49 | + |
| 50 | +### Step 2: Build the payload |
| 51 | + |
| 52 | +```json |
| 53 | +{ |
| 54 | + "eventType": "COMPLETE", |
| 55 | + "eventTime": "2026-02-20T19:00:00.000Z", |
| 56 | + "job": {"namespace": "external-etl", "name": "orders_pipeline"}, |
| 57 | + "run": {"runId": "f47ac10b-58cc-4372-a567-0e02b2c3d479"}, |
| 58 | + "producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client", |
| 59 | + "schemaURL": "https://openlineage.io/spec/0-0-1/OpenLineage.json", |
| 60 | + "inputs": [ |
| 61 | + {"namespace": "postgres://prod-db:5432", "name": "public.orders"} |
| 62 | + ], |
| 63 | + "outputs": [ |
| 64 | + {"namespace": "snowflake://<ORG>-<ACCOUNT>", "name": "<DB>.<SCHEMA>.<TABLE>"} |
| 65 | + ] |
| 66 | +} |
| 67 | +``` |
| 68 | + |
| 69 | +Rules: |
| 70 | +- `eventType` must be `COMPLETE`; events of any other type are ignored. |
| 71 | +- Taken together, `inputs` and `outputs` must include at least one Snowflake object and at least one external object. |
| 72 | +- Do NOT include `facets` on external objects; without facets they render as "External Node" by default. |
| 73 | +- See `namespace_conventions.md` for per-source namespace formats. |
| 74 | + |
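
The Step 2 rules can be checked mechanically before the payload is shown to the user. The following is a minimal sketch, not part of the skill's scripts: `build_complete_event` and `check_payload` are hypothetical helper names, and the check assumes that a Snowflake object is recognizable by a namespace starting with `snowflake://`, as in the example payload above.

```python
import uuid
from datetime import datetime, timezone

# Assumption: Snowflake datasets are identified by this namespace prefix.
SNOWFLAKE_PREFIX = "snowflake://"


def build_complete_event(inputs, outputs, job_namespace, job_name):
    """Assemble an OpenLineage COMPLETE event from (namespace, name) pairs."""
    now = datetime.now(timezone.utc)
    return {
        "eventType": "COMPLETE",
        # Millisecond-precision UTC timestamp, e.g. 2026-02-20T19:00:00.000Z
        "eventTime": now.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z",
        "job": {"namespace": job_namespace, "name": job_name},
        "run": {"runId": str(uuid.uuid4())},
        "producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
        "schemaURL": "https://openlineage.io/spec/0-0-1/OpenLineage.json",
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
    }


def check_payload(event):
    """Return a list of rule violations; an empty list means all checks passed."""
    problems = []
    if event.get("eventType") != "COMPLETE":
        problems.append("eventType must be COMPLETE")
    datasets = event.get("inputs", []) + event.get("outputs", [])
    is_snowflake = [
        d.get("namespace", "").startswith(SNOWFLAKE_PREFIX) for d in datasets
    ]
    if not any(is_snowflake):
        problems.append("no Snowflake object in inputs/outputs")
    if all(is_snowflake):
        problems.append("no external object in inputs/outputs")
    # External datasets should not carry facets.
    for dataset, sf in zip(datasets, is_snowflake):
        if not sf and "facets" in dataset:
            problems.append(f"external dataset {dataset.get('name')} carries facets")
    return problems
```

A payload built this way can be written out with `json.dump` and passed to the sender scripts in Step 3; a non-empty result from `check_payload` is a reason to stop and fix the payload first.
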
| 75 | +⚠️ STOPPING POINT: Show the payload to the user and wait for confirmation before sending. |
| 76 | + |
| 77 | +### Step 3: Send the event |
| 78 | + |
| 79 | +Preferred — use your Cortex Code connection: |
| 80 | + |
| 81 | +```bash |
| 82 | +SNOWFLAKE_CONNECTION_NAME=<connection> python <SKILL_DIR>/send_lineage_via_connection.py -p payload.json |
| 83 | +``` |
| 84 | + |
| 85 | +Or generate + send in one go: |
| 86 | + |
| 87 | +```bash |
| 88 | +<SKILL_DIR>/generate_payload.sh -a <ACCOUNT> -o <DB>.<SCHEMA>.<TABLE> \ |
| 89 | + -i 'postgres://host:5432::db.schema.source' -f /tmp/payload.json |
| 90 | +SNOWFLAKE_CONNECTION_NAME=<connection> python <SKILL_DIR>/send_lineage_via_connection.py -p /tmp/payload.json |
| 91 | +``` |
| 92 | + |
| 93 | +PAT alternative: `<SKILL_DIR>/send_lineage.sh -a <ACCOUNT> -t token.txt -p payload.json` |
| 94 | + |
| 95 | +### Step 4: Verify |
| 96 | + |
| 97 | +Open Snowsight → Catalog → Database Explorer → your table → Lineage tab. Allow 1–2 minutes for propagation. |
| 98 | + |
| 99 | +### Step 5: Delete external lineage (optional) |
| 100 | + |
| 101 | +⚠️ STOPPING POINT: Confirm the source/target before sending DELETE. The endpoint always returns HTTP 200 — verify removal in Snowsight. |
| 102 | + |
| 103 | +```bash |
| 104 | +curl --globoff -X DELETE \ |
| 105 | + -H "Authorization: Bearer $API_KEY" \ |
| 106 | + "https://<ACCOUNT>.snowflakecomputing.com/api/v2/lineage/external-lineage?sourceNamespace=<NS>&sourceName=<NAME>&sourceDatasetType=External%20Node&targetName=<DB>.<SCHEMA>.<TABLE>&targetDatasetType=TABLE" |
| 107 | +``` |
| 108 | + |
| 109 | +Delete scopes: |
| 110 | +- Source + target → break that one link |
| 111 | +- Source only → break all downstream from that source |
| 112 | +- Target only → strip the target from the graph |
| 113 | + |
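
The three delete scopes differ only in which query parameters are present. The sketch below builds the DELETE URL for each scope so it can be reviewed at the stopping point before anything is sent. It is an illustration under stated assumptions: `build_delete_url` is a hypothetical helper, it covers only the case shown in the curl example (an external source feeding a Snowflake table), and it assumes the endpoint accepts percent-encoded query values.

```python
from urllib.parse import quote, urlencode


def build_delete_url(account, source=None, target=None):
    """Build the external-lineage DELETE URL for one of the three scopes.

    source: (namespace, name) of the external upstream object, or None
    target: fully qualified name of the downstream Snowflake table, or None
    """
    if source is None and target is None:
        raise ValueError("provide a source, a target, or both")
    params = []
    if source is not None:
        namespace, name = source
        params += [
            ("sourceNamespace", namespace),
            ("sourceName", name),
            # Dataset types mirror the curl example; other types are not covered here.
            ("sourceDatasetType", "External Node"),
        ]
    if target is not None:
        params += [("targetName", target), ("targetDatasetType", "TABLE")]
    # quote_via=quote encodes the space as %20 rather than "+".
    query = urlencode(params, quote_via=quote)
    return (
        f"https://{account}.snowflakecomputing.com"
        f"/api/v2/lineage/external-lineage?{query}"
    )
```

Passing both `source` and `target` targets a single link, `source` alone targets everything downstream of that source, and `target` alone targets the table itself. Because the endpoint returns HTTP 200 regardless, the resulting URL is only half of the check; removal still has to be confirmed in Snowsight.
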
| 114 | +## Common Mistakes |
| 115 | + |
| 116 | +- Using an `eventType` other than `COMPLETE`: the event is silently dropped. |
| 117 | +- Underscores in the account URL: use hyphens (`ORG-ACCOUNT`). |
| 118 | +- Forgetting `--globoff` on curl: without it, curl interprets `[]` and `{}` in the URL as glob patterns, which can mangle the query string. |
| 119 | +- Including `facets` on external nodes: this breaks the "External Node" rendering. |
| 120 | +- Treating DELETE's `200` as success: always verify in Snowsight. |
| 121 | +- Mismatched delete direction: if the external object was the input on create, it must be the source on delete. |
| 122 | +- Assuming case-insensitive matching: namespaces and names are case-sensitive. |
| 123 | + |
| 124 | +## Limitations |
| 125 | + |
| 126 | +- 1-year retention, 10,000 events per account |
| 127 | +- 1,000-character maximum for a fully qualified name (FQN) |
| 128 | +- No column-level lineage |
| 129 | +- External lineage isn't returned by `GET_LINEAGE` |
| 130 | + |
| 131 | +## Stopping Points |
| 132 | + |
| 133 | +- Step 2 — wait for payload review before sending |
| 134 | +- Step 5 — confirm targets before DELETE |
| 135 | +- Step 5 — verify in Snowsight (HTTP 200 does not confirm deletion) |
| 136 | + |
| 137 | +## Reference files |
| 138 | + |
| 139 | +- `namespace_conventions.md` — namespace formats per source |
| 140 | +- `token_setup.md` — creating a PAT |
| 141 | +- `troubleshooting.md` — 401 / 403 / 404 fixes |
| 142 | +- `send_lineage_via_connection.py` — recommended sender |
| 143 | +- `send_lineage.sh` — PAT-based sender |
| 144 | +- `generate_payload.sh` — payload builder |