
Commit c9801c0

implement get_lineage tool for complete lineage across dbt resources (#502)
## Summary

A user building a catalog-like experience with dbt and Snowflake via Claude Desktop requested full lineage capabilities (#110):

> "Get the full lineage of a model, not only its children/parents. To parse the full lineage with the official MCP it would require quite a few calls to the tool to parse through all layers, which would be nice to get the server doing async instead providing the full list as a return."

---

## What Changed

Added a new `get_lineage` tool that enables efficient lineage traversal across all dbt resource types using the Discovery API's `lineage()` endpoint.

### Key Implementation Details

**Query Approach:**
- Uses the Discovery API's `lineage()` endpoint to fetch the entire lineage graph in a single API call
- Requests minimal fields: `name`, `uniqueId`, `resourceType`, and `parentIds`
- Pre-builds a children adjacency map for descendant lookups

**Why Not Use Selector Syntax?**

The selector parameters (`select: "model+"`, `exclude: "test_*"`) do not seem to be available in the public API.

## Test Case

<img width="1264" height="783" alt="image" src="https://github.com/user-attachments/assets/ee045546-9979-48e2-b31a-7780f85c81e1" />

Search for any exposures:

<img width="1264" height="833" alt="image" src="https://github.com/user-attachments/assets/8ba63f3d-cc99-4e7f-a63a-811881a6f847" />

## Why

Implementing to close #110

## Related Issues

Closes #110
Related to #110

## Checklist

- [x] I have performed a self-review of my code
- [x] I have made corresponding changes to the documentation (in https://github.com/dbt-labs/docs.getdbt.com) if required -- Mention it here dbt-labs/docs.getdbt.com#8233
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes

---------

Co-authored-by: Devon Fulcher <24593113+DevonFulcher@users.noreply.github.com>
1 parent 252f33c commit c9801c0

9 files changed

Lines changed: 847 additions & 1 deletion


Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+kind: Enhancement or New Feature
+body: Adding get_lineage discovery mcp tool.
+time: 2026-01-07T13:34:17.425226Z

README.md

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ The dbt MCP server architecture allows for your agent to connect to a variety of
 - `get_all_sources`
 - `get_exposure_details`
 - `get_exposures`
+- `get_lineage`
 - `get_macro_details`
 - `get_mart_models`
 - `get_model_children`

src/dbt_mcp/discovery/client.py

Lines changed: 122 additions & 1 deletion
@@ -8,7 +8,7 @@

 from dbt_mcp.config.config_providers import ConfigProvider, DiscoveryConfig
 from dbt_mcp.discovery.graphql import load_query
-from dbt_mcp.errors import InvalidParameterError
+from dbt_mcp.errors import InvalidParameterError, ToolCallError
 from dbt_mcp.gql.errors import raise_gql_error

 DEFAULT_PAGE_SIZE = 100
@@ -302,6 +302,9 @@ class GraphQLQueries:
         }
     """)

+    # Lineage query
+    GET_FULL_LINEAGE = load_query("get_full_lineage.gql")
+

 class MetadataAPIClient:
     def __init__(self, config_provider: ConfigProvider[DiscoveryConfig]):
@@ -667,3 +670,121 @@ async def fetch_details(
         if not edges:
             return []
         return [e["node"] for e in edges]
+
+
+class LineageResourceType(StrEnum):
+    """Resource types supported by the lineage API."""
+
+    MODEL = "Model"
+    SOURCE = "Source"
+    SEED = "Seed"
+    SNAPSHOT = "Snapshot"
+    EXPOSURE = "Exposure"
+    METRIC = "Metric"
+    SEMANTIC_MODEL = "SemanticModel"
+    SAVED_QUERY = "SavedQuery"
+    TEST = "Test"
+
+
+class LineageFetcher:
+    """Fetcher for lineage data. Returns nodes connected to the target."""
+
+    def __init__(self, api_client: MetadataAPIClient):
+        self.api_client = api_client
+
+    async def fetch_lineage(
+        self,
+        unique_id: str,
+        depth: int,
+        types: list[LineageResourceType] | None = None,
+    ) -> list[dict]:
+        """Fetch lineage graph filtered to nodes connected to unique_id.
+
+        Args:
+            unique_id: The dbt unique ID of the resource to get lineage for.
+            types: List of resource types to include. If None, includes all types.
+
+        Returns:
+            List of nodes connected to unique_id (upstream + downstream).
+        """
+        if depth <= 0:
+            raise ToolCallError("Depth must be greater than 0")
+        config = await self.api_client.config_provider.get_config()
+        type_filter = [
+            t.value for t in (types if types is not None else LineageResourceType)
+        ]
+        variables = {
+            "environmentId": config.environment_id,
+            "types": type_filter,
+            # uniqueId removed - not used by GraphQL
+        }
+
+        result = await self.api_client.execute_query(
+            GraphQLQueries.GET_FULL_LINEAGE, variables
+        )
+        raise_gql_error(result)
+
+        all_nodes = (
+            result.get("data", {})
+            .get("environment", {})
+            .get("applied", {})
+            .get("lineage", [])
+        )
+
+        # Filter to connected nodes only
+        return self._filter_connected_nodes(all_nodes, unique_id, depth)
+
+    def _filter_connected_nodes(
+        self, nodes: list[dict], target_id: str, depth: int
+    ) -> list[dict]:
+        """Return only nodes connected to target_id (upstream and downstream).
+
+        Uses BFS to find all nodes reachable from target in both directions.
+        """
+        node_map = {
+            n["uniqueId"]: n
+            for n in nodes
+            if (resource_type := n.get("resourceType"))
+            and isinstance(resource_type, str)
+            # Filtering out macros because they have large
+            # dependency graphs that aren't always useful.
+            and resource_type.strip().lower() != "macro"
+        }
+
+        if target_id not in node_map:
+            return []
+
+        # BFS to find all connected nodes
+        connected = {target_id}
+        queue = [(target_id, 0)]
+
+        while queue:
+            current_id, current_depth = queue.pop(0)
+            node = node_map.get(current_id)
+            if not node:
+                continue
+
+            # Stop traversing beyond the depth limit
+            if current_depth >= depth:
+                continue
+
+            # Traverse upstream (parents)
+            for parent_id in node.get("parentIds", []):
+                if parent_id not in connected and parent_id in node_map:
+                    connected.add(parent_id)
+                    queue.append((parent_id, current_depth + 1))
+
+            # Traverse downstream (children)
+            for candidate in nodes:
+                candidate_id = candidate.get("uniqueId")
+                if not candidate_id or candidate_id not in node_map:
+                    continue
+                if (
+                    current_id in candidate.get("parentIds", [])
+                    and candidate_id not in connected
+                ):
+                    connected.add(candidate_id)
+                    queue.append((candidate_id, current_depth + 1))
+
+        # Return in original order
+        return [node_map[uid] for uid in connected]
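The commit summary mentions pre-building a children adjacency map, while the traversal in this hunk rescans the full node list at every BFS step and returns nodes by iterating a set, whose order is not guaranteed. The following is a minimal sketch of an adjacency-map variant, not the code in this commit. The standalone function name `filter_connected_nodes` is hypothetical, and the macro filtering from the hunk is left out to keep it short.

```python
from collections import defaultdict, deque


def filter_connected_nodes(nodes: list[dict], target_id: str, depth: int) -> list[dict]:
    """Return nodes within `depth` hops of target_id, in their input order."""
    node_map = {n["uniqueId"]: n for n in nodes}
    if target_id not in node_map:
        return []

    # Build the parent -> children adjacency map once, up front.
    children: dict[str, list[str]] = defaultdict(list)
    for node in nodes:
        for parent_id in node.get("parentIds") or []:
            children[parent_id].append(node["uniqueId"])

    connected = {target_id}
    queue = deque([(target_id, 0)])
    while queue:
        current_id, current_depth = queue.popleft()  # O(1), unlike list.pop(0)
        if current_depth >= depth:
            continue
        parents = node_map[current_id].get("parentIds") or []
        # Like the hunk above, every step follows edges in both directions.
        for neighbour_id in parents + children.get(current_id, []):
            if neighbour_id in node_map and neighbour_id not in connected:
                connected.add(neighbour_id)
                queue.append((neighbour_id, current_depth + 1))

    # Iterate the input list rather than the set so the order is deterministic.
    return [n for n in nodes if n["uniqueId"] in connected]
```

With the map built once, each BFS step touches only the current node's direct neighbours, so the traversal cost grows with the number of edges rather than with nodes times visited nodes.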
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+query GetFullLineage($environmentId: BigInt!, $types: [ResourceNodeType!]!) {
+  environment(id: $environmentId) {
+    applied {
+      lineage(filter: { types: $types }) {
+        name
+        uniqueId
+        resourceType
+        ... on LineageNodeWithParents {
+          parentIds
+        }
+      }
+    }
+  }
+}
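The response to this query nests the node list under `data.environment.applied.lineage`. As a hedged sketch of unpacking it, the helper below walks that path and also tolerates explicit `null` values, which a chain of `.get(key, {})` calls would not (a key present with value `None` returns `None`, not the default). The function name `extract_lineage_nodes` and the sample payload are illustrative assumptions, not part of this commit.

```python
def extract_lineage_nodes(result: dict) -> list[dict]:
    """Walk data -> environment -> applied -> lineage, treating null as empty."""
    data = result.get("data") or {}
    environment = data.get("environment") or {}
    applied = environment.get("applied") or {}
    return applied.get("lineage") or []


# Hypothetical response shaped like the GetFullLineage selection set.
sample_response = {
    "data": {
        "environment": {
            "applied": {
                "lineage": [
                    {
                        "name": "customers",
                        "uniqueId": "model.analytics.customers",
                        "resourceType": "Model",
                        "parentIds": [],
                    }
                ]
            }
        }
    }
}

lineage_nodes = extract_lineage_nodes(sample_response)
```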

src/dbt_mcp/discovery/tools.py

Lines changed: 30 additions & 0 deletions
@@ -8,6 +8,8 @@
 from dbt_mcp.discovery.client import (
     AppliedResourceType,
     ExposuresFetcher,
+    LineageFetcher,
+    LineageResourceType,
     MetadataAPIClient,
     ModelsFetcher,
     PaginatedResourceFetcher,
@@ -35,6 +37,13 @@
     "This is not required if `unique_id` is provided. "
     "Only use name when `unique_id` is unknown.",
 )
+TYPES_FIELD = Field(
+    default=None,
+    description="List of resource types to include in lineage results. "
+    "If not provided, includes all types. "
+    "Valid types: Model, Source, Seed, Snapshot, Exposure, Metric, SemanticModel, SavedQuery, Test.",
+)
+DEPTH_FIELD = Field(default=5, description="The depth of the lineage graph to return.")


 @dataclass
@@ -43,6 +52,7 @@ class DiscoveryToolContext:
     exposures_fetcher: ExposuresFetcher
     sources_fetcher: SourcesFetcher
     resource_details_fetcher: ResourceDetailsFetcher
+    lineage_fetcher: LineageFetcher

     def __init__(self, config_provider: ConfigProvider[DiscoveryConfig]):
         api_client = MetadataAPIClient(config_provider=config_provider)
@@ -83,6 +93,7 @@ def __init__(self, config_provider: ConfigProvider[DiscoveryConfig]):
             ),
         )
         self.resource_details_fetcher = ResourceDetailsFetcher(api_client=api_client)
+        self.lineage_fetcher = LineageFetcher(api_client=api_client)


 @dbt_mcp_tool(
@@ -174,6 +185,24 @@ async def get_model_health(
     return await context.models_fetcher.fetch_model_health(name, unique_id)


+@dbt_mcp_tool(
+    description=get_prompt("discovery/get_lineage"),
+    title="Get Lineage",
+    read_only_hint=True,
+    destructive_hint=False,
+    idempotent_hint=True,
+)
+async def get_lineage(
+    context: DiscoveryToolContext,
+    unique_id: str,
+    types: list[LineageResourceType] | None = TYPES_FIELD,
+    depth: int = DEPTH_FIELD,
+) -> list[dict]:
+    return await context.lineage_fetcher.fetch_lineage(
+        unique_id=unique_id, types=types, depth=depth
+    )
+
+
 @dbt_mcp_tool(
     description=get_prompt("discovery/get_exposures"),
     title="Get Exposures",
@@ -340,6 +369,7 @@ async def get_test_details(
     get_model_parents,
     get_model_children,
     get_model_health,
+    get_lineage,
     get_exposures,
     get_exposure_details,
     get_all_sources,
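The `types` parameter defaults to `None`, which the fetcher expands to every enum member before sending the `types` variable to GraphQL. The sketch below shows that defaulting logic in isolation, under stated assumptions: `ResourceType` is a trimmed, hypothetical stand-in for `LineageResourceType` (written as `str, Enum` so it runs without `StrEnum`), and `build_type_filter` is an illustrative helper, not a function in this commit.

```python
from enum import Enum


class ResourceType(str, Enum):
    """Trimmed, hypothetical stand-in for LineageResourceType."""

    MODEL = "Model"
    SOURCE = "Source"
    TEST = "Test"


def build_type_filter(types=None) -> list[str]:
    """Return the string values for the GraphQL `types` variable."""
    # None means "no filter": expand to every member of the enum.
    selected = types if types is not None else list(ResourceType)
    # ResourceType(...) accepts members or their string values and raises
    # ValueError for anything else, such as a wrongly cased "model".
    return [ResourceType(t).value for t in selected]
```

Note that under this logic an empty list is not the same as `None`: it yields an empty filter rather than all types.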
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
+Retrieves the lineage graph for a dbt resource.
+
+Returns all nodes connected to the specified resource, including both upstream dependencies (ancestors) and downstream dependents (descendants).
+
+**Parameters:**
+- `unique_id`: **Required** - Full unique ID of the resource (e.g., "model.my_project.customers")
+- `types`: *Optional* - List of resource types to include in results. If not provided, includes all types.
+  - Valid types: `Model`, `Source`, `Seed`, `Snapshot`, `Exposure`, `Metric`, `SemanticModel`, `SavedQuery`, `Test`
+- `depth`: *Optional* - The depth of the lineage graph to return (default: 5). Controls how many levels upstream and downstream to traverse from the target node.
+
+**Returns:**
+A list of all nodes in the connected subgraph, where each node contains:
+- `uniqueId`: The resource's unique identifier
+- `name`: The resource name
+- `resourceType`: The type of resource (Model, Source, etc.)
+- `parentIds`: List of unique IDs that this resource directly depends on
+
+**Example Response:**
+```json
+[
+  {
+    "uniqueId": "source.raw.users",
+    "name": "users",
+    "resourceType": "Source",
+    "parentIds": []
+  },
+  {
+    "uniqueId": "model.stg_customers",
+    "name": "stg_customers",
+    "resourceType": "Model",
+    "parentIds": ["source.raw.users"]
+  },
+  {
+    "uniqueId": "model.customers",
+    "name": "customers",
+    "resourceType": "Model",
+    "parentIds": ["model.stg_customers"]
+  }
+]
+```
+
+**Usage Examples:**
+```python
+# Get complete lineage (all connected nodes, all types, default depth of 5)
+get_lineage(unique_id="model.analytics.customers")
+
+# Get lineage filtered to only models and sources
+get_lineage(unique_id="model.analytics.customers", types=["Model", "Source"])
+
+# Get only immediate neighbors (depth=1)
+get_lineage(unique_id="model.analytics.customers", depth=1)
+
+# Get deeper lineage for comprehensive analysis
+get_lineage(unique_id="model.analytics.customers", depth=10)
+```
+
+**Traversing the Graph:**
+
+The graph is represented by parent-child relationships. To navigate:
+
+**Finding Upstream Dependencies (Parents):**
+```python
+# Direct: What does this node depend on?
+target_node = find_node_by_id(result, "model.customers")
+direct_parents = target_node["parentIds"]
+# Result: ["model.stg_customers"]
+```
+
+**Finding Downstream Dependents (Children):**
+```python
+# Search: What depends on this node?
+target_id = "model.customers"
+direct_children = [
+    node for node in result
+    if target_id in node.get("parentIds", [])
+]
+# Result: nodes that list "model.customers" in their parentIds
+```
+
+**Understanding the Results:**
+
+- The target node is always included in the response
+- All returned nodes are connected to the target (no disconnected nodes)
+- To get full lineage, omit the `types` parameter
+- To reduce payload size, specify relevant `types`
+
+**Common Use Cases:**
+
+1. **Impact Analysis**: "What will break if I change this model?"
+   - Look at downstream dependents (nodes that have target in `parentIds`)
+
+2. **Dependency Tracking**: "What does this model depend on?"
+   - Look at upstream dependencies (`parentIds` of the target node)
+
+3. **Data Lineage**: "Show the complete data flow for this entity"
+   - Use all returned nodes to build a complete graph
+
+4. **Finding Tests**: "What tests exist for this model and its dependencies?"
+   - Filter results where `resourceType == "Test"`

src/dbt_mcp/tools/tool_names.py

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ class ToolName(Enum):
     GET_MODEL_PARENTS = "get_model_parents"
     GET_MODEL_CHILDREN = "get_model_children"
     GET_MODEL_HEALTH = "get_model_health"
+    GET_LINEAGE = "get_lineage"
     GET_ALL_SOURCES = "get_all_sources"
     GET_SOURCE_DETAILS = "get_source_details"
     GET_EXPOSURES = "get_exposures"

src/dbt_mcp/tools/toolsets.py

Lines changed: 1 addition & 0 deletions
@@ -61,6 +61,7 @@ class Toolset(Enum):
         ToolName.GET_MODEL_PARENTS,
         ToolName.GET_MODEL_CHILDREN,
         ToolName.GET_MODEL_HEALTH,
+        ToolName.GET_LINEAGE,
         ToolName.GET_ALL_SOURCES,
         ToolName.GET_SOURCE_DETAILS,
         ToolName.GET_EXPOSURES,
