Commit c9bf758

Teradata lineage and dependency graph analysis (#302)
* Added graph_queryDependenciesAgent MCP tool for Teradata object dependency analysis

  Introduce a new MCP tool that provides comprehensive object dependency analysis for Teradata databases with support for wildcards, CSV patterns, and bidirectional dependency traversal.

  Key Features:
  - Analyses upstream dependencies (what an object depends on) and downstream dependencies (what depends on the object)
  - Supports single objects, wildcard patterns (%), and CSV pattern lists
  - Configurable traversal depth for both upstream (max_depth_up) and downstream (max_depth_down) analysis (0-10 levels)
  - Server-side filtering with exclude_objects and include_containers parameters
  - Returns dependency graph as nodes and edges for visualisation
  - Multiple output formats: 'detailed', 'summary', 'edges_only'

  Use Cases:
  - Impact analysis: Determine blast radius before dropping/changing objects
  - Data lineage tracing: Track upstream data sources
  - Dependency discovery: Understand object relationships
  - Pre-deployment validation: Assess impacts before changes
  - Documentation: Map database object dependencies

  Parameters:
  - object_name (required): Object pattern(s) - supports wildcards and CSV
    Examples: 'DB.Table', '%WBC%.%', 'DB1.T1,DB2.T2'
  - max_depth_up (default: 3): Upstream traversal depth (0-10)
  - max_depth_down (default: 3): Downstream traversal depth (0-10)
  - exclude_objects (default: ''): CSV patterns to exclude from analysis
  - include_containers (default: ''): Whitelist of schemas/databases
  - edge_repository (default: 'DEV_01_ODEX_STD_0_V.ODEXRepository'): ODEX repository table
  - return_format (default: 'detailed'): Output format

  Technical Implementation:
  - Leverages ODEX repository for dependency metadata
  - Uses STRTOK_SPLIT_TO_TABLE for server-side CSV parsing
  - Automatic whitespace trimming of patterns
  - Returns formatted response with dependency graph and metadata
  - Performance optimised with proper exclusion patterns (20-50% reduction)

  Example Usage:
    graph_queryDependenciesAgent(
        object_name="%WBC%.%,%StGeo%.%",
        max_depth_up=5,
        exclude_objects="PRD_%,TST_%"
    )

  BREAKING CHANGE: None - new feature addition

* Replaced QueryDependenciesAgent, added findRootObjects, added graph_detectCycles, graph_connectedComponents and _graph_bfsLevels

  Replaced QueryDependenciesAgent with QueryDependenciesAgentBatch (better performance).
  Added findRootObjects to find source objects from which to start analysing downstream graphs.
  Added graph_detectCycles to identify circular references.
  Added graph_connectedComponents to identify connected components (groups of closely related objects).
  Added graph_bfsLevels, which uses a breadth-first search, for use in Object Migration Wave planning.

* feat: add graph analysis tools - analyseDatabase, bfsLevels, connectedComponents, detectCycles, findRootObjects, edgeContract
  - Replaced monolithic queryDependenciesAgent with modular graph tools
  - Added _graph_utils shared utility module
  - Removed graph_prompts.yml and legacy documentation
  - Updated app.py and profiles.yml for graph tool registration

* refactor(graph): compliance pass, contract v1.1, helper consolidation

  BREAKING CHANGES
  - graph_queryDependenciesAgent renamed to graph_traceLineage (file, function, constant, tool name string). Update any callers accordingly.
  - graph_detectCycles: strategy and max_edges_for_cte parameters removed.
  - graph_detectCycles, graph_connectedComponents: object_dependency_table renamed to edge_repository; excl_patterns renamed to exclude_objects.
  - graph_edgeContractDDL: generated DDL column names corrected from SrcContainer/SrcObject/SrcKind to Src_Container_Name/Src_Object_Name/Src_Kind (and Tgt equivalents). Previously generated tables were incompatible with the tool SQL. Contract version bumped to 1.1.

  PROGRESSIVE DISCLOSURE COMPLIANCE
  - graph_tools.py: graph_analyseDatabase and graph_edgeContractDDL were missing from GRAPH_TOOLS.
    All 7 tools now registered in workflow order: edgeContractDDL → findRootObjects → bfsLevels → traceLineage → detectCycles → connectedComponents → analyseDatabase.
  - GRAPH_EDGE_CONTRACT_DDL_TOOL descriptor added to graph_edge_contract.py (was absent entirely; tool was unregisterable in static mode).

  TERMINOLOGY
  - Remove all ODEX references from __init__.py and _graph_utils.py per standing instruction. Replaced with generic terms (dependency graph, object dependency graph).

  LOGGING
  - Replace all f-string logger calls with %s style throughout graph_findRootObjects.py (5 calls), graph_bfsLevels.py (1 call), and graph_edge_contract.py (2 calls, including logger.warning).
  - Remove stray print() from graph_findRootObjects.py; replaced with logger.debug.

  PARAMETER CHANGES
  - edge_repository: runtime validation added to all 6 tools that accept it. Empty string now returns an early error with the AI-Native Data Product convention hint ({ProductName}_Semantic.lineage_graph).
  - graph_bfsLevels, graph_traceLineage, graph_detectCycles, graph_connectedComponents: stale cross-references to graph_queryDependenciesAgent updated to graph_traceLineage throughout docstrings and descriptors.

  GRAPH EDGE CONTRACT v1.1
  - Column names corrected throughout: DDL, sample DML, view template, COMMENT ON COLUMN, canonical contract text, file header.
  - Optional enrichment columns added: Edge_Relationship VARCHAR(50), Transformation_Type VARCHAR(50). Ignored by graph analysis tools; present in {ProductName}_Semantic.lineage_graph for visualisation clients. ADDITIONAL COLUMNS section updated accordingly.
  - Src_Kind/Tgt_Kind COMPRESS lists expanded to cover both single-letter codes (T, V, P...) and full-word values (Table, View, Job...) to match lineage_graph output.
  - Sample DML updated: basic examples use 6-column form; new ETL-job example demonstrates source→job→target two-leg pattern using all 8 columns.
  - View template updated: optional columns included as nullable CAST(NULL AS VARCHAR(50)) placeholders with mapping guidance.
  - AI-Native Data Product convention documented in file header, contract text, docstring, descriptor, and all edge_repository error messages.

  HELPER CONSOLIDATION (phase 1 - safe mechanical changes only)
  - _graph_utils.py: add parse_csv_patterns() and build_like_or().
  - Remove 7 local copies of parse_csv/_parse_csv_patterns (graph_analyseDatabase, graph_bfsLevels, graph_detectCycles, graph_connectedComponents, graph_traceLineage, graph_findRootObjects ×2); replace with shared import.
  - Remove 3 local copies of _build_like_or/_build_like_clauses (graph_analyseDatabase, graph_detectCycles, graph_connectedComponents); replace with shared import.
  - Deferred to phase 2: _UnionFind consolidation (recursion bug in graph_detectCycles.find()), _build_excl_* parameterisation.

* fix(graph): remove unmatched brace in graph_connectedComponents tool descriptor

  GRAPH_CONNECTED_COMPONENTS_TOOL had a duplicate closing brace at line 481 in the parameters dict, causing a SyntaxError at import time. beartype's import hook surfaced the error during package load, which caused the entire graph package to fail silently: all seven graph tools were unregistered with no server-side warning. Removed the spurious brace at line 481.

  Root cause: raw dict tool descriptors have no structural validation at definition time. A future refactor to dataclass-based ToolDescriptor would catch this class of error at module load rather than requiring manual import tracing.
* fix: resolve ruff and mypy CI failures in graph tools
  - Rename camelCase graph module files to snake_case (N999)
  - Update all imports in __init__.py and graph_tools.py to match new names
  - Lowercase WHITE/GREY/BLACK constants used as local variables (N806)
  - Replace set comprehensions with set() calls (C416)
  - Rename unused loop variable comp_root to _comp_root (B007)
  - Remove trailing whitespace from SQL strings (W291)
  - Type _parent dict as dict[str, str] in UnionFind classes (mypy no-any-return)
  - Change stack type annotation from object to Iterator[str] (mypy call-overload)
  - Annotate type_counts and db_counts dicts explicitly (mypy var-annotated)
  - Annotate upstream/downstream_level and nearest_root_val as Optional (mypy assignment)
  - Rename members to cycle_members to avoid conflicting type inference (mypy assignment)
  - Annotate rows as list[dict[str, Any]] for sort key compatibility (mypy arg-type)

* style: apply ruff formatting to graph module and app.py

---------

Co-authored-by: earthshiner <paul.dancer@gmail.com>
Co-authored-by: Paul Dancer <paul.dancer@teradata.com>
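
The WHITE/GREY/BLACK colours and the Iterator[str] stack mentioned in the lint fixes above suggest an iterative three-colour depth-first search for cycle detection. The following is a minimal sketch of that general technique, assuming an in-memory list of (source, target) edges; `find_cycle` is a hypothetical name and this is not the code of graph_detectCycles, which runs against the edge repository.

```python
from collections.abc import Iterator

# Colour states for the DFS (module-level here purely for illustration).
WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored


def find_cycle(edges: list[tuple[str, str]]) -> list[str]:
    """Return one cycle as a node list (first node repeated at the end), or []."""
    graph: dict[str, list[str]] = {}
    for src, tgt in edges:
        graph.setdefault(src, []).append(tgt)
        graph.setdefault(tgt, [])

    colour = dict.fromkeys(graph, WHITE)
    for start in graph:
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        path = [start]  # nodes currently on the DFS path
        stack: list[Iterator[str]] = [iter(graph[start])]  # one iterator per path node
        while stack:
            nxt = next(stack[-1], None)
            if nxt is None:
                # All successors explored: retire the node at the top of the path.
                colour[path.pop()] = BLACK
                stack.pop()
            elif colour[nxt] == GREY:
                # Back edge to a node on the current path closes a cycle.
                return path[path.index(nxt):] + [nxt]
            elif colour[nxt] == WHITE:
                colour[nxt] = GREY
                path.append(nxt)
                stack.append(iter(graph[nxt]))
    return []


print(find_cycle([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]))  # ['A', 'B', 'C', 'A']
print(find_cycle([("A", "B"), ("B", "C"), ("A", "C")]))  # []
```

Using an explicit stack of iterators instead of recursion avoids Python's recursion limit on deep dependency chains, which may be relevant given the recursion bug the commit message defers to phase 2.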
1 parent a214bcf commit c9bf758

16 files changed

Lines changed: 5916 additions & 12 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -18,3 +18,4 @@ test_*.py
 .planning/
 .ruff_cache/
 Start_MCP_Server.bat
+

src/teradata_mcp_server/app.py

Lines changed: 15 additions & 0 deletions
@@ -34,6 +34,7 @@
 from teradata_mcp_server.config import Settings
 from teradata_mcp_server.middleware import RequestContextMiddleware
 from teradata_mcp_server.tools import ContextCatalog
+from teradata_mcp_server.tools.graph.graph_edge_contract import GRAPH_EDGE_CONTRACT
 from teradata_mcp_server.tools.utils import (
     convert_tdml_docstring_to_mcp_docstring,
     execute_analytic_function,
@@ -1287,5 +1288,19 @@ def get_glossary_term(term_name: str) -> dict[str, Any]:
             else:
                 return {"error": f"Glossary term not found: {term_name}"}

+    # ── Graph Edge Contract Resource ──────────────────────────────────────
+    # Always registered (static content, no YAML dependency).
+    # AI agents retrieve this to understand the edge_repository schema
+    # required by all graph_* tools.
+    # ──────────────────────────────────────────────────────────────────────
+    if any(re.match(pattern, "graph_edge_contract") for pattern in config.get("resource", [])):
+
+        @mcp.resource("graph://edge-contract")
+        def get_graph_edge_contract() -> str:
+            """Return the Graph Edge Contract schema definition."""
+            return GRAPH_EDGE_CONTRACT
+
+        logger.info("Registered resource: graph_edge_contract")
+
     # Return the configured app and some handles used by the entrypoint if needed
     return mcp, logger
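
The added comment says the resource is always registered, but the guard in the hunk registers it only when one of the active profile's `resource` patterns matches the name `graph_edge_contract`. A minimal sketch of that guard in isolation follows; `resource_enabled` and both profile dictionaries are hypothetical stand-ins for illustration, not names from the repository.

```python
import re


def resource_enabled(name: str, config: dict[str, list[str]]) -> bool:
    """Same test as the guard in the diff: does any 'resource' pattern match name?"""
    return any(re.match(pattern, name) for pattern in config.get("resource", []))


# Hypothetical profile mirroring the new `graph` entry in profiles.yml.
graph_profile = {
    "tool": ["^graph_.*"],
    "prompt": ["^graph_.*"],
    "resource": ["^graph_edge_contract$"],
}
# Hypothetical profile with no `resource` key at all.
tools_only_profile = {"tool": ["^graph_.*"]}

print(resource_enabled("graph_edge_contract", graph_profile))       # True
print(resource_enabled("graph_edge_contract", tools_only_profile))  # False
```

Under this reading, a profile that enables the graph tools but omits the `resource` list would not expose `graph://edge-contract`.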

src/teradata_mcp_server/config/profiles.yml

Lines changed: 11 additions & 2 deletions
@@ -44,7 +44,7 @@ eda:
     - "base_(?!(writeQuery|dynamicQuery)$).*"
     - qlty_.*
     - sec_userDbPermissions
-
+
 bar:
   tool:
     - ^bar_*
@@ -60,4 +60,13 @@ llmUser:
     - ^base_*
     - ^chat_*
   prompt:
-    - ^chat_*
+    - ^chat_*
+
+graph:
+  tool:
+    - ^graph_.*
+  prompt:
+    - ^graph_.*
+  resource:
+    - ^graph_edge_contract$
+
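
The profile entries above are regular expressions, and the new `graph` profile writes them differently from the neighbouring `bar` profile. The sketch below shows how the three styles behave, assuming names are tested with `re.match` as in the resource guard in app.py; whether tools and prompts are filtered the same way is an assumption. `select` and the names `bar_backup`, `barometer`, and `graph_edge_contract_v2` are hypothetical.

```python
import re


def select(patterns: list[str], names: list[str]) -> list[str]:
    """Keep the names matched from their start by at least one pattern."""
    return [n for n in names if any(re.match(p, n) for p in patterns)]


names = [
    "graph_traceLineage",
    "graph_detectCycles",
    "graph_edge_contract_v2",
    "bar_backup",
    "barometer",
]

# `.*` after the prefix: any name that starts with "graph_".
print(select(["^graph_.*"], names))
# ['graph_traceLineage', 'graph_detectCycles', 'graph_edge_contract_v2']

# `_*` means "zero or more underscores", so this matches anything starting with "bar".
print(select(["^bar_*"], names))
# ['bar_backup', 'barometer']

# `$` pins the end of the name, so only the exact resource name would match.
print(select(["^graph_edge_contract$"], names))
# []
```

The trailing `$` on the resource pattern is what keeps the match exact; the `.*` suffix on the tool and prompt patterns makes the prefix intent explicit.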
