Teradata lineage and dependency graph analysis #302
Merged
…dency analysis
Introduce a new MCP tool that provides comprehensive object dependency analysis
for Teradata databases with support for wildcards, CSV patterns, and bidirectional
dependency traversal.
Key Features:
- Analyses upstream dependencies (what an object depends on) and downstream
dependencies (what depends on the object)
- Supports single objects, wildcard patterns (%), and CSV pattern lists
- Configurable traversal depth for both upstream (max_depth_up) and downstream
(max_depth_down) analysis (0-10 levels)
- Server-side filtering with exclude_objects and include_containers parameters
- Returns dependency graph as nodes and edges for visualisation
- Multiple output formats: 'detailed', 'summary', 'edges_only'
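The bidirectional, depth-bounded traversal described above can be sketched in Python as two breadth-first walks over an in-memory edge list. This is a minimal illustration, not the tool's implementation. It assumes each edge is a (source, target) pair of fully qualified names in which the target depends on the source. The function name trace_dependencies is hypothetical.

```python
from collections import deque


def trace_dependencies(
    edges: list[tuple[str, str]],
    start: str,
    max_depth_up: int = 3,
    max_depth_down: int = 3,
) -> dict[str, list]:
    """Return the nodes and edges within the given depths of ``start``.

    Assumes each edge is (source, target) with the target depending on the
    source: upstream follows edges backwards, downstream follows them forwards.
    """
    forward: dict[str, set[str]] = {}
    backward: dict[str, set[str]] = {}
    for src, tgt in edges:
        forward.setdefault(src, set()).add(tgt)
        backward.setdefault(tgt, set()).add(src)

    nodes = {start}
    kept: set[tuple[str, str]] = set()

    def walk(adjacency: dict[str, set[str]], max_depth: int, upstream: bool) -> None:
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth >= max_depth:
                continue  # depth budget for this direction is used up
            for nxt in adjacency.get(node, set()):
                # Store every edge in its original (source, target) orientation.
                kept.add((nxt, node) if upstream else (node, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    nodes.add(nxt)
                    queue.append((nxt, depth + 1))

    walk(backward, max_depth_up, upstream=True)
    walk(forward, max_depth_down, upstream=False)
    return {"nodes": sorted(nodes), "edges": sorted(kept)}


edges = [
    ("DB.Src", "DB.Stg"),
    ("DB.Stg", "DB.Fact"),
    ("DB.Fact", "DB.Rpt"),
    ("DB.Other", "DB.Unrelated"),
]
print(trace_dependencies(edges, "DB.Fact", max_depth_up=1, max_depth_down=1))
# {'nodes': ['DB.Fact', 'DB.Rpt', 'DB.Stg'], 'edges': [('DB.Fact', 'DB.Rpt'), ('DB.Stg', 'DB.Fact')]}
```

A depth of 0 in either direction disables that side of the walk. Objects with no path to the start object, such as DB.Unrelated here, never appear in the result.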
Use Cases:
- Impact analysis: Determine blast radius before dropping/changing objects
- Data lineage tracing: Track upstream data sources
- Dependency discovery: Understand object relationships
- Pre-deployment validation: Assess impacts before changes
- Documentation: Map database object dependencies
Parameters:
- object_name (required): Object pattern(s) - supports wildcards and CSV
Examples: 'DB.Table', '%WBC%.%', 'DB1.T1,DB2.T2'
- max_depth_up (default: 3): Upstream traversal depth (0-10)
- max_depth_down (default: 3): Downstream traversal depth (0-10)
- exclude_objects (default: ''): CSV patterns to exclude from analysis
- include_containers (default: ''): Whitelist of schemas/databases
- edge_repository (default: 'DEV_01_ODEX_STD_0_V.ODEXRepository'):
ODEX repository table
- return_format (default: 'detailed'): Output format
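The parameters above can be checked before any query runs. This is a sketch under stated assumptions. The commit message does not say whether out-of-range depths are rejected or clamped, so this version only collects messages. The name check_params is hypothetical.

```python
RETURN_FORMATS = ("detailed", "summary", "edges_only")


def check_params(
    max_depth_up: int = 3,
    max_depth_down: int = 3,
    return_format: str = "detailed",
) -> list[str]:
    """Collect problems with the arguments; an empty list means they are usable."""
    problems: list[str] = []
    for name, value in (("max_depth_up", max_depth_up), ("max_depth_down", max_depth_down)):
        if not 0 <= value <= 10:  # documented range is 0-10 levels
            problems.append(f"{name} must be between 0 and 10, got {value}")
    if return_format not in RETURN_FORMATS:
        problems.append(f"return_format must be one of {', '.join(RETURN_FORMATS)}")
    return problems


print(check_params(max_depth_up=11))  # ['max_depth_up must be between 0 and 10, got 11']
```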
Technical Implementation:
- Leverages ODEX repository for dependency metadata
- Uses STRTOK_SPLIT_TO_TABLE for server-side CSV parsing
- Automatic whitespace trimming of patterns
- Returns formatted response with dependency graph and metadata
- Performance optimised with proper exclusion patterns (20-50% reduction)
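The tool splits CSV patterns server-side with STRTOK_SPLIT_TO_TABLE. For comparison, here is a client-side sketch of the same idea: split and trim the CSV, then build a parameterised OR of LIKE predicates. The helper names echo parse_csv_patterns() and build_like_or() from the later helper consolidation. Their signatures here are assumptions, as is the qmark (?) placeholder style of the DB-API driver.

```python
def parse_csv_patterns(csv: str) -> list[str]:
    """Split a CSV pattern list, trimming whitespace and dropping empty entries."""
    return [p.strip() for p in csv.split(",") if p.strip()]


def build_like_or(column: str, patterns: list[str]) -> tuple[str, list[str]]:
    """Build "(col LIKE ? OR col LIKE ?)" plus the values to bind.

    ``column`` is pasted into the SQL text, so it must be a trusted expression;
    the patterns themselves are passed as bind parameters.
    """
    if not patterns:
        return "", []
    clause = " OR ".join(f"{column} LIKE ?" for _ in patterns)
    return f"({clause})", list(patterns)


# Exclude fully qualified "Container.Object" names that match any pattern.
name_expr = "(Src_Container_Name || '.' || Src_Object_Name)"
sql, params = build_like_or(name_expr, parse_csv_patterns(" PRD_%, TST_% ,"))
where = f"NOT {sql}" if sql else ""
print(params)  # ['PRD_%', 'TST_%']
```

In LIKE, `_` is also a wildcard that matches any single character. A pattern such as PRD_% therefore also matches names like PRDX..., not only names with a literal underscore after PRD.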
Example Usage:
graph_queryDependenciesAgent(
object_name="%WBC%.%,%StGeo%.%",
max_depth_up=5,
exclude_objects="PRD_%,TST_%"
)
BREAKING CHANGE: None - new feature addition
…etectCycles, graph_connectedComponents and _graph_bfsLevels
- Replace QueryDependenciesAgent with QueryDependenciesAgentBatch (better performance).
- Add findRootObjects to find the source objects from which to start analysing downstream graphs.
- Add graph_detectCycles to identify circular references.
- Add graph_connectedComponents to identify connected components (groups of closely related objects).
- Add graph_bfsLevels, which uses a breadth-first search for object migration wave planning.
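Root discovery and wave planning can be sketched over an in-memory edge list. Here a root is any object that appears as a source but never as a target. For the waves, this sketch uses a layered topological sort (Kahn's algorithm, a breadth-first variant). It places every object one wave after the latest of its inputs. The actual graph_bfsLevels levelling rule may differ, for example it may use the shortest distance from a root. The function names are hypothetical.

```python
from collections import defaultdict


def find_root_objects(edges: list[tuple[str, str]]) -> list[str]:
    """Objects that appear as a source but never as a target (no upstream)."""
    sources = {src for src, _ in edges}
    targets = {tgt for _, tgt in edges}
    return sorted(sources - targets)


def wave_levels(edges: list[tuple[str, str]]) -> tuple[dict[int, list[str]], list[str]]:
    """Layered topological sort: wave 0 = roots, then 1 + the latest input's wave.

    Returns (waves, unresolved); unresolved objects sit on, or downstream of, a cycle.
    """
    children: dict[str, set[str]] = defaultdict(set)
    indegree: dict[str, int] = defaultdict(int)
    nodes: set[str] = set()
    for src, tgt in set(edges):  # de-duplicate repeated edges
        nodes.update((src, tgt))
        children[src].add(tgt)
        indegree[tgt] += 1

    level = {n: 0 for n in nodes if indegree[n] == 0}
    frontier = list(level)
    while frontier:  # every node in the frontier shares the same wave number
        ready = []
        for node in frontier:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:  # all inputs placed in earlier waves
                    level[child] = level[node] + 1
                    ready.append(child)
        frontier = ready

    waves: dict[int, list[str]] = defaultdict(list)
    for node, wave in sorted(level.items(), key=lambda kv: (kv[1], kv[0])):
        waves[wave].append(node)
    return dict(waves), sorted(nodes - set(level))


edges = [("S1", "Stg"), ("S2", "Stg"), ("Stg", "Fact"), ("S1", "Fact"), ("Fact", "Rpt")]
print(find_root_objects(edges))  # ['S1', 'S2']
print(wave_levels(edges))  # ({0: ['S1', 'S2'], 1: ['Stg'], 2: ['Fact'], 3: ['Rpt']}, [])
```

Fact lands in wave 2 even though S1 feeds it directly, because its other input (Stg) only arrives in wave 1. Objects on a cycle never reach in-degree zero, so they appear in the second return value instead of receiving a wave.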
…dComponents, detectCycles, findRootObjects, edgeContract
- Replaced monolithic queryDependenciesAgent with modular graph tools
- Added _graph_utils shared utility module
- Removed graph_prompts.yml and legacy documentation
- Updated app.py and profiles.yml for graph tool registration
refactor(graph): compliance pass, contract v1.1, helper consolidation
BREAKING CHANGES
- graph_queryDependenciesAgent renamed to graph_traceLineage (file,
function, constant, tool name string). Update any callers accordingly.
- graph_detectCycles: strategy and max_edges_for_cte parameters removed.
- graph_detectCycles, graph_connectedComponents: object_dependency_table
renamed to edge_repository; excl_patterns renamed to exclude_objects.
- graph_edgeContractDDL: generated DDL column names corrected from
SrcContainer/SrcObject/SrcKind to Src_Container_Name/Src_Object_Name/
Src_Kind (and Tgt equivalents). Previously generated tables were
incompatible with the tool SQL. Contract version bumped to 1.1.
PROGRESSIVE DISCLOSURE COMPLIANCE
- graph_tools.py: graph_analyseDatabase and graph_edgeContractDDL were
missing from GRAPH_TOOLS. All 7 tools now registered in workflow order:
edgeContractDDL → findRootObjects → bfsLevels → traceLineage →
detectCycles → connectedComponents → analyseDatabase.
- GRAPH_EDGE_CONTRACT_DDL_TOOL descriptor added to graph_edge_contract.py
(was absent entirely; tool was unregisterable in static mode).
TERMINOLOGY
- Remove all ODEX references from __init__.py and _graph_utils.py per
standing instruction. Replaced with generic terms (dependency graph,
object dependency graph).
LOGGING
- Replace all f-string logger calls with %s style throughout
graph_findRootObjects.py (5 calls), graph_bfsLevels.py (1 call), and
graph_edge_contract.py (2 calls, including logger.warning).
- Remove stray print() from graph_findRootObjects.py; replaced with
logger.debug.
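The %s style matters because the logging module defers formatting. The format string and its arguments are stored on the LogRecord and merged only when a handler formats the record. An f-string is built on every call, even when the level is disabled. This is a minimal sketch. The logger name and the function are hypothetical.

```python
import logging

logger = logging.getLogger("graph.find_root_objects")  # hypothetical logger name


def report_roots(roots: list[str]) -> None:
    # Was: logger.info(f"Found {len(roots)} root objects"), formatted on every call.
    # Now the template and args ride on the LogRecord; the message text is only
    # built when a handler formats the record.
    logger.info("Found %s root objects", len(roots))
    # Replaces a stray print(): output now follows the logging configuration.
    logger.debug("Root objects: %s", roots)
```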
PARAMETER CHANGES
- edge_repository: runtime validation added to all 6 tools that accept it.
Empty string now returns an early error with the AI-Native Data Product
convention hint ({ProductName}_Semantic.lineage_graph).
- graph_bfsLevels, graph_traceLineage, graph_detectCycles,
graph_connectedComponents: stale cross-references to
graph_queryDependenciesAgent updated to graph_traceLineage throughout
docstrings and descriptors.
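The early-return check on edge_repository can be sketched as follows. It assumes the tool returns an error payload rather than raising. The function names, the payload keys, and the treatment of whitespace-only input as empty are all assumptions.

```python
from typing import Any, Optional

CONVENTION = "{ProductName}_Semantic.lineage_graph"


def validate_edge_repository(edge_repository: str) -> Optional[dict[str, Any]]:
    """Return an error payload if no repository was given, else None."""
    if not edge_repository.strip():  # treating whitespace-only as empty is an assumption
        return {
            "error": "edge_repository is required",
            "hint": f"AI-Native Data Product convention: {CONVENTION}",
        }
    return None


def trace_lineage(object_name: str, edge_repository: str = "") -> dict[str, Any]:
    # Hypothetical tool body: validate first, before any database work.
    error = validate_edge_repository(edge_repository)
    if error is not None:
        return error
    return {"object_name": object_name, "edge_repository": edge_repository.strip()}


print(trace_lineage("DB.Table")["hint"])
# AI-Native Data Product convention: {ProductName}_Semantic.lineage_graph
```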
GRAPH EDGE CONTRACT v1.1
- Column names corrected throughout: DDL, sample DML, view template,
COMMENT ON COLUMN, canonical contract text, file header.
- Optional enrichment columns added: Edge_Relationship VARCHAR(50),
Transformation_Type VARCHAR(50). Ignored by graph analysis tools;
present in {ProductName}_Semantic.lineage_graph for visualisation
clients. ADDITIONAL COLUMNS section updated accordingly.
- Src_Kind/Tgt_Kind COMPRESS lists expanded to cover both single-letter
codes (T, V, P...) and full-word values (Table, View, Job...) to match
lineage_graph output.
- Sample DML updated: basic examples use 6-column form; new ETL-job
example demonstrates source→job→target two-leg pattern using all 8
columns.
- View template updated: optional columns included as nullable
CAST(NULL AS VARCHAR(50)) placeholders with mapping guidance.
- AI-Native Data Product convention documented in file header, contract
text, docstring, descriptor, and all edge_repository error messages.
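The six required columns and the two optional enrichment columns can be mirrored in a Python dataclass. This also makes the two-leg ETL pattern concrete: one edge runs from the source table to the job, and another runs from the job to the target. This is a sketch for illustration. EdgeRow and two_leg_edges are hypothetical names, and the sample object names are made up.

```python
from dataclasses import astuple, dataclass
from typing import Optional


@dataclass(frozen=True)
class EdgeRow:
    """One row of the edge repository, using the contract v1.1 column names."""

    # Six required columns.
    Src_Container_Name: str
    Src_Object_Name: str
    Src_Kind: str
    Tgt_Container_Name: str
    Tgt_Object_Name: str
    Tgt_Kind: str
    # Optional enrichment columns: nullable, ignored by the graph analysis tools.
    Edge_Relationship: Optional[str] = None
    Transformation_Type: Optional[str] = None


def two_leg_edges(
    source: tuple[str, str, str],  # (container, object, kind)
    job: tuple[str, str],  # (container, job name); kind is fixed to "Job"
    target: tuple[str, str, str],
) -> list[EdgeRow]:
    """Model source -> ETL job -> target as two edges through the job node."""
    src_db, src_obj, src_kind = source
    job_db, job_name = job
    tgt_db, tgt_obj, tgt_kind = target
    return [
        EdgeRow(src_db, src_obj, src_kind, job_db, job_name, "Job"),
        EdgeRow(job_db, job_name, "Job", tgt_db, tgt_obj, tgt_kind),
    ]


rows = two_leg_edges(
    ("STG", "Orders", "Table"), ("ETL", "LoadOrders"), ("DW", "Fact_Orders", "Table")
)
for row in rows:
    print(astuple(row))  # 8 values per row; the two optional columns are None
```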
HELPER CONSOLIDATION (phase 1 — safe mechanical changes only)
- _graph_utils.py: add parse_csv_patterns() and build_like_or().
- Remove 7 local copies of parse_csv/_parse_csv_patterns
(graph_analyseDatabase, graph_bfsLevels, graph_detectCycles,
graph_connectedComponents, graph_traceLineage, graph_findRootObjects ×2);
replace with shared import.
- Remove 3 local copies of _build_like_or/_build_like_clauses
(graph_analyseDatabase, graph_detectCycles, graph_connectedComponents);
replace with shared import.
- Deferred to phase 2: _UnionFind consolidation (recursion bug in
graph_detectCycles.find()), _build_excl_* parameterisation.
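A common way to remove recursion from a union-find find() is to walk to the root in one loop and compress the path in a second loop. The sketch below does that and uses the structure to group connected objects. It treats edges as undirected (weakly connected components), which is an assumption about the tool's semantics. It assumes nothing about the specific bug in graph_detectCycles.find(). The class and function names are hypothetical.

```python
class UnionFind:
    """Disjoint-set forest whose find() uses loops instead of recursion."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self._parent.setdefault(x, x)
        root = x
        while self._parent[root] != root:  # walk up to the root
            root = self._parent[root]
        while x != root:  # path compression: re-point the walked nodes at the root
            nxt = self._parent[x]
            self._parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def members(self) -> list[str]:
        return list(self._parent)


def connected_components(edges: list[tuple[str, str]]) -> list[list[str]]:
    """Group objects linked by any edge, ignoring direction."""
    uf = UnionFind()
    for src, tgt in edges:
        uf.union(src, tgt)
    groups: dict[str, list[str]] = {}
    for node in uf.members():
        groups.setdefault(uf.find(node), []).append(node)
    return sorted(sorted(group) for group in groups.values())


print(connected_components([("A", "B"), ("C", "D"), ("B", "E")]))  # [['A', 'B', 'E'], ['C', 'D']]
```

Because find() never recurses, a parent chain thousands of nodes deep cannot hit Python's default recursion limit.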
… tool descriptor GRAPH_CONNECTED_COMPONENTS_TOOL had a duplicate closing brace at line 481 in the parameters dict, causing a SyntaxError at import time. beartype's import hook surfaced the error during package load, and the entire graph package then failed silently: all seven graph tools were unregistered with no server-side warning. Removed the spurious closing brace at line 481. Root cause: raw dict tool descriptors have no structural validation at definition time. A future refactor to a dataclass-based ToolDescriptor would catch this class of error at module load rather than requiring manual import tracing.
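The sketch below shows what a dataclass-based ToolDescriptor might check in __post_init__. That method runs when the module defining the descriptor is imported. It cannot make a stray brace anything other than a SyntaxError. It does reject structurally wrong descriptors with a clear message, for example a parameter spec without a type or a required parameter that is not defined. The field names and parameter spec shape are assumptions, not the server's descriptor schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ToolDescriptor:
    """Hypothetical typed replacement for a raw dict tool descriptor."""

    name: str
    description: str
    parameters: dict[str, dict[str, Any]] = field(default_factory=dict)
    required: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Runs when the descriptor is constructed, i.e. at module import time.
        if not self.name.isidentifier():
            raise ValueError(f"invalid tool name: {self.name!r}")
        for pname, spec in self.parameters.items():
            if not isinstance(spec, dict) or "type" not in spec:
                raise ValueError(f"{self.name}.{pname}: parameter spec needs a 'type'")
        undefined = set(self.required) - set(self.parameters)
        if undefined:
            raise ValueError(f"{self.name}: required but not defined: {sorted(undefined)}")


# Hypothetical descriptor; the parameter types are illustrative only.
CONNECTED_COMPONENTS_TOOL = ToolDescriptor(
    name="graph_connectedComponents",
    description="Graph partitioning and isolated sub-graph identification",
    parameters={
        "edge_repository": {"type": "string"},
        "exclude_objects": {"type": "string"},
    },
)
```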
- Rename camelCase graph module files to snake_case (N999)
- Update all imports in __init__.py and graph_tools.py to match new names
- Lowercase WHITE/GREY/BLACK constants used as local variables (N806)
- Replace set comprehensions with set() calls (C416)
- Rename unused loop variable comp_root to _comp_root (B007)
- Remove trailing whitespace from SQL strings (W291)
- Type _parent dict as dict[str, str] in UnionFind classes (mypy no-any-return)
- Change stack type annotation from object to Iterator[str] (mypy call-overload)
- Annotate type_counts and db_counts dicts explicitly (mypy var-annotated)
- Annotate upstream/downstream_level and nearest_root_val as Optional (mypy assignment)
- Rename members to cycle_members to avoid conflicting type inference (mypy assignment)
- Annotate rows as list[dict[str, Any]] for sort key compatibility (mypy arg-type)
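The white/grey/black locals and the Iterator[str] stack annotation suggest an iterative three-colour depth-first search for cycle detection. Below is a minimal sketch of that technique, with the colours as lowercase locals. It reports one cycle per back edge it finds rather than enumerating every elementary cycle. The name detect_cycles is hypothetical.

```python
from collections.abc import Iterator


def detect_cycles(edges: list[tuple[str, str]]) -> list[list[str]]:
    """Return one cycle path per back edge found by an iterative three-colour DFS."""
    adj: dict[str, list[str]] = {}
    for src, tgt in edges:
        adj.setdefault(src, []).append(tgt)
        adj.setdefault(tgt, [])
    white, grey, black = 0, 1, 2  # unvisited / on the current path / finished
    colour = dict.fromkeys(adj, white)
    cycles: list[list[str]] = []
    for start in sorted(adj):
        if colour[start] != white:
            continue
        path = [start]
        colour[start] = grey
        stack: list[Iterator[str]] = [iter(adj[start])]
        while stack:
            child = next(stack[-1], None)
            if child is None:
                # All children explored: the node leaves the current path.
                colour[path.pop()] = black
                stack.pop()
            elif colour[child] == grey:
                # Back edge: child is already on the path, so a cycle closes here.
                cycles.append(path[path.index(child):] + [child])
            elif colour[child] == white:
                colour[child] = grey
                path.append(child)
                stack.append(iter(adj[child]))
    return cycles


print(detect_cycles([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]))  # [['A', 'B', 'C', 'A']]
```

The explicit stack of iterators keeps the walk iterative, so long dependency chains do not run into Python's recursion limit. Black nodes are never revisited, which keeps the search linear in the number of edges.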
Summary
This PR introduces graph dependency analysis capability to the Teradata MCP Server with seven new tools for directed graph traversal across Teradata object lineage and data pipelines.
This supersedes PR #292 (same content, CI fixes applied).
New graph tools:
- graph_edgeContractDDL: Generate edge repository DDL (no DB connection required)
- graph_findRootObjects: Discover objects with no upstream dependencies
- graph_bfsLevels: BFS wave planning and deployment sequencing
- graph_traceLineage: Full lineage tracing and impact path analysis
- graph_detectCycles: Circular reference detection and DAG validation
- graph_connectedComponents: Graph partitioning and isolated sub-graph identification
- graph_analyseDatabase: All four analyses in one call with one shared edge fetch

Infrastructure changes:
- utils/__init__.py: create_response() now returns a dict instead of a JSON string (correct per MCP spec)
- app.py: get_tdconn() simplified; graph://edge-contract MCP resource registered for the graph profile
- module_loader.py: graph prefix added to MODULE_MAP
- config/profiles.yml: graph profile added

CI fixes applied on top of PR #292:
- WHITE/GREY/BLACK local variables (ruff N806)

Test plan
- graph_edgeContractDDL can be called without a DB connection and returns valid DDL
- graph://edge-contract MCP resource is available when running with --profile graph
- create_response() dict change
- uv run ruff check src/ passes
- uv run mypy src/ passes