Teradata lineage and dependency graph analysis #302

Merged
dtehan-td merged 8 commits into main from td_lineage_analysis on Apr 23, 2026

@dtehan-td

Collaborator

Summary

This PR introduces graph dependency analysis capability to the Teradata MCP Server with seven new tools for directed graph traversal across Teradata object lineage and data pipelines.

This supersedes PR #292 (same content, CI fixes applied).

New graph tools:

  • graph_edgeContractDDL — Generate edge repository DDL (no DB connection required)
  • graph_findRootObjects — Discover objects with no upstream dependencies
  • graph_bfsLevels — BFS wave planning and deployment sequencing
  • graph_traceLineage — Full lineage tracing and impact path analysis
  • graph_detectCycles — Circular reference detection and DAG validation
  • graph_connectedComponents — Graph partitioning and isolated sub-graph identification
  • graph_analyseDatabase — All four analyses in one call with one shared edge fetch
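
To make the root-discovery and wave-planning ideas concrete, here is a minimal in-memory sketch. It assumes edges are `(source, target)` pairs in which the target depends on the source. The function names and the Kahn-style level assignment are illustrative assumptions and may differ from what `graph_findRootObjects` and `graph_bfsLevels` actually do.

```python
from collections import defaultdict, deque

Edge = tuple[str, str]  # (source, target): the target depends on the source


def find_root_objects(edges: list[Edge]) -> list[str]:
    """Return objects that never appear as a target (no upstream dependency)."""
    sources = {src for src, _ in edges}
    targets = {tgt for _, tgt in edges}
    return sorted(sources - targets)


def bfs_levels(edges: list[Edge]) -> dict[str, int]:
    """Assign each object the earliest wave in which all of its upstreams exist."""
    downstream: dict[str, list[str]] = defaultdict(list)
    indegree: dict[str, int] = defaultdict(int)
    nodes: set[str] = set()
    for src, tgt in edges:
        downstream[src].append(tgt)
        indegree[tgt] += 1
        nodes.update((src, tgt))

    # Wave 0 holds the roots; each later wave is unlocked by the previous ones.
    level = {n: 0 for n in nodes if indegree[n] == 0}
    queue = deque(sorted(level))
    while queue:
        node = queue.popleft()
        for nxt in downstream[node]:
            level[nxt] = max(level.get(nxt, 0), level[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # Objects stuck inside a cycle never reach indegree 0 and are left out.
    return {n: lv for n, lv in level.items() if indegree[n] == 0}
```

In this sketch, objects that share a level have no dependency on each other and could be deployed in the same wave.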

Infrastructure changes:

  • utils/__init__.py: create_response() now returns dict instead of JSON string (correct per MCP spec)
  • app.py: get_tdconn() simplified; graph://edge-contract MCP resource registered for graph profile
  • module_loader.py: graph prefix added to MODULE_MAP
  • config/profiles.yml: graph profile added
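
The `create_response()` change is the one most likely to affect other profiles, so a small sketch of the before/after shape may help reviewers. The function names, the `status`/`metadata`/`results` keys, and the signatures below are hypothetical; they are not copied from `utils/__init__.py`.

```python
import json
from typing import Any, Optional


def legacy_response(results: Any, metadata: Optional[dict[str, Any]] = None) -> str:
    """Hypothetical old behaviour: the payload is serialised to a JSON string."""
    return json.dumps({"status": "success", "metadata": metadata or {}, "results": results})


def dict_response(results: Any, metadata: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Hypothetical new behaviour: the payload is returned as a plain dict."""
    return {"status": "success", "metadata": metadata or {}, "results": results}


# A caller written against the old form had to decode first ...
old_rows = json.loads(legacy_response([{"object": "db.t1"}]))["results"]
# ... while a caller of the new form can index the dict directly.
new_rows = dict_response([{"object": "db.t1"}])["results"]
```

Any existing tool that calls `json.loads()` on the return value would need the decode step removed, which is what the third test-plan item is meant to confirm.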

CI fixes applied on top of PR #292:

  • Renamed camelCase module files to snake_case (ruff N999)
  • Lowercased WHITE/GREY/BLACK local variables (ruff N806)
  • Fixed set comprehensions, unused loop vars, trailing whitespace (ruff C416/B007/W291)
  • Fixed mypy type errors in UnionFind classes, DFS stack types, and optional variable annotations

Test plan

  • graph_edgeContractDDL can be called without a DB connection and returns valid DDL
  • graph://edge-contract MCP resource is available when running with --profile graph
  • Existing tool profiles (base, dba, sec) unaffected by create_response() dict change
  • uv run ruff check src/ passes
  • uv run mypy src/ passes

earthshiner and others added 8 commits March 5, 2026 15:51
…dency analysis

Introduce a new MCP tool that provides comprehensive object dependency analysis
for Teradata databases with support for wildcards, CSV patterns, and bidirectional
dependency traversal.

Key Features:
- Analyses upstream dependencies (what an object depends on) and downstream
  dependencies (what depends on the object)
- Supports single objects, wildcard patterns (%), and CSV pattern lists
- Configurable traversal depth for both upstream (max_depth_up) and downstream
  (max_depth_down) analysis (0-10 levels)
- Server-side filtering with exclude_objects and include_containers parameters
- Returns dependency graph as nodes and edges for visualisation
- Multiple output formats: 'detailed', 'summary', 'edges_only'

Use Cases:
- Impact analysis: Determine blast radius before dropping/changing objects
- Data lineage tracing: Track upstream data sources
- Dependency discovery: Understand object relationships
- Pre-deployment validation: Assess impacts before changes
- Documentation: Map database object dependencies

Parameters:
- object_name (required): Object pattern(s) - supports wildcards and CSV
  Examples: 'DB.Table', '%WBC%.%', 'DB1.T1,DB2.T2'
- max_depth_up (default: 3): Upstream traversal depth (0-10)
- max_depth_down (default: 3): Downstream traversal depth (0-10)
- exclude_objects (default: ''): CSV patterns to exclude from analysis
- include_containers (default: ''): Whitelist of schemas/databases
- edge_repository (default: 'DEV_01_ODEX_STD_0_V.ODEXRepository'):
  ODEX repository table
- return_format (default: 'detailed'): Output format

Technical Implementation:
- Leverages ODEX repository for dependency metadata
- Uses STRTOK_SPLIT_TO_TABLE for server-side CSV parsing
- Automatic whitespace trimming of patterns
- Returns formatted response with dependency graph and metadata
- Performance optimised with proper exclusion patterns (20-50% reduction)

Example Usage:
  graph_queryDependenciesAgent(
    object_name="%WBC%.%,%StGeo%.%",
    max_depth_up=5,
    exclude_objects="PRD_%,TST_%"
  )

BREAKING CHANGE: None - new feature addition
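
For readers unfamiliar with bidirectional, depth-limited traversal, the sketch below shows the idea on an in-memory edge list. The tool itself works against the edge repository in the database; the function name `trace` and the returned shape are assumptions made only for illustration, with `max_depth_up` and `max_depth_down` mirroring the parameters described above.

```python
from collections import defaultdict, deque


def trace(edges, start, max_depth_up=3, max_depth_down=3):
    """Return {"upstream": {obj: depth}, "downstream": {obj: depth}} around start."""
    upstream_of = defaultdict(set)    # object -> objects it depends on
    downstream_of = defaultdict(set)  # object -> objects that depend on it
    for src, tgt in edges:
        downstream_of[src].add(tgt)
        upstream_of[tgt].add(src)

    def walk(adjacency, max_depth):
        depth = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if depth[node] == max_depth:
                continue  # depth limit reached: do not expand this node
            for nxt in sorted(adjacency.get(node, ())):
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
        del depth[start]  # report only the neighbours, not the start object
        return depth

    return {
        "upstream": walk(upstream_of, max_depth_up),
        "downstream": walk(downstream_of, max_depth_down),
    }
```

With a depth of 0 in either direction, that direction is skipped entirely, which matches the documented 0-10 range.
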
…etectCycles, graph_connectedComponents and _graph_bfsLevels

Replaced QueryDependenciesAgent with QueryDependenciesAgentBatch (better performance).
Added graph_findRootObjects to find the source objects from which to start analysing downstream graphs.
Added graph_detectCycles to identify circular references.
Added graph_connectedComponents to identify connected components (groups of closely related objects).
Added graph_bfsLevels, which uses breadth-first search for object migration wave planning.
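
As an illustration of circular-reference detection, here is a three-colour depth-first search over an in-memory edge list. It is a sketch, not the code in graph_detectCycles: the function name is hypothetical, and it reports one cycle per back edge rather than enumerating every simple cycle.

```python
from collections import defaultdict
from typing import Iterator


def detect_cycles(edges: list[tuple[str, str]]) -> list[list[str]]:
    """Return one cycle (as a closed path) for each back edge found by DFS."""
    adjacency: dict[str, list[str]] = defaultdict(list)
    nodes: set[str] = set()
    for src, tgt in edges:
        adjacency[src].append(tgt)
        nodes.update((src, tgt))

    white, grey, black = 0, 1, 2  # unvisited, on the current path, finished
    colour = {n: white for n in nodes}
    cycles: list[list[str]] = []

    for root in sorted(nodes):
        if colour[root] != white:
            continue
        colour[root] = grey
        path = [root]
        # Explicit stack of neighbour iterators avoids Python's recursion limit.
        stack: list[Iterator[str]] = [iter(adjacency[root])]
        while stack:
            nxt = next(stack[-1], None)
            if nxt is None:
                colour[path.pop()] = black
                stack.pop()
            elif colour[nxt] == white:
                colour[nxt] = grey
                path.append(nxt)
                stack.append(iter(adjacency[nxt]))
            elif colour[nxt] == grey:
                # Back edge to a node on the current path closes a cycle.
                cycles.append(path[path.index(nxt):] + [nxt])
    return cycles
```

An empty result means the edge list is a DAG, which is the precondition for wave planning.
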
…dComponents, detectCycles, findRootObjects, edgeContract

- Replaced monolithic queryDependenciesAgent with modular graph tools
- Added _graph_utils shared utility module
- Removed graph_prompts.yml and legacy documentation
- Updated app.py and profiles.yml for graph tool registration
refactor(graph): compliance pass, contract v1.1, helper consolidation

BREAKING CHANGES
- graph_queryDependenciesAgent renamed to graph_traceLineage (file,
  function, constant, tool name string). Update any callers accordingly.
- graph_detectCycles: strategy and max_edges_for_cte parameters removed.
- graph_detectCycles, graph_connectedComponents: object_dependency_table
  renamed to edge_repository; excl_patterns renamed to exclude_objects.
- graph_edgeContractDDL: generated DDL column names corrected from
  SrcContainer/SrcObject/SrcKind to Src_Container_Name/Src_Object_Name/
  Src_Kind (and Tgt equivalents). Previously generated tables were
  incompatible with the tool SQL. Contract version bumped to 1.1.

PROGRESSIVE DISCLOSURE COMPLIANCE
- graph_tools.py: graph_analyseDatabase and graph_edgeContractDDL were
  missing from GRAPH_TOOLS. All 7 tools now registered in workflow order:
  edgeContractDDL → findRootObjects → bfsLevels → traceLineage →
  detectCycles → connectedComponents → analyseDatabase.
- GRAPH_EDGE_CONTRACT_DDL_TOOL descriptor added to graph_edge_contract.py
  (was absent entirely; tool was unregisterable in static mode).

TERMINOLOGY
- Remove all ODEX references from __init__.py and _graph_utils.py per
  standing instruction. Replaced with generic terms (dependency graph,
  object dependency graph).

LOGGING
- Replace all f-string logger calls with %s style throughout
  graph_findRootObjects.py (5 calls), graph_bfsLevels.py (1 call), and
  graph_edge_contract.py (2 calls, including logger.warning).
- Remove stray print() from graph_findRootObjects.py; replaced with
  logger.debug.
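
For context, the difference between the two logging styles looks like this. The logger name, function, and messages are made up for the example; only the `%s` pattern itself reflects the change described above.

```python
import logging

logger = logging.getLogger("graph_demo")  # hypothetical logger name


def report_roots(edge_repository: str, roots: list[str]) -> None:
    # Avoided style: the f-string is formatted even when INFO is disabled.
    #   logger.info(f"Found {len(roots)} root objects in {edge_repository}")
    # Preferred style: arguments are passed separately and formatted lazily.
    logger.info("Found %s root objects in %s", len(roots), edge_repository)
    # Replaces a stray print(); visible only when DEBUG logging is enabled.
    logger.debug("Root objects: %s", roots)
```

Passing arguments separately also keeps the message template constant, which helps log aggregation group identical events.
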

PARAMETER CHANGES
- edge_repository: runtime validation added to all 6 tools that accept it.
  Empty string now returns an early error with the AI-Native Data Product
  convention hint ({ProductName}_Semantic.lineage_graph).
- graph_bfsLevels, graph_traceLineage, graph_detectCycles,
  graph_connectedComponents: stale cross-references to
  graph_queryDependenciesAgent updated to graph_traceLineage throughout
  docstrings and descriptors.
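
The early-return validation could look roughly like the sketch below. The helper name, the error payload keys, and the exact wording of the message are assumptions; only the `{ProductName}_Semantic.lineage_graph` hint is taken from the notes above.

```python
from typing import Any, Optional

CONVENTION_HINT = "{ProductName}_Semantic.lineage_graph"


def validate_edge_repository(edge_repository: str) -> Optional[dict[str, Any]]:
    """Return an error payload for an empty value, or None when the value is usable."""
    if edge_repository and edge_repository.strip():
        return None
    return {
        "status": "error",
        "message": (
            "edge_repository is required. AI-Native Data Products "
            f"conventionally expose it as {CONVENTION_HINT}."
        ),
    }


def some_graph_tool(edge_repository: str) -> dict[str, Any]:
    # Hypothetical tool body: validate first, before any SQL is built.
    error = validate_edge_repository(edge_repository)
    if error is not None:
        return error
    return {"status": "success", "edge_repository": edge_repository.strip()}
```
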

GRAPH EDGE CONTRACT v1.1
- Column names corrected throughout: DDL, sample DML, view template,
  COMMENT ON COLUMN, canonical contract text, file header.
- Optional enrichment columns added: Edge_Relationship VARCHAR(50),
  Transformation_Type VARCHAR(50). Ignored by graph analysis tools;
  present in {ProductName}_Semantic.lineage_graph for visualisation
  clients. ADDITIONAL COLUMNS section updated accordingly.
- Src_Kind/Tgt_Kind COMPRESS lists expanded to cover both single-letter
  codes (T, V, P...) and full-word values (Table, View, Job...) to match
  lineage_graph output.
- Sample DML updated: basic examples use 6-column form; new ETL-job
  example demonstrates source→job→target two-leg pattern using all 8
  columns.
- View template updated: optional columns included as nullable
  CAST(NULL AS VARCHAR(50)) placeholders with mapping guidance.
- AI-Native Data Product convention documented in file header, contract
  text, docstring, descriptor, and all edge_repository error messages.
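
To show how the 6-column and 8-column forms relate, here is a small DDL generator that needs no database connection. The column names and the two VARCHAR(50) enrichment columns come from the notes above; every other data type, the NOT NULL constraints, the MULTISET choice, and the function name are assumptions, and the COMPRESS lists are omitted.

```python
# Required columns of the edge contract. The data types here are assumed.
REQUIRED_COLUMNS = [
    ("Src_Container_Name", "VARCHAR(128)"),
    ("Src_Object_Name", "VARCHAR(128)"),
    ("Src_Kind", "VARCHAR(30)"),
    ("Tgt_Container_Name", "VARCHAR(128)"),
    ("Tgt_Object_Name", "VARCHAR(128)"),
    ("Tgt_Kind", "VARCHAR(30)"),
]

# Optional enrichment columns added in contract v1.1 (types as stated above).
OPTIONAL_COLUMNS = [
    ("Edge_Relationship", "VARCHAR(50)"),
    ("Transformation_Type", "VARCHAR(50)"),
]


def edge_contract_ddl(table_name: str, include_optional: bool = True) -> str:
    """Build a CREATE TABLE statement for the edge repository as plain text."""
    columns = [f"    {name} {sql_type} NOT NULL" for name, sql_type in REQUIRED_COLUMNS]
    if include_optional:
        # Enrichment columns stay nullable because analysis tools ignore them.
        columns += [f"    {name} {sql_type}" for name, sql_type in OPTIONAL_COLUMNS]
    body = ",\n".join(columns)
    return f"CREATE MULTISET TABLE {table_name} (\n{body}\n);"
```

Calling `edge_contract_ddl("Sales_Semantic.lineage_graph", include_optional=False)` yields the 6-column form used by the basic sample DML.
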

HELPER CONSOLIDATION (phase 1 — safe mechanical changes only)
- _graph_utils.py: add parse_csv_patterns() and build_like_or().
- Remove 7 local copies of parse_csv/_parse_csv_patterns
  (graph_analyseDatabase, graph_bfsLevels, graph_detectCycles,
  graph_connectedComponents, graph_traceLineage, graph_findRootObjects ×2);
  replace with shared import.
- Remove 3 local copies of _build_like_or/_build_like_clauses
  (graph_analyseDatabase, graph_detectCycles, graph_connectedComponents);
  replace with shared import.
- Deferred to phase 2: _UnionFind consolidation (recursion bug in
  graph_detectCycles.find()), _build_excl_* parameterisation.
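
A rough idea of what the two shared helpers do is sketched below. The names match the helpers listed above, but the signatures, return types, and the use of `?` placeholders are assumptions made for this example, not the contents of _graph_utils.py.

```python
def parse_csv_patterns(csv_text: str) -> list[str]:
    """Split a CSV pattern list, trimming whitespace and dropping empty items."""
    return [part.strip() for part in csv_text.split(",") if part.strip()]


def build_like_or(column: str, patterns: list[str]) -> tuple[str, list[str]]:
    """Build "(col LIKE ? OR col LIKE ?)" plus the matching bind values.

    The column name must come from trusted code, never from user input;
    only the pattern values are passed as bind parameters.
    """
    if not patterns:
        return "", []
    clause = " OR ".join(f"{column} LIKE ?" for _ in patterns)
    return f"({clause})", list(patterns)


# Example: turn an exclude_objects value into a WHERE fragment.
patterns = parse_csv_patterns(" PRD_% , TST_% ,")
where_sql, bind_values = build_like_or("Src_Container_Name", patterns)
```

Keeping one copy of each helper means a fix to whitespace trimming or wildcard handling reaches all tools at once.
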
… tool descriptor

GRAPH_CONNECTED_COMPONENTS_TOOL had a duplicate closing brace at line 481
in the parameters dict, causing a SyntaxError at import time. beartype's
import hook surfaced the error during package load, which caused the entire
graph package to fail silently — all seven graph tools were unregistered
with no server-side warning.

Removed the spurious closing brace at line 481.

Root cause: raw dict tool descriptors have no structural validation at
definition time. A future refactor to dataclass-based ToolDescriptor would
catch this class of error at module load rather than requiring manual
import tracing.
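
One possible shape for such a descriptor is sketched below. The class names, fields, and validation rules are hypothetical and do not describe an existing type in the server. A dataclass cannot prevent a syntax error by itself, but it does turn missing fields, misspelled keys, and duplicate parameters into exceptions raised while the module loads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolParameter:
    name: str
    type: str
    description: str
    required: bool = False


@dataclass(frozen=True)
class ToolDescriptor:
    name: str
    description: str
    parameters: tuple[ToolParameter, ...] = ()

    def __post_init__(self) -> None:
        # Structural checks run at definition time, i.e. when the module loads.
        if not self.name:
            raise ValueError("tool name must not be empty")
        names = [p.name for p in self.parameters]
        duplicates = sorted({n for n in names if names.count(n) > 1})
        if duplicates:
            raise ValueError(f"duplicate parameters in {self.name}: {duplicates}")


# Hypothetical descriptor for illustration only.
CONNECTED_COMPONENTS_SKETCH = ToolDescriptor(
    name="graph_connectedComponents",
    description="Partition the dependency graph into connected components.",
    parameters=(
        ToolParameter("edge_repository", "string", "Edge table or view", required=True),
        ToolParameter("exclude_objects", "string", "CSV patterns to exclude"),
    ),
)
```
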
- Rename camelCase graph module files to snake_case (N999)
- Update all imports in __init__.py and graph_tools.py to match new names
- Lowercase WHITE/GREY/BLACK constants used as local variables (N806)
- Replace set comprehensions with set() calls (C416)
- Rename unused loop variable comp_root to _comp_root (B007)
- Remove trailing whitespace from SQL strings (W291)
- Type _parent dict as dict[str, str] in UnionFind classes (mypy no-any-return)
- Change stack type annotation from object to Iterator[str] (mypy call-overload)
- Annotate type_counts and db_counts dicts explicitly (mypy var-annotated)
- Annotate upstream/downstream_level and nearest_root_val as Optional (mypy assignment)
- Rename members to cycle_members to avoid conflicting type inference (mypy assignment)
- Annotate rows as list[dict[str, Any]] for sort key compatibility (mypy arg-type)
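
For reference, a union-find with the `dict[str, str]` parent typing mentioned above could look like this. It is a sketch rather than the class in the graph modules: `find()` is written iteratively here to avoid deep recursion on long chains, and `connected_components()` is a hypothetical helper.

```python
from collections import defaultdict


class UnionFind:
    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def find(self, item: str) -> str:
        """Return the representative of item's set, compressing the path."""
        self._parent.setdefault(item, item)
        root = item
        while self._parent[root] != root:
            root = self._parent[root]
        # Second pass: point every node on the path directly at the root.
        while self._parent[item] != root:
            next_item = self._parent[item]
            self._parent[item] = root
            item = next_item
        return root

    def union(self, left: str, right: str) -> None:
        left_root, right_root = self.find(left), self.find(right)
        if left_root != right_root:
            self._parent[right_root] = left_root


def connected_components(edges: list[tuple[str, str]]) -> list[list[str]]:
    """Group objects that are linked by any chain of edges, ignoring direction."""
    uf = UnionFind()
    nodes: set[str] = set()
    for src, tgt in edges:
        uf.union(src, tgt)
        nodes.update((src, tgt))
    groups: dict[str, list[str]] = defaultdict(list)
    for node in sorted(nodes):
        groups[uf.find(node)].append(node)
    return sorted(groups.values())
```
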
@dtehan-td dtehan-td merged commit c9bf758 into main Apr 23, 2026
3 checks passed
@dtehan-td dtehan-td deleted the td_lineage_analysis branch April 23, 2026 23:04
2 participants