feat: emit tagged predicate edges for JOIN ON clause columns #62

Merged: mingjerli merged 2 commits into main from feature/gap7-join-predicate-columns on Apr 14, 2026

@mingjerli
Owner

Summary

  • JOIN predicate lineage: ON-clause columns (equi-join keys, range predicates, function-wrapped keys) now produce is_join_predicate=True edges to projected output columns from the joined table
  • Metadata: Each predicate edge carries join_condition (raw ON-clause SQL) and join_side ("left"/"right") for filtering and analysis
  • Gap 4 composition: Predicate edges correctly resolve through self-read nodes in self-referencing pipelines (MERGE+INSERT SCD2 patterns)
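
To make the edge metadata above concrete, here is a minimal sketch of what a tagged predicate edge could look like. It is not the project's code: the `source`/`target` field names, the `make_predicate_edges` helper, and the example tables are hypothetical, and linking every ON-clause column to every projected column of the joined table is an assumption about the fan-out. Only `is_join_predicate`, `join_condition`, and `join_side` come from this PR.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ColumnEdge:
    """Simplified, hypothetical stand-in for the project's ColumnEdge."""
    source: str                            # e.g. "rates.valid_from" (hypothetical field)
    target: str                            # e.g. "output.rate" (hypothetical field)
    is_join_predicate: bool = False        # field named in this PR
    join_condition: Optional[str] = None   # raw ON-clause SQL
    join_side: Optional[str] = None        # "left" or "right"


def make_predicate_edges(
    on_columns: List[Tuple[str, str]],
    joined_outputs: List[str],
    join_condition: str,
) -> List[ColumnEdge]:
    """Link each ON-clause column to each output column projected from the joined table.

    on_columns holds (qualified_column, join_side) pairs. The full cross product
    is an assumption made for this sketch.
    """
    return [
        ColumnEdge(
            source=column,
            target=output,
            is_join_predicate=True,
            join_condition=join_condition,
            join_side=side,
        )
        for column, side in on_columns
        for output in joined_outputs
    ]


# Point-in-time join: an equi-join key plus a BETWEEN range predicate.
condition = "o.currency = r.currency AND o.order_ts BETWEEN r.valid_from AND r.valid_to"
edges = make_predicate_edges(
    on_columns=[
        ("orders.currency", "left"),
        ("orders.order_ts", "left"),
        ("rates.currency", "right"),
        ("rates.valid_from", "right"),
        ("rates.valid_to", "right"),
    ],
    joined_outputs=["output.rate"],
    join_condition=condition,
)

# join_side lets a consumer keep only the predicate columns from one side of the join.
right_side = [edge.source for edge in edges if edge.join_side == "right"]
```

Keeping the raw ON-clause text on every edge means a consumer can show why a column matters without re-parsing the query.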

Closes Gap 7 from the CDC/SCD pipeline gap analysis (docs/superpowers/specs/2026-04-13-gap7-join-predicate-columns-design.md).

Files Changed

| File | Change |
| --- | --- |
| models.py | Added JoinPredicateInfo dataclass, join_predicates on QueryUnit, is_join_predicate/join_condition/join_side on ColumnEdge |
| query_parser.py | ON-clause column extraction in _parse_select_unit, _extract_join_predicate_columns, _get_join_type, _get_join_right_table |
| lineage_builder.py | _create_join_predicate_edges, _resolve_join_predicate_column (step 9 in _process_unit) |
| pipeline_lineage_builder.py | 3 new fields in _add_query_edges explicit copy list |
| test_join_predicate_columns.py | 31 tests: CDC point-in-time, band join, function-based, multi-chain, dialect consistency, Gap 4 interaction, impact analysis |
| test_unqualified_column_resolution.py | 3 tests updated to filter predicate edges in count assertions |

Test plan

  • All 31 new tests pass (tests/test_join_predicate_columns.py)
  • All 1448 existing tests pass (0 regressions, 3 tests updated for predicate edge filtering)
  • Pre-commit hooks pass (ruff format, ruff lint)
  • CI pipeline passes
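
The three updated tests filter predicate edges before counting. A minimal sketch of that pattern follows, assuming edges expose an `is_join_predicate` attribute as described in this PR. The helper names and the sample edges are hypothetical and are not taken from the test suite.

```python
from types import SimpleNamespace


def data_flow_edges(edges):
    """Keep only edges that carry data into an output column (hypothetical helper)."""
    return [e for e in edges if not getattr(e, "is_join_predicate", False)]


def predicate_edges(edges):
    """Keep only edges that come from JOIN ON clause columns (hypothetical helper)."""
    return [e for e in edges if getattr(e, "is_join_predicate", False)]


# Stand-in edges; real tests would get these from the lineage graph under test.
all_edges = [
    SimpleNamespace(source="rates.rate", target="output.rate", is_join_predicate=False),
    SimpleNamespace(source="rates.currency", target="output.rate", is_join_predicate=True),
    SimpleNamespace(source="orders.currency", target="output.rate", is_join_predicate=True),
]

# A count assertion written before this PR still holds once predicate edges are filtered out.
assert len(data_flow_edges(all_edges)) == 1
assert len(predicate_edges(all_edges)) == 2
```

Using `getattr` with a default keeps the filter tolerant of edge objects that predate the new field.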

JOIN ON predicate columns (e.g., temporal range joins, equi-join keys)
previously produced zero column-lineage edges, making them invisible in
impact analysis. This adds is_join_predicate=True edges from ON-clause
columns to projected output columns from the joined table, with
join_condition and join_side metadata.
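
As a rough illustration of why these edges matter for impact analysis, the sketch below walks downstream from a column with and without predicate edges. The `Edge` tuple, the `impacted_columns` function, and its `include_predicates` flag are hypothetical names for this example, not the project's API.

```python
from collections import deque
from typing import Dict, Iterable, List, NamedTuple, Set


class Edge(NamedTuple):
    """Hypothetical minimal edge: source column, target column, predicate flag."""
    source: str
    target: str
    is_join_predicate: bool = False


def impacted_columns(
    edges: Iterable[Edge], start: str, include_predicates: bool = True
) -> Set[str]:
    """Breadth-first walk of downstream columns reachable from `start`."""
    adjacency: Dict[str, List[str]] = {}
    for edge in edges:
        if edge.is_join_predicate and not include_predicates:
            continue  # ignore ON-clause edges when only data flow is wanted
        adjacency.setdefault(edge.source, []).append(edge.target)

    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


edges = [
    Edge("rates.rate", "enriched.rate"),
    Edge("rates.valid_from", "enriched.rate", is_join_predicate=True),
    Edge("enriched.rate", "report.amount_usd"),
]

# With predicate edges, a change to the range column reaches downstream outputs.
with_predicates = impacted_columns(edges, "rates.valid_from")
# Without them, the column has no downstream lineage at all.
without_predicates = impacted_columns(edges, "rates.valid_from", include_predicates=False)
```

In this example `rates.valid_from` only appears in the ON clause, so it has no downstream columns unless predicate edges are included.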

Supports equi-joins, range/BETWEEN, function-wrapped keys, multi-join
chains, and composes with Gap 4 self-read nodes for self-referencing
pipelines.

Closes Gap 7 from the CDC/SCD pipeline gap analysis.
…dicates

Two new example notebooks demonstrating Gap 4 and Gap 7 features:

- self_referencing_lineage.ipynb: Single-statement self-ref, SCD2 MERGE+INSERT,
  impact analysis through self-read chains, edge annotations
- join_predicate_lineage.ipynb: Equi-join predicates, point-in-time BETWEEN joins,
  multi-join chain scoping, impact analysis with predicate filtering
@mingjerli merged commit d0d72e2 into main on Apr 14, 2026
8 checks passed