feat(ingest/tableau): log Initial SQL + lineage and warn when none parses #17881

Open
treff7es wants to merge 1 commit into master from feat/tableau-initial-sql-diagnostics

@treff7es

Contributor

Summary

Adds diagnostics to the Tableau Initial SQL lineage path so a run that produces no Initial-SQL lineage can be debugged directly from the ingestion output, and so the most common silent-failure mode becomes visible.

Motivation: while investigating a case where a data source's Initial SQL was captured as the initialSql custom property but produced no upstream lineage, nothing in the report explained why. The cause turned out to be Initial SQL whose line breaks had been lost. A leading -- line comment with no terminating newline turns the entire script into a single comment, so split_statements yields zero statements. Because this is not a parse failure, it incremented no counter and emitted no warning: the failure was completely silent.
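To make the failure mode concrete, here is a minimal sketch using a hypothetical, simplified splitter (`naive_split_statements`, not DataHub's actual `split_statements`). It drops `--` comments up to the end of the line and then splits on `;`. It ignores string literals and block comments, so treat it as an illustration only.

```python
import re
from typing import List


def naive_split_statements(sql: str) -> List[str]:
    """Hypothetical, simplified splitter (not DataHub's split_statements).

    Drops `--` comments up to the next newline, then splits on `;`.
    It ignores string literals and block comments; illustration only.
    """
    # A `--` comment ends at the next newline, or at end of string if there is none.
    without_comments = re.sub(r"--[^\n]*", "", sql)
    return [part.strip() for part in without_comments.split(";") if part.strip()]


intact = "-- set up session\nSET search_path = analytics;\nSELECT 1;"
flattened = intact.replace("\n", " ")  # same script with its line breaks lost

print(naive_split_statements(intact))     # ['SET search_path = analytics', 'SELECT 1']
print(naive_split_statements(flattened))  # []
```

With the newline intact, the comment ends at the first line break. Without it, the comment runs to the end of the string and nothing is left to split, yet no parser ever sees an error.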

Changes

  • Per-data-source DEBUG log of the dataset URN, the raw Initial SQL (via repr(), so newline boundaries are visible), and the upstream URNs it produced. Run ingestion with --debug to see exactly what the connector received and what lineage it derived.
  • Structured report warning + counter (num_initial_sql_connections_without_statements) when a non-empty Initial SQL splits into zero statements. The warning context reports has_newline / has_line_comment, which immediately tells you whether the SQL arrived with its line breaks intact.
  • New counter num_initial_sql_statements_parsed for visibility into how many statements were extracted across all connections.
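A minimal sketch of the zero-statements check, assuming a hypothetical `InitialSqlReport` dataclass and `record_initial_sql` helper. Only the two counter names and the `has_newline` / `has_line_comment` keys come from this PR; the connector's real report and warning API differ. The splitter is passed in so the check does not depend on any particular SQL library.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InitialSqlReport:
    """Hypothetical stand-in for the connector's source report."""

    num_initial_sql_statements_parsed: int = 0
    num_initial_sql_connections_without_statements: int = 0
    warnings: List[Dict[str, object]] = field(default_factory=list)


def record_initial_sql(
    report: InitialSqlReport,
    dataset_urn: str,
    initial_sql: str,
    split: Callable[[str], List[str]],
) -> List[str]:
    statements = split(initial_sql)
    report.num_initial_sql_statements_parsed += len(statements)
    # Only non-empty Initial SQL that yields nothing is suspicious.
    if initial_sql.strip() and not statements:
        # Not a parse failure: nothing was handed to the parser at all.
        report.num_initial_sql_connections_without_statements += 1
        report.warnings.append(
            {
                "title": "Initial SQL produced no statements",
                "dataset_urn": dataset_urn,
                "has_newline": "\n" in initial_sql,
                "has_line_comment": "--" in initial_sql,
            }
        )
    return statements


# Usage with a stub splitter that, like the flattened-comment case, finds nothing.
report = InitialSqlReport()
record_initial_sql(
    report,
    "urn:li:dataset:(urn:li:dataPlatform:tableau,example_ds,PROD)",  # placeholder URN
    "-- set up session SET search_path = analytics; SELECT 1;",
    split=lambda sql: [],
)
print(report.num_initial_sql_connections_without_statements)  # 1
print(report.warnings[0]["has_newline"], report.warnings[0]["has_line_comment"])  # False True
```

Checking `initial_sql.strip()` first keeps empty or whitespace-only Initial SQL out of the warning. `has_newline=False` together with `has_line_comment=True` is the signature of a script whose line breaks were stripped before it reached the connector.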

No change to lineage emission behavior.

Testing

  • New unit test test_get_initial_sql_lineage_flattened_comment_warns_and_counts covers the zero-statements case (raw SQL still captured, no upstreams, counter incremented, not counted as a parse failure).
  • Strengthened the existing multi-statement test to assert num_initial_sql_statements_parsed and that the zero-statements counter stays 0.
  • metadata-ingestion:lintFix and targeted mypy pass on the changed files; full tests/unit/tableau/test_tableau_initial_sql.py suite passes (47 tests).

Checklist

  • PR conforms to the Contributing Guideline (PR Title Format)
  • Tests added
  • Docs (none required; observability only)
  • Breaking changes (none)

Add diagnostics to the Tableau Initial SQL path so a run with missing
Initial SQL lineage can be debugged directly:

- Per data source, log (at DEBUG) the dataset URN, the raw Initial SQL
  (repr, so newline boundaries are visible), and the upstream URNs it
  produced. Run ingestion with --debug to see it.
- Emit a structured report warning + counter
  (num_initial_sql_connections_without_statements) when a non-empty
  Initial SQL splits into zero statements and therefore yields no
  lineage. This is otherwise silent (it is not a parse failure), and is
  the signal that the SQL lost its line breaks (e.g. `--` line comments
  with no terminating newline collapse the whole script into one
  comment).
- Add num_initial_sql_statements_parsed for visibility into how many
  statements were extracted.

No behavior change to lineage emission. Adds unit coverage for the
zero-statements case and asserts statement counts on the existing
multi-statement test.
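The per-data-source DEBUG line can be sketched with the standard `logging` module; the logger name and the `log_initial_sql` helper are hypothetical. Formatting the SQL with `%r` applies `repr()`, so intact line breaks appear as a literal `\n` and a flattened script is visible at a glance.

```python
import logging
from typing import List

# Hypothetical logger name; the connector uses its own module logger.
logger = logging.getLogger("tableau_initial_sql_sketch")


def log_initial_sql(dataset_urn: str, initial_sql: str, upstream_urns: List[str]) -> None:
    # %r renders the SQL via repr(), so newlines show up as a literal "\n".
    # Arguments are only formatted if DEBUG is enabled for this logger.
    logger.debug(
        "Initial SQL for %s: sql=%r upstreams=%s",
        dataset_urn,
        initial_sql,
        upstream_urns,
    )


logging.basicConfig(level=logging.DEBUG)
urn = "urn:li:dataset:(urn:li:dataPlatform:tableau,example_ds,PROD)"  # placeholder URN
log_initial_sql(urn, "-- set up session\nSELECT 1;", [])  # ... sql='-- set up session\nSELECT 1;' ...
log_initial_sql(urn, "-- set up session SELECT 1;", [])   # no \n in the output: line breaks were lost
```

Passing the values as logging arguments rather than pre-formatting an f-string keeps the `repr()` cost off the hot path when DEBUG is disabled.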
github-actions (bot) added the ingestion label (PR or Issue related to the ingestion of metadata) on Jun 12, 2026
codecov (bot) commented on Jun 12, 2026:

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@datahub-connector-tests


Connector Tests Results

All connector tests passed for commit 7aaf58c

maggiehays added the needs-review label (Label for PRs that need review from a maintainer) on Jun 12, 2026