feat: add skip_merge_on_empty_source incremental config #1410

Open
moomindani wants to merge 1 commit into databricks:main from moomindani:feat/skip-merge-on-empty-source

@moomindani

Summary

Adds an opt-in incremental config skip_merge_on_empty_source that
bypasses MERGE and all associated metadata queries (DESCRIBE, SHOW
TBLPROPERTIES, constraint/tag/mask lookups) when the compiled source
SELECT returns zero rows.

Motivation

Customers running dbt run on a cadence against sources that receive
deltas sporadically pay ~4–7 s per incremental model per run on:

  • CREATE OR REPLACE TEMPORARY VIEW for the compiled source SELECT
  • 3–4 × DESCRIBE TABLE EXTENDED … AS JSON
  • SHOW TBLPROPERTIES
  • information_schema lookups (tags, column_tags, column_masks, constraints)
  • MERGE INTO planning + Delta no-op commit

Even with incremental_apply_config_changes: false, the structural
overhead (temp view, DESCRIBE, MERGE planning) remains. On a real workload
with 11 light incremental stg models, this totals ~30–50 s of wall-clock
time per no-op run.

Snowflake short-circuits empty MERGE sources in ~1–2 s; this flag closes
the gap for workloads where deltas are often empty.

Design

  • DatabricksConfig.skip_merge_on_empty_source: Optional[bool] = None
    (unset is treated as false: opt-in, no behavior change for existing projects).
  • Two helper macros:
    • source_has_rows(compiled_code): issues SELECT 1 FROM (<compiled_code>) LIMIT 1.
    • should_skip_merge_on_empty_source(...): checks that the flag is set,
      the model language is SQL, and dbt is in execute mode, then calls
      source_has_rows. If the source is empty, it runs a no-op main
      statement, applies grants (apply_grants), and logs the skip.
  • The short-circuit call is inserted in both the V1 and V2 merge branches.
    In V1 it runs before create_temp_relation/get_relation_config, so all
    metadata queries are skipped too; in V2 it runs after the intermediate
    table is created (kept conservative to preserve pre-hook side effects).
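
The decision flow described above can be sketched in Python. This is an illustration only, not the Jinja macros from this PR: `IncrementalConfigSketch`, `should_skip_merge`, and the SQLite connection are hypothetical stand-ins, and only the probe shape (`SELECT 1 FROM (<compiled_code>) LIMIT 1`) and the order of checks are taken from the description.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class IncrementalConfigSketch:
    # Hypothetical stand-in for the new DatabricksConfig field.
    # None (unset) behaves like False, so existing projects are unaffected.
    skip_merge_on_empty_source: Optional[bool] = None


def source_has_rows(conn: sqlite3.Connection, compiled_code: str) -> bool:
    """Probe the compiled SELECT for at least one row."""
    probe = f"SELECT 1 FROM ({compiled_code}) LIMIT 1"
    return conn.execute(probe).fetchone() is not None


def should_skip_merge(
    conn: sqlite3.Connection,
    config: IncrementalConfigSketch,
    compiled_code: str,
    language: str = "sql",
    execute: bool = True,
) -> bool:
    """Return True only when the flag is on, the model is SQL,
    we are in execute mode, and the source SELECT is empty."""
    if not config.skip_merge_on_empty_source:
        return False  # flag unset or false: standard MERGE path
    if language != "sql" or not execute:
        return False  # Python models and parse-time runs fall through
    return not source_has_rows(conn, compiled_code)


# Small in-memory source table to exercise the checks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER)")
conn.executemany("INSERT INTO src VALUES (?)", [(1,), (2,)])

empty_delta = "SELECT id FROM src WHERE id > 2"
nonempty_delta = "SELECT id FROM src WHERE id > 1"
opted_in = IncrementalConfigSketch(skip_merge_on_empty_source=True)
default = IncrementalConfigSketch()

print(should_skip_merge(conn, opted_in, empty_delta))      # True
print(should_skip_merge(conn, opted_in, nonempty_delta))   # False
print(should_skip_merge(conn, default, empty_delta))       # False
print(should_skip_merge(conn, opted_in, empty_delta, language="python"))  # False
```

Checking the flag first means projects that never set it issue no extra probe query, which matches the stated goal of no behavior change by default.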

Measured impact

On a light merge model with a ~0-row delta, wall-clock time drops from
~7 s to ~1 s. For a project with 11 such models, this saves ~60 s per run
on the incremental path without changing any other behavior.
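
As a rough check of the ~60 s figure, the per-model timings from the manual validation (7.0 s before, 1.17 s after) can be scaled to 11 models. This sketch assumes every model has the same profile and that the savings simply add up across models; the PR states neither, so treat the result as an estimate.

```python
# Timings from the manual validation on a single no-op run.
BEFORE_S = 7.0
AFTER_S = 1.17
# Assumption: all 11 models behave like the measured one.
MODELS = 11

saved_per_model = BEFORE_S - AFTER_S
saved_per_run = saved_per_model * MODELS

print(f"saved per model: {saved_per_model:.2f} s")  # 5.83 s
print(f"saved per run:   {saved_per_run:.2f} s")    # 64.13 s
```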

Test plan

  • Added functional test tests/functional/adapter/incremental/test_incremental_skip_on_empty_source.py
    • Short-circuit fires and table unchanged after 2nd no-op run (V1 and V2)
    • Default-off: MERGE still runs when flag is unset
  • Manually validated on a live Databricks SQL Warehouse against a
    real incremental merge model: 7.0 s → 1.17 s on no-op run

Adds an opt-in incremental config that bypasses MERGE and all associated
metadata queries (DESCRIBE, SHOW TBLPROPERTIES, constraint/tag/mask
lookups) when the compiled source SELECT returns zero rows.

Motivating case: customers who run `dbt run` on a schedule against source
tables that receive deltas sporadically. Today, each incremental model
still pays ~4-7s per run on temp view creation + metadata queries +
MERGE planning even when there is nothing to merge. With
`skip_merge_on_empty_source: true`, the materialization runs a cheap
`SELECT 1 FROM (<compiled>) LIMIT 1` probe and, if empty, returns early
after firing pre/post hooks and a no-op `main` statement.

Scope:
- V1 (`use_materialization_v2: false`) and V2 paths both honor the flag
- Default is `false` (opt-in, no behavior change for existing projects)
- SQL language only (Python models fall through to the standard path)

Files:
- `dbt/adapters/databricks/impl.py`: new `skip_merge_on_empty_source`
  field on `DatabricksConfig`
- `dbt/include/databricks/macros/materializations/incremental/incremental.sql`:
  two helper macros (`source_has_rows`, `should_skip_merge_on_empty_source`)
  and short-circuit calls in the V1/V2 merge branches
- `tests/functional/adapter/incremental/test_incremental_skip_on_empty_source.py`:
  functional tests covering the short-circuit path and the default-off
  behavior under both V1 and V2
- `CHANGELOG.md`: Features entry

Co-authored-by: Isaac
@moomindani moomindani marked this pull request as ready for review April 21, 2026 04:20
@github-actions

Coverage report

Columns: File, Statements, Missing, Coverage, Coverage (new stmts), Lines missing
Rows: dbt/adapters/databricks (impl.py), Project Total

This report was generated by python-coverage-comment-action
