Contributor

@dlouseiro dlouseiro commented Jan 12, 2026

Summary

Fixes #506.

Problem

The catchup configuration flag for materialized views was only respected during initial creation. When performing a full refresh (dbt run --full-refresh), the target table would always be backfilled with historical data regardless of the catchup: False setting, leading to inconsistent behavior.

Solution

Extended the catchup flag to control backfilling behavior during full refresh operations across all code paths:

  • Atomic exchange path (line 76): When can_exchange=True (modern ClickHouse versions)
  • Replace MV path (line 274): When can_exchange=False (older versions or specific database engines)

When catchup: False is set, the target table will not be backfilled with historical data during either initial creation or full refresh, providing consistent behavior across all deployment scenarios.
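
The rule described above can be modelled in a few lines of Python. This is only a sketch: `RefreshPlan` and `plan_full_refresh` are hypothetical names that do not exist in the adapter, and the model deliberately ignores the SQL that each path emits. It only captures the claim that the backfill decision follows `catchup` regardless of which path is taken.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefreshPlan:
    """Hypothetical summary of a full refresh (not part of dbt-clickhouse)."""
    path: str       # "exchange" when can_exchange is True, otherwise "replace"
    backfill: bool  # whether historical rows are loaded into the target table


def plan_full_refresh(can_exchange: bool, catchup: bool = True) -> RefreshPlan:
    # The code path depends only on can_exchange ...
    path = "exchange" if can_exchange else "replace"
    # ... while the backfill decision depends only on catchup, in both paths.
    return RefreshPlan(path=path, backfill=catchup)
```

For example, `plan_full_refresh(can_exchange=False, catchup=False)` yields a plan on the replace path with `backfill=False`, which is the combination that previously still backfilled.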

Implementation Details

  • Set catchup_data as a global variable (line 11) to make it available throughout the materialization
  • Created new clickhouse__create_target_table macro (lines 138-147) to consolidate table creation logic and eliminate code duplication
  • Updated clickhouse__replace_mv macro to accept and use the catchup parameter (line 266)
  • Added comprehensive test suite (TestCatchup class) with 5 tests covering:
    • Initial creation with catchup=False
    • Full refresh with atomic exchange path
    • Full refresh with replace MV path (mocked)
    • Toggling catchup flag between runs
    • Control test with default catchup=True behavior
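
The idea behind a single table-creation helper can be illustrated outside ClickHouse. The sketch below uses Python's built-in `sqlite3` as a stand-in database, so the SQL is SQLite's and not what the adapter generates. The function borrows the macro's name for readability, but its signature (a `conn` argument) and the sample rows are assumptions made for this example.

```python
import sqlite3


def create_target_table(conn, name, select_sql, catchup=True):
    """Create `name` with the columns of `select_sql`; copy rows only if catchup."""
    # `WHERE 0` keeps the column layout of the query but returns no rows,
    # so the table always starts out empty.
    conn.execute(f"CREATE TABLE {name} AS SELECT * FROM ({select_sql}) WHERE 0")
    if catchup:
        # Backfill: load the rows the query currently produces.
        conn.execute(f"INSERT INTO {name} SELECT * FROM ({select_sql})")


def row_count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    # Made-up sample rows for the demonstration.
    [(1, "Dade", "engineering"), (2, "Kate", "engineering"), (3, "Joey", "sales")],
)
query = "SELECT id, name FROM people WHERE department = 'engineering'"

create_target_table(conn, "target_backfilled", query, catchup=True)
create_target_table(conn, "target_empty", query, catchup=False)
```

Both tables end up with the same columns; only `target_backfilled` contains the two engineering rows, while `target_empty` has none. Because every caller goes through one function, the create and full-refresh paths cannot disagree about when to backfill.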

Breaking Changes

None - the default behavior (catchup: True) remains unchanged.

Checklist

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG
  • For significant changes, documentation in https://github.com/ClickHouse/clickhouse-docs was updated with further explanations or tutorials

Note

Ensures consistent catchup behavior for materialized views during initial creation and full refresh.

  • Thread catchup through all MV paths; when catchup: False, target table is created empty (no backfill) during both create and full refresh
  • New clickhouse__create_target_table consolidates table creation/backfill logic and is used by clickhouse__get_create_materialized_view_as_sql and clickhouse__replace_mv
  • Set catchup_data once and pass to relevant macros; update replace path to accept catchup
  • Tests: add TestCatchup suite covering initial create, full refresh (exchange and replace), toggling flag, and default behavior; simplify model vars (use_view, use_updated_schema, schema_name, catchup); remove legacy catchup test
  • Update CHANGELOG.md with the improvement

Written by Cursor Bugbot for commit 4dbed82.

Contributor

koletzilla commented Jan 12, 2026

Thanks for the contribution! :D

Add new config option catchup_on_full_refresh which ignores table backfilling when this setting is set to False, similar to what happens with the catchup option

I'm thinking that maybe we could just reuse the catchup variable so that it also applies to backfilling when the table is recreated, in which case we may not need a new variable for that. What do you think?

@koletzilla koletzilla added this to the v1.10.0 milestone Jan 15, 2026
@dlouseiro dlouseiro changed the title from "Add catchup_on_full_refresh option" to "Ensure catchup option also works on re-deployments" Jan 15, 2026
@dlouseiro dlouseiro force-pushed the dlouseiro/add-catchup-on-full-refresh branch from 95644cc to cafadb6 Compare January 15, 2026 19:08
@dlouseiro dlouseiro changed the title from "Ensure catchup option also works on re-deployments" to "Respect catchup flag during full refresh operations for materialized views" Jan 15, 2026
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Contributor

@koletzilla koletzilla left a comment

Thanks for the PR! I'm leaving you a few small comments, but everything else looks good to me.

If catchup is False, creates an empty table without backfilling.
#}
{% macro clickhouse__create_target_table(relation, sql, catchup=True) -%}
{% set catchup_data = catchup if catchup is not none else config.get("catchup", True) %}

Contributor

We are already getting the value of catchup in the materialization macro, so it looks like we don't need to fetch it again here or give the parameter a default value.

Contributor Author

You're right!

Contributor Author

Done

CHANGELOG.md Outdated
* Bump minimum `dbt-adapters` version to 1.16.7 to fix a compatibility issue that breaks tests if an older version is installed ([#578](https://github.com/ClickHouse/dbt-clickhouse/pull/578)).
* It is now possible to use an empty `local_suffix` configuration ([#569](https://github.com/ClickHouse/dbt-clickhouse/pull/569)).
* Column order is now respected when using incremental materialization with contracts ([#575](https://github.com/ClickHouse/dbt-clickhouse/pull/575)).
* Respect `catchup` configuration flag during full refresh operations for materialized views. When `catchup: False` is set, the target table will not be backfilled with historical data during full refresh, providing consistent behavior across initial creation and redeployment scenarios ([#XXX](https://github.com/ClickHouse/dbt-clickhouse/pull/XXX)).

Contributor

This PR was created while the 1.9.8 release was in progress, but that version has now been released. Would you move this entry to the 1.9.9 release notes? 🙏

Contributor Author

Sure!

Contributor Author

Done

Comment on lines 30 to 44
  on_schema_change=var('on_schema_change', 'ignore'),
- schema='catchup' if var('run_type', '') == 'catchup' else 'custom_schema',
- **({'catchup': False} if var('run_type', '') == 'catchup' else {})
+ schema=(
+ 'catchup' if var('run_type', '') == 'catchup' else
+ 'catchup_initial' if var('run_type', '') == 'catchup_initial' else
+ 'catchup_full_refresh_disabled' if var('run_type', '') in ['catchup_full_refresh_disabled', 'catchup_full_refresh_disabled_extended'] else
+ 'catchup_full_refresh_enabled' if var('run_type', '') in ['catchup_full_refresh_enabled', 'catchup_full_refresh_enabled_extended'] else
+ 'catchup_toggle' if var('run_type', '') in ['catchup_toggle', 'catchup_toggle_extended'] else
+ 'catchup_no_exchange' if var('run_type', '') in ['catchup_no_exchange', 'catchup_no_exchange_extended'] else
+ 'custom_schema'
+ ),
+ **({'catchup': False} if var('run_type', '') in ['catchup', 'catchup_initial', 'catchup_full_refresh_disabled', 'catchup_full_refresh_disabled_extended', 'catchup_toggle', 'catchup_no_exchange', 'catchup_no_exchange_extended'] else {})
  ) }}
- {% if var('run_type', '') in ['', 'catchup', 'view_conversion'] %}
+ {% if var('run_type', '') in ['', 'catchup', 'catchup_initial', 'catchup_full_refresh_disabled', 'catchup_full_refresh_enabled', 'catchup_toggle', 'catchup_no_exchange', 'view_conversion'] %}
  select

Contributor

I feel these huge ifs got a bit too complicated to read and understand. Can we instead just use three vars like these?

  • schema_name to get the schema name.
  • catchup to manage when to apply {'catchup': False}.
  • use_updated_schema to decide whether to use the extended SQL.

This way all the combinations would be easier to understand.

I have not fully tested this, but this part would look like:

     ...
       schema=var('schema_name', 'custom_schema'),
       **({'catchup': False} if not var('catchup', True) else {})
) }}

{% if not var('use_updated_schema', false) %}
select
    id,
    name,
    case
        when name like 'Dade' then 'crash_override'
        when name like 'Kate' then 'acid burn'
        else 'N/A'
    end as hacker_alias
from {{ source('raw', 'people') }}
where department = 'engineering'

{% else %}
select
    id,
    name,
    case
        when name like 'Dade' and age = 11 then 'zero cool'
        when name like 'Dade' and age != 11 then 'crash override'
        when name like 'Kate' then 'acid burn'
        else 'N/A'
    end as hacker_alias,
    id as id2
from {{ source('raw', 'people') }}
where department = 'engineering'

{% endif %}
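
To make the combinations easier to see, the three vars can be mirrored in plain Python. This is a sketch under assumptions: a dict stands in for dbt's `var()`, and `resolve_model_settings` and the `query_variant` key are hypothetical names used only for illustration. They are not part of dbt or the adapter.

```python
from typing import Any, Dict


def resolve_model_settings(run_vars: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror the suggested config logic with a dict standing in for var()."""
    settings: Dict[str, Any] = {
        # schema=var('schema_name', 'custom_schema')
        "schema": run_vars.get("schema_name", "custom_schema"),
    }
    # Same shape as **({'catchup': False} if not var('catchup', True) else {}):
    # the key is only added when catchup is switched off.
    if not run_vars.get("catchup", True):
        settings["catchup"] = False
    # Hypothetical label for which SELECT branch of the model would be rendered.
    settings["query_variant"] = (
        "updated" if run_vars.get("use_updated_schema", False) else "original"
    )
    return settings
```

With no vars at all, the result is the default schema, no `catchup` key, and the original query, so existing tests that pass nothing keep their current behavior.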

And then each test would look something like this:

def test_initial_creation_catchup_disabled(self, project):
    # ...
    run_vars = {"schema_name": "catchup_initial", "catchup": False}
    results = run_dbt(["run", "--vars", json.dumps(run_vars)])
    # ...

def test_full_refresh_catchup_disabled(self, project):
    # First run
    run_vars = {"schema_name": "catchup_disabled", "catchup": False}
    results = run_dbt(["run", "--vars", json.dumps(run_vars)])
    # ...
    
    # Second run with schema change
    run_vars = {"schema_name": "catchup_disabled", "catchup": False, "use_updated_schema": True}
    run_dbt(["run", "--full-refresh", "--vars", json.dumps(run_vars)])
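
Since every test in this style builds the same argument list, the repetition could be folded into a small helper. The sketch below is one possible shape; `build_run_args` is a hypothetical name and not part of the test suite, and it only assembles the list that would be handed to `run_dbt`.

```python
import json
from typing import Any, Dict, List


def build_run_args(run_vars: Dict[str, Any], full_refresh: bool = False) -> List[str]:
    """Assemble the CLI-style argument list for a dbt run with --vars."""
    args = ["run"]
    if full_refresh:
        args.append("--full-refresh")
    # --vars takes a single string, so the dict is serialized as JSON.
    args += ["--vars", json.dumps(run_vars)]
    return args
```

A test would then call, for instance, `run_dbt(build_run_args({"schema_name": "catchup_disabled", "catchup": False}, full_refresh=True))`, keeping each scenario down to the vars that actually differ.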

Contributor Author

Sure, can do!

Contributor Author

Done!

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@dlouseiro dlouseiro requested a review from koletzilla January 16, 2026 17:01

Development

Successfully merging this pull request may close these issues.

Add option to ignore backfill of MV when redeploying a model

2 participants