-
Notifications
You must be signed in to change notification settings - Fork 158
Respect catchup flag during full refresh operations for materialized views
#589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Respect catchup flag during full refresh operations for materialized views
#589
Conversation
|
Thanks for the contribution! :D
I'm thinking that maybe we just reuse the |
catchup_on_full_refresh optioncatchup option also works on re-deployments
…ouse#587) * Prepare v1.9.8 release: Increase version and update changelog
95644cc to
cafadb6
Compare
catchup option also works on re-deploymentscatchup flag during full refresh operations for materialized views
# Conflicts: # CHANGELOG.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
tests/integration/adapter/materialized_view/test_materialized_view.py
Outdated
Show resolved
Hide resolved
koletzilla
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Leaving you a few small comments, but everything else looks good to me
| If catchup is False, creates an empty table without backfilling. | ||
| #} | ||
| {% macro clickhouse__create_target_table(relation, sql, catchup=True) -%} | ||
| {% set catchup_data = catchup if catchup is not none else config.get("catchup", True) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are already getting the value of catchup in the materialization macro, looks like we don't need to get it again nor adding a default value to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
CHANGELOG.md
Outdated
| * Bump minimum `dbt-adapters` version to 1.16.7 to fix a compatibility issue that breaks tests if an older version is installed ([#578](https://github.com/ClickHouse/dbt-clickhouse/pull/578)). | ||
| * It is now possible to use an empty `local_suffix` configuration ([#569](https://github.com/ClickHouse/dbt-clickhouse/pull/569)). | ||
| * Column order is now respected when using incremental materialization with contracts ([#575](https://github.com/ClickHouse/dbt-clickhouse/pull/575)). | ||
| * Respect `catchup` configuration flag during full refresh operations for materialized views. When `catchup: False` is set, the target table will not be backfilled with historical data during full refresh, providing consistent behavior across initial creation and redeployment scenarios ([#XXX](https://github.com/ClickHouse/dbt-clickhouse/pull/XXX)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR was created when the release of the 1.9.8 was ongoing but now it's already released. Would you move this point to the 1.9.9 release notes? 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| on_schema_change=var('on_schema_change', 'ignore'), | ||
| schema='catchup' if var('run_type', '') == 'catchup' else 'custom_schema', | ||
| **({'catchup': False} if var('run_type', '') == 'catchup' else {}) | ||
| schema=( | ||
| 'catchup' if var('run_type', '') == 'catchup' else | ||
| 'catchup_initial' if var('run_type', '') == 'catchup_initial' else | ||
| 'catchup_full_refresh_disabled' if var('run_type', '') in ['catchup_full_refresh_disabled', 'catchup_full_refresh_disabled_extended'] else | ||
| 'catchup_full_refresh_enabled' if var('run_type', '') in ['catchup_full_refresh_enabled', 'catchup_full_refresh_enabled_extended'] else | ||
| 'catchup_toggle' if var('run_type', '') in ['catchup_toggle', 'catchup_toggle_extended'] else | ||
| 'catchup_no_exchange' if var('run_type', '') in ['catchup_no_exchange', 'catchup_no_exchange_extended'] else | ||
| 'custom_schema' | ||
| ), | ||
| **({'catchup': False} if var('run_type', '') in ['catchup', 'catchup_initial', 'catchup_full_refresh_disabled', 'catchup_full_refresh_disabled_extended', 'catchup_toggle', 'catchup_no_exchange', 'catchup_no_exchange_extended'] else {}) | ||
| ) }} | ||
| {% if var('run_type', '') in ['', 'catchup', 'view_conversion'] %} | ||
| {% if var('run_type', '') in ['', 'catchup', 'catchup_initial', 'catchup_full_refresh_disabled', 'catchup_full_refresh_enabled', 'catchup_toggle', 'catchup_no_exchange', 'view_conversion'] %} | ||
| select |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel these huge ifs got a bit too complicated to read and understand. Can we instead just use three vars like these?
schema_nameto get the schema name.catchupto manage when to apply{'catchup': False}.use_updated_schemato decide if use the sql extended.
This way all the combinations would be easier to understand.
I have not fully tested this, but this part would look like:
...
schema=var('schema_name', 'custom_schema'),
**({'catchup': False} if not var('catchup', True) else {})
) }}
{% if not var('use_updated_schema', false) %}
select
id,
name,
case
when name like 'Dade' then 'crash_override'
when name like 'Kate' then 'acid burn'
else 'N/A'
end as hacker_alias
from {{ source('raw', 'people') }}
where department = 'engineering'
{% else %}
select
id,
name,
case
when name like 'Dade' and age = 11 then 'zero cool'
when name like 'Dade' and age != 11 then 'crash override'
when name like 'Kate' then 'acid burn'
else 'N/A'
end as hacker_alias,
id as id2
from {{ source('raw', 'people') }}
where department = 'engineering'
{% endif %}And then each test will be just. like...
def test_initial_creation_catchup_disabled(self, project):
# ...
run_vars = {"schema_name": "catchup_initial", "catchup": False}
results = run_dbt(["run", "--vars", json.dumps(run_vars)])
# ...
def test_full_refresh_catchup_disabled(self, project):
# First run
run_vars = {"schema_name": "catchup_disabled", "catchup": False}
results = run_dbt(["run", "--vars", json.dumps(run_vars)])
# ...
# Second run with schema change
run_vars = {"schema_name": "catchup_disabled", "catchup": False, "use_updated_schema": True}
run_dbt(["run", "--full-refresh", "--vars", json.dumps(run_vars)])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, can do!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Summary
Fixes #506.
Problem
The
catchupconfiguration flag for materialized views was only respected during initial creation. When performing a full refresh (dbt run --full-refresh), the target table would always be backfilled with historical data regardless of thecatchup: Falsesetting, leading to inconsistent behavior.Solution
Extended the
catchupflag to control backfilling behavior during full refresh operations across all code paths:can_exchange=True(modern ClickHouse versions)can_exchange=False(older versions or specific database engines)When
catchup: Falseis set, the target table will not be backfilled with historical data during either initial creation or full refresh, providing consistent behavior across all deployment scenarios.Implementation Details
catchup_dataas a global variable (line 11) to make it available throughout the materializationclickhouse__create_target_tablemacro (lines 138-147) to consolidate table creation logic and eliminate code duplicationclickhouse__replace_mvmacro to accept and use thecatchupparameter (line 266)TestCatchupclass) with 5 tests covering:catchup=Falsecatchup=TruebehaviorBreaking Changes
None - the default behavior (
catchup: True) remains unchanged.Checklist
Note
Ensures consistent
catchupbehavior for materialized views during initial creation and full refresh.catchupthrough all MV paths; whencatchup: False, targettableis created empty (no backfill) during both create and full refreshclickhouse__create_target_tableconsolidates table creation/backfill logic and is used byclickhouse__get_create_materialized_view_as_sqlandclickhouse__replace_mvcatchup_dataonce and pass to relevant macros; update replace path to acceptcatchupTestCatchupsuite covering initial create, full refresh (exchange and replace), toggling flag, and default behavior; simplify model vars (use_view,use_updated_schema,schema_name,catchup); remove legacy catchup testCHANGELOG.mdwith the improvementWritten by Cursor Bugbot for commit 4dbed82. This will update automatically on new commits. Configure here.