Skip to content

fix(snapshot): support dbt_valid_to_current config in snapshot macros#630

Open
PSalinovic wants to merge 6 commits into
ClickHouse:mainfrom
PSalinovic:fix-dbt-valid-to-current
Open

fix(snapshot): support dbt_valid_to_current config in snapshot macros#630
PSalinovic wants to merge 6 commits into
ClickHouse:mainfrom
PSalinovic:fix-dbt-valid-to-current

Conversation

@PSalinovic
Copy link
Copy Markdown

@PSalinovic PSalinovic commented Apr 2, 2026

Summary

The ClickHouse adapter's custom snapshot staging table macros for both
timestamp and check strategies ignored the dbt_valid_to_current config
option. Current records were always filtered with WHERE dbt_valid_to IS
NULL, which missed rows that had the configured sentinel value (e.g.
'9999-12-31'). This caused duplicate rows on every subsequent snapshot
run when dbt_valid_to_current was configured.

The fix reads config.get('dbt_valid_to_current') and:

  • Filters current records with (dbt_valid_to = OR dbt_valid_to IS NULL)
  • Sets dbt_valid_to for new insertions using coalesce(nullif(x, x), )
    The OR IS NULL fallback ensures backward compatibility with rows
    snapshotted before dbt_valid_to_current was configured.

Fixes: #481

Checklist

Delete items not relevant to your PR:

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 2, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ PSalinovic
❌ Petar Salinovic


Petar Salinovic seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Copy Markdown
Contributor

@koletzilla koletzilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

The fix correctly addresses the dbt_valid_to_current gap in ClickHouse's custom snapshot staging macros. The approach aligns with dbt-core's default__snapshot_staging_table pattern and the merge macro correctly doesn't need changes (it joins on dbt_scd_id which is unique per row, unlike dbt-core's MERGE which filters on dbt_valid_to).

What looks good

  • WHERE clause matches dbt-core: (dbt_valid_to = <sentinel> OR dbt_valid_to IS NULL) — the IS NULL fallback ensures backward compatibility with rows snapshotted before the config was added.
  • New insert dbt_valid_to: coalesce(nullif(ts, ts), <sentinel_or_null>) matches dbt-core's get_dbt_valid_to_current macro.
  • Updates correctly expire with the real timestamp, not the sentinel — this is correct SCD2 behavior.
  • First-run path is unaffected — dbt-core's default__build_snapshot_table (which already handles dbt_valid_to_current) is used since the adapter doesn't override it.
  • Both check and timestamp strategies are handled.

Issues

1. Missing test for record modifications (high)

This is the exact scenario from issue #481: "Update some column → old version rows are not closed, new version rows are automatically closed." The tests only cover inserts and idempotent re-runs — they never modify an existing row and re-snapshot.

Please add a test step that:

  1. Modifies an existing row's value (e.g., update a seed CSV with a changed name for id=1)
  2. Re-runs snapshot
  3. Asserts the old version has dbt_valid_to = real timestamp (not sentinel, not NULL)
  4. Asserts the new version has dbt_valid_to = sentinel
  5. Asserts total row count = N + 1

Without this, the most important SCD2 behavior is unverified.

2. hard_deletes: 'invalidate' not tested with dbt_valid_to_current (medium)

Issue #481 specifically configures hard_deletes: 'invalidate'. The fix correctly filters current records in the snapshotted_data CTE (which the deletes CTE depends on), but there's no test covering this combination.

3. Minor: equals() macro not used (low, informational)

dbt-core uses {{ equals(column, value) }} for the sentinel comparison (which provides NULL-safe equality depending on behavior flags). This PR uses direct =. In ClickHouse this works correctly (NULL = X evaluates to NULL/falsy, caught by the OR IS NULL fallback), but noting the divergence.

@koletzilla
Copy link
Copy Markdown
Contributor

koletzilla commented Apr 13, 2026

Hi @PSalinovic ! Thanks for the contribution, those snapshot issues are quite relevant, so having someone working on them is more than welcome!

We are using AI to help us reviewing PRs as a first check before going deep intro a manual validation. We don't blindly trust them, but usually they find some stuff interesting and in this case it looks like it has some opinions on changes that look good. Can you check them?

About the CI:

  • There're also a couple of small lint problems, easy to fix.
  • test_cloud fails in all community PRs, so ignore it. It's not needed to merge this PR.

@PSalinovic
Copy link
Copy Markdown
Author

Hi, thanks for the feedback.

I fixed the linting issue from the failing test.

Also, I added two more test cases in the test file. That was a good catch by the clanker. Testing the case with the updated record works fine and the test passes however the test with the delete that uses 'hard_deletes' config variable does not pass. It looks like this is a known bug that already has two issues opened: #271 and #291. The test will actually pass if I have the join_use_nulls set to true in the global config, which i did locally by updating the conftest.py. I didn't have the time to look into this more but it seems like it could be solved by using NOT EXIST instead of LEFT JOIN. I'll try to see if this works and submit a PR if it does.
I left the test in the test file but disabled it and left a comment.

Another issue I encountered is that specifying composite keys in the snapshot config per dbt docs does not work and acutally produces a CROSS JOIN where the data multiplies. This is mentioned in #630 and #544. I'll try to submit a PR for this as well as this is critical on our end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

dbt snapshot - check strategy - stopped working properly

3 participants