Base issue to summarize and discuss all the pending work needed to improve the current Snapshot status:
About changes needed to make Snapsots work:
To make Snapshots work we needed to reimplement some of the SQL macros related to this materialisation. The reasons to do this are the following ones:
clickhouse__snapshot_merge_sql: ClickHouse has no MERGE/UPDATE. The target is rebuilt via insert + EXCHANGE TABLES (or drop+rename).
build_snapshot_staging_table: (non-dispatched) Uses a non-temporary staging table (create_table_as(False, …)) because the rebuild reads it across several separate statements; ClickHouse temp tables don't persist that way.
clickhouse__snapshot_hash_arguments: halfMD5(... cast as varchar ...) instead of md5 needed as md5 is not a CH function. MD5 is but it doesn't return needed hex text. As we need a 32-char string, halfMD5 do the trick.
clickhouse__post_snapshot: drop the staging relation.
clickhouse__snapshot_staging_table:
snapshot_time CTE (check strategy): pins now() so it is consistent across the whole query
We need to keep this documented in the code so we have all the context on these decissions.
Missing features:
Since we added the current Snapshot implementation, the code in dbt-core has evolved but we have not kept up with all the new added features. Some of them are:
There are a few base tests that tests these values and we are not inherited them:
Existing bugs:
Base issue to summarize and discuss all the pending work needed to improve the current Snapshot status:
About changes needed to make Snapsots work:
To make Snapshots work we needed to reimplement some of the SQL macros related to this materialisation. The reasons to do this are the following ones:
clickhouse__snapshot_merge_sql: ClickHouse has noMERGE/UPDATE. The target is rebuilt viainsert+EXCHANGE TABLES(or drop+rename).build_snapshot_staging_table: (non-dispatched) Uses a non-temporary staging table (create_table_as(False, …)) because the rebuild reads it across several separate statements; ClickHouse temp tables don't persist that way.clickhouse__snapshot_hash_arguments:halfMD5(... cast as varchar ...)instead ofmd5needed asmd5is not a CH function. MD5 is but it doesn't return needed hex text. As we need a 32-char string, halfMD5 do the trick.clickhouse__post_snapshot: drop the staging relation.clickhouse__snapshot_staging_table:snapshot_timeCTE (check strategy): pinsnow()so it is consistent across the whole queryWe need to keep this documented in the code so we have all the context on these decissions.
Missing features:
Since we added the current Snapshot implementation, the code in dbt-core has evolved but we have not kept up with all the new added features. Some of them are:
unique_key1.9.3 broke snapshots with compound unique keys #544dbt_valid_to/dbt_scd_idliteralssnapshot_meta_column_namesSupport snapshot_meta_column_names #571dbt_valid_to_currentdbt snapshot - check strategy - stopped working properly #481 fix(snapshot): support dbt_valid_to_current config in snapshot macros #630hard_deletes: new_recordsupport hard_deletesnew_recordbehavior for snapshots #562There are a few base tests that tests these values and we are not inherited them:
BaseSimpleSnapshotandBaseSnapshotCheck: includes hard_deletes/revivesBaseSnapshotNewRecordDbtValidToCurrent: Exerciseshard_deletes: new_record+dbt_valid_to_current+ check strategyBaseSnapshotNewRecordTimestampModeandBaseSnapshotNewRecordCheckMode: Check new records with both strategiesExisting bugs:
settings join_use_nulls = 1to make them correctly work. snapshot invalidate_hard_deletes feature #291. This is activated withinvalidate_hard_deletes=Truedatabase_engineset as None causing Snapshot to fail Unable to create new snapshot #475