-
Hey! I have a public CSV dataset that updates once per week. Some rows get updated, and they should replace the old information based on the id column. Other rows get deleted entirely, so they should be removed completely. I want to track that dataset in DuckLake. I know I can truncate the table before loading new data, but that doesn't seem efficient. Ideally only the delta between the two files would be stored in the Parquet files, so storage usage would stay minimal while I could still use the time-travel functionality. I think this is a very typical data-loading case from a legacy system for analytic queries. Is there a de-facto way to do this?
-
Hi @onnimonni. If you have a unique key, in DuckLake 0.3 we are releasing MERGE INTO, which is suited for this use case: https://ducklake.select/docs/preview/duckdb/usage/upserting
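For the weekly-load pattern described above, the approach could look roughly like the sketch below. This is a hedged illustration, not a definitive recipe: the table name `my_lake.events`, the file name `weekly_export.csv`, and the column `id` are hypothetical, and MERGE INTO in DuckLake 0.3 is a preview feature, so check the linked docs for the exact supported syntax. Because DuckLake writes only the changed rows and deletion records into new Parquet files, this should keep storage close to the delta while preserving time travel.

```sql
-- Load the new weekly export into a staging table
-- ('weekly_export.csv' and all table/column names are hypothetical):
CREATE OR REPLACE TEMP TABLE staging AS
    FROM read_csv('weekly_export.csv');

-- Upsert changed and new rows by the unique key
-- (MERGE INTO is a DuckLake 0.3 preview feature; see the upserting docs):
MERGE INTO my_lake.events AS t
USING staging AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE
WHEN NOT MATCHED THEN INSERT;

-- Remove rows that disappeared from the source file:
DELETE FROM my_lake.events
WHERE id NOT IN (SELECT id FROM staging);
```

The separate DELETE handles the "rows removed from the source" case the question raises; whether that can instead be folded into the MERGE statement itself depends on which MERGE clauses the release ultimately supports.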
-
@guillesd Hi, when I use DuckDB v1.3.2 and DuckLake v0.3-dev1, the MERGE syntax doesn't seem to work in DuckLake. For example, running the code from https://ducklake.select/docs/preview/duckdb/usage/upserting results in a parser error: "syntax error at or near 'MERGE'." Do you know why?