-
Hey! I have a public CSV dataset that updates once per week. Some rows get updated, and they should replace the old information based on the id column. Other rows get deleted entirely, so they should be removed completely. I want to track that dataset in DuckLake. I know I can truncate the table before loading new data, but that doesn't seem efficient. Ideally only the delta between the two files would be stored in the Parquet files, so storage usage would stay minimal while I could still use the time-travel functionality. I think this is a very typical data-loading case from a legacy system for analytic queries. Is there a de-facto way to do this?
-
Hi @onnimonni. If you have a unique key, in DuckLake 0.3 we are releasing MERGE INTO, which is suited for this use case: https://ducklake.select/docs/preview/duckdb/usage/upserting
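For the weekly-load pattern described above, the approach could look roughly like the sketch below. This is a hedged illustration, not a definitive recipe: the table name `my_lake.events`, the file name `weekly_export.csv`, and the column `id` are hypothetical, and MERGE INTO in DuckLake 0.3 is a preview feature, so check the linked docs for the exact supported syntax. Because DuckLake writes only the changed rows and deletion records into new Parquet files, this should keep storage close to the delta while preserving time travel.

```sql
-- Load the new weekly export into a staging table
-- ('weekly_export.csv' and all table/column names are hypothetical):
CREATE OR REPLACE TEMP TABLE staging AS
    FROM read_csv('weekly_export.csv');

-- Upsert changed and new rows by the unique key
-- (MERGE INTO is a DuckLake 0.3 preview feature; see the upserting docs):
MERGE INTO my_lake.events AS t
USING staging AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE
WHEN NOT MATCHED THEN INSERT;

-- Remove rows that disappeared from the source file:
DELETE FROM my_lake.events
WHERE id NOT IN (SELECT id FROM staging);
```

The separate DELETE handles the "rows removed from the source" case the question raises; whether that can instead be folded into the MERGE statement itself depends on which MERGE clauses the release ultimately supports.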
-
@guillesd Hi, when I use DuckDB v1.3.2 and DuckLake v0.3-dev1, the MERGE syntax doesn't seem to work in DuckLake. For example, running the code from https://ducklake.select/docs/preview/duckdb/usage/upserting results in a parser error: "syntax error at or near 'MERGE'." Do you know why?