-
First of all: I'm impressed by DuckDB and DuckLake. I'm trying to understand how I can use them to query a large amount of data currently maintained in PostgreSQL. The most important table has over 300 million rows and 10 columns (data from 2005 to the present) and receives 2,000 to 4,000 changes per day (updates and inserts only). DuckLake's time travel queries could solve a lot of my problems, so I would love to have a DuckLake setup with all this historical data that I can query, but I am concerned about the number of Parquet files that will be added each day and don't know how this will affect performance. Because I don't need time travel at a resolution finer than one day, I suppose the best approach would be to capture all of the day's changes to this table with a CDC tool (such as Debezium with Kafka) and commit them to DuckLake in a single transaction. That way, I would have one snapshot per day. Am I right? Is there any other way to get the most out of DuckLake in my case? Thanks for Duck* ... a marvellous piece of software.
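To make the "one commit per day" idea concrete, here is a minimal sketch of collapsing a day's captured changes down to one row per key before they are written in a single transaction. The event shape (`key`, `ts`, `row`) and the function name are assumptions for illustration, not the Debezium message format.

```python
from typing import Any


def collapse_daily_changes(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the most recent row image per key (last write wins).

    Each event is a hypothetical dict with:
      key: the table's unique key
      ts:  a sortable change timestamp or sequence number
      row: the full row as it looked after the change
    """
    latest: dict[Any, dict[str, Any]] = {}
    # sorted() is stable, so events with equal ts keep their arrival order.
    for event in sorted(events, key=lambda e: e["ts"]):
        latest[event["key"]] = event["row"]
    return list(latest.values())


# Three changes captured during one day; key 1 was inserted and later updated.
events = [
    {"key": 1, "ts": 1, "row": {"id": 1, "value": "a"}},
    {"key": 2, "ts": 2, "row": {"id": 2, "value": "b"}},
    {"key": 1, "ts": 3, "row": {"id": 1, "value": "a2"}},
]
batch = collapse_daily_changes(events)
print(batch)
```

The resulting batch has at most one row per key, so a few thousand daily changes would end up in one small write per day rather than one write per change.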
Replies: 2 comments
-
If you can get the new records and the updates/deletes, and you have a unique key, then MERGE INTO, which we are releasing in DuckLake 0.3, is suited for this use case: https://ducklake.select/docs/preview/duckdb/usage/upserting. If you are only inserting new records, then I would go for an append strategy.
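As a rough sketch of what the daily upsert could look like, the helper below only builds the statement text. The table names `measurements` and `daily_changes`, the column names, and the helper itself are hypothetical; the MERGE INTO shape follows the linked upserting page and should be checked against the released version. Identifiers are interpolated without quoting, so this is for trusted names only.

```python
def build_merge_sql(target: str, staging: str, key: str, columns: list[str]) -> str:
    """Build a MERGE INTO statement that upserts `staging` into `target`.

    Hypothetical helper: matching rows are updated on every non-key column,
    and rows with no match are inserted.
    """
    set_clause = ", ".join(
        f"{col} = {staging}.{col}" for col in columns if col != key
    )
    return (
        f"MERGE INTO {target}\n"
        f"    USING {staging}\n"
        f"    ON ({staging}.{key} = {target}.{key})\n"
        f"    WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"    WHEN NOT MATCHED THEN INSERT;"
    )


# Assumed names: `measurements` is the DuckLake table, `daily_changes` holds
# the day's collected rows (one row per key).
sql = build_merge_sql(
    target="measurements",
    staging="daily_changes",
    key="id",
    columns=["id", "value", "updated_at"],
)
print(sql)
```

Running the generated statement once per day inside one transaction would keep the "one snapshot per day" property described in the question.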
-
Thank you for your answer. MERGE INTO will be perfect because there are also updates, not just inserts. What about the storage? Teo