Feature Request: support replacing files #372
-
Hi @batmilkyway! I'm not sure about this one. The thing is, it's a bit dangerous to have an application that doesn't "speak" DuckLake compacting (and therefore tampering with) the Parquet files that DuckLake manages. If you just want to query Parquet files, you could do that with DuckDB directly. Why add them to DuckLake if they are not going to be managed by DuckLake?
-
Sorry, I was a little vague in my post. I'm essentially building something like this https://www.confluent.io/blog/introducing-tableflow/ or this https://buf.build/docs/bufstream/iceberg/reference/. Being able to query your Kafka topics this way is quite powerful. The same danger exists with Iceberg tables. In principle, yes, you could just query the Parquet files with DuckDB, but if you then wanted to make those files queryable outside the Kafka protocol, you'd end up building a data lake around them, so why not reuse an existing one?
-
Why do you want this feature?
My application uses Parquet files tracked in a metadata store to serve requests over the Kafka protocol. It would be fairly simple to make those Parquet files queryable via DuckLake by adding them to a DuckLake table. However, the application periodically compacts the files together, so they would need to be updated in DuckLake as well. Iceberg has a RewriteFiles interface (although it is deprecating rewriteFiles in favor of finer-grained add/delete operations), so something similar for DuckLake would enable my use case.
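To make the request concrete, here is a minimal sketch of the semantics being asked for, modeled loosely on Iceberg's RewriteFiles: one set of data files is swapped for another in a single commit, so readers never see a half-compacted table. ToyCatalog, Snapshot, add_files and rewrite_files are hypothetical names invented for illustration. This is not DuckLake's API or metadata schema, and the toy model ignores concurrent writers, transactions, and deletion of the physical Parquet files.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Snapshot:
    """An immutable view of which data files make up the table."""
    snapshot_id: int
    files: FrozenSet[str]


@dataclass
class ToyCatalog:
    """Hypothetical in-memory catalog; NOT DuckLake's actual metadata schema or API."""
    snapshots: List[Snapshot] = field(
        default_factory=lambda: [Snapshot(0, frozenset())]
    )

    @property
    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def add_files(self, paths: Iterable[str]) -> Snapshot:
        """Register new data files in a new snapshot."""
        return self._commit(self.current.files | frozenset(paths))

    def rewrite_files(self, removed: Iterable[str], added: Iterable[str]) -> Snapshot:
        """Replace `removed` with `added` in ONE commit, e.g. after compaction.

        Readers see either the old file set or the new one, never a mix.
        """
        removed, added = frozenset(removed), frozenset(added)
        if not removed:
            raise ValueError("a rewrite must replace at least one file")
        missing = removed - self.current.files
        if missing:
            # Similar in spirit to Iceberg's validation: if another commit already
            # removed one of these files, adding the compacted output could
            # duplicate or resurrect its rows.
            raise ValueError(f"files not in current snapshot: {sorted(missing)}")
        return self._commit((self.current.files - removed) | added)

    def _commit(self, files: FrozenSet[str]) -> Snapshot:
        # Each commit appends a new snapshot; older snapshots stay readable.
        snapshot = Snapshot(self.current.snapshot_id + 1, files)
        self.snapshots.append(snapshot)
        return snapshot


catalog = ToyCatalog()
catalog.add_files(["topic-a/seg-000.parquet", "topic-a/seg-001.parquet"])
# The application compacts both segments into one file, then swaps them atomically.
catalog.rewrite_files(
    removed=["topic-a/seg-000.parquet", "topic-a/seg-001.parquet"],
    added=["topic-a/compacted-000.parquet"],
)
print(catalog.current.snapshot_id, sorted(catalog.current.files))
# prints: 2 ['topic-a/compacted-000.parquet']
```

In this model, earlier snapshots still point at the pre-compaction segments, so time travel keeps working until those files are expired. A real catalog would additionally have to retry or reject the rewrite when a concurrent commit lands first.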