Batch inserts are supported by the JDBC driver, but when they come from Spark's dataframe.save() they result in the following sequence:

PREPARE p AS INSERT INTO range_10 ("id") VALUES (?);
BEGIN TRANSACTION;
EXECUTE p(0);
EXECUTE p(1);
EXECUTE p(2);
...
EXECUTE p(9);
COMMIT;

The problem is that, until Data Inlining is implemented for the Postgres catalog, DuckLake writes a Parquet file immediately on every EXECUTE call.

Besides ducklake_add_data_files and COPY mentioned above (sketched further below), this problem can also be avoided with a multi-row prepared statement:

PREPARE p AS INSERT INTO range_10 ("id") VALUES (?), (?), (?), ..., (?);
EXECUTE p(0, 1, 2, ..., 9);
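
For completeness, here is a rough sketch of what the COPY and ducklake_add_data_files routes could look like from the DuckDB Python client. The attach string, data path, and file names are placeholders, and the ducklake_add_data_files parameter order is an assumption, so treat this as an outline rather than a working recipe:

import duckdb

con = duckdb.connect()
con.execute("INSTALL ducklake")
con.execute("LOAD ducklake")

# Placeholder attach string and data path -- adjust to your Postgres catalog setup.
con.execute(
    "ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=localhost' "
    "AS lake (DATA_PATH 's3://my-bucket/lake/')"
)

# Option 1: a single COPY ingests the whole batch as one DuckLake write,
# producing one Parquet data file instead of one per EXECUTE.
con.execute("COPY lake.range_10 FROM 'staged_batch.parquet' (FORMAT parquet)")

# Option 2: register an already-written Parquet file without rewriting it.
# Parameter order is assumed here -- check the DuckLake docs for the exact signature.
con.execute("CALL ducklake_add_data_files('lake', 'range_10', 'staged_batch.parquet')")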

But there seems to be no easy way to use the multi-row prepared statement approach from Spark, so for now I think that d…
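
From the Spark side, one possible workaround (untested, with placeholder paths and attach string) is to bypass the JDBC writer entirely: write the DataFrame out as Parquet with a normal Spark job, then load those files into DuckLake in a single statement:

import duckdb
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(10)  # stand-in for the DataFrame that would go through dataframe.save()

# 1. Stage the batch as Parquet files with a regular Spark write (no row-by-row JDBC).
staging_path = "/tmp/range_10_staging"  # placeholder location that DuckDB can also read
df.write.mode("overwrite").parquet(staging_path)

# 2. Ingest the staged files into DuckLake with a single INSERT ... SELECT,
#    which results in one write instead of one per row.
con = duckdb.connect()
con.execute("INSTALL ducklake")
con.execute("LOAD ducklake")
con.execute(
    "ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=localhost' "
    "AS lake (DATA_PATH 's3://my-bucket/lake/')"  # placeholder attach string
)
con.execute(
    f"INSERT INTO lake.range_10 SELECT id FROM read_parquet('{staging_path}/*.parquet')"
)

This trades the JDBC path for a staging directory, so it only makes sense when the process running DuckDB can read the location Spark writes to.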
