Skip uniqueness constraint validation on Overwrite for DuckDB & SQLite #498

Open
lukekim wants to merge 5 commits into spiceai from lukim/skip-constraint-validation-on-overwrite-v2

lukekim (Collaborator) commented Nov 25, 2025

Summary

Skip uniqueness-constraint validation when performing InsertOp::Overwrite for the DuckDB and SQLite table providers. By definition, an overwrite replaces the target table's contents, so pre-existing row values cannot produce duplicate-key violations, and values within the incoming batch are resolved by the write path itself.

Motivation

The existing write path always called validate_batch_with_constraints before inserting, which would reject perfectly valid overwrite batches that happened to contain values conflicting with the previous contents of the table (or with themselves, when the caller expected "last write wins" semantics). This made InsertOp::Overwrite unusable for common ETL patterns that re-materialize a PK-constrained table from a source.

Changes

DuckDB (core/src/duckdb/)

  • write.rs: Skip validate_batch_with_constraints when self.overwrite == InsertOp::Overwrite.
  • creator.rs: Added create_table_without_constraints, used for the staging table during overwrite so intra-batch PK duplicates do not trip DuckDB's own constraint check before the atomic swap.
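
The gating described for write.rs can be sketched roughly as follows. This is a minimal, std-only illustration rather than the code in this PR: the `InsertOp` enum here is a local stand-in, and `validate_unique` and `write_batch` are hypothetical names, with a primary-key column modeled as a slice of `i64`.

```rust
use std::collections::HashSet;

/// Local stand-in for the insert operation (assumption: only the two
/// variants discussed in this PR are modeled).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InsertOp {
    Append,
    Overwrite,
}

/// Hypothetical validator: rejects a batch whose key column repeats a value.
fn validate_unique(pks: &[i64]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for pk in pks {
        // `insert` returns false when the value was already present.
        if !seen.insert(*pk) {
            return Err(format!("duplicate primary key: {pk}"));
        }
    }
    Ok(())
}

/// Hypothetical write entry point: validation runs for every op except
/// Overwrite. Returns the number of rows accepted for writing.
fn write_batch(op: InsertOp, pks: &[i64]) -> Result<usize, String> {
    if op != InsertOp::Overwrite {
        validate_unique(pks)?;
    }
    Ok(pks.len())
}
```

With this shape, `write_batch(InsertOp::Append, &[1, 2, 2])` returns an error, while the same batch passed with `InsertOp::Overwrite` is accepted and left for the write path to resolve.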

SQLite (core/src/sqlite/, core/src/sqlite.rs)

  • write.rs: Skip validate_batch_with_constraints on InsertOp::Overwrite.
  • sqlite.rs: Thread InsertOp through insert_batch / insert_batch_prepared; Overwrite now uses REPLACE INTO so that intra-batch duplicates collapse to the last row written.
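
To illustrate the "collapse to the last row written" behavior, here is a small in-memory model of what REPLACE INTO does on a primary-key conflict. It is only a sketch of the semantics, not the SQLite provider code: the table is modeled as a `BTreeMap` keyed by primary key, and `replace_into` is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// In-memory model of REPLACE INTO on a table with an integer primary key:
/// a row whose key already exists replaces the earlier row.
fn replace_into(table: &mut BTreeMap<i64, String>, rows: &[(i64, &str)]) {
    for (pk, value) in rows {
        // `BTreeMap::insert` overwrites any existing entry for the same key,
        // so the last row seen for a key is the one that remains.
        table.insert(*pk, value.to_string());
    }
}
```

Feeding the rows `(1, "a")`, `(2, "b")`, `(1, "c")` into an empty table leaves two rows, with key 1 holding `"c"`.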

SQL generation (core/src/sql/arrow_sql_gen/statement.rs)

  • InsertBuilder::build now takes a replace: bool flag.
  • New build_sqlite_replace() helper emits REPLACE INTO ... for SQLite overwrite paths.
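
The effect of the replace flag on the generated statement might look something like the sketch below. This is not the `InsertBuilder` API from this PR: `build_insert_sql` is a hypothetical free function, and the quoting and `?` placeholder style are assumptions made for illustration.

```rust
/// Hypothetical helper: builds a parameterized insert statement, switching
/// the leading verb to REPLACE INTO when `replace` is true.
fn build_insert_sql(table: &str, columns: &[&str], replace: bool) -> String {
    let verb = if replace { "REPLACE INTO" } else { "INSERT INTO" };
    // Double-quote each identifier (assumes names contain no embedded quotes).
    let cols = columns
        .iter()
        .map(|c| format!("\"{c}\""))
        .collect::<Vec<_>>()
        .join(", ");
    // One positional placeholder per column.
    let placeholders = vec!["?"; columns.len()].join(", ");
    format!("{verb} \"{table}\" ({cols}) VALUES ({placeholders})")
}
```

Keeping the verb as the only difference between the two statements means the Append and Overwrite paths can share the same column and parameter binding logic.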

Utility (core/src/util/constraints.rs)

  • Added filter_unique_constraints helper for filtering non-unique constraints out of a Constraints set.

Test coverage

Added four new unit tests in core/src/duckdb/write.rs and core/src/sqlite/write.rs:

  • test_overwrite_skips_pk_constraint_validation_with_duplicate_pks (DuckDB & SQLite): writes a batch containing duplicate primary-key values via InsertOp::Overwrite and asserts success.
  • test_append_still_enforces_pk_constraint_validation (DuckDB & SQLite): regression guard — the same duplicate-PK batch must still be rejected when the op is Append.

All four pass locally:

test sqlite::write::tests::test_overwrite_skips_pk_constraint_validation_with_duplicate_pks ... ok
test duckdb::write::test::test_overwrite_skips_pk_constraint_validation_with_duplicate_pks ... ok
test sqlite::write::tests::test_append_still_enforces_pk_constraint_validation ... ok
test duckdb::write::test::test_append_still_enforces_pk_constraint_validation ... ok

lukekim self-assigned this Nov 25, 2025
lukekim force-pushed the lukim/skip-constraint-validation-on-overwrite-v2 branch from dbd40a8 to f0e3f28 April 16, 2026 21:13
lukekim force-pushed the lukim/skip-constraint-validation-on-overwrite-v2 branch from f0e3f28 to 8035ffe April 16, 2026 22:03
lukekim changed the title from "Overwrite op support for DuckDB & SQLite" to "Skip uniqueness constraint validation on Overwrite for DuckDB & SQLite" Apr 16, 2026
lukekim added the enhancement label Apr 16, 2026
Comment thread core/src/duckdb/write.rs
// Skip constraint validation for Overwrite operations since we're replacing all data
// and uniqueness constraints don't apply to the incoming data in isolation.
if self.overwrite != InsertOp::Overwrite {
    if let Some(constraints) = self.table_definition.constraints() {
Collaborator

Constraints should still be applied to the incoming data if the final table will have constraints; otherwise you could end up with a table that violates its own constraints.

Collaborator

Last-write-wins handling should still allow the validation to proceed when it is configured.
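
One possible reading of the two review comments, sketched under assumptions: resolve intra-batch duplicates with last-write-wins first, then run the uniqueness check on the result, so the final table can never hold a violated constraint. The function names and the `(i64, String)` row shape are hypothetical and are not taken from the PR.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical pre-pass: keeps one row per key, with later rows replacing
/// earlier ones. Output order follows the first appearance of each key.
fn dedup_last_write_wins(rows: Vec<(i64, String)>) -> Vec<(i64, String)> {
    let mut slot: HashMap<i64, usize> = HashMap::new();
    let mut out: Vec<(i64, String)> = Vec::new();
    for (pk, value) in rows {
        // Copy the index out so the map is not borrowed while we mutate it.
        let existing = slot.get(&pk).copied();
        match existing {
            Some(i) => out[i].1 = value,
            None => {
                slot.insert(pk, out.len());
                out.push((pk, value));
            }
        }
    }
    out
}

/// Hypothetical uniqueness check that still runs after deduplication.
fn has_unique_keys(rows: &[(i64, String)]) -> bool {
    let mut seen = HashSet::new();
    rows.iter().all(|(pk, _)| seen.insert(*pk))
}
```

Under this approach the validation step is never skipped; it simply passes once duplicates have been collapsed, and it would still catch any case where the deduplication was not configured or did not apply.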
