Skip uniqueness constraint validation on Overwrite for DuckDB & SQLite #498
Open
lukekim wants to merge 5 commits into
…rite-v2 Signed-off-by: Luke Kim <80174+lukekim@users.noreply.github.com>
dbd40a8 to f0e3f28
…int-validation-on-overwrite-v2
f0e3f28 to 8035ffe
// Skip constraint validation for Overwrite operations since we're replacing all data
// and uniqueness constraints don't apply to the incoming data in isolation.
if self.overwrite != InsertOp::Overwrite {
    if let Some(constraints) = self.table_definition.constraints() {
Collaborator
Constraints should still be applied to the incoming data if the final table will have constraints; otherwise you could end up with a table that violates its constraints.
Collaborator
Last-write-wins should still allow the validation to proceed if that is configured.
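To make the disagreement in this thread concrete, here is a minimal in-memory sketch in Rust. It is a toy model only: `Row`, `validate_unique_pk`, `last_write_wins`, and `write` are hypothetical names invented for illustration, the `InsertOp` enum is a two-variant stand-in, and nothing here reflects how the DuckDB or SQLite providers actually store data. It shows one way the two positions could coexist: Overwrite skips the up-front rejection but collapses duplicate keys so the resulting table still satisfies the primary-key constraint, while Append keeps rejecting duplicates.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical stand-in for the write mode (only the two variants discussed here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InsertOp {
    Append,
    Overwrite,
}

/// In this toy model a row is (primary key, value).
type Row = (i64, String);

/// Reject a batch that repeats a primary key.
fn validate_unique_pk(batch: &[Row]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for (pk, _) in batch {
        if !seen.insert(*pk) {
            return Err(format!("duplicate primary key {}", pk));
        }
    }
    Ok(())
}

/// Collapse duplicate keys so that the last row written for each key wins.
fn last_write_wins(batch: &[Row]) -> BTreeMap<i64, String> {
    let mut out = BTreeMap::new();
    for (pk, value) in batch {
        out.insert(*pk, value.clone());
    }
    out
}

/// Write `batch` into `table` (a map keyed by primary key) under the given mode.
fn write(
    table: &mut BTreeMap<i64, String>,
    op: InsertOp,
    batch: &[Row],
) -> Result<(), String> {
    match op {
        InsertOp::Overwrite => {
            // No up-front rejection: old contents are discarded and
            // intra-batch duplicates are resolved rather than refused.
            *table = last_write_wins(batch);
        }
        InsertOp::Append => {
            // Appends must not introduce duplicates, within the batch
            // or against rows that are already present.
            validate_unique_pk(batch)?;
            if let Some((pk, _)) = batch.iter().find(|(pk, _)| table.contains_key(pk)) {
                return Err(format!("primary key {} already exists", pk));
            }
            for (pk, value) in batch {
                table.insert(*pk, value.clone());
            }
        }
    }
    Ok(())
}
```

In this model the reviewer's post-condition (the final table never violates its constraint) holds in both modes; whether the real write paths guarantee the same depends on how the staging table and swap are implemented.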
Summary
Skip uniqueness-constraint validation when performing `InsertOp::Overwrite` for the DuckDB and SQLite table providers. By definition, an overwrite replaces the target table's contents, so pre-existing row values cannot produce duplicate-key violations, and values within the incoming batch are resolved by the write path itself.

Motivation

The existing write path always called `validate_batch_with_constraints` before inserting, which would reject perfectly valid overwrite batches that happened to contain values conflicting with the previous contents of the table (or with themselves, when the caller expected "last write wins" semantics). This made `InsertOp::Overwrite` unusable for common ETL patterns that re-materialize a PK-constrained table from a source.

Changes
DuckDB (`core/src/duckdb/`)

- `write.rs`: Skip `validate_batch_with_constraints` when `self.overwrite == InsertOp::Overwrite`.
- `creator.rs`: Added `create_table_without_constraints`, used for the staging table during overwrite so intra-batch PK duplicates do not trip DuckDB's own constraint check before the atomic swap.

SQLite (`core/src/sqlite/`, `core/src/sqlite.rs`)

- `write.rs`: Skip `validate_batch_with_constraints` on `InsertOp::Overwrite`.
- `sqlite.rs`: Thread `InsertOp` through `insert_batch` / `insert_batch_prepared`; Overwrite now uses `REPLACE INTO` so that intra-batch duplicates collapse to the last row written.

SQL generation (`core/src/sql/arrow_sql_gen/statement.rs`)

- `InsertBuilder::build` now takes a `replace: bool` flag.
- `build_sqlite_replace()` helper emits `REPLACE INTO ...` for SQLite overwrite paths.

Utility (`core/src/util/constraints.rs`)

- `filter_unique_constraints` helper for filtering non-unique constraints out of a `Constraints` set.

Test coverage
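As background for the SQLite overwrite test, the `replace: bool` flag described under SQL generation can be pictured as a builder that only changes the leading verb of the statement. The sketch below is a hypothetical stand-alone helper: `build_insert_sql` and its parameters are made up for illustration and are not the `InsertBuilder` API, and it assumes identifiers need no quoting.

```rust
/// Hypothetical helper: build a parameterised insert statement.
/// With `replace` set, the statement starts with `REPLACE INTO`, which in
/// SQLite deletes any row that conflicts on a unique key before inserting.
/// Identifiers are emitted verbatim (no quoting or escaping in this sketch).
fn build_insert_sql(table: &str, columns: &[&str], replace: bool) -> String {
    let verb = if replace { "REPLACE" } else { "INSERT" };
    let cols = columns.join(", ");
    // One positional placeholder per column.
    let placeholders = vec!["?"; columns.len()].join(", ");
    format!("{} INTO {} ({}) VALUES ({})", verb, table, cols, placeholders)
}
```

For a table `t` with columns `id` and `name`, the flag switches the output between `INSERT INTO t (id, name) VALUES (?, ?)` and `REPLACE INTO t (id, name) VALUES (?, ?)`.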
Added four new unit tests in `core/src/duckdb/write.rs` and `core/src/sqlite/write.rs`:

- `test_overwrite_skips_pk_constraint_validation_with_duplicate_pks` (DuckDB & SQLite): writes a batch containing duplicate primary-key values via `InsertOp::Overwrite` and asserts success.
- `test_append_still_enforces_pk_constraint_validation` (DuckDB & SQLite): regression guard; the same duplicate-PK batch must still be rejected when the op is `Append`.

All four pass locally: