DELETE physical execution (MOR delete files) #17

@zfarrell

Context

This is one ticket in a series carrying forward #12 foundation work. Read #12 first for repo context.

Implements DuckLake DELETE via the merge-on-read (MOR) pattern: scan the affected data files, identify rows matching the predicate, write delete files at the row-position level, and register them in the catalog.

Reference branch

ducklake-features/integration:

  • src/delete_exec.rs — physical execution plan
  • src/table_writer.rs — shared write helpers (also expanded in this branch; cross-reference)
  • Tests: tests/update_tests.rs (believed to be the populated file; confirm before porting. The audit in #12, "Foundation: rebase integration onto upstream, drop pass-throughs, triage SLT failures", recorded that one file was populated and a sibling stub was deleted) and tests/delete_filter_tests.rs (already upstream; exercises the read-side MOR filter)

Scope

  1. Port src/delete_exec.rs.
  2. Behavior:
    • For each data file referenced by the target table at the current snapshot, scan with the predicate filter pushed down to identify deleted row positions
    • Write delete files (file_path: VARCHAR, pos: INT64 schema) — one per data file with deletions
    • Use the existing UploadCleanupGuard pattern from src/insert_exec.rs to clean up uploaded delete files on commit failure
    • Use checked arithmetic for row-count overflow (the audit confirmed the fork does this; preserve it)
    • Handle the "already deleted" case correctly: do not emit a duplicate delete-file entry for the same (data_file, position) pair
    • Apply via the MetadataWriter trait — backend-agnostic, no SQLite/PG/MySQL coupling here
  3. Register the exec with the planner from #16 (DuckLake QueryPlanner: intercept DML logical plans).
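
A rough sketch of the position-collection step, to make the "checked arithmetic" and "already deleted" bullets concrete. This is not the code in src/delete_exec.rs: `DeleteRow`, `DeleteScanError`, and `new_delete_rows` are hypothetical names, and the predicate result is modelled as plain `Vec<bool>` masks per batch rather than Arrow arrays.

```rust
use std::collections::BTreeSet;

/// One row of a delete file, mirroring the (file_path: VARCHAR, pos: INT64) schema.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DeleteRow {
    pub file_path: String,
    pub pos: i64,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum DeleteScanError {
    RowCountOverflow,
}

/// Hypothetical helper: walk the per-batch predicate masks of one data file and
/// return delete rows for positions that match and are not already deleted.
pub fn new_delete_rows(
    file_path: &str,
    batches: &[Vec<bool>],
    already_deleted: &BTreeSet<i64>,
) -> Result<Vec<DeleteRow>, DeleteScanError> {
    // Position of the first row of the current batch within the data file.
    let mut base: i64 = 0;
    let mut out = Vec::new();

    for mask in batches {
        for (offset, matched) in mask.iter().enumerate() {
            if !*matched {
                continue;
            }
            // Checked conversions and additions instead of `as` casts or `+`.
            let offset =
                i64::try_from(offset).map_err(|_| DeleteScanError::RowCountOverflow)?;
            let pos = base
                .checked_add(offset)
                .ok_or(DeleteScanError::RowCountOverflow)?;
            // Skip positions an earlier delete file already covers, so no
            // duplicate (data_file, position) pair is emitted.
            if !already_deleted.contains(&pos) {
                out.push(DeleteRow {
                    file_path: file_path.to_string(),
                    pos,
                });
            }
        }
        let len =
            i64::try_from(mask.len()).map_err(|_| DeleteScanError::RowCountOverflow)?;
        base = base
            .checked_add(len)
            .ok_or(DeleteScanError::RowCountOverflow)?;
    }
    Ok(out)
}
```

If the returned vector is empty for a data file, no delete file would be written for it, which is what keeps a repeated DELETE idempotent.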

Acceptance criteria

  • cargo build clean
  • DELETE round-trip test: insert rows, DELETE with predicate, SELECT confirms the rows are gone, snapshot increments
  • DELETE all rows: predicate matches every row, table reads back empty
  • DELETE on already-deleted rows: idempotent, no spurious delete-file entries
  • Commit-failure recovery: upload completes but commit fails, uploaded delete files are cleaned up (verify via filesystem inspection in test)
  • No duckdb crate imports
  • Concurrent DELETE + DELETE on same table: one wins, other gets a conflict error (do not silently drop)
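
For the concurrent DELETE criterion, one possible shape is an optimistic check against the snapshot the DELETE was planned on. The ticket does not say how the catalog detects conflicts, so everything below is an assumption for illustration: `CatalogSketch`, `CommitError`, and `commit_delete` are hypothetical and do not describe the MetadataWriter trait.

```rust
/// Hypothetical error type: the loser of a race gets an explicit conflict.
#[derive(Debug, PartialEq, Eq)]
pub enum CommitError {
    Conflict { expected: u64, found: u64 },
    SnapshotOverflow,
}

/// Hypothetical in-memory stand-in for the catalog's snapshot counter.
pub struct CatalogSketch {
    current_snapshot: u64,
}

impl CatalogSketch {
    pub fn new() -> Self {
        CatalogSketch { current_snapshot: 0 }
    }

    pub fn current_snapshot(&self) -> u64 {
        self.current_snapshot
    }

    /// Commit a DELETE that was planned against `base_snapshot`.
    /// Succeeds only if no other commit landed in between.
    pub fn commit_delete(&mut self, base_snapshot: u64) -> Result<u64, CommitError> {
        if base_snapshot != self.current_snapshot {
            // Surface the conflict; never silently drop the second DELETE.
            return Err(CommitError::Conflict {
                expected: base_snapshot,
                found: self.current_snapshot,
            });
        }
        let next = self
            .current_snapshot
            .checked_add(1)
            .ok_or(CommitError::SnapshotOverflow)?;
        self.current_snapshot = next;
        Ok(next)
    }
}
```

In a real backend the compare-and-advance would have to happen inside the same transaction that registers the delete files; the sketch only shows the contract the test should assert.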

Dependencies

Out of scope

  • UPDATE and MERGE — separate tickets
  • Read-side delete filtering — already upstream

Notes

  • Audit verdict on the fork's delete_exec.rs: "solid — proper Arc cloning into async block, UploadCleanupGuard for orphan cleanup, checked arithmetic for row-count overflow, correct already-deleted handling."
  • The branch's commit history references R10-S-007 (transaction-wrapped file registration) and R11-S-002 (cleanup on partitioned INSERT failure). Same patterns apply here.
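
For readers who have not looked at src/insert_exec.rs yet, the cleanup-on-failure idea can be sketched as a Drop-based guard. This is a simplified stand-in, not the real UploadCleanupGuard: the name `CleanupGuardSketch` and its methods are hypothetical, and it removes local files synchronously, whereas the real guard presumably deals with object-store uploads.

```rust
use std::fs;
use std::path::PathBuf;

/// Hypothetical guard: tracks uploaded delete files and removes them on drop
/// unless the catalog commit was confirmed.
pub struct CleanupGuardSketch {
    paths: Vec<PathBuf>,
    committed: bool,
}

impl CleanupGuardSketch {
    pub fn new() -> Self {
        CleanupGuardSketch {
            paths: Vec::new(),
            committed: false,
        }
    }

    /// Record a file that was written but is not yet registered in the catalog.
    pub fn track(&mut self, path: PathBuf) {
        self.paths.push(path);
    }

    /// Call after the catalog commit succeeds; the files are now owned by the
    /// table and must not be removed.
    pub fn commit(mut self) {
        self.committed = true;
        // `self` is dropped here with `committed == true`, so Drop is a no-op.
    }
}

impl Drop for CleanupGuardSketch {
    fn drop(&mut self) {
        if self.committed {
            return;
        }
        for path in &self.paths {
            // Best-effort cleanup: errors are ignored because Drop cannot fail.
            let _ = fs::remove_file(path);
        }
    }
}
```

Any early return or `?` between upload and commit drops the guard uncommitted, which is the path the commit-failure acceptance test is meant to exercise.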
