DuckLake QueryPlanner: intercept DML logical plans #16

@zfarrell

Description

Context

This is one ticket in a series carrying forward the foundation work from #12. Read #12 first for repo context.

DataFusion natively plans INSERT against a TableProvider, but it does not natively plan DELETE, UPDATE, or MERGE. The fork adds a custom QueryPlanner that intercepts DataFusion's logical DML plans for DuckLake tables and routes them to DuckLake-specific execution plans. This is the integration point — every other DML ticket (#17 DELETE, #18 UPDATE, #19 MERGE) depends on it.

Reference branch

ducklake-features/integration:

  • src/query_planner.rs: the planner. Read it end-to-end; it's ~600 LOC and no bigger than necessary.
  • Inline #[cfg(test)] tests in the same file cover the routing logic. Notably, the planner rejects unsupported plan shapes with explicit errors instead of silently emptying filters, a deliberate safety choice flagged in the audit.

Scope

  1. Port src/query_planner.rs.
  2. The planner intercepts only LogicalPlan::Dml(Delete | Update). INSERT continues to use DataFusion's native path via TableProvider::insert_into. MERGE is implemented as a custom logical extension node (see #19, "MERGE physical execution (INSERT/UPDATE/DELETE atomic)").
  3. For DELETE/UPDATE, the planner extracts:
    • the target DuckLakeTable (downcast via TableProvider::as_any)
    • the filter expression (rejecting plans where DataFusion rewrote the filter through joins/subqueries — emit a clear error rather than silently dropping the predicate)
    • for UPDATE, the SET expressions, identified by positional matching against the target schema (see audit note below)
  4. Construct the right physical plan: DeleteExec (see #17, "DELETE physical execution (MOR delete files)") or UpdateExec (see #18, "UPDATE physical execution (MOR delete + insert)").
  5. Register the planner via SessionStateBuilder::with_query_planner and expose a helper, e.g. DuckLakeQueryPlanner::register(&mut state).
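
The routing rules in the scope list above can be sketched roughly as follows. This is a minimal, std-only model and not the code in src/query_planner.rs: `WriteOp`, `Input`, `DmlPlan`, `Route`, and `route` are hypothetical stand-ins for DataFusion's logical plan types and the fork's planner, and predicates are modeled as plain strings. It only illustrates the intended decisions: INSERT and non-DuckLake tables fall through to the native path, DELETE/UPDATE are routed to their execs, and a filter rewritten through a join or subquery is rejected with an explicit error.

```rust
// Hypothetical, simplified stand-ins; NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
pub enum WriteOp {
    Insert,
    Delete,
    Update,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Input {
    /// A plain scan of the target table with at most one filter on top.
    FilteredScan { predicate: Option<String> },
    /// Any shape where the filter was rewritten through a join or subquery.
    JoinOrSubquery,
}

#[derive(Debug, Clone, PartialEq)]
pub struct DmlPlan {
    pub op: WriteOp,
    /// Models the result of downcasting the provider to DuckLakeTable.
    pub is_ducklake_table: bool,
    pub input: Input,
}

#[derive(Debug, PartialEq)]
pub enum Route {
    /// Fall through to DataFusion's default planning.
    Native,
    DeleteExec { predicate: Option<String> },
    UpdateExec { predicate: Option<String> },
}

/// Decide which physical plan a DML statement should be routed to.
pub fn route(plan: &DmlPlan) -> Result<Route, String> {
    // INSERT and non-DuckLake targets are not intercepted.
    if !plan.is_ducklake_table || plan.op == WriteOp::Insert {
        return Ok(Route::Native);
    }

    // Extract the predicate, refusing shapes where it could be lost.
    let predicate = match &plan.input {
        Input::FilteredScan { predicate } => predicate.clone(),
        Input::JoinOrSubquery => {
            return Err(format!(
                "unsupported {:?} plan shape: filter was rewritten through a join or subquery",
                plan.op
            ));
        }
    };

    Ok(match plan.op {
        WriteOp::Delete => Route::DeleteExec { predicate },
        WriteOp::Update => Route::UpdateExec { predicate },
        WriteOp::Insert => unreachable!("INSERT is handled above"),
    })
}
```

The key design point is that the unsupported shape returns `Err` rather than a route with `predicate: None`, since an empty predicate on DELETE would mean "delete every row".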

Acceptance criteria

Dependencies

Out of scope

  • The DELETE/UPDATE/MERGE physical execs themselves — separate tickets
  • DDL planning (CREATE TABLE etc. continues to flow through DataFusion's native path)

Notes

  • Audit concern to address before merging: in the fork's implementation, UPDATE detection uses positional matching: projection_exprs[i].name == schema.fields()[i].name(). This depends on DataFusion never reordering projections. The audit flagged this as fragile but not currently broken. Add a runtime assertion that the names match by index and fail loudly with a planner error if they don't — that way a future DataFusion behavior change becomes a clear bug report rather than silent data corruption.
  • The audit verdict on this file was "solid" — the planner's rejection of join/subquery-rewritten filters is called out as a "genuinely thoughtful safety check."
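
One possible shape for the runtime assertion requested in the audit note above, as a hedged sketch: `check_positional_match` is a hypothetical helper name, and it takes plain name slices rather than DataFusion's projection expressions and schema fields, so the real version would need to extract those names first and return a planner error type instead of `String`.

```rust
/// Verify that projection names line up with schema field names by index.
/// Hypothetical helper: returns a descriptive error instead of letting a
/// reordered projection silently assign values to the wrong columns.
pub fn check_positional_match(
    projection_names: &[&str],
    schema_names: &[&str],
) -> Result<(), String> {
    // A length mismatch already breaks the positional assumption.
    if projection_names.len() != schema_names.len() {
        return Err(format!(
            "UPDATE projection has {} expressions but target schema has {} fields",
            projection_names.len(),
            schema_names.len()
        ));
    }

    // Compare names pairwise; report the first index that disagrees.
    for (i, (proj, field)) in projection_names.iter().zip(schema_names).enumerate() {
        if proj != field {
            return Err(format!(
                "UPDATE projection/schema mismatch at index {i}: projection '{proj}' vs schema field '{field}'"
            ));
        }
    }
    Ok(())
}
```

Calling a check like this before reading any SET expression by position means a future projection reorder would surface as a planner error naming the offending index.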
