[VL][Delta] Track Delta Lake MoR enhancement #11901

@malinjawi

Description

This issue tracks the native execution gaps in Delta Lake MoR (Merge-on-Read) read and DML paths on the Velox backend, focusing on deletion-vector semantics, fallback reduction, and performance.

The goal is to move Delta MoR in Gluten from partial prototype status to stable native execution.

This tracker organizes the work and keeps it aligned with Delta Lake’s MoR design, the deletion vector protocol, and Gluten’s lakehouse integration.

Related issues:

Related context:

Current status

Current PoC work has already made progress in the following areas:

  • Delta DV/MoR read foundation has been prototyped in Gluten + Velox.
  • Native DV scan/read path is partially working.
  • Fallback to vanilla Spark still occurs in some control-plane and non-hot-path operations.
  • Native DELETE path has been explored, but MoR DML is not yet complete.
  • UPDATE / MERGE are not yet fully native.

Scope

This issue tracks Delta MoR work in the Velox backend, including:

  • native reads for Delta tables with deletion vectors
  • native DML paths that generate or update deletion vectors
  • protocol correctness for DV descriptors and action handling
  • performance and fallback reduction for MoR workloads

Out of scope:

  • generic Delta CoW improvements unless directly required by MoR
  • non-Velox backend work
  • unrelated lakehouse features

Priority

P0

  • Native DV read correctness
  • Native MoR read execution with minimal/zero fallback on supported queries
  • Stable build/runtime validation in clean environments

P1

  • Native DELETE support for MoR
  • Correct handling of files with existing deletion vectors
  • Reduction of control-plane overhead and fallback in MoR workloads
    (e.g. Delta helper queries, JSON/log handling, histogram aggregation)

P2

  • Native UPDATE support for MoR
  • Native MERGE support for MoR
  • Broader MoR performance optimization and workload coverage

Work areas

1. MoR read path

  • Complete native DV scan/read support
  • Integrate Delta MoR reads cleanly into Gluten + Velox planning/execution
  • Reduce fallback on supported MoR read queries
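
To make the read-side semantics concrete, here is a minimal sketch of what applying a deletion vector to a data file amounts to: rows whose position in the file is marked in the DV are dropped at scan time. The function name and the use of a plain Python set are assumptions for illustration only; the Delta protocol stores the positions as a serialized bitmap, and the Velox implementation would not look like this.

```python
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def apply_deletion_vector(rows: Sequence[T], deleted_positions: Iterable[int]) -> List[T]:
    """Return the rows of one data file that survive its deletion vector.

    `deleted_positions` holds 0-based row indexes within the file.
    A plain set stands in for the real bitmap (assumption for this sketch).
    """
    deleted = set(deleted_positions)
    return [row for pos, row in enumerate(rows) if pos not in deleted]


# Hypothetical file with five rows; rows at positions 1 and 3 are deleted.
visible = apply_deletion_vector(["a", "b", "c", "d", "e"], [1, 3])
```

The key property a native scan has to preserve is that positions refer to the physical row order of the data file, so any row-group skipping or filter pushdown must keep track of the original row index.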

2. MoR write path

  • Add native DELETE support
  • Add native UPDATE support
  • Add native MERGE support
  • Support rewriting/replacing existing DV states correctly
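
As a rough illustration of the last item, a DELETE that touches a file which already has a deletion vector needs to produce a new DV covering both the previously deleted rows and the newly matched ones. The sketch below assumes a hypothetical helper name and models the DV as a set of row positions; it is not how Gluten or Delta implement this.

```python
from typing import Callable, Iterable, Sequence, Set, Tuple, TypeVar

T = TypeVar("T")


def delete_with_existing_dv(
    rows: Sequence[T],
    existing_deleted: Iterable[int],
    predicate: Callable[[T], bool],
) -> Tuple[Set[int], int]:
    """Compute the replacement DV for one file after a DELETE.

    Rows already deleted are not re-evaluated; newly matching row
    positions are unioned with the existing ones. Returns the merged
    position set and its cardinality (number of deleted rows).
    """
    existing = set(existing_deleted)
    newly_deleted = {
        pos
        for pos, row in enumerate(rows)
        if pos not in existing and predicate(row)
    }
    merged = existing | newly_deleted
    return merged, len(merged)


# Hypothetical file: position 1 was deleted earlier, now DELETE WHERE value >= 30.
merged_dv, cardinality = delete_with_existing_dv([10, 20, 30, 40], {1}, lambda v: v >= 30)
```

In the transaction log this would show up as removing the file entry with the old DV and adding the same file with the new one, which is why the cardinality of the new descriptor has to reflect the merged set rather than only the newly deleted rows.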

3. Delta protocol alignment

  • Align implementation with Delta deletion vector protocol semantics
  • Ensure correct handling of the u / p / i DV storage types (UUID-derived relative path, absolute path, inline)
  • Ensure correct handling of offsets, size, checksum, and cardinality
  • Ensure correct reconciliation behavior for (path, deletionVector.uniqueId)
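
The descriptor fields and the reconciliation key above can be sketched as follows. This is a simplified model under stated assumptions: the class and function names are hypothetical, the unique id is built as storage type plus path-or-inline payload with an `@offset` suffix when an offset is present (as I understand the Delta protocol), and the replay ignores tombstone retention, checkpoints, and every other action type.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DVDescriptor:
    storage_type: str            # 'u', 'p', or 'i'
    path_or_inline_dv: str       # encoded UUID, absolute path, or inline payload
    size_in_bytes: int
    cardinality: int             # number of deleted rows
    offset: Optional[int] = None

    def unique_id(self) -> str:
        base = self.storage_type + self.path_or_inline_dv
        return base if self.offset is None else f"{base}@{self.offset}"


Key = Tuple[str, Optional[str]]
Action = Tuple[str, str, Optional[DVDescriptor]]  # ('add' | 'remove', path, dv)


def file_key(path: str, dv: Optional[DVDescriptor]) -> Key:
    """Logical file identity: (path, deletionVector.uniqueId)."""
    return (path, dv.unique_id() if dv is not None else None)


def reconcile(actions: List[Action]) -> Dict[Key, Optional[DVDescriptor]]:
    """Replay add/remove actions in log order and return the live files."""
    live: Dict[Key, Optional[DVDescriptor]] = {}
    for kind, path, dv in actions:
        key = file_key(path, dv)
        if kind == "add":
            live[key] = dv
        elif kind == "remove":
            live.pop(key, None)
    return live


# Hypothetical log: a file is added, then replaced by the same file plus a DV.
dv = DVDescriptor("u", "abc", size_in_bytes=40, cardinality=2, offset=4)
live = reconcile([
    ("add", "f.parquet", None),
    ("remove", "f.parquet", None),
    ("add", "f.parquet", dv),
])
```

Keying on the pair rather than on the path alone is what lets the same data file appear with successive DV states across commits without the older entry shadowing or being confused with the newer one.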

4. Performance

  • Improve MoR read performance
  • Improve MoR write performance
  • Benchmark Gluten/Velox against vanilla Spark for representative MoR workloads
  • Match or exceed vanilla Spark performance on representative MoR workloads

5. Testing and validation

  • Add unit and integration coverage for MoR read/write paths
  • Add regression coverage for DV protocol edge cases
  • Validate correctness across partitioned and non-partitioned tables
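
One possible shape for a correctness check is an order-insensitive comparison of native results against a reference run, since row order is generally not guaranteed without an explicit sort. The helper below is a hypothetical sketch, not an existing Gluten test utility; it assumes rows are collected as hashable tuples.

```python
from collections import Counter
from typing import Hashable, Iterable


def same_rows(native_rows: Iterable[Hashable], reference_rows: Iterable[Hashable]) -> bool:
    """Compare two result sets as multisets: order is ignored, duplicates count."""
    return Counter(native_rows) == Counter(reference_rows)


# Hypothetical results from a native run and a vanilla Spark run of the same query.
ok = same_rows([(1, "a"), (2, "b")], [(2, "b"), (1, "a")])
```

Comparing as multisets rather than sets matters for DV tests in particular, because a bug that drops or resurrects one of several identical rows would go unnoticed in a set comparison.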

Success criteria

  • Supported Delta MoR read queries execute natively with zero or near-zero fallback
  • Native DV read results match vanilla Spark / Delta semantics
  • DELETE on MoR tables is stable and executes mostly natively
  • Gluten/Velox MoR performance is competitive with or better than vanilla Spark on representative workloads

Gluten version

main branch
