[Feature Request] Kernel's loadCommitRange() should gracefully handle empty ranges #5391

@zikangh

Description

Feature request

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Overview

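Streaming microbatch logic needs loadCommitRange() to handle an empty commit range gracefully, for example when the requested start version has no commits yet, instead of throwing.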

Motivation

Use case:

  1. Batch 1: Start at v0@BASE_INDEX -> reads all data -> last file is END sentinel at v5@END_INDEX
  2. Batch 2: Start at v6@BASE_INDEX -> tries to load commit range starting from v6

DeltaLog:
deltaLog.getChangeLogFiles(6, ...) returns an empty iterator

Kernel:
TableManager.loadCommitRange(tablePath).withStartBoundary(atVersion(6)).build(engine) throws
KernelException: "Requested table changes between [6, Optional.empty] but no log files found in the requested version range"
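
To make the use case concrete, here is a minimal, self-contained sketch of the batch planning a connector would like to do when an empty range is returned as an empty result. It is a toy model only: `EmptyRangeSketch`, `versionsFrom`, and `planBatch` are hypothetical names, and nothing here calls Delta or Kernel APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the two-batch use case; none of these names are Delta APIs. */
public class EmptyRangeSketch {

    /** Returns the committed versions in [startVersion, latestVersion]; may be empty. */
    public static List<Long> versionsFrom(long startVersion, long latestVersion) {
        List<Long> versions = new ArrayList<>();
        for (long v = startVersion; v <= latestVersion; v++) {
            versions.add(v);
        }
        return versions;
    }

    /** Plans one microbatch and returns the version the next batch should start at. */
    public static long planBatch(long startVersion, long latestVersion) {
        List<Long> versions = versionsFrom(startVersion, latestVersion);
        if (versions.isEmpty()) {
            // Nothing committed at or after startVersion: no new data,
            // so the next batch retries from the same start version.
            return startVersion;
        }
        // All versions were consumed; the next batch starts right after the last one.
        return versions.get(versions.size() - 1) + 1;
    }

    public static void main(String[] args) {
        long next = planBatch(0, 5);            // Batch 1: reads v0..v5
        System.out.println(next);               // prints 6
        System.out.println(planBatch(next, 5)); // Batch 2: empty range, prints 6
    }
}
```

With an empty result, Batch 2 is simply a no-op. With an exception, the connector has to special-case the error path to get the same outcome.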

Further details

Temporary workaround: we catch this error in the connector. This currently requires parsing and matching the error message, which is fragile. @huan233usc mentioned that supporting fine-grained exceptions might be a straightforward workaround.
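
The difference between message matching and a fine-grained exception could look roughly like the sketch below. Everything in it is an assumption for illustration: `SketchKernelException` stands in for Kernel's KernelException, `EmptyCommitRangeException` is a hypothetical subclass that does not exist in Kernel as far as this issue states, and `loadCommitRange(start, latest)` is a stub that only mimics the error quoted above.

```java
/** Illustrative only: stand-in types, not the real Kernel classes. */
public class TypedExceptionSketch {

    /** Stand-in for Kernel's KernelException. */
    public static class SketchKernelException extends RuntimeException {
        public SketchKernelException(String message) {
            super(message);
        }
    }

    /** Hypothetical fine-grained exception for an empty commit range. */
    public static class EmptyCommitRangeException extends SketchKernelException {
        public EmptyCommitRangeException(String message) {
            super(message);
        }
    }

    /** Stub that mimics the reported behavior: throws when no version >= start exists. */
    public static long loadCommitRange(long start, long latest) {
        if (start > latest) {
            throw new EmptyCommitRangeException(String.format(
                "Requested table changes between [%d, Optional.empty] but no log files "
                    + "found in the requested version range", start));
        }
        return latest; // end version of the loaded range
    }

    /** Current style of workaround: fragile, breaks if the message wording changes. */
    public static boolean isEmptyRangeByMessage(RuntimeException e) {
        String message = e.getMessage();
        return message != null
            && message.contains("no log files found in the requested version range");
    }

    /** With a typed exception the connector can catch exactly the empty-range case. */
    public static boolean hasNewCommits(long start, long latest) {
        try {
            loadCommitRange(start, latest);
            return true;
        } catch (EmptyCommitRangeException e) {
            return false; // treat as "no new data" for this microbatch
        }
    }
}
```

Catching a dedicated type keeps unrelated failures propagating as errors, whereas a substring match depends on wording that is not part of any API contract.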

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.

Metadata

  • Assignees: No one assigned
  • Labels: enhancement (New feature or request)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests