
load_cdf returns an empty dataframe when a version is out of range #3035

@pblocz

Description


In Spark, Delta tables expose an option to control how out-of-range versions or timestamps are handled when reading the change data feed: https://docs.delta.io/latest/delta-change-data-feed.html#read-changes-in-streaming-queries

Right now the behaviour of load_cdf is inconsistent: if you provide an out-of-range version, you get an error, but with an out-of-range timestamp you get an empty dataset.

For incremental pipelines it would be useful to have a way to control this behaviour, and for versions and timestamps to behave consistently.
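As a rough illustration of what a consistent policy could look like, here is a minimal, self-contained sketch in plain Python. The names (`resolve_cdf_start`, `allow_out_of_range`, `VersionOutOfRangeError`) are hypothetical and not part of the delta-rs API; the idea is simply that both out-of-range versions and timestamps would go through the same check, which either raises or signals "no changes yet" depending on an opt-in flag, similar to Spark's out-of-range option.

```python
class VersionOutOfRangeError(Exception):
    """Raised when a requested CDF starting point is beyond the table's latest version."""


def resolve_cdf_start(requested_version, latest_version, allow_out_of_range=False):
    """Normalize an out-of-range starting version for a CDF read.

    Hypothetical helper: with allow_out_of_range=True the caller gets
    None (meaning "no changes yet", i.e. an empty result); otherwise an
    error is raised. The same policy would apply to timestamp lookups
    once the timestamp has been resolved to a version.
    """
    if requested_version > latest_version:
        if allow_out_of_range:
            return None  # caller should return an empty change set
        raise VersionOutOfRangeError(
            f"starting version {requested_version} is beyond latest version {latest_version}"
        )
    return requested_version


# In-range versions pass through unchanged:
print(resolve_cdf_start(3, 5))                             # → 3
# Out of range with the flag set: empty result instead of an error:
print(resolve_cdf_start(10, 5, allow_out_of_range=True))   # → None
```

With a helper like this, version-based and timestamp-based reads would share one code path for the out-of-range case, so the caller always gets either an error or an empty dataset, never one or the other depending on which parameter was used.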


Labels: enhancement (New feature or request)
