checkpoint breaks writes on 0.22.0 #3030

@echai58


Environment

Delta-rs version: 0.22.0

Binding: python


Bug

What happened: After upgrading to the latest delta-rs release (0.22.0), writes to a table that has a checkpoint are broken.

What you expected to happen: Checkpoints continue to work.

How to reproduce it: The bug affects both the pyarrow and rust writer engines. With pyarrow, an overwrite after a checkpoint does not replace the existing data (the output has two rows); with rust, the write panics.

from deltalake import DeltaTable, write_deltalake
import pandas as pd

write_deltalake(
    "test",
    pd.DataFrame(
        {
            "a": ["a"],
            "b": [3],
        }
    ),
)
DeltaTable("test").create_checkpoint()

At this point, the Delta table looks correct: it contains the single row written above (a = "a", b = 3).
[image: screenshot of the table contents]
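
One way to double-check that the checkpoint really exists at this point is to look inside the table's `_delta_log` directory. The following is only a sketch: `summarize_delta_log` is a hypothetical helper, not part of deltalake, and it assumes a local table path and the standard Delta log file naming (zero-padded version numbers, `.json` commits, `.checkpoint...parquet` checkpoints, and a `_last_checkpoint` pointer file).

```python
import json
from pathlib import Path


def summarize_delta_log(table_path):
    """List commit versions, checkpoint versions, and the _last_checkpoint pointer.

    Hypothetical helper for debugging; relies only on Delta log file names.
    """
    log_dir = Path(table_path) / "_delta_log"
    commits, checkpoints = set(), set()
    for entry in log_dir.iterdir():
        version, _, rest = entry.name.partition(".")
        if not version.isdigit():
            continue  # skips _last_checkpoint and other non-versioned files
        if rest == "json":
            commits.add(int(version))
        elif rest.startswith("checkpoint") and rest.endswith(".parquet"):
            checkpoints.add(int(version))
    pointer_file = log_dir / "_last_checkpoint"
    pointer = json.loads(pointer_file.read_text()) if pointer_file.exists() else None
    return {
        "commits": sorted(commits),
        "checkpoints": sorted(checkpoints),
        "last_checkpoint": pointer,
    }
```

Calling `summarize_delta_log("test")` after the first write and `create_checkpoint()` should, as far as I understand the log layout, report commit version 0 and a checkpoint at version 0.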

But, on the next write:

write_deltalake(
    "test",
    pd.DataFrame(
        {
            "a": ["a"],
            "b": [100],
        }
    ),
    mode="overwrite",
)

When using the rust writer engine, this write raises the following exception:

PanicException: called `Result::unwrap()` on an `Err` value: DeletionVector("Unknown storage format: ''.")

With the pyarrow engine, the write does not fail, but the overwrite is incorrect: the table ends up with two rows instead of one.
[image: screenshot of the table contents showing two rows]
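
To narrow down the pyarrow symptom, it may help to check which actions the overwrite commit actually recorded. The sketch below uses a hypothetical helper, `count_actions` (not a deltalake API), and assumes a local table where each line of a commit file in `_delta_log` is a JSON object keyed by its action type, such as `add`, `remove`, or `commitInfo`.

```python
import json
from collections import Counter
from pathlib import Path


def count_actions(table_path, version):
    """Count the action types (add, remove, ...) recorded in one commit file.

    Hypothetical helper for debugging; reads the newline-delimited JSON commit.
    """
    commit_file = Path(table_path) / "_delta_log" / f"{version:020d}.json"
    counts = Counter()
    for line in commit_file.read_text().splitlines():
        if line.strip():
            # Each non-empty line holds one action, keyed by its type.
            counts.update(json.loads(line).keys())
    return dict(counts)
```

In the reproduction above the overwrite should be version 1, so `count_actions("test", 1)` would be the call to try. I would expect a correct overwrite to include a `remove` action for the file added in version 0; if none is present, that would be consistent with the two rows seen here.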

Side note: I think this sort of breaking bug ought to be caught by the test suite... it's a breaking bug in core usage of deltalake.

Labels: binding/rust (Issues for the Rust crate), bug (Something isn't working)