After checkpoint statistics are not available in add_actions_table if stats written as struct #3375

Open
@alexwilcoxson-rel

Environment

Delta-rs version:
0.25.5

Binding:
Rust, Python

Environment:

  • Cloud provider: Azure
  • OS: macOS, Linux
  • Other:

Bug

What happened:
Our tables write checkpoints with statistics stored as structs only: delta.checkpoint.writeStatsAsStruct = true and delta.checkpoint.writeStatsAsJson = false.

After a checkpoint, calling add_actions_table returns no statistics:

  1. It only checks for the existence of stats on Adds and does not consider stats_parsed as well: https://github.com/delta-io/delta-rs/blob/python-v0.25.5/crates/core/src/table/state_arrow.rs#L98
  2. This is probably because the files iterator used internally relies on read_adds, which does not set stats_parsed.
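
To make the first point concrete, here is a minimal pure-Python sketch of the difference between a check that looks only at stats and one that also falls back to stats_parsed. It is not delta-rs code: `AddAction`, `resolve_stats`, `has_stats_json_only`, and `has_stats_either` are hypothetical names used for illustration only.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class AddAction:
    """Hypothetical, simplified stand-in for a Delta `add` action."""
    path: str
    stats: Optional[str] = None                     # JSON-encoded statistics
    stats_parsed: Optional[Dict[str, Any]] = None   # struct-encoded statistics


def resolve_stats(add: AddAction) -> Optional[Dict[str, Any]]:
    """Return statistics from whichever representation is present."""
    if add.stats is not None:
        return json.loads(add.stats)
    return add.stats_parsed


def has_stats_json_only(adds: List[AddAction]) -> bool:
    """Mirrors a check that only looks at the JSON `stats` field."""
    return any(a.stats is not None for a in adds)


def has_stats_either(adds: List[AddAction]) -> bool:
    """Also accepts statistics that exist only as `stats_parsed`."""
    return any(resolve_stats(a) is not None for a in adds)


# An add action as it might look after a struct-only checkpoint.
adds = [AddAction(path="part-0.parquet", stats_parsed={"numRecords": 3})]
print(has_stats_json_only(adds), has_stats_either(adds))
```

With struct-only stats, the JSON-only check reports that no statistics exist, while the fallback check finds them.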

What you expected to happen:
I expect add_actions_table to have statistics available regardless of what the latest checkpoint is and how the stats were written to it.

How to reproduce it:

  1. Configure the table with delta.checkpoint.writeStatsAsStruct = true and delta.checkpoint.writeStatsAsJson = false
  2. Write data
  3. Checkpoint
  4. Call add_actions_table
  5. Observe that no stats are present
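
The steps above can be sketched with the Python binding as follows. This is a rough sketch under assumptions, not a verified reproduction: it assumes deltalake 0.25.x with pyarrow installed, uses get_add_actions as the Python counterpart of add_actions_table, and assumes flattened statistics columns are named num_records, min.*, max.*, and null_count.*. The helper names `stat_column_names` and `reproduce` are hypothetical. The issue does not say which writer produced the original checkpoints, so whether the delta-rs checkpoint writer honors these properties in the same way is also an assumption.

```python
from typing import List

# Assumed naming of flattened statistics columns in the add-actions output.
STAT_PREFIXES = ("min.", "max.", "null_count.")


def stat_column_names(names: List[str]) -> List[str]:
    """Keep only the column names that look like file statistics."""
    return [n for n in names if n == "num_records" or n.startswith(STAT_PREFIXES)]


def reproduce(table_uri: str) -> List[str]:
    """Write a table, checkpoint it, and list the statistics columns found."""
    # Imported lazily so this module loads even without deltalake installed.
    import pyarrow as pa
    from deltalake import DeltaTable, write_deltalake

    # Steps 1 and 2: configure struct-only checkpoint stats and write data.
    write_deltalake(
        table_uri,
        pa.table({"id": [1, 2, 3]}),
        configuration={
            "delta.checkpoint.writeStatsAsStruct": "true",
            "delta.checkpoint.writeStatsAsJson": "false",
        },
    )

    # Step 3: checkpoint, then reload so the snapshot is read from it.
    DeltaTable(table_uri).create_checkpoint()
    dt = DeltaTable(table_uri)

    # Steps 4 and 5: fetch add actions and inspect which stats columns exist.
    actions = dt.get_add_actions(flatten=True)
    return stat_column_names(actions.schema.names)
```

Calling `reproduce("/tmp/stats_struct_table")` on an empty directory returns the statistics columns that were found; an empty list would match the behavior described in this report.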

More details:
The log_data method is probably usable here for add_actions_table, since it already holds the data in Arrow format and it hydrates stats regardless of whether, and how, they are represented in checkpoints.

It would just need a method on FileStatsAccessor to build a record batch out of its internal columns.

As a workaround, I can probably enable JSON stats in addition to struct stats in checkpoints for little overhead.

Our use case: we make the add_actions_table output queryable with DataFusion to provide a SQL function for exploring Delta table stats.

Labels: bug