table compaction  #3043

@sebvey

Environment

Delta-rs version: 0.22.2

Binding: python 0.22.2

Environment: python 3.13.0

  • OS: macOS 13.7.1 (22H221) / Kernel Version: Darwin 22.6.0
  • Other:

Bug

What happened:
When compacting a Delta table of about 273 files (380 MB), partitioned on the field 'year_month' (3 partitions), the resulting listing of the table files seems invalid:

  • new parquet files are correctly added (2 files per partition -> 6 files)
  • previous files seem correctly marked as 'remove' (273 'remove' actions in the commit log of the 'OPTIMIZE' operation)
  • `dt.files()` lists 205 files, which I don't think is expected
  • `dt.get_add_actions()` also lists 205 files, which I'm quite sure is not expected
  • when vacuum (with proper params) is run on the table, it seems to rely on the listed files and keeps 205 files

Am I missing something?

Log file of the 'OPTIMIZE' commit:
00000000000000000274.json
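
For anyone who wants to double-check the add/remove counts quoted above, here is a minimal sketch that tallies the action types in a single commit file. It only assumes the standard Delta log layout (one JSON object per line, with one top-level key such as `add`, `remove` or `commitInfo`); the helper name `count_actions` is mine, not part of deltalake.

```python
import json
from collections import Counter


def count_actions(lines):
    """Count action types in one Delta commit, given its lines of JSON.

    Hypothetical helper for illustration; not a deltalake API.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        action = json.loads(line)
        # Each line is an object keyed by its action type.
        counts.update(action.keys())
    return counts
```

It can be fed an open file handle, e.g. `count_actions(open(path_to_commit_json))`, where `path_to_commit_json` points at the commit file inside the table's `_delta_log` directory. For the commit attached here I would expect 6 under `add` and 273 under `remove`.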

Path column of the get_add_actions():
get_add_actions.json

What you expected to happen:

  • `dt.files()` should list 6 files
  • `dt.get_add_actions()` should list 6 files
  • vacuum should leave only 6 files untouched
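
The reasoning behind the expected 6 files can be sketched as a replay of the log: every `add` makes a path live, every `remove` retires it. This is a simplified illustration only. It ignores checkpoints and assumes paths compare equal as plain strings, and `active_files` is a hypothetical helper, not how delta-rs actually builds its snapshot.

```python
import json


def active_files(commits):
    """Replay commits (oldest first) and return the set of live file paths.

    `commits` is an iterable of commits, each an iterable of JSON lines.
    Hypothetical helper for illustration; checkpoints are not handled.
    """
    live = set()
    for lines in commits:
        for line in lines:
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                # discard() so that removing an unknown path is not an error
                live.discard(action["remove"]["path"])
    return live
```

If the 273 'remove' actions of the OPTIMIZE commit cover all previously added paths, the result of this replay should contain only the 6 new files, which is what I would expect `dt.files()` to return.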

How to reproduce it:
I made a repo with the code used for the test. Use the branch deltars-issue-sample: [email protected]:sebvey/delta-optim.git

I made the README.md as clear as possible.


Labels: binding/rust (Issues for the Rust crate), bug (Something isn't working), mre-needed (Whether an MRE needs to be provided)
