Skip to content

Conversation

@dbatomic
Copy link
Contributor

@dbatomic dbatomic commented Sep 13, 2025

What changes are proposed in this pull request?

This PR relaxes type-checking rules to allow LargeUtf8 conversion to STRING and LargeBinary conversion to BINARY. With this PR Delta kernel will allow committing arrow record batches that use LargeUtf8 or that are built using LargeStringBuilder or corresponding types for building LargeBinary.

The main limitation that non-large variations of string and binary types impose is that they can hold up to 2 GBs of data, which makes it impossible to do kernel commits if there is a string or binary column larger than 2 GB. With this PR clients can just pass RecordBatch that uses LargeUtf8 or LargeBinary as column types.

How was this change tested?

New UT

@codecov
Copy link

codecov bot commented Sep 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.68%. Comparing base (45c3e3a) to head (f921714).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1294      +/-   ##
==========================================
+ Coverage   83.66%   83.68%   +0.01%     
==========================================
  Files         108      108              
  Lines       25926    25932       +6     
  Branches    25926    25932       +6     
==========================================
+ Hits        21691    21700       +9     
+ Misses       3145     3144       -1     
+ Partials     1090     1088       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@zachschuermann zachschuermann changed the title fix: Allowing LargeUTF8 and LargeBinary in arrow - delta kernel conversions feat: Allow LargeUTF8 -> String and LargeBinary -> Binary in arrow conversion Sep 13, 2025
Copy link
Member

@zachschuermann zachschuermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Copy link
Collaborator

@nicklan nicklan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! thanks

@zachschuermann zachschuermann merged commit 933f426 into delta-io:main Sep 15, 2025
21 checks passed
@chiragjn
Copy link

It would help to have these changes on read paths too
We are facing overflow panics in z-ordering for large data - delta-io/delta-rs#3790
Someone had made an attempt too - #300

@zachschuermann
Copy link
Member

@chiragjn could you open a new issue? we have discussed the "Large" versions for log metadata (LargeUtf8, etc.) - if you have some more details will help us to investigate/prioritize!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants