Further refactor Parquet readers for v2 support #13290


Open
wants to merge 2 commits into
base: main

@eric-maynard eric-maynard commented Jun 10, 2025

Issues such as #7162 and #11371 report that newer Parquet encodings like DELTA_BINARY_PACKED don't work with the current Parquet readers. #11661 recently refactored the Parquet readers to improve code reuse, but there are a few more changes needed to prepare us for Parquet v2 support.

This refactor introduces a new interface VectorizedValuesReader and changes readers like TimestampMillisReader to work with this new type. After this change, new implementations of VectorizedValuesReader can be added to support encodings like DELTA_BINARY_PACKED.
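
To illustrate the idea, here is a rough, self-contained sketch of how a column reader can depend on a values-reader abstraction instead of a specific encoding. Only the names VectorizedValuesReader, TimestampMillisReader, and DELTA_BINARY_PACKED come from this PR; the method names (`readLong`, `readLongs`), the helper classes, and the millis-to-micros conversion are assumptions made for the example and may not match the actual code. The toy delta reader is deliberately simplified and does not implement the real DELTA_BINARY_PACKED layout.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical, simplified shape of the interface; the real method set may differ.
interface VectorizedValuesReader {
  long readLong();

  // Batch read built on the single-value read; implementations could override it.
  default void readLongs(int total, long[] out, int offset) {
    for (int i = 0; i < total; i++) {
      out[offset + i] = readLong();
    }
  }
}

// PLAIN-style reader: fixed-width, little-endian 64-bit values.
final class PlainLongValuesReader implements VectorizedValuesReader {
  private final ByteBuffer buffer;

  PlainLongValuesReader(byte[] page) {
    this.buffer = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN);
  }

  @Override
  public long readLong() {
    return buffer.getLong();
  }
}

// Toy delta reader: a first value followed by deltas.
// This is NOT the real DELTA_BINARY_PACKED format (no blocks, miniblocks, or bit packing).
final class ToyDeltaLongValuesReader implements VectorizedValuesReader {
  private final long[] deltas;
  private long previous;
  private int position = -1; // -1 means the first value has not been returned yet

  ToyDeltaLongValuesReader(long firstValue, long[] deltas) {
    this.previous = firstValue;
    this.deltas = deltas;
  }

  @Override
  public long readLong() {
    if (position >= 0) {
      previous += deltas[position];
    }
    position++;
    return previous;
  }
}

// Hypothetical stand-in for a reader like TimestampMillisReader: it only sees the
// interface, so any encoding with a VectorizedValuesReader implementation works.
final class TimestampMillisSketch {
  static long[] readBatch(VectorizedValuesReader values, int count) {
    long[] out = new long[count];
    values.readLongs(count, out, 0);
    for (int i = 0; i < count; i++) {
      out[i] = out[i] * 1000L; // assumed millis -> micros conversion
    }
    return out;
  }
}
```

With this shape, supporting a new encoding would mean adding another VectorizedValuesReader implementation rather than touching each column reader.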


This PR is a revival of @wgtmac's #9772, which, based on our conversation, he will not be able to continue working on. Thanks for the great work, @wgtmac.

@github-actions github-actions bot added the arrow label Jun 10, 2025