feat(variant): add Spark 4.0 VARIANT type and Parquet read support #355
Draft
oarap wants to merge 1 commit into bytedance:main from
Branch updated from a0c26eb to 84ece88.
Introduce TypeKind::VARIANT as a Spark-compatible STRUCT<value VARBINARY, metadata VARBINARY> and wire it through vectors, serde, and Spark SQL functions.

- Add the VARIANT type and VariantVector/VariantValue plumbing.
- Implement Spark VARIANT dictionary parsing and decoding for both bit-coded and compact encodings (compact string lengths and unordered start-offset tables).
- Support variant_get / parse_json over Spark VARIANT payloads.
- Improve Parquet reader integration, including correcting ScanSpec child-ordering mismatches for (value, metadata).
- Add Spark-generated VARIANT Parquet fixtures and Parquet reader and unit test coverage.
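For context on the metadata half of the struct: the public Spark/Parquet Variant encoding stores a header byte (version in bits 0-3, a sorted-strings flag in bit 4, offset width minus one in bits 6-7), a little-endian dictionary_size, then dictionary_size + 1 offsets, then the concatenated UTF-8 key bytes. The sketch below parses that layout. It is not the PR's implementation: VariantMetadata, readLE, and parseVariantMetadata are hypothetical names, and it assumes monotonically increasing offsets as described in the public spec, so it does not model the compact string-length or unordered start-offset tables mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical parsed form of a VARIANT metadata buffer (not the PR's type).
struct VariantMetadata {
  uint8_t version = 0;
  bool sortedStrings = false;
  std::vector<std::string_view> dictionary;  // views into the input buffer
};

// Reads an unsigned little-endian integer that is `width` bytes wide (1 to 4).
inline uint32_t readLE(std::string_view buf, size_t pos, size_t width) {
  assert(width >= 1 && width <= 4);
  if (pos + width > buf.size()) {
    throw std::out_of_range("variant metadata is truncated");
  }
  uint32_t v = 0;
  for (size_t i = 0; i < width; ++i) {
    v |= static_cast<uint32_t>(static_cast<uint8_t>(buf[pos + i])) << (8 * i);
  }
  return v;
}

// Assumed layout (public spec): header | dictionary_size | offsets[size + 1] | bytes.
inline VariantMetadata parseVariantMetadata(std::string_view buf) {
  if (buf.empty()) {
    throw std::invalid_argument("variant metadata is empty");
  }
  const auto header = static_cast<uint8_t>(buf[0]);
  VariantMetadata md;
  md.version = static_cast<uint8_t>(header & 0x0F);        // bits 0-3
  md.sortedStrings = ((header >> 4) & 0x01) != 0;          // bit 4
  const size_t offsetSize = ((header >> 6) & 0x03) + 1;    // bits 6-7, plus one
  if (md.version != 1) {
    throw std::invalid_argument("unsupported variant metadata version");
  }

  const uint32_t dictSize = readLE(buf, 1, offsetSize);
  const size_t offsetsStart = 1 + offsetSize;
  const size_t bytesStart =
      offsetsStart + (static_cast<size_t>(dictSize) + 1) * offsetSize;
  if (bytesStart > buf.size()) {
    throw std::out_of_range("variant metadata offset table is truncated");
  }

  md.dictionary.reserve(dictSize);
  for (uint32_t i = 0; i < dictSize; ++i) {
    // String i spans [offset[i], offset[i + 1]) within the trailing bytes.
    const uint32_t begin = readLE(buf, offsetsStart + i * offsetSize, offsetSize);
    const uint32_t end =
        readLE(buf, offsetsStart + (i + 1) * offsetSize, offsetSize);
    if (begin > end || bytesStart + end > buf.size()) {
      throw std::out_of_range("variant metadata string offset out of range");
    }
    md.dictionary.push_back(buf.substr(bytesStart + begin, end - begin));
  }
  return md;
}
```

Returning string_view entries avoids copying keys, at the cost of tying the parsed dictionary's lifetime to the input buffer.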
Branch updated from 84ece88 to d90040e.
Introduce TypeKind::VARIANT as Spark-compatible STRUCT<value VARBINARY, metadata VARBINARY> and wire it through vectors, serde, and Spark SQL functions.
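The value half of the STRUCT is self-describing: in the public Spark/Parquet Variant encoding, the low 2 bits of the first byte give the basic type (0 primitive, 1 short string, 2 object, 3 array) and the upper 6 bits carry type-specific information, such as the primitive type ID or a short string's length. The hedged C++ sketch below decodes that header and two simple cases. The names (VariantBasicType, decodeValueHeader, shortStringPayload, primitiveInt) are hypothetical and do not reflect the PR's VariantValue API; objects, arrays, decimals, and timestamps are intentionally not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string_view>

// Low 2 bits of the first value byte (public Spark/Parquet Variant encoding).
enum class VariantBasicType : uint8_t {
  kPrimitive = 0,
  kShortString = 1,
  kObject = 2,
  kArray = 3,
};

struct VariantValueHeader {
  VariantBasicType basicType;
  uint8_t typeInfo;  // upper 6 bits: primitive type ID, short-string length, ...
};

inline VariantValueHeader decodeValueHeader(std::string_view value) {
  if (value.empty()) {
    throw std::invalid_argument("variant value is empty");
  }
  const auto b = static_cast<uint8_t>(value[0]);
  return {static_cast<VariantBasicType>(b & 0x03), static_cast<uint8_t>(b >> 2)};
}

// Short strings keep their byte length (0 to 63) in the 6-bit header.
inline std::string_view shortStringPayload(std::string_view value) {
  const VariantValueHeader h = decodeValueHeader(value);
  if (h.basicType != VariantBasicType::kShortString) {
    throw std::invalid_argument("variant value is not a short string");
  }
  const size_t len = h.typeInfo;
  if (1 + len > value.size()) {
    throw std::out_of_range("short string is truncated");
  }
  return value.substr(1, len);
}

// Primitive type IDs 3..6 are signed int8/int16/int32/int64, little-endian.
inline int64_t primitiveInt(std::string_view value) {
  const VariantValueHeader h = decodeValueHeader(value);
  if (h.basicType != VariantBasicType::kPrimitive || h.typeInfo < 3 ||
      h.typeInfo > 6) {
    throw std::invalid_argument("variant value is not an integer primitive");
  }
  const size_t width = size_t{1} << (h.typeInfo - 3);  // 1, 2, 4 or 8 bytes
  assert(width == 1 || width == 2 || width == 4 || width == 8);
  if (1 + width > value.size()) {
    throw std::out_of_range("integer primitive is truncated");
  }
  uint64_t raw = 0;
  for (size_t i = 0; i < width; ++i) {
    raw |= static_cast<uint64_t>(static_cast<uint8_t>(value[1 + i])) << (8 * i);
  }
  if (width < 8) {
    // Sign-extend from the stored width to 64 bits.
    const uint64_t signBit = uint64_t{1} << (8 * width - 1);
    if (raw & signBit) {
      raw |= ~((signBit << 1) - 1);
    }
  }
  return static_cast<int64_t>(raw);
}
```

Decoding from raw byte views keeps this independent of any vector layout; resolving object field names would additionally need the metadata dictionary.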
What problem does this PR solve?
Issue Number: close #xxx
Type of Change
Description
Describe your changes in detail.
For complex logic, explain the "Why" and "How".
Performance Impact
No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).
Positive Impact: I have run benchmarks.
Negative Impact: Explained below (e.g., trade-off for correctness).
Release Note
Please describe the changes in this PR
Release Note:
Checklist (For Author)
Breaking Changes
No
Yes (Description: ...)