feat(java): row encoder supports custom types and collections #2243

Merged
merged 1 commit into from
May 19, 2025

stevenschlansker
Contributor

What does this PR do?

Extends the Java row format so that users can register custom data types (e.g. UUID as Int128) and collection factories (e.g. SortedSet<UUID> created as new TreeSet<UUID>(customComparator)).
Also supports arrays of custom types, e.g. UUID[].
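
To make the two examples concrete, here is a minimal sketch of what a UUID-as-128-bit mapping and a SortedSet factory with a custom comparator could look like. It is plain Java and does not use the actual registerCustomCodec / registerCustomCollectionFactory signatures; the class name UuidInt128Sketch, its methods, and the choice of unsigned ordering are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.UUID;

// Hypothetical helper for illustration only; not part of the Fury API.
final class UuidInt128Sketch {
  private UuidInt128Sketch() {}

  // Encode a UUID as 16 big-endian bytes (most significant long first).
  static byte[] encode(UUID value) {
    return ByteBuffer.allocate(16)
        .putLong(value.getMostSignificantBits())
        .putLong(value.getLeastSignificantBits())
        .array();
  }

  // Decode 16 big-endian bytes back into a UUID.
  static UUID decode(byte[] bytes) {
    if (bytes.length != 16) {
      throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
    }
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    return new UUID(buf.getLong(), buf.getLong());
  }

  // Assumed custom comparator: orders UUIDs as unsigned 128-bit integers,
  // which differs from UUID.compareTo (signed comparison of each half).
  static final Comparator<UUID> UNSIGNED_ORDER =
      (a, b) -> {
        int c = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
        return c != 0
            ? c
            : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
      };

  // Collection factory in the spirit of new TreeSet<UUID>(customComparator).
  static SortedSet<UUID> newSortedSet() {
    return new TreeSet<>(UNSIGNED_ORDER);
  }

  public static void main(String[] args) {
    UUID id = UUID.randomUUID();
    System.out.println(id.equals(decode(encode(id))));
  }
}
```

A decoder that needs a particular collection implementation would call a factory like newSortedSet() rather than a default constructor, so the comparator survives a round trip.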

Since the type inference is in fury-core but I wanted to keep new features scoped to fury-format, I had to add a small plugin interface to core so that format can add types dynamically without affecting existing core behavior.

Related issues

#2208

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change?
  • Does this PR introduce any binary protocol compatibility change?

The Encoders class has new registerCustomCodec and registerCustomCollectionFactory methods.
All custom types are written with the existing protocol as embedded memory buffers just like any other field, but with a custom byte representation, so there should be no wire compatibility concerns.

Benchmark

There should be no performance change for existing use cases. The code is written to have no runtime impact when the feature is not used. Custom types are invoked via static methods or instance methods on static final fields, which the JIT should inline easily, keeping overhead minimal.

Here is example generated code to help show this:
https://gist.github.com/stevenschlansker/ed7dae863e78d3c87e30bdea39fa8dea
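
For readers who do not open the gist, the dispatch pattern described above can be illustrated with a hand-written sketch. This is not the generated code: GeneratedEncoderSketch, its nested Codec interface, and NAME_CODEC are hypothetical names. The point is only that the codec sits in a static final field, which HotSpot can generally treat as a constant, so the call through it is a good candidate for devirtualization and inlining.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a generated encoder class; illustration only.
final class GeneratedEncoderSketch {
  private GeneratedEncoderSketch() {}

  // Assumed minimal codec shape, not the real Fury interface.
  interface Codec<T> {
    byte[] encode(T value);
  }

  // The codec instance is held in a static final field, so the receiver of
  // the call below never changes once the class is initialized.
  private static final Codec<String> NAME_CODEC =
      value -> value.getBytes(StandardCharsets.UTF_8);

  // Instance method invoked on a static final field.
  static byte[] encodeName(String name) {
    return NAME_CODEC.encode(name);
  }

  public static void main(String[] args) {
    System.out.println(encodeName("fury").length);
  }
}
```

Code paths that never touch a custom type never reference such a field, which is consistent with the claim that existing use cases are unaffected.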

@stevenschlansker stevenschlansker force-pushed the extensible-row-format branch 8 times, most recently from 07dabdd to 276bccb Compare May 17, 2025 00:06
Collaborator

@chaokunyang chaokunyang left a comment

Looks awesome! Excellent work.

@chaokunyang
Collaborator

Code lint failed. Please run mvn spotless:apply to format the code.

@chaokunyang
Collaborator

Some generated code failed to compile

@stevenschlansker stevenschlansker force-pushed the extensible-row-format branch from 276bccb to e602fd8 Compare May 18, 2025 20:11
@stevenschlansker
Contributor Author

Fixed the last bug and applied formatting.

@stevenschlansker
Contributor Author

stevenschlansker commented May 18, 2025

One question: is there any important difference between Field.nullable(fieldName, ArrowType.Binary.INSTANCE) vs DataTypes.primitiveArrayField(fieldName, DataTypes.int8())?

@stevenschlansker stevenschlansker force-pushed the extensible-row-format branch from e602fd8 to 4685049 Compare May 18, 2025 22:35
@chaokunyang
Collaborator

primitiveArrayField

I think both are OK

@chaokunyang chaokunyang merged commit 50d3cb6 into apache:main May 19, 2025
50 checks passed
@stevenschlansker stevenschlansker deleted the extensible-row-format branch May 19, 2025 23:04
2 participants