polyzos (Contributor) commented Nov 17, 2025

This addresses #1731

@polyzos polyzos marked this pull request as ready for review November 18, 2025 07:43
polyzos (Contributor Author) commented Nov 18, 2025

FYI @wuchong: this PR adds support for writing/scanning Pojos directly with the client, while keeping the API as is.
The only difference is that it is now typed.

When I find some time, I want to test the effect in terms of performance, between writing/scanning with Pojos vs InternalRows.

polyzos (Contributor Author) commented Nov 25, 2025

@leekeiabstraction Indeed, we are looking at a roughly 2x performance penalty.
I tested writing/scanning on both PK and Log tables with 10 million records.
I ran 5 iterations for each and calculated the average.

This is basically a trade-off for the users. Using InternalRow/GenericRow directly is way more efficient; however, this might come with some extra complexity and boilerplate code.

For this reason, I want to give flexibility, probably leave the docs as is, with GenericRow being the go-to approach, but also add a section that Pojos can be used directly and maybe highlight this trade-off.

Moving forward I'm thinking that maybe it makes sense to add some helper classes that also derive the schema for the table from a Pojo.

Log Table
[Screenshot 2025-11-25 at 3:32:34 PM: image not captured in this text]

Primary Key Table
[Screenshot 2025-11-25 at 3:51:46 PM: image not captured in this text]

@polyzos polyzos force-pushed the java-client-pojo-support branch from d6722e7 to 24b65c2 Compare November 25, 2025 14:14
@leekeiabstraction

Thank you @polyzos for addressing the comments and also providing the data from your tests! That gives a clear picture of the performance. Further responses and questions below:

> This is basically a trade-off for the users. Using InternalRow/GenericRow directly is way more efficient; however, this might come with some extra complexity and boilerplate code.
> For this reason, I want to give flexibility, probably leave the docs as is, with GenericRow being the go-to approach, but also add a section that Pojos can be used directly and maybe highlight this trade-off.

IMO, this does not need to be a trade-off. I am curious whether you have explored the implementation complexity or downsides of pushing the Pojo conversion down further, so that conversion to/from InternalRow can be skipped altogether. From a quick look, it does seem like you have most of the interfaces updated to support it.

> Moving forward I'm thinking that maybe it makes sense to add some helper classes that also derive the schema for the table from a Pojo.

Trying to understand: does this relate to performance, or is it a separate thread on further changes that you're planning?

polyzos (Contributor Author) commented Nov 25, 2025

@leekeiabstraction yes, it's a separate thread.
Can you provide some more context in terms of what you mean - push the conversion further down?
On the scan side, I see you mentioned optimization in the CompletedFetch.toScanRecord(LogRecord), that is something indeed that I didn't think of and might result in some further optimization.
Is there something also on the write side you are thinking of?

leekeiabstraction commented Nov 25, 2025

> Can you provide some more context in terms of what you mean - push the conversion further down?
> On the scan side, I see you mentioned optimization in the CompletedFetch.toScanRecord(LogRecord), that is something indeed that I didn't think of and might result in some further optimization.

My aim is to eliminate the performance penalty by avoiding performing the conversion twice. By "pushing conversion down", I mean something like moving the RowToPojoConverter method calls into LogScannerImpl/CompletedFetch, refactoring LogScannerImpl so that conversion is done directly to the Pojo. The strategy pattern could be useful here.
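As a rough illustration of the strategy-pattern idea, here is a minimal sketch in which the scanner takes a decoder strategy and produces the caller's type in one step. Everything in it is hypothetical: `Row`, `User`, `RecordDecoder`, the comma-separated "raw record" format, and `poll` are simplified stand-ins, not the actual LogScannerImpl, CompletedFetch, or RowToPojoConverter code.

```java
import java.util.ArrayList;
import java.util.List;

final class ScanDecodeSketch {
    /** Hypothetical stand-in for a generic row; not the real InternalRow. */
    static final class Row {
        final Object[] fields;

        Row(Object... fields) {
            this.fields = fields;
        }
    }

    /** Hypothetical user POJO. */
    static final class User {
        final int id;
        final String name;

        User(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Strategy: turns one raw record into the value handed to the caller. */
    interface RecordDecoder<T> {
        T decode(String raw);
    }

    /** Untyped path: raw record to Row. */
    static final RecordDecoder<Row> TO_ROW = raw -> {
        String[] parts = raw.split(",", 2);
        return new Row(Integer.parseInt(parts[0]), parts[1]);
    };

    /** Typed path: raw record straight to the POJO, with no intermediate Row. */
    static final RecordDecoder<User> TO_USER = raw -> {
        String[] parts = raw.split(",", 2);
        return new User(Integer.parseInt(parts[0]), parts[1]);
    };

    /** The scan loop is parameterized by the strategy, so each record is decoded once. */
    static <T> List<T> poll(List<String> fetched, RecordDecoder<T> decoder) {
        List<T> out = new ArrayList<>(fetched.size());
        for (String raw : fetched) {
            out.add(decoder.decode(raw));
        }
        return out;
    }
}
```

With this shape, the untyped and typed scanners would share one fetch loop and differ only in the decoder they are given.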

> Is there something also on the write side you are thinking of?

Not currently. I'm very new to this code base, but it's certainly worth exploring, especially if we're also converting twice on the write side. Happy to have a look at the write side as well.

@polyzos polyzos force-pushed the java-client-pojo-support branch from b955ff5 to a274f5a Compare November 28, 2025 08:52
@wuchong wuchong linked an issue Dec 21, 2025 that may be closed by this pull request
2 tasks
wuchong (Member) left a comment

Thanks, @polyzos!

Overall, the pull request looks good. My only concern is the interface change. Could you please take a look at the new proposal?

Also, when you update the PR, please rebase your branch to keep the history clean.

polyzos (Contributor Author) commented Dec 21, 2025

@wuchong thank you for all your comments.
I have already introduced typed classes, like

https://github.com/apache/fluss/pull/1992/changes#diff-463b3a6796c0042a681c569845f0abba145f454e7e40df447326bc212b544304

If you check the tests, I have also ensured backwards compatibility. Do you mean something different, or am I missing something?
I have also tested things locally to make sure nothing breaks.

wuchong (Member) commented Dec 22, 2025

@polyzos Yeah, I noticed those typed classes. However, the typed classes are internal; only interfaces are visible to users.

My suggestion is to introduce dedicated typed interfaces (e.g., interface TypedLookuper<T>) that are separate from the existing Lookuper interface, rather than making Lookuper itself generic.

While turning Lookuper into a generic interface would be binary-compatible, it would introduce raw-type warnings in IDEs wherever existing code uses Lookuper without a type argument, and it would clutter the public API. More importantly, for use cases involving InternalRow, the generic type T has no semantic meaning, so forcing a type parameter there adds unnecessary noise without benefit.

By keeping Lookuper and TypedLookuper<T> as distinct interfaces, we maintain clean separation of concerns:

  • Lookuper for low-level, type-agnostic access (e.g., InternalRow)
  • TypedLookuper<T> for high-level, type-safe lookups

This approach preserves backward compatibility, avoids IDE warnings, and aligns with how users actually interact with the API.
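A minimal sketch of the separation described above, under simplifying assumptions: the interface names follow the comment, but the method signatures are reduced to synchronous single-row lookups and are not the actual Fluss API. `Row`, `MapLookuper`, `UserKey`, and the `typed` adapter are hypothetical helpers added for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class LookuperSketch {
    /** Hypothetical stand-in for a low-level row; not the real InternalRow. */
    static final class Row {
        final Object[] fields;

        Row(Object... fields) {
            this.fields = fields;
        }
    }

    /** Untyped interface stays non-generic, so existing callers compile without warnings. */
    interface Lookuper {
        Row lookup(Row key);
    }

    /** Typed interface is declared separately instead of making Lookuper generic. */
    interface TypedLookuper<T> {
        Row lookup(T key);
    }

    /** Hypothetical in-memory lookuper keyed on the first field of the row. */
    static final class MapLookuper implements Lookuper {
        private final Map<Object, Row> table = new HashMap<>();

        void put(Row row) {
            table.put(row.fields[0], row);
        }

        @Override
        public Row lookup(Row key) {
            return table.get(key.fields[0]);
        }
    }

    /** Hypothetical POJO key. */
    static final class UserKey {
        final int id;

        UserKey(int id) {
            this.id = id;
        }
    }

    /** Adapter: converts the typed key to a Row, then delegates to the untyped lookuper. */
    static <T> TypedLookuper<T> typed(Lookuper delegate, Function<T, Row> keyToRow) {
        return key -> delegate.lookup(keyToRow.apply(key));
    }
}
```

Because the typed interface only wraps the untyped one, code written against `Lookuper` is unaffected, and the type parameter appears only where a POJO is actually involved.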

# Conflicts:
#	fluss-client/src/main/java/org/apache/fluss/client/lookup/Lookuper.java
#	fluss-client/src/main/java/org/apache/fluss/client/lookup/PrefixKeyLookuper.java
#	fluss-client/src/main/java/org/apache/fluss/client/lookup/PrimaryKeyLookuper.java
# Conflicts:
#	fluss-flink/fluss-flink-common/src/test/java/org/apache/fluss/flink/utils/FlussRowToFlinkRowConverterTest.java
@polyzos polyzos force-pushed the java-client-pojo-support branch from f79319a to 3ea717c Compare December 22, 2025 07:06
polyzos (Contributor Author) commented Dec 22, 2025

@wuchong I made the required changes.
Let me know if this approach resonates and works better

wuchong (Member) commented Dec 23, 2025

Thank you @polyzos , I will take another look.

wuchong (Member) left a comment

Thanks @polyzos, I think this PR is already in good shape. I left some minor comments.

@Override
public <T> TypedUpsertWriter<T> createTypedWriter(Class<T> pojoClass) {
    UpsertWriterImpl delegate =
            new UpsertWriterImpl(tablePath, tableInfo, targetColumns, writerClient);
Member

Can simplify to just call createWriter().

private final Class<T> pojoClass;
private final TableInfo tableInfo;
private final RowType tableSchema;
private final int[] targetColumns; // may be null
Member

We can add the @Nullable annotation to indicate that it is nullable.

delegate.close();
}

private final Class<T> pojoClass;
Member

Not used, can be removed.


LookupResult lr = lookuper.lookup(new PLookupKey(1)).get();
AllTypesPojo one = rowConv.fromRow(lr.getSingletonRow());
assertThat(one.str).isEqualTo("s1");
Member

assertThat(one).isEqualTo(newAllTypesPojo(1));

After adding equals and hashCode methods to the AllTypesPojo class, we can simply assert on the full record, which checks the full POJO record deserialization.
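A minimal sketch of what that could look like, using a hypothetical three-field `SamplePojo` and factory `newSamplePojo` rather than the real AllTypesPojo and newAllTypesPojo from the PR's tests. Note one assumption baked in: `BigDecimal.equals` is scale-sensitive, so this only matches when the deserialized scale equals the expected one.

```java
import java.math.BigDecimal;
import java.util.Objects;

final class PojoEqualitySketch {
    /** Hypothetical POJO with value-based equality over all fields. */
    static final class SamplePojo {
        final int id;
        final String str;
        final BigDecimal dec;

        SamplePojo(int id, String str, BigDecimal dec) {
            this.id = id;
            this.str = str;
            this.dec = dec;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof SamplePojo)) {
                return false;
            }
            SamplePojo that = (SamplePojo) o;
            // BigDecimal.equals compares value and scale (1.0 is not equal to 1.00).
            return id == that.id
                    && Objects.equals(str, that.str)
                    && Objects.equals(dec, that.dec);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, str, dec);
        }
    }

    /** Hypothetical factory that builds the expected record for a given id. */
    static SamplePojo newSamplePojo(int id) {
        return new SamplePojo(id, "s" + id, new BigDecimal("99.99"));
    }
}
```

With equality defined this way, a single whole-record assertion covers every field, so a field that fails to deserialize cannot slip through unnoticed.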

AllTypesPojo lookedUp =
        rowConv.fromRow(lookuper.lookup(new PLookupKey(1)).get().getSingletonRow());
assertThat(lookedUp.str).isEqualTo("second");
assertThat(lookedUp.dec).isEqualByComparingTo("99.99");
Member

To test the partial update feature, we should also assert here that the other fields are kept unchanged.
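A minimal sketch of the kind of check being asked for, assuming a simplified model in which a row is an `Object[]` and a partial update overwrites only the target columns. `partialUpdate` and `untouchedUnchanged` are hypothetical helpers, not part of the Fluss client or its tests.

```java
import java.util.Objects;

final class PartialUpdateSketch {
    /** Simplified model: copy the existing row and overwrite only the target columns. */
    static Object[] partialUpdate(Object[] existing, Object[] update, int[] targetColumns) {
        Object[] merged = existing.clone();
        for (int column : targetColumns) {
            merged[column] = update[column];
        }
        return merged;
    }

    /** Returns true when every column outside targetColumns is identical before and after. */
    static boolean untouchedUnchanged(Object[] before, Object[] after, int[] targetColumns) {
        for (int i = 0; i < before.length; i++) {
            boolean targeted = false;
            for (int column : targetColumns) {
                if (column == i) {
                    targeted = true;
                }
            }
            if (!targeted && !Objects.equals(before[i], after[i])) {
                return false;
            }
        }
        return true;
    }
}
```

Asserting only the updated fields would pass even if the update wiped the remaining columns; checking the untouched columns is what actually exercises the partial-update semantics.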

TypedScanRecords<AllTypesPojo> recs = scanner.poll(Duration.ofSeconds(2));
for (TypedScanRecord<AllTypesPojo> r : recs) {
    if (r.getChangeType() == ChangeType.UPDATE_AFTER) {
        assertThat(r.getValue().str).isEqualTo("second");
Member

ditto

Comment on lines 40 to 42
public static final ScanRecords empty() {
    return new ScanRecords(Collections.emptyMap());
}
Member

Not necessary? I still prefer the previous implementation because it can avoid some small object overhead (GC).

Comment on lines 43 to 45
default void close() throws Exception {
    // by default do nothing
}
Member

It seems the newly introduced close() method doesn’t currently have any meaningful work to perform, as the writers don’t hold any resources at this stage. I suggest holding off on introducing it for now.

Typically, a close() method should either flush pending records or ensure all previously submitted requests have completed; otherwise, it may mislead users into expecting cleanup or finalization behavior that isn't actually implemented. It also introduces some extra complexity to this PR.
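For contrast, a minimal sketch of a close() that does meaningful work, assuming a hypothetical writer that buffers records in memory. `BufferingWriter` and its `sink` are illustrative only; the writers in this PR hold no such buffer, which is the reason for holding off.

```java
import java.util.ArrayList;
import java.util.List;

final class FlushOnCloseSketch {
    /** Hypothetical writer that buffers records and hands them to a sink on flush. */
    static final class BufferingWriter implements AutoCloseable {
        private final List<String> pending = new ArrayList<>();
        private final List<String> sink;
        private boolean closed;

        BufferingWriter(List<String> sink) {
            this.sink = sink;
        }

        void append(String record) {
            if (closed) {
                throw new IllegalStateException("writer is closed");
            }
            pending.add(record);
        }

        /** Moves all buffered records to the sink. */
        void flush() {
            sink.addAll(pending);
            pending.clear();
        }

        /** Flushes whatever is still buffered, then rejects further appends. Idempotent. */
        @Override
        public void close() {
            if (!closed) {
                flush();
                closed = true;
            }
        }
    }
}
```

Here close() has an observable contract (nothing appended before it is lost), which is what users would reasonably expect from the method's name.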

@wuchong wuchong merged commit f731fc6 into apache:main Dec 23, 2025
5 checks passed
@polyzos polyzos mentioned this pull request Dec 23, 2025
2 tasks
Development

Successfully merging this pull request may close these issues.

[Client] Java Client Add Pojo Support

3 participants