@polyzos
Contributor

@polyzos polyzos commented Aug 28, 2025

Introduce the Compacted log format for scenarios in which the KV store doesn't require projection pushdown, such as tables with aggregates.

@polyzos polyzos marked this pull request as ready for review August 31, 2025 14:07
@polyzos polyzos requested a review from Copilot August 31, 2025 14:07

@polyzos polyzos force-pushed the main branch 3 times, most recently from d88c76c to 434a4f4 Compare August 31, 2025 15:13
@polyzos polyzos requested a review from Copilot September 1, 2025 13:23

Copilot AI left a comment

Pull Request Overview

Introduces Compacted Log Format for scenarios where the KV store doesn't require projection pushdown, particularly for primary key tables with aggregates.

  • Adds support for COMPACTED log format as a third option alongside ARROW and INDEXED
  • Implements CompactedRow-based record encoding/decoding with space-optimized binary format
  • Updates validation logic to allow both ARROW and COMPACTED formats when KV format is COMPACTED
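
The validation rule in the last bullet could be sketched roughly as follows. This is a minimal, hypothetical illustration and not the code in TableDescriptorValidation.java: the class `LogFormatRuleSketch`, its nested enum, and the method `isAllowedWithCompactedKv` are stand-in names, and rejecting INDEXED when the KV format is COMPACTED is an assumption inferred from the summary above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in; not the actual Fluss validation code.
class LogFormatRuleSketch {
    // Mirrors the three log formats named in the overview.
    enum LogFormat { ARROW, INDEXED, COMPACTED }

    // Log formats the overview says are accepted when the KV format is COMPACTED.
    private static final Set<LogFormat> ALLOWED_WITH_COMPACTED_KV =
            EnumSet.of(LogFormat.ARROW, LogFormat.COMPACTED);

    // Returns true if the given log format may be paired with a COMPACTED KV format.
    static boolean isAllowedWithCompactedKv(LogFormat logFormat) {
        return ALLOWED_WITH_COMPACTED_KV.contains(logFormat);
    }

    public static void main(String[] args) {
        for (LogFormat format : LogFormat.values()) {
            System.out.println(format + " allowed: " + isAllowedWithCompactedKv(format));
        }
    }
}
```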

Reviewed Changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 1 comment.

File | Description
TableDescriptorValidation.java | Updates validation to allow COMPACTED log format for primary key tables
CompactedWalBuilder.java | New WAL builder implementation for compacted log format
KvTablet.java | Adds case handling for COMPACTED log format
MemoryLogRecordsCompactedBuilder.java | New builder extending common row-based builder for compacted records
CompactedLogRecord.java | Core log record implementation for compacted row format
AbstractRowMemoryLogRecordsBuilder.java | Extracted common functionality for row-based log builders
LogFormat.java | Adds COMPACTED enum value and updates documentation
CompactedLogWriteBatch.java | Client-side batch writer for compacted format
WriteRecord.java | Adds factory method for compacted append operations

@polyzos
Contributor Author

polyzos commented Nov 10, 2025

I'm closing this because #85 addresses it. I'll still check whether there are any improvements that could benefit the current implementation, and create a separate PR if required.

@polyzos polyzos closed this Nov 10, 2025
@polyzos
Contributor Author

polyzos commented Nov 11, 2025

@swuferhong, before leaving this, since you have already addressed it, I wanted to check whether you think there is anything we can reuse, in case I missed something.
I only see this check: https://github.com/apache/fluss/pull/1605/files#diff-74b13bc1eb78378f01a154188cdb7942d9b67dff4bbe2c1354a92617ae829349R148
Maybe there is also a test we can reuse? Otherwise, it seems to me you have already covered everything, and my implementation doesn't really add anything more.

@swuferhong
Contributor

> @swuferhong, before leaving this, since you have already addressed it, I wanted to check whether you think there is anything we can reuse, in case I missed something.
> I only see this check: https://github.com/apache/fluss/pull/1605/files#diff-74b13bc1eb78378f01a154188cdb7942d9b67dff4bbe2c1354a92617ae829349R148
> Maybe there is also a test we can reuse? Otherwise, it seems to me you have already covered everything, and my implementation doesn't really add anything more.

Hi, @polyzos. I mistakenly closed #85 while organizing the PR. The work for "introduce compacted log row" hasn't been done yet. You can proceed with it in your current PR, but it's probably not a high priority, since there's currently no clear use case requiring this format.

@polyzos
Contributor Author

polyzos commented Nov 11, 2025

@swuferhong So what happens with all the work that got merged? I ask because I see a lot of similarity between the two PRs.
Otherwise, maybe you can take a look at my PR?

@polyzos polyzos reopened this Nov 11, 2025
@swuferhong
Contributor

> @swuferhong So what happens with all the work that got merged? I ask because I see a lot of similarity between the two PRs.
> Otherwise, maybe you can take a look at my PR?

Hi @polyzos, I don't quite understand the question 'So what happens with all the work that got merged?', as this work wasn't merged yet?

I will try to review it. However, I'm not quite sure if it's necessary to introduce this format.

@xx789633
Contributor

xx789633 commented Dec 3, 2025

This feature is useful when projection is not required. It may not be a high priority.

@luoyuxia
Contributor

luoyuxia commented Dec 4, 2025

@polyzos Hi, I'd like to push this PR forward. We have a use case in which large embedding vectors are written to Fluss and Fluss tiers them to Lance, with no column pruning required. In that case, using the compacted changelog format can greatly reduce the cost of the server rebuilding the Arrow changelog. Could you please rebase onto the main branch?

@polyzos polyzos force-pushed the support-compacted-log-format branch from a6974fd to 6b6fda0 Compare December 4, 2025 07:06
@polyzos
Contributor Author

polyzos commented Dec 4, 2025

@luoyuxia done 🫡

@luoyuxia
Contributor

luoyuxia commented Dec 4, 2025

@polyzos Hi, thanks for your great work. I verified it locally and it works.
I see your PR supports:

  • write compacted row to log table
  • generate change log with compacted row

But for me, the highest priority is generating the changelog with compacted rows. Could you please split this PR into two: one to support writing compacted rows to log tables, and another to support the changelog with compacted rows? I'll review the PR that supports the changelog with compacted rows.

Member

@wuchong wuchong left a comment

Thanks @polyzos, this looks good to me. I only made some small code improvements, such as merging tests to reduce test time.

CompactedRow row,
@Nullable byte[] bucketKey) {
checkNotNull(row);
int estimatedSizeInBytes = CompactedLogRecord.sizeOf(row) + RECORD_BATCH_HEADER_SIZE;
Member

This RECORD_BATCH_HEADER_SIZE is for KV record batches; we should use LogRecordBatchFormat.recordBatchHeaderSize(CURRENT_LOG_MAGIC_VALUE) instead.
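
To illustrate why the choice of header constant matters for the size estimate, here is a minimal, hypothetical sketch. It does not use the Fluss classes: `BatchSizeEstimateSketch` and `estimateBatchSize` are stand-in names, and both header sizes are placeholder numbers chosen only for the arithmetic, not the real values of RECORD_BATCH_HEADER_SIZE or of LogRecordBatchFormat.recordBatchHeaderSize(CURRENT_LOG_MAGIC_VALUE).

```java
// Hypothetical sketch: the header sizes are placeholders, not Fluss's real values.
class BatchSizeEstimateSketch {
    // Stand-in for the KV batch constant RECORD_BATCH_HEADER_SIZE (placeholder value).
    static final int KV_BATCH_HEADER_SIZE = 32;
    // Stand-in for the log batch header size of the current magic value (placeholder value).
    static final int LOG_BATCH_HEADER_SIZE = 48;

    // Estimated bytes for a batch holding one record: record size plus batch header.
    static int estimateBatchSize(int recordSizeInBytes, int batchHeaderSize) {
        return recordSizeInBytes + batchHeaderSize;
    }

    public static void main(String[] args) {
        int recordSize = 100; // pretend result of CompactedLogRecord.sizeOf(row)
        int withKvHeader = estimateBatchSize(recordSize, KV_BATCH_HEADER_SIZE);
        int withLogHeader = estimateBatchSize(recordSize, LOG_BATCH_HEADER_SIZE);
        System.out.println("estimate with KV header:  " + withKvHeader);
        System.out.println("estimate with log header: " + withLogHeader);
        System.out.println("difference: " + (withLogHeader - withKvHeader));
    }
}
```

If the two header sizes differ, every estimate for a log batch is off by that difference, which is why the estimate should be based on the log batch header rather than the KV one.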

@wuchong wuchong force-pushed the support-compacted-log-format branch from f9e7526 to cf23fc3 Compare December 25, 2025 09:37
@wuchong wuchong merged commit a908690 into apache:main Dec 25, 2025
5 checks passed

Development

Successfully merging this pull request may close these issues.

Support Compacted format for Log Tables

5 participants