-
Notifications
You must be signed in to change notification settings - Fork 458
Support COMPACTED format for log tables #1605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d88c76c to
434a4f4
Compare
e919a85 to
4639453
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Introduces Compacted Log Format for scenarios where the KV store doesn't require projection pushdown, particularly for primary key tables with aggregates.
- Adds support for COMPACTED log format as a third option alongside ARROW and INDEXED
- Implements CompactedRow-based record encoding/decoding with space-optimized binary format
- Updates validation logic to allow both ARROW and COMPACTED formats when KV format is COMPACTED
Reviewed Changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| TableDescriptorValidation.java | Updates validation to allow COMPACTED log format for primary key tables |
| CompactedWalBuilder.java | New WAL builder implementation for compacted log format |
| KvTablet.java | Adds case handling for COMPACTED log format |
| MemoryLogRecordsCompactedBuilder.java | New builder extending common row-based builder for compacted records |
| CompactedLogRecord.java | Core log record implementation for compacted row format |
| AbstractRowMemoryLogRecordsBuilder.java | Extracted common functionality for row-based log builders |
| LogFormat.java | Adds COMPACTED enum value and updates documentation |
| CompactedLogWriteBatch.java | Client-side batch writer for compacted format |
| WriteRecord.java | Adds factory method for compacted append operations |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
fluss-common/src/main/java/org/apache/fluss/record/LogRecordReadContext.java
Outdated
Show resolved
Hide resolved
|
I'm closing this as #85 addresses that.. Gonna check though if there are any possible improvements that can benefit the current implementation and create a separate PR if required. |
|
@swuferhong, before leaving this, since as you already addressed this, I wanted to check with you if you think there is anything we can reuse in case I missed something. |
Hi, @polyzos. I have wrongly closed #85 while organizing the PR. The work for "introduce compacted log row" hasn't been done yet. You can proceed with it in your current PR, but it's probably not high priority since there's currently no clear use case requiring this format. |
|
@swuferhong So what happens with all the work that got merged? because I see there is lot of similarity in both PRs |
Hi, @polyzos I don't quite understand this 'So what happens with all the work that got merged?' as this work wasn't merged yet? I will try to review it. However, I'm not quite sure if it's necessary to introduce this format. |
|
This feature is useful when projection is not required. it may not be a high priority. |
|
@polyzos Hi, I'm thinking to push forward this pr. We have a user case that big embeding vector write to fluss , and flusss tier to lance when column prune is not required. Then use compacted change log format can much reduce the cost of server rebuilding arrow change log. Could you please rebase main branh? |
a6974fd to
6b6fda0
Compare
|
@luoyuxia done 🫡 |
|
@polyzos Hi, thanks for your greate work. I verify it in my local, it works.
But for me, the real highest priority is generate change log with compacted row, so could you please split this pr into two, one is to support write compacted row to log table, another one is support change log with compacted row? I'll review the pr for support change log with compacted row? |
…classes # Conflicts: # fluss-common/src/main/java/org/apache/fluss/record/MemoryLogRecordsIndexedBuilder.java
…cordsCompactedBuilder.java Co-authored-by: Copilot <[email protected]>
wuchong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @polyzos , this looks good to me. I only improved some code, like merge tests to reduce test time.
| CompactedRow row, | ||
| @Nullable byte[] bucketKey) { | ||
| checkNotNull(row); | ||
| int estimatedSizeInBytes = CompactedLogRecord.sizeOf(row) + RECORD_BATCH_HEADER_SIZE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This RECORD_BATCH_HEADER_SIZE is for kv record batch, we should use LogRecordBatchFormat.recordBatchHeaderSize(CURRENT_LOG_MAGIC_VALUE)
f9e7526 to
cf23fc3
Compare
Introduce Compacted Log Format, for scenarios that the KV store doesn't require projection pushdown, like tables with aggregates