[Enhancement] Optimize large tablet lake compaction by support parallel compaction #66586
base: main
🧪 CI Insights: Here's what we observed from your CI run for 0b34040. 🟢 All jobs passed! But CI Insights is watching 👀
Pull request overview
This PR introduces support for parallel lake compaction, allowing multiple compaction subtasks to run concurrently within a single tablet. The feature aims to improve compaction throughput by enabling non-overlapping rowsets within the same tablet to be compacted in parallel, rather than sequentially.
Key Changes:
- Adds a per-tablet parallel compaction manager in the BE to coordinate concurrent subtasks with non-overlapping rowset selection
- Extends protobuf definitions for parallel compaction configuration and autonomous compaction results
- Adds FE and BE configuration options to enable/disable parallel compaction and configure parallelism limits
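The core idea in the list above is to split one tablet's rowsets into non-overlapping groups that separate subtasks can compact. It can be sketched as a greedy pass over the rowsets. This is a minimal C++ sketch and not the PR's actual selection code. `RowsetInfo` and `split_into_subtasks` are hypothetical names. The sketch also assumes that rowsets arrive in version order and are grouped contiguously, bounded by the two limits exposed as `lake_compaction_max_parallel_per_tablet` and `lake_compaction_max_bytes_per_subtask`.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical rowset descriptor; the real BE works on rowset metadata protobufs.
struct RowsetInfo {
    uint32_t id;
    int64_t data_size;  // bytes
};

// Greedily split `rowsets` (assumed to be in version order and not already
// being compacted) into at most `max_parallel` contiguous groups. A group is
// closed once adding the next rowset would exceed `max_bytes_per_subtask`;
// a single oversized rowset still forms its own group. Each rowset goes to at
// most one group, so groups never overlap. Rowsets left over after the group
// limit is reached are not scheduled in this round.
std::vector<std::vector<RowsetInfo>> split_into_subtasks(const std::vector<RowsetInfo>& rowsets,
                                                         int max_parallel,
                                                         int64_t max_bytes_per_subtask) {
    assert(max_parallel > 0 && max_bytes_per_subtask > 0);
    std::vector<std::vector<RowsetInfo>> groups;
    std::vector<RowsetInfo> current;
    int64_t current_bytes = 0;
    for (const RowsetInfo& rs : rowsets) {
        if (!current.empty() && current_bytes + rs.data_size > max_bytes_per_subtask) {
            groups.push_back(std::move(current));
            current.clear();  // reset the moved-from vector to a known empty state
            current_bytes = 0;
            if (static_cast<int>(groups.size()) == max_parallel) {
                return groups;  // parallelism limit reached
            }
        }
        current.push_back(rs);
        current_bytes += rs.data_size;
    }
    if (!current.empty()) {
        groups.push_back(std::move(current));
    }
    return groups;
}
```

Take five 4-byte rowsets with a 10-byte limit and a parallelism of 3. This yields groups {1, 2}, {3, 4} and {5}. With a parallelism of 2, rowset 5 waits for a later round. The sketch keeps groups contiguous by design, so that each subtask's input is a consecutive version range.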
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| gensrc/proto/lake_types.proto | Adds CompactionResultPB for storing autonomous compaction results |
| gensrc/proto/lake_service.proto | Adds TabletParallelConfig for parallel compaction settings and TabletSubtaskStatus for tracking subtask progress |
| fe/fe-core/src/main/java/com/starrocks/common/Config.java | Adds 8 new configuration options for autonomous and parallel compaction modes |
| fe/fe-core/src/main/java/com/starrocks/lake/compaction/CompactionScheduler.java | Extends compaction requests to include parallel config and visible version; fixes typo |
| be/src/common/config.h | Adds 14 BE configuration options for autonomous/parallel compaction and recovery |
| be/src/storage/lake/tablet_parallel_compaction_manager.h | Defines TabletParallelCompactionManager class to orchestrate parallel compaction within tablets |
| be/src/storage/lake/tablet_parallel_compaction_manager.cpp | Implements parallel compaction logic: rowset selection, subtask execution, TxnLog merging |
| be/src/storage/lake/tablet_manager.h | Adds overload of compact() accepting pre-selected rowsets |
| be/src/storage/lake/tablet_manager.cpp | Implements compact() overload to support parallel compaction with pre-selected rowsets |
| be/src/storage/lake/compaction_scheduler.h | Adds TabletParallelCompactionManager member and process_parallel_compaction method |
| be/src/storage/lake/compaction_scheduler.cpp | Integrates parallel compaction path with fallback to non-parallel mode on failure |
| be/src/storage/lake/compaction_policy.h | Adds PartialCompactionState struct and PartialCompactionSelector for autonomous compaction support |
| be/src/storage/lake/compaction_policy.cpp | Implements pick_rowsets_with_limit() for rowset selection with exclusion and byte limits; adds partial compaction helpers |
| be/src/storage/CMakeLists.txt | Adds tablet_parallel_compaction_manager.cpp to build |
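The table above says the manager merges the subtasks' TxnLogs. The sketch below shows one plausible shape for that step. It assumes that each subtask reports the rowset IDs it consumed and produced, and that failed subtasks are simply left out. `SubtaskLog`, `MergedLog` and `merge_subtask_logs` are hypothetical simplifications, not the fields of `TxnLogPB`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <vector>

// Hypothetical, heavily simplified stand-in for one subtask's compaction log.
struct SubtaskLog {
    bool success;
    std::vector<uint32_t> input_rowsets;   // rowset ids consumed by the subtask
    std::vector<uint32_t> output_rowsets;  // rowset ids produced by the subtask
};

// Combined result, committed once for the whole tablet.
struct MergedLog {
    std::vector<uint32_t> input_rowsets;
    std::vector<uint32_t> output_rowsets;
};

// Merge the logs of successful subtasks in subtask order. Failed subtasks are
// skipped, so their inputs stay in the tablet (assumption). Returns
// std::nullopt if no subtask succeeded.
std::optional<MergedLog> merge_subtask_logs(const std::vector<SubtaskLog>& logs) {
    MergedLog merged;
    std::set<uint32_t> seen_inputs;
    bool any_success = false;
    for (const SubtaskLog& log : logs) {
        if (!log.success) continue;
        any_success = true;
        for (uint32_t id : log.input_rowsets) {
            // Subtasks were given disjoint rowsets, so a duplicate means a bug.
            bool inserted = seen_inputs.insert(id).second;
            assert(inserted && "rowset consumed by two subtasks");
            (void)inserted;
            merged.input_rowsets.push_back(id);
        }
        merged.output_rowsets.insert(merged.output_rowsets.end(), log.output_rowsets.begin(),
                                     log.output_rowsets.end());
    }
    if (!any_success) return std::nullopt;
    return merged;
}
```

The subtasks were handed disjoint rowsets, so the merged input list should never contain a duplicate. In debug builds, the assertion checks this.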
fe/fe-core/src/main/java/com/starrocks/lake/compaction/CompactionScheduler.java (resolved)
Force-pushed from bb76f7c to 0b34040.
@cursor review
Force-pushed from 0b34040 to 2b492e4.
@cursor review
Force-pushed from fd9d1a2 to 3fc0c7c.
@cursor review
Force-pushed from b852e25 to 243dcad.
@cursor review
Force-pushed from d019ca7 to 1188394.
@cursor review
```cpp
    if (context->partition_id == 0) {
        context->partition_id = id_pair.second;
    }
}
```
Should handle the `else` (not OK) case.
In the `else` case, the ID is already set.
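The exchange above concerns what happens when the ID lookup does not succeed. The reply describes the following behavior: fill `partition_id` only if it is unset, and treat a failed lookup as harmless when the ID is already known. A minimal sketch of that behavior could look like this. `CompactionContext`, `IdPair` and `fill_partition_id` are hypothetical, and modeling the lookup result as `std::optional` is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical context; only the field from the excerpt above is modeled.
struct CompactionContext {
    int64_t partition_id = 0;
};

// Hypothetical lookup result: (table id, partition id).
using IdPair = std::pair<int64_t, int64_t>;

// Fills `partition_id` from a successful lookup unless it is already set.
// On a failed lookup, reports success only if the id was already set earlier;
// that is the "else" case raised in review.
bool fill_partition_id(CompactionContext* context, const std::optional<IdPair>& id_pair) {
    if (id_pair.has_value()) {
        if (context->partition_id == 0) {
            context->partition_id = id_pair->second;
        }
        return true;
    }
    return context->partition_id != 0;
}
```

Returning a status instead of silently ignoring the failure lets a caller detect the one case that really matters: the lookup failed and no ID was ever set.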
```java
public static int lake_compaction_max_parallel_per_tablet = 3;

@ConfField(mutable = true, comment = "Maximum data volume (bytes) per parallel subtask (1GB default)")
public static long lake_compaction_max_bytes_per_subtask = 1073741824L;
```
Is 1 GB too small?
This will be adjusted after large-scale testing.
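The thread quotes two defaults: at most 3 subtasks per tablet and 1073741824 bytes per subtask. To get a feel for them, here is a hypothetical estimate of how many subtasks a tablet might get. It assumes the count is `min(max_parallel, ceil(bytes / max_bytes_per_subtask))`, though the PR's actual rule may differ. `estimate_subtasks` is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Defaults quoted from the FE Config excerpt in this thread.
constexpr int kMaxParallelPerTablet = 3;
constexpr int64_t kMaxBytesPerSubtask = 1073741824LL;  // 1 GiB

// Hypothetical estimate: enough subtasks to keep each one within the byte
// limit, capped by the per-tablet parallelism limit.
int estimate_subtasks(int64_t compactable_bytes, int max_parallel = kMaxParallelPerTablet,
                      int64_t max_bytes = kMaxBytesPerSubtask) {
    assert(max_parallel > 0 && max_bytes > 0);
    if (compactable_bytes <= 0) return 0;
    int64_t needed = (compactable_bytes + max_bytes - 1) / max_bytes;  // ceiling division
    return static_cast<int>(std::min<int64_t>(needed, max_parallel));
}
```

Under this assumption, 1.5 GiB of compactable data gives 2 subtasks. 10 GiB would need 10 subtasks and is capped at 3. The sketch does not say whether the remainder is spread over those 3 subtasks or deferred to later rounds.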
what does this feature do to

```cpp
bool is_rowset_compacting(uint32_t rowset_id) const { return compacting_rowsets.count(rowset_id) > 0; }

// Check if all subtasks are completed
bool is_complete() const { return running_subtasks.empty() && total_subtasks_created > 0; }
```
Does this need mutex protection, since these values are written under the mutex?
This function is protected by a mutex wherever it is used.
yes
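The thread above settles on a convention: these helpers are only called with the mutex held. The sketch below illustrates that convention. Helpers with a `_locked` suffix assume the lock is already held, and the public methods acquire it. The class and method names are illustrative. Keeping rowsets claimed after a subtask finishes is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_set>
#include <vector>

// Hypothetical per-tablet bookkeeping modeled on the excerpt above. All
// fields are guarded by `_mu`; the *_locked helpers assume the caller
// already holds it, and only the public methods acquire it.
class TabletCompactionState {
public:
    // Claims `rowsets` for a new subtask. Claims nothing and returns false
    // if any of them is already being compacted by another subtask.
    bool start_subtask(uint32_t subtask_id, const std::vector<uint32_t>& rowsets) {
        std::lock_guard<std::mutex> guard(_mu);
        for (uint32_t id : rowsets) {
            if (is_rowset_compacting_locked(id)) return false;
        }
        _compacting_rowsets.insert(rowsets.begin(), rowsets.end());
        _running_subtasks.insert(subtask_id);
        ++_total_subtasks_created;
        return true;
    }

    // Marks a subtask as finished. Its rowsets stay claimed until the merged
    // result is handled (assumption). Returns true if the tablet is now complete.
    bool finish_subtask(uint32_t subtask_id) {
        std::lock_guard<std::mutex> guard(_mu);
        _running_subtasks.erase(subtask_id);
        return is_complete_locked();
    }

private:
    bool is_rowset_compacting_locked(uint32_t rowset_id) const {
        return _compacting_rowsets.count(rowset_id) > 0;
    }
    bool is_complete_locked() const { return _running_subtasks.empty() && _total_subtasks_created > 0; }

    std::mutex _mu;
    std::unordered_set<uint32_t> _compacting_rowsets;
    std::unordered_set<uint32_t> _running_subtasks;
    int _total_subtasks_created = 0;
};
```

Clang's thread-safety analysis (`-Wthread-safety` with the `guarded_by` and `requires_capability` attributes) can turn this naming convention into a compile-time check.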
Force-pushed from 5d5d48b to b32170c.
@cursor review
be/test/storage/lake/tablet_parallel_compaction_manager_test.cpp (outdated, resolved)
Force-pushed from b32170c to a4a287b.
@cursor review
Force-pushed from a4a287b to e1a1b2a.
@cursor review
Force-pushed from e1a1b2a to 3f6660a.
✅ Bugbot reviewed your changes and found no bugs!
@cursor review
Force-pushed from b5a7e76 to 7251ffd.
Force-pushed from 7251ffd to c759494.
[Java-Extensions Incremental Coverage Report] ✅ pass: 0 / 0 (0%)
[FE Incremental Coverage Report] ✅ pass: 16 / 19 (84.21%)
[BE Incremental Coverage Report] ❌ fail: 709 / 1059 (66.95%)
@cursor review
✅ Bugbot reviewed your changes and found no bugs!
Why I'm doing:
- Lake table compaction currently executes in a single thread per tablet, which becomes a bottleneck for large tablets with significant data volume or many rowsets, causing data accumulation and query performance degradation.
- To improve compaction throughput and reduce compaction lag, we need the ability to run multiple compaction subtasks concurrently within a single tablet.

What I'm doing:
- Introduce TabletParallelCompactionManager to orchestrate parallel compaction subtasks within a single tablet by selecting non-overlapping rowset groups.
- Modify CompactionScheduler to process parallel compaction requests from FE, creating multiple subtasks that can run concurrently on the thread pool.
- Add rows mapper functionality to track row mappings across parallel subtasks, enabling proper conflict resolution for primary key tables.
- Optimize compaction score calculation by skipping large segments that are close to max_segment_file_size (configurable via lake_compaction_skip_large_segment_ratio), avoiding unnecessary compaction of already optimized segments.
- Defer SST compaction to execute once after all subtasks complete, preventing multiple subtasks from competing to compact the same SST files.
- Add FE configurations: lake_compaction_enable_parallel_per_tablet, lake_compaction_max_parallel_per_tablet, lake_compaction_max_bytes_per_subtask.
- Add BE configurations: enable_lake_compaction_skip_large_segment.
- Extend protobuf messages (TabletParallelConfig, CompactionResultPB) to support parallel compaction coordination between FE and BE.
- Add comprehensive unit tests for parallel compaction manager and scheduler.

Signed-off-by: meegoo <[email protected]>
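The commit message describes a change to the compaction score calculation: it skips segments that are already close to `max_segment_file_size`. A minimal sketch of that counting rule follows. The threshold form `size >= ratio * max_segment_file_size` is an assumption. `count_effective_segments_sketch` is a hypothetical stand-in for the PR's `calc_effective_segment_count()`, whose exact logic may differ. Its last two parameters play the roles of `lake_compaction_skip_large_segment_ratio` and `enable_lake_compaction_skip_large_segment`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: counts only segments that are still worth compacting.
// A segment at or above `skip_ratio * max_segment_file_size` is treated as
// already well-sized and ignored (assumed threshold form).
int count_effective_segments_sketch(const std::vector<int64_t>& segment_sizes,
                                    int64_t max_segment_file_size, double skip_ratio,
                                    bool skip_large_segments) {
    if (!skip_large_segments) {
        return static_cast<int>(segment_sizes.size());
    }
    const double threshold = skip_ratio * static_cast<double>(max_segment_file_size);
    int count = 0;
    for (int64_t size : segment_sizes) {
        if (static_cast<double>(size) < threshold) {
            ++count;
        }
    }
    return count;
}
```

Take a 1000-byte maximum and a ratio of 0.9. Segments of 950 and 920 bytes are ignored, so of four segments {950, 920, 500, 100}, only the two small ones count toward the score.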
Force-pushed from c759494 to e7d3503.
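Why I'm doing:
- Lake table compaction currently executes in a single thread per tablet, which becomes a bottleneck for large tablets with significant data volume or many rowsets, causing data accumulation and query performance degradation.
- To improve compaction throughput and reduce compaction lag, we need the ability to run multiple compaction subtasks concurrently within a single tablet.

What I'm doing:
- Introduce TabletParallelCompactionManager to orchestrate parallel compaction subtasks within a single tablet by selecting non-overlapping rowset groups.
- Modify CompactionScheduler to process parallel compaction requests from FE, creating multiple subtasks that can run concurrently on the thread pool.
- Add rows mapper functionality to track row mappings across parallel subtasks, enabling proper conflict resolution for primary key tables.
- Optimize compaction score calculation by skipping large segments that are close to max_segment_file_size (configurable via lake_compaction_skip_large_segment_ratio), avoiding unnecessary compaction of already optimized segments.
- Defer SST compaction to execute once after all subtasks complete, preventing multiple subtasks from competing to compact the same SST files.
- Add FE configurations: lake_compaction_enable_parallel_per_tablet, lake_compaction_max_parallel_per_tablet, lake_compaction_max_bytes_per_subtask.
- Add BE configurations: …, lake_compaction_max_bytes_per_subtask, enable_lake_compaction_skip_large_segment.
- Extend protobuf messages (TabletParallelConfig, CompactionResultPB) to support parallel compaction coordination between FE and BE.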
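Deferring SST compaction until every subtask is done amounts to a "last one out" trigger. The sketch below shows one common way to do that, using an atomic countdown. `DeferredSstCompaction` is a hypothetical name. It is also an assumption that the callback runs from the last finishing subtask rather than from a separate merge step.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical sketch: subtasks skip SST compaction themselves; each one
// decrements a shared counter when it finishes, and whichever subtask
// finishes last runs the SST compaction exactly once.
class DeferredSstCompaction {
public:
    DeferredSstCompaction(int num_subtasks, std::function<void()> run_sst)
            : _remaining(num_subtasks), _run_sst(std::move(run_sst)) {
        assert(num_subtasks > 0);
    }

    // Assumed to be called exactly once per subtask when it finishes.
    void on_subtask_done() {
        // fetch_sub returns the previous value, so exactly one caller observes 1.
        // acq_rel makes the other subtasks' writes visible to that last caller.
        if (_remaining.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            _run_sst();
        }
    }

private:
    std::atomic<int> _remaining;
    std::function<void()> _run_sst;
};
```

Only one thread ever sees the counter go from 1 to 0, so concurrent subtasks cannot race to compact the same SST files.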
Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check:
Note
Introduces parallel compaction for Lake tablets and optimizations to reduce unnecessary work.
- `TabletParallelCompactionManager` orchestrates per-tablet subtasks, merges `TxnLogPB`, and defers SST compaction until all subtasks finish
- `CompactionScheduler` handles `parallel_config`, creates subtasks, tracks/renders parallel task states, and falls back to non-parallel when needed
- … (`MultiRowsMapperIterator`), per-subtask mapper files, `subtask_id` plumbed through writers and contexts
- `TabletManager::compact(context, input_rowsets)` overload to run compaction on pre-picked rowsets
- `calc_effective_segment_count()` and config flags (`enable_lake_compaction_skip_large_segment`, `lake_compaction_skip_large_segment_ratio`) skip large segments in score/selection
- … `subtask_id`; skip SST compaction inside subtasks for PK tables (run once post-merge)

Written by Cursor Bugbot for commit c759494.
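The note above mentions per-subtask mapper files read through `MultiRowsMapperIterator`. The general pattern, presenting several files as one continuous stream, can be sketched as follows. `MapperFile` and `MultiMapperIteratorSketch` are hypothetical. The real iterator reads files from storage, and its entry format is not shown here. The sketch therefore models entries as opaque 64-bit values held in memory, with files ordered by subtask.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for one subtask's rows mapper file: a sequence of
// opaque 64-bit entries, held in memory here for simplicity.
using MapperFile = std::vector<uint64_t>;

// Sketch of a "multi" iterator that presents several per-subtask mapper
// files, ordered by subtask, as one continuous stream of entries.
class MultiMapperIteratorSketch {
public:
    explicit MultiMapperIteratorSketch(std::vector<MapperFile> files) : _files(std::move(files)) {}

    // Returns the next entry, or std::nullopt once every file is exhausted.
    std::optional<uint64_t> next() {
        while (_file_idx < _files.size()) {
            const MapperFile& f = _files[_file_idx];
            if (_pos < f.size()) {
                return f[_pos++];
            }
            ++_file_idx;  // current file exhausted: move on to the next subtask's file
            _pos = 0;
        }
        return std::nullopt;
    }

private:
    std::vector<MapperFile> _files;
    size_t _file_idx = 0;
    size_t _pos = 0;
};
```

Empty files are skipped transparently, so a subtask that produced no mapping entries does not disturb the combined stream.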