
Conversation


@meegoo meegoo commented Dec 11, 2025

Why I'm doing:

  • Lake table compaction currently executes in a single thread per tablet, which
    becomes a bottleneck for large tablets with significant data volume or many
    rowsets, causing data accumulation and query performance degradation.
  • To improve compaction throughput and reduce compaction lag, we need the ability
    to run multiple compaction subtasks concurrently within a single tablet.

What I'm doing:

  • Introduce TabletParallelCompactionManager to orchestrate parallel compaction
    subtasks within a single tablet by selecting non-overlapping rowset groups.
  • Modify CompactionScheduler to process parallel compaction requests from FE,
    creating multiple subtasks that can run concurrently on the thread pool.
  • Add rows mapper functionality to track row mappings across parallel subtasks,
    enabling proper conflict resolution for primary key tables.
  • Optimize compaction score calculation by skipping large segments that are
    close to max_segment_file_size (configurable via
    lake_compaction_skip_large_segment_ratio), avoiding unnecessary compaction
    of already optimized segments.
  • Defer SST compaction to execute once after all subtasks complete, preventing
    multiple subtasks from competing to compact the same SST files.
  • Add FE configurations: lake_compaction_enable_parallel_per_tablet,
    lake_compaction_max_parallel_per_tablet, lake_compaction_max_bytes_per_subtask.
  • Add BE configurations: lake_compaction_max_parallel_per_tablet,
    lake_compaction_max_bytes_per_subtask, enable_lake_compaction_skip_large_segment.
  • Extend protobuf messages (TabletParallelConfig, CompactionResultPB) to support
    parallel compaction coordination between FE and BE.
  • Add comprehensive unit tests for parallel compaction manager and scheduler.

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport PR

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.0
    • 3.5
    • 3.4
    • 3.3

Note

Introduces parallel compaction for Lake tablets and optimizations to reduce unnecessary work.

  • New TabletParallelCompactionManager orchestrates per-tablet subtasks, merges TxnLogPB, and defers SST compaction until all subtasks finish
  • CompactionScheduler handles parallel_config, creates subtasks, tracks/renders parallel task states, and falls back to non-parallel when needed
  • PK conflict resolution and writers updated for parallelism: multi-file rows mapper (MultiRowsMapperIterator), per-subtask mapper files, subtask_id plumbed through writers and contexts
  • Adds TabletManager::compact(context, input_rowsets) overload to run compaction on pre-picked rowsets
  • Scoring change: calc_effective_segment_count() and config flags (enable_lake_compaction_skip_large_segment, lake_compaction_skip_large_segment_ratio) skip large segments in score/selection
  • Rows mapper utilities extended: subtask-specific filenames, total row count checks, cleanup helpers; light publish verifies row counts across subtasks
  • Hook in horizontal/vertical compaction tasks to pass subtask_id; skip SST compaction inside subtasks for PK tables (run once post-merge)
  • Tests: new parallel compaction manager/scheduler tests; CMake wired for new sources

Written by Cursor Bugbot for commit c759494.

Copilot AI review requested due to automatic review settings December 11, 2025 02:10
@meegoo meegoo requested review from a team as code owners December 11, 2025 02:10
@wanpengfei-git wanpengfei-git requested a review from a team December 11, 2025 02:10

mergify bot commented Dec 11, 2025

🧪 CI Insights

Here's what we observed from your CI run for 0b34040.

🟢 All jobs passed!

But CI Insights is watching 👀

Copilot AI left a comment

Pull request overview

This PR introduces support for parallel lake compaction, allowing multiple compaction subtasks to run concurrently within a single tablet. The feature aims to improve compaction throughput by enabling non-overlapping rowsets within the same tablet to be compacted in parallel, rather than sequentially.

Key Changes:

  • Adds per-tablet parallel compaction manager in BE to coordinate concurrent subtasks with non-overlapping rowset selection
  • Extends protobuf definitions for parallel compaction configuration and autonomous compaction results
  • Adds FE and BE configuration options to enable/disable parallel compaction and configure parallelism limits

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 10 comments.

Summary per file:

  • gensrc/proto/lake_types.proto: Adds CompactionResultPB for storing autonomous compaction results
  • gensrc/proto/lake_service.proto: Adds TabletParallelConfig for parallel compaction settings and TabletSubtaskStatus for tracking subtask progress
  • fe/fe-core/src/main/java/com/starrocks/common/Config.java: Adds 8 new configuration options for autonomous and parallel compaction modes
  • fe/fe-core/src/main/java/com/starrocks/lake/compaction/CompactionScheduler.java: Extends compaction requests to include parallel config and visible version; fixes typo
  • be/src/common/config.h: Adds 14 BE configuration options for autonomous/parallel compaction and recovery
  • be/src/storage/lake/tablet_parallel_compaction_manager.h: Defines TabletParallelCompactionManager class to orchestrate parallel compaction within tablets
  • be/src/storage/lake/tablet_parallel_compaction_manager.cpp: Implements parallel compaction logic: rowset selection, subtask execution, TxnLog merging
  • be/src/storage/lake/tablet_manager.h: Adds overload of compact() accepting pre-selected rowsets
  • be/src/storage/lake/tablet_manager.cpp: Implements compact() overload to support parallel compaction with pre-selected rowsets
  • be/src/storage/lake/compaction_scheduler.h: Adds TabletParallelCompactionManager member and process_parallel_compaction method
  • be/src/storage/lake/compaction_scheduler.cpp: Integrates parallel compaction path with fallback to non-parallel mode on failure
  • be/src/storage/lake/compaction_policy.h: Adds PartialCompactionState struct and PartialCompactionSelector for autonomous compaction support
  • be/src/storage/lake/compaction_policy.cpp: Implements pick_rowsets_with_limit() for rowset selection with exclusion and byte limits; adds partial compaction helpers
  • be/src/storage/CMakeLists.txt: Adds tablet_parallel_compaction_manager.cpp to build

@meegoo meegoo force-pushed the parallel_compaction branch from bb76f7c to 0b34040 Compare December 11, 2025 02:22
@meegoo meegoo changed the title [Feature] support parallel lake compaction [Enhancement] support parallel lake compaction Dec 11, 2025
@meegoo meegoo changed the title [Enhancement] support parallel lake compaction [Enhancement] Optimize large tablet lake compaction by support parallel compaction Dec 11, 2025
@alvin-celerdata

@cursor review

@meegoo meegoo force-pushed the parallel_compaction branch from 0b34040 to 2b492e4 Compare December 16, 2025 12:03
@alvin-celerdata

@cursor review

@meegoo meegoo force-pushed the parallel_compaction branch 3 times, most recently from fd9d1a2 to 3fc0c7c Compare December 17, 2025 11:37
@alvin-celerdata

@cursor review

@meegoo meegoo force-pushed the parallel_compaction branch 2 times, most recently from b852e25 to 243dcad Compare December 18, 2025 03:00
@alvin-celerdata

@cursor review

@meegoo meegoo force-pushed the parallel_compaction branch 2 times, most recently from d019ca7 to 1188394 Compare December 18, 2025 14:06
@alvin-celerdata

@cursor review

    if (context->partition_id == 0) {
        context->partition_id = id_pair.second;
    }
}
Contributor:

should handle the else (not-OK) case

Contributor Author:

else the id is already set.

public static int lake_compaction_max_parallel_per_tablet = 3;

@ConfField(mutable = true, comment = "Maximum data volume (bytes) per parallel subtask (1GB default)")
public static long lake_compaction_max_bytes_per_subtask = 1073741824L;
Contributor:

is 1GB too small?

Contributor Author:

This will be adjusted after large-scale testing.

@starrocks-xupeng

What does this feature show in proc '/compactions' and be_cloud_native_compactions? Are we able to monitor subtasks?

bool is_rowset_compacting(uint32_t rowset_id) const { return compacting_rowsets.count(rowset_id) > 0; }

// Check if all subtasks are completed
bool is_complete() const { return running_subtasks.empty() && total_subtasks_created > 0; }
Contributor:

Need mutex protection? These values are written under a mutex.

Contributor Author:

This function is protected by a mutex wherever it is used.


meegoo commented Dec 25, 2025

What does this feature show in proc '/compactions' and be_cloud_native_compactions? Are we able to monitor subtasks?

yes

@meegoo meegoo force-pushed the parallel_compaction branch 2 times, most recently from 5d5d48b to b32170c Compare December 25, 2025 12:25
@alvin-celerdata

@cursor review

@meegoo meegoo force-pushed the parallel_compaction branch from b32170c to a4a287b Compare December 25, 2025 18:48
@alvin-celerdata

@cursor review

@meegoo meegoo force-pushed the parallel_compaction branch from a4a287b to e1a1b2a Compare December 25, 2025 18:55
@alvin-celerdata

@cursor review

@meegoo meegoo force-pushed the parallel_compaction branch from e1a1b2a to 3f6660a Compare December 25, 2025 19:02
@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no bugs!

@alvin-celerdata

@cursor review

@meegoo meegoo force-pushed the parallel_compaction branch 2 times, most recently from b5a7e76 to 7251ffd Compare December 26, 2025 07:25
@luohaha luohaha self-requested a review December 26, 2025 07:42
@meegoo meegoo force-pushed the parallel_compaction branch from 7251ffd to c759494 Compare December 26, 2025 12:02
@github-actions

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

[FE Incremental Coverage Report]

pass : 16 / 19 (84.21%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 com/starrocks/service/InformationSchemaDataSource.java 0 3 00.00% [480, 481, 484]
🔵 com/starrocks/common/Config.java 3 3 100.00% []
🔵 com/starrocks/lake/compaction/CompactionScheduler.java 13 13 100.00% []

@github-actions

[BE Incremental Coverage Report]

fail : 709 / 1059 (66.95%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 be/src/storage/local_primary_key_compaction_conflict_resolver.h 0 1 00.00% [54]
🔵 be/src/storage/lake/tablet_writer.h 0 1 00.00% [178]
🔵 be/src/storage/local_primary_key_compaction_conflict_resolver.cpp 0 2 00.00% [30, 31]
🔵 be/src/storage/lake/lake_primary_key_compaction_conflict_resolver.cpp 0 2 00.00% [33, 34]
🔵 be/src/storage/rows_mapper.h 0 2 00.00% [140, 145]
🔵 be/src/storage/rows_mapper.cpp 11 198 05.56% [140, 141, 142, 143, 145, 154, 163, 164, 165, 171, 172, 174, 176, 179, 181, 182, 183, 185, 186, 187, 191, 192, 193, 194, 195, 197, 198, 199, 202, 205, 207, 208, 214, 215, 217, 219, 222, 224, 225, 226, 228, 229, 230, 234, 235, 236, 237, 238, 240, 241, 242, 245, 248, 256, 257, 258, 259, 260, 261, 262, 263, 265, 268, 270, 271, 272, 275, 276, 277, 280, 281, 283, 285, 288, 289, 290, 291, 293, 294, 295, 299, 300, 301, 302, 305, 306, 307, 308, 311, 312, 313, 316, 317, 318, 320, 321, 322, 325, 328, 330, 331, 334, 335, 336, 339, 340, 342, 344, 347, 348, 349, 350, 352, 353, 354, 358, 359, 360, 361, 364, 365, 366, 367, 370, 371, 372, 375, 376, 377, 379, 380, 381, 384, 387, 388, 389, 392, 393, 395, 396, 397, 398, 399, 401, 402, 403, 404, 405, 406, 408, 409, 410, 413, 414, 418, 419, 420, 423, 426, 428, 429, 430, 431, 432, 433, 437, 438, 439, 441, 442, 443, 444, 446, 448, 451, 452, 456, 457, 458, 459, 461, 462, 463, 464, 465, 467, 468]
🔵 be/src/storage/primary_key_compaction_conflict_resolver.h 1 4 25.00% [59, 60, 61]
🔵 be/src/storage/lake/compaction_task.cpp 1 2 50.00% [47]
🔵 be/src/storage/lake/horizontal_compaction_task.cpp 1 2 50.00% [71]
🔵 be/src/storage/lake/vertical_compaction_task.cpp 1 2 50.00% [64]
🔵 be/src/storage/lake/tablet_manager.cpp 9 17 52.94% [1151, 1152, 1153, 1154, 1155, 1157, 1158, 1161]
🔵 be/src/storage/primary_key_compaction_conflict_resolver.cpp 19 35 54.29% [125, 126, 127, 129, 132, 136, 137, 138, 200, 201, 202, 204, 207, 211, 212, 213]
🔵 be/src/storage/lake/update_manager.cpp 22 36 61.11% [1216, 1218, 1221, 1236, 1237, 1238, 1239, 1240, 1242, 1245, 1246, 1247, 1248, 1348]
🔵 be/src/storage/lake/lake_primary_key_compaction_conflict_resolver.h 4 6 66.67% [64, 66]
🔵 be/src/storage/lake/tablet_parallel_compaction_manager.cpp 499 601 83.03% [61, 106, 112, 130, 207, 208, 253, 254, 265, 288, 289, 305, 306, 307, 329, 348, 445, 466, 467, 469, 470, 471, 472, 473, 474, 475, 476, 478, 479, 480, 482, 488, 493, 494, 495, 499, 502, 506, 507, 571, 584, 587, 591, 665, 738, 774, 816, 888, 890, 894, 895, 896, 930, 988, 993, 1000, 1002, 1003, 1006, 1008, 1009, 1011, 1012, 1013, 1016, 1018, 1019, 1021, 1022, 1023, 1024, 1026, 1027, 1030, 1033, 1036, 1039, 1040, 1049, 1050, 1051, 1063, 1068, 1069, 1071, 1072, 1073, 1074, 1077, 1078, 1079, 1080, 1083, 1087, 1088, 1089, 1090, 1092, 1093, 1094, 1097, 1122]
🔵 be/src/storage/lake/primary_key_compaction_policy.cpp 13 15 86.67% [102, 107]
🔵 be/src/storage/lake/pk_tablet_writer.cpp 18 20 90.00% [53, 192]
🔵 be/src/storage/lake/primary_key_compaction_policy.h 14 15 93.33% [58]
🔵 be/src/storage/lake/compaction_policy.cpp 21 22 95.45% [47]
🔵 be/src/storage/lake/compaction_scheduler.cpp 64 65 98.46% [334]
🔵 be/src/storage/lake/tablet_parallel_compaction_manager.h 4 4 100.00% []
🔵 be/src/storage/lake/compaction_task_context.cpp 2 2 100.00% []
🔵 be/src/storage/lake/compaction_task_context.h 5 5 100.00% []

@alvin-celerdata

@cursor review

@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no bugs!

Signed-off-by: meegoo <[email protected]>
@meegoo meegoo force-pushed the parallel_compaction branch from c759494 to e7d3503 Compare December 29, 2025 02:52
@sonarqubecloud

Quality Gate failed

Failed conditions
Maintainability Rating on New Code: B (required ≥ A)

See analysis details on SonarQube Cloud


