-
Notifications
You must be signed in to change notification settings - Fork 730
fix: preallocate compaction task id range per loop and assert bounds #24654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR optimizes compaction task ID allocation by pre-allocating a range of IDs per loop iteration instead of fetching them individually. This reduces the number of SQL transactions needed when generating multiple compaction tasks.
Changes:
- Added
next_compaction_task_id_intervalfunction to allocate multiple task IDs at once - Modified compaction task selection loop to pre-allocate IDs and assert bounds
- Updated iceberg compaction manager to use the new helper function
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/meta/src/hummock/manager/sequence.rs | Added new function to allocate multiple compaction task IDs in a single call |
| src/meta/src/hummock/manager/compaction/mod.rs | Pre-allocates task ID range before loop and replaces per-iteration allocation with local increment |
| src/meta/src/manager/iceberg_compaction.rs | Refactored to use the existing helper function instead of direct sequence call |
|
|
||
| // StoredIdGenerator already implements ids pre-allocation by ID_PREALLOCATE_INTERVAL. | ||
| let task_id = next_compaction_task_id(&self.env).await?; | ||
| assert!(next_task_id < task_id_end); |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertion should check next_task_id < task_id_end BEFORE using task_id, but the assignment happens after the assert. If the assert fails, we'd still have incremented past the end. Consider moving the assignment of task_id after the assertion, or restructure to check the condition before both the assignment and increment.
| // Pre-allocate task ids for this loop to avoid per-group SQL transactions. | ||
| let task_id_capacity = compaction_groups.len() as u32; | ||
| let mut next_task_id = | ||
| next_compaction_task_id_interval(&self.env, task_id_capacity).await?; | ||
| let task_id_end = next_task_id + u64::from(task_id_capacity); |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-allocating exactly compaction_groups.len() task IDs may be insufficient. The loop can exit early if pick_tasks.len() >= max_select_count or if a group selector returns None, meaning not all allocated IDs will be used. However, if the loop continues past allocated capacity (e.g., if groups are processed multiple times or the logic changes), the assertion at line 410 will panic. Consider either: (1) allocating min(compaction_groups.len(), max_select_count) IDs, or (2) adding logic to handle exhaustion of pre-allocated IDs gracefully rather than panicking.
Preallocate compaction task ID range per loop to avoid per-group SQL transactions. Capacity is min(group_count, max_select_count) since each group consumes at most one task ID (trivial tasks share the same ID with normal task). Fallback to single ID allocation if pre-allocated range is exhausted.
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
As title
Checklist
Documentation
Release note