-
Notifications
You must be signed in to change notification settings - Fork 313
fix: some replicas never start manaul compact after zero o'clock #1556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
If the current task has been discarded, it will be triggered in the next config_sync (the invoke link is Is the |
This is exactly the problem. In
If the running time of this function is greater than 0 o'clock, trigger_time will be counted to the next day of the expect day. |
I see, thanks for your clarifying! |
dsn::tasking::enqueue(LPC_MANUAL_COMPACT, | ||
&_app->_tracker, | ||
[this, options]() { | ||
_pfc_manual_compact_enqueue_count->decrement(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to increment before enqueue too?
|
||
// bug fix : https://github.com/apache/incubator-pegasus/issues/1479 | ||
// now_timestamp return dsn_now_ms() | ||
int loop_enqueue_time = now_timestamp() + 60 * 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The execute time is 60 seconds later, the enqueue time is now, right?
}, | ||
0, | ||
std::chrono::seconds(60)); | ||
LOG_INFO_PREFIX("retry 60 seconds later,now task enqueue time({})ms", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question is: when to give up retrying?
Suppose a rare case, there are 24 replicas of a table on a server, each one cost more than 1 hour to manual compact, will the queue increase infinity?
What problem does this PR solve?
#1479
What is changed and how does it work?
In the existing logic, if the number of replicas for a particular table undergoing manual_compact on the current node exceeds the limit, the LPC_MANUAL_COMPACT task will be discarded once it enters the task queue. In the new logic, the task will be delayed by 60 seconds before being reinserted into the queue.
In the existing logic, the logic that filters whether a replica should be added to the queue can prevent tasks that have already undergone compaction on the same day from repeatedly entering the queue. Therefore, a simple modification to the task dequeue logic can ensure that replicas that should undergo compaction will not unexpectedly skip compaction due to the calculation at zero o'clock.
Checklist
Tests
I create a pegasus app,and set manual compact time to 23:59. So that some replica have to do manual compact after zero o'clock.
I got result like this:
As shown in the above log, the adjusted compaction results are as expected. Replicas that had compact operations scheduled after midnight were successfully processed. Furthermore, there were no replicas that underwent compaction multiple times, ensuring a non-repetitive compaction process.