
Increase majc priority for tablets over file size threshold #5026


Merged (8 commits) on Nov 8, 2024

Conversation

dlmarion
Contributor

CompactionManager.mainLoop runs in the TabletServer looking for tablets that need to be compacted. At some interval it calls CompactionService.submitCompaction for each hosted tablet. CompactionService.getCompactionPlan makes a CompactionPlan for the tablet and logs a warning if no CompactionPlan is created but the number of files is larger than TSERV_SCAN_MAX_OPENFILES. When no compactions are running for the tablet and no plan is calculated, DefaultCompactionPlanner.makePlan takes TABLE_FILE_MAX and TSERV_SCAN_MAX_OPENFILES into account and creates a system compaction that considers all of the files and calculates which ones need to be compacted to get below the limit. Finally, a priority is calculated by calling CompactionJobPrioritizer.createPriority. However, because this is a SYSTEM compaction it will have a lower priority than all current USER compactions, and its priority will still be based on the total number of files. Since TABLE_FILE_MAX is per-table, it is possible for two tablets from different tables to be queued such that the tablet that is over the threshold has a lower priority than the tablet that is not. This change modifies CompactionJobPrioritizer.createPriority to take into account whether the tablet is over the threshold and give it a higher priority.

Closes #4610
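The boost described above can be sketched roughly as follows. The class name, method signature, and base-priority scheme here are illustrative stand-ins, not the actual CompactionJobPrioritizer API.

```java
// Illustrative sketch of the priority boost; names and the base-priority
// scheme are hypothetical, not the real Accumulo code.
public class PrioritySketch {

    public enum Kind { SYSTEM, USER }

    public static short createPriority(Kind kind, int totalFiles, int maxFilesPerTablet) {
        // Tablets over the per-table file limit get the maximum priority,
        // so they outrank every ordinary USER compaction.
        if (totalFiles > maxFilesPerTablet) {
            return Short.MAX_VALUE;
        }
        // Otherwise USER jobs rank above SYSTEM jobs, and more files means
        // a higher priority within each kind.
        int base = (kind == Kind.USER ? 16_384 : 0) + Math.min(totalFiles, 16_383);
        return (short) base;
    }
}
```

With a scheme like this, a SYSTEM job for a tablet over its limit outranks a USER job for a tablet under its limit, which addresses the cross-table ordering problem described above.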

@dlmarion dlmarion added this to the 2.1.4 milestone Oct 31, 2024
@dlmarion dlmarion self-assigned this Oct 31, 2024
// Given that tablets with too many files cause several problems,
// boost their priority to the maximum allowed value.
if (condition == Condition.TABLET_OVER_SIZE) {
  return Short.MAX_VALUE;
}
Contributor Author (dlmarion):
Could possibly increment a metric for this condition, although I'm not sure how useful it would be. I think knowing that all of the external compactors and compaction threads in the tservers are fully utilized, along with a backlog of waiting compactions, is a more useful indication that more capacity is needed.

Contributor:
In your changes to the monitor, it's now scanning the metadata table periodically. In that scan it could compute the number of tablets over the max per table and display that on the monitor.
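The suggestion above could look something like this. The types are simplified, hypothetical stand-ins for what the monitor's periodic metadata scan would actually produce.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the monitor-side computation: during the periodic
// metadata scan, count tablets whose file count exceeds their table's limit.
public class MonitorCountSketch {

    public record TabletInfo(String tableId, int fileCount) {}

    public static Map<String, Long> tabletsOverMax(List<TabletInfo> tablets,
            Map<String, Integer> tableFileMax) {
        return tablets.stream()
            // A tablet counts only if its table has a limit and it exceeds it.
            .filter(t -> t.fileCount() > tableFileMax.getOrDefault(t.tableId(), Integer.MAX_VALUE))
            .collect(Collectors.groupingBy(TabletInfo::tableId, Collectors.counting()));
    }
}
```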

@keith-turner
Contributor

May still want to leave #4610 open after merging this. This change is nice in that it helps compactions run faster for tablets over the threshold. There is still the problem that scan servers cache the file list, so if a scan server caches the list of files of a tablet with too many files for 5 minutes, then that tablet cannot be scanned for that 5 minute period even if it does compact. Not sure of the best way to handle this; scan servers could possibly poll the tablet's files a bit more frequently when a tablet is in this condition.
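One way the polling idea might be sketched is below. The names, the cache API, and the shorter interval are all hypothetical; only the 5 minute figure comes from the comment above.

```java
// Hedged sketch: shorten the scan server's file-list cache TTL for tablets
// over the file limit, so a completed compaction becomes visible sooner.
public class AdaptiveTtlSketch {

    static final long NORMAL_TTL_MS = 5 * 60 * 1000;  // the 5 min cache noted above
    static final long OVER_LIMIT_TTL_MS = 30 * 1000;  // hypothetical faster poll

    public static long cacheTtlMs(int fileCount, int maxFilesPerTablet) {
        return fileCount > maxFilesPerTablet ? OVER_LIMIT_TTL_MS : NORMAL_TTL_MS;
    }
}
```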

@dlmarion
Contributor Author

dlmarion commented Nov 5, 2024

@keith-turner - I think I addressed all of your suggestions in the latest commit.

Range<Short> range = null;
Function<Range<Short>,Short> func = normalPriorityFunction;
if (Namespace.ACCUMULO.id() == nsId) {
// Handle system tables
Contributor:

There could be chop compactions when merging the metadata table, and these may end with a null range, causing an exception. Need to add handling for the CHOP kind. Will have to drop this in the 3.1 code.

Contributor Author:
In 74d66b2 I added code to treat CHOP as USER compaction kind.
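A minimal sketch of that mapping (the enum and method here are illustrative, not the exact code in 74d66b2):

```java
// Hedged sketch: fold CHOP into the USER range so chop compactions (e.g.
// during a metadata-table merge) match a priority range instead of
// falling through to a null range and an exception.
public class KindRangeSketch {

    public enum CompactionKind { SYSTEM, USER, CHOP, SELECTOR }

    public static CompactionKind normalize(CompactionKind kind) {
        return kind == CompactionKind.CHOP ? CompactionKind.USER : kind;
    }
}
```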

@dlmarion dlmarion merged commit ba9c376 into apache:2.1 Nov 8, 2024
8 checks passed
@dlmarion dlmarion deleted the 4610-increase-majc-prio branch November 8, 2024 19:55
dlmarion added a commit to dlmarion/accumulo that referenced this pull request Nov 12, 2024
The test was timing out because it never started running compactions
created by the test method. Instead, the compactor process was
running compactions created by the previous test method because
the previous test created a table with a lot of files, started a
user compaction, then cancelled the user compaction. The recent
changes in apache#5026 caused a bunch of system compactions to be
generated for the table. The two test methods share the same
compaction queue, so the compactor was busy running the system
compactions.

To fix this issue I backported a property added in apache#3955 that
makes the compactor cancel check method time configurable and
I deleted the table in the test method that created a lot of files.

Closes apache#5052
dlmarion added a commit that referenced this pull request Nov 14, 2024
Successfully merging this pull request may close these issues.

Tablet with lots of files may not be readable on scan servers for long periods of time.
3 participants