Skip to content

Moves balancing code out of Manager class #5537

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

keith-turner
Copy link
Contributor

@keith-turner keith-turner commented May 6, 2025

The balancing code needs some improvements. Currently the code is entangled in the manager code in odd places (like some of the balancing code is un the Manager.StatusThread class). This change moves the balancing code to its own class w/o making any changes to the code (except fixing two small existing bugs in shouldCleanupMigration()).

Making this change as a first step in making the balancer more independent of other manager code and threads.

The balancing code needs some improvements.  Currently the code is
entangled in the manager code in odd places (like under some of the
balancing code is un the Manager.StatusThread class).  This change moves
the balancing code to its own class w/o making any changes to the code
(except fixing two small existing bugs in shouldCleanupMigration()).

Making this change as a first step in making the balancer more
independent of other manager code and threads.
try (
var tabletsMetadata = getContext()
.getAmple().readTablets().forLevel(dl).fetch(TabletMetadata.ColumnType.PREV_ROW,
TabletMetadata.ColumnType.LOCATION, TabletMetadata.ColumnType.MIGRATION)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the LOCATION column to this code because I noticed an error in the logs where shouldCleanupMigration() was throw an exception because the column was not fetched.

balancedNotifier.notifyAll();
}
} else if (totalMigrationsOut > 0) {
manager.nextEvent.event("Migrating %d more tablets, %d total", totalMigrationsOut,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the code in the class was copied as is from the manager and then some small changes were made to add the manager. prefix here. Did not change the functionality of any of the code though.

var tableState = getContext().getTableManager().getTableState(tabletMetadata.getTableId());
var migration = tabletMetadata.getMigration();
Preconditions.checkState(migration != null,
"This method should only be called if there is a migration");
return tableState == TableState.OFFLINE || !onlineTabletServers().contains(migration)
|| tabletMetadata.getLocation().getServerInstance().equals(migration);
|| (tabletMetadata.getLocation() != null
Copy link
Contributor Author

@keith-turner keith-turner May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the 2nd existing bug that was fixed in this change (the first was fetching the location column when calling this). Was seeing an NPE here in the logs.

Copy link
Contributor

@dlmarion dlmarion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should run ScanServerUpgrade11to12TestIT.testScanRefTableCreation with this change locally. It was hanging trying to scan the root and metadata tables because they were not hosted.

}

private void initializeBalancer() {
var localTabletBalancer =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably log which balancer class was created

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated in 705d58b

}

void propertyChanged(String property) {
String resolved = DeprecatedPropertyUtil.getReplacementName(property, (log, replacement) -> {});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there still an old deprecated property in 4.0? If so, should we be looking for the deprecated property in initializeBalancer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is probably no longer needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed in 705d58b

@keith-turner
Copy link
Contributor Author

You should run ScanServerUpgrade11to12TestIT.testScanRefTableCreation with this change locally. It was hanging trying to scan the root and metadata tables because they were not hosted.

I think that is the problem that needs the balancer to run in a separate thread. With this change was just attempting to clean up some tech debt in the way the balancer code is organized before making the threading changes. Hoping we can make a change to spin up three balancer threads and remove balancing from the status thread after this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants