-
Notifications
You must be signed in to change notification settings - Fork 460
Fix race condition in Manager.partitionMigrations #5531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
@@ -638,6 +638,43 @@ public void run() { | |||||||||
*/ | ||||||||||
private Map<DataLevel,Set<KeyExtent>> partitionMigrations() { | ||||||||||
final Map<DataLevel,Set<KeyExtent>> partitionedMigrations = new EnumMap<>(DataLevel.class); | ||||||||||
|
||||||||||
// Don't try to check migrations if the Root and Metadata | ||||||||||
// tables are not hosted. | ||||||||||
boolean skipMigrationCheck = false; | ||||||||||
|
||||||||||
if (watchers.size() != 3) { | ||||||||||
log.debug("Skipping migration check, not all TabletGroupWatchers are started"); | ||||||||||
skipMigrationCheck = true; | ||||||||||
} else { | ||||||||||
for (TabletGroupWatcher watcher : watchers) { | ||||||||||
if (!watcher.isAlive()) { | ||||||||||
log.debug("Skipping migration check, not all TabletGroupWatchers are started"); | ||||||||||
skipMigrationCheck = true; | ||||||||||
break; | ||||||||||
} | ||||||||||
Comment on lines
+646
to
+655
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I had a problem with one of the ITs in this case where the TGWs wouldn't be started and it would hang on the scan. It hung every time then suddenly after a few changes it stopped hanging, so I assumed it was fixed. Looking back, I should have still been checking that the TGWs were started... This is a good catch and looks good to me |
||||||||||
final DataLevel level = watcher.getLevel(); | ||||||||||
if (level == DataLevel.ROOT || level == DataLevel.METADATA) { | ||||||||||
final TableId tid = level == DataLevel.ROOT ? SystemTables.ROOT.tableId() | ||||||||||
: SystemTables.METADATA.tableId(); | ||||||||||
Comment on lines
+657
to
+659
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This use of accumulo/core/src/main/java/org/apache/accumulo/core/metadata/schema/Ample.java Lines 82 to 85 in f676217
|
||||||||||
final TableCounts count = watcher.getStats(tid); | ||||||||||
if (count.hosted() == 0 || count.assigned() > 0 || count.assignedToDeadServers() > 0 | ||||||||||
|| count.suspended() > 0 || count.unassigned() > 0) { | ||||||||||
log.debug("Table {} has tablets that are not hosted, {}", tid, count); | ||||||||||
log.debug("Skipping migration check due to unhosted system tables"); | ||||||||||
skipMigrationCheck = true; | ||||||||||
break; | ||||||||||
} | ||||||||||
} | ||||||||||
} | ||||||||||
} | ||||||||||
if (skipMigrationCheck) { | ||||||||||
partitionedMigrations.put(DataLevel.ROOT, new HashSet<>()); | ||||||||||
partitionedMigrations.put(DataLevel.METADATA, new HashSet<>()); | ||||||||||
partitionedMigrations.put(DataLevel.USER, new HashSet<>()); | ||||||||||
return partitionedMigrations; | ||||||||||
} | ||||||||||
|
||||||||||
for (DataLevel dl : DataLevel.values()) { | ||||||||||
Set<KeyExtent> extents = new HashSet<>(); | ||||||||||
// prev row needed for the extent | ||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking we want to do this check everywhere we scan for migrations. This is done in several places in Manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better to structure the code such that it ok if the thread gets stuck trying to read migrations.