Open
Labels
stale: The PR/Issue does not have recent activities and will be closed automatically
type-bug: This issue is about a bug
Description
Alluxio Version:
2.8.1
Describe the bug
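blockReinit(), the method that blocks reinitialization, can be invoked from other methods, e.g. getCachedWorkers(). However, getCachedWorkers() holds the FileSystemContext lock while it calls blockReinit(), and reinit() also needs the FileSystemContext lock (in closeContext()). The two threads can therefore deadlock: one holds the FileSystemContext lock and waits on the reinit latch, while the other is in the middle of reinit and waits for the FileSystemContext lock.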
public void reinit(boolean updateClusterConf, boolean updatePathConf)
    throws UnavailableException, IOException {
  try (Closeable r = mReinitializer.allow()) {
    InetSocketAddress masterAddr;
    try {
      masterAddr = getMasterAddress();
    } catch (IOException e) {
      throw new UnavailableException("Failed to get master address during reinitialization", e);
    }
    try {
      getClientContext().loadConf(masterAddr, updateClusterConf, updatePathConf);
    } catch (AlluxioStatusException e) {
      // Failed to load configuration from meta master, maybe master is being restarted,
      // or their is a temporary network problem, give up reinitialization. The heartbeat thread
      // will try to reinitialize in the next heartbeat.
      throw new UnavailableException(String.format("Failed to load configuration from "
          + "meta master (%s) during reinitialization", masterAddr), e);
    }
    LOG.debug("Reinitializing FileSystemContext: update cluster conf: {}, update path conf:"
        + " {}", updateClusterConf, updateClusterConf);
    closeContext();
    ReconfigurableRegistry.update();
    initContext(getClientContext(), MasterInquireClient.Factory.create(getClusterConf(),
        getClientContext().getUserState()));
    LOG.debug("FileSystemContext re-initialized");
    mReinitializer.onSuccess();
  }
}
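The lock-order inversion described above can be sketched outside Alluxio. This is a minimal illustration, not Alluxio code: `ReinitDeadlockSketch`, `contextMonitor`, and `reinitGate` are hypothetical names, a `ReentrantLock` stands in for the `synchronized` FileSystemContext monitor, and a `ReentrantReadWriteLock` stands in for `CountingLatch` (read lock for blockReinit(), write lock for mReinitializer.allow()), which is an assumption about its behavior. Timed `tryLock` calls are used so the program reports the stall instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReinitDeadlockSketch {
  /**
   * Runs one "task" thread and one "heartbeat" thread that take the two locks
   * in opposite order. Returns {taskGotGate, heartbeatGotMonitor}.
   */
  static boolean[] attempt(long timeoutMs) throws InterruptedException {
    // Hypothetical stand-in for the FileSystemContext monitor.
    ReentrantLock contextMonitor = new ReentrantLock();
    // Hypothetical stand-in for the reinit latch: read = block reinit, write = allow reinit.
    ReentrantReadWriteLock reinitGate = new ReentrantReadWriteLock();
    CountDownLatch bothHoldFirst = new CountDownLatch(2);
    CountDownLatch bothTried = new CountDownLatch(2);
    boolean[] acquired = new boolean[2];

    // Mirrors getCachedWorkers(): monitor first, then the reinit blocker.
    Thread task = new Thread(() -> {
      contextMonitor.lock();
      try {
        bothHoldFirst.countDown();
        bothHoldFirst.await();
        acquired[0] = reinitGate.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        if (acquired[0]) {
          reinitGate.readLock().unlock();
        }
        // Keep the first lock until the other thread has also finished trying.
        bothTried.countDown();
        bothTried.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        contextMonitor.unlock();
      }
    });

    // Mirrors reinit(): reinit gate first, then the monitor (closeContext()).
    Thread heartbeat = new Thread(() -> {
      reinitGate.writeLock().lock();
      try {
        bothHoldFirst.countDown();
        bothHoldFirst.await();
        acquired[1] = contextMonitor.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        if (acquired[1]) {
          contextMonitor.unlock();
        }
        bothTried.countDown();
        bothTried.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        reinitGate.writeLock().unlock();
      }
    });

    task.start();
    heartbeat.start();
    task.join();
    heartbeat.join();
    return acquired;
  }

  public static void main(String[] args) throws InterruptedException {
    boolean[] result = attempt(200);
    System.out.println("task acquired reinit gate: " + result[0]);
    System.out.println("heartbeat acquired context monitor: " + result[1]);
  }
}
```

In this sketch neither thread obtains its second lock within the timeout. With untimed acquisition, as in the real code paths, both threads would wait forever.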
jstack:
"task-execution-service-5" #1131 daemon prio=5 os_prio=0 tid=0x00007f9938008800 nid=0x153bc waiting on condition [0x00007fa21fdfe000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006c011b7c0> (a alluxio.concurrent.CountingLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at alluxio.concurrent.CountingLatch.inc(CountingLatch.java:108)
at alluxio.client.file.FileSystemContextReinitializer$ReinitBlockerResource.<init>(FileSystemContextReinitializer.java:104)
at alluxio.client.file.FileSystemContextReinitializer.block(FileSystemContextReinitializer.java:155)
at alluxio.client.file.FileSystemContext.blockReinit(FileSystemContext.java:350)
at alluxio.client.file.FileSystemContext.acquireBlockMasterClientResource(FileSystemContext.java:477)
at alluxio.client.file.FileSystemContext.getAllWorkers(FileSystemContext.java:650)
at alluxio.client.file.FileSystemContext.getCachedWorkers(FileSystemContext.java:636)
- locked <0x00000006c03b8680> (a alluxio.client.file.FileSystemContext)
at alluxio.job.util.JobUtils.loadBlock(JobUtils.java:128)
at alluxio.job.plan.load.LoadDefinition.runTask(LoadDefinition.java:189)
at alluxio.job.plan.load.LoadDefinition.runTask(LoadDefinition.java:54)
at alluxio.job.plan.batch.BatchedJobDefinition.runTask(BatchedJobDefinition.java:81)
at alluxio.job.plan.batch.BatchedJobDefinition.runTask(BatchedJobDefinition.java:42)
at alluxio.worker.job.task.TaskExecutor.run(TaskExecutor.java:88)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
"config-hash-master-heartbeat-0" #292 daemon prio=5 os_prio=0 tid=0x00007fa46d7ae800 nid=0x145af waiting for monitor entry [0x00007f9e3aeee000]
java.lang.Thread.State: BLOCKED (on object monitor)
at alluxio.client.file.FileSystemContext.closeContext(FileSystemContext.java:298)
- waiting to lock <0x00000006c03b8680> (a alluxio.client.file.FileSystemContext)
at alluxio.client.file.FileSystemContext.reinit(FileSystemContext.java:393)
at alluxio.client.file.ConfigHashSync.heartbeat(ConfigHashSync.java:94)
- locked <0x00000006c03cf088> (a alluxio.client.file.ConfigHashSync)
at alluxio.client.file.FileSystemContextReinitializer.lambda$new$0(FileSystemContextReinitializer.java:69)
at alluxio.client.file.FileSystemContextReinitializer$$Lambda$98/1029472813.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
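The two dumps show opposite acquisition orders: the task thread locks FileSystemContext and then waits on the latch, while the heartbeat thread is inside reinit and then waits for FileSystemContext. One possible way to avoid this, sketched here under assumptions and not taken from any Alluxio patch, is to make both paths acquire the reinit gate before the context monitor. `OrderedContextSketch`, `reinitGate`, `monitor`, and `generation` are hypothetical names, and a `ReentrantReadWriteLock` again stands in for `CountingLatch`.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class OrderedContextSketch {
  // Hypothetical stand-in for the reinit latch: read = block reinit, write = allow reinit.
  private final ReentrantReadWriteLock reinitGate = new ReentrantReadWriteLock();
  // Hypothetical stand-in for the FileSystemContext monitor.
  private final Object monitor = new Object();
  private int generation = 0;

  /** Reader path: take the reinit blocker BEFORE entering the monitor. */
  int getCachedWorkers() {
    reinitGate.readLock().lock();
    try {
      synchronized (monitor) {
        return generation;
      }
    } finally {
      reinitGate.readLock().unlock();
    }
  }

  /** Reinit path: same order, gate first and monitor second. */
  void reinit() {
    reinitGate.writeLock().lock();
    try {
      synchronized (monitor) {
        generation++;
      }
    } finally {
      reinitGate.writeLock().unlock();
    }
  }

  /** Runs both paths concurrently; returns true if both threads finish in time. */
  static boolean runConcurrently(int iterations) throws InterruptedException {
    OrderedContextSketch ctx = new OrderedContextSketch();
    Thread reader = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        ctx.getCachedWorkers();
      }
    });
    Thread reinitializer = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        ctx.reinit();
      }
    });
    // Daemon threads so a stall would not keep the JVM alive.
    reader.setDaemon(true);
    reinitializer.setDaemon(true);
    reader.start();
    reinitializer.start();
    reader.join(5000);
    reinitializer.join(5000);
    return !reader.isAlive() && !reinitializer.isAlive();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("completed without stalling: " + runConcurrently(10000));
  }
}
```

Because both paths agree on the order, a thread that holds the monitor never has to wait for the gate, so the cycle seen in the dumps cannot form. Whether this ordering fits the real FileSystemContext, where blockReinit() is reached through acquireBlockMasterClientResource(), would need to be checked against the actual code.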