Problem Description
I triggered this artificially, so it probably doesn't happen much in practice. I had put a breakpoint in TickerScheduleTriggerEngine::start to look into a completely unrelated problem. I paused execution there for more than 30 seconds and then let it resume. This error then appeared in the log, and Watcher was no longer running. It looks like the watcher service died and did not restart automatically.
[2024-10-18T13:04:18,603][ERROR][o.e.x.w.WatcherService ] [runTask-0] error reloading watcher
org.elasticsearch.ElasticsearchTimeoutException: java.util.concurrent.TimeoutException: Timeout waiting for task.
at [email protected]/org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:68)
at [email protected]/org.elasticsearch.action.support.PlainActionFuture.actionGet(PlainActionFuture.java:171)
at [email protected]/org.elasticsearch.action.support.PlainActionFuture.actionGet(PlainActionFuture.java:165)
at org.elasticsearch.xpack.watcher.WatcherService.loadWatches(WatcherService.java:337)
at org.elasticsearch.xpack.watcher.WatcherService.reloadInner(WatcherService.java:268)
at org.elasticsearch.xpack.watcher.WatcherService.lambda$reload$1(WatcherService.java:224)
at org.elasticsearch.xpack.watcher.WatcherService$1.doRun(WatcherService.java:450)
at [email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023)
at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task.
at [email protected]/org.elasticsearch.action.support.PlainActionFuture$Sync.get(PlainActionFuture.java:250)
at [email protected]/org.elasticsearch.action.support.PlainActionFuture.get(PlainActionFuture.java:74)
at [email protected]/org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:66)
... 11 more
I'm not sure what the best fix would be. We could restart the thread on failure, or we could drop that timeout altogether; I'm not sure why it's there.
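To make the first option concrete, here is a rough sketch of retrying a timed-out load instead of giving up after one attempt. It is not Elasticsearch code: `ReloadRetrySketch` and `loadWithRetry` are hypothetical names, and a plain `CompletableFuture` stands in for the `PlainActionFuture` that `WatcherService.loadWatches` waits on. The timeout and attempt count are arbitrary illustration values.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch, not part of Elasticsearch.
class ReloadRetrySketch {

    /**
     * Starts a fresh attempt via the supplier, waits up to timeoutMillis for it,
     * and starts another attempt on timeout, up to maxAttempts in total.
     * Rethrows the last TimeoutException if every attempt times out.
     */
    static <T> T loadWithRetry(Supplier<CompletableFuture<T>> attempt,
                               long timeoutMillis,
                               int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TimeoutException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            CompletableFuture<T> future = attempt.get();
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Abandon the stalled attempt and try again rather than dying.
                future.cancel(true);
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated load: the first attempt never completes (like the paused
        // thread in this report), the second completes immediately.
        Supplier<CompletableFuture<String>> simulatedLoad = () -> {
            if (calls.incrementAndGet() == 1) {
                return new CompletableFuture<String>();
            }
            return CompletableFuture.completedFuture("loaded");
        };
        String result = ReloadRetrySketch.<String>loadWithRetry(simulatedLoad, 50, 3);
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

Whether a retry like this is safe for the real reload path depends on what state a timed-out reload leaves behind, which I have not checked.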