WatcherService thread stops running if querying .watches index takes more than 30 seconds #115157

Open
@masseyke

Problem Description

I artificially triggered this one, and it probably doesn't happen much in practice. I had put a breakpoint in TickerScheduleTriggerEngine::start to look at a completely unrelated problem, paused execution there for more than 30 seconds, and then let it resume. I saw the error below in the log, and Watcher was no longer running. It looks like the watcher service died and did not restart automatically.

[2024-10-18T13:04:18,603][ERROR][o.e.x.w.WatcherService   ] [runTask-0] error reloading watcher org.elasticsearch.ElasticsearchTimeoutException: java.util.concurrent.TimeoutException: Timeout waiting for task.
        at [email protected]/org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:68)
        at [email protected]/org.elasticsearch.action.support.PlainActionFuture.actionGet(PlainActionFuture.java:171)
        at [email protected]/org.elasticsearch.action.support.PlainActionFuture.actionGet(PlainActionFuture.java:165)
        at org.elasticsearch.xpack.watcher.WatcherService.loadWatches(WatcherService.java:337)
        at org.elasticsearch.xpack.watcher.WatcherService.reloadInner(WatcherService.java:268)
        at org.elasticsearch.xpack.watcher.WatcherService.lambda$reload$1(WatcherService.java:224)
        at org.elasticsearch.xpack.watcher.WatcherService$1.doRun(WatcherService.java:450)
        at [email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023)
        at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task.
        at [email protected]/org.elasticsearch.action.support.PlainActionFuture$Sync.get(PlainActionFuture.java:250)
        at [email protected]/org.elasticsearch.action.support.PlainActionFuture.get(PlainActionFuture.java:74)
        at [email protected]/org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:66)
        ... 11 more
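The trace shows WatcherService.loadWatches blocking in PlainActionFuture.actionGet, with FutureUtils.get turning the checked java.util.concurrent.TimeoutException into the unchecked ElasticsearchTimeoutException, which then propagates out of the reload task. The sketch below reproduces that failure mode with plain JDK classes only; it is not Elasticsearch code. The class TimedWaitDemo, the method loadWithTimeout, and the simulated slowSearch are hypothetical stand-ins, IllegalStateException stands in for ElasticsearchTimeoutException, and the delays are shortened from 30 seconds to milliseconds.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedWaitDemo {

    // Hypothetical stand-in for the .watches search: the result only arrives after delayMillis.
    static CompletableFuture<String> slowSearch(long delayMillis) {
        return CompletableFuture.supplyAsync(
            () -> "watches",
            CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS));
    }

    // Blocking wait with a fixed timeout, wrapping the checked TimeoutException in an
    // unchecked exception (analogous to, but not the same as, FutureUtils.get).
    static String loadWithTimeout(long delayMillis, long timeoutMillis) {
        try {
            return slowSearch(delayMillis).get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            throw new IllegalStateException("Timeout waiting for task.", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // The search finishes well within the timeout, so the load succeeds.
        System.out.println(loadWithTimeout(10, 1_000)); // prints "watches"
        try {
            // The search is "paused" longer than the timeout, as with the breakpoint.
            loadWithTimeout(1_000, 100);
        } catch (IllegalStateException e) {
            // Nothing here retries: if this ran inside the reload task, the task would simply end.
            System.out.println("reload failed: " + e.getMessage()); // prints "reload failed: Timeout waiting for task."
        }
    }
}
```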

I'm not sure what the best fix would be. We could restart the thread on failure, or we could just not use that timeout; I'm not sure why it's there.
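A minimal sketch of the first option, "restart on failure", assuming the reload can simply be re-run: each attempt runs on a ScheduledExecutorService, and a RuntimeException (such as a timeout) schedules another attempt instead of ending the work. RetryingReloader, reload, maxAttempts and retryDelayMillis are hypothetical names for illustration, not part of the WatcherService API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryingReloader implements AutoCloseable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final int maxAttempts;
    private final long retryDelayMillis;

    public RetryingReloader(int maxAttempts, long retryDelayMillis) {
        this.maxAttempts = maxAttempts;
        this.retryDelayMillis = retryDelayMillis;
    }

    // Completes with the attempt number that succeeded, or exceptionally after maxAttempts failures.
    public CompletableFuture<Integer> reload(Runnable reloadTask) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        scheduleAttempt(reloadTask, 1, 0, result);
        return result;
    }

    private void scheduleAttempt(Runnable reloadTask, int attemptNo, long delayMillis,
                                 CompletableFuture<Integer> result) {
        scheduler.schedule(() -> {
            try {
                reloadTask.run();
                result.complete(attemptNo);
            } catch (RuntimeException e) {
                if (attemptNo >= maxAttempts) {
                    result.completeExceptionally(e); // give up and surface the last failure
                } else {
                    // Retry after a delay rather than leaving the service stopped.
                    scheduleAttempt(reloadTask, attemptNo + 1, retryDelayMillis, result);
                }
            }
        }, delayMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Hypothetical reload that times out on its first two calls, then succeeds.
        Runnable flakyReload = () -> {
            if (calls.incrementAndGet() <= 2) {
                throw new IllegalStateException("Timeout waiting for task.");
            }
        };
        try (RetryingReloader reloader = new RetryingReloader(5, 50)) {
            int attempts = reloader.reload(flakyReload).get(5, TimeUnit.SECONDS);
            System.out.println("reloaded after " + attempts + " attempts"); // prints "reloaded after 3 attempts"
        }
    }
}
```

The second option, waiting with no timeout at all, avoids this failure but means a search that never completes would block the reload thread indefinitely, so a bounded number of retries is one possible middle ground.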
