Description
Some OpenSearch queries are timing out before returning, but we don't have a good way of measuring the impact of this. The default timeout is currently 10s, and it could be that this is actually a perfectly fine value and the failing queries complete in a fraction of that time when rerun.
Indeed, it might even make sense to reduce the timeout. For example, if most queries return in <1s, including the rerun failed ones, then it might be better to time out sooner and just retry until success.
It could also be that many queries are pushing 10s and raising the timeout a little is a sensible idea that would prevent most failures.
Basically, we need a way to measure query time and failure rate so that we can make an informed decision instead of a guess. Adding some instrumentation and recording metrics here for a couple of runs would be very useful.
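
As a starting point for discussion, here is a minimal sketch of what the instrumentation could look like. It is not tied to any particular OpenSearch client: `QueryMetrics`, `timed_query`, and `run_query` are hypothetical names, and `run_query` stands in for whatever callable currently issues the query. The sketch assumes a timeout surfaces as a raised exception.

```python
import math
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class QueryMetrics:
    """Hypothetical in-memory collector for per-attempt query timings."""

    durations: List[float] = field(default_factory=list)         # seconds, successful attempts
    failed_durations: List[float] = field(default_factory=list)  # seconds, failed attempts

    @property
    def attempts(self) -> int:
        return len(self.durations) + len(self.failed_durations)

    @property
    def failure_rate(self) -> float:
        # Fraction of attempts that raised; 0.0 when nothing was recorded.
        return len(self.failed_durations) / self.attempts if self.attempts else 0.0

    def percentile(self, p: float) -> Optional[float]:
        """Nearest-rank percentile (0 < p <= 100) of successful durations."""
        if not self.durations:
            return None
        ordered = sorted(self.durations)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]


def timed_query(
    run_query: Callable[[], Any],
    metrics: QueryMetrics,
    clock: Callable[[], float] = time.perf_counter,
) -> Any:
    """Run one query attempt and record how long it took.

    Assumption: a timeout (or any other failure) is reported by
    run_query raising an exception, which is re-raised unchanged.
    """
    start = clock()
    try:
        result = run_query()
    except Exception:
        metrics.failed_durations.append(clock() - start)
        raise
    metrics.durations.append(clock() - start)
    return result
```

After a couple of runs, `metrics.failure_rate` together with `metrics.percentile(50)` and `metrics.percentile(95)` should indicate whether most queries finish well under 10s (which would favour a shorter timeout plus retries) or cluster near it (which would favour raising the timeout).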