Description
Issue
We are seeing many application crashes caused by health check timeouts (the check is an HTTP request). All other HTTP requests were being served normally right up to each crash, and all other metrics looked healthy.
Our health check endpoint is fast and contains no additional logic. We also increased the timeout to 20 seconds, but that did not help much.
Expected result
We are not sure why some health checks time out; CPU throttling is one possibility.
Whatever the cause, we expect the runtime to give the application another chance by running the health check again.
Current result
The application instance is restarted after a single health check failure.
Possible Fix
Add a failure threshold for health checks: only after the health check fails failureThreshold times in a row does the runtime consider the overall check failed and the container not healthy/live.
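To make the proposal concrete, here is a minimal sketch of the counting logic we have in mind. It is not taken from the runtime's code: the class name `HealthCheckTracker`, the method `record`, and the default threshold of 3 are all hypothetical and only illustrate the "N failures in a row" behaviour.

```python
from dataclasses import dataclass


@dataclass
class HealthCheckTracker:
    """Hypothetical helper that counts consecutive health check failures."""

    failure_threshold: int = 3  # assumed default, not from any real runtime
    consecutive_failures: int = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result.

        Returns True only when the number of failures in a row has reached
        failure_threshold, i.e. when the instance should be treated as
        unhealthy.
        """
        if check_passed:
            # Any successful check resets the streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold
```

With a threshold of 3, a single timeout followed by a successful check would reset the counter and the instance would keep running. Only three timeouts in a row would trigger a restart.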