Allow failure threshold for application health check #3435

Open
@zuesxiao

Description

Issue

We observed many application crashes caused by health check timeouts (the check is an HTTP request). Other HTTP requests were being served normally right before each crash, and all other metrics looked good.

Our health check endpoint is fast and contains no other logic. We also increased the timeout to 20 seconds, but that did not help much.

Expected result

We are not sure why some health checks time out; CPU throttling is one possibility.
Either way, we expect the runtime to give the application another chance by running another health check.

Current result

The application instance is restarted after a single health check failure.

Possible Fix

Add a failure threshold for the health check: only after the health check fails failureThreshold times in a row does the runtime consider the overall check failed and the container not healthy/live.
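
To make the proposal concrete, here is a minimal sketch of the counting logic, not the runtime's actual implementation. The class name `FailureThresholdTracker`, the `record` method, and the default of 3 are all hypothetical and chosen only for illustration; only the idea of counting consecutive failures against `failureThreshold` comes from this issue.

```python
from dataclasses import dataclass


@dataclass
class FailureThresholdTracker:
    """Hypothetical helper: counts consecutive health check failures."""

    failure_threshold: int = 3  # assumed default for illustration only
    consecutive_failures: int = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result.

        Returns True while the instance should still be treated as healthy,
        and False once failures in a row reach the threshold.
        """
        if check_passed:
            # Any successful check resets the streak.
            self.consecutive_failures = 0
            return True
        self.consecutive_failures += 1
        return self.consecutive_failures < self.failure_threshold
```

With `failure_threshold=1` this reproduces the current behavior (one failed check marks the instance unhealthy). With a higher value, an isolated timeout is tolerated as long as a later check succeeds before the threshold is reached.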
