Startup and Liveness Probes

Startup and Liveness Probes are tools you can add to a kubernetes deployment which tell Kubernetes that the node is actually ready. These are most highly relevant in a Kubernetes Service where you use a group of pods to serve responses over HTTP.

In the case of LLM inference, a service to serve the LLM can come online but it is not ready to start serving requests immediately because it first has to load the model weights. In this case, kubernetes will see that the service is ready because the container has started, but if you send a request to it while the model is loading it will error because it is not actually ready to serve. Hence, we add probes.

Startup Probe

Given the above, a startup probe is a way to ensure Kubernetes will not route traffic to a container until the startup probe returns success. In the case of vLLM for example, vLLM exposes a /health endpoint that can be queried that will simply return a 200 OK status when the model has been loaded and vLLM is ready to serve requests.

A key criteria of a startup probe is that it will run until it succeeds and then not run anymore. Success criteria is defined in the configuration.

Liveness Probe

A liveness probe is very similar to a startup probe, except that it runs continuously in the background to make sure the service is still alive. This is often run much less frequently as a "pulse".

Parameter Description

Both startup and liveness probes in Corrino utilize the same parameters, simplifying the interface quite a bit. These can be added to any service to define when the service is ready, and if it is still alive:

There is an important consideration to make for the startup probe. If initialization of your application is long (for example you are loading a very large model that takes 30 minutes to load), then it is important to set an initial_delay_seconds that reflects that, or set a very high failure_threshold, or set a long period_seconds such that the time between checks is sufficient.

Param	Description	Example
failure_threshold	Allowable failures until container reports error	30
endpoint_path	Endpoint to query to check for readiness / liveness	/health
port	Port k8s uses to query `endpoint_path`	8000
scheme	protocol to use when querying	HTTP or HTTPS
initial_delay_seconds	Time before probe begins its testing	60
period_seconds	Amount of time between subsequent checks	2
success_threshold	Number of successful queries before probe returns success	1
timeout_seconds	Number of seconds after which probe times out	1

Examples

The examples below represent the defaults used in the control plane. These are not required, but are recommended for services with any kind of startup time (like LLM inference).

Minimum Example to Include a Probe

{
    "recipe_startup_probe_params": {
        "endpoint_path": "/health"
    },
    "recipe_liveness_probe_params": {
        "endpoint_path": "/health"
    }
}

Example with Control Plane Defaults

{
    "recipe_startup_probe_params": {
        "failure_threshold": 30,
        "endpoint_path": "/health",
        "port": 8000,
        "scheme": "HTTP",
        "initial_delay_seconds": 10,
        "period_seconds": 2,
        "success_threshold": 1,
        "timeout_seconds": 1
    },
    "recipe_liveness_probe_params": {
        "failure_threshold": 3,
        "endpoint_path": "/health",
        "port": 8000,
        "scheme": "HTTP",
        "initial_delay_seconds": 600,
        "period_seconds": 600,
        "success_threshold": 1,
        "timeout_seconds": 1
    }
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Startup and Liveness Probes

Startup Probe

Liveness Probe

Parameter Description

Examples

Minimum Example to Include a Probe

Example with Control Plane Defaults

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Startup and Liveness Probes

Startup Probe

Liveness Probe

Parameter Description

Examples

Minimum Example to Include a Probe

Example with Control Plane Defaults