
Liveness and readiness probes prevent the pod from starting #104

Open
@kolesnikovae

Description

Currently, the liveness and readiness probes are configured with initialDelaySeconds set to 30s, which is already a fairly high value. However, after a container crash the Pyroscope server may need even longer to recover its storage (the duration is hard to estimate, but a minute or two is what we may expect), so the probes can fail and prevent the pod from starting.

A proper solution would be to separate the implementations of the readiness and liveness checks:

  • the liveness probe starts serving requests at the very beginning of server initialisation (before any other component)
  • the readiness probe starts serving requests only once all the components have finished initialisation
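
A minimal sketch of how the two checks could be separated, assuming a shared readiness flag that is flipped once initialisation completes. The `Health` type, its methods, and the `/livez` and `/readyz` paths are hypothetical names for illustration, not Pyroscope's actual API (the chart currently probes `/healthz` for both):

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// Health tracks whether server initialisation has finished.
// Hypothetical type for illustration; not part of Pyroscope.
type Health struct {
	ready atomic.Bool
}

// MarkReady is meant to be called once every component has initialised.
func (h *Health) MarkReady() { h.ready.Store(true) }

// LiveStatus reports 200 as soon as the process can answer HTTP at all.
func (h *Health) LiveStatus() int { return http.StatusOK }

// ReadyStatus reports 503 until MarkReady has been called.
func (h *Health) ReadyStatus() int {
	if h.ready.Load() {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// Handler exposes both checks; it would be started before any other
// component so that the liveness endpoint answers during storage recovery.
func (h *Health) Handler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/livez", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(h.LiveStatus())
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(h.ReadyStatus())
	})
	return mux
}

func main() {
	h := &Health{}
	fmt.Println(h.LiveStatus(), h.ReadyStatus()) // during initialisation: 200 503
	h.MarkReady()
	fmt.Println(h.LiveStatus(), h.ReadyStatus()) // after initialisation: 200 200
}
```

With something along these lines, the liveness probe could keep a short initial delay without restarting a pod that is still recovering its storage, while the readiness probe would hold traffic back until recovery is done.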

Increasing the default initialDelaySeconds further for the readiness probe might be undesirable, because it would introduce a noticeable unconditional delay between the server start and the moment it actually starts serving requests.

As a workaround, I think we may adjust the default settings so that the pod has at least 90s to finish initialisation, but is not prevented from handling requests if it manages to complete initialisation sooner:

readinessProbe:
  enabled: true
  httpGet:
    path: /healthz
    port: 4040
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 30
  failureThreshold: 10
  successThreshold: 1

# Despite the fact that the initial delay is 60 seconds, if the pod crashes
# after initialisation (this is the only realistic reason why the probe may fail),
# it will be restarted.
#
# Note that livenessProbe does not wait for readinessProbe to succeed. 
livenessProbe:
  enabled: true
  httpGet:
    path: /healthz
    port: 4040
  initialDelaySeconds: 60
  periodSeconds: 10
  timeoutSeconds: 30
  failureThreshold: 3
  successThreshold: 1

The current config:

readinessProbe:
  # -- Enable Pyroscope server readiness
  enabled: true
  httpGet:
    # -- Pyroscope server readiness check path
    path: /healthz
    # -- Pyroscope server readiness check port
    port: 4040
  # -- Pyroscope server readiness initial delay in seconds
  initialDelaySeconds: 30
  # -- Pyroscope server readiness check frequency in seconds
  periodSeconds: 5
  # -- Pyroscope server readiness check request timeout
  timeoutSeconds: 30
  # -- Pyroscope server readiness check failure threshold count
  failureThreshold: 3
  # -- Pyroscope server readiness check success threshold count
  successThreshold: 1

livenessProbe:
  # -- Enable Pyroscope server liveness
  enabled: true
  httpGet:
    # -- Pyroscope server liveness check path
    path: /healthz
    # -- Pyroscope server liveness check port
    port: 4040
  # -- Pyroscope server liveness check initial delay in seconds
  initialDelaySeconds: 30
  # -- Pyroscope server liveness check frequency in seconds
  periodSeconds: 15
  # -- Pyroscope server liveness check request timeout
  timeoutSeconds: 30
  # -- Pyroscope server liveness check failure threshold
  failureThreshold: 3
  # -- Pyroscope server liveness check success threshold
  successThreshold: 1
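
As a rough check on the numbers above, the startup budget of a probe can be estimated as initialDelaySeconds + failureThreshold × periodSeconds. This is only an approximation: it ignores timeoutSeconds and kubelet scheduling jitter, and the actual threshold may be reached up to one period earlier. The `Probe` type and `BudgetSeconds` method below are hypothetical helpers, not part of the chart:

```go
package main

import "fmt"

// Probe mirrors the three timing fields used in the YAML above.
// Hypothetical helper type for illustration only.
type Probe struct {
	InitialDelaySeconds int
	PeriodSeconds       int
	FailureThreshold    int
}

// BudgetSeconds returns initialDelaySeconds + failureThreshold*periodSeconds,
// a rough estimate of how long a container may keep failing the probe
// after start before the failure threshold is exhausted.
func (p Probe) BudgetSeconds() int {
	return p.InitialDelaySeconds + p.FailureThreshold*p.PeriodSeconds
}

func main() {
	currentLiveness := Probe{InitialDelaySeconds: 30, PeriodSeconds: 15, FailureThreshold: 3}
	proposedLiveness := Probe{InitialDelaySeconds: 60, PeriodSeconds: 10, FailureThreshold: 3}
	proposedReadiness := Probe{InitialDelaySeconds: 5, PeriodSeconds: 10, FailureThreshold: 10}

	fmt.Println("current liveness budget:", currentLiveness.BudgetSeconds())     // 75
	fmt.Println("proposed liveness budget:", proposedLiveness.BudgetSeconds())   // 90
	fmt.Println("proposed readiness budget:", proposedReadiness.BudgetSeconds()) // 105
}
```

Under this estimate the proposed liveness settings give the 90s mentioned above, compared with roughly 75s for the current defaults, while the readiness probe can mark the pod ready from about 5s onward.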
