Reloader frequent restarts due to failed liveness probes #252

Open
@messiahUA

Description

I'm facing an issue with frequent restarts on v0.0.95.

The liveness endpoint (/metrics) response time can sometimes be in the 1-5 second range. Of course I can just increase the probe timeout (1 second by default), but that only hides the problem. I believe there is some inefficiency in the code that affects even /metrics responses.
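
For anyone hitting this in the meantime, the timeout can be raised with a standard Kubernetes `livenessProbe` stanza on the Reloader container. A minimal sketch; the port and threshold values below are illustrative assumptions, not Reloader defaults:

```yaml
livenessProbe:
  httpGet:
    path: /metrics     # the endpoint Reloader currently probes
    port: 9090         # assumed metrics port; check your deployment
  timeoutSeconds: 5    # default is 1s; raising it hides rather than fixes the slowness
  periodSeconds: 10
  failureThreshold: 3
```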

There are quite a lot of secrets and configmaps in the cluster, which might put a strain on it, but there are no CPU or memory limits, so it should take as much as it needs and keep working. I think /metrics should be served on its own thread, or better yet there should be dedicated /readiness and /liveness endpoints that genuinely check and report the status of the service. Otherwise it's unreliable to run in production, especially considering there is no HA: if the pod is restarted, I believe it loses any state about the watched resources and may miss triggering reloads.
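
To illustrate what the dedicated-endpoint suggestion could look like, here is a minimal Go sketch (not Reloader's actual code; the ports, paths, and the `ready` flag are all assumptions) that serves cheap /live and /ready handlers on a listener separate from the Prometheus /metrics handler, so a slow scrape cannot fail the liveness probe:

```go
package main

import (
	"net/http"
	"sync/atomic"
	"time"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// ready would flip to true once the controller has synced its caches
// (hypothetical; Reloader's internals may differ).
var ready atomic.Bool

func main() {
	// Metrics on a dedicated listener so an expensive scrape can never
	// stall the health endpoints.
	go func() {
		mux := http.NewServeMux()
		mux.Handle("/metrics", promhttp.Handler())
		_ = http.ListenAndServe(":9090", mux) // 9090 is an assumed port
	}()

	// Cheap health checks on their own port.
	mux := http.NewServeMux()
	mux.HandleFunc("/live", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK) // the process is up and serving requests
	})
	mux.HandleFunc("/ready", func(w http.ResponseWriter, _ *http.Request) {
		if ready.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable) // caches not synced yet
	})

	srv := &http.Server{Addr: ":8080", Handler: mux, ReadTimeout: 5 * time.Second}
	_ = srv.ListenAndServe()
}
```

With this kind of split, the liveness check stays constant-time regardless of how many secrets and configmaps the controller is watching, and readiness can honestly report whether the controller is actually in a state to trigger reloads.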
