Description
I'm facing an issue with frequent restarts on v0.0.95.
The liveness endpoint (/metrics) sometimes takes 1-5 seconds to respond. Of course I can just increase the probe timeout (which is 1 second by default), but that only hides the problem. I believe there is some inefficiency in the code that affects even /metrics responses.
There are quite a lot of secrets and configmaps in the cluster, so that might put some strain on the service, but there are no CPU or memory limits, so it should be able to take as much as it needs and keep working. I think /metrics should be served from its own thread, or even better, there should be dedicated /readiness and /liveness endpoints that actually check and accurately report the status of the service (roughly as sketched below). Otherwise it is unreliable to run in production, especially considering there is no HA: if the pod is restarted, I believe it will lose any information about the resources and may miss triggering reloads.
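A minimal sketch of what I mean, assuming the service is written in Go (the paths, ports, and readiness flag here are placeholders I made up, not the project's actual code): the probe handlers run on their own listener and do no heavy work, so a slow /metrics scrape can never cause the kubelet to kill the pod.

```go
package main

import (
	"log"
	"net/http"
	"sync/atomic"
	"time"
)

// ready flips to true once startup work (e.g. the initial listing of secrets
// and configmaps) has finished; the readiness handler only reads the flag.
var ready atomic.Bool

func main() {
	// Hypothetical startup work standing in for the initial cache sync.
	go func() {
		time.Sleep(2 * time.Second)
		ready.Store(true)
	}()

	// Probe endpoints get their own mux and listener, so responding to the
	// kubelet is cheap and never blocks behind metrics or reconcile work.
	probes := http.NewServeMux()
	probes.HandleFunc("/liveness", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK) // the process is up and serving HTTP
	})
	probes.HandleFunc("/readiness", func(w http.ResponseWriter, r *http.Request) {
		if ready.Load() {
			w.WriteHeader(http.StatusOK)
		} else {
			w.WriteHeader(http.StatusServiceUnavailable)
		}
	})
	go func() {
		log.Fatal(http.ListenAndServe(":8081", probes)) // dedicated probe port
	}()

	// Metrics (and anything else that can be slow) stay on the existing port.
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("# placeholder for the real metrics handler\n"))
	})
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```

With something like this, liveness only answers "is the process alive", while readiness reflects whether the initial sync has completed, and the probes no longer depend on how long /metrics takes.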