Description
What version of nebula are you using?
1.7.2
What operating system are you using?
Linux
Describe the Bug
We are now monitoring our certificate expiries thanks to the new feature built straight into the Prometheus metrics. In addition to this, we have set up monitoring rules which let us know if the ttl_seconds falls below a certain threshold.
With this monitoring in place, we noticed an issue whenever a nebula client is restarted: directly after boot, the nebula_certificate_ttl_seconds metric reports 0 until the certificate itself is fully initialized within Nebula. This has the unfortunate side effect of firing our monitors every time a nebula client is restarted. The first scrape below was taken right after a restart, the second once the certificate had been loaded:
~# curl -s 127.0.0.1:8090/metrics | grep cert
# HELP nebula_certificate_ttl_seconds certificate.ttl_seconds
# TYPE nebula_certificate_ttl_seconds gauge
nebula_certificate_ttl_seconds 0
~# curl -s 127.0.0.1:8090/metrics | grep cert
# HELP nebula_certificate_ttl_seconds certificate.ttl_seconds
# TYPE nebula_certificate_ttl_seconds gauge
nebula_certificate_ttl_seconds 60803
It seems to me it would be better not to emit this metric at all while the value it provides is inaccurate. Prometheus will use the last value scraped until a new value is provided to update it.
Logs from affected hosts
No response
Config files from affected hosts
No response