- Program: dnsdist
- Issue type: Feature request
Short description
A new Prometheus metric that counts how often the status of a resolver has changed.
Usecase
For some reason we have a flapping resolver. The logs show:
Oct 21 10:48:09 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:10 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'
Oct 21 10:48:17 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:19 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'
Oct 21 10:48:34 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:36 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'
Oct 21 10:48:57 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:58 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'
Since each outage usually lasts just 1-2 seconds, it remains largely invisible when monitoring dnsdist_server_status.
We therefore propose adding two new counters to dnsdist's Prometheus metrics to make these issues visible to monitoring.
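To illustrate why a status gauge hides these flaps, here is a minimal sketch that samples the down intervals from the log excerpt above at a fixed scrape interval. The 15-second interval and the 10:48:00 start time are assumptions for illustration, not our actual monitoring configuration, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# (marked down, marked up) pairs copied from the log excerpt above.
DOWN_INTERVALS = [
    ("10:48:09", "10:48:10"),
    ("10:48:17", "10:48:19"),
    ("10:48:34", "10:48:36"),
    ("10:48:57", "10:48:58"),
]


def parse(ts):
    """Parse an HH:MM:SS string into a datetime (the date part is irrelevant here)."""
    return datetime.strptime(ts, "%H:%M:%S")


def status_at(t):
    """Return 0 while the backend is marked down and 1 otherwise, like a status gauge."""
    for start, end in DOWN_INTERVALS:
        if parse(start) <= t < parse(end):
            return 0
    return 1


def scrape(first, last, interval_seconds):
    """Sample the gauge every interval_seconds between first and last, inclusive."""
    samples = []
    t = parse(first)
    while t <= parse(last):
        samples.append((t.strftime("%H:%M:%S"), status_at(t)))
        t += timedelta(seconds=interval_seconds)
    return samples


# Assumed scrape schedule: every 15 seconds, starting at 10:48:00.
samples = scrape("10:48:00", "10:49:00", 15)
for ts, status in samples:
    print(ts, status)
```

With this assumed schedule none of the samples falls inside a down interval, so the gauge looks healthy for the whole minute. A monotonically increasing counter would not depend on the scrape happening at the right moment.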
Description
Given these events:
Oct 21 10:48:09 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:10 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'
Oct 21 10:48:17 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:18 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'
the new metrics would contain:
dnsdist_server_status_changes_total{server="109_70_100_136:53"} 4
dnsdist_server_status_down_seconds_total{server="109_70_100_136:53"} 2
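The intended semantics can be sketched by deriving both counters from the log lines above. This is only an illustration of the counting rule, not a proposal for how dnsdist should implement it internally: the function name, the regular expression, and the placeholder year (syslog timestamps carry none) are assumptions. The sketch counts every logged transition, including the first one, so four log lines yield four changes. Whether the first transition should count is a design decision for the real implementation.

```python
import re
from datetime import datetime

LOG_LINES = [
    "Oct 21 10:48:09 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'",
    "Oct 21 10:48:10 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'",
    "Oct 21 10:48:17 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'",
    "Oct 21 10:48:18 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'",
]

# Captures the syslog timestamp, the downstream address and the new state.
LINE_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ dnsdist\[\d+\]: "
    r"Marking downstream (\S+) as '(up|down)'$"
)


def compute_counters(lines, year=2000):
    """Return (changes, down_seconds) dicts keyed by downstream address.

    The year is a placeholder because syslog timestamps do not include one.
    """
    changes = {}
    down_seconds = {}
    down_since = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a status change line
        stamp, server, state = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        changes[server] = changes.get(server, 0) + 1
        down_seconds.setdefault(server, 0.0)
        if state == "down":
            down_since[server] = when
        elif server in down_since:
            # Backend came back up: add the length of the finished outage.
            down_seconds[server] += (when - down_since.pop(server)).total_seconds()
    return changes, down_seconds


changes, down_seconds = compute_counters(LOG_LINES)
for server in changes:
    # The label value is the raw address here; the real label format may differ.
    print(f'dnsdist_server_status_changes_total{{server="{server}"}} {changes[server]}')
    print(f'dnsdist_server_status_down_seconds_total{{server="{server}"}} {down_seconds[server]:g}')
```

Because both values only ever increase, a Prometheus query over their rate of change would reveal flapping even when every individual scrape sees the backend as up.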