metrics for downstream state changes and total downtime #12113

@appliedprivacy

  • Program: dnsdist
  • Issue type: Feature request

Short description

New Prometheus metrics: a counter of how often the status of a resolver changed, and a counter of its total downtime.

Usecase

For some reason we have a flapping resolver. The logs show:

Oct 21 10:48:09 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:10 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'
Oct 21 10:48:17 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:19 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'
Oct 21 10:48:34 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:36 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'
Oct 21 10:48:57 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:58 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'

Since each outage usually lasts just 1-2 seconds, it remains largely invisible when monitoring dnsdist_server_status.
We therefore propose adding two new counters to dnsdist's Prometheus metrics to make these issues visible to monitoring.
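
To illustrate why a sampled gauge can miss these flaps, here is a minimal sketch. It takes the down windows from the log excerpt above and samples them at a hypothetical 15-second scrape interval. The interval, the scrape phase, and the helper names are assumptions for illustration, not part of dnsdist or of our actual setup. The 1 = up, 0 = down encoding is meant to mirror dnsdist_server_status.

```python
# Down windows from the log excerpt above, as (start, end) seconds past 10:48:00.
DOWN_WINDOWS = [(9, 10), (17, 19), (34, 36), (57, 58)]

# Assumed scrape interval in seconds; not taken from the issue.
SCRAPE_INTERVAL = 15


def status_at(second):
    """Return 0 if the downstream is down at `second`, otherwise 1."""
    is_down = any(start <= second < end for start, end in DOWN_WINDOWS)
    return 0 if is_down else 1


def scrape(offset=0, horizon=60):
    """Sample the status gauge every SCRAPE_INTERVAL seconds, starting at `offset`."""
    return [status_at(t) for t in range(offset, horizon, SCRAPE_INTERVAL)]


print(scrape())          # scrapes at :00, :15, :30, :45
print(scrape(offset=9))  # scrapes at :09, :24, :39, :54
```

With scrapes aligned to the minute, every sample reports "up" even though the server went down four times. Only a scrape that happens to land inside a down window sees the outage, which is why monotonically increasing counters are a better fit here.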

Description

Given these events:

Oct 21 10:48:09 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:10 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'
Oct 21 10:48:17 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:18 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'

the new metrics would contain:

dnsdist_server_status_changes_total{server="109_70_100_136:53"} 4
dnsdist_server_status_down_seconds_total{server="109_70_100_136:53"} 2
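
As a rough sketch of the intended semantics, the two counters could be derived from the log lines like this. It is not a proposal for how dnsdist should implement them. It assumes the server starts in the "up" state, that all events fall on the same day, and that every logged "Marking downstream" line counts as one status change. The function and pattern names are hypothetical.

```python
import re
from datetime import datetime

# The four sample events from this issue.
LOG_LINES = [
    "Oct 21 10:48:09 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'",
    "Oct 21 10:48:10 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'",
    "Oct 21 10:48:17 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'",
    "Oct 21 10:48:18 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'",
]

# Captures the time of day and the new state from a "Marking downstream" line.
PATTERN = re.compile(
    r"(\d{2}:\d{2}:\d{2}) \S+ dnsdist\[\d+\]: Marking downstream \S+ as '(up|down)'"
)


def summarize(lines, initial_state="up"):
    """Return (status_changes, down_seconds) for a single downstream."""
    changes = 0
    down_seconds = 0.0
    state = initial_state
    down_since = None
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not a state-change line
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        new_state = match.group(2)
        if new_state == state:
            continue  # no actual transition
        changes += 1
        if new_state == "down":
            down_since = stamp
        elif down_since is not None:
            # Transition back to "up": add the length of the finished outage.
            down_seconds += (stamp - down_since).total_seconds()
            down_since = None
        state = new_state
    return changes, down_seconds


changes, down_seconds = summarize(LOG_LINES)
print(changes, down_seconds)
```

Under these assumptions, each of the four sample lines counts as one change, and the two one-second outages add up to two seconds of downtime.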
