
Prometheus metrics: better understanding of SERVFAILs with RFC 8914 Extended DNS Error counters #9733

@appliedprivacy

  • Program: dnsdist
  • Issue type: Feature request

Short description

When looking at SERVFAIL graphs built from dnsdist's Prometheus metrics, the obvious question is: what is the root cause behind them?
The recently published RFC 8914 (Extended DNS Errors, EDE) aims to help with that:
https://datatracker.ietf.org/doc/rfc8914/
https://blog.cloudflare.com/unwrap-the-servfail/

Usecase

Better understanding of the root cause behind SERVFAILs (if EDE data is available)

Description

It would be nice if each EDE code were counted and published individually in the Prometheus metrics whenever that information is available.
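
Counting per EDE code first requires pulling the code out of the response. As a rough sketch of what that involves (this is not dnsdist code; the function name and the shortened name table are made up for illustration), the following walks the RDATA of an EDNS OPT record and collects every option with code 15, which RFC 8914 assigns to Extended DNS Error. It assumes the OPT RDATA has already been located in the response.

```python
import struct

EDE_OPTION_CODE = 15  # EDNS option code for Extended DNS Error (RFC 8914)

# Illustrative subset of the IANA registry; a real table would list every code.
EDE_NAMES = {
    0: "Other",
    6: "DNSSEC Bogus",
    7: "Signature Expired",
    22: "No Reachable Authority",
    23: "Network Error",
}


def parse_ede_options(opt_rdata: bytes):
    """Return [(info_code, extra_text), ...] for each EDE option in OPT RDATA.

    Hypothetical helper: the RDATA is a sequence of
    (option-code: u16, option-length: u16, option-data) entries.
    """
    results = []
    offset = 0
    while offset + 4 <= len(opt_rdata):
        code, length = struct.unpack_from("!HH", opt_rdata, offset)
        offset += 4
        data = opt_rdata[offset:offset + length]
        if len(data) < length:
            break  # truncated option, stop parsing
        offset += length
        if code == EDE_OPTION_CODE and length >= 2:
            # First two bytes are the INFO-CODE, the rest is optional UTF-8 text.
            (info_code,) = struct.unpack_from("!H", data, 0)
            extra_text = data[2:].decode("utf-8", errors="replace")
            results.append((info_code, extra_text))
    return results
```

A response may carry more than one EDE option, which is why the helper returns a list rather than a single code.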

Current metrics

  • dnsdist_frontend_servfail
  • dnsdist_servfail_responses

could be extended with an EDE label containing the codes:

ede=
https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#extended-dns-error-codes

 0. Other
 1. Unsupported DNSKEY Algorithm
 2. Unsupported DS Digest Type
 3. Stale Answer
 4. Forged Answer
 5. DNSSEC Indeterminate
 6. DNSSEC Bogus
 7. Signature Expired
 8. Signature Not Yet Valid
 9. DNSKEY Missing
 10. RRSIGs Missing
 11. No Zone Key Bit Set
 12. NSEC Missing
 13. Cached Error
 14. Not Ready
 15. Blocked
 16. Censored
 17. Filtered
 18. Prohibited
 19. Stale NXDOMAIN Answer
 20. Not Authoritative
 21. Not Supported
 22. No Reachable Authority
 23. Network Error
 24. Invalid Data

example:

dnsdist_servfail_responses{ede="DNSSEC Bogus"} 10

In addition to those with an EDE present, it would be nice to also see the number of SERVFAILs with no EDE present.
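
To make the requested output concrete, here is a minimal sketch of how such counters could be kept and rendered in the Prometheus text exposition format. It is only an illustration, not how dnsdist implements its metrics: the helper names are hypothetical, and the label value "none" for SERVFAILs without an EDE is an assumption. Only the metric name and the ede label come from the request above.

```python
from collections import Counter

METRIC = "dnsdist_servfail_responses"  # metric name taken from the request
NO_EDE = "none"  # assumed label value for SERVFAILs that carry no EDE


def record_servfail(counter, ede_names):
    """Count one SERVFAIL under each EDE name it carried, or under NO_EDE."""
    if not ede_names:
        counter[NO_EDE] += 1
        return
    for name in ede_names:
        counter[name] += 1


def render(counter):
    """Render the counters as Prometheus text exposition lines."""
    lines = [f"# TYPE {METRIC} counter"]
    for label in sorted(counter):
        # Escape backslash, double quote and newline in the label value.
        escaped = (
            label.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        lines.append(f'{METRIC}{{ede="{escaped}"}} {counter[label]}')
    return "\n".join(lines)


servfails = Counter()
record_servfail(servfails, ["DNSSEC Bogus"])
record_servfail(servfails, ["DNSSEC Bogus"])
record_servfail(servfails, [])
print(render(servfails))
```

One design point to settle: a response carrying several EDE codes is counted once per code here, so the per-label series can add up to more than the total number of SERVFAILs. Counting only the first code would keep the sum equal to the total, at the cost of losing information.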
