
Inhibit Rules should be able to consider different labels in their equal statement #2254

@debugloop

While the title basically says it all, I will try to back this up with a concrete example. Imagine a setup where, in addition to all routers being targets for some type of blackbox_exporter-style metrics, an additional source of data is used to generate JSON target lists for file_sd.

In my example, this additional source could be a router's configuration backup, which gives us a definitive list of link addresses configured on any interface of any router, as well as any configured metadata for each interface. It is trivial to build a JSON file containing the targets (all link addresses configured locally) with the relevant labels:

  • the router this address is configured on
  • the interface name/description
  • the remote hostname, for instance parsed from the description

These metrics could end up looking like this (10.0.0.0/8 are link addresses, 192.168.0.0/16 are loopbacks):

# job: ping-router-loopback
probe_success{instance="192.168.0.1", hostname="r1"} 0
probe_success{instance="192.168.0.2", hostname="r2"} 1

# job: ping-router-interface
probe_success{instance="10.0.0.1", hostname="r2", interface="Te0/7/0/12", remote="r1"} 0
probe_success{instance="10.0.0.2", hostname="r1", interface="Te0/2/0/1", remote="r2"} 0

Alerting in the most obvious way would create alerts named after the jobs, for instance a RouterDown alert with the expression probe_success{job="ping-router-loopback"} == 0. I would obviously want the following inhibition rule:

- source_match:
    alertname: RouterDown
  target_match:
    alertname: InterfaceDown
  equal:
  - hostname

This would inhibit the alert informing me that an interface is down on a router that is itself already being alerted as down. I would, however, like to go one step further with an inhibit rule such as the following, since it comes as no surprise that any interface adjacent to the downed router will go down in turn, even if it is on another router, in another region, or whatever.

# option A
- source_match:
    alertname: RouterDown
  target_match:
    alertname: InterfaceDown
  equal:
  - source_label: hostname
    target_label: remote

# option B, which would make the original `equal` kind of unnecessary
# by using `hostname: $hostname` for instance
- source_match:
    alertname: RouterDown
  target_match:
    alertname: InterfaceDown
    remote: "$hostname"

While the proposed syntax variants are just a general idea, I feel that this should be possible in some way that does not involve hacking around with the underlying alert expressions.
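To make the intended semantics of option A concrete, here is a small Python sketch of how such a rule might be evaluated. This is not Alertmanager code: `matches`, `is_inhibited`, and the rule dictionaries are hypothetical, and treating a missing label as an empty string is a choice made for this sketch.

```python
def matches(labels, matchers):
    """True if every matcher name/value pair is present in the alert's labels."""
    return all(labels.get(name) == value for name, value in matchers.items())


def is_inhibited(source, target, rule):
    """Evaluate a hypothetical option-A rule with source_label/target_label pairs."""
    if not matches(source, rule["source_match"]):
        return False
    if not matches(target, rule["target_match"]):
        return False
    # A missing label is treated as an empty value (an assumption of this sketch).
    return all(
        source.get(pair["source_label"], "") == target.get(pair["target_label"], "")
        for pair in rule["equal"]
    )


# Alerts derived from the example metrics above.
router_down = {"alertname": "RouterDown", "instance": "192.168.0.1", "hostname": "r1"}
iface_on_r2 = {"alertname": "InterfaceDown", "instance": "10.0.0.1",
               "hostname": "r2", "interface": "Te0/7/0/12", "remote": "r1"}
iface_on_r1 = {"alertname": "InterfaceDown", "instance": "10.0.0.2",
               "hostname": "r1", "interface": "Te0/2/0/1", "remote": "r2"}

base = {"source_match": {"alertname": "RouterDown"},
        "target_match": {"alertname": "InterfaceDown"}}

# The existing behaviour, expressed as a pair with the same label on both sides.
same_host_rule = {**base, "equal": [{"source_label": "hostname", "target_label": "hostname"}]}
# Option A: compare the source's hostname with the target's remote.
adjacent_rule = {**base, "equal": [{"source_label": "hostname", "target_label": "remote"}]}

print(is_inhibited(router_down, iface_on_r1, same_host_rule))
print(is_inhibited(router_down, iface_on_r2, adjacent_rule))
```

With both rules in place, the interface on r1 is covered by the same-hostname rule and the adjacent interface on r2 by the cross-label rule, so only the RouterDown alert for r1 would remain.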
