Description
While the title basically says it all, I will try to back this up with a concrete example. Imagine a setup where, in addition to all routers being targets for some type of blackbox_exporter-style metrics, an additional source of data is used to generate JSON target lists for file_sd.
In my example, this additional source could be a router's configuration backup, which gives us a definitive list of link addresses configured on any interface of any router, as well as any configured metadata for each interface. It is trivial to build a JSON file containing the targets (all link addresses configured locally) with the relevant labels:
- the router this address is configured on
- the interface name/description
- the remote hostname, for instance parsed from the description
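As a rough illustration, such a target list could be generated along these lines. This is only a sketch: the `interfaces` records and the `build_file_sd_targets` helper are hypothetical stand-ins for whatever the configuration backup parser produces, and the relabeling needed to hand the address to the exporter is left out.

```python
import json

# Hypothetical records, as they might be parsed from a router configuration backup.
interfaces = [
    {"address": "10.0.0.1", "hostname": "r2", "interface": "Te0/7/0/12", "remote": "r1"},
    {"address": "10.0.0.2", "hostname": "r1", "interface": "Te0/2/0/1", "remote": "r2"},
]


def build_file_sd_targets(records):
    """Return one file_sd target group per link address, with its labels."""
    return [
        {
            "targets": [rec["address"]],
            "labels": {
                "hostname": rec["hostname"],    # router the address is configured on
                "interface": rec["interface"],  # interface name/description
                "remote": rec["remote"],        # remote hostname parsed from the description
            },
        }
        for rec in records
    ]


if __name__ == "__main__":
    # Write this output to the file referenced by the file_sd configuration.
    print(json.dumps(build_file_sd_targets(interfaces), indent=2))
```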
These metrics could end up looking like this (10.0.0.0/8 are link addresses, 192.168.0.0/16 are loopbacks):
# job: ping-router-loopback
probe_success{instance="192.168.0.1", hostname="r1"} 0
probe_success{instance="192.168.0.2", hostname="r2"} 1
# job: ping-router-interface
probe_success{instance="10.0.0.1", hostname="r2", interface="Te0/7/0/12", remote="r1"} 0
probe_success{instance="10.0.0.2", hostname="r1", interface="Te0/2/0/1", remote="r2"} 0
Alerting in the most obvious way would create alerts named after the jobs, for instance a RouterDown alert with the expression probe_success{job="ping-router-loopback"} == 0 (and, analogously, an InterfaceDown alert for the ping-router-interface job). I would obviously want the following inhibition rule:
- source_match:
    alertname: RouterDown
  target_match:
    alertname: InterfaceDown
  equal:
  - hostname
This would inhibit the alert informing me that an interface is down on a router which is already being alerted as down itself. However, I would like to go one step further with an inhibit rule such as the following, since it comes as no surprise that any interface adjacent to the downed router will go down in turn, even though it is on another router, in another region, or wherever.
# option A
- source_match:
    alertname: RouterDown
  target_match:
    alertname: InterfaceDown
  equal:
  - source_label: hostname
    target_label: remote
# option B, which would make the original `equal` kind of unnecessary
# by using `hostname: $hostname` for instance
- source_match:
    alertname: RouterDown
  target_match:
    alertname: InterfaceDown
    remote: "$hostname"
While the proposed syntax variants are just a general idea, I feel that this should be possible in some way that does not involve hacking around with the underlying alerts' expressions.
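To make the intended semantics of option A concrete, here is a small sketch of how such a rule could be evaluated. It is not how Alertmanager implements inhibition; the `is_inhibited` function and the representation of `equal` as (source label, target label) pairs are assumptions made purely for illustration.

```python
def is_inhibited(target, sources, source_match, target_match, equal):
    """Return True if any matching source alert inhibits the target alert.

    `equal` is a list of (source_label, target_label) pairs; the existing
    behaviour corresponds to pairs where both names are the same.
    """
    def matches(labels, matchers):
        return all(labels.get(name) == value for name, value in matchers.items())

    if not matches(target, target_match):
        return False
    for source in sources:
        if not matches(source, source_match):
            continue
        # Compare each source label against its mapped target label.
        if all(source.get(s) == target.get(t) for s, t in equal):
            return True
    return False


SOURCE_MATCH = {"alertname": "RouterDown"}
TARGET_MATCH = {"alertname": "InterfaceDown"}

router_down = {"alertname": "RouterDown", "instance": "192.168.0.1", "hostname": "r1"}
# Interface on r2 facing the downed router r1.
adjacent = {"alertname": "InterfaceDown", "instance": "10.0.0.1",
            "hostname": "r2", "interface": "Te0/7/0/12", "remote": "r1"}
# Interface on the downed router r1 itself.
local = {"alertname": "InterfaceDown", "instance": "10.0.0.2",
         "hostname": "r1", "interface": "Te0/2/0/1", "remote": "r2"}

if __name__ == "__main__":
    option_a = [("hostname", "remote")]
    original = [("hostname", "hostname")]
    print(is_inhibited(adjacent, [router_down], SOURCE_MATCH, TARGET_MATCH, option_a))
    print(is_inhibited(local, [router_down], SOURCE_MATCH, TARGET_MATCH, original))
```

With the sample alerts from above, the option A pair covers the interface on r2 that faces r1, while the original `equal` on hostname covers the interface on r1 itself, so both rules would be kept side by side.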