What did you do?
Running AM in Kubernetes as a StatefulSet with a headless service, giving each AM a name like alertmanager-0.alertmanager.default.svc.cluster.local.
The pod gets, among others, default.svc.cluster.local configured as a search domain in /etc/resolv.conf:
# cat /etc/resolv.conf
nameserver 10.35.240.10
search default.svc.cluster.local svc.cluster.local cluster.local c.latency-at.internal google.internal
This allows the unqualified name alertmanager-0.alertmanager to be resolved like this:
# nslookup alertmanager-0.alertmanager
Server: 10.35.240.10
Address 1: 10.35.240.10 kube-dns.kube-system.svc.cluster.local
Name: alertmanager-0.alertmanager
Address 1: 10.32.4.32 alertmanager-0.alertmanager.default.svc.cluster.local
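For reference, the search-list expansion that makes the unqualified lookup above succeed can be sketched in Go. This is only an illustration of the usual resolv.conf semantics, not Alertmanager's or any library's actual code. The helper name candidateNames is hypothetical, and the ndots value of 5 is an assumption (it is the usual Kubernetes default, but no options line appears in the resolv.conf shown above).

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNames returns, in order, the fully qualified names a stub
// resolver would typically try for name, given a resolv.conf search list
// and ndots value. Hypothetical helper, for illustration only.
func candidateNames(name string, search []string, ndots int) []string {
	// A trailing dot marks the name as already fully qualified.
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	asIs := name + "."
	// Names with at least ndots dots are tried as-is before the search list;
	// names with fewer dots are tried as-is only after the search list.
	tryFirst := strings.Count(name, ".") >= ndots

	var out []string
	if tryFirst {
		out = append(out, asIs)
	}
	for _, domain := range search {
		out = append(out, name+"."+domain+".")
	}
	if !tryFirst {
		out = append(out, asIs)
	}
	return out
}

func main() {
	// Search list taken from the resolv.conf above (first three entries).
	search := []string{"default.svc.cluster.local", "svc.cluster.local", "cluster.local"}
	// ndots=5 is an assumption; see the note above.
	for _, n := range candidateNames("alertmanager-0.alertmanager", search, 5) {
		fmt.Println(n)
	}
}
```

Under these assumptions the first candidate is the name with default.svc.cluster.local appended, which is the name nslookup reports above. A resolver that skips the search list would query only the bare name, which would be consistent with the "no such host" error in the log further down.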
Alertmanager itself, however, can't resolve this unqualified name (this worked in at least 0.11.0) and logs this error:
level=warn ts=2018-03-30T13:25:15.016042032Z caller=cluster.go:129 component=cluster msg="failed to join cluster" err="2 errors occurred:\n\n* Failed to resolve alertmanager-0.alertmanager:6783: lookup alertmanager-0.alertmanager on 10.35.240.10:53: no such host\n* Failed to join 10.32.6.23: dial tcp 10.32.6.23:6783: connect: connection refused"
Environment
- Alertmanager version:
alertmanager, version 0.15.0-rc.1 (branch: HEAD, revision: acb111e812530bec1ac6d908bc14725793e07cf3)