Hello, I am trying to use this plugin, but my CoreDNS pods get OOMKilled. I am probably misconfiguring it, possibly creating a loop. I'd like someone to review my config and help me troubleshoot.
I have three clusters, each with a modified CoreDNS config. Here is the config from one of them as an example:
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    rewrite name substring cluster.cluster1 cluster.local
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
        max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}
cluster.cluster2:53 {
    rewrite name substring cluster.cluster2 cluster.local
    forward . ${cluster2_coredns_ip}:53 {
        expire 10s
        policy round_robin
    }
    cache 10
}
cluster.cluster3:53 {
    rewrite name substring cluster.cluster3 cluster.local
    forward . ${cluster3_coredns_ip}:53 {
        expire 10s
        policy round_robin
    }
    cache 10
}
cluster.all:53 {
    gathersrv cluster.all. {
        cluster.cluster1. c1-
        cluster.cluster2. c2-
        cluster.cluster3. c3-
    }
    forward . 127.0.0.1:53
}
So cluster.local is the local cluster, and cluster.cluster[1..3] is rewritten to cluster.local and forwarded to the corresponding cluster's CoreDNS. Finally, cluster.all should gather SRV records from all of the clusters.
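To make the routing I have in mind explicit, here is a small Python model of it. This is only a sketch of my understanding, not CoreDNS code: `pick_zone` and `rewrite_substring` are hypothetical helper names, and I am assuming that the most specific matching server block handles a query and that the substring rewrite is a plain text replacement.

```python
# Illustrative model only; not CoreDNS code.
# Zones are the server blocks defined in the cluster1 Corefile above.
ZONES = ["cluster.cluster2.", "cluster.cluster3.", "cluster.all.", "."]


def pick_zone(qname):
    """Return the most specific server-block zone that qname falls under."""
    if not qname.endswith("."):
        qname += "."
    matches = [
        z for z in ZONES
        if z == "." or qname == z or qname.endswith("." + z)
    ]
    # Assumption: the longest (most specific) matching zone wins.
    return max(matches, key=len)


def rewrite_substring(qname, old, new):
    """Model of `rewrite name substring old new` as a plain replacement."""
    return qname.replace(old, new)


name = "_peers._tcp.etcd-headless.h2.svc.cluster.cluster1"
zone = pick_zone(name)  # no cluster.cluster1 block exists, so "." handles it
local_name = rewrite_substring(name, "cluster.cluster1", "cluster.local")
```

In this model a cluster.cluster1 query lands in the `.` block and is answered by the kubernetes plugin after the rewrite, while cluster.cluster2 and cluster.cluster3 queries land in their own blocks and are forwarded.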
Pointing at the cluster1 CoreDNS IP, I can resolve _peers._tcp.etcd-headless.h2.svc.cluster.local:
dig @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.local
; <<>> DiG 9.18.24 <<>> @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.local
; (1 server found)
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27603
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: f47ee6beb5803a4b (echoed)
;; QUESTION SECTION:
;_peers._tcp.etcd-headless.h2.svc.cluster.local. IN SRV
;; ANSWER SECTION:
_peers._tcp.etcd-headless.h2.svc.cluster.local. 30 IN SRV 0 100 2379 etcd-headless.h2.svc.cluster.local.
;; ADDITIONAL SECTION:
etcd-headless.h2.svc.cluster.local. 30 IN A 10.96.0.42
;; Query time: 5 msec
;; SERVER: 10.89.0.225#53(10.89.0.225) (UDP)
;; WHEN: Tue Apr 02 12:48:12 EDT 2024
;; MSG SIZE rcvd: 237
I can also resolve _peers._tcp.etcd-headless.h2.svc.cluster.cluster1:
dig @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.cluster1
; <<>> DiG 9.18.24 <<>> @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.cluster1
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3306
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: e8816b2d226c93c1 (echoed)
;; QUESTION SECTION:
;_peers._tcp.etcd-headless.h2.svc.cluster.cluster1. IN SRV
;; ANSWER SECTION:
_peers._tcp.etcd-headless.h2.svc.cluster.local. 30 IN SRV 0 100 2379 etcd-headless.h2.svc.cluster.local.
;; ADDITIONAL SECTION:
etcd-headless.h2.svc.cluster.local. 30 IN A 10.96.0.42
;; Query time: 2 msec
;; SERVER: 10.89.0.225#53(10.89.0.225) (UDP)
;; WHEN: Tue Apr 02 12:49:10 EDT 2024
;; MSG SIZE rcvd: 240
Both queries return the same response, as expected.
I can also try with cluster2:
dig @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.cluster2
; <<>> DiG 9.18.24 <<>> @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.cluster2
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42009
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: d5ce2f5ab20abd3b (echoed)
;; QUESTION SECTION:
;_peers._tcp.etcd-headless.h2.svc.cluster.cluster2. IN SRV
;; ANSWER SECTION:
_peers._tcp.etcd-headless.h2.svc.cluster.local. 10 IN SRV 0 100 2379 etcd-headless.h2.svc.cluster.local.
;; ADDITIONAL SECTION:
etcd-headless.h2.svc.cluster.local. 10 IN A 10.96.1.114
;; Query time: 9 msec
;; SERVER: 10.89.0.225#53(10.89.0.225) (UDP)
;; WHEN: Tue Apr 02 12:50:49 EDT 2024
;; MSG SIZE rcvd: 240
This also works, but resolves to a different IP.
However, if I try cluster.all:
dig @10.89.0.225 -t SRV _peers._tcp.etcd-headless.h2.svc.cluster.all
;; communications error to 10.89.0.225#53: timed out
The query times out and the CoreDNS pod gets OOMKilled.