Description
OS: RHEL7
Docker Version: 20.10.17
Problem: When the primary DNS server is down, the embedded DNS server returns a timeout even though the secondary is available
Reproduction (on a RHEL 7 host; I used a trial subscription to get RHEL 7: https://access.redhat.com/downloads/content/69/ver=/rhel---7/7.9/x86_64/packages):
- Ensure iptables is in use: https://tecadmin.net/install-and-use-iptables-on-centos-rhel-7/
docker swarm init
docker network create -d overlay dns-test-network --attachable
docker run --network dns-test-network -it openjdk:11 /bin/bash
cat > DNSLookup.java <<'EOF'
import java.net.InetAddress;
import java.net.UnknownHostException;
public class DNSLookup {
    public static void main(String[] args) {
        System.out.println("DNS Lookup Test");
        try {
            System.out.println(InetAddress.getByName("example.com"));
        } catch (UnknownHostException e) {
            System.err.println(e);
        }
    }
}
EOF
javac DNSLookup.java
java DNSLookup
This works as expected. However, if we simulate a failure of the primary DNS server with iptables, the lookup no longer behaves as expected.
Drop traffic to the primary DNS server (e.g. 10.10.10.10):
iptables -I DOCKER-USER -p udp -d 10.10.10.10 --dport 53 -j DROP
Re-run java DNSLookup in the container. Intermittently, but the majority of the time, we get:
Temporary failure in name resolution
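Because the failure is intermittent, it can help to repeat the lookup and count how often it fails. The following is a minimal sketch, not part of the original reproduction: the class name DNSLookupLoop and the attempt count of 20 are arbitrary choices. It sets the JVM's DNS cache TTLs to 0 so that, as far as the JVM is concerned, each attempt goes back to the resolver instead of reusing a cached answer.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

public class DNSLookupLoop {
    /** Resolves host the given number of times and returns how many attempts failed. */
    static int countFailures(String host, int attempts) {
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                InetAddress.getByName(host);
            } catch (UnknownHostException e) {
                failures++;
                System.err.println("attempt " + (i + 1) + ": " + e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Disable the JVM's positive and negative DNS caches before the first
        // lookup, so an earlier success or failure is not simply replayed.
        Security.setProperty("networkaddress.cache.ttl", "0");
        Security.setProperty("networkaddress.cache.negative.ttl", "0");

        String host = args.length > 0 ? args[0] : "example.com";
        int attempts = 20; // arbitrary sample size for illustration
        int failures = countFailures(host, attempts);
        System.out.println(failures + " of " + attempts + " lookups failed for " + host);
    }
}
```

Compile and run it in the container the same way as DNSLookup (javac DNSLookupLoop.java, then java DNSLookupLoop), once before and once after adding the iptables rule.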
The debug logs show an i/o timeout to the primary (replaced here with 10.10.10.10). The resolver then tries the secondary (replaced here with 10.10.10.20) and gets a result, but it continues to query both the primary and the secondary with the search domain appended, which means the successful response is never returned to the container:
level=debug msg="Name To resolve: example.com."
level=debug msg="[resolver] query example.com. (A) from 172.18.0.3:39558, forwarding to udp:10.10.10.10"
level=debug msg="Name To resolve: example.com."
level=debug msg="[resolver] query example.com. (A) from 172.18.0.3:58022, forwarding to udp:10.10.10.10"
level=debug msg="Name To resolve: example.com.search.com."
level=debug msg="[resolver] query example.com.search.com. (A) from 172.18.0.3:56944, forwarding to udp:10.10.10.10"
level=debug msg="[resolver] read from DNS server failed, read udp 172.18.0.3:39558->10.10.10.10:53: i/o timeout"
level=debug msg="[resolver] query example.com. (A) from 172.18.0.3:60164, forwarding to udp:10.10.10.20"
level=debug msg="[resolver] received A record \"10.1.1.1\" for \"example.com\" from udp:10.10.10.20"
level=debug msg="Name To resolve: example.com.search.com."
level=debug msg="[resolver] query example.com.search.com. (A) from 172.18.0.3:51365, forwarding to udp:10.10.10.10"
level=debug msg="[resolver] read from DNS server failed, read udp 172.18.0.3:58022->10.10.10.10:53: i/o timeout"
level=debug msg="[resolver] query example.com. (A) from 172.18.0.3:37294, forwarding to udp:10.10.10.20"
level=debug msg="[resolver] received A record \"10.1.1.1\" for \"example.com\" from udp:10.10.10.20"
level=debug msg="[resolver] read from DNS server failed, read udp 172.18.0.3:56944->10.10.10.10:53: i/o timeout"
level=debug msg="[resolver] query example.com.search.com. (A) from 172.18.0.3:50534, forwarding to udp:10.10.10.20"
level=debug msg="[resolver] external DNS udp:10.10.10.20 responded with NXDOMAIN for \"example.com.search.com.\""
level=debug msg="[resolver] external DNS udp:10.10.10.20 did not return any A records for \"example.com.search.com.\""
level=debug msg="[resolver] read from DNS server failed, read udp 172.18.0.3:51365->10.10.10.10:53: i/o timeout"
level=debug msg="[resolver] query example.com.search.com. (A) from 172.18.0.3:32985, forwarding to udp:10.10.10.20"
level=debug msg="[resolver] external DNS udp:10.10.10.20 responded with NXDOMAIN for \"example.com.search.com.\""
level=debug msg="[resolver] external DNS udp:10.10.10.20 did not return any A records for \"example.com.search.com.\""
So once it got a valid answer from the secondary DNS (lines 8 and 9 of the log above), it should have stopped there, and the lookup would have worked:
level=debug msg="[resolver] query example.com. (A) from 172.18.0.3:60164, forwarding to udp:10.10.10.20"
level=debug msg="[resolver] received A record \"10.1.1.1\" for \"example.com\" from udp:10.10.10.20"
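To compare failing and working runs more easily, the resolver debug lines can be tallied per upstream server. The sketch below is illustrative only: the class name ResolverLogTally and the event labels are made up, and the regular expressions are written against the log lines quoted in this report, so other Docker versions may need different patterns.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ResolverLogTally {
    // Patterns match only the message formats quoted in this report.
    private static final Pattern FORWARD =
            Pattern.compile("forwarding to udp:([0-9.]+)");
    private static final Pattern TIMEOUT =
            Pattern.compile("->([0-9.]+):53: i/o timeout");
    private static final Pattern ANSWER =
            Pattern.compile("received A record .* from udp:([0-9.]+)");
    private static final Pattern NXDOMAIN =
            Pattern.compile("external DNS udp:([0-9.]+) responded with NXDOMAIN");

    /** Returns counters keyed by "<server> <event>", e.g. "10.10.10.10 timeout". */
    static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            count(counts, FORWARD, line, "forwarded");
            count(counts, TIMEOUT, line, "timeout");
            count(counts, ANSWER, line, "answered");
            count(counts, NXDOMAIN, line, "nxdomain");
        }
        return counts;
    }

    // Increments the counter for the server captured by the pattern, if it matches.
    private static void count(Map<String, Integer> counts, Pattern pattern,
                              String line, String event) {
        Matcher m = pattern.matcher(line);
        if (m.find()) {
            counts.merge(m.group(1) + " " + event, 1, Integer::sum);
        }
    }

    public static void main(String[] args) throws IOException {
        // Reads log lines from standard input and prints one counter per line.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            tally(lines).forEach((key, n) -> System.out.println(key + ": " + n));
        }
    }
}
```

Fed only the failing excerpt above, this counts four forwarded queries and four timeouts for 10.10.10.10, and four forwarded queries, two A answers and two NXDOMAIN responses for 10.10.10.20. Save the relevant excerpt to a file and pipe it in, since a full daemon log would mix in unrelated lookups.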
We know that replacing 127.0.0.11 (Docker's embedded DNS) with the nameservers from the host's /etc/resolv.conf works, but ideally we would like to find a way forward that still lets us use Docker's embedded DNS.
Edit: It does work from time to time. This is the log from a working scenario:
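As a cross-check that the upstream servers behave as expected from inside the container, each one can be queried directly, bypassing 127.0.0.11. This is a sketch under assumptions: it uses the JDK's JNDI DNS provider, the class name DirectDNSLookup is made up, the two addresses are the placeholders used in this report, and the timeout values are arbitrary.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class DirectDNSLookup {
    /** Builds the JNDI environment for querying one specific DNS server. */
    static Hashtable<String, String> environmentFor(String server) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        env.put(Context.PROVIDER_URL, "dns://" + server);
        // Illustrative values: 2 s initial timeout, a single attempt.
        env.put("com.sun.jndi.dns.timeout.initial", "2000");
        env.put("com.sun.jndi.dns.timeout.retries", "1");
        return env;
    }

    public static void main(String[] args) {
        String host = "example.com";
        // Placeholder addresses for the primary and secondary servers.
        String[] servers = {"10.10.10.10", "10.10.10.20"};
        for (String server : servers) {
            try {
                DirContext ctx = new InitialDirContext(environmentFor(server));
                try {
                    // Ask this server only for the A records of the host.
                    Attribute a = ctx.getAttributes(host, new String[] {"A"}).get("A");
                    System.out.println(server + " -> " + a);
                } finally {
                    ctx.close();
                }
            } catch (NamingException e) {
                System.out.println(server + " -> " + e);
            }
        }
    }
}
```

With the iptables rule in place, the expectation would be an error for the primary and an A record from the secondary, which would confirm that the secondary is reachable and that the failure lies in how the embedded resolver handles the fallback.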
level=debug msg="Name To resolve: example.com."
level=debug msg="[resolver] query example.com. (A) from 172.18.0.3:53936, forwarding to udp:10.10.10.10"
level=debug msg="Name To resolve: example.com."
level=debug msg="[resolver] query example.com. (A) from 172.18.0.3:37429, forwarding to udp:10.10.10.10"
level=debug msg="[resolver] read from DNS server failed, read udp 172.18.0.3:53936->10.10.10.10:53: i/o timeout"
level=debug msg="[resolver] query example.com. (A) from 172.18.0.3:46871, forwarding to udp:10.10.10.20"
level=debug msg="[resolver] received A record \"10.1.1.1\" for \"example.com.\" from udp:10.10.10.20"