[Freeipa-users] DNS forwarding configuration randomly breaks and stops working

Petr Spacek pspacek at redhat.com
Mon Oct 5 08:03:09 UTC 2015


On 3.10.2015 01:47, nathan at nathanpeters.com wrote:
> This issue has occurred again and I am once again trying to troubleshoot it.
> 
> show forwarder
> --------------
> -bash-4.2$ ipa dnsconfig-show
>   Global forwarders: 10.21.0.14
>   Allow PTR sync: TRUE
> 
> attempt ping
> ------------
>   -bash-4.2$ ping stash.externaldomain.net
> ping: unknown host stash.externaldomain.net
> 
> attempt nslookup
> -----------------
> -bash-4.2$ nslookup
>> stash.externaldomain.net
> Server:         127.0.0.1
> Address:        127.0.0.1#53
> 
> ** server can't find stash.externaldomain.net: NXDOMAIN
> 
> *comment* : strange that it doesn't work against localhost.  Let's make sure
> that localhost lookups work at all :
> 
> -bash-4.2$ nslookup
>> google.com
> Server:         127.0.0.1
> Address:        127.0.0.1#53
> 
> Non-authoritative answer:
> Name:   google.com
> Address: 216.58.216.142
> 
> *comment* : yup, I can resolve google.com when talking to localhost...
> 
> now let's try to talk to the forwarder configured in the global settings
> -----------------------------------------------------------------------
>> server 10.21.0.14
> Default server: 10.21.0.14
> Address: 10.21.0.14#53
>> stash.externaldomain.net
> Server:         10.21.0.14
> Address:        10.21.0.14#53
> 
> Non-authoritative answer:
> stash.externaldomain.net   canonical name = git1.externaldomain.net.
> Name:   git1.externaldomain.net
> Address: 10.20.10.30
> 
> more troubleshooting
> --------------------
> I ran wireshark to see what the freeipa server was sending back to the
> client :
> 
> 3	0.000393	10.178.0.99	10.178.21.2	DNS	163	Standard query response 0x12ca
> No such name CNAME git1.externaldomain.net
> 
> I've never seen a 'no such CNAME' response before.  Let's look at the
> contents of the packet:
> 
> 
> Frame 4: 163 bytes on wire (1304 bits), 163 bytes captured (1304 bits)
> Ethernet II, Src: Vmware_b7:09:c6 (00:50:56:b7:09:c6), Dst:
> HewlettP_3c:f9:48 (2c:59:e5:3c:f9:48)
> Internet Protocol Version 4, Src: 10.178.0.99 (10.178.0.99), Dst:
> 10.178.21.2 (10.178.21.2)
> User Datagram Protocol, Src Port: 53 (53), Dst Port: 57374 (57374)
> Domain Name System (response)
>     [Request In: 1]
>     [Time: 0.000414000 seconds]
>     Transaction ID: 0x12ca
>     Flags: 0x8183 Standard query response, No such name
>         1... .... .... .... = Response: Message is a response
>         .000 0... .... .... = Opcode: Standard query (0)
>         .... .0.. .... .... = Authoritative: Server is not an authority
> for domain
>         .... ..0. .... .... = Truncated: Message is not truncated
>         .... ...1 .... .... = Recursion desired: Do query recursively
>         .... .... 1... .... = Recursion available: Server can do recursive
> queries
>         .... .... .0.. .... = Z: reserved (0)
>         .... .... ..0. .... = Answer authenticated: Answer/authority
> portion was not authenticated by the server
>         .... .... ...0 .... = Non-authenticated data: Unacceptable
>         .... .... .... 0011 = Reply code: No such name (3)
>     Questions: 1
>     Answer RRs: 1
>     Authority RRs: 1
>     Additional RRs: 0
>     Queries
>         stash.externaldomain.net: type A, class IN
>             Name: stash.externaldomain.net
>             [Name Length: 21]
>             [Label Count: 3]
>             Type: A (Host Address) (1)
>             Class: IN (0x0001)
>     Answers
>         stash.externaldomain.net: type CNAME, class IN, cname
> git1.externaldomain.net
>             Name: stash.externaldomain.net
>             Type: CNAME (Canonical NAME for an alias) (5)
>             Class: IN (0x0001)
>             Time to live: 22483
>             Data length: 20
>             CNAME: git1.externaldomain.net
>     Authoritative nameservers
>         externaldomain.net: type SOA, class IN, mname
> van-dns1.externaldomain.net
>             Name: externaldomain.net
>             Type: SOA (Start Of a zone of Authority) (6)
>             Class: IN (0x0001)
>             Time to live: 518
>             Data length: 38
>             Primary name server: van-dns1.externaldomain.net
>             Responsible authority's mailbox: tech.externaldomain.net
>             Serial Number: 2015092101
>             Refresh Interval: 10800 (3 hours)
>             Retry Interval: 900 (15 minutes)
>             Expire limit: 604800 (7 days)
>             Minimum TTL: 86400 (1 day)
> 
> 
>> We have a FreeIPA domain running IPA server 4.1.4 on CentOS 7.
>>
>> We have no per zone forwarding enabled, only a single global forwarder.
>> This seems to work fine, but then after a while (several weeks I think)
>> will randomly stop working.
>>
>> We had this issue several weeks ago on a different IPA domain (identical
>> setup) in our production network but it was ignored because a server
>> restart fixed it.
>>
>> This issue then re-surfaced in our development domain today (different
>> network, different physical hardware, same OS and IPA versions).
>>
>> I received a report today from a developer that he could not ping a
>> machine in another domain so I verified network connectivity and
>> everything was fine.  When I tried to resolve the name from the IPA dc
>> using ping it would fail, but nslookup directly to the forward server
>> worked fine.
>>
>> ipactl showed no issues, and only after I restarted the server did the
>> lookups start working again.
>>
>> Console log below :
>>
>> Using username "myipausername".
>> Last login: Thu Oct  1 16:36:51 2015 from 10.5.5.57
>> [myipausername at dc1 ~]$ sudo su -
>> Last login: Tue Sep 29 19:03:39 UTC 2015 on pts/3
>>
>> ATTEMPT FIRST PING TO UNRESOLVABLE HOST
>> =======================================
>> [root at dc1 ~]# ping artifactory.externaldomain.net
>> ping: unknown host artifactory.externaldomain.net
>>
>> CHECK IPA STATUS
>> ================
>> [root at dc1 ~]# ipactl status
>> Directory Service: RUNNING
>> krb5kdc Service: RUNNING
>> kadmin Service: RUNNING
>> named Service: RUNNING
>> ipa_memcached Service: RUNNING
>> httpd Service: RUNNING
>> pki-tomcatd Service: RUNNING
>> smb Service: RUNNING
>> winbind Service: RUNNING
>> ipa-otpd Service: RUNNING
>> ipa-dnskeysyncd Service: RUNNING
>> ipa: INFO: The ipactl command was successful
>>
>> ATTEMPT PING OF GLOBAL FORWARDER
>> ================================
>> [root at dc1 ~]# ping 10.21.0.14
>> PING 10.21.0.14 (10.21.0.14) 56(84) bytes of data.
>> 64 bytes from 10.21.0.14: icmp_seq=1 ttl=64 time=0.275 ms
>> 64 bytes from 10.21.0.14: icmp_seq=2 ttl=64 time=0.327 ms
>> ^C
>> --- 10.21.0.14 ping statistics ---
>> 2 packets transmitted, 2 received, 0% packet loss, time 1000ms
>> rtt min/avg/max/mdev = 0.275/0.301/0.327/0.026 ms
>>
>> MANUAL NSLOOKUP OF DOMAIN ON GLOBAL FORWARDER FROM IPA DC
>> =========================================================
>> [root at dc1 ~]# nslookup
>>> server 10.21.0.14
>> Default server: 10.21.0.14
>> Address: 10.21.0.14#53
>>> artifactory.externaldomain.net
>> Server:         10.21.0.14
>> Address:        10.21.0.14#53
>>
>> Non-authoritative answer:
>> artifactory.externaldomain.net     canonical name =
>> van-artifactory1.externaldomain.net.
>> Name:   van-artifactory1.externaldomain.net
>> Address: 10.20.10.14
>>
>> RE-ATTEMPT PING SINCE WE KNOW THAT NAME RESOLUTION (at least via nslookup)
>> IS WORKING FROM THIS MACHINE
>> ======================================================================================================
>>> ^C[root at dc1 ~]# ping artifactory.externaldomain.net
>> ping: unknown host artifactory.externaldomain.net
>> [root at dc1 ~]# ping van-artifactory1.externaldomain.net
>> ping: unknown host van-artifactory1.externaldomain.net
>>
>> RESTART IPA SERVICES
>> ====================
>> [root at dc1 ~]# ipactl restart
>> Restarting Directory Service
>> Restarting krb5kdc Service
>> Restarting kadmin Service
>> Restarting named Service
>> Restarting ipa_memcached Service
>> Restarting httpd Service
>> Restarting pki-tomcatd Service
>> Restarting smb Service
>> Restarting winbind Service
>> Restarting ipa-otpd Service
>> Restarting ipa-dnskeysyncd Service
>> ipa: INFO: The ipactl command was successful
>> [root at dc1 ~]# ipa dnsconfig-show
>> ipa: ERROR: did not receive Kerberos credentials
>> [root at dc1 ~]# kinit myipausername
>> Password for myipausername at ipadomain.NET:
>>
>> OUTPUT GLOBAL FORWARDER CONFIG FOR TROUBLESHOOTING
>> ==================================================
>> [root at dc1 ~]# ipa dnsconfig-show
>>   Global forwarders: 10.21.0.14
>>   Allow PTR sync: TRUE
>>
>> PING NOW WORKS BECAUSE IPA SERVICES WERE RESTARTED
>> ==================================================
>> [root at dc1 ~]# ping artifactory.externaldomain.net
>> PING van-artifactory1.externaldomain.net (10.20.10.14) 56(84) bytes of
>> data.
>> 64 bytes from 10.20.10.14: icmp_seq=1 ttl=60 time=3.00 ms
>> 64 bytes from 10.20.10.14: icmp_seq=2 ttl=60 time=1.42 ms
>> 64 bytes from 10.20.10.14: icmp_seq=3 ttl=60 time=2.39 ms
>> ^C
>> --- van-artifactory1.externaldomain.net ping statistics ---
>> 3 packets transmitted, 3 received, 0% packet loss, time 2004ms
>> rtt min/avg/max/mdev = 1.420/2.274/3.004/0.653 ms
>> [root at dc1 ~]#
>>
>> Here are some strange entries from my /var/log/messages relating to errors
>> from today :
>>
>> Oct  1 20:39:31 dc1 named-pkcs11[15066]: checkhints: unable to get root NS
>> rrset from cache: not found
>> Oct  1 20:39:17 dc1 named-pkcs11[15066]: error (network unreachable)
>> resolving 'pmdb1.ipadomain.net/A/IN': 2001:500:2f::f#53
>> Oct  1 20:39:17 dc1 named-pkcs11[15066]: error (network unreachable)
>> resolving 'pmdb1.ipadomain.net/AAAA/IN': 2001:500:2f::f#53
>>
>> Looking at the log entries, it appears that there may have been a network
>> connectivity 'blip' (maybe a switch or router was restarted) at some point
>> and even after connectivity was restored, the global forwarding was
>> failing because the "we can't contact our forwarder" status seemed to get
>> stuck in memory.

Most likely.

>> [root at dc1 ~]# ipa dnsconfig-show
>>   Global forwarders: 10.21.0.14
>>   Allow PTR sync: TRUE

This means that you are using the default forward policy, which is 'first'.
I.e. the BIND daemon on the IPA server tries the forwarder first and, when
that fails, falls back to asking servers on the public Internet.

I speculate that the public servers know nothing about the name you were
asking for and that this negative answer got cached. This is the default
behavior in BIND and IPA did not change it.
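
As an illustration only (a sketch, not something from the original exchange),
the flags word 0x8183 from the capture above can be decoded by hand. The
function name decode_dns_flags is hypothetical; the bit positions follow the
standard DNS header layout. The result is a non-authoritative NXDOMAIN
returned by a server that offers recursion, which is at least consistent with
a negative answer served from cache.

```shell
#!/usr/bin/env bash
# Sketch: decode selected bits of a 16-bit DNS header flags word.
# decode_dns_flags is a hypothetical helper name, not an existing tool.
decode_dns_flags() {
    local flags=$(( $1 ))                  # accepts hex such as 0x8183
    local qr=$(( (flags >> 15) & 1 ))      # 1 = message is a response
    local aa=$(( (flags >> 10) & 1 ))      # 1 = authoritative answer
    local rd=$(( (flags >> 8) & 1 ))       # 1 = recursion desired
    local ra=$(( (flags >> 7) & 1 ))       # 1 = recursion available
    local rcode=$(( flags & 0xF ))         # 3 = NXDOMAIN (no such name)
    echo "qr=$qr aa=$aa rd=$rd ra=$ra rcode=$rcode"
}

# Flags value taken from the packet capture quoted above.
decode_dns_flags 0x8183
```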

A workaround for the network problems could be
$ ipa dnsconfig-mod --forward-policy=only
which will prevent BIND from falling back to public servers.
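
A possible way to check whether the local server is still handing out the
negative answer, sketched under assumptions: dig and rndc are available on
the IPA server, and rndc is able to talk to the running named. The helper
name dns_status is hypothetical; it only extracts the status field from the
header line that dig prints.

```shell
#!/usr/bin/env bash
# Sketch: print the response status (NOERROR, NXDOMAIN, SERVFAIL, ...)
# found in dig output read from stdin. dns_status is a hypothetical name.
dns_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}

# Intended use on the IPA server (not executed here; needs a live resolver):
#   dig @127.0.0.1 stash.externaldomain.net | dns_status
#   dig @10.21.0.14 stash.externaldomain.net | dns_status
# If the local server reports NXDOMAIN while the forwarder reports NOERROR,
# a single cached name can be dropped without restarting every service:
#   rndc flushname stash.externaldomain.net
```

Flushing one name is a narrower step than 'ipactl restart', though whether it
is enough here depends on what exactly ended up in the cache.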

Anyway, you should solve network connectivity problems, too :-)

I hope this helps.

-- 
Petr^2 Spacek
