[Freeipa-devel] [PATCH 0291-0294] Fix locking to prevent crashes and deadlocks

Martin Basti mbasti at redhat.com
Thu Sep 11 16:34:44 UTC 2014


On 11/09/14 15:57, Martin Basti wrote:
> On 11/09/14 11:59, Petr Spacek wrote:
>> Hello,
>>
>> I was fighting with random crashes for a couple of days ... and 
>> discovered that run_exclusive_enter()/isc_task_beginexclusive() usage 
>> was completely incorrect and didn't actually lock anything.
>>
>> This series of patches reworks internal locking (and related event 
>> system) to work around limitations of isc_task_beginexclusive() 
>> mechanism.
>>
>> It would be better to get rid of isc_task_beginexclusive() completely 
>> but IMHO it is not possible because BIND's dns_view*() functions 
>> have to be guarded with it.
>>
>>
>> Testing is going to be interesting because we are speaking about race 
>> conditions.
>>
>> I used ~ 100 DNS zones, each zone had ~ 100 random domain names 
>> inside with random A/AAAA/TXT RRs. My LDIF is here:
>> http://people.redhat.com/~pspacek/a/2014/09/11/dns-test.ldif.xz
>>
>> I was able to randomly reproduce various crashes when BIND was 
>> running with more threads than usual.
>>
>> You can try to run BIND with this command (as root) and play games 
>> with -n parameter:
>> $ export KRB5_KTNAME="/etc/named.keytab"
>> $ named -4 -g -u named -m record -n 10
>>
>> Please test also the case where BIND receives SIGINT during start-up. 
>> It is possible to run BIND with commands above and wait for message:
>> 11-Sep-2014 11:54:58.092 running
>>
>> At this point send SIGINT (CTRL+C) to BIND and see what happens. It 
>> could crash or deadlock.
>>
>> It is necessary to send the signal before BIND prints this message:
>> 11-Sep-2014 11:55:11.707 zone z1.test/IN: loaded serial 1410429304
>>
>> Let me know if you need any assistance.
>>
> I need your assistance, I haven't been able to reproduce it.
>
> Martin
>
I applied the patchset, and it is a NACK.


#1
If named is running and I randomly choose a few zones and delete them, 
named stops responding:

dig @localhost A r1.z12.test

; <<>> DiG 9.9.4-P2-RedHat-9.9.4-12.P2.fc20 <<>> @localhost A r1.z12.test
; (2 servers found)
;; global options: +cmd
;; connection timed out; no servers could be reached

* SIGINT doesn't work
* rndc doesn't work
* DS works

Output:
<snip>
11-Sep-2014 11:26:37.495 client 127.0.0.1#62615: received notify for zone 'z99.test'

^C^C^C^C^C^C^C^C


Process:
named    29125  1.1  2.9 789972 45976 pts/0    Sl+  11:26   0:02 named -4 -g -u named -m record -n 10

I have to kill it with kill -9

#2
Same as #1 if a new zone is added.

#3
Same as #1 if a new record is added.

#4
Same as #1 if a record is deleted.
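
The check used in #1 to #4 could be scripted roughly as below. This is only a sketch: the helper names dig_timed_out and named_hung are hypothetical, and the 2-second timeout with a single try is an assumption chosen to keep the check short. It only detects the "connection timed out" symptom from the dig output above, nothing more.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any existing tool.

# Return 0 if the dig output read from stdin shows that no server answered.
dig_timed_out() {
    grep -q 'connection timed out; no servers could be reached'
}

# Return 0 if named on localhost does not answer an A query for name $1.
# +time=2 +tries=1 are assumed values so that a hung server is detected quickly.
named_hung() {
    dig +time=2 +tries=1 @localhost A "$1" 2>&1 | dig_timed_out
}
```

After deleting or adding a zone or record, something like `named_hung r1.z12.test && echo "named does not answer"` would then flag the failure.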

-- 
Martin Basti



