[Freeipa-devel] [PATCH 0291-0294] Fix locking to prevent crashes and deadlocks
Martin Basti
mbasti at redhat.com
Thu Sep 11 16:34:44 UTC 2014
On 11/09/14 15:57, Martin Basti wrote:
> On 11/09/14 11:59, Petr Spacek wrote:
>> Hello,
>>
>> I was fighting with random crashes for a couple of days ... and
>> discovered that run_exclusive_enter()/isc_task_beginexclusive() usage
>> was completely incorrect and didn't actually lock anything.
>>
>> This series of patches reworks internal locking (and related event
>> system) to work around limitations of isc_task_beginexclusive()
>> mechanism.
>>
>> It would be better to get rid of isc_task_beginexclusive() completely
>> but IMHO it is not possible because BIND's dns_view*() functions
>> have to be guarded with it.
>>
>>
>> Testing is going to be interesting because we are speaking about race
>> conditions.
>>
>> I used ~ 100 DNS zones, each zone had ~ 100 random domain names
>> inside with random A/AAAA/TXT RRs. My LDIF is here:
>> http://people.redhat.com/~pspacek/a/2014/09/11/dns-test.ldif.xz
>>
>> I was able to randomly reproduce various crashes when BIND was
>> running with more threads than usual.
>>
>> You can try to run BIND with this command (as root) and play games
>> with -n parameter:
>> $ export KRB5_KTNAME="/etc/named.keytab"
>> $ named -4 -g -u named -m record -n 10
>>
>> Please test also the case where BIND receives SIGINT during start-up.
>> It is possible to run BIND with commands above and wait for message:
>> 11-Sep-2014 11:54:58.092 running
>>
>> At this point send SIGINT (CTRL+C) to BIND and see what happens. It
>> could crash or deadlock.
>>
>> It is necessary to send the signal before BIND prints this message:
>> 11-Sep-2014 11:55:11.707 zone z1.test/IN: loaded serial 1410429304
>>
>> Let me know if you need any assistance.
>>
> I need your assistance, I haven't been able to reproduce it.
>
> Martin
>
I applied the patchset, and it is a NACK.
#1
If named is running and I randomly choose a few zones and delete them,
named stops responding:
dig @localhost A r1.z12.test
; <<>> DiG 9.9.4-P2-RedHat-9.9.4-12.P2.fc20 <<>> @localhost A r1.z12.test
; (2 servers found)
;; global options: +cmd
;; connection timed out; no servers could be reached
* SIGINT doesn't work
* rndc doesn't work
* DS works
The SIGINT signal stops working.
Output:
<snip>
11-Sep-2014 11:26:37.495 client 127.0.0.1#62615: received notify for
zone 'z99.test'
^C^C^C^C^C^C^C^C
Process:
named 29125 1.1 2.9 789972 45976 pts/0 Sl+ 11:26 0:02 named
-4 -g -u named -m record -n 10
I have to kill it with kill -9
#2
Same as #1 if a new zone is added.
#3
Same as #1 if a new record is added.
#4
Same as #1 if a record is deleted.
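
The "does it still react to SIGINT" check can be automated instead of
pressing CTRL+C by hand. The following is only a minimal sketch, not part
of the patchset: the function name stops_on_sigint and the settle/timeout
parameters are made up for illustration. It starts a command, sends
SIGINT, and reports whether the process exited, falling back to SIGKILL
(the kill -9 case above) if it hangs.

```python
import signal
import subprocess


def stops_on_sigint(cmd, settle=1.0, timeout=5.0):
    """Start cmd, send SIGINT after `settle` seconds, and return True if
    the process exits within `timeout` seconds, False if it hangs.

    A hung process is killed with SIGKILL so that nothing is left behind.
    The name and parameters are hypothetical, chosen for this sketch.
    """
    proc = subprocess.Popen(
        cmd,
        # Start the child with the default SIGINT action even if the
        # caller itself ignores SIGINT.
        preexec_fn=lambda: signal.signal(signal.SIGINT, signal.SIG_DFL),
    )
    try:
        # Give the process time to start; it must still be running.
        proc.wait(timeout=settle)
        raise RuntimeError("process exited before SIGINT was sent")
    except subprocess.TimeoutExpired:
        pass

    proc.send_signal(signal.SIGINT)
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # Same situation as described above: only SIGKILL helps.
        proc.kill()
        proc.wait()
        return False
```

For example, with KRB5_KTNAME exported as in the commands quoted above,
stops_on_sigint(["named", "-4", "-g", "-u", "named", "-m", "record",
"-n", "10"], settle=2.0, timeout=30.0) would return False in the hung
state from #1. The settle value decides whether the signal arrives during
start-up or later, so it has to be tuned by hand.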
--
Martin Basti