[Freeipa-devel] [PATCH 0291-0294] Fix locking to prevent crashes and deadlocks

Martin Basti mbasti at redhat.com
Fri Sep 12 12:45:51 UTC 2014


On 11/09/14 21:58, Petr Spacek wrote:
> On 11.9.2014 18:34, Martin Basti wrote:
>> On 11/09/14 15:57, Martin Basti wrote:
>>> On 11/09/14 11:59, Petr Spacek wrote:
>>>> Hello,
>>>>
>>>> I was fighting with random crashes for a couple of days ... and discovered
>>>> that run_exclusive_enter()/isc_task_beginexclusive() usage was completely
>>>> incorrect and didn't actually lock anything.
>>>>
>>>> This series of patches reworks internal locking (and the related event
>>>> system) to work around limitations of the isc_task_beginexclusive()
>>>> mechanism.
>>>>
>>>> It would be better to get rid of isc_task_beginexclusive() completely, but
>>>> IMHO it is not possible because BIND's dns_view*() functions have to be
>>>> guarded with it.
>>>>
>>>>
>>>> Testing is going to be interesting because we are speaking about race
>>>> conditions.
>>>>
>>>> I used ~100 DNS zones; each zone had ~100 random domain names inside with
>>>> random A/AAAA/TXT RRs. My LDIF is here:
>>>> http://people.redhat.com/~pspacek/a/2014/09/11/dns-test.ldif.xz
>>>>
>>>> I was able to randomly reproduce various crashes when BIND was running with
>>>> more threads than usual.
>>>>
>>>> You can try to run BIND with these commands (as root) and experiment with
>>>> the -n parameter:
>>>> $ export KRB5_KTNAME="/etc/named.keytab"
>>>> $ named -4 -g -u named -m record -n 10
>>>>
>>>> Please also test the case where BIND receives SIGINT during start-up. It is
>>>> possible to run BIND with the commands above and wait for this message:
>>>> 11-Sep-2014 11:54:58.092 running
>>>>
>>>> At this point send SIGINT (CTRL+C) to BIND and see what happens. It could
>>>> crash or deadlock.
>>>>
>>>> It is necessary to send the signal before BIND prints this message:
>>>> 11-Sep-2014 11:55:11.707 zone z1.test/IN: loaded serial 1410429304
>>>>
>>>> Let me know if you need any assistance.
>>>>
>>> I need your assistance, I haven't been able to reproduce it.
>>>
>>> Martin
>>>
>> I applied the patchset, and NACK
>
> I don't understand how I could possibly miss this. I was convinced 
> that the patch set was thoroughly tested ...
>
> Anyway, attached patch should fix the problem you were facing. Please 
> re-test it.
>
> Thank you!
>
> Petr^2 Spacek
>
>> #1
>> If named is running and I randomly choose a few zones and delete them,
>> named fails:
>>
>> dig @localhost A r1.z12.test
>>
>> ; <<>> DiG 9.9.4-P2-RedHat-9.9.4-12.P2.fc20 <<>> @localhost A r1.z12.test
>> ; (2 servers found)
>> ;; global options: +cmd
>> ;; connection timed out; no servers could be reached
>>
>> * SIGINT doesn't work
>> * rndc doesn't work
>> * DS works
>>
>> Output:
>> <snip>
>> 11-Sep-2014 11:26:37.495 client 127.0.0.1#62615: received notify for zone 'z99.test'
>>
>> ^C^C^C^C^C^C^C^C
>>
>>
>> Process:
>> named    29125  1.1  2.9 789972 45976 pts/0    Sl+  11:26   0:02 named -4 -g -u named -m record -n 10
>>
>> I have to kill it with kill -9
>>
>> #2
>> Same as #1 if a new zone is added.
>>
>> #3
>> Same as #1 if a new record is added.
>>
>> #4
>> Same as #1 if a record is deleted.
Functional ACK
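
For anyone re-running the SIGINT-during-start-up test quoted above, the timing can be scripted instead of done by hand. This is only a rough sketch, not part of the patch set: the helper names wait_for_line and signal_during_startup are hypothetical, and it assumes the output of "named -g" (which logs to stderr) has been redirected to a log file.

```shell
#!/bin/sh
# Hypothetical helpers, not part of bind-dyndb-ldap or BIND.

# wait_for_line LOGFILE PATTERN TIMEOUT_SECONDS
# Poll LOGFILE once per second until a line matches PATTERN.
# Returns 0 when found, 1 when the timeout expires.
wait_for_line() {
    wfl_log=$1
    wfl_pattern=$2
    wfl_timeout=$3
    wfl_elapsed=0
    while [ "$wfl_elapsed" -lt "$wfl_timeout" ]; do
        if grep -q -- "$wfl_pattern" "$wfl_log" 2>/dev/null; then
            return 0
        fi
        sleep 1
        wfl_elapsed=$((wfl_elapsed + 1))
    done
    return 1
}

# signal_during_startup LOGFILE PID [SIGNAL] [TIMEOUT_SECONDS]
# Wait for the "running" message, then send SIGNAL (default INT) to PID,
# but only if no zone has reported "loaded serial" yet.
# Returns 1 if "running" never appears, 2 if zones were already loaded.
signal_during_startup() {
    sds_log=$1
    sds_pid=$2
    sds_signal=${3:-INT}
    sds_timeout=${4:-60}
    wait_for_line "$sds_log" ' running$' "$sds_timeout" || return 1
    if grep -q 'loaded serial' "$sds_log"; then
        echo "too late: zones already loaded" >&2
        return 2
    fi
    kill -s "$sds_signal" "$sds_pid"
}

# Example (as root, from an interactive shell; named.log is an assumed name):
#   export KRB5_KTNAME="/etc/named.keytab"
#   named -4 -g -u named -m record -n 10 2> named.log &
#   signal_during_startup named.log $!
```

The window between "running" and the first "loaded serial" line can be short, so the one-second polling interval may miss it on a fast machine; sending CTRL+C by hand as described above remains the reference procedure.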

-- 
Martin Basti



