[Freeipa-devel] [PATCH 0291-0294] Fix locking to prevent crashes and deadlocks

Petr Spacek pspacek at redhat.com
Thu Sep 11 19:58:17 UTC 2014


On 11.9.2014 18:34, Martin Basti wrote:
> On 11/09/14 15:57, Martin Basti wrote:
>> On 11/09/14 11:59, Petr Spacek wrote:
>>> Hello,
>>>
>>> I was fighting with random crashes for a couple of days ... and discovered
>>> that the run_exclusive_enter()/isc_task_beginexclusive() usage was completely
>>> incorrect and didn't actually lock anything.
>>>
>>> This series of patches reworks internal locking (and the related event system)
>>> to work around limitations of the isc_task_beginexclusive() mechanism.
>>>
>>> It would be better to get rid of isc_task_beginexclusive() completely, but
>>> IMHO that is not possible because BIND's dns_view*() functions have to be
>>> guarded with it.
>>>
>>>
>>> Testing is going to be interesting because we are talking about race
>>> conditions.
>>>
>>> I used ~ 100 DNS zones; each zone had ~ 100 random domain names with
>>> random A/AAAA/TXT RRs. My LDIF is here:
>>> http://people.redhat.com/~pspacek/a/2014/09/11/dns-test.ldif.xz
>>>
>>> I was able to randomly reproduce various crashes when BIND was running with
>>> more threads than usual.
>>>
>>> You can try to run BIND with these commands (as root) and play with the -n
>>> parameter:
>>> $ export KRB5_KTNAME="/etc/named.keytab"
>>> $ named -4 -g -u named -m record -n 10
>>>
>>> Please also test the case where BIND receives SIGINT during start-up. You
>>> can run BIND with the commands above and wait for this message:
>>> 11-Sep-2014 11:54:58.092 running
>>>
>>> At this point, send SIGINT (CTRL+C) to BIND and see what happens. It could
>>> crash or deadlock.
>>>
>>> It is necessary to send the signal before BIND prints this message:
>>> 11-Sep-2014 11:55:11.707 zone z1.test/IN: loaded serial 1410429304
>>>
>>> Let me know if you need any assistance.
>>>
>> I need your assistance, I haven't been able to reproduce it.
>>
>> Martin
>>
> I applied the patchset, and NACK

I don't understand how I could possibly have missed this. I was convinced that
the patch set was thoroughly tested ...

Anyway, the attached patch should fix the problem you were facing. Please re-test it.

Thank you!

Petr^2 Spacek
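
For re-testing, the start-up SIGINT scenario quoted above could be scripted
roughly as follows. This is only a sketch: the function names wait_for_line and
sigint_during_startup, the 60 s and 30 s timeouts, and the ' running$' match
pattern are assumptions made for illustration and are not part of the patch set.
The named command line and KRB5_KTNAME value are the ones from the quoted
instructions.

```shell
# wait_for_line FILE PATTERN TIMEOUT_SECONDS
# Polls FILE once per second until a line matching PATTERN appears.
# Returns 0 on a match, 1 if TIMEOUT_SECONDS elapse first.
wait_for_line() {
    file=$1
    pattern=$2
    timeout=$3
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if grep -q -- "$pattern" "$file" 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# sigint_during_startup [THREADS]
# Starts named in the foreground (must be run as root), sends SIGINT as soon
# as the "running" message shows up, then reports whether named exited.
sigint_during_startup() {
    threads=${1:-10}
    log=$(mktemp) || return 2
    export KRB5_KTNAME="/etc/named.keytab"
    # -g keeps named in the foreground and sends its log to stderr
    named -4 -g -u named -m record -n "$threads" >"$log" 2>&1 &
    pid=$!
    if ! wait_for_line "$log" ' running$' 60; then      # 60 s: assumed limit
        kill -KILL "$pid" 2>/dev/null
        echo "named never reported 'running'; see $log"
        return 2
    fi
    kill -INT "$pid"
    waited=0
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt 30 ]; do  # 30 s: assumed
        sleep 1
        waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        echo "named still alive ${waited}s after SIGINT: possible deadlock"
        kill -KILL "$pid"
        return 1
    fi
    wait "$pid"
    echo "named exited with status $? (log: $log)"
}
```

Sourcing the snippet only defines the functions; calling e.g.
"sigint_during_startup 10" several times with different thread counts should
make the race easier to hit. An exit status above 128 would point to a crash
by signal.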

> #1
> If named is running and I randomly choose a few zones and delete them, named
> stops responding:
>
> dig @localhost A r1.z12.test
>
> ; <<>> DiG 9.9.4-P2-RedHat-9.9.4-12.P2.fc20 <<>> @localhost A r1.z12.test
> ; (2 servers found)
> ;; global options: +cmd
> ;; connection timed out; no servers could be reached
>
> * SIGINT doesn't work
> * rndc doesn't work
> * DS works
> SIGINT signal stops working.
>
> Output:
> <snip>
> 11-Sep-2014 11:26:37.495 client 127.0.0.1#62615: received notify for zone
> 'z99.test'
>
> ^C^C^C^C^C^C^C^C
>
>
> Process:
> named    29125  1.1  2.9 789972 45976 pts/0    Sl+  11:26   0:02 named -4 -g
> -u named -m record -n 10
>
> I have to kill it with kill -9
>
> #2
> Same as #1 if a new zone is added.
>
> #3
> Same as #1 if a new record is added.
>
> #4
> Same as #1 if a record is deleted.
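
The four cases above all show up as named no longer answering queries, so a
small probe run after each LDAP change can flag the hang. A minimal sketch,
assuming dig is installed; the helper names server_answers and check_zones and
the DIG override variable are hypothetical, and the 2-second timeout is an
arbitrary choice:

```shell
# DIG may be pointed at a stub for dry runs; defaults to the real dig.
DIG=${DIG:-dig}

# server_answers SERVER NAME
# Returns 0 if SERVER sent any reply to an A query for NAME, and dig's
# non-zero exit status otherwise (e.g. when no server could be reached).
server_answers() {
    "$DIG" +time=2 +tries=1 "@$1" A "$2" >/dev/null 2>&1
}

# check_zones SERVER NAME...
# Prints one line per name; the return value is the number of names that
# got no reply (shell exit codes wrap at 256).
check_zones() {
    server=$1
    shift
    failed=0
    for name in "$@"; do
        if server_answers "$server" "$name"; then
            echo "ok      $name"
        else
            echo "NOREPLY $name"
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}
```

For example, "check_zones localhost r1.z1.test r1.z12.test" after deleting a
zone should print "ok" for every name if the server is still responsive; a
deleted zone still counts as answered, because any reply (including NXDOMAIN)
makes dig exit with 0.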
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bind-dyndb-ldap-pspacek-0291-2-Rework-locking-in-settings.c-module.patch
Type: text/x-patch
Size: 13277 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/freeipa-devel/attachments/20140911/24ef78c4/attachment.bin>

