[libvirt] Two core dumps are generated in multi-thread scenarios

Matthias Bolte matthias.bolte at googlemail.com
Sun Oct 7 10:03:22 UTC 2012


Hi,

I pushed the proposed fix of setting CURLOPT_NOSIGNAL to 1. This
effectively makes libcurl lose its timeout ability for synchronous DNS
lookups. Asynchronous DNS lookups via the c-ares library are not
affected.

Your backtrace shows a timeout of a synchronous DNS lookup, I think
(see the Curl_resolv_timeout -> Curl_failf call sequence). This is how
you found the problem. But setting CURLOPT_NOSIGNAL to 1 makes libcurl
lose its timeout ability for synchronous DNS lookups and a call to
Curl_resolv_timeout can now take longer than a given timeout or might
never return at all.

So we are replacing a possible segfault with a DNS lookup that
possibly takes too long or never returns.

Regards,
Matthias

2012/10/2 Benjamin Wang (gendwang) <gendwang at cisco.com>:
> Hi Matthias,
>    This can't be reproduced 100% of the time. I reproduced this case twice. But after I set CURLOPT_NOSIGNAL to 1, I didn't see a similar
> core dump again, and everything seems to work well. What do you mean by "stuck in a DNS lookup"?
>
> B.R.
> Benjamin Wang
>
> -----Original Message-----
> From: Matthias Bolte [mailto:matthias.bolte at googlemail.com]
> Sent: 30 September 2012 4:20
> To: Benjamin Wang (gendwang)
> Cc: libvir-list at redhat.com; Yang Zhou (yangzho)
> Subject: Re: Two core dumps are generated in multi-thread scenarios
>
> 2012/9/23 Benjamin Wang (gendwang) <gendwang at cisco.com>:
>> Hi,
>>   I found two core dumps generated in multi-thread scenarios in ESX part.
>>
>> Case1: libcurl support multi-thread
>> core dump:
>> #12 0x00002aaabea89712 in addbyter () from /usr/local/lib/libcurl.so.4
>> #13 0x00002aaabea89b86 in dprintf_formatf () from
>> /usr/local/lib/libcurl.so.4
>> #14 0x00002aaabea8b055 in curl_mvsnprintf () from
>> /usr/local/lib/libcurl.so.4
>> #15 0x00002aaabea7678f in Curl_failf () from
>> /usr/local/lib/libcurl.so.4
>> #16 0x00002aaabea6d871 in Curl_resolv_timeout () from
>> /usr/local/lib/libcurl.so.4
>> #17 0x00000006e8a8f230 in ?? ()
>>
>> Fix code:
>> esxVI_CURL_Connect() in esx_vi.c:
>> I add a new line as following:
>> curl_easy_setopt(curl->handle, CURLOPT_NOSIGNAL, 1);
>
> It took me a moment reading libcurl code until I figured out what might be happening here. The problem is that Curl_resolv_timeout uses SIGALRM + sigsetjmp/siglongjmp to realize the timeout logic. This implementation is not thread-safe, as the SIGALRM handler might be executed on a different thread than the original thread that started the call to Curl_resolv_timeout. This in turn results in the call to Curl_resolv_timeout being continued via siglongjmp (called from the SIGALRM handler) on a different thread. Setting CURLOPT_NOSIGNAL to 1 makes libcurl avoid the SIGALRM + sigsetjmp/siglongjmp implementation.
> This solves the problem but with the cost of losing the timeout capability.
>
> In your case a DNS lookup took longer than libcurl was willing to wait and a timeout aborted it. But the call to Curl_failf (as part of the timeout error handling) was made on the wrong thread (I think) making it segfault. IMHO there is no ideal solution here, because with CURLOPT_NOSIGNAL set to 0 (the default) libcurl can realize DNS lookup with timeout, but the error handling might occur on the wrong thread.
> But with CURLOPT_NOSIGNAL set to 1 the segfault is avoided but libcurl might get stuck in a DNS lookup.
>
> Are you able to reproduce this problem and can you confirm that setting CURLOPT_NOSIGNAL to 1 fixes it?
>
> --
> Matthias Bolte
> http://photron.blogspot.com

-- 
Matthias Bolte
http://photron.blogspot.com



