[libvirt] [PATCH] Fix error report from nl_recvmsg

Thu Feb 28 16:59:42 UTC 2013

On 02/28/2013 11:16 AM, Daniel P. Berrange wrote:
> On Thu, Feb 28, 2013 at 11:11:53AM -0500, Laine Stump wrote:
>> On 02/28/2013 08:37 AM, Daniel P. Berrange wrote:
>>> From: "Daniel P. Berrange" <berrange at redhat.com>
>>>
>>> The nl_recvmsg does not always set errno. Instead it returns
>>> its own custom set of error codes. Thus we were reporting the
>>> wrong data.
>>> ---
>>>  src/util/virnetlink.c | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/util/virnetlink.c b/src/util/virnetlink.c
>>> index 0b36fdc..8b47ede 100644
>>> --- a/src/util/virnetlink.c
>>> +++ b/src/util/virnetlink.c
>>> @@ -335,8 +335,9 @@ virNetlinkEventCallback(int watch,
>>>      if (length == 0)
>>>          return;
>>>      if (length < 0) {
>>> -        virReportSystemError(errno,
>>> -                             "%s", _("nl_recv returned with error"));
>>> +        virReportError(VIR_ERR_INTERNAL_ERROR,
>>> +                       _("nl_recv returned with error: %s"),
>>> +                       nl_geterror(length));
>> My recollection is that we specifically avoided calling nl_geterror()
>> because it isn't threadsafe.
>>
>> I'll go take another look to verify.
> I did check this, but only for libnl3 which merely does a static
> string table lookup:
>
> const char *nl_geterror(int error)
> {
>         error = abs(error);
>
>         if (error > NLE_MAX)
>                 error = NLE_FAILURE;
>
>         return errmsg[error];
> }

nl_geterror() in libnl-1.1 calls strerror() which isn't threadsafe:

  char *nl_geterror(void)
  {
      if (errbuf)
          return errbuf;

      if (nlerrno)
          return strerror(nlerrno);

      return "Sucess\n";
  }

Of course strerror is only called from here if errbuf hasn't been set,
and I *think* that is rare, and we added a patch to all Fedora/RHEL
builds of libnl-1.1 that put errbuf and nlerrno into thread-local
storage quite a long time ago (August 2010:
https://bugzilla.redhat.com/show_bug.cgi?id=617291 ).

*But* the function that sets errbuf, __nl_error(), itself calls
strerror(). And when I look at strerror() in glibc, I'm not exactly
getting warm fuzzies about what could happen if it was called
simultaneously from two threads. It mallocs a global char* (after
freeing the original) then uses strerror_r to duplicate the standard
system error string into the malloc'ed buffer. So it's possible for one
thread to be dereferencing a pointer to a buffer that has already been
free'd by another thread.

Of course that's all academic, since __nl_error() is called when an
error occurs anyway, regardless of whether you subsequently decide to
call nl_geterror() or not.

So the conclusion is that I see no extra harm in calling nl_geterror(). ACK.