[libvirt] [PATCH 11/14] rpc: virnetserver: Fix race on srv->nclients_unauth

Thu Dec 21 12:57:30 UTC 2017

On Thu, Dec 21, 2017 at 01:18 PM +0100, John Ferlan <jferlan at redhat.com> wrote:
> On 12/21/2017 06:29 AM, Marc Hartmayer wrote:
>> On Fri, Dec 15, 2017 at 07:16 PM +0100, John Ferlan <jferlan at redhat.com> wrote:
>>> On 12/12/2017 06:36 AM, Marc Hartmayer wrote:
>>>> There is a race between virNetServerProcessClients (main thread) and
>>>> remoteDispatchAuthList/remoteDispatchAuthPolkit/remoteSASLFinish (worker
>>>> thread) that can lead to decrementing srv->nclients_unauth when it's
>>>> zero. Since virNetServerCheckLimits relies on the value
>>>> srv->nclients_unauth the underrun causes libvirtd to stop accepting
>>>> new connections forever.
>>>>
>>>> Example race scenario (assuming libvirtd is using policykit and the
>>>> client is privileged):
>>>>   1. The client calls the RPC remoteDispatchAuthList =>
>>>>      remoteDispatchAuthList is executed on a worker thread (Thread
>>>>      T1). We're assuming now the execution stops for some time before
>>>>      the line 'virNetServerClientSetAuth(client, 0)'
>>>>   2. The client closes the connection irregularly. This causes the
>>>>      event loop to wake up and virNetServerProcessClient to be
>>>>      called (on the main thread T0). During the
>>>>      virNetServerProcessClients the srv lock is hold. The condition
>>>>      virNetServerClientNeedAuth(client) will be checked and as the
>>>>      authentication is not finished right now
>>>>      virNetServerTrackCompletedAuthLocked(srv) will be called =>
>>>>      --srv->nclients_unauth => 0
>>>>   3. The Thread T1 continues, marks the client as authenticated, and
>>>>      calls virNetServerTrackCompletedAuthLocked(srv) =>
>>>>      --srv->nclients_unauth => --0 => wrap around as nclient_unauth is
>>>>      unsigned
>>>>   4. virNetServerCheckLimits(srv) will disable the services forever
>>>>
>>>> To fix it, add an auth_pending field to the client struct so that it
>>>> is now possible to determine if the authentication process has already
>>>> been handled for this client.
>>>>
>>>> Setting the authentication method to none for the client in
>>>> virNetServerProcessClients is not a proper way to indicate that the
>>>> counter has been decremented, as this would imply that the client is
>>>> authenticated.
>>>>
>>>> Additionally, adjust the existing test cases for this new field.
>>>>
>>>
>>> In any case, I think perhaps this can be split into two patches - one
>>> that just sets/clears the auth_pending and then what's leftover as the
>>> fix.
>>
>> I’m not sure it’s necessary as the correct usage of auth_pending is
>> already the fix. But let’s see then.
>>
>
> OK. I understand. I assume by this point I was juggling in my head a few
> of the various adjustments so I figured that by adding the flag and
> separating the logic between just the set/clear flag and the reason it
> was added would be easier to review, but then again that would make the
> set/clear patch essentially meaningless.
>
>>>
>>>> diff --git a/src/libvirt_remote.syms b/src/libvirt_remote.syms
>>>> index 7fcae970484d..cef2794e1122 100644
>>>> --- a/src/libvirt_remote.syms
>>>> +++ b/src/libvirt_remote.syms
>>>> @@ -138,6 +138,7 @@ virNetServerClientGetUNIXIdentity;
>>>>  virNetServerClientImmediateClose;

[…snip…]

>>>> @@ -375,6 +376,7 @@ static virNetServerClientPtr
>>>>  virNetServerClientNewInternal(unsigned long long id,
>>>>                                virNetSocketPtr sock,
>>>>                                int auth,
>>>> +                              bool auth_pending,
>>>>  #ifdef WITH_GNUTLS
>>>>                                virNetTLSContextPtr tls,
>>>>  #endif
>>>> @@ -384,6 +386,12 @@ virNetServerClientNewInternal(unsigned long long id,
>>>>  {
>>>>      virNetServerClientPtr client;
>>>>
>>>> +    /* If the used authentication method implies that the new client
>>>> +     * is automatically authenticated, the authentication cannot be
>>>> +     * pending */
>>>> +    if (auth_pending && virNetServerClientAuthMethodImpliesAuthenticated(auth))
>>>                            ^^^^^^
>>> Why check twice? I'm not clear on the need for this check.  The setting
>>> of auth_pending is based on @auth in the caller or perhaps the value of
>>> auth_pending already. If there's some sort of check to be made perhaps
>>> it's in ExecRestart.
>>
>> Normally I would prefer an assert here… The intent behind this guard was
>> that it should prevent the wrong usage of this function.
>>
>> But I’ll move the check.
>>
>
> I think this is perhaps why I made the separate the patches comment. I
> was starting to overflow my heap.
>
>>>
>>> Also, if we returned NULL here without some sort of error, then couldn't
>>> we get the libvirt failed for some reason error message which is always
>>> not fun to track down.
>>
>> Does it makes sense to leave the check here (and additionally add it to
>> ExecRestart) and throw an error message here?
>>
>
> Adding a check, a message, and having a NULL return to guard against
> some future change passing the wrong parameters perhaps means that
> future patch submitter (and eventual reviewer) didn't clearly understand
> the purpose of the function...

Yep. That’s why I like asserts… (for functions used internally
only). You can 'tell' other developers what are the
preconditions/postconditions and help them to understand the code
flow. Sorry for the off topic…

Short answer, I’ll move the check.

>
> Still this is a *NewInternal function - I would hope passing correct
> arguments wouldn't be problem for it.

Me too.

>
>>>
>>>> +        return NULL;
>>>> +
>>>>      if (virNetServerClientInitialize() < 0)
>>>>          return NULL;
>>>>
>>>> @@ -393,6 +401,7 @@ virNetServerClientNewInternal(unsigned long long id,
>>>>      client->id = id;
>>>>      client->sock = virObjectRef(sock);
>>>>      client->auth = auth;
>>>> +    client->auth_pending = auth_pending;
>>>>      client->readonly = readonly;
>>>>  #ifdef WITH_GNUTLS
>>>>      client->tlsCtxt = virObjectRef(tls);
>>>> @@ -440,6 +449,7 @@ virNetServerClientPtr virNetServerClientNew(unsigned long long id,
>>>>  {
>>>>      virNetServerClientPtr client;
>>>>      time_t now;
>>>> +    bool auth_pending = !virNetServerClientAuthMethodImpliesAuthenticated(auth);
>>>
>>> I think it's much easier for readability to have result be auth !=
>>> VIR_NET_SERVER_SERVICE_AUTH_NONE instead of this call... Perhaps even
>>> easier to find all instances of AUTH_NONE.
>>
>> This function/condition is used in four places. But sure, if you’re
>> still saying it’s easier to read and to understand that a user is
>> authenticated (and therefore the authentication is no longer pending) if
>> auth_method == VIR_NET_SERVER_SERVICE_AUTH_NONE then I’ll replace this
>> function at all places :)
>>
>
> By this time I wasn't sure what was going to happen from Dan's patch 8
> comment and the virNetServerClientNeedAuth[Locked] functions vs. the
> name chosen (ImpliesAuthenticated) that was AFAICT trying to convey what
> you discovered.
>
> Whether the comparison is hidden behind some call or a direct comparison
> I guess just becomes an implementation detail. I use cscope a lot to
> find things, so searching on NONE is perhaps easier that searching on
> NONE, finding some API that returning true/false based off of NONE, and
> then searching on callers for that too.  OTOH, it's cleaner to have a
> helper <sigh>.
>
> Shorter answer, your call. If you want the helper, that's fine.

I’ll think about it.

[…snip]

--
Beste Grüße / Kind regards
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294