[libvirt] [PATCH 8/8] secret: Handle object list removal and deletion properly

Fri Jul 14 12:48:06 UTC 2017

On 07/14/2017 04:15 AM, Michal Privoznik wrote:
> On 07/13/2017 07:19 PM, John Ferlan wrote:
>>
>>
>> On 07/11/2017 11:52 AM, Michal Privoznik wrote:
>>> On 06/03/2017 03:27 PM, John Ferlan wrote:
>>>> Rather than rely on virSecretObjEndAPI to make the final virObjectUnref
>>>> after the call to virSecretObjListRemove, be more explicit by calling
>>>> virObjectUnref and setting @obj to NULL for secretUndefine and in
>>>> the error path of secretDefineXML.
>>>>
>>>> This also fixes a leak during virSecretLoad. If virSecretLoadValue
>>>> fails, the code jumps to cleanup without setting @ret = obj and thus
>>>> calls virSecretObjListRemove, which only accounts for the object
>>>> reference taken when adding the object to the list during
>>>> virSecretObjListAdd. It does not account for the reference to the
>>>> object itself: since @ret is NULL, the caller would not call
>>>> virSecretObjEndAPI on the recently added object to reduce the refcnt
>>>> to zero. Thus, clean up the virSecretLoadValue error path to make it
>>>> clearer what needs to be done on failure.
>>>
>>> I think the real reason is that we cannot call virSecretObjEndAPI()
>>> because that *unlocks* the secret object. However, the object is already
>>> unlocked at that point by virSecretObjListRemove() and thus we would
>>> unlock twice while locking just once. Honestly, I'd rather see that
>>> explanation in the commit message. But it's your call.
>>>
>>
>> Partially true - although calling Unlock on something already Unlocked
>> by the same thread IIRC doesn't do much other than cause an error, but
>> we don't fail on that error.
> 
> I don't think this is true. Just consider the following general example:
> 
>     Thread A   |   Thread B
> ------------------------------
> 1)  lock(X)    |
> 2)             |  lock(x)
> 3) unlock(X)   |
> 4) unlock(X)   |
> 5)             |  work(X)
> 
> In this scenario, thread B starts working on X thinking it's locked
> while it isn't. Also, man-page for pthread_mutex_lock says that
> unlocking an unlocked lock leads to undefined behaviour (we use NORMAL
> non-robust mutexes). And yes, I had to look up what a non-robust mutex
> is.
> 

I assume that step 2 is supposed to be (X) and not (x)?

In your scenario above ThreadB gains the lock after step 3 since that's
the moment in time when ThreadA gives it up. When ThreadA goes to unlock
in step 4, it fails because it doesn't own the lock as only ThreadB can
unlock. Without ThreadB in the picture, the second unlock would be
a failure because the mutex isn't locked; one cannot unlock an unlocked object.

IMO:  undefined behavior is what I'd call a way to say you could get an
error that the lock is already unlocked or you could get an error that
you don't own the lock to unlock, but we don't want to explain that so
we'll just say it's undefined. For some reason I have this recollection
about fork'd children that really make things crazy, but I'm not sure I
can put it into words.

>>
>> It's ironic this relates to something Erik and I discussed during the
>> virNodeDev* changes w/r/t the "owner" (driver) after the Add/Append now
>> has a reference on the object and would always need to Unref it even
>> after removing it from the List.
> 
> About that. I think that if we do our ref counting right, we don't need
> to do any special unref(). I mean:
> 
> undefine(X) {
>   obj = lookup(X);  // obj now has refcnt = 2, one because it's in the
>                     // list, one for the undefine()
>   listRemove(obj);  // obj.refcnt = 1, the list reference is gone
>   endApi(&obj);     // undefine() reference is gone, obj is freed
> }
> 
> Of course, if list uses two hash tables (one for UUIDs one for names) it
> holds two references, but then listRemove() also unref()-s twice.
> 

When one enters a listRemove function, there will be either 2 or 3 refs
on the object (1 or 2 hash tables) and the object is locked.

When one leaves a listRemove function, there is 1 reference on the
object and the object is unlocked.

So is the problem that listRemove shouldn't make that last Unlock? Or
that the caller shouldn't make an EndAPI call after calling Remove?

The corollary is that the listAdd would leave with 2 or 3 refs (again
number of hash tables) and the object locked.  The caller is then
expected to run EndAPI. The obj is then left with 1 or 2 refs because
it's in a table and not because it's being used again (since EndAPI will
set *obj = NULL).

My "chosen" solution is to keep the Unlock in listRemove, but I can be
swayed to have listRemove keep the lock thus having callers use EndAPI
instead of Unref. That would be something I would want to propagate to
other series so as to keep things similar between all consumers.

>>
>> This just happened to be where I had that oh sh*t moment and realized
>> that when calling Remove we were essentially unlocking and thus yes
>> calling EndAPI would unlock an unlocked object.  I'll add something in
>> about that..
>>
>> John
>>
>> As an aside, this is exactly why I started down the path of common
>> objects. Consider how the callers to virDomainObjListAdd go through
>> great lengths to manage the returned object, some quite differently
>> based on what they know about how many refs are returned and whether the
>> calling function calls virDomainObjEndAPI.  If it does, there's always a
>> virObjectRef done on the object after the Add returns. It's a subtle
>> thing, but confusing nonetheless.
>>
>> The thing is the *Remove API will call virHashRemoveEntry which
>> decrements a refcnt for each table; however, the *Add API only
>> increments the refcnt once for adding into each table.  Callers are very
>> careful to understand and manage that.
> 
> I haven't looked into the code, but if this is true then listAdd() and
> listRemove() functions are broken because they mangle refcount. It
> should be able to do the following and having the refcount of an object
> unchanged at the end:
> 
> listAdd(obj);
> listRemove(obj);
> 
> If the refcount is not the same at the end as it was at the beginning,
> the list functions are broken and need to be fixed.
> 

Go check callers of virDomainObjListAdd... Every single qemu case will
always virObjectRef(vm) after a call... Other consumers will sometimes
use EndAPI and other times use Unlock based primarily on whether they
called Ref as well. It's at best confusing. The problem in my mind is
that the Add was only refcnt'ing once for two lists while the Remove was
having two unref's.

I think if obj = Add is successful, then it's still up to whatever code
does the Remove(obj) to subsequently indicate it's really done with
@obj by performing the Unref.  At least that's the goal I've been
working towards...

John

>>
>> I'd rather see that mgmt go away. It should be "safe" to call EndAPI on
>> the @obj regardless of whether Add or Lookup was used.  The callers
>> shouldn't have to know they need to either use *EndAPI or ObjUnlock.
>> In any case, I digress and that's a different issue for another day.
> 
> I totally agree on this!
> 
> Michal
>