[libvirt] RFC: libxl race fixes

Jim Fehlig jfehlig at suse.com
Fri Jan 18 21:33:19 UTC 2013


Daniel P. Berrange wrote:
> On Thu, Jan 17, 2013 at 10:29:35PM -0700, Jim Fehlig wrote:
>   
>> Jim Fehlig wrote:
>>     
>>> I've been investigating some races in the libxl driver and would like to
>>> get comments on some potential solutions.
>>>
>>> The first race is in the fd/timeout event handling code, which maps
>>> libxl's osevent interface to libvirt's event loop interface.  This
>>> mapping opens the possibility for libvirt's event loop to invoke
>>> event callbacks after libxl has already deregistered the event,
>>> potentially accessing an event object that has already been freed.
>>>
>>> One solution to this race I've found successful is reference counting the
>>> objects associated with the events.  When libxl registers an event, an
>>> object encapsulating the event is created and it's reference count is set
>>> to 1.  When the event is injected into libvirt's event loop, another
>>> reference is taken on the object.  When libxl deregisters the event, it's
>>> reference count is decremented.  Once the event is removed from libvirt's
>>> event loop, the final reference is decremented and the object is disposed.
>>>   
>>>       
>> While rebasing this solution to use danpb's recent virObjectLockable
>> change, I revisited using only a lock in the libxlDomainObject,
>> acquiring the lock in the registration, deregistration, modify, and
>> callback functions.  I recall some problems with this approach before,
>> but now find it to be sufficient.  Perhaps a misplaced lock drove me to
>> the unneeded complexity of this double lock nonsense...
>>
>>     
>>> This approach ensures the object is not disposed until it is removed
>>> from libvirt's event loop *and* libxl had explicitly deregistered the event.
>>> The notion of an event being 'disabled' found in libvirt's event loop impl
>>> also had to be added to the libxl event object, to ensure the driver doesn't
>>> call into libxl for a previously deregistered event.
>>>
>>> The second race is between destroying a vm (i.e., calling
>>> privateDataFreeFunc, which frees the libxl ctx) and deregistration/cleanup
>>> of all events that have been registered by libxl.
>>>
>>> One solution for this race is to convert libxlDomainObjPrivate to a
>>> virObject and increment its reference for each fd and timeout registration.
>>>   
>>>       
>> Only change here is using the new virObjectLockable.
>>     
>
> I'm surprised that you need to have a separate lock for libxlDomainObjPrivate.
> The lifetime of that object is tied to the lifetime of the virDomainObjPtr,
> which already has a lock. Isn't it sufficient to do locking on the latter,
> whenever the libxlDomainObjPrivate need protecting ?
>   

Yeah, I thought it would be sufficient too, but fd/timeout events can
happen during vm operations where the virDomainObj is locked.  E.g.
during vm creation, the virDomainObj is locked but fd/timeout events
occur as libxl reads/writes to xenstore, sets watches, reads a
kernel/initrd from host, etc.  It seems cleaner to have a lock in the
libxlDomainObjPrivate for accessing libxl's event registrations.

Regards,
Jim




More information about the libvir-list mailing list