[libvirt] [PATCH v4 14/14] nodedev: Remove driver locks around object list mgmt code

Erik Skultety eskultet at redhat.com
Fri Jul 14 09:23:23 UTC 2017


On Thu, Jul 13, 2017 at 07:23:58PM -0400, John Ferlan wrote:
>
>
> On 07/11/2017 10:38 AM, Erik Skultety wrote:
> > On Mon, Jul 03, 2017 at 05:25:30PM -0400, John Ferlan wrote:
> >> Since virnodedeviceobj now has a self-lockable hash table, there's no
> >> need to lock the table from the driver for processing. Thus remove the
> >> locks from the driver for NodeDeviceObjList mgmt.
> >>
> >> Signed-off-by: John Ferlan <jferlan at redhat.com>
> >
> > [..]
> >
> >> @@ -601,8 +565,6 @@ nodeDeviceDestroy(virNodeDevicePtr device)
> >>          return -1;
> >>      def = virNodeDeviceObjGetDef(obj);
> >>
> >> -    nodeDeviceLock();
> >> -
> >
> > Consider the following scenario handling the same device:
> >
>
> What if I told you that's impossible?  You cannot "have" a scsi_hostN,
> delete a scsi_hostN, and then have a new one created with the same name.

Except I wasn't considering creation, rather than plain change. Although I
didn't manage to find out under what circumstances would kernel actually emit
the 'change' event (I tried to write to various writable attributes - but since
it either lacks documentation completely or it's just well hidden from me - I
wasn't able to trigger it), other than explicitly triggering it by writing to
sysfs' uevent - so in that aspect, as I wrote below in my previous response,
it's highly unlikely but not impossible to hit the race.

>
> The scsi_hostN's are an ever increasing value of N. Once created and
> deleted, the N will not be reused.
>
> > Thread 1 (udev event handler callback) | Thread 2 (nodeDeviceDestroy)
> > =======================================|=============================
> >                                        | # attempt to destroy a device
> >                                        | obj = nodeDeviceObjFindByName()
> >                                        |
> >                                        | # @obj is locked
> >                                        | def = virNodeDeviceObjGetDef(obj)
> >                                        |
> >                                        | virNodeDeviceObjEndAPI(&obj)
> >                                        | # @obj is unlocked
> >                                   <------
> > # change event                         |
> > udevAddOneDevice()                     |
> >                                        |
> >   obj = virNodeDeviceObjListFindByName |
> >   # @obj locked                        |
> >   new_device = false                   |
> >   # @obj unlocked                      |
> >                                        |
> >   obj = virNodeDeviceObjListAssignDef  |
> >   # @obj locked                        |
> >   virNodeDeviceObjEndAPI(&obj)         |
> >   # @obj unlocked and @def changed     |
> >                                       ------>
> >                                        | virNodeDeviceObjListGetParentHost()
> >                                        |    if (def->parent) ===> SIGSEGV
> >
> > In patch 12/14 this would have been a deadlock because you first locked the
> > @obj, then nodedev driver while udevAddOneDevice did in the reverse order. I
> > don't know about NPIV so I'm not sure whether there is another way of removing
> > a device and updating the parent device tree, might require an updated model,
> > but for now, you need to make a deep copy of @def. I can see that the chance of
> > getting an 'change' event from udev on a vHBA device is low, but I'm trying to
> > look at it from a higher perspective, as we want to be able to remove mdevs
> > this way, possibly other devices in the future.
>
> I think what happens is code from virNodeDeviceGetWWNs through
> virVHBAManageVport gets placed into a function that handles vHBA's on
> deletion.  Similarly for CreateXML, vHBA specific code ends up in a
> helper function. Those helpers would be called based on the type of
> object/device we're talking about (vHBA/mdev).

Despite the likelihood of the case I'm describing, the main point I'm trying to
make is that the lock protects a mutable resource (@def) and by releasing it
followed by querying it without actually holding the lock violates thread
safety.

>
> BTW: I recall doing a review suggesting perhaps creating an mdev pool
> driver of sorts. Daniel essentially responded that using the node device
> driver and augmenting it to fit the needs would be OK. At the time, I
> wasn't specifically thinking about this case, but it certainly qualifies
> as something of concern where the existing node device code can make
> some assumptions about names that the underlying udev code provides.
>
> You may need to add some extra layer of protection if names can be
> reused especially because of this event mgmt "problem"... You may also
> need to modify that "if (dev)" code to check if the dev is an mdev and
> if so, do something different.
>
> Looks like the code was added by commit id '546fa3ef'. Perhaps as a way
> to "follow" how other drivers did things.
>
> "Fortunately" the only consumer of CreateXML/Destroy ends up being vHBA
> and we know there's rules around the naming. IIRC - you were letting
> MDEV also set the name, right? That is, a 'name' on input is essentially

No, mdev's name is a readonly attribute that is optional and exposed by the
vendor, the only thing libvirt can currently write to the mdev's sysfs
interface is UUID.

Erik

> and happily ignored. So what creates that name? And can you be assured
> it's going to be unique for the life/run time of the host?  If so,
> there's no way a CreateXML could reuse a name that AddOneDevice would be
> using, right?
>
> Maybe I need to think some more - it's been a long day already
>
> John
>
> >
> > The rest of the patch looks okay, but I want to try it first.
> >
> > Erik
> >




More information about the libvir-list mailing list