[libvirt] [PATCH v4 14/14] nodedev: Remove driver locks around object list mgmt code

Erik Skultety eskultet at redhat.com
Mon Jul 17 06:57:05 UTC 2017


On Fri, Jul 14, 2017 at 11:37:36AM -0400, John Ferlan wrote:
>
>
> On 07/14/2017 05:23 AM, Erik Skultety wrote:
> > On Thu, Jul 13, 2017 at 07:23:58PM -0400, John Ferlan wrote:
> >>
> >>
> >> On 07/11/2017 10:38 AM, Erik Skultety wrote:
> >>> On Mon, Jul 03, 2017 at 05:25:30PM -0400, John Ferlan wrote:
> >>>> Since virnodedeviceobj now has a self-lockable hash table, there's no
> >>>> need to lock the table from the driver for processing. Thus remove the
> >>>> locks from the driver for NodeDeviceObjList mgmt.
> >>>>
> >>>> Signed-off-by: John Ferlan <jferlan at redhat.com>
> >>>
> >>> [..]
> >>>
> >>>> @@ -601,8 +565,6 @@ nodeDeviceDestroy(virNodeDevicePtr device)
> >>>>          return -1;
> >>>>      def = virNodeDeviceObjGetDef(obj);
> >>>>
> >>>> -    nodeDeviceLock();
> >>>> -
> >>>
> >>> Consider the following scenario handling the same device:
> >>>
> >>
> >> What if I told you that's impossible?  You cannot "have" a scsi_hostN,
> >> delete a scsi_hostN, and then have a new one created with the same name.
> >
> > Except I wasn't considering creation, rather than plain change. Although I
> > didn't manage to find out under what circumstances would kernel actually emit
> > the 'change' event (I tried to write to various writable attributes - but since
> > it either lacks documentation completely or it's just well hidden from me - I
> > wasn't able to trigger it), other than explicitly triggering it by writing to
> > sysfs' uevent - so in that aspect, as I wrote below in my previous response,
> > it's highly unlikely but not impossible to hit the race.
> >
>
> Not quite sure what would trigger a change event. A vHBA has a wwnn/wwpn
> and a parent wwnn/wwpn and WWN. If any of those change, the vHBA would
> need to be deleted and recreated.
>
> I do have a faint recollection of considering the ramifications of
> dropping the obj lock in that path and the race drawback, but I
> dismissed it mainly because of "how" vHBA's are created and what could
> constitute a change event for a vHBA essentially redefines it.
>
> >>
> >> The scsi_hostN's are an ever increasing value of N. Once created and
> >> deleted, the N will not be reused.
> >>
> >>> Thread 1 (udev event handler callback) | Thread 2 (nodeDeviceDestroy)
> >>> =======================================|=============================
> >>>                                        | # attempt to destroy a device
> >>>                                        | obj = nodeDeviceObjFindByName()
> >>>                                        |
> >>>                                        | # @obj is locked
> >>>                                        | def = virNodeDeviceObjGetDef(obj)
> >>>                                        |
> >>>                                        | virNodeDeviceObjEndAPI(&obj)
> >>>                                        | # @obj is unlocked
> >>>                                   <------
> >>> # change event                         |
> >>> udevAddOneDevice()                     |
> >>>                                        |
> >>>   obj = virNodeDeviceObjListFindByName |
> >>>   # @obj locked                        |
> >>>   new_device = false                   |
> >>>   # @obj unlocked                      |
> >>>                                        |
> >>>   obj = virNodeDeviceObjListAssignDef  |
> >>>   # @obj locked                        |
> >>>   virNodeDeviceObjEndAPI(&obj)         |
> >>>   # @obj unlocked and @def changed     |
> >>>                                       ------>
> >>>                                        | virNodeDeviceObjListGetParentHost()
> >>>                                        |    if (def->parent) ===> SIGSEGV
> >>>
> >>> In patch 12/14 this would have been a deadlock because you first locked the
> >>> @obj, then nodedev driver while udevAddOneDevice did in the reverse order. I
> >>> don't know about NPIV so I'm not sure whether there is another way of removing
> >>> a device and updating the parent device tree, might require an updated model,
> >>> but for now, you need to make a deep copy of @def. I can see that the chance of
> >>> getting an 'change' event from udev on a vHBA device is low, but I'm trying to
> >>> look at it from a higher perspective, as we want to be able to remove mdevs
> >>> this way, possibly other devices in the future.
> >>
> >> I think what happens is code from virNodeDeviceGetWWNs through
> >> virVHBAManageVport gets placed into a function that handles vHBA's on
> >> deletion.  Similarly for CreateXML, vHBA specific code ends up in a
> >> helper function. Those helpers would be called based on the type of
> >> object/device we're talking about (vHBA/mdev).
> >
> > Despite the likelihood of the case I'm describing, the main point I'm trying to
> > make is that the lock protects a mutable resource (@def) and by releasing it
> > followed by querying it without actually holding the lock violates thread
> > safety.
> >
>
> I understand the position and while the likelihood is essentially next
> to zero that something like that could happen it's also possible to
> remove @def and pass copies of the name, parent, parent_wwnn,
> parent_wwpn, and parent_fabric_wwn.
>
> So before I do that - can we close on patches 1-12?

Yeah, I wasn't explicit with the ACK for 1-12 because I first wanted to have a
small discussion about the changes, but we can now focus solely on the rest.

So, ACK to 1-12 with the proposed change to virNode*ObjNew not consuming @def
(I don't have a strong preference here that I could back with some reasonable
pros/cons, to me, having it behave like a constructor for OOP, which we're
essentially trying to copy in libvirt, naturally makes sense, but I also see
Michal's point).

Erik




More information about the libvir-list mailing list