[libvirt] [PATCH 22/24] hostdev: Streamline device ownership tracking

Andrea Bolognani abologna at redhat.com
Tue Mar 15 15:45:22 UTC 2016


On Sun, 2016-03-13 at 10:06 -0400, John Ferlan wrote:
> 
> On 03/07/2016 12:24 PM, Andrea Bolognani wrote:
>> > After this patch, ownership of virPCIDevice instances is very easy
> > to keep track of: for each host PCI device, the only instance that
> > actually matters is the one inside one of the bookkeeping list.
>> > Whenever some operation needs to be performed on a PCI device, the
> > actual device is looked up first; when this is not the case, a
> > comment explains the reason.
> > ---
> >  src/util/virhostdev.c | 118 +++++++++++++++++++++++++++-----------------------
> >  1 file changed, 65 insertions(+), 53 deletions(-)
>> FWIW: from your cover - splitting this up... I think it's possible that
> the changes for calling virHostdevReattachPCIDevice with an actual
> device could be in a separate patch...  But I certainly see how this and
> that are "tied" together especially if you're considering the failure
> path through reattachdevs where you've definitely put things on the
> inactiveList.
> 
> There may also be a 4 patches overall in here since I see the Reattach
> code could place an unmanaged device on the inactiveList (which you
> modify in this patch) and I think the ReAttach code adjustments could be
> separate too.

I'll be honest: if we can agree that the code, with this patch
applied, is correct, I see little value in splitting it further.

It's not like we're really going to want to have just *some* of
these changes but not the rest... They're very much connected.

> I think you're on the right track, with all these changes - it's some
> minor details. I've been looking at the changes, walking away, looking
> at them, so hopefully the following is followable!
> 
> BTW: I chose not to look at the last 2 patches as I think there are too
> many questions left over

Fair enough :)

But you'll see that those patches are absolutely trivial - this
patch right here is the whole point of the series, and it's all
smooth sailing from here on :)

> > diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c
> > index d1529c5..b2f54cd 100644
> > --- a/src/util/virhostdev.c
> > +++ b/src/util/virhostdev.c
> > @@ -618,28 +618,22 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr mgr,
> >           last_processed_hostdev_vf = i;
> >      }
> >  
> > -    /* Step 5: Now mark all the devices as active */
> > +    /* Step 5: Move devices from the inactive list to the active list */
> >      for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) {
> >          virPCIDevicePtr pci = virPCIDeviceListGet(pcidevs, i);
> > +        virPCIDevicePtr actual;
> >  
> > -        VIR_DEBUG("Adding PCI device %s to active list",
> > +        VIR_DEBUG("Removing PCI device %s from inactive list",
> >                    virPCIDeviceGetName(pci));
> > -        if (virPCIDeviceListAdd(mgr->activePCIHostdevs, pci) < 0)
> > -            goto inactivedevs;
> > -    }
> > -
> > -    /* Step 6: Now remove the devices from inactive list. */
> > -    for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) {
> > -        virPCIDevicePtr pci = virPCIDeviceListGet(pcidevs, i);
> > +        actual = virPCIDeviceListSteal(mgr->inactivePCIHostdevs, pci);
> >  
> > -        VIR_DEBUG("Removing PCI device %s from inactive list",
> > +        VIR_DEBUG("Adding PCI device %s to active list",
> >                    virPCIDeviceGetName(pci));
> > -        virPCIDeviceListDel(mgr->inactivePCIHostdevs, pci);
> > +        if (!actual || virPCIDeviceListAdd(mgr->activePCIHostdevs, actual) < 0)
> > +            goto inactivedevs;
> >      }
> 
> This is so much more logical to me than the previous code.

Yay :)

> One question
> though - why not use virPCIDeviceListStealIndex?  Theoretically the
> inactiveList and the pcidevs list is the same right? So for every
> devices on pcidevs - steal index 0 of inactiveList and place it on
> activeList.  It's just a slight optimization.

This is in no way guaranteed to be the case.

The inactive list keeps track of all inactive devices, including
the ones that have manually detached from the host by the user
and that we have no intention of ever attaching to a guest.

Consider the case of an Ethernet adapter with 4 ports that are
presented as 4 separate PCI devices, all in the same IOMMU
group: if the user wants to pass only a single port to the
guest, it will detach all of them from the host using 'virsh
nodedev-detach' or similar, and then call 'virsh attach-device'
a single time. The three remaining devices will stay in the
inactive list until the user calls 'virsh nodedev-reattach' on
them.

Another example: the user might decide, for whatever reason, to
detach a PCI device from the host, and then proceed to assign a
completely unrelated host PCI device to a guest, leaving the
previous one unassigned.

> > -    /* Step 7: Now set the used_by_domain of the device in
> > -     * activePCIHostdevs as domain name.
> > -     */
> > +    /* Step 6: Set driver and domain information */
> >      for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) {
> >          virPCIDevicePtr pci, actual;
> >  
> > @@ -655,7 +649,7 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr mgr,
> >              virPCIDeviceSetUsedBy(actual, drv_name, dom_name);
> >      }
> >  
> > -    /* Step 8: Now set the original states for hostdev def */
> > +    /* Step 7: Now set the original states for hostdev def */
> >      for (i = 0; i < nhostdevs; i++) {
> >          virPCIDevicePtr actual;
> >          virDomainHostdevDefPtr hostdev = hostdevs[i];
> > @@ -690,23 +684,25 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr mgr,
> >          }
> >      }
> >  
> > -    /* Step 9: Now steal all the devices from pcidevs */
> > -    while (virPCIDeviceListCount(pcidevs) > 0)
> > -        virPCIDeviceListStealIndex(pcidevs, 0);
> > -
> 
> Help me out... Why were these lines removed?  The inactiveDevs is
> initially a copy of whatever is on 'pcidevs'... Thus activeDevs takes
> from there and not pcidevs, right?

Before the changes, objects from 'pcidevs' were *added* to the
active list, so it was critical that 'pcidevs' was emptied here:
if that didn't happen, the call to virObjectUnref() would have
destroyed not only the device list, but the devices themselves,
rendering the contents of the active list partially invalid.

After the changes, the objects in 'pcidevs' are *always*
disposable: when devices are detached from the host, they are
*copied* instead of added to the inactive list, and later on
those *copies* are *moved* to the active list.

So we no longer need to be careful here: all the data we care
about is either in the active list or the inactive list, and
'pcidevs', along with all its contents, can be released on a
whim in the cleanup path.

There's more on this below.

>> >      ret = 0;
> >      goto cleanup;
> >  
> >   inactivedevs:
> > -    /* Only steal all the devices from activePCIHostdevs. We will
> > -     * free them in virObjectUnref().
> > -     */
> > +    /* Move devices back to the inactive list so that they can be
> > +     * processed properly below (reattachdevs label) */
> >      for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) {
> >          virPCIDevicePtr pci = virPCIDeviceListGet(pcidevs, i);
> > +        virPCIDevicePtr actual;
> >  
> >          VIR_DEBUG("Removing PCI device %s from active list",
> >                    virPCIDeviceGetName(pci));
> > -        virPCIDeviceListSteal(mgr->activePCIHostdevs, pci);
> > +        if (!(actual = virPCIDeviceListSteal(mgr->activePCIHostdevs, pci)))
> > +            continue;
> > +
> > +        VIR_DEBUG("Adding PCI device %s to inactive list",
> > +                  virPCIDeviceGetName(pci));
> > +        if (virPCIDeviceListAdd(mgr->inactivePCIHostdevs, actual) < 0)
> > +            continue;
> 
> Another option is to VIR_WARN (INFO, DEBUG, whatever) the failure. You
> may also want to clear the error as well (virResetLastError).  There are
> two possible errors - it already exists on the list and memory
> allocation errors.

The first one would not really be a failure, though - we might have
processed only part of the devices before encountering an error, so
it's totally possible that some of them have not been added to the
active list yet.

As for the second one... There really isn't much you can do if your
memory allocation fails and you're already in the error path, is
there? :(

> The continue is superfluous since this is the end of the loop - it only
> matters if someone adds more checks afterwards

I'd rather keep it for clarity's sake.

> >   resetvfnetconfig:
> > @@ -716,16 +712,22 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr mgr,
> >   reattachdevs:
> >      for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) {
> >          virPCIDevicePtr pci = virPCIDeviceListGet(pcidevs, i);
> > +        virPCIDevicePtr actual;
> >  
> > -        if (virPCIDeviceGetManaged(pci)) {
> > +        /* We need to look up the actual device because that's what
> > +         * virPCIDeviceReattach() exepects as its argument */
> 
> "expects"

Right.

> As I read it virPCIDeviceReattach look for the device on the activeList,
> but will "blindly" remove from the inactiveList *if* provided.  That's
> only true for xen_driver and virpcitest.
> 
> I believe this process is the more correct one.

virPCIDeviceReattach() only checks whether the device is on the
active list for safety purposes; you still need to pass in an
actual device (eg. instance taken from one of the bookkeeping
lists, the inactive list in this case) because
virPCIDeviceUnbindFromStub(), which performs the actual task of
reattaching the device to the host, needs to have information
such as unbind_from_stub, remove_slot and reprobe, which had
been stored in the actual device by virPCIDeviceDetach().

> > +        if (!(actual = virPCIDeviceListFind(mgr->inactivePCIHostdevs, pci)))
> > +            continue;
> > +
> > +        if (virPCIDeviceGetManaged(actual)) {
> >              VIR_DEBUG("Reattaching managed PCI device %s",
> > -                      virPCIDeviceGetName(pci));
> > -            ignore_value(virPCIDeviceReattach(pci,
> > +                      virPCIDeviceGetName(actual));
> > +            ignore_value(virPCIDeviceReattach(actual,
> >                                                mgr->activePCIHostdevs,
> >                                                mgr->inactivePCIHostdevs));
> >          } else {
> >              VIR_DEBUG("Not reattaching unmanaged PCI device %s",
> > -                      virPCIDeviceGetName(pci));
> > +                      virPCIDeviceGetName(actual));
> >          }
> >      }
> >  
> > @@ -745,17 +747,6 @@ static void
> >  virHostdevReattachPCIDevice(virHostdevManagerPtr mgr,
> >                              virPCIDevicePtr actual)
> >  {
> > -    /* If the device is not managed and was attached to guest
> > -     * successfully, it must have been inactive.
> > -     */
> > -    if (!virPCIDeviceGetManaged(actual)) {
> > -        VIR_DEBUG("Adding unmanaged PCI device %s to inactive list",
> > -                  virPCIDeviceGetName(actual));
> > -        if (virPCIDeviceListAdd(mgr->inactivePCIHostdevs, actual) < 0)
> > -            virPCIDeviceFree(actual);
> > -        return;
> > -    }
> > -
> 
> Is it just me getting really confused - this code used to add devices to
> the inactiveList even though (at some point in time) the Prepare code
> would "drop" them?  E.G. step 2 before any of these changes.
> 
> Was this perhaps what caused you to add that PCIDeviceListFind code in
> step 2 of patch 21 that I questioned?
> 
> Perhaps this alone deserves it's own patch and I think is it's own
> problem since the lookaside lists are for devices that are managed, right?

This used to take care of the unmanaged devices by adding them
to the inactive list and returning early to skip the call to
virPCIDeviceReattach() later on.

Now we take care of that in the caller, ReAttachPCIDevices(),
by moving *all devices* from the active list to the inactive
list, regardless of whether they were managed or not, and
only calling ReattachPCIDevice() on managed devices.

The end result is the same.

> >      /* Wait for device cleanup if it is qemu/kvm */
> >      if (virPCIDeviceGetStubDriver(actual) == VIR_PCI_STUB_DRIVER_KVM) {
> >          int retries = 100;
> > @@ -774,7 +765,6 @@ virHostdevReattachPCIDevice(virHostdevManagerPtr mgr,
> >                    err ? err->message : _("unknown error"));
> >          virResetError(err);
> >      }
> > -    virPCIDeviceFree(actual);
> 
> Hmm... mental note - what changes here... to cause us not to need to
> free.... Answered my question below [1] during cleanup.

My reply is below as well :)

>> >  }
> >  
> >  /* @oldStateDir:
> > @@ -836,17 +826,29 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr mgr,
> >              continue;
> >          }
> >  
> So step1 now removes all pcidevs that are on the inactive list and that
> are unmanaged since "by rule" only a managed device could be on the
> activeList.

There is no such rule or assumptions: both managed and unmanaged
devices belong to the active list. The only criteria for being
on the active list is, well, being active, ie. assigned to a
guest.

After the changes, step 1 makes sure that 'pcidevs' (our working
set) only contains (pointers to) devices that were both *active*
and *used by the correct domain and driver* to begin with, which
is what we expect to be dealing with in the rest of the function.

Actually, this was already the case as of patch 15, and the
logic was the same even before that. The only change here is
splitting the code that changes / validates 'pcidevs' from the
one that changes the contents of the active list.

> I think you need to update the comment for step1 in any
> case since there's no removal from the activeList.

Yeah, thanks for spotting that.

>> > +        i++;
> > +    }
> > +
> > +    /* Step 2: Move devices from the active list to the inactive list */
> > +    for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) {
> > +        virPCIDevicePtr pci = virPCIDeviceListGet(pcidevs, i);
> > +        virPCIDevicePtr actual;
> > +
> >          VIR_DEBUG("Removing PCI device %s from active list",
> >                    virPCIDeviceGetName(pci));
> > -        virPCIDeviceListDel(mgr->activePCIHostdevs, pci);
> > -        i++;
> > +        actual = virPCIDeviceListSteal(mgr->activePCIHostdevs, pci);
> > +
> > +        if (actual) {
> > +            VIR_DEBUG("Adding PCI device %s to inactive list",
> > +                      virPCIDeviceGetName(pci));
> > +            virPCIDeviceListAdd(mgr->inactivePCIHostdevs, actual);
> 
> Coverity notes that there's no error checking here like there is in
> other calls to virPCIDeviceListAdd.  Perhaps similar to step 5 in
> prepare, we could use VIR_WARN.

I'll add error checking.

> I also think it might be worthwhile to make a common routine to perform
> this step - the only difference being perhaps a 'return/fail on error'
> flag that would be set for Prepare and clear for ReAttach

This might be a nice idea for further improvement, although it
might not matter as much once the code is refactored as
originally outlined in [1], which is going to be my next step.

> > -    /* At this point, any device that had been used by the guest is in
> > -     * pcidevs, but has been removed from activePCIHostdevs.
> > -     */
> > +    /* At this point, any device that had been used by the guest has been
> > +     * moved to the inactive list */
> 
> So yes, step 3 makes perfect sense now.

Yay :)

> > -    /* Step 2: restore original network config of hostdevs that used
> > +    /* Step 3: restore original network config of hostdevs that used
> >       * <interface type='hostdev'>
> >       */
> >      for (i = 0; i < nhostdevs; i++) {
> > @@ -871,7 +873,7 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr mgr,
> >          }
> >      }
> >  
> > -    /* Step 3: perform a PCI Reset on all devices */
> > +    /* Step 4: perform a PCI Reset on all devices */
> >      for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) {
> >          virPCIDevicePtr pci = virPCIDeviceListGet(pcidevs, i);
> >  
> > @@ -888,16 +890,26 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr mgr,
> >          }
> >      }
> >  
> > -    /* Step 4: reattach devices to their host drivers (if managed) or place
> > -     * them on the inactive list (if not managed)
> > -     */
> > -    while (virPCIDeviceListCount(pcidevs) > 0) {
> > -        virPCIDevicePtr pci = virPCIDeviceListStealIndex(pcidevs, 0);
> > -        virHostdevReattachPCIDevice(mgr, pci);
> > +    /* Step 5: Reattach managed devices to their host drivers; unmanaged
> > +     *         devices don't need to be processed further */
> 
> Why use pcidevs?  Why not just walk inactivePCIHostdevs?

For consistency, and because as explained above some of the
devices in the inactive list might be completely unrelated
to the task at hand.

> > +    for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) {
> > +        virPCIDevicePtr pci = virPCIDeviceListGet(pcidevs, i);
> > +        virPCIDevicePtr actual;
> > +
> > +        /* We need to look up the actual device because that's what
> > +         * virHostdevReattachPCIDevice() expects as its argument */
> > +        if (!(actual = virPCIDeviceListFind(mgr->inactivePCIHostdevs, pci)))
> > +            continue;
> > +
> > +        if (virPCIDeviceGetManaged(actual))
> > +            virHostdevReattachPCIDevice(mgr, actual);
> > +        else
> > +            VIR_DEBUG("Not reattaching unmanaged PCI device %s",
> > +                      virPCIDeviceGetName(actual));
> 
> If I've read things right - there shouldn't be any unmanaged device in
> pcidevs at this point as they should have been removed during step1. In
> any case, if you walk inactiveList, then all this is moot.

Not true, there can definitely be unmanaged devices. And we
can't just loop over the inactive list because of the reasons
explained above.

> Also now that we're no longer managing the device it should be removed
> from inactive and free'd.
> 
> IOW: THis loop becomes a while inactivePCIHostdevs has elements, call
> Reattach, then free the device.

virHostdevReattachPCIDevice() calls virPCIDeviceReattach(),
which takes care of removing the device from the inactive
list and releasing the associated memory.

> >      }
> >  
> > -    virObjectUnref(pcidevs);
> >   cleanup:
> 
> [1]
> There could be a memory leak here - I think we need to walk the
> remaining pcidevs entry and free each entry before unreffing the
> containing object.  I think that's what that the previous step4 would do
> with the removed virPCIDeviceFree after the ListStealIndex.

There is no memory leak: virObjectUnref() is now called
unconditionally on 'pcidevs', and that takes care of releasing
both the container itself *and* the instances that had been
added to it.

As explained above, the original code allowed the same instance
to be part of two different lists at a time, 'pcidevs' and one
of the bookkeeping lists. (Both of them, in some cases, but
that was just for a limited period of time during state
transitions.)

What the new code does is regard all objects in 'pcidevs' as
eminently disposable, and store the actual objects in the
bookkeeping lists *every single time*. Each virPCIDevice
instance, be it valuable or not, is only ever in a single list
at a time.

The downside is that we have to copy objects around slightly
more frequently - even though some other changes might very
well offset that overhead already - but the upside is that we
can be way more ham-fisted when it comes to deleting stuff,
which is exactly what we're doing here. It doesn't matter
what's inside 'pcidevs', it's always going to be stuff we
already have an authoritative copy of somewhere else, so let's
just release it in the cleanup path and forget about it.

Way easier to reason about, at least for my brain :)

> > +    virObjectUnref(pcidevs);
> >      virObjectUnlock(mgr->activePCIHostdevs);
> >      virObjectUnlock(mgr->inactivePCIHostdevs);
> >  }
>
Thanks for getting this far - keeping a message this long
understandable is pretty challenging, hopefully I've done a
somewhat decent job at it.

I believe there was some disconnect between the role I intended
a variable to have and the role you understood they would
have, and that caused lots of head scratching on your part...

This is something I tried to address with patch 19, but I can
see it was not as effective as I hoped it would be. Any
suggestions on how to improve the situation?

Cheers.


[1] https://www.redhat.com/archives/libvir-list/2016-January/msg01066.html
-- 
Andrea Bolognani
Software Engineer - Virtualization Team




More information about the libvir-list mailing list