[libvirt] [PATCH v2 1/9] hostdev: Add virHostdevOnlyReattachPCIDevice()

Andrea Bolognani abologna at redhat.com
Wed Jan 27 17:26:19 UTC 2016


On Tue, 2016-01-26 at 18:55 -0500, John Ferlan wrote:
> 
> w/r/t: your [0/7] from initial series...
> 
> As much as you don't want to keep living Groundhog Day - resolution of
> bugs like this is job security :-)...

Groundhog Day is in less than a week, by the way! :)

> w/r/t Suggestions for daemon restart issues... Seems like we need a way
> to save/restore the hostdev_mgr active/inactive lists using XML/JSON
> similar to how domains, storage, etc. handle it. Guess I just assumed
> that was already there ;-) since /var/run/libvirt/hostdevmgr exists. It
> seems that network stuff can be restored - virHostdevNetConfigRestore.
> 
> Do you really think this series should be "held up" waiting to create
> some sort of status tracking?

I will look into your suggestion. I believe such save / restore
functionality has to be in place by the time this series is merged if
we don't want to break everything on daemon restart.

> On 01/25/2016 11:20 AM, Andrea Bolognani wrote:
> > This function replaces virHostdevReattachPCIDevice() and, unlike it,
> > does not perform list manipulation, leaving it to the calling function.
> > This means virHostdevReAttachPCIDevices() had to be updated to cope
> > with the updated requirements.
> > ---
> >  src/util/virhostdev.c | 136 +++++++++++++++++++++++++++++++++-----------------
> >  1 file changed, 90 insertions(+), 46 deletions(-)
> 
> Since I reviewed them all... I think the comment changes from 7/9 should
> just be inlined here and patch 4 instead of a separate patch

Will do - it was that way in v1 as well.

> > diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c
> > index f31ad41..66629b4 100644
> > --- a/src/util/virhostdev.c
> > +++ b/src/util/virhostdev.c
> > @@ -526,6 +526,74 @@ virHostdevNetConfigRestore(virDomainHostdevDefPtr hostdev,
> >      return ret;
> >  }
> >  
> > +/**
> > + * virHostdevOnlyReattachPCIDevice:
> 
> Why not just reuse the function name you deleted? IOW: Is there a reason
> for "Only"? (not that I'm one that can complain about naming functions,
> but this just seems strange).

It's an attempt to make it stand out a bit from

  virHostdevPCINodeDeviceReAttach()
  virHostdevReAttachPCIDevices()

in the same file. Mostly the latter.

The reasoning behind "Only" is that the function performs "Only" the job
of reattaching the device to the host, with the error checking,
bookkeeping and additional steps left to the caller.

Which is, strictly speaking, not true :)

Maybe something like virHostdevReattachPCIDeviceCommon(), to express the
fact that this basically contains as much functionality as it was
possible to split off to a reusable routine?

> > + * @mgr: hostdev manager
> > + * @pci: PCI device to be reattached
> 
> Interesting ... In 2 instances, this will be a pointer to the "copy"
> element, while in the third instance this will be the "actual" on
> inactive list element.  For a copy element, we'd *have* to search
> inactive; however, for an 'actual' we don't "need" to.

Good point.

I will try to find a solution that

  1. avoids searching the list twice
  2. avoids duplicating code
  3. respects the Principle of Least Surprise

I can't guarantee I'll be able to :)

> > + * @skipUnmanaged: whether to skip unmanaged devices
> > + *
> > + * Reattach a PCI device to the host.
> > + *
> > + * This function only performs the base reattach steps that are required
> > + * regardless of whether the device is being detached from a domain or
> > + * had been simply detached from the host earlier.
> > + *
> > + * @pci must have already been marked as inactive, and the PCI related
> > + * parts of @mgr (inactivePCIHostdevs, activePCIHostdevs) must have been
> > + * locked beforehand using virObjectLock().
> > + *
> > + * Returns: 0 on success, <0 on failure
> > + */
> > +static int
> > +virHostdevOnlyReattachPCIDevice(virHostdevManagerPtr mgr,
> > +                                virPCIDevicePtr pci,
> > +                                bool skipUnmanaged)
> > +{
> > +    virPCIDevicePtr actual;
> > +    int ret = -1;
> > +
> > +    /* Retrieve the actual device from the inactive list */
> > +    if (!(actual = virPCIDeviceListFind(mgr->inactivePCIHostdevs, pci))) {
> > +        VIR_DEBUG("PCI device %s is not marked as inactive",
> > +                  virPCIDeviceGetName(pci));
> 
> This is tricky - the only time we care about the return status (now) is
> when skipUnmanaged == false, a/k/a the path where we pass the onlist
> element.
> 
> In my first pass through the changes I thought - oh no if we return -1,
> then this would be a path that could get that generic libvirt failed for
> some reason message; however, closer inspection noted that we guaranteed
> it was on the inactive list before calling here.

So we should be good, right? :)

> > +        goto out;
> > +    }
> > +
> > +    /* Skip unmanaged devices if asked to do so */
> > +    if (!virPCIDeviceGetManaged(actual) && skipUnmanaged) {
> 
> <sigh>, unrelated and existing - virPCIDeviceGetManaged probably should
> return bool instead of unsigned int

Yup, good catch. The same applies to

  virPCIDeviceGetUnbindFromStub()
  virPCIDeviceGetRemoveSlot()
  virPCIDeviceGetReprobe()

as well. I'll fix them in a separate commit.
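
As a rough sketch of what such a signature change could look like:
demoPCIDevice and both functions below are made-up stand-ins for
illustration, not the real definitions in virpci.c.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the PCI device struct; not libvirt's layout */
typedef struct {
    bool managed;
} demoPCIDevice;

/* Returning bool documents that the value is a flag, not a count */
static bool
demoPCIDeviceGetManaged(const demoPCIDevice *dev)
{
    return dev->managed;
}

/* Callers can then combine flags without implicit integer conversions */
static bool
demoShouldSkip(const demoPCIDevice *dev, bool skipUnmanaged)
{
    return skipUnmanaged && !demoPCIDeviceGetManaged(dev);
}
```

demoShouldSkip() mirrors the "skip unmanaged devices if asked to do so"
check from the patch, just with a bool getter.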

> > +        VIR_DEBUG("Not reattaching unmanaged PCI device %s",
> > +                  virPCIDeviceGetName(actual));
> > +        ret = 0;
> > +        goto out;
> > +    }
> > +
> > +    /* Wait for device cleanup if it is qemu/kvm */
> > +    if (virPCIDeviceGetStubDriver(actual) == VIR_PCI_STUB_DRIVER_KVM) {
> > +        int retries = 100;
> > +        while (virPCIDeviceWaitForCleanup(actual, "kvm_assigned_device")
> > +               && retries) {
> > +            usleep(100*1000);
> > +            retries--;
> > +        }
> > +    }
> 
> Existing, but if retries == 0, then cleanup never succeeded; however,
> perhaps one can assume that the subsequent call would fall over and play
> dead? Not that it really gets checked...

I recall raising the issue at some point in the past, but I don't
remember the outcome of that discussion... Maybe this can be assessed
again at a later time?
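
For reference, one possible shape for a wait loop that tells the caller
when the retries ran out. This is only a sketch: the probe and pause
callbacks are hypothetical placeholders standing in for
virPCIDeviceWaitForCleanup() and usleep(), so that the snippet stays
self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: returns true while the device is still busy */
typedef bool (*demoBusyProbe)(void *opaque);
/* Hypothetical delay hook; real code would call usleep() here */
typedef void (*demoPause)(void);

/* Poll until the probe reports idle, pausing between attempts.
 * Returns 0 once idle, -1 if the device is still busy after
 * @retries pauses. */
static int
demoWaitForCleanup(demoBusyProbe busy, void *opaque,
                   int retries, demoPause pause)
{
    while (busy(opaque)) {
        if (retries-- <= 0)
            return -1;      /* timed out: caller can log or bail out */
        if (pause)
            pause();
    }
    return 0;
}

/* Example probe: busy for the first *opaque calls, idle afterwards */
static bool
demoCountdownBusy(void *opaque)
{
    int *remaining = opaque;

    if (*remaining > 0) {
        (*remaining)--;
        return true;
    }
    return false;
}
```

With a return value like this, the caller could at least emit a
VIR_DEBUG() or VIR_WARN() before attempting the reattach, rather than
silently carrying on.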

> > +
> > +    VIR_DEBUG("Reattaching PCI device %s", virPCIDeviceGetName(actual));
> > +    if (virPCIDeviceReattach(actual, mgr->activePCIHostdevs,
> > +                             mgr->inactivePCIHostdevs) < 0) {
> > +        virErrorPtr err = virGetLastError();
> > +        VIR_ERROR(_("Failed to reattach PCI device %s: %s"),
> > +                  virPCIDeviceGetName(actual),
> > +                  err ? err->message : _("unknown error"));
> > +        virResetError(err);
> > +        goto out;
> > +    }
> > +
> > +    ret = 0;
> > +
> > + out:
> > +    return ret;
> > +}
> > +
> >  int
> >  virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr,
> >                              const char *drv_name,
> > @@ -753,45 +821,6 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr,
> >      return ret;
> >  }
> >  
> > -/*
> > - * Pre-condition: inactivePCIHostdevs & activePCIHostdevs
> > - * are locked
> > - */
> > -static void
> > -virHostdevReattachPCIDevice(virPCIDevicePtr dev, virHostdevManagerPtr mgr)
> > -{
> > -    /* If the device is not managed and was attached to guest
> > -     * successfully, it must have been inactive.
> > -     */
> > -    if (!virPCIDeviceGetManaged(dev)) {
> > -        VIR_DEBUG("Adding unmanaged PCI device %s to inactive list",
> > -                  virPCIDeviceGetName(dev));
> > -        if (virPCIDeviceListAdd(mgr->inactivePCIHostdevs, dev) < 0)
> > -            virPCIDeviceFree(dev);
> > -        return;
> > -    }
> > -
> > -    /* Wait for device cleanup if it is qemu/kvm */
> > -    if (virPCIDeviceGetStubDriver(dev) == VIR_PCI_STUB_DRIVER_KVM) {
> > -        int retries = 100;
> > -        while (virPCIDeviceWaitForCleanup(dev, "kvm_assigned_device")
> > -               && retries) {
> > -            usleep(100*1000);
> > -            retries--;
> > -        }
> > -    }
> > -
> > -    VIR_DEBUG("Reattaching PCI device %s", virPCIDeviceGetName(dev));
> > -    if (virPCIDeviceReattach(dev, mgr->activePCIHostdevs,
> > -                             mgr->inactivePCIHostdevs) < 0) {
> > -        virErrorPtr err = virGetLastError();
> > -        VIR_ERROR(_("Failed to re-attach PCI device: %s"),
> > -                  err ? err->message : _("unknown error"));
> > -        virResetError(err);
> > -    }
> > -    virPCIDeviceFree(dev);
> > -}
> > -
> >  /* @oldStateDir:
> >   * For upgrade purpose: see virHostdevNetConfigRestore
> >   */
> > @@ -803,7 +832,7 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr hostdev_mgr,
> >                               int nhostdevs,
> >                               const char *oldStateDir)
> >  {
> > -    virPCIDeviceListPtr pcidevs;
> > +    virPCIDeviceListPtr pcidevs = NULL;
> >      size_t i;
> >  
> >      if (!nhostdevs)
> > @@ -848,11 +877,25 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr hostdev_mgr,
> >                      continue;
> >                  }
> >          }
> > +        i++;
> > +    }
> 
> Curious why the decision for a second loop - if activeDev matches, then
> it almost seems a shame to loop again. Why not (within if (activeDev):
> 
>     actual = virPCIDeviceListSteal(hostdev_mgr->activePCIHostdevs,
>                                    activeDev);
>     /* !actual should never happen, but better safe than sorry */
>     if (actual &&
>         virPCIDeviceListAdd(hostdev_mgr->inactivePCIHostdevs,
>                             actual) < 0)
>             virPCIDeviceFree(actual);
>             /* You could also... */
>             virPCIDeviceListDel(pcidevs, dev);
>     }

Mostly because I consider moving the devices from one list to another
as a separate step.

We could merge the two steps, and that would bring us down to 4 (or 5
if you count the one implicit in virHostdevGetActivePCIHostDeviceList())
loops, but I'm not sure whether that would be a significant gain in
performance or whether it would just make the code a little more
convoluted...

> NOTE: Whether there is one or two loops, the second loop would need to
> call virPCIDeviceFree(actual) since we'd leak it otherwise.

You mean on error? Because otherwise I don't see the leak: the actual
device is stolen from the active list and added (itself, not a copy)
to the inactive list.
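
As a toy illustration of that ownership transfer, including the error
path from point 4 of the review: demoList, demoDev and the functions
below are invented for the example and are much simpler than the real
virPCIDeviceList API.

```c
#include <stdlib.h>
#include <stddef.h>

#define DEMO_LIST_MAX 4

typedef struct { int id; } demoDev;

typedef struct {
    demoDev *items[DEMO_LIST_MAX];
    size_t count;
} demoList;

/* Takes ownership of @dev on success; returns -1 if the list is full */
static int
demoListAdd(demoList *list, demoDev *dev)
{
    if (list->count == DEMO_LIST_MAX)
        return -1;
    list->items[list->count++] = dev;
    return 0;
}

/* Removes the entry with @id and hands ownership to the caller;
 * returns NULL if there is no such entry. Order is not preserved. */
static demoDev *
demoListSteal(demoList *list, int id)
{
    for (size_t i = 0; i < list->count; i++) {
        if (list->items[i]->id == id) {
            demoDev *dev = list->items[i];
            list->items[i] = list->items[--list->count];
            return dev;
        }
    }
    return NULL;
}

/* Move one device from @active to @inactive. No copy is made, so
 * nothing leaks on success; only a failed add leaves the caller
 * holding an object that must be freed. */
static int
demoMoveToInactive(demoList *active, demoList *inactive, int id)
{
    demoDev *actual = demoListSteal(active, id);

    if (!actual)
        return -1;          /* was not on the active list */
    if (demoListAdd(inactive, actual) < 0) {
        free(actual);       /* still ours: free it to avoid a leak */
        return -1;
    }
    return 0;
}
```

A caller that wants the old "keep processing" behaviour could ignore the
return value of demoMoveToInactive() and continue with the next device.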

> You'll also note I didn't keep the goto cleanup. Previously the code
> would completely go through the pcidevs Loop 4 regardless of whether
> virHostdevReattachPCIDevice code had failures. The new code has the
> subtle difference of jumping to cleanup if a failure is found. That
> could leave things in an awkward state especially since
> virHostdevReAttachPCIDevices is a void.
> 
> Since failure for DeviceListAdd is because 1. device is already there
> (which I would *hope* isn't the case) or 2. memory allocation failure
> (in which case not much else successful will probably happen anyway),
> then perhaps continuing on rather than jumping to cleanup isn't a bad
> idea... We could be returning some memory that someone may find useful.
> 
> My concerns about jumping to cleanup are that this API is called in
> error recovery paths as well as part of the ominous comment "For upgrade
> purpose:..." (comment before function header). So it seems the "existing
> logic" is to try to clean up as many as possible. Erroring out too soon
> could lead to more problems.
> 
> So the question becomes what havoc is introduced if we cannot add to the
> inactive list but decide to continue as before... It seems we'll end up
> "failing" in virHostdevOnlyReattachPCIDevice since it's not in the
> inactiveList, but our Loop 4 logic doesn't care. Of course we could
> delete 'dev' from 'pcidevs' too before then...
> 
> Hopefully this makes sense... It's been an 'edit in process'...

See the comment at the end of the message.

> > +
> > +    /* Step 2: move all devices from the active list to the inactive list */
> > +    for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) {
> > +        virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i);
> > +        virPCIDevicePtr actual;
> >  
> >          VIR_DEBUG("Removing PCI device %s from active list",
> >                    virPCIDeviceGetName(dev));
> > -        virPCIDeviceListDel(hostdev_mgr->activePCIHostdevs, dev);
> 
> This was a curious placement for *ListDel... If !activeDev, then calling
> *ListDel also won't find 'dev' on activelist...

If 'activeDev != NULL', then driver name and domain name are checked,
which may cause 'dev' to be removed from 'pcidevs' and the loop to
restart.

If that does not happen, 'dev' is removed from the active list, even
though it might not have been in that list in the first place. But
the code is doing the right thing in all situations.

> > -        i++;
> > +        if (!(actual = virPCIDeviceListSteal(hostdev_mgr->activePCIHostdevs,
> > +                                             dev)))
> > +            goto cleanup;
> 
> If the choice is to use two loops (and perhaps keep the cleanups)...
> 
> 1. If this Steal fails, then something is seriously wrong, but we don't
> even give it a VIR_DEBUG message.
> 
> 2. If this Steal fails, then going to cleanup is again a subtle
> difference with the prior logic that said, well I couldn't do anything
> with this, but I'm going to keep processing.
> 
> 3. If we keep processing, then something on 'pcidevs' won't be in
> 'inactivePCIHostdevs', causing virHostdevOnlyReattachPCIDevice to fail.
> But that does not matter since we ignore the return value in Loop 4.
> 
> 4. If we do Steal and if the subsequent Add fails, then we leak
> 'actual', so prior to the goto cleanup call virPCIDeviceFree(actual);
> (or instead if the goto cleanup;'s are removed).

Thanks for looking into this in such detail. I will go through the
existing code again myself and either become confident that the code
is doing the right thing or change it so that it does :)

On the other hand, there's this patch I'm working on that changes the
way bookkeeping is performed quite substantially... My idea was to
propose it as a follow-up to this series, since it basically replaces
some constructs with some other "equivalent" constructs without altering
the overall control flow, but maybe at this point it could be worth it
to merge everything together and hopefully avoid such pitfalls, and make
the whole thing easier to reason about.

Cheers.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team



