[libvirt] [PATCH v2 1/9] hostdev: Add virHostdevOnlyReattachPCIDevice()

John Ferlan jferlan at redhat.com
Thu Jan 28 13:50:49 UTC 2016



On 01/27/2016 12:26 PM, Andrea Bolognani wrote:
> On Tue, 2016-01-26 at 18:55 -0500, John Ferlan wrote:
>>  
>> w/r/t: your [0/7] from initial series...
>>  
>> As much as you don't want to keep living Groundhog Day - resolution of
>> bugs like this are job security :-)...
> 
> Groundhog Day is in less than a week, by the way! :)
> 
>> w/r/t Suggestions for deamon restart issues... Seems like we need a way
>> to save/restore the hostdev_mgr active/inactive lists using XML/JSON
>> similar to how domains, storage, etc. handle it. Guess I just assumed
>> that was already there ;-) since /var/run/libvirt/hostdevmgr exists. It
>> seems that network stuff can be restored - virHostdevNetConfigRestore.
>>  
>> Do you really think this series should be "held up" waiting to create
>> some sort of status tracking?
> 
> I will look into your suggestion. I believe such save / restore
> functionality has to be in place by the time this series is merged if
> we don't want to break everything on daemon restart.
> 

I assume that restart is already broken... This series does fix some
issues in the "normal" flow though. Perhaps a chicken and egg type
problem. If you fix restart first, what of this series would be
beneficial to ensure restart doesn't have issues...

>> On 01/25/2016 11:20 AM, Andrea Bolognani wrote:
>>> This function replaces virHostdevReattachPCIDevice() and, unlike it,
>>> does not perform list manipulation, leaving it to the calling function.
>>>  
>>> This means virHostdevReAttachPCIDevices() had to be updated to cope
>>> with the updated requirements.
>>> ---
>>>   src/util/virhostdev.c | 136 +++++++++++++++++++++++++++++++++-----------------
>>>   1 file changed, 90 insertions(+), 46 deletions(-)
>>  
>> Since I reviewed them all... I think the comment changes from 7/9 should
>> just be inlined here and patch 4 instead of a separate patch
> 
> Will do - it was that way in v1 as well.
> 

Right - I started looking at v2

>>> diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c
>>> index f31ad41..66629b4 100644
>>> --- a/src/util/virhostdev.c
>>> +++ b/src/util/virhostdev.c
>>> @@ -526,6 +526,74 @@ virHostdevNetConfigRestore(virDomainHostdevDefPtr hostdev,
>>>       return ret;
>>>   }
>>>   
>>> +/**
>>> + * virHostdevOnlyReattachPCIDevice:
>>  
>> Why not just reuse the function name you deleted? IOW: Is there a reason
>> for "Only"? (not that I'm one that can complain about naming functions,
>> but this just seems strange).
> 
> It's an attempt to make it stand out a bit from
> 
>   virHostdevPCINodeDeviceReAttach()
>   virHostdevReAttachPCIDevices()
> 
> in the same file. Mostly the latter.
> 
> The reasoning behind "Only" is that the function performs "Only" the job
> of reattaching the device to the host, with the error checking,
> bookkeeping and additional steps left to the caller.
> 
> Which is, strictly speaking, not true :)
> 
> Maybe something like virHostdevReattachPCIDeviceCommon(), to express the
> fact that this basically contains as much functionality as it was
> possible to split off to a reusable routine?
> 

virHostdevReattachPCIDeviceVeryCarefully   :-)

But since it's in the comment of the code:

virHostdevReattachPCIDeviceToHost

>>> + * @mgr: hostdev manager
>>> + * @pci: PCI device to be reattached
>>  
>> Interesting ... In 2 instances, this will be a pointer to the "copy"
>> element, while in the third instance this will be the "actual" on
>> inactive list element.  For a copy element, we'd *have* to search
>> inactive; however, for an 'actual' we don't "need" to.
> 
> Good point.
> 
> I will try to find a solution that
> 
>   1. avoids searching the list twice
>   2. avoids duplicating code
>   3. respects the Principle of Least Surprise
> 
> I can't guarantee I'll be able to :)
> 

I kept losing focus on when something was on the inactive list or not.
Then of course trying to reconcile 'pci' and 'dev' variable name usage.

>>> + * @skipUnmanaged: whether to skip unmanaged devices
>>> + *
>>> + * Reattach a PCI device to the host.
>>> + *
>>> + * This function only performs the base reattach steps that are required
>>> + * regardless of whether the device is being detached from a domain or
>>> + * had been simply detached from the host earlier.
>>> + *
>>> + * @pci must have already been marked as inactive, and the PCI related
>>> + * parts of @mgr (inactivePCIHostdevs, activePCIHostdevs) must have been
>>> + * locked beforehand using virObjectLock().
>>> + *
>>> + * Returns: 0 on success, <0 on failure
>>> + */
>>> +static int
>>> +virHostdevOnlyReattachPCIDevice(virHostdevManagerPtr mgr,
>>> +                                virPCIDevicePtr pci,
>>> +                                bool skipUnmanaged)
>>> +{
>>> +    virPCIDevicePtr actual;
>>> +    int ret = -1;
>>> +
>>> +    /* Retrieve the actual device from the inactive list */
>>> +    if (!(actual = virPCIDeviceListFind(mgr->inactivePCIHostdevs, pci))) {
>>> +        VIR_DEBUG("PCI device %s is not marked as inactive",
>>> +                  virPCIDeviceGetName(pci));
>>  
>> This is tricky - the only time we care about the return status (now) is
>> when skipUnmanaged == false, a/k/a the path where we pass the onlist
>> element..
>>  
>> In my first pass through the changes I thought - oh no if we return -1,
>> then this would be a path that could get that generic libvirt failed for
>> some reason message; however, closer inspection noted that we guaranteed
>> it was on the inactive list before calling here.
> 
> So we should be good, right? :)
> 

I think so - a lot of typing out loud which at least gives you my review
context...   I finally convinced myself there wasn't an issue even
though it's strange to return something and in the end, no one really
cares...

Perhaps using a 4th parameter 'actual' (could be NULL) would make it
more obvious that we knew on input that the device was already found on
the inactive list. Determining whether we have a copy or an actual
wasn't readily apparent. Consider some future "user" of this function -
how easy is it to decide what to pass?

>>> +        goto out;
>>> +    }
>>> +
>>> +    /* Skip unmanaged devices if asked to do so */
>>> +    if (!virPCIDeviceGetManaged(actual) && skipUnmanaged) {
>>  
>> <sigh>, unrelated and existing - virPCIDeviceGetManaged probably should
>> return bool instead of unsigned int
> 
> Yup, good catch. The same applies to
> 
>   virPCIDeviceGetUnbindFromStub()
>   virPCIDeviceGetRemoveSlot()
>   virPCIDeviceGetReprobe()
> 
> as well. I'll fix them in a separate commit.
> 

I saw that today...

>>> +        VIR_DEBUG("Not reattaching unmanaged PCI device %s",
>>> +                  virPCIDeviceGetName(actual));
>>> +        ret = 0;
>>> +        goto out;
>>> +    }
>>> +
>>> +    /* Wait for device cleanup if it is qemu/kvm */
>>> +    if (virPCIDeviceGetStubDriver(actual) == VIR_PCI_STUB_DRIVER_KVM) {
>>> +        int retries = 100;
>>> +        while (virPCIDeviceWaitForCleanup(actual, "kvm_assigned_device")
>>> +               && retries) {
>>> +            usleep(100*1000);
>>> +            retries--;
>>> +        }
>>> +    }
>>  
>> Existing, but if retries == 0, then cleanup never succeeded; however,
>> perhaps one can assume that the subsequent call would fall over and play
>> dead? Not that it really gets checked...
> 
> I recall raising the issue at some point in the past, but I don't
> remember the outcome of that discussion... Maybe this can be assessed
> again at a later time?
> 

That's fine - it was just another of those typing out loud moments.

>>> +
>>> +    VIR_DEBUG("Reattaching PCI device %s", virPCIDeviceGetName(actual));
>>> +    if (virPCIDeviceReattach(actual, mgr->activePCIHostdevs,
>>> +                             mgr->inactivePCIHostdevs) < 0) {
>>> +        virErrorPtr err = virGetLastError();
>>> +        VIR_ERROR(_("Failed to reattach PCI device %s: %s"),
>>> +                  virPCIDeviceGetName(actual),
>>> +                  err ? err->message : _("unknown error"));
>>> +        virResetError(err);
>>> +        goto out;
>>> +    }
>>> +
>>> +    ret = 0;
>>> +
>>> + out:
>>> +    return ret;
>>> +}
>>> +
>>>   int
>>>   virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr,
>>>                               const char *drv_name,
>>> @@ -753,45 +821,6 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr,
>>>       return ret;
>>>   }
>>>   
>>> -/*
>>> - * Pre-condition: inactivePCIHostdevs & activePCIHostdevs
>>> - * are locked
>>> - */
>>> -static void
>>> -virHostdevReattachPCIDevice(virPCIDevicePtr dev, virHostdevManagerPtr mgr)
>>> -{
>>> -    /* If the device is not managed and was attached to guest
>>> -     * successfully, it must have been inactive.
>>> -     */
>>> -    if (!virPCIDeviceGetManaged(dev)) {
>>> -        VIR_DEBUG("Adding unmanaged PCI device %s to inactive list",
>>> -                  virPCIDeviceGetName(dev));
>>> -        if (virPCIDeviceListAdd(mgr->inactivePCIHostdevs, dev) < 0)
>>> -            virPCIDeviceFree(dev);
>>> -        return;
>>> -    }
>>> -
>>> -    /* Wait for device cleanup if it is qemu/kvm */
>>> -    if (virPCIDeviceGetStubDriver(dev) == VIR_PCI_STUB_DRIVER_KVM) {
>>> -        int retries = 100;
>>> -        while (virPCIDeviceWaitForCleanup(dev, "kvm_assigned_device")
>>> -               && retries) {
>>> -            usleep(100*1000);
>>> -            retries--;
>>> -        }
>>> -    }
>>> -
>>> -    VIR_DEBUG("Reattaching PCI device %s", virPCIDeviceGetName(dev));
>>> -    if (virPCIDeviceReattach(dev, mgr->activePCIHostdevs,
>>> -                             mgr->inactivePCIHostdevs) < 0) {
>>> -        virErrorPtr err = virGetLastError();
>>> -        VIR_ERROR(_("Failed to re-attach PCI device: %s"),
>>> -                  err ? err->message : _("unknown error"));
>>> -        virResetError(err);
>>> -    }
>>> -    virPCIDeviceFree(dev);
>>> -}
>>> -
>>>   /* @oldStateDir:
>>>    * For upgrade purpose: see virHostdevNetConfigRestore
>>>    */
>>> @@ -803,7 +832,7 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr hostdev_mgr,
>>>                                int nhostdevs,
>>>                                const char *oldStateDir)
>>>   {
>>> -    virPCIDeviceListPtr pcidevs;
>>> +    virPCIDeviceListPtr pcidevs = NULL;
>>>       size_t i;
>>>   
>>>       if (!nhostdevs)
>>> @@ -848,11 +877,25 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr hostdev_mgr,
>>>                       continue;
>>>                   }
>>>           }
>>> +        i++;
>>> +    }
>>  
>> Curious why the decision for a second loop - if activeDev matches, then
>> it almost seems a shame to loop again. Why not (within if (activeDev):
>>  
>>      actual = virPCIDeviceListSteal(hostdev_mgr->activePCIHostdevs,
>>                                     activeDev);
>>      /* !actual should never happen, but better safe than sorry */
>>      if (actual &&
>>          virPCIDeviceListAdd(hostdev_mgr->inactivePCIHostdevs,
>>                              actual) < 0)
>>              virPCIDeviceFree(actual);
>>              /* You could also... */
>>              virPCIDeviceListDel(pcidevs, dev);
>>      }
> 
> Mostly because I consider moving the devices from one list to another
> as a separate step.
> 
> We could merge the two steps, and that would bring us down to 4 (or 5
> if you count the one implicit in virHostdevGetActivePCIHostDeviceList())
> loops, but I'm not sure whether that would be a significant gain in
> performance or whether it would just make the code a little more
> convoluted...
> 

Your call - we go through the pcidevs list many times so it's not that
big a deal...

>> NOTE: Whether there is one or two loops, the second loop would need to
>> call virPCIDeviceFree(actual) since we'd leak it otherwise.
> 
> You mean on error? Because otherwise I don't see the leak: the actual
> device is stolen from the active list and added (itself, not a copy)
> to the inactive list.
> 

Yes, the error path - if you fail to add actual to inactive, then it was
dropped on the floor

>> You'll also note I didn't keep the goto cleanup. Previously the code
>> would completely go through the pcidevs Loop 4 regardless of whether
>> virHostdevReattachPCIDevice code had failures. The new code has the
>> subtle difference of jumping to cleanup if a failure is found. That
>> could leave things in an awkward state especially since
>> virHostdevReAttachPCIDevices is a void.
>>  
>> Since failure for DeviceListAdd is because 1. device is already there
>> (which I would *hope* isn't the case) or 2. memory allocation failure
>> (in which case not much else successful will probably happen anyway),
>> then perhaps continuing on rather than jumping to cleanup isn't a bad
>> idea... We could be returning some memory that someone may find useful.
>>  
>> My concerns about jumping to cleanup are that this API is called in
>> error recovery paths as well as part of the ominous comment "For upgrade
>> purpose:..." (comment before function header). So it seems the "existing
>> logic" is try to clean up as many as possible. By potentially erroring
>> out too soon could lead to more problems.
>>  
>> So the question becomes what havoc is introduced if we cannot add to the
>> inactive list but decide to continue as before... It seems we'll end up
>> "failing" in virHostdevOnlyReattachPCIDevice since it's not in the
>> inactiveList, but our Loop 4 logic doesn't care. Of course we could
>> delete 'dev' from 'pcidevs' too before then...
>>  
>> Hopefully this makes sense... It's been an 'edit in process'...
> 
> See the comment at the end of the message.
> 
>>> +
>>> +    /* Step 2: move all devices from the active list to the inactive list */
>>> +    for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) {
>>> +        virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i);
>>> +        virPCIDevicePtr actual;
>>>   
>>>           VIR_DEBUG("Removing PCI device %s from active list",
>>>                     virPCIDeviceGetName(dev));
>>> -        virPCIDeviceListDel(hostdev_mgr->activePCIHostdevs, dev);
>>  
>> This was a curious placement for *ListDel... If !activeDev, then calling
>> *ListDel also won't find 'dev' on activelist...
> 
> If 'activeDev != NULL', then driver name and domain name are checked,
> which may cause the 'dev' to be removed from 'pcidev' and the loop to
> restart.
> 
> If that does not happen 'dev' is removed from the active list, even
> thought it might not have been in that list in the first place. But
> the code is doing the right thing in all situations.
> 

Understood, but if !activeDev, then the virPCIDeviceListDel calls
virPCIDeviceListSteal which calls virPCIDeviceListStealIndex using
virPCIDeviceListFindIndex...

You'll note activeList is sourced by calling virPCIDeviceListFind which
calls virPCIDeviceListFindIndex

So I was pointing out how pointless it would be to call
virPCIDeviceListDel if activeDev == NULL (because it too would do nothing).

>>> -        i++;
>>> +        if (!(actual = virPCIDeviceListSteal(hostdev_mgr->activePCIHostdevs,
>>> +                                             dev)))
>>> +            goto cleanup;
>>  
>> If the choice is to use two loops (and perhaps keep the cleanups)...
>>  
>> 1. If this Steal fails, then something is seriously wrong, but we don't
>> even give it a VIR_DEBUG message.
>>  
>> 2. If this Steal fails, then going to cleanup is again a subtle
>> difference with the prior logic that said, well I couldn't do anything
>> with this, but I'm going to keep processing.
>>  
>> 3. If we keep processing, then something on 'pcidevs' won't be in
>> 'inactivePCIHostdevs', causing virHostdevOnlyReattachPCIDevice to fail.
>> But that does not matter since we ignore the return value in Loop 4.
>>  
>> 4. If we do Steal and if the subsequent Add fails, then we leak
>> 'actual', so prior to the goto cleanup call virPCIDeviceFree(actual);
>> (or instead if the goto cleanup;'s are removed).
> 
> Thanks for looking into this in such detail. I will go through the
> existing code again myself and either become confident that the code
> is doing the right thing or change it so that it does :)
> 
> On the other hand, there's this patch I'm working on that changes the
> way bookkeeping is performed quite substantially... My idea was to
> propose it as a follow-up to this series, since it basically replaces
> some constructs with some other "equivalent" constructs without altering
> the overall control flow, but maybe at this point it could be worth it
> to merge everything together and hopefully avoid such pitfalls, and make
> the whole thing easier to reason about.

Try to keep it under 100 patches please ;-)


John
> 
> Cheers.
> 
> -- 
> Andrea Bolognani
> Software Engineer - Virtualization Team
> 




More information about the libvir-list mailing list