[libvirt] [PATCH v2 1/9] hostdev: Add virHostdevOnlyReattachPCIDevice()
John Ferlan
jferlan at redhat.com
Thu Jan 28 13:50:49 UTC 2016
On 01/27/2016 12:26 PM, Andrea Bolognani wrote:
> On Tue, 2016-01-26 at 18:55 -0500, John Ferlan wrote:
>>
>> w/r/t: your [0/7] from initial series...
>>
>> As much as you don't want to keep living Groundhog Day - resolution of
>> bugs like this are job security :-)...
>
> Groundhog Day is in less than a week, by the way! :)
>
>> w/r/t Suggestions for deamon restart issues... Seems like we need a way
>> to save/restore the hostdev_mgr active/inactive lists using XML/JSON
>> similar to how domains, storage, etc. handle it. Guess I just assumed
>> that was already there ;-) since /var/run/libvirt/hostdevmgr exists. It
>> seems that network stuff can be restored - virHostdevNetConfigRestore.
>>
>> Do you really think this series should be "held up" waiting to create
>> some sort of status tracking?
>
> I will look into your suggestion. I believe such save / restore
> functionality has to be in place by the time this series is merged if
> we don't want to break everything on daemon restart.
>
I assume that restart is already broken... This series does fix some
issues in the "normal" flow though. Perhaps a chicken and egg type
problem. If you fix restart first, what of this series would be
beneficial to ensure restart doesn't have issues...
>> On 01/25/2016 11:20 AM, Andrea Bolognani wrote:
>>> This function replaces virHostdevReattachPCIDevice() and, unlike it,
>>> does not perform list manipulation, leaving it to the calling function.
>>>
>>> This means virHostdevReAttachPCIDevices() had to be updated to cope
>>> with the updated requirements.
>>> ---
>>> src/util/virhostdev.c | 136 +++++++++++++++++++++++++++++++++-----------------
>>> 1 file changed, 90 insertions(+), 46 deletions(-)
>>
>> Since I reviewed them all... I think the comment changes from 7/9 should
>> just be inlined here and patch 4 instead of a separate patch
>
> Will do - it was that way in v1 as well.
>
Right - I started looking at v2
>>> diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c
>>> index f31ad41..66629b4 100644
>>> --- a/src/util/virhostdev.c
>>> +++ b/src/util/virhostdev.c
>>> @@ -526,6 +526,74 @@ virHostdevNetConfigRestore(virDomainHostdevDefPtr hostdev,
>>> return ret;
>>> }
>>>
>>> +/**
>>> + * virHostdevOnlyReattachPCIDevice:
>>
>> Why not just reuse the function name you deleted? IOW: Is there a reason
>> for "Only"? (not that I'm one that can complain about naming functions,
>> but this just seems strange).
>
> It's an attempt to make it stand out a bit from
>
> virHostdevPCINodeDeviceReAttach()
> virHostdevReAttachPCIDevices()
>
> in the same file. Mostly the latter.
>
> The reasoning behind "Only" is that the function performs "Only" the job
> of reattaching the device to the host, with the error checking,
> bookkeeping and additional steps left to the caller.
>
> Which is, strictly speaking, not true :)
>
> Maybe something like virHostdevReattachPCIDeviceCommon(), to express the
> fact that this basically contains as much functionality as it was
> possible to split off to a reusable routine?
>
virHostdevReattachPCIDeviceVeryCarefully :-)
But since it's in the comment of the code:
virHostdevReattachPCIDeviceToHost
>>> + * @mgr: hostdev manager
>>> + * @pci: PCI device to be reattached
>>
>> Interesting ... In 2 instances, this will be a pointer to the "copy"
>> element, while in the third instance this will be the "actual" on
>> inactive list element. For a copy element, we'd *have* to search
>> inactive; however, for an 'actual' we don't "need" to.
>
> Good point.
>
> I will try to find a solution that
>
> 1. avoids searching the list twice
> 2. avoids duplicating code
> 3. respects the Principle of Least Surprise
>
> I can't guarantee I'll be able to :)
>
I kept losing focus on when something was on the inactive list or not.
Then of course trying to reconcile 'pci' and 'dev' variable name usage.
>>> + * @skipUnmanaged: whether to skip unmanaged devices
>>> + *
>>> + * Reattach a PCI device to the host.
>>> + *
>>> + * This function only performs the base reattach steps that are required
>>> + * regardless of whether the device is being detached from a domain or
>>> + * had been simply detached from the host earlier.
>>> + *
>>> + * @pci must have already been marked as inactive, and the PCI related
>>> + * parts of @mgr (inactivePCIHostdevs, activePCIHostdevs) must have been
>>> + * locked beforehand using virObjectLock().
>>> + *
>>> + * Returns: 0 on success, <0 on failure
>>> + */
>>> +static int
>>> +virHostdevOnlyReattachPCIDevice(virHostdevManagerPtr mgr,
>>> + virPCIDevicePtr pci,
>>> + bool skipUnmanaged)
>>> +{
>>> + virPCIDevicePtr actual;
>>> + int ret = -1;
>>> +
>>> + /* Retrieve the actual device from the inactive list */
>>> + if (!(actual = virPCIDeviceListFind(mgr->inactivePCIHostdevs, pci))) {
>>> + VIR_DEBUG("PCI device %s is not marked as inactive",
>>> + virPCIDeviceGetName(pci));
>>
>> This is tricky - the only time we care about the return status (now) is
>> when skipUnmanaged == false, a/k/a the path where we pass the onlist
>> element..
>>
>> In my first pass through the changes I thought - oh no if we return -1,
>> then this would be a path that could get that generic libvirt failed for
>> some reason message; however, closer inspection noted that we guaranteed
>> it was on the inactive list before calling here.
>
> So we should be good, right? :)
>
I think so - a lot of typing out loud which at least gives you my review
context... I finally convinced myself there wasn't an issue even
though it's strange to return something and in the end, no one really
cares...
Perhaps using a 4th parameter 'actual' (could be NULL) would make it
more obvious that we knew on input that the device was already found on
the inactive list. Determining whether we have a copy or an actual
wasn't readily apparent. Consider some future "user" of this function -
how easy is it to decide what to pass?
>>> + goto out;
>>> + }
>>> +
>>> + /* Skip unmanaged devices if asked to do so */
>>> + if (!virPCIDeviceGetManaged(actual) && skipUnmanaged) {
>>
>> <sigh>, unrelated and existing - virPCIDeviceGetManaged probably should
>> return bool instead of unsigned int
>
> Yup, good catch. The same applies to
>
> virPCIDeviceGetUnbindFromStub()
> virPCIDeviceGetRemoveSlot()
> virPCIDeviceGetReprobe()
>
> as well. I'll fix them in a separate commit.
>
I saw that today...
>>> + VIR_DEBUG("Not reattaching unmanaged PCI device %s",
>>> + virPCIDeviceGetName(actual));
>>> + ret = 0;
>>> + goto out;
>>> + }
>>> +
>>> + /* Wait for device cleanup if it is qemu/kvm */
>>> + if (virPCIDeviceGetStubDriver(actual) == VIR_PCI_STUB_DRIVER_KVM) {
>>> + int retries = 100;
>>> + while (virPCIDeviceWaitForCleanup(actual, "kvm_assigned_device")
>>> + && retries) {
>>> + usleep(100*1000);
>>> + retries--;
>>> + }
>>> + }
>>
>> Existing, but if retries == 0, then cleanup never succeeded; however,
>> perhaps one can assume that the subsequent call would fall over and play
>> dead? Not that it really gets checked...
>
> I recall raising the issue at some point in the past, but I don't
> remember the outcome of that discussion... Maybe this can be assessed
> again at a later time?
>
That's fine - it was just another of those typing out loud moments.
>>> +
>>> + VIR_DEBUG("Reattaching PCI device %s", virPCIDeviceGetName(actual));
>>> + if (virPCIDeviceReattach(actual, mgr->activePCIHostdevs,
>>> + mgr->inactivePCIHostdevs) < 0) {
>>> + virErrorPtr err = virGetLastError();
>>> + VIR_ERROR(_("Failed to reattach PCI device %s: %s"),
>>> + virPCIDeviceGetName(actual),
>>> + err ? err->message : _("unknown error"));
>>> + virResetError(err);
>>> + goto out;
>>> + }
>>> +
>>> + ret = 0;
>>> +
>>> + out:
>>> + return ret;
>>> +}
>>> +
>>> int
>>> virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr,
>>> const char *drv_name,
>>> @@ -753,45 +821,6 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr,
>>> return ret;
>>> }
>>>
>>> -/*
>>> - * Pre-condition: inactivePCIHostdevs & activePCIHostdevs
>>> - * are locked
>>> - */
>>> -static void
>>> -virHostdevReattachPCIDevice(virPCIDevicePtr dev, virHostdevManagerPtr mgr)
>>> -{
>>> - /* If the device is not managed and was attached to guest
>>> - * successfully, it must have been inactive.
>>> - */
>>> - if (!virPCIDeviceGetManaged(dev)) {
>>> - VIR_DEBUG("Adding unmanaged PCI device %s to inactive list",
>>> - virPCIDeviceGetName(dev));
>>> - if (virPCIDeviceListAdd(mgr->inactivePCIHostdevs, dev) < 0)
>>> - virPCIDeviceFree(dev);
>>> - return;
>>> - }
>>> -
>>> - /* Wait for device cleanup if it is qemu/kvm */
>>> - if (virPCIDeviceGetStubDriver(dev) == VIR_PCI_STUB_DRIVER_KVM) {
>>> - int retries = 100;
>>> - while (virPCIDeviceWaitForCleanup(dev, "kvm_assigned_device")
>>> - && retries) {
>>> - usleep(100*1000);
>>> - retries--;
>>> - }
>>> - }
>>> -
>>> - VIR_DEBUG("Reattaching PCI device %s", virPCIDeviceGetName(dev));
>>> - if (virPCIDeviceReattach(dev, mgr->activePCIHostdevs,
>>> - mgr->inactivePCIHostdevs) < 0) {
>>> - virErrorPtr err = virGetLastError();
>>> - VIR_ERROR(_("Failed to re-attach PCI device: %s"),
>>> - err ? err->message : _("unknown error"));
>>> - virResetError(err);
>>> - }
>>> - virPCIDeviceFree(dev);
>>> -}
>>> -
>>> /* @oldStateDir:
>>> * For upgrade purpose: see virHostdevNetConfigRestore
>>> */
>>> @@ -803,7 +832,7 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr hostdev_mgr,
>>> int nhostdevs,
>>> const char *oldStateDir)
>>> {
>>> - virPCIDeviceListPtr pcidevs;
>>> + virPCIDeviceListPtr pcidevs = NULL;
>>> size_t i;
>>>
>>> if (!nhostdevs)
>>> @@ -848,11 +877,25 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr hostdev_mgr,
>>> continue;
>>> }
>>> }
>>> + i++;
>>> + }
>>
>> Curious why the decision for a second loop - if activeDev matches, then
>> it almost seems a shame to loop again. Why not (within if (activeDev):
>>
>> actual = virPCIDeviceListSteal(hostdev_mgr->activePCIHostdevs,
>> activeDev);
>> /* !actual should never happen, but better safe than sorry */
>> if (actual &&
>> virPCIDeviceListAdd(hostdev_mgr->inactivePCIHostdevs,
>> actual) < 0)
>> virPCIDeviceFree(actual);
>> /* You could also... */
>> virPCIDeviceListDel(pcidevs, dev);
>> }
>
> Mostly because I consider moving the devices from one list to another
> as a separate step.
>
> We could merge the two steps, and that would bring us down to 4 (or 5
> if you count the one implicit in virHostdevGetActivePCIHostDeviceList())
> loops, but I'm not sure whether that would be a significant gain in
> performance or whether it would just make the code a little more
> convoluted...
>
Your call - we go through the pcidevs list many times so it's not that
big a deal...
>> NOTE: Whether there is one or two loops, the second loop would need to
>> call virPCIDeviceFree(actual) since we'd leak it otherwise.
>
> You mean on error? Because otherwise I don't see the leak: the actual
> device is stolen from the active list and added (itself, not a copy)
> to the inactive list.
>
Yes, the error path - if you fail to add actual to inactive, then it was
dropped on the floor
>> You'll also note I didn't keep the goto cleanup. Previously the code
>> would completely go through the pcidevs Loop 4 regardless of whether
>> virHostdevReattachPCIDevice code had failures. The new code has the
>> subtle difference of jumping to cleanup if a failure is found. That
>> could leave things in an awkward state especially since
>> virHostdevReAttachPCIDevices is a void.
>>
>> Since failure for DeviceListAdd is because 1. device is already there
>> (which I would *hope* isn't the case) or 2. memory allocation failure
>> (in which case not much else successful will probably happen anyway),
>> then perhaps continuing on rather than jumping to cleanup isn't a bad
>> idea... We could be returning some memory that someone may find useful.
>>
>> My concerns about jumping to cleanup are that this API is called in
>> error recovery paths as well as part of the ominous comment "For upgrade
>> purpose:..." (comment before function header). So it seems the "existing
>> logic" is try to clean up as many as possible. By potentially erroring
>> out too soon could lead to more problems.
>>
>> So the question becomes what havoc is introduced if we cannot add to the
>> inactive list but decide to continue as before... It seems we'll end up
>> "failing" in virHostdevOnlyReattachPCIDevice since it's not in the
>> inactiveList, but our Loop 4 logic doesn't care. Of course we could
>> delete 'dev' from 'pcidevs' too before then...
>>
>> Hopefully this makes sense... It's been an 'edit in process'...
>
> See the comment at the end of the message.
>
>>> +
>>> + /* Step 2: move all devices from the active list to the inactive list */
>>> + for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) {
>>> + virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i);
>>> + virPCIDevicePtr actual;
>>>
>>> VIR_DEBUG("Removing PCI device %s from active list",
>>> virPCIDeviceGetName(dev));
>>> - virPCIDeviceListDel(hostdev_mgr->activePCIHostdevs, dev);
>>
>> This was a curious placement for *ListDel... If !activeDev, then calling
>> *ListDel also won't find 'dev' on activelist...
>
> If 'activeDev != NULL', then driver name and domain name are checked,
> which may cause the 'dev' to be removed from 'pcidev' and the loop to
> restart.
>
> If that does not happen 'dev' is removed from the active list, even
> thought it might not have been in that list in the first place. But
> the code is doing the right thing in all situations.
>
Understood, but if !activeDev, then the virPCIDeviceListDel calls
virPCIDeviceListSteal which calls virPCIDeviceListStealIndex using
virPCIDeviceListFindIndex...
You'll note activeList is sourced by calling virPCIDeviceListFind which
calls virPCIDeviceListFindIndex
So I was pointing out how pointless it would be to call
virPCIDeviceListDel if activeDev == NULL (because it too would do nothing).
>>> - i++;
>>> + if (!(actual = virPCIDeviceListSteal(hostdev_mgr->activePCIHostdevs,
>>> + dev)))
>>> + goto cleanup;
>>
>> If the choice is to use two loops (and perhaps keep the cleanups)...
>>
>> 1. If this Steal fails, then something is seriously wrong, but we don't
>> even give it a VIR_DEBUG message.
>>
>> 2. If this Steal fails, then going to cleanup is again a subtle
>> difference with the prior logic that said, well I couldn't do anything
>> with this, but I'm going to keep processing.
>>
>> 3. If we keep processing, then something on 'pcidevs' won't be in
>> 'inactivePCIHostdevs', causing virHostdevOnlyReattachPCIDevice to fail.
>> But that does not matter since we ignore the return value in Loop 4.
>>
>> 4. If we do Steal and if the subsequent Add fails, then we leak
>> 'actual', so prior to the goto cleanup call virPCIDeviceFree(actual);
>> (or instead if the goto cleanup;'s are removed).
>
> Thanks for looking into this in such detail. I will go through the
> existing code again myself and either become confident that the code
> is doing the right thing or change it so that it does :)
>
> On the other hand, there's this patch I'm working on that changes the
> way bookkeeping is performed quite substantially... My idea was to
> propose it as a follow-up to this series, since it basically replaces
> some constructs with some other "equivalent" constructs without altering
> the overall control flow, but maybe at this point it could be worth it
> to merge everything together and hopefully avoid such pitfalls, and make
> the whole thing easier to reason about.
Try to keep it under 100 patches please ;-)
John
>
> Cheers.
>
> --
> Andrea Bolognani
> Software Engineer - Virtualization Team
>
More information about the libvir-list
mailing list