[libvirt] [PATCH 20/24] hostdev: Save netdev configuration of actual device

John Ferlan jferlan at redhat.com
Tue Mar 15 18:33:05 UTC 2016



On 03/14/2016 02:00 PM, Andrea Bolognani wrote:
> Word of warning, since this is very likely to get extremely
> verbose: I fully expect to slip, here and there, and refer not
> to the *current* situation as of this patch, but to the
> *desired* situation at the end of the series.

I've rebased to top of tree after your recent pushes... Seems like a
good place to reset focus!  I think I need a virtual whiteboard.

> 
> Such *desired* situation is (not explicitly enough?) laid out
> in patch 19, but I'll expand on it here for clarity:
> 
>   * the active list and inactive list contain the actual devices

I understand/agree with the goal for actual devices in each list. One
mental block is usage of virPCIDeviceListAdd for activeList and
virPCIDeviceListAddCopy for inactiveList. The second mental block is
whether these are for only managed devices (more on that later).

> 
>   * each device is contained either in the active list, or in
>     the inactive list. It can't be in both

Sure that's where you're headed, but with the current code this isn't
necessarily true as a device could be on the inactiveList and activeList
during patch 5, but that's the goal of patch 22.

> 
>   * a device that's neither in the active list nor in the
>     inactive list is assigned to the host

OK.. true... hopefully!

> 
>   * for each PCI device, only the virPCIDevice instance that is
>     part of one of the bookkeeping lists matter

right.

> 
>   * when an operation is to be performed on a PCI device, the
>     relevant virPCIDevice instance must be looked up in the
>     appropriate bookkeeping list

sure, one would hope!  Of course they need to be populated that way...

> 
>   * virPCIDevice instances in 'pcidevs' are only used for
>     looking up actual devices in the bookkeeping lists, contain
>     no valuable data and are disposable at any time

This is where things start getting a bit fuzzy. I agree with the
concept; however, the current implementation has a gotcha. Consider how
VIR_APPEND_ELEMENT works. Initially 'pcidevs' is created as a new List,
then for each 'hostdev' device, a new virPCIDevicePtr is created and
added to the pcidevs list using virPCIDeviceListAdd. It's taken a bit to
have VIR_APPEND_ELEMENT processing sink in for me ;-), but as I read the
Prepare code it makes use of this processing model instead of creating a
separate copy for the activeList.

> 
>   * exceptions to the above points are to be documented
> 
> Hopefully this makes it easier to see the meaning behind some
> of the changes :)
> 
> 
> On Fri, 2016-03-11 at 14:40 -0500, John Ferlan wrote:
>> On 03/10/2016 06:02 PM, John Ferlan wrote:
>>> On 03/07/2016 12:24 PM, Andrea Bolognani wrote:
>> So I went through this one again (new day, fresh start, partially clear
>> mind).
> 
> That often helps :)
> 

So here again, I'm trying to start fresh - go through each patch 1 by 1
so we can "agree" and move to the next one. Save a few electrons, too.
Let's try to get to a point where patch22 can be further dissected.

>> First off - perhaps something for the previous documentation patch -
>> PrepareDevices is called during the qemuProcessLaunch processing (eg
>> domain startup) for all known host devices. It's also called during
>> qemuDomainAttachHostPCIDevice or libxlDomainAttachHostPCIDevice
>> (hotplug) for the *one* new device to be attached.
>>  
>> The purpose of the function is to take the passed hostdev list, move the
>> devices to a managed active host device list for the guest. Devices that
>> do not have the managed attribute are not processed.
> 
> That's not true, though: unmanaged devices *are* processed, even
> if less operations are performed on them. The part about
> detaching them from the host is skipped, because it's supposed
> to have been performed by the user alread, but that's it;
> everything else (resetting them, moving them to the appropriate
> bookkeeping list, storing information about what driver and
> domain is using them) is just the same as managed devices.
> 

I think this is where the confusion for me lies. I seem to have also
mistyped a bit - perhaps it's the blurred lines between current and
future perfect (hah!) state.

Initially an unmanaged device is not placed on the inactiveList (e.g.
step2 in the PreparePCIDevices code), but it is placed on the activeList
(step5).

Later on, we can place an unmanaged device on the inactiveList either in
virHostdevReattachPCIDevice using the "stolen" 'pcidevs' entry during
virHostdevReAttachPCIDevices or virHostdevPCINodeDeviceDetach where
virPCIDeviceDetach uses a copy of the device. Although patch22 removes
the addition into the inactiveList if !managed (new step5 and deletion
of lines in ReattachPCIDevice).  Still leaves the NodeDeviceDetach path
to understand from whence that request came.

>>>> @@ -650,7 +650,7 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr mgr,
>>>>   
>> So after step 6, all the pcidevs are on the activePCIHostdevs list.
>> Theoretically at least ;-).  None are on the inactivePCIHostdevs list.
>>  
>> Why in step 7 do we run through pcidevs, to find the device on the
>> activePCIHostdev list in order to set the UsedBy of the actual device on
>> the active list. Why not run the 'activePCIHostdev' list here too?
>>  
>> Since we really don't need pcidevs any more - can we delete it now?
>> E.G. Step 9 it... Steps 7 & 8 don't really need pcidevs, right?  There
>> is no error path for either.  I'm not asking for a patch - just making
>> sure I'm reading things right!
> 
> I guess we *could* get rid of it early (even though I haven't
> really checked to make sure it would work), but doing so wouldn't
> give us any advantage IMHO.
> 
> As it is now (well, not as of patch 20, but as of patch 22 which
> is basically what this whole series is building towards) 'pcidevs'
> is used only for iteration, and iteration is performed only over
> 'pcidevs'. This is exactly the kind of straightforward knowledge
> we need to hold on to if we are ever to get this code in a
> manageable state.

OK - it was a suggestion; however, as you'll see below I there's a
caveat with this pcidevs list and the activeList...

> 
> Examples of stuff that is easier to reason about if we keep
> 'pcidevs' around and only ever iterate over it: to obtain an
> actual device, we *always* need to look it up in the appropriate
> list, no ambiguity; in the cleanup path, we can *always* just
> destroy all of 'pcidevs', because no valuable data is stored in
> there.
> 
> So, even if it might mean performing a couple more lookups here
> and there, I think the separation between 'pcidevs' (a list of
> instances we can use to look up the actual data) and the actual
> bookkeeping lists (where the data we care about is stored) is
> crucial and should be enforced.
> 

OK, understood - I agree. The extra lookups though can be expensive, but
safety first.

>>>>       /* Step 8: Now set the original states for hostdev def */
>>>>       for (i = 0; i < nhostdevs; i++) {
>>>> -        virPCIDevicePtr pci;
>>>> +        virPCIDevicePtr actual;
>>>>           virDomainHostdevDefPtr hostdev = hostdevs[i];
>>>>           virDomainHostdevSubsysPCIPtr pcisrc = &hostdev->source.subsys.u.pci;
>>>>   
>>>> @@ -659,24 +659,27 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr mgr,
>>>>           if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI)
>>>>               continue;
>>>>   
>>>> -        pci = virPCIDeviceListFindByIDs(pcidevs,
>>>> -                                        pcisrc->addr.domain,
>>>> -                                        pcisrc->addr.bus,
>>>> -                                        pcisrc->addr.slot,
>>>> -                                        pcisrc->addr.function);
>>>> +        /* We need to look up the actual device because it's the one
>>>> +         * that contains the information we care about (unbind_from_stub,
>>>> +         * remove_slot, reprobe) */
>>>
>>> When a device goes from the inactivePCIHostdevs list to the
>>> activePCIHostdevs list "at some point in time" in the future - does it
>>> do a similar save?
>>  
>> I'll answer my own question - because this is the Prepare function and
>> we don't have to worry about it since everything is on the active list.
> 
> Exactly, all devices that go from the inactive list to the active
> list do so via this function... We don't need to handle this
> anywhere else. And by the time this specific lookup is performed,
> all devices have already been moved to the active list (step 5).
> 

Right, but if I read patch22 correctly (jumping slightly ahead), only
managed devices would be placed onto the activeList. Currently all
pcidevs are moved to activeList, but your change there moves from
inactiveList to activeList, where AFAICT the inactiveList only has
managed devices.

See my conundrum now?  The good news is the fridge is stocked with cold
ones to help me ;-)

>>> That is this change only grabs devices from the active list for the
>>> save; whereas, prior to this change it would pull from all pcidevs
>>>  
>> So, I believe this part is correct albeit a bit confusing...
> 
> There are *no* devices that are in 'pcidevs' but *not* in the
> active list by this point - again, all objects in 'pcidevs' are
> used (or supposed to be used) for lookup purposes only.
> 
> I wonder if a better name would help making this easier to keep
> in mind even for someone who's looking at the code for the first
> time, or even myself two months from now :)
> 
> What about 'tmpDevices'? 'lookupDevices'?
> 

pcidevs is fine to me... it's just a list of PCI hostdevs.

>>>> +        actual = virPCIDeviceListFindByIDs(mgr->activePCIHostdevs,
>>>> +                                           pcisrc->addr.domain,
>>>> +                                           pcisrc->addr.bus,
>>>> +                                           pcisrc->addr.slot,
>>>> +                                           pcisrc->addr.function);
>>>>   
>>>>           /* Appropriate values for the unbind_from_stub, remove_slot
>>>>            * and reprobe properties of the device were set earlier
>>>>            * by virPCIDeviceDetach() */
>>>> -        if (pci) {
>>>> +        if (actual) {
>>>>               VIR_DEBUG("Saving network configuration of PCI device %s",
>>>> -                      virPCIDeviceGetName(pci));
>>>> +                      virPCIDeviceGetName(actual));
>>>>               hostdev->origstates.states.pci.unbind_from_stub =
>>>> -                virPCIDeviceGetUnbindFromStub(pci);
>>>> +                virPCIDeviceGetUnbindFromStub(actual);
>>>>               hostdev->origstates.states.pci.remove_slot =
>>>> -                virPCIDeviceGetRemoveSlot(pci);
>>>> +                virPCIDeviceGetRemoveSlot(actual);
>>>>               hostdev->origstates.states.pci.reprobe =
>>>> -                virPCIDeviceGetReprobe(pci);
>>>> +                virPCIDeviceGetReprobe(actual);
>>>>           }
>>>>       }
>>>>   
>>>> @@ -844,17 +847,17 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr mgr,
>> More documentation thoughts... Because what exists now for the function
>> doesn't really make sense - it's only describing one variable.
>>  
>> Anyway, the purpose of this function is to take the device(s) passed on
>> the hostdev list and return them to the host (?if they are managed?).
>> This function is called either from hotplug device remove, hotplug
>> device attach failure path, or from qemuProcessStop during process
>> shutdown.  When called from the host device remove or attach failure
>> paths, there is only 1 hostdev in the list.
> 
> More documentation is definitely always a good thing, and this pretty
> much sums up what the function does :)
> 
>> Prior to patch 15, the pcidev's was a copy of the "active list", now
>> it's a list of all pcidevs found. Hopefully on either one of the two
>> lists. That is I hope a found hostdev couldn't be on neither list.
> 
> See above: it's not a list of all PCI devices, just a list of PCI
> devices we're going to detach from the guest (and possibly reattach
> to the host) at this particular time.
> 

I've flushed patch 15 from local cache ;-)

>> Prior to patch 15, step1 would either remove the device from pcidevs or
>> from the activeList depending on the matching driver name. When done,
>> the pcidevs would contain the list of devices just removed from the
>> activeList. Note that virPCIDeviceListDel will repeat the active list
>> virPCIDeviceListFind (essentially)...
>>  
>> After patch 15, the pcidevs gets reduced for devices not on the
>> activeList or without a matching drv/dom name. Whether or not that is on
>> the inactiveList isn't checked, but assumed.
>>  
>> Step 2 is I believe the antecedent of step 4 during the Prepare path.
>> Step 4 would be called after Reset and before placing devices on active
>> list replacing the net config for any PCINet device...
>>  
>> Prior to patch 15, step2 would walk the hostdev's list looking for
>> devices on the pcidevs list (eg. the ones removed from the activeList).
>> It would then take any PCINet device and restore it's config.
>>  
>> After patch 15 and with these changes, walk the inactiveList and perform
>> the same call. This is not the devices we just removed from the
>> activeList, so I don't think using the inactiveList in this case is right.
> 
> Yeah, this is totally bogus.
> 
> Looking up the device in the inactive list is the right thing
> to do *only after* patch 22 has been applied and step 2 is "Move
> devices from the active list to the inactive list".
> 
> With that in place we are guaranteed that the actual device will
> be in the inactive list, and the code above will be correct.
> 
> I believe I actually wrote this code after patch 22 and then
> squashed it in patch 20 on a misguided attempt to present changes
> in a more linear fashion :(
> 

Yeah that makes perfect sense!

>> Before patch 15, step 3 would take any of the devices we removed from
>> the activeList and perform a reset on them.
>>  
>> After patch 15, step 3 would do a similar function assuming that no
>> device could not be on neither list.
> 
> NO device could NOT be on NEITHER list... My head is spinning :)
> 
> Actually, when we get here the devices are *guaranteed* to be in
> neither list, because we removed them from the active list in
> step 1. But that was the case even before patch 15, so it should
> not be a problem.
> 
>> Hopefully this all makes sense - I keep reading it and going back and
>> fort between old and new code.  The comment left in the code after step
>> 1 is most helpful...
> 
> I know how you feel! Thanks for sticking with it, hopefully the
> comments I've provided so far (especially the ones at the top of
> this message) are helpful in understanding the thought process
> behind the changes.
> 
> I'll get back to you with more comments tomorrow - bet you can't
> wait for them! ;)

No good deed goes unpunished ;-):


As I read the current PreparePCIDevices code:

Step1: Walk pcidevs, make some VFIO/ACS check... Make some PCI Node
Device check... assume everything works as documented ;-)

Step2: Walk pcidevs, if managed, call virPCIDeviceDetach to place *copy*
of pci onto the inactive list, else VIR_DEBUG message

Step3: Walk pcidevs, call virPCIDeviceReset on all devices (both those
managed in inactiveList and those that are not managed)

Step4: Walk hostdevs, SRIOV specific to replace network config (close my
eyes, pray this works)

Step5: Walk pcidevs, *adding* each device to activeList,. Since not
copy, then device is on activeList *and* pcidevs at the same time,
right? Step9 will "steal" from pcidevs to ensure virObjectUnref of
pcidevs doesn't have virPCIDeviceListDispose free what's in activeList.
(This is a key point...)

Step6: Walk pcidevs, remove pci from inactiveList (free all memory of
the copy initially placed on inactiveList).

Step7: Walk pcidevs, if device found on activeList, call SetUsedBy

Step8: Walk hostdevs, if device on pcidevs, then set unbind_from_stub,
remove_slot, and reprobe (since everything on pcidevs is also in the
activeList - magically the activeList data gets updated).

Step9: Walk pcidevs, Stealing each element off of pcidevs (but doesn't
virPCIDeviceFree the element so that virObjectUnref(pcidevs) won't find
anything and do the free - because that element is on the activeList!).

My notes (or what keeps bouncing around while working through this):

1. During the Prepare step, activeList add is not a copy, it uses
VIR_APPEND_ELEMENT. IIUC, this means if 'dev' was dereferenced in
virPCIDeviceListAdd after the APPEND, then there'd be a problem since
dev would be set to NULL after the macro; however, since 'dev' is passed
by value, that means the caller (e.g. the Prepare code) still can
reference the element - it's just that the virPCIDevicePtr is listed in
*two places*. You'll note other places in the code will Steal from one
list and place on another, but this code doesn't do that until step9
where it steals all the pcidevs entries into apparently vaporware.
"Theoretically" we really shouldn't reference activeList elements in
pcidevs, but if we do reference them, then "magically" (so to speak) the
activeList element would be updated. Personally it may be easier to use
a *copy* for the activeList too and then let virObjectUnref handle the
pcidevs entry deletion.

2. During Prepare, inactiveList add is done by copy (although one could
argue that virPCIDeviceListAddCopy could be replaced by
VIR_APPEND_ELEMENT_COPY I suppose). So now there are *two* copies of the
same device - this is good and again, I think the model the activeList
should use.

3. Current step9 just "steals" (or VIR_DELETE_ELEMENT) the element from
pcidevs. This is "fine" since it's already/also on the activeList.
You'll note other users of the macro would unref/free whatever memory
was contained in the structure prior to removing from the list. Good
thing we don't do that here ;-). If we don't steal the entry, then
virObjectUnref(pcidevs) will call virPCIDeviceListDispose for us - your
patch22 change to remove this may not be right...

So given the current state of things, here's my view on the changes
assuming my VIR_APPEND_ELEMENT viewpoint:

Change #1: (what';s in the commit message)
Alters step8 to use activeList instead of pcidevs. This part looks
correct (perhaps worthy of it's own ACKable patch).

Change #2: (not in the commit message)
You jump into virHostdevReAttachPCIDevices and I then have to focus on a
different algorithm and step process. Here's where I definitely believe
you had patch22 in mind.

But focusing on the code as it exists now... This code walks the
hostdev's and will find anything on the pcidevs list which was on the
former activeList and ensures the NetConfigRestore is run (e.g. the
corollary of step4 in Prepare).

Your change to ensure the device is on the inactiveList doesn't seem
right at this point in time of the code. When you insert step2 in
patch22, it would perhaps be more correct...

Without peeking ahead too much, I think the insert unmanaged device onto
inactiveList done in virHostdevReattachPCIDevice is incorrect at this
point. Even if not incorrect, it's confusing.

Thinking in terms of current code - step4 will take the everything that
was formerly on the active list and:

1. Steal the pcidevs entry

2. Call virHostdevReattachPCIDevice with the stolen lookaside pcidevs
entry...  While it appears to be an actual, it isn't. Of course, I think
you address that in patch 22, but trying not to get too far ahead.

   2a. If the device is not managed, then add to the inactiveList and
return (call virPCIDeviceFree if unsuccessfully add). What really
confuses me here is why we place onto the inactiveList if not managed;
however, you address that in patch22. Still it could be addressed
earlier if that is a separate bug...

   2b. If the device is managed, call virPCIDeviceReattach

      2b1. Fail if on activeList (miserable)
      2b2. Unbind stub
      2b3. Call virPCIDeviceListDel
        -> Steal from inactiveList (search inactiveList for matching
device by domain address, bus slot of the passed lookside pci device)
        -> call virPCIDeviceFree on the inactiveList entry

   2c. Call virPCIDeviceFree on the lookaside list entry.

3. virObjectUnref(pcidevs)

  -> If I read the code right - each entry was stolen and properly
removed so the virPCIDeviceListDispose has nothing to do.

So it seems at this point, I would surmise virPCIDeviceReattach expects
the pcidevs lookaside entry. The problem being insertion onto the
inactiveList of an unmanaged device (something we avoided in Prepare).
And yes, I'm still curious about virHostdevPCINodeDeviceDetach and how
that plays into things.





John




More information about the libvir-list mailing list