[PATCH] nodedev: transient mdev update on nodeDeviceCreateXML

Jonathon Jongsma jjongsma at redhat.com
Wed Jun 28 17:22:27 UTC 2023


On 6/28/23 3:40 AM, Boris Fiuczynski wrote:
> On 6/28/23 12:03 AM, Jonathon Jongsma wrote:
>> On 6/23/23 5:43 AM, Boris Fiuczynski wrote:
>>> Update the optional mdev attributes on the new created nodedev object as
>>> they otherwise would not get set until the next mdevctl update cycle.
>>>
>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2143158
>>> Signed-off-by: Boris Fiuczynski <fiuczy at linux.ibm.com>
>>> ---
>>>   src/node_device/node_device_driver.c | 18 ++++++++++++++++--
>>>   1 file changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/node_device/node_device_driver.c 
>>> b/src/node_device/node_device_driver.c
>>> index a2d0600560..5134d246f3 100644
>>> --- a/src/node_device/node_device_driver.c
>>> +++ b/src/node_device/node_device_driver.c
>>> @@ -847,6 +847,9 @@ static virNodeDevicePtr
>>>   nodeDeviceCreateXMLMdev(virConnectPtr conn,
>>>                           virNodeDeviceDef *def)
>>>   {
>>> +    virNodeDeviceObj *obj;
>>> +    virNodeDeviceDef *new_def;
>>> +    virNodeDevicePtr device;
>>>       g_autofree char *uuid = NULL;
>>>       if (!def->parent) {
>>> @@ -864,8 +867,19 @@ nodeDeviceCreateXMLMdev(virConnectPtr conn,
>>>           def->caps->data.mdev.uuid = g_steal_pointer(&uuid);
>>>       }
>>> -    return nodeDeviceFindNewMediatedDevice(conn, 
>>> def->caps->data.mdev.uuid,
>>> - def->caps->data.mdev.parent_addr);
>>> +    device = nodeDeviceFindNewMediatedDevice(conn, 
>>> def->caps->data.mdev.uuid,
>>> + def->caps->data.mdev.parent_addr);
>>
>> So, this function waits up to 60s for the given mediated device to 
>> show up in the list of devices known to the driver (which is stored in 
>> driver->devs and gets populated by handlers that respond to udev or 
>> mdevctl events)...
>>
>>> +    /* check on def for attributes and try update */
>>> +    if (def->caps->data.mdev.nattributes > 0) {
>>> +        /* ignore failures as mdevctl updates will recover later */
>>> +        if (!(obj = nodeDeviceObjFindByName(device->name)))
>>> +            return device;
>>
>> If the device was found and the device has mdev attributes, you then 
>> query the exact same list for a device with the name of the device you 
>> just found...
>>
>>> +        new_def = virNodeDeviceObjGetDef(obj);
>>> +        nodeDeviceDefCopyFromMdevctl(new_def, def);
>>
>> If the device was found in the list (which it should be, because the 
>> nodeDeviceFindNewMediatedDevice() function had already found it in the 
>> list of devices), then we simply copy the user-supplied device 
>> definition into the definition we received from the nodedev driver's 
>> device list.
> 
> Please let me know which procedure to use if the node device object 
> which is missing the attributes is created by the udev ADD event which 
> the method nodeDeviceFindNewMediatedDevice is waiting for to happen.
> 
>>
>> This is just papering over the issue. By copying the user-supplied 
>> definition into the system-queried definition, we're just assuming 
>> that the device was created properly with the requested attributes 
>> rather than verifying that it was created correctly.
> 
> I expect virMdevctlCreate to fail if attributes are not properly set on 
> the transient mdev created by mdevctl. If that happens this code is 
> never run anyway.

As far as I know, that might be driver-dependent behavior.

But regardless, you could sort of make the same case for all of the 
other non-attribute information as well. Why don't we just copy the 
entire user-supplied definition into our internal device list instead of 
querying udev/mdevctl if the device creation was successful? I would 
argue that we want our internal device list to accurately represent the 
current state of the system, not what we expect the state of the system 
to be.

> And speaking about verification:
> mdevctl only returns attributes if a callout script is present 
> supporting the mdev type and has a method for event "get" and action 
> "attributes" implemented.
> Therefore no "reliable" verification is possible! The only reliable 
> source of information is the outcome of virMdevctlCreate.

That's true, but if you copy the user-supplied attributes directly into 
the internal device list for a transient device that doesn't implement 
the 'get-attributes' callout, you will lose those attributes when the 
libvirt daemon restarts. That's also problematic.


>> The root issue seems to be that when a transient mediated device is 
>> created, our udev event handler runs and adds the device to the device 
>> list, but the mdevctl handler is never run to query additional 
>> mdev-related information about the device. mdevctl is only re-queried 
>> when the config files for defined devices are changed or when a new 
>> physical mdev parent device is added or removed.
> 
> Actually that was the first place I looked at.
> There is already an mdevctl update triggered in method udevAddOneDeive 
> if the device is an mdev parent device. This happens usually on either a 
> vfio* module load or a device being bound to a vfio* device driver.
> Introducing an update on every mdev device added would create an high 
> impact especially as the udevAddOneDeive is used in multiple contexts,
> 1) add and change udev event
> 2) move usev event
> 3) processing the each entry in a udev process device list, which is 
> usually run when the node device daemon initializes. In this case a 
> device unrelated/unconditional mdevctl update is triggered once already 
> (nodeStateInitializedEnumerate).
> 
> In cases 1) and 2) the context of what triggered the add udev event is 
> unknown and we are only missing the update if a transient mdev device 
> with attributes is being created. Therefore I choose the "shortcut" of 
> added these where the context is known. Please also note that the 
> callout script restriction (mentioned above) also applies here.

Can you quantify what you mean by high-impact here? How many mdevs are 
you talking about on a single host? hundreds? thousands? How regularly 
are they being started and stopped?

Jonathon



More information about the libvir-list mailing list