[libvirt] Introduce vGPU mdev framework to libvirt

Martin Polednik mpolednik at redhat.com
Tue Feb 14 15:50:14 UTC 2017


On 07/02/17 12:29 -0700, Alex Williamson wrote:
>On Tue, 7 Feb 2017 17:26:51 +0100
>Erik Skultety <eskultet at redhat.com> wrote:
>
>> On Mon, Feb 06, 2017 at 09:33:14AM -0700, Alex Williamson wrote:
>> > On Mon,  6 Feb 2017 13:19:42 +0100
>> > Erik Skultety <eskultet at redhat.com> wrote:
>> >
>> > > Finally. It's here. This is the initial suggestion on how libvirt might
>> > > interract with the mdev framework, currently only focussing on the non-managed
>> > > devices, i.e. those pre-created by the user, since that will be revisited once
>> > > we all settled on how the XML should look like, given we might not want to use
>> > > the sysfs path directly as an attribute in the domain XML. My proposal on the
>> > > XML is the following:
>> > >
>> > > <hostdev mode='subsystem' type='mdev'>
>> > >     <source>
>> > >         <!-- this is the host's physical device address -->
>> > >         <address domain='0x0000' bus='0x00' slot='0x00' function='0x00'>
>> > >         <uuid>vGPU_UUID<uuid>
>> > >     <source>
>> > >     <!-- target PCI address can be omitted to assign it automatically -->
>> > > </hostdev>
>> > >
>> > > So the mediated device is identified by the physical parent device visible on
>> > > the host and a UUID which allows us to construct the sysfs path by ourselves,
>> > > which we then put on the QEMU's command line.
>> >
>> > Based on your test code, I think you're creating something like this:
>> >
>> > -device vfio-pci,sysfsdev=/sys/class/mdev_bus/0000:00:03.0/53764d0e-85a0-42b4-af5c-2046b460b1dc
>> >
>> > That would explain the need for the parent device address, but that's
>> > an entirely self inflicted requirement.  For a managed="no" scenarios,
>> > we shouldn't need the parent, we can get to the mdev device
>> > via /sys/bus/mdev/devices/53764d0e-85a0-42b4-af5c-2046b460b1dc.  So it
>>
>> True, for managed="no" would this path be a nice optimization.
>>
>> > seems that the UUID should be the only required source element for
>> > managed="no".
>> >
>> > For managed="yes", it seems like the parent device is still an optional
>>
>> The reason I went with the parent address element (and purposely neglecting the
>> sample mtty driver) was that I assumed any modern mdev capable HW would be
>> accessible through the PCI bus on the host. Also I wanted to explicitly hint
>> libvirt as much as possible which parent device a vGPU device instance should
>> be created on in case there are more than one of them, rather then scanning
>> sysfs for a suitable parent which actually supports the given vGPU type.
>>
>> > field.  The most important thing that libvirt needs to know when
>> > creating a mdev device for a VM is the mdev type name.  The parent
>> > device should be an optional field to help higher level management
>> > tools deal with placement of the device for locality or load balancing.
>> > Also, we can't assume that the parent device is a PCI device, the
>> > sample mtty driver already breaks this assumption.
>>
>> Since we need to assume non-PCI devices and we still need to enable management
>> to hint libvirt about the parent to utilize load balancing and stuff, I've come
>> up with the following adjustments/ideas on how to reflect that in the XML:
>> - still use the address element but use it with the 'type' attribute [1] (still
>>   breaks the sample mtty driver though) while making the element truly optional
>>   if I'm going to be outvoted in favor of scanning the directory for a suitable
>>   parent device on our own, rather than requiring the user to provide that
>>
>> - providing either an attribute or a standalone element for the parent device
>>   name, like a string version of the PCI address or whatever form the parent
>>   device comes in (doesn't break the mtty driver but I don't quite like this)
>>
>> - providing a path element/attribute to sysfs pointing to the parent device
>>   which I'm afraid is what Daniel is not in favor of libvirt doing
>>
>> So, this is what I've so far come up with in terms of hinting libvirt about the
>> parent device, do you have any input on this, maybe some more ideas on how we
>> should identify the parent device?
>
>IMO, if we cannot account for the mtty sample driver, we're doing it
>wrong.  I suppose we can leave it unspecified how one selects a parent
>device for the mtty driver, but it should be possible to expand the
>syntax to include it.  So I think that means that when the parent
>address is provided, the parent address type needs to be specified as
>PCI.  So...
>
> <hostdev mode='subsystem' type='mdev'>
>
>This needs to encompass the device API or else the optional VM address
>cannot be resolved.  Perhaps model='vfio-pci' here?  Seems similar to
>how we specify the device type for PCI controllers where we have
>multiple options:
>
> <hostdev mode='subsystem' type='mdev' model='vfio-pci'>
>
>   <source>
>
>For managed='no', I don't see that anything other than the mdev UUID is
>useful.
>
>     <uuid>MDEV_UUID</uuid>
>
>If libvirt gets into the business of creating mdev devices and we call
>that managed='yes', then the mdev type to create is required.  I don't
>know whether there's anything similar we can steal syntax from:
>
>     <type>"nvidia-11"</type>
>
>That's pretty horrible, needs some xml guru love.
>
>We need to provide for specifying a parent, but we can't assume the

>From higher level perspective, I believe it would be "good
enough" for most of the cases to only specify the type. Libvirt will
anyway have to be able to enumerate the devices for listAllDevices
afaik.

My wish would be specifying
<hostdev mode='subsystem' type='mdev'>
    <type>nvidia-11</type>
</hostdev>
unless the user has specific requests or some other decision (mmio
numa placement) takes place.

We would additionally need (allocated instances/max instances of that
type) in listAllDevices to account for the specific assignment
possibility.

I'm not sure what the decision was wrt type naming, can 2 different
cards have similarly named type with different meaning?

Sorry if I've missed some obvious reason why this can't work.

mpolednik

>type and address format of the parent.  If we say the parent is a
>string, then we don't care, libvirt simply matches
>the /sys/clas/mdev_bus/$STRING/ device.  If we want libvirt to
>interpret the parent address (I can't figure what value this adds, but
>aiui raw strings are frowned upon), then the address type would need to
>be specified. So instead of:
>
>     <address domain='0x0000' bus='0x00' slot='0x00' function='0x00'>
>
>Perhaps:
>
>     <address type='pci' domain='0x0000' bus='0x00'...>
>
>If we want to support mtty here, we'd need a different type, but at
>least we have the possibility to define that (though if parent is
>simply a string then even mtty works).
>
>   <source>
>
>The type of the optional <address> here would be based on the "model"
>of the hostdev.
>
> </hostdev>
>
>Clearly I'm less than a novice at defining xml tags, this is just what
>I think needs to be there.
>
>> > Also, grep'ing through the patches, I don't see that the "device_api"
>>
>> Yep, this was also on purpose since as you write below, right now the only
>> functioning mdev devices we have to work with are vfio-pci capable only, so
>> with this RFC I wanted to gather some feedback on whether I'm moving the right
>> direction in the first place. So yeah, I thought this could be added at any point
>> later.
>
>Hmm, I think that might be painting us into a corner for the future.
>
>> [1] http://libvirt.org/formatdomain.html#elementsAddress
>
>I see here that we already do the <address type='pci'...> thing, so
>using that as the parent source element fits.
>
>The VM <address> though is optional, so we can't rely on queues from
>that to tell us what mdev type to expect, which implies we do need
>something like the model='vfio-pci' to determine the VM address format.
>Thanks,
>
>Alex
>
>--
>libvir-list mailing list
>libvir-list at redhat.com
>https://www.redhat.com/mailman/listinfo/libvir-list




More information about the libvir-list mailing list