[libvirt] libvirt support for mediated devices

Erik Skultety eskultet at redhat.com
Mon Jan 9 09:17:10 UTC 2017


> Going back to the beginning, with slightly more detail:
> 
> 1) "Unmanaged" mediated device assignment - assigning an existing device to
> a virtual machine
> 
> This will assume that the desired child device has already been created, and
> can be found in /sys/bus/mdev/devices/$UUID. Here's a first attempt at what
> the XML would look like:
> 
>     <hostdev mode='subsystem' type='pci' managed='no'>
>         <source>  <!-- (maybe add "type='mdev'" ???) -->
>             <mdev uuid='$uuid'/>
>         </source>
>         <address type='pci' blah blah blah/> <!-- PCI address in the guest -->
>     </hostdev>
> 
> In the past, the "type" attribute of hostdev described the type on both the
> host and the guest. With mediated devices, the device isn't visible on the
> host as a PCI device, but just as a software device. So the type attribute
> in <hostdev> now gives the type of the device on the guest, and the device
> type on the host is determined from the contents of <source>.
> 
> Erik had a different suggestion for this (which I think he's already working
> on patches for) - that the type attribute in <hostdev> should be the type of
> the device in the *host*, and the type in the guest would be that given in
> the <address>. Something like this I think:
> 
>     <hostdev mode='subsystem' type='mdev' managed='no'>
>         <source>
>             <mdev uuid='$uuid'/>
>         </source>
>         <address type='pci' blah blah blah/>
>     </hostdev>
> 
> (Is this correct, Erik?)
> 

Yes, that's the way I decided to go prior to you sending the mail. My reasoning
when looking at the code was that it could lead to cleaner code: there is quite
complex logic in the PCI-related methods, the vast majority of which is
unrelated to MDEV (basically, the most interesting common parts are checking
whether the VFIO driver is available on the host and PCI address assignment
for the guest). Reusing the existing type would lead to constant special-casing
of MDEV and calling the appropriate mdev methods. So, to sum it up, code
cleanliness had the major impact on my decision to go with a new hostdev type
'mdev' rather than reuse the existing one. I must admit that I hadn't realized
the part about the 'managed' attribute until I read your paragraph below.

As for the guest device type being determined by the address field, that
was my initial idea. If the address field was missing (very likely), we would
have to guess the address type from the OS architecture, which I think we would
need to do anyway for the managed devices. The other idea I've got is similar
to specifying the driver element for various PCI backends for the assignment.
We could either reuse it and add some attributes (I think this wouldn't be the
preferred option) or introduce a new element that would specify the device API
to be used with the assignment (the value of which would correspond to what you
can find in /sys/class/mdev_bus/<vendor>/mdev_supported_types/<type>/device_api).
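
To illustrate the fallback described above, here is a minimal Python sketch
(not libvirt code). The function name guess_address_type and both lookup
tables are hypothetical; the per-architecture defaults in particular are an
assumption made only for this example.

```python
# Hypothetical helper, not part of libvirt: pick the guest address type
# for an mdev hostdev when the <address> element is missing.

# Values of device_api mentioned in this thread, mapped to address types.
DEVICE_API_TO_ADDRESS = {"vfio-pci": "pci", "vfio-ccw": "ccw"}

# Assumed per-architecture defaults, used only when device_api is unknown.
ARCH_DEFAULT_ADDRESS = {"x86_64": "pci", "i686": "pci", "s390x": "ccw"}


def guess_address_type(device_api=None, arch="x86_64"):
    """Return the guest address type, preferring device_api over the arch."""
    if device_api is not None:
        key = device_api.strip()  # sysfs values end with a newline
        if key not in DEVICE_API_TO_ADDRESS:
            raise ValueError(f"unsupported device_api: {key!r}")
        return DEVICE_API_TO_ADDRESS[key]
    if arch not in ARCH_DEFAULT_ADDRESS:
        raise ValueError(f"no default address type for arch: {arch!r}")
    return ARCH_DEFAULT_ADDRESS[arch]
```

With an explicit device API, the architecture is ignored; without one, the
guess depends only on the architecture, which is why a later check against
the real device_api would still be needed.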

Erik

> (I arrived at my suggestion by the thinking that, in other places where
> there are similar attributes for the host and guest side, e.g. the IP
> addresses and routes that can be added on both the host and guest side of an
> <interface>, everything related to the host side is in the <source>
> subelement, while things related to the guest are directly under the
> toplevel of the device element. On the other hand, the "managed" attribute
> isn't something related to the guest, but to the host, and his idea has less
> redundancy, so maybe he's onto something...)
> 
> (NB: a mediated device could be exposed to the guest as a PCI device, a CCW
> device, or anything else supported by vfio. The type of device that the
> guest will see can be determined from the contents of
> mdev_supported_types/<type-id>/device_api under the parent device's
> directory in sysfs (it will be, e.g., "vfio-pci" or "vfio-ccw"). But libvirt
> assigns guest-side addresses at the time a domain is defined, and it's
> possible that the mdev child device won't be created yet at define time (and
> therefore we won't know which parent device it's associated with, and so we
> won't be able to look at device_api). In such situations, it will be up to
> management to know something about the device it will be creating and assume
> a type. Fortunately this is a reasonably safe thing to do - on x86 platforms
> we can be fairly certain that the device will be a PCI device. (And, because
> this also makes a difference for some machinetypes, that it will be a PCI
> Express device). We will want to check device_api at runtime though, to
> validate that the guest-side device really is a PCI device.)
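
The runtime check mentioned in the quoted paragraph could look roughly like
the Python sketch below. It assumes that each child directory under
/sys/bus/mdev/devices/ carries an "mdev_type" link to its type directory
(so device_api is readable through it) and that device_api values have the
form "vfio-<address type>"; the function names are hypothetical.

```python
from pathlib import Path


def read_device_api(uuid, devices_dir="/sys/bus/mdev/devices"):
    """Read device_api for an existing mdev via its (assumed) mdev_type link."""
    path = Path(devices_dir) / uuid / "mdev_type" / "device_api"
    return path.read_text().strip()


def check_guest_address(uuid, address_type, devices_dir="/sys/bus/mdev/devices"):
    """Raise ValueError if the address type assumed at define time is wrong."""
    api = read_device_api(uuid, devices_dir)
    expected = "vfio-" + address_type  # e.g. "pci" -> "vfio-pci"
    if api != expected:
        raise ValueError(
            f"mdev {uuid} reports {api!r}, but the domain assumes {expected!r}"
        )
    return api
```

The devices_dir parameter exists only so the check can be exercised against
a fake directory tree instead of the real sysfs.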
> 
> ==
> 
> 2) Reporting parent and child mediated devices and their capabilities in the
> node device API.
> 
> There are 3 stages to this:
> 
> a) add mediated child devices to the list of devices provided by "virsh
> nodedev-list". These will be called "mdev_$UUID", and will show up as
> descendents of their respective parent devices in "virsh nodedev-list
> --tree". The list of all these devices can easily be retrieved by
> enumerating the links in /sys/bus/mdev/devices/$UUID.
> 
> b) report the capabilities of parent devices in their dumpxml output. This
> will include the supported child device types and a list of current children.
> 
> I don't have any experience with nodedev reporting for SCSI devices, but
> recently noticed that nodedev-list can report lists of devices with certain
> capabilities, e.g. "virsh nodedev-list --cap=scsi_host". Based on this, I
> guess it would be useful for the parent devices to show something like this
> (using the sample mtty driver as an example):
> 
>      <device>
>         <name>pci_0000_02_00_0</name>
>         <parent>pci_0000_00_04_0</parent>
>         <driver>
>           <name>mtty</name>
>         </driver>
>        <capability type='mdev_parent'>
>           [list of supported types, each with number allowed]
>           [list of current child devices (just giving uuid or device name ("mdev_$uuid"?))]
>           [other info about parent/children?]
>        </capability>
>        ...
> 
> Likewise, a nodedev-dumpxml of a child device should contain a pointer to
> the parent device.
> 
> c) respond to dumpxml requests for mediated child devices. This should
> include at least the uuid/type of the child device, and a link back to the
> parent device (and I suppose somehow include <capability type='mdev_child'>
> so that it can be filtered with virsh nodedev-list?)
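
Stage (a) above can be sketched in a few lines of Python. This is only an
illustration, not the nodedev driver: it assumes every entry under
/sys/bus/mdev/devices/ is a symlink to the child's directory, which sits
directly below its parent device, and it uses the "mdev_$UUID" naming
proposed in the quoted text. The function name is hypothetical.

```python
import os


def list_mdev_nodedevs(devices_dir="/sys/bus/mdev/devices"):
    """Map "mdev_$UUID" names to the sysfs path of each child's parent."""
    result = {}
    for uuid in sorted(os.listdir(devices_dir)):
        # Resolve the link to the child's real directory; the parent
        # device is assumed to be the directory that contains it.
        child = os.path.realpath(os.path.join(devices_dir, uuid))
        result["mdev_" + uuid] = os.path.dirname(child)
    return result
```

The returned parent path is what a child's dumpxml output would need in
order to point back at its parent device.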
> 
> ==
> 
> (3), (4), and (5) need more thought that I haven't gotten to yet. TBD (if
> anyone else has thoughts on those, please share!)
> 
> 
