[libvirt] [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

Fri Sep 2 05:21:45 UTC 2016

On 9/2/2016 10:18 AM, Michal Privoznik wrote:
> On 01.09.2016 18:59, Alex Williamson wrote:
>> On Thu, 1 Sep 2016 18:47:06 +0200
>> Michal Privoznik <mprivozn at redhat.com> wrote:
>>
>>> On 31.08.2016 08:12, Tian, Kevin wrote:
>>>>> From: Alex Williamson [mailto:alex.williamson at redhat.com]
>>>>> Sent: Wednesday, August 31, 2016 12:17 AM
>>>>>
>>>>> Hi folks,
>>>>>
>>>>> At KVM Forum we had a BoF session primarily around the mediated device
>>>>> sysfs interface.  I'd like to share what I think we agreed on and the
>>>>> "problem areas" that still need some work so we can get the thoughts
>>>>> and ideas from those who weren't able to attend.
>>>>>
>>>>> DanPB expressed some concern about the mdev_supported_types sysfs
>>>>> interface, which exposes a flat csv file with fields like "type",
>>>>> "number of instance", "vendor string", and then a bunch of type
>>>>> specific fields like "framebuffer size", "resolution", "frame rate
>>>>> limit", etc.  This is not entirely machine parsing friendly and sort of
>>>>> abuses the sysfs concept of one value per file.  Example output taken
>>>>> from Neo's libvirt RFC:
>>>>>
>>>>> cat /sys/bus/pci/devices/0000:86:00.0/mdev_supported_types
>>>>> # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, framebuffer, max_resolution
>>>>> 11      ,"GRID M60-0B",      16,       2,      45,     512M,    2560x1600
>>>>> 12      ,"GRID M60-0Q",      16,       2,      60,     512M,    2560x1600
>>>>> 13      ,"GRID M60-1B",       8,       2,      45,    1024M,    2560x1600
>>>>> 14      ,"GRID M60-1Q",       8,       2,      60,    1024M,    2560x1600
>>>>> 15      ,"GRID M60-2B",       4,       2,      45,    2048M,    2560x1600
>>>>> 16      ,"GRID M60-2Q",       4,       4,      60,    2048M,    2560x1600
>>>>> 17      ,"GRID M60-4Q",       2,       4,      60,    4096M,    3840x2160
>>>>> 18      ,"GRID M60-8Q",       1,       4,      60,    8192M,    3840x2160
>>>>>
>>>>> The create/destroy then looks like this:
>>>>>
>>>>> echo "$mdev_UUID:vendor_specific_argument_list" >
>>>>> 	/sys/bus/pci/devices/.../mdev_create
>>>>>
>>>>> echo "$mdev_UUID:vendor_specific_argument_list" >
>>>>> 	/sys/bus/pci/devices/.../mdev_destroy
>>>>>
>>>>> "vendor_specific_argument_list" is nebulous.
>>>>>
>>>>> So the idea to fix this is to explode this into a directory structure,
>>>>> something like:
>>>>>
>>>>> ├── mdev_destroy
>>>>> └── mdev_supported_types
>>>>>     ├── 11
>>>>>     │   ├── create
>>>>>     │   ├── description
>>>>>     │   └── max_instances
>>>>>     ├── 12
>>>>>     │   ├── create
>>>>>     │   ├── description
>>>>>     │   └── max_instances
>>>>>     └── 13
>>>>>         ├── create
>>>>>         ├── description
>>>>>         └── max_instances
>>>>>
>>>>> Note that I'm only exposing the minimal attributes here for simplicity,
>>>>> the other attributes would be included in separate files and we would
>>>>> require vendors to create standard attributes for common device classes.  
>>>>
>>>> I like this idea. All standard attributes are reflected into this hierarchy.
>>>> In the meantime, can we still allow optional vendor string in create 
>>>> interface? libvirt doesn't need to know the meaning, but allows upper
>>>> layer to do some vendor specific tweak if necessary.  
>>>
>>> This is not the best idea IMO. Libvirt is there to shadow differences
>>> between hypervisors. While doing that, we often hide differences between
>>> various types of HW too. Therefore in order to provide good abstraction
>>> we should make vendor specific string as small as possible (ideally an
>>> empty string). I mean I see it as a bad idea to expose "vgpu_type_id" from
>>> the example above in domain XML. I think the better idea is to let
>>> users choose resolution and frame buffer size, e.g.: <video
>>> resolution="1024x768" framebuffer="16"/> (just the first idea that came
>>> to my mind while writing this e-mail). The point is, the XML part is
>>> completely free of any vendor-specific knobs.
>>
>> That's not really what you want though, a user actually cares whether
>> they get an Intel or NVIDIA vGPU, we can't specify it as just a
>> resolution and framebuffer size.  The user also doesn't want the model
>> changing each time the VM is started, so not only do you *need* to know
>> the vendor, you need to know the vendor model.  This is the only way to
>> provide a consistent VM.  So as we discussed at the BoF, the libvirt
>> xml will likely reference the vendor string, which will be a unique
>> identifier that encompasses all the additional attributes we expose.
>> Really the goal of the attributes is simply so you don't need a per
>> vendor magic decoder ring to figure out the basic features of a given
>> vendor string.  Thanks,
> 
> Okay, maybe I'm misunderstanding something. I just thought that users
> will consult libvirt's nodedev driver (e.g. virsh nodedev-list && virsh
> nodedev-dumpxml $id) to fetch vGPU capabilities and then use that info
> to construct domain XML.

I'm not familiar with the libvirt code, so I'm curious: how does libvirt's
nodedev driver enumerate devices in the system?

> Also, I guess libvirt will need some sort of understanding of vGPUs in
> the sense that if there are two vGPUs in the system

I think you meant two physical GPUs in the system, right?

> (say both INTEL and
> NVIDIA) libvirt must create mdev on the right one. I guess we can't rely
> solely on vgpu_type_id uniqueness here, can we?
> 

When two GPUs are present in the system, one Intel and one NVIDIA, each
device has a unique domain:bus:device:function address. The 'mdev_create'
sysfs file would be present for each device in its own device directory. As
per the v7 version of the patch, the path of 'mdev_create' is:
    /sys/bus/pci/devices/<domain:bus:device:function>/mdev_create

So libvirt needs to know on which physical device the mdev device needs to be
created.
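
To make the above concrete, here is a rough Python sketch of how a management
layer could pick the physical device. It assumes only the v7 layout described
above, i.e. an 'mdev_create' file in each capable device's directory that
accepts "UUID:vendor_specific_argument_list". The function names and the
sysfs_root parameter are hypothetical, made up for illustration; this is not
how libvirt does or will do it.

```python
import os
import uuid


def find_mdev_parents(sysfs_root="/sys/bus/pci/devices"):
    """Return PCI addresses whose device directory has an 'mdev_create' file."""
    parents = []
    for addr in sorted(os.listdir(sysfs_root)):
        if os.path.exists(os.path.join(sysfs_root, addr, "mdev_create")):
            parents.append(addr)
    return parents


def create_mdev(parent_addr, vendor_args="", sysfs_root="/sys/bus/pci/devices"):
    """Write "<UUID>:<vendor args>" to the chosen parent's mdev_create file.

    The caller must name the parent explicitly: the type id alone does not
    say which physical GPU the mdev device should be created on.
    """
    mdev_uuid = str(uuid.uuid4())
    path = os.path.join(sysfs_root, parent_addr, "mdev_create")
    with open(path, "w") as f:
        f.write(f"{mdev_uuid}:{vendor_args}")
    return mdev_uuid
```

A caller would first list the candidates with find_mdev_parents(), choose one
domain:bus:device:function address, and pass that address to create_mdev().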

Thanks,
Kirti

> Michal
>