[libvirt] [RFC v2] libvirt vGPU QEMU integration

Kirti Wankhede kwankhede at nvidia.com
Wed Sep 21 18:34:14 UTC 2016



On 9/20/2016 10:20 PM, Alex Williamson wrote:
> On Tue, 20 Sep 2016 21:53:16 +0530
> Kirti Wankhede <kwankhede at nvidia.com> wrote:
> 
>> On 9/20/2016 8:13 PM, Alex Williamson wrote:
>>> On Tue, 20 Sep 2016 19:51:58 +0530
>>> Kirti Wankhede <kwankhede at nvidia.com> wrote:
>>>   
>>>> On 9/20/2016 3:06 AM, Alex Williamson wrote:  
>>>>> On Tue, 20 Sep 2016 02:05:52 +0530
>>>>> Kirti Wankhede <kwankhede at nvidia.com> wrote:
>>>>>     
>>>>>> Hi libvirt experts,
>>>>>>
>>>>>>
>>>>>> 'create':
>>>>>>     Write-only file. Mandatory.
>>>>>>     Accepts string to create mediated device.
>>>>>>
>>>>>> 'name':
>>>>>>     Read-Only file. Mandatory.
>>>>>>     Returns string, the name of that type id.
>>>>>>
>>>>>> 'fb_length':
>>>>>>     Read-only file. Mandatory.
>>>>>>     Returns <number>{K,M,G}, size of framebuffer.    
>>>>>
>>>>> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
>>>>> just one user of mediated devices.
>>>>>     
>>>>
>>>> As per our discussion in BOF, we would define class and its attributes.
>>>> As I mentioned above these are attributes of "display" class.  
>>>
>>> Just as I didn't know here, how does libvirt know the class of a given
>>> type id?
>>>    
>>
>> Yes, we need some way to identify the class as Daniel mentioned in his
>> suggestion. Add a file 'class' in mdev_supported_types that would return
>> the class.
>>
>>
>>>>>  Can all parameters be changed dynamically?    
>>>>
>>>> Parameters here would be extra parameters which are not listed above in
>>>> mandatory list. If such parameters are required to set, these parameters
>>>> should be set per mdev device before VM is booted.
>>>>  
>>>>>  Do parameter changes apply to existing devices or only future devices?    
>>>>
>>>> No, it should be applied at the time when mdev device is created or
>>>> after mdev device is created but before VM is booted. It will not be
>>>> applicable to existing devices.
>>>>  
>>>>>  This is a poorly specified
>>>>> interface.  I'd prefer this be done via module options on the vendor
>>>>> driver.
>>>>>     
>>>>
>>>> Module option is not suitable here, in that case such parameters would
>>>> be applied to all mdev device created for vendor driver and that is not
>>>> what we want to.  
>>>
>>> Then use attributes on the mdev device itself, this is not an
>>> acceptable interface.
>>>   
>>>>>>
>>>>>> With above example, vGPU device XML would look like:
>>>>>>
>>>>>>    <device>
>>>>>>      <name>my-vgpu</name>
>>>>>>      <parent>pci_0000_86_00_0</parent>
>>>>>>      <capability type='mdev'>
>>>>>>        <type id='11'/>
>>>>>>        <group>1</group>
>>>>>>        <params>'frame_rate_limiter=0'</params>
>>>>>>      </capability>
>>>>>>    </device>
>>>>>>
>>>>>> 'type id' is mandatory.    
>>>>>
>>>>> I was under the impression that the vendor supplied "name" was the one
>>>>> unique identifier.  We're certainly not going to create a registrar to
>>>>> hand out type ids to each vendor, so what makes type id unique?    
>>>>
>>>> Type id is unique, 'name' is the name (string) of device that would be
>>>> displayed in guest OS for that type of mdev device.  
>>>
>>> We were told at the BoF that name was the unique string which would be
>>> consistent and you again indicate below that "GRID-M60-0B" has a fixed
>>> set of attributes, so why do we care what type id is associated with
>>> that name?
>>>    
>>>>>  I have
>>>>> a hard time believing that a given vendor can even allocate unique type
>>>>> ids for their own devices.  Unique type id across vendors is not
>>>>> practical.  So which attribute are we actually using to identify which
>>>>> type of mdev device we need and why would we ever specify additional
>>>>> attributes like fb_length?  Doesn't the vendor guarantee that "GRID
>>>>> M60-0B" has a fixed setup of those attributes?
>>>>>     
>>>>
>>>> Specifying attributes here is not our requirement. Yes we have fixed set
>>>> of attributes for "GRID-M60-0B" and on.
>>>> We are defining the attributed here for "display" class for all other
>>>> vendor of gpu can use.
>>>>
>>>>  
>>>>>> 'group' is optional. It should be a unique number in the system among
>>>>>> all the groups created for mdev devices. Its usage is:
>>>>>>   - not needed if single vGPU device is being assigned to a domain.
>>>>>>   - only need to be set if multiple vGPUs need to be assigned to a
>>>>>> domain and vendor driver have 'requires_group' file in type id directory.
>>>>>>   - if type id directory include 'requires_group' and user tries to
>>>>>> assign multiple vGPUs to a domain without having <group> field in XML,
>>>>>> it will create single vGPU.    
>>>>>
>>>>> We never finished our discussion of how this gets implemented or
>>>>> whether a group applies only to devices from the same vendor, same
>>>>> device, how heterogeneous groups are handled, etc.
>>>>>     
>>>>
>>>> We agreed on above points that I mentioned here and we wish to know what
>>>> libvirt folks think, right?
>>>> Since there were no inputs on that thread I thought I should summarize
>>>> what had been discussed and get libvirt experts thoughts on this.  
>>>
>>> No, that was just an idea that was never full defined.  We had that
>>> idea, the IOMMU group idea, the invalid container idea...  This is
>>> still an open question in the design.
>>>   
>>
>> We do want to close down on the design as soon as possible.
>> I had tried to address open questions in earlier discussion also. Let me
>> address your concerns you have:
>>
>>> whether a group applies only to devices from the same vendor,  
>>
>> Yes.
> 
> So should we call this a vendor_group?  How does libvirt learn the
> vendor?  We can't assume that the parent device will always be a PCI
> device.  

In sysfs, devices, even other than PCI devices keeps link to its owner
driver. We can make use of that to find the owner driver.

> Maybe it's actually a driver_group and libvirt should put any
> mdev devices with the same ->parent->driver into the same group.
> 

Yes.

> If these groups are managed via the vendor driver, we don't really need
> a global namespace of group ids, isn't it per vendor driver?  Maybe the
> group should be a uuid rather than an integer to effectively remove the
> collision/uniqueness problem.
>  

Ok, UUID is also fine. It should be unique.


>>> same device,  
>>
>> Group can be of mdev devices from different supported types and from
>> different physical device but should be from same vendor.
> 
> I think we're just reinforcing that this is a driver-group.  Another
> vendor might require a device-group, where devices must all be attached
> to the same parent device.  "group" is too generic.
> 
>>> how heterogeneous groups are handled,  
>>
>> 'heterogeneous' does that mean device grouped together with different
>> vendors?
> 
> Different vendors or different type-ids.  I think you've clarified both
> of these, but it certainly wasn't specified previously.
>  
>>>>>> 'params' is optional field. User should set this field if extra
>>>>>> parameters need to be set for a particular vGPU device. Libvirt don't
>>>>>> need to parse these params. These are meant for vendor driver.    
>>>>>
>>>>> ie. magic black hole.  nope.
>>>>>     
>>>>>> Libvirt need to follow the sequence to create device:
>>>>>> * Read /sys/../0000\:86\:00.0/11/max_instances. If it is greater than 0,
>>>>>> then only proceed else fail.
>>>>>>
>>>>>> * Set extra params if 'params' field exist in device XML and 'params'
>>>>>> file exist in type id directory
>>>>>>
>>>>>>     echo "frame_rate_limiter=0" > /sys/../0000\:86\:00.0/11/params
>>>>>>
>>>>>> * Autogenerate UUID
>>>>>> * Create device:
>>>>>>
>>>>>>     echo "$UUID:<group>" > /sys/../0000\:86\:00.0/11/create
>>>>>>
>>>>>>     where <group> is optional. Group should be unique number among all
>>>>>> the groups created for mdev devices.    
>>>>>
>>>>> Is it an integer, a UUID, a string?  How is the group for an mdev
>>>>> device discovered?  Can it be changed later?
>>>>>     
>>>>
>>>> Group number could be integer, that's where we need input from libvirt
>>>> folks. It should be provided at the time of mdev device creation. It
>>>> can't be changed after that. If user want to change group number
>>>> destroy/remove mdev device and create it again with proper group number.  
>>>
>>> My concern is that a type id seems arbitrary but we're specifying that
>>> it be unique.  We already have something unique, the name.  So why try
>>> to make the type id unique as well?  A vendor can accidentally create
>>> their vendor driver so that a given name means something very
>>> specific.  On the other hand they need to be extremely deliberate to
>>> coordinate that a type id means a unique thing across all their product
>>> lines.
>>>   
>>
>> Let me clarify, type id should be unique in the list of
>> mdev_supported_types. You can't have 2 directories in with same name.
> 
> Of course, but does that mean it's only unique to the machine I'm
> currently running on?  Let's say I have a Tesla P100 on my system and
> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
> new card still going to be a "GRID-M60-0B"?  If not then we've based
> our XML on the wrong attribute.  If the new device does not support
> "GRID-M60-0B" then we should generate an error, not simply initialize
> whatever type-id 11 happens to be on this new card.
> 

If there are 2 M60 in the system then you would find '11' type directory
in mdev_supported_types of both M60. If you have P100, '11' type would
not be there in its mdev_supported_types, it will have different types.

For example, if you replace M60 with P100, but XML is not updated. XML
have type '11'. When libvirt would try to create mdev device, libvirt
would have to find 'create' file in sysfs in following directory format:

 --- mdev_supported_types
     |-- 11
     |   |-- create

but now for P100, '11' directory is not there, so libvirt should throw
error on not able to find '11' directory.


> Again, we've defined that name is unique, so why should type-id be
> anything other than arbitrary?  It relates only to the current instance
> on the machine.
>  
>> I think here 'name' is the problem and it is not used in xml file.
>> Lets not make it mandatory. But vendor driver can have it which 3rd
>> party can use to read.
>> Lets remove 'id' from type id in XML if that is the concern. Supported
>> types is going to be defined by vendor driver, so let vendor driver
>> decide what to use for directory name and same should be used in device
>> xml file, it could be '11' or "GRID M60-0B":
>>
>>     <device>
>>       <name>my-vgpu</name>
>>       <parent>pci_0000_86_00_0</parent>
>>       <capability type='mdev'>
>>         <type='11'/>
>>         ...
>>       </capability>
>>     </device>
>>
>>
>>>>>>
>>>>>> 7. Hot-plug
>>>>>>
>>>>>> It is same syntax to create a virtual device for hot-plug.    
>>>>>
>>>>> How do groups work with hotplug?  Can a device be creating into an
>>>>> existing, running group?  Can a device be removed from an existing,
>>>>> running group?  Thanks,
>>>>>     
>>>>
>>>> Group is for vendor driver to decide when to commit resources for a
>>>> group of devices to support multiple devices in a domain. It will be
>>>> upto vendor driver how it manage it.  
>>>
>>> Then it's not sufficiently specified for libvirt to use.  Thanks,
>>>   
>>
>> Can you clarify what if required for libvirt to use?
> 
> Everything we're discussing above.  We can't simply say that group
> management is defined by the vendor driver, that means that libvirt
> needs to know how to uniquely behave for each vendor driver.  We want
> to create a consistent framework where libvirt doesn't care what the
> vendor driver is.  Therefore we need to make group manipulation fully
> specified, regardless of the vendor driver.  Thanks,
>

There is no communication between different vendor drivers. So if
libvirt creates a group with one device from one vendor and one device
from another vendor, both vendor driver would think they have a single
device and accordingly commit resources for that single device from its
own driver. Ideally nothing would fail. But that is not the main aim of
grouping, right?

To make this independent of vendor driver, we had initial proposal of
having same UUID and different instance numbers. But that was also not
acceptable.

Thanks,
Kirti




More information about the libvir-list mailing list