[libvirt] [PATCH 3/3] conf: formatter/parser/RNG/docs for hostdev <driver name='kvm|vfio'/>

Wed Apr 24 03:46:32 UTC 2013

On 04/23/2013 07:34 PM, Eric Blake wrote:
> On 04/23/2013 10:45 AM, Laine Stump wrote:
>> A domain's <interface> or <hostdev>, as well as a <network>'s
>> <forward>, can now have an optional <driver name='kvm|vfio'/>
>> element. As of this patch, there is no functionality behind this new
>> knob - this patch adds support to the domain and network
>> formatter/parser, and to the RNG and documentation.
>>
>> When then backend is added, legacy KVM PCI device assignment will
> s/then/the/
>
>> continue to be used when no driver name is specified (or if <driver
>> name='kvm'/> is specified), but if driver name is 'vfio', the new UEFI
>> Secure Boot compatible VFIO device assignment will be used.
>>
>> Note that the parser doesn't automatically insert the current default
>> value of this setting. This is done on purpose because the two
>> possibilities are functionally equivalent from the guest's point of
>> view, and we want to be able to automatically start using vfio as the
>> default (even for existing domains) at some time in the future. This
>> is similar to what was done with the "vhost" driver option in
>> <interface>.
> Yes, leaving an unspecified element as hypervisor default has worked for
> us in the past, especially when the choice of host driver should not be
> guest-visible.

Doing that has turned out badly, though, in the case where the choice of
driver *does* make a visible difference in the guest (I'm thinking of
<interface> model), which is why I felt the need to explain my decision.

>
>> ---
>>  docs/formatdomain.html.in     | 42 ++++++++++++++++++++++-
>>  docs/formatnetwork.html.in    | 15 ++++++++
>>  docs/schemas/domaincommon.rng | 79 ++++++++++++++++++++++++++++---------------
>>  docs/schemas/network.rng      | 11 ++++++
>>  src/conf/domain_conf.c        | 36 ++++++++++++++++++++
>>  src/conf/domain_conf.h        | 11 ++++++
>>  6 files changed, 166 insertions(+), 28 deletions(-)
>>
>> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
>> index 888c005..0e5b00c 100644
>> --- a/docs/formatdomain.html.in
>> +++ b/docs/formatdomain.html.in
>> @@ -2340,7 +2340,22 @@
>>        the device as can be found with the <code>lspci</code> or
>>        with <code>virsh
>>        nodedev-list</code>. <a href="#elementsAddress">See above</a> for
>> -      more details on the address element.
>> +      more details on the address element.</dd>
>> +      <dt><code>driver</code></dt>
>> +      <dd>
>> +        PCI devices can have an optional <code>driver</code>
>> +        subelement that specifies which backend driver to use for PCI
>> +        device assignment. Use the <code>name</code> attribute to
>> +        select either "vfio" (for the new VFIO device assignment
>> +        backend, which is compatible with UEFI SecureBoot) or "kvm"
>> +        (for the legacy device assignment handled directly by the KVM
>> +        kernel module)<span class="since">Since 1.0.5 (QEMU and KVM
>> +        only, requires kernel 3.6 or newer)</span>. Currently, "kvm"
>> +        is the default used by libvirt when not explicitly provided,
>> +        but since the two are functionally equivalent, this default
>> +        could be changed in the future with no impact to domains that
>> +        don't specify anything.
> IIRC, we have talked about managing host devices across all hypervisors,
> since it is a shared host resource. If that's true, then what happens if
> LXC wants to start using a driver?  Or should LXC only use vfio and
> never use kvm?  

Well, nobody but the qemu hypervisor driver can/should ever use "kvm".
In general, the contents of a <driver> subelement don't need to be
identical for all hypervisors, so it's okay if other hypervisor drivers
ignore it, or have their own set of attributes/values.

> Also, didn't you say that xen used yet another driver,
> in which case we should consider listing that in the RNG.

Just as kvm has its own internal device assignment implementation, I
guess so does xen. kvm likes having devices bound to the "pci-stub"
driver before it starts messing with them, and xen likes having them
bound to the "pciback" driver.

Since I don't know what would be the proper name for xen's internal
implementation, and they don't support anything else anyway, I don't
think we need to add anything for it yet. In the future, xen may decide
to support VFIO device assignment too, in which case they could also
recognize the "vfio" value.

>
>
>> +        device assignment performed directly by the kvm kernel module
>> +        (the default is currently "kvm", but is subject to change).
>> +        <span class="since">Since 1.0.5 (QEMU and KVM only, requires kernel 3.6 or newer)</span>
> Long line (several times).
>
>
>> +++ b/docs/schemas/domaincommon.rng
>> @@ -1923,28 +1923,40 @@
>>        </optional>
>>        <optional>
>>          <element name="driver">
>> -          <optional>
>> -            <attribute name="name">
>> -              <choice>
>> -                <value>qemu</value>
>> -                <value>vhost</value>
>> -              </choice>
>> -            </attribute>
>> -          </optional>
>> -          <optional>
>> -            <attribute name="txmode">
>> -              <choice>
>> -                <value>iothread</value>
>> -                <value>timer</value>
>> -              </choice>
>> -            </attribute>
>> -          </optional>
>> -          <optional>
>> -            <ref name="ioeventfd"/>
>> -          </optional>
>> -          <optional>
>> -            <ref name="event_idx"/>
>> -          </optional>
>> +          <choice>
>> +            <group>
>> +              <attribute name="name">
>> +                <choice>
>> +                  <value>kvm</value>
>> +                  <value>vfio</value>
>> +                </choice>
>> +              </attribute>
>> +            </group>
>> +            <group>
>> +              <optional>
>> +                <attribute name="name">
>> +                  <choice>
>> +                    <value>qemu</value>
>> +                    <value>vhost</value>
> So for define='interface-options', we are extending the exiting <driver>

s/exiting/existing/

(I couldn't pass up a chance to do a little "review of the review" :-)

> so that there are now four values for <driver name=.../>, but only the
> two older options allow other attributes.
>
> Can the two ever overlap, or are we safe in that kvm/vfio are for
> hostdev interface passthrough, while qemu/vhost is for emulation, so the
> two types of driver choices never make sense at once?

No, they can never overlap. qemu and vhost are only used if the
interface model is virtio, and kvm/vfio are only used if the type of the
interface is hostdev.

The one problematic situation would be if someone had this:

    <interface type='network'>
      <source network='mynet'/>
      <model name='virtio'/>
      <driver name='qemu'/>
      ...
    </interface>

(so the <interface> is expecting mynet to just be some type of bridge or
macvtap network) but the "mynet" network was actually a pool of
hostdevs. In this case the <driver name='qemu'/> from the <interface>
would be ignored (just as the <model name='virtio'/> is ignored), but
<driver name> could be specified in the network, and would be retrieved
from there.

Anyway, I don't see that ever cropping up in real life - nobody is going
to point their <interface> at a network where they don't know in advance
whether or not it will be a <forward mode='hostdev'> network - if they
did that, they would have no idea what kind of network device the guest
would see.

>
>> +                  </choice>
>> +                </attribute>
>> +              </optional>
>> +              <optional>
>> +                <attribute name="txmode">
>> +                  <choice>
>> +                    <value>iothread</value>
>> +                    <value>timer</value>
>> +                  </choice>
>> +                </attribute>
>> +              </optional>
>> +              <optional>
>> +                <ref name="ioeventfd"/>
>> +              </optional>
>> +              <optional>
>> +                <ref name="event_idx"/>
>> +              </optional>
>> +            </group>
>> +          </choice>
>>            <empty/>
>>          </element>
>>        </optional>
>> @@ -3084,14 +3096,27 @@
>>      <attribute name="type">
>>        <value>pci</value>
>>      </attribute>
>> -    <element name="source">
>> +    <interleave>
>>        <optional>
>> -        <ref name="startupPolicy"/>
>> +        <element name="driver">
>> +          <attribute name="name">
>> +            <choice>
>> +              <value>kvm</value>
>> +              <value>vfio</value>
>> +            </choice>
>> +          </attribute>
>> +          <empty/>
> Whereas for define='hostdevsubsyspci', we are completely adding a new
> subelement, with only two choices for the new attribute.

Right, because there wasn't previously any <driver> element for <hostdev>

>
>> +        </element>
>>        </optional>
>> -      <element name="address">
>> -        <ref name="pciaddress"/>
>> +      <element name="source">
>> +        <optional>
>> +          <ref name="startupPolicy"/>
>> +        </optional>
>> +        <element name="address">
>> +          <ref name="pciaddress"/>
>> +        </element>
>>        </element>
>> -    </element>
>> +    </interleave>
>>    </define>
>>  
>>    <define name="hostdevsubsysusb">
>> diff --git a/docs/schemas/network.rng b/docs/schemas/network.rng
>> index 6c3fae2..493edae 100644
>> --- a/docs/schemas/network.rng
>> +++ b/docs/schemas/network.rng
>> @@ -149,6 +149,17 @@
>>                    </attribute>
>>                  </element>
>>                </optional>
>> +              <optional>
>> +                <element name="driver">
>> +                  <attribute name="name">
>> +                    <choice>
>> +                      <value>kvm</value>
>> +                      <value>vfio</value>
>> +                    </choice>
>> +                  </attribute>
>> +                  <empty/>
>> +                </element>
>> +              </optional>
>>              </interleave>
> Looks okay to me.
>
> I'm not sure if we are ready to push this without more patches in the
> series, but at least I think the XML is on the right track.

Yeah, this patch can't go in without some functionality behind it.
Thanks for looking through this though. You did manage to pick up on one
of the things that caused the most thought by me (the qemu/vhost vs.
kvm/vfio thing for <interface>).