[libvirt PATCH] RFC: Add support for vDPA network devices

Jonathon Jongsma jjongsma at redhat.com
Mon Aug 24 21:33:59 UTC 2020


On Thu, 2020-08-20 at 18:56 -0400, Laine Stump wrote:
> On 8/18/20 2:37 PM, Jonathon Jongsma wrote:
> > vDPA network devices allow high-performance networking in a virtual
> > machine by providing a wire-speed data path. These devices require
> > a
> > vendor-specific host driver but the data path follows the virtio
> > specification.
> > 
> > The support for vDPA devices was recently added to qemu. This
> > allows
> > libvirt to support these devices. It requires that the device is
> > configured on the host with the appropriate vendor-specific driver.
> > This will create a chardev on the host at e.g. /dev/vhost-vdpa-0.
> > That
> > chardev path can then be used to define a new interface with
> > type='vdpa'.
> > ---


[snip]


> 
> 
> > +
> > +::
> > +
> > +   ...
> > +   <devices>
> > +     <interface type='vdpa'>
> > +       <source dev='/dev/vhost-vdpa-0'/>
> 
> (The above device is created (I just learned this from you in IRC!)
> by 
> unbinding a VF from its NIC driver on the host, and re-binding it to
> a 
> special VDPA-VF driver.)
> 
> 
> As we were just discussing online, on one hand it could be nice if 
> libvirt could automatically handle rebinding the VF to the vdpa host 
> driver (given the PCI address of the VF), to make it easier to use 
> (because no advance setup would be needed), similar to what's
> already 
> done with hostdev devices (and <interface type='hostdev'>) when 
> managed='yes' (which is the default setting).
> 
> 
> On the other hand, it is exactly that managed='yes' functionality
> that 
> has created more "libvirt-but-not-really-libvirt" bug reports than
> any 
> other aspect of vfio device assignment, because the process of
> unbinding 
> and rebinding drivers is timing-sensitive and causes code that's
> usually 
> run only once at host boot-time to be run hundreds of times thus
> making 
> it more likely to expose infrequently-hit bugs.
> 
> 
> I just bring this up in advance of someone suggesting the addition
> of 
> managed='yes' here to put in my vote for *not* doing it, and instead 
> using that same effort to provide some sort of API in the node-
> device 
> driver for easily creating one or more VDPA devices from VFs, which 
> could be done once at host boot time, and thus avoid the level of 
> "libvirt-not-libvirt" bug reports for VDPA. (and after that maybe
> even 
> an API to allocate a device from that pool to be used by a guest).
> But 
> that's for later.

I'd really love to hear some additional opinions on this topic from
some more of the libvirt "old-timers". I intentionally kept the initial
vdpa support simple by requiring that the device is already setup and
bound to the appropriate driver. But I want to make sure that we can
add additional capabilities later (as Laine suggested) with this same
API/schema.

Anybody else have thoughts on this topic?


> 
> 
> > +     </interface>
> > +   </devices>
> > +   ...
> > +
> >   :anchor:`<a id="elementsTeaming"/>`
> >   
> >   Teaming a virtio/hostdev NIC pair
> > diff --git a/docs/schemas/domaincommon.rng
> > b/docs/schemas/domaincommon.rng
> > index 0d0dcbc5ce..17f74490f4 100644
> > --- a/docs/schemas/domaincommon.rng
> > +++ b/docs/schemas/domaincommon.rng
> > @@ -3108,6 +3108,21 @@
> >               <ref name="interface-options"/>
> >             </interleave>
> >           </group>
> > +
> > +        <group>
> > +          <attribute name="type">
> > +            <value>vdpa</value>
> > +          </attribute>
> > +          <interleave>
> > +            <element name="source">
> > +              <attribute name="dev">
> > +                <ref name="deviceName"/>
> > +              </attribute>
> > +            </element>
> > +            <ref name="interface-options"/>
> > +          </interleave>
> > +        </group>
> > +
> >         </choice>
> >         <optional>
> >           <attribute name="trustGuestRxFilters">
> > 


[snip]

> > diff --git a/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args
> > b/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args
> > new file mode 100644
> > index 0000000000..8e76ac7794
> > --- /dev/null
> > +++ b/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args
> > @@ -0,0 +1,37 @@
> > +LC_ALL=C \
> > +PATH=/bin \
> > +HOME=/tmp/lib/domain--1-QEMUGuest1 \
> > +USER=test \
> > +LOGNAME=test \
> > +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \
> > +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \
> > +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
> > +QEMU_AUDIO_DRV=none \
> > +/usr/bin/qemu-system-i386 \
> > +-name guest=QEMUGuest1,debug-threads=on \
> > +-S \
> > +-object secret,id=masterKey0,format=raw,\
> > +file=/tmp/lib/domain--1-QEMUGuest1/master-key.aes \
> > +-machine pc,accel=tcg,usb=off,dump-guest-core=off \
> > +-cpu qemu64 \
> > +-m 214 \
> > +-overcommit mem-lock=off \
> > +-smp 1,sockets=1,cores=1,threads=1 \
> > +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
> > +-display none \
> > +-no-user-config \
> > +-nodefaults \
> > +-chardev socket,id=charmonitor,fd=1729,server,nowait \
> > +-mon chardev=charmonitor,id=monitor,mode=control \
> > +-rtc base=utc \
> > +-no-shutdown \
> > +-no-acpi \
> > +-boot strict=on \
> > +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
> > +-add-fd set=0,fd=1732 \
> > +-netdev vhost-vdpa,vhostdev=/dev/fdset/0,id=hostnet0 \
> 
> Okay, I'm feeling too lazy to parse through the code above an see
> how 
> you arrived at "vhostdev='/dev/fdset/0'", but that doesn't look
> right. 
> Shouldn't you be ending up with "-netdev vhost-vdpa,fd=NN,..."? The 
> document I have shows that syntax is supported, so there shouldn't
> be 
> any need to do the add-fd stuff in this case.


The initial proposal for vdpa in qemu supported both vhostdev= and fd=
parameters, but the final implementation does not actually support an
fd= parameter here. The recommended way to pass an pre-opened fd to
qemu in this scenario is to use the 'add-fd' and /dev/fdset/ method as
shown above.

As far as I understand from reading qemu code, /dev/fdset/N is not
actually a file that exists in the filesystem. Instead, when you call
'add-fd', qemu adds that fd to an internal mapping that maps from the
fdset specified by set=N to the passed fd. Whenever qemu tries to open
a file, qemu_open() has special handling for filenames that start with
"/dev/fdset/": it looks up the fd associated with that fdset id and
returns that instead of attempting to open the file path. 

So I think the above should be correct.

> 
> 
> I think the next step should be to find some hardware and give this
> a 
> smoke test! :-)

Indeed, I'm working with Cindy Lu (who implemented the qemu vdpa
support) to try to get this tested properly. Unfortunately I've been
unable to get vdpa_sim working and I don't have access to hardware at
the moment.

Thanks for the review!

Jonathon




More information about the libvir-list mailing list