[libvirt] [PATCH 3/4] virsh-domain: update attach-interface to support type=hostdev

Pavel Hrdina phrdina at redhat.com
Mon Oct 26 13:38:44 UTC 2015

On Mon, Oct 26, 2015 at 04:05:28AM -0400, Laine Stump wrote:
> On 10/21/2015 08:22 AM, Pavel Hrdina wrote:
> > Adding this feature will allow users to easily attach a hostdev network
> > interface using PCI passthrough.
> >
> > The interface can be attached using --type=hostdev and PCI address or
> > network device name as --source.  This command also allows you to tell,
> > whether the interface should be managed and to choose a assignment
> > driver.
> >
> > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=997561
> I think the PCI address version of this command is fairly 
> straightforward, so it can and should go in. But I've been thinking more 
> about the "netdev name" version since our exchange in the BZ, and even 
> had a very long treatise prepared defending my position from there, but 
> then I decided to look at it all again from the beginning and came to 
> the conclusion that we are *both* wrong :-)
> The main aim here is convenience, and while I still have the position 
> that if we're going to make it more convenient, we should make it more 
> convenient at the XML level so that all consumers of libvirt can take 
> advantage without needing a ton of extra code, I also have realized that 
> specifying the netdev name is not very convenient anyway.
> Why?
> * Keep in mind that <interface type='hostdev'> will only work with SRIOV 
> VFs (because you can't set the MAC address of a normal netdev from the 
> host and have that MAC address persist through the guest driver's device 
> init).
> * So any device that you assign in this way is a VF.
> * A user will know which PF ("Physical Function", but from their point 
> of view it's "the physical port used for the connection") they want a VF 
> from; they don't care which VF (they're all functionally equivalent), 
> and anyway the standard utilities don't even tell you the netdev names 
> of the VFs associated with a particular PF. So it's not that simple to 
> learn the netdev name of the VF you want to use:
>    * virsh nodedev-list --tree doesn't group the VFs under their PF
>      (because its hierarchy is according to the PCI bus layout, which has
>      all the PFs and VFs at the same level)
>    * virsh nodedev-dumpxml for the PF device shows the VFs'
>      PCI addresses, NOT their netdev names.
>     <capability type='virt_functions'>
>        <address domain='0x0000' bus='0x02' slot='0x10' function='0x0'/>
>        <address domain='0x0000' bus='0x02' slot='0x10' function='0x2'/>
>        <address domain='0x0000' bus='0x02' slot='0x10' function='0x4'/>
>        <address domain='0x0000' bus='0x02' slot='0x10' function='0x6'/>
>        <address domain='0x0000' bus='0x02' slot='0x11' function='0x0'/>
>        <address domain='0x0000' bus='0x02' slot='0x11' function='0x2'/>
>        <address domain='0x0000' bus='0x02' slot='0x11' function='0x4'/>
> (Note the leading "0x" on these values :-P)
>    * "ip link show" lists the VFs directly under their PF,
>      but shows the VF#, not the netdev name
>      11: p4p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq
>                state UP mode DEFAULT group default qlen 1000
>                link/ether 90:e2:ba:02:22:01 brd ff:ff:ff:ff:ff:ff
>         vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
>         vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
>         vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
>         vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
>         vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
>         vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
>         vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
> So if there's not a simple way to get a list of the netdev names of the 
> VFs for a PF, then what convenience is gained by allowing use of netdev 
> name? What exactly was I thinking when I wrote comment 1 in that BZ???
> What *could* be useful would be the ability to specify the PF name and 
> VF# of the device you want to assign, e.g.:   "<source pf='p4p2' 
> vf='3'/>", since that's info you can easily get from "ip link show". 
> Anymore, though, I don't even know how useful *that* is, since you can 
> already get the PCI addresses of the VFs from the output of virsh 
> nodedev-dumpxml (in almost exactly the format you need - just add 
> "type='pci'".
> (I'm now even wondering if I misunderstood what the original reporter 
> was asking for, but unfortunately it's not possible to ask him, because 
> his account has been closed :-( Thinking back to what he said - there is 
> a *different* place where it's common to know the netdev name and want 
> to translate it into a PCI address - when you are assigning a 
> *non-SRIOV* ethernet device using plain <hostdev> (not <interface 
> type='hostdev'>, which only works for SRIOV VFs). In this case you 
> probably know the netdev name but not the PCI address. So for this case 
> a translation from netdev name to PCI address would make sense, and here 
> I would agree that the translation should happen in virsh rather than 
> the generic <hostdev> XML understanding what a netdev name is (contrary 
> to <interface type='hostdev'>, where it is well established that the 
> device you're assigning is a network device, and there are already 
> "netdev-y" config attributes, up until now <hostdev> has not contained 
> any config items specific to a particular type of PCI device, and I 
> don't think it should have any added)).
> TL;DR of my opinion:
> 1) this patch minus the netdev-->PCI address translation is good

I don't mind if the patch goes in with or without the translation.  The main
purpose of this extension is to make it easier for users so they don't have to
use the generic 'attach-device' with XML definition but only specify some
arguments.  But see the comment below for the 2).

> 2) we don't need the netdev-->PCI translation for <interface 
> type='hostdev'>;
>     it's just extra code for (in my newly revised opinion) no gain.

Actually it's not entirely true, the netdev names of VFs are listed by
'ip link show' command, but it's not clear to guess the relation to PFs:

3: enp5s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
mode DEFAULT qlen 1000
    link/ether b4:b5:2f:66:3c:4f brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 22:67:e2:b0:28:b0, spoof checking on, link-state auto
    vf 1 MAC 0a:d3:63:e6:06:45, spoof checking on, link-state auto
    vf 2 MAC 52:54:00:32:93:5c, spoof checking on, link-state auto
    vf 3 MAC 42:9a:79:f6:9d:cb, spoof checking on, link-state auto

14: enp5s16f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT qlen 1000
    link/ether 22:67:e2:b0:28:b0 brd ff:ff:ff:ff:ff:ff

One could benefit using the netdev name also for VF.

> 3) a netdev-->PCI translation in virsh that creates a plain
>     <hostdev> might be useful (hmm, maybe keep (2) as you have it,
>     with an additional check for

Yes, this might be a useful extension of 'attach-interface --type=hostdev', that
we would ether detect, whether the interface is a VF or we would use <hostdev>
as default, unless a mac address is specified or we can combine both approaches.

In this case, the command will remain the same, but it will be able to attach
VF and non-VF interfaces to the guest.  This would probably require to extend
a detach-interface to be able to detach also a network interface attached as

> 4) adding <source pf='ethX' vf='n'/> support to <interface type='hostdev'>
>     might be useful.

I don't like to extend our XMLs to introduce another way how to represent some
source device if we already have one.  And I don't see the value of the PF and
VF in XML.  

> How much sense does any of that make?
> >
> > Signed-off-by: Pavel Hrdina <phrdina at redhat.com>
> > ---
> >   tools/virsh-domain.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++---
> >   1 file changed, 86 insertions(+), 4 deletions(-)
> >
> > diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
> > index e8503ec..b124441 100644
> > --- a/tools/virsh-domain.c
> > +++ b/tools/virsh-domain.c
> > @@ -56,6 +56,7 @@
> >   #include "virtime.h"
> >   #include "virtypedparam.h"
> >   #include "virxml.h"
> > +#include "virsh-nodedev.h"
> >   
> >   /* Gnulib doesn't guarantee SA_SIGINFO support.  */
> >   #ifndef SA_SIGINFO
> > @@ -866,6 +867,14 @@ static const vshCmdOptDef opts_attach_interface[] = {
> >        .type = VSH_OT_BOOL,
> >        .help = N_("print XML document rather than attach the interface")
> >       },
> > +    {.name = "managed",
> > +     .type = VSH_OT_BOOL,
> > +     .help = N_("set the interface to be managed by libvirt")
> Maybe a more descriptive help - "have libvirt automatically manage 
> detach/attach of device from host driver" (or something like that; I 
> know that's too long).

That makes sense, I'll try to come up with something better.

> > +    },
> > +    {.name = "driver",
> > +     .type = VSH_OT_STRING,
> > +     .help = N_("set driver for hostdev interface, default is 'kvm'")
> Actually the default behavior depends on what is available on the host - 
> if only one of legacy-kvm and vfio is available, then that is the one 
> used, but if both are available, then vfio is used (so vfio is closer to 
> being "default" than kvm). The kvm method of assigning devices is 
> deprecated, and it has been completely disabled in some distros 
> (definitely RHEL7 and CentOS7, not sure about Fedora). These days 
> manually specifying which driver to use is mostly pointless; I vaguely 
> recall that it was initially added because when VFIO was first added 
> some were nervous about not having a simple way to fallback to old 
> behavior if something went wrong while using VFIO; those days are long 
> gone though, and I can't think of a situation where anyone would 
> want/need to specify which driver to use (i.e. I'm not sure we even need 
> to expose this option in virsh; it may confuse more than help).

In that case, the documentation for <interface type='hostdev'> is not correct,
because there is a statement '(or simply omit the <driver> element, since "kvm"
is currently the default)'.  Looking at the <hostdev> section in our
documentation, this is already updated and there is a correct description, when
is which driver used as default.  In that case, I'll remove it from virsh.


More information about the libvir-list mailing list