[libvirt] [PATCH 3/4] virsh-domain: update attach-interface to support type=hostdev

Laine Stump laine at laine.org
Mon Oct 26 08:05:28 UTC 2015


On 10/21/2015 08:22 AM, Pavel Hrdina wrote:
> Adding this feature will allow users to easily attach a hostdev network
> interface using PCI passthrough.
>
> The interface can be attached using --type=hostdev and PCI address or
> network device name as --source.  This command also allows you to tell,
> whether the interface should be managed and to choose a assignment
> driver.
>
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=997561

I think the PCI address version of this command is fairly 
straightforward, so it can and should go in. But I've been thinking more 
about the "netdev name" version since our exchange in the BZ, and even 
had a very long treatise prepared defending my position from there, but 
then I decided to look at it all again from the beginning and came to 
the conclusion that we are *both* wrong :-)

The main aim here is convenience, and while I still have the position 
that if we're going to make it more convenient, we should make it more 
convenient at the XML level so that all consumers of libvirt can take 
advantage without needing a ton of extra code, I also have realized that 
specifying the netdev name is not very convenient anyway.

Why?

* Keep in mind that <interface type='hostdev'> will only work with SRIOV 
VFs (because you can't set the MAC address of a normal netdev from the 
host and have that MAC address persist through the guest driver's device 
init).

* So any device that you assign in this way is a VF.

* A user will know which PF ("Physical Function", but from their point 
of view it's "the physical port used for the connection") they want a VF 
from; they don't care which VF (they're all functionally equivalent), 
and anyway the standard utilities don't even tell you the netdev names 
of the VFs associated with a particular PF. So it's not that simple to 
learn the netdev name of the VF you want to use:

   * virsh nodedev-list --tree doesn't group the VFs under their PF
     (because its hierarchy is according to the PCI bus layout, which has
     all the PFs and VFs at the same level)

   * virsh nodedev-dumpxml for the PF device shows the VFs'
     PCI addresses, NOT their netdev names.

    <capability type='virt_functions'>
       <address domain='0x0000' bus='0x02' slot='0x10' function='0x0'/>
       <address domain='0x0000' bus='0x02' slot='0x10' function='0x2'/>
       <address domain='0x0000' bus='0x02' slot='0x10' function='0x4'/>
       <address domain='0x0000' bus='0x02' slot='0x10' function='0x6'/>
       <address domain='0x0000' bus='0x02' slot='0x11' function='0x0'/>
       <address domain='0x0000' bus='0x02' slot='0x11' function='0x2'/>
       <address domain='0x0000' bus='0x02' slot='0x11' function='0x4'/>

(Note the leading "0x" on these values :-P)

   * "ip link show" lists the VFs directly under their PF,
     but shows the VF#, not the netdev name

     11: p4p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq
               state UP mode DEFAULT group default qlen 1000
               link/ether 90:e2:ba:02:22:01 brd ff:ff:ff:ff:ff:ff
        vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto

So if there's not a simple way to get a list of the netdev names of the 
VFs for a PF, then what convenience is gained by allowing use of netdev 
name? What exactly was I thinking when I wrote comment 1 in that BZ???

What *could* be useful would be the ability to specify the PF name and 
VF# of the device you want to assign, e.g.:   "<source pf='p4p2' 
vf='3'/>", since that's info you can easily get from "ip link show". 
Anymore, though, I don't even know how useful *that* is, since you can 
already get the PCI addresses of the VFs from the output of virsh 
nodedev-dumpxml (in almost exactly the format you need - just add 
"type='pci'".

(I'm now even wondering if I misunderstood what the original reporter 
was asking for, but unfortunately it's not possible to ask him, because 
his account has been closed :-( Thinking back to what he said - there is 
a *different* place where it's common to know the netdev name and want 
to translate it into a PCI address - when you are assigning a 
*non-SRIOV* ethernet device using plain <hostdev> (not <interface 
type='hostdev'>, which only works for SRIOV VFs). In this case you 
probably know the netdev name but not the PCI address. So for this case 
a translation from netdev name to PCI address would make sense, and here 
I would agree that the translation should happen in virsh rather than 
the generic <hostdev> XML understanding what a netdev name is (contrary 
to <interface type='hostdev'>, where it is well established that the 
device you're assigning is a network device, and there are already 
"netdev-y" config attributes, up until now <hostdev> has not contained 
any config items specific to a particular type of PCI device, and I 
don't think it should have any added)).

TL;DR of my opinion:

1) this patch minus the netdev-->PCI address translation is good

2) we don't need the netdev-->PCI translation for <interface 
type='hostdev'>;
    it's just extra code for (in my newly revised opinion) no gain.

3) a netdev-->PCI translation in virsh that creates a plain
    <hostdev> might be useful (hmm, maybe keep (2) as you have it,
    with an additional check for

4) adding <source pf='ethX' vf='n'/> support to <interface type='hostdev'>
    might be useful.

How much sense does any of that make?


>
> Signed-off-by: Pavel Hrdina <phrdina at redhat.com>
> ---
>   tools/virsh-domain.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 86 insertions(+), 4 deletions(-)
>
> diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
> index e8503ec..b124441 100644
> --- a/tools/virsh-domain.c
> +++ b/tools/virsh-domain.c
> @@ -56,6 +56,7 @@
>   #include "virtime.h"
>   #include "virtypedparam.h"
>   #include "virxml.h"
> +#include "virsh-nodedev.h"
>   
>   /* Gnulib doesn't guarantee SA_SIGINFO support.  */
>   #ifndef SA_SIGINFO
> @@ -866,6 +867,14 @@ static const vshCmdOptDef opts_attach_interface[] = {
>        .type = VSH_OT_BOOL,
>        .help = N_("print XML document rather than attach the interface")
>       },
> +    {.name = "managed",
> +     .type = VSH_OT_BOOL,
> +     .help = N_("set the interface to be managed by libvirt")

Maybe a more descriptive help - "have libvirt automatically manage 
detach/attach of device from host driver" (or something like that; I 
know that's too long).

> +    },
> +    {.name = "driver",
> +     .type = VSH_OT_STRING,
> +     .help = N_("set driver for hostdev interface, default is 'kvm'")

Actually the default behavior depends on what is available on the host - 
if only one of legacy-kvm and vfio is available, then that is the one 
used, but if both are available, then vfio is used (so vfio is closer to 
being "default" than kvm). The kvm method of assigning devices is 
deprecated, and it has been completely disabled in some distros 
(definitely RHEL7 and CentOS7, not sure about Fedora). These days 
manually specifying which driver to use is mostly pointless; I vaguely 
recall that it was initially added because when VFIO was first added 
some were nervous about not having a simple way to fallback to old 
behavior if something went wrong while using VFIO; those days are long 
gone though, and I can't think of a situation where anyone would 
want/need to specify which driver to use (i.e. I'm not sure we even need 
to expose this option in virsh; it may confuse more than help).

> +    },
>       {.name = NULL}
>   };
>   
> @@ -919,7 +928,7 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
>       virDomainPtr dom = NULL;
>       const char *mac = NULL, *target = NULL, *script = NULL,
>                  *type = NULL, *source = NULL, *model = NULL,
> -               *inboundStr = NULL, *outboundStr = NULL;
> +               *inboundStr = NULL, *outboundStr = NULL, *driver = NULL;
>       virNetDevBandwidthRate inbound, outbound;
>       virDomainNetType typ;
>       int ret;
> @@ -931,6 +940,7 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
>       bool config = vshCommandOptBool(cmd, "config");
>       bool live = vshCommandOptBool(cmd, "live");
>       bool persistent = vshCommandOptBool(cmd, "persistent");
> +    bool managed = vshCommandOptBool(cmd, "managed");
>   
>       VSH_EXCLUSIVE_OPTIONS_VAR(persistent, current);
>   
> @@ -949,7 +959,8 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
>           vshCommandOptStringReq(ctl, cmd, "script", &script) < 0 ||
>           vshCommandOptStringReq(ctl, cmd, "model", &model) < 0 ||
>           vshCommandOptStringReq(ctl, cmd, "inbound", &inboundStr) < 0 ||
> -        vshCommandOptStringReq(ctl, cmd, "outbound", &outboundStr) < 0)
> +        vshCommandOptStringReq(ctl, cmd, "outbound", &outboundStr) < 0 ||
> +        vshCommandOptStringReq(ctl, cmd, "driver", &driver) < 0)
>           goto cleanup;
>   
>       /* check interface type */
> @@ -982,8 +993,23 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
>           }
>       }
>   
> +    if (typ != VIR_DOMAIN_NET_TYPE_HOSTDEV) {
> +        if (managed) {
> +            vshError(ctl, _("--managed is usable only with --type=hostdev"));
> +            goto cleanup;
> +        }
> +        if (driver) {
> +            vshError(ctl, _("--driver is usable only with --type=hostdev"));
> +            goto cleanup;
> +        }
> +    }
> +
>       /* Make XML of interface */
> -    virBufferAsprintf(&buf, "<interface type='%s'>\n", type);
> +    virBufferAsprintf(&buf, "<interface type='%s'", type);
> +    if (managed)
> +        virBufferAddLit(&buf, " managed='yes'>\n");
> +    else
> +        virBufferAddLit(&buf, ">\n");
>       virBufferAdjustIndent(&buf, 2);
>   
>       switch (typ) {
> @@ -995,6 +1021,63 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
>       case VIR_DOMAIN_NET_TYPE_DIRECT:
>           virBufferAsprintf(&buf, "<source dev='%s'/>\n", source);
>           break;
> +    case VIR_DOMAIN_NET_TYPE_HOSTDEV:
> +    {
> +        struct PCIAddress pciAddr = {0, 0, 0, 0};
> +
> +        if (str2PCIAddress(source, &pciAddr) < 0) {
> +            const char *caps[] = {"net"};
> +            char *tmpName = NULL;
> +            virshNodeDeviceListPtr list = NULL;
> +            virNodeDevicePtr netDev = NULL;
> +            size_t i;
> +
> +            list = virshNodeDeviceListCollect(ctl, (char **)caps, 1,
> +                                              VIR_CONNECT_LIST_NODE_DEVICES_CAP_NET);
> +
> +            if (!list) {
> +                vshError(ctl, _("cannot list network devices"));
> +                goto cleanup;
> +            }
> +
> +            if (virAsprintf(&tmpName, "net_%s", source) < 0)
> +                goto cleanup;
> +
> +            for (i = 0; i < list->ndevices; i++) {
> +                if (STREQLEN(tmpName, virNodeDeviceGetName(list->devices[i]),
> +                             strlen(tmpName)))
> +                    netDev = list->devices[i];
> +            }
> +            VIR_FREE(tmpName);
> +
> +            if (!netDev) {
> +                vshError(ctl, _("network interface '%s' doesn't exist"),
> +                         source);
> +                goto cleanup;
> +            }
> +
> +            if (str2PCIAddress(virNodeDeviceGetParent(netDev)+4, &pciAddr) < 0) {
> +                virshNodeDeviceListFree(list);
> +                vshError(ctl, _("cannot parse pci address for network "
> +                                "interface '%s'"), source);
> +                goto cleanup;
> +            }
> +
> +            virshNodeDeviceListFree(list);
> +        }
> +        if (driver)
> +            virBufferAsprintf(&buf, "<driver name='%s'/>\n", driver);
> +
> +        virBufferAddLit(&buf, "<source>\n");
> +        virBufferAdjustIndent(&buf, 2);
> +        virBufferAsprintf(&buf, "<address type='pci' domain='0x%.4x'"
> +                          " bus='0x%.2x' slot='0x%.2x' function='0x%.1x'/>\n",
> +                          pciAddr.domain, pciAddr.bus,
> +                          pciAddr.slot, pciAddr.function);
> +        virBufferAdjustIndent(&buf, -2);
> +        virBufferAddLit(&buf, "</source>\n");
> +        break;
> +    }
>   
>       case VIR_DOMAIN_NET_TYPE_USER:
>       case VIR_DOMAIN_NET_TYPE_ETHERNET:
> @@ -1004,7 +1087,6 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
>       case VIR_DOMAIN_NET_TYPE_MCAST:
>       case VIR_DOMAIN_NET_TYPE_UDP:
>       case VIR_DOMAIN_NET_TYPE_INTERNAL:
> -    case VIR_DOMAIN_NET_TYPE_HOSTDEV:
>       case VIR_DOMAIN_NET_TYPE_LAST:
>           vshError(ctl, _("No support for %s in command 'attach-interface'"),
>                    type);




More information about the libvir-list mailing list