[libvirt] <interface type='direct'>

Moshe Levi moshele at mellanox.com
Sun Sep 4 05:57:17 UTC 2016



From: sendmail [mailto:justsendmailnothingelse at gmail.com] On Behalf Of Laine Stump
Sent: Saturday, September 03, 2016 10:50 PM
To: Libvirt <libvir-list at redhat.com>
Cc: Moshe Levi <moshele at mellanox.com>; Edan David <edand at mellanox.com>
Subject: Re: [libvirt] <interface type='direct'>

On 09/03/2016 11:08 AM, Moshe Levi wrote:





-----Original Message-----
From: sendmail [mailto:justsendmailnothingelse at gmail.com] On Behalf Of Laine Stump
Sent: Thursday, September 01, 2016 5:59 PM
To: Libvirt <libvir-list at redhat.com>
Cc: Moshe Levi <moshele at mellanox.com>; Edan David <edand at mellanox.com>
Subject: Re: [libvirt] <interface type='direct'>



On 09/01/2016 04:05 AM, Moshe Levi wrote:

Hi,



In OpenStack we have a port type called macvtap.

A macvtap port is just a tap device connected to a VF.

In Libvirt the guest XML looks like this:

<interface type='direct'>
   <mac address='fa:16:3e:b1:06:4e'/>
   <source dev='p1p6' mode='passthrough'/>
   <target dev='macvtap1'/>
   <model type='virtio'/>
   <driver name='vhost'/>
   <alias name='net0'/>
   <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>





In the hypervisor we can see that the MAC of the VF, fa:16:3e:f3:9b:e8, is set by OpenStack (see [1]):

9: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
     link/ether 7c:fe:90:29:24:4e brd ff:ff:ff:ff:ff:ff
     vf 0 MAC 00:00:00:00:00:00, spoof checking off, link-state disable
     vf 1 MAC 00:00:00:00:00:00, spoof checking off, link-state disable
     vf 2 MAC fa:16:3e:f3:9b:e8, vlan 48, spoof checking on, link-state enable
     vf 3 MAC fa:16:3e:f6:02:c8, vlan 48, spoof checking on, link-state enable
41: ens3f4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
     link/ether fa:16:3e:f6:02:c8 brd ff:ff:ff:ff:ff:ff
42: macvtap0@ens3f4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 500
     link/ether fa:16:3e:f6:02:c8, brd ff:ff:ff:ff:ff:ff



The netdevice of the VF, ens3f4, also has the same MAC. This MAC is set
when using Libvirt 1.2.2 (Ubuntu 14.04), but when we tested with newer
Libvirt versions >= 1.2.17 (Fedora 23/Ubuntu 16.04) the MAC of the VF
netdevice (ens3f4) is not set.

This change in Libvirt prevents the guest from getting a DHCP address in OpenStack.

Do you know why the behavior changed in newer releases?
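
The mismatch described above can be checked mechanically. The following is a minimal sketch, assuming the older "vf N MAC ..." output format shown in this post (other iproute2 releases may print the VF lines differently). The names parse_ip_link and vf_mac_applied are made up for illustration and are not part of Libvirt or OpenStack.

```python
import re

# Patterns for the three kinds of `ip link show` lines we care about.
DEV_RE = re.compile(r"^\d+: ([^:@\s]+)(?:@[^:\s]+)?:")
ETHER_RE = re.compile(r"^\s*link/ether ([0-9a-fA-F:]{17})")
VF_RE = re.compile(r"^\s*vf (\d+) MAC ([0-9a-fA-F:]{17})")


def parse_ip_link(text):
    """Return ({(pf, vf_index): mac}, {netdev: mac}) from `ip link show` text."""
    vf_macs = {}
    dev_macs = {}
    current = None  # name of the device whose block we are inside
    for line in text.splitlines():
        m = DEV_RE.match(line)
        if m:
            current = m.group(1)
            continue
        if current is None:
            continue
        m = ETHER_RE.match(line)
        if m:
            dev_macs[current] = m.group(1).lower()
            continue
        m = VF_RE.match(line)
        if m:
            vf_macs[(current, int(m.group(1)))] = m.group(2).lower()
    return vf_macs, dev_macs


def vf_mac_applied(text, pf, vf_index, vf_netdev):
    """True if the MAC listed for the VF under the PF is also on the VF netdev."""
    vf_macs, dev_macs = parse_ip_link(text)
    admin = vf_macs.get((pf, vf_index))
    actual = dev_macs.get(vf_netdev)
    return admin is not None and admin == actual


# Trimmed copy of the output quoted in this thread (the working 1.2.2 case).
SAMPLE = """\
9: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP
     link/ether 7c:fe:90:29:24:4e brd ff:ff:ff:ff:ff:ff
     vf 2 MAC fa:16:3e:f3:9b:e8, vlan 48, spoof checking on, link-state enable
     vf 3 MAC fa:16:3e:f6:02:c8, vlan 48, spoof checking on, link-state enable
41: ens3f4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
     link/ether fa:16:3e:f6:02:c8 brd ff:ff:ff:ff:ff:ff
"""

if __name__ == "__main__":
    print(vf_mac_applied(SAMPLE, "ens3f0", 3, "ens3f4"))
```

On a host showing the failure, the same check for the affected VF would be expected to return False, because the VF netdev keeps its previous address.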



The MAC address is now set with a netlink command to set the VFINFO of the
particular VF# of the PF. This change was made in response to a bug report
stating that once the MAC address had been set for a hostdev assignment of a
VF (in which case this method is required), it was no longer possible to set
the MAC address for macvtap passthrough (the VF driver would complain "MAC
has been administratively set", on Intel igbvf at least). Unfortunately I
recently found that when you set the MAC address in this manner, it doesn't
take effect on the actual device - it's only saved in memory to be applied
the *next time* the host driver is rebound to the VF.

Are you saying that the change was to update the MAC of the VF?

Then I don’t understand how this relates to the issue that the VF netdevice MAC doesn't get set.

Look at the explanation in commit cb3fe38c and also
https://bugzilla.redhat.com/show_bug.cgi?id=1113474


That commit switched from using a simple ioctl(SIOCSIFHWADDR) to the VF's netdev name, to using a netlink RTM_SETLINK message to the netdev of the *PF* for the given VF.

This was done because the latter is the *only* way you can set the MAC address for a VF that you're going to assign to the guest with vfio device assignment, and once you've set the MAC address that way, future attempts to set the MAC address with ioctl(SIOCSIFHWADDR) result in failure and a kernel log like this:



kernel: igb 0000:0e:00.1: VF 1 attempted to override administratively set MAC address

kernel: Reload the VF driver to resume operations

Looking into the kernel, it appears that once the MAC address for a VF has been set via RTM_SETLINK, the igb driver (and I believe also the ixgbe driver, not sure about others) doesn't allow it to be changed via ioctl until the PF driver is reloaded (which can't realistically be done on an active system).
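
For readers more used to iproute2 than to the kernel interfaces, the two approaches roughly correspond to the two "ip link" forms sketched below. This is only an illustration under stated assumptions: the helper names are hypothetical, the functions merely build argument lists (actually running them needs root and real devices), and "ip link set dev ... address" may well use netlink itself rather than ioctl(SIOCSIFHWADDR), so it stands in for "set the MAC on the VF's own netdev" rather than reproducing the old libvirt code path.

```python
def set_mac_on_vf_netdev(vf_netdev, mac):
    """Old approach: address the VF's own netdev by name."""
    return ["ip", "link", "set", "dev", vf_netdev, "address", mac]


def set_mac_via_pf(pf_netdev, vf_index, mac, vlan=None):
    """Newer approach: address the VF through its PF (the VFINFO path)."""
    argv = ["ip", "link", "set", "dev", pf_netdev,
            "vf", str(vf_index), "mac", mac]
    if vlan is not None:
        # The vlan is set through the same PF-level command.
        argv += ["vlan", str(vlan)]
    return argv


if __name__ == "__main__":
    # Device names and values are taken from the output quoted in this thread.
    print(" ".join(set_mac_on_vf_netdev("ens3f4", "fa:16:3e:f6:02:c8")))
    print(" ".join(set_mac_via_pf("ens3f0", 3, "fa:16:3e:f6:02:c8", vlan=48)))
```

The first form changes what the VF netdev reports under link/ether; the second changes the entry in the "vf N MAC ..." list under the PF, which is the value that reportedly does not reach the VF netdev until its driver is re-bound.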


But recently there was another report of the MAC address not getting set properly for macvtap passthrough mode when the device is an SRIOV VF (I can't find it in bugzilla, so it must have been an email to one of the lists) and when I tried it myself I found they were correct - in the output of "ip link show" the MAC address showed in the list of VFs under the PF is correctly modified, but it's not set properly in the VF's netdev instance - apparently the MAC addresses in the VF list aren't set in the VF's netdev immediately, they're just saved to be set *the next time the VF is re-bound to the VF netdev driver*. I think in the past the interface may have been in promiscuous mode so it didn't matter, but now it isn't? I'm not sure as I haven't had much time to investigate.

Does that make any more sense now?

Yes Thanks ☺


Since I don't see a reasonably efficient way to get this to work, I need to
make a patch to revert to the old behavior, and we'll then just have to tell
people "If you do hostdev device assignment of VFs, then you can't later
re-use the same device for macvtap passthrough mode".

(actually, I *think* an alternative would be to unbind/rebind the host driver
to the VF after setting the VF MAC address, but that seems a bit
disruptive/extreme to work around a problem that is probably only seen in
QE labs, but not in the real world; realistically, production systems likely
use either hostdev or macvtap, and don't switch back and forth between them.)



A question - I notice you have the vlan set for the VF. Does *that* properly
take effect? (it's set in the same manner as the MAC address, via a netlink
command to set the VFINFO)

I am not sure what you mean, but we set the vlan  in OpenStack after we create the guest xml.

What command do you use to set it? Do you use "ip link set $PF vf $VF# vlan $VLANID" ? I think that's what it's showing here:
Yes, we use ip link set $PF vlan $VLANID to set the vlan.
So in the guest XML we don’t put the vlan id for <interface type='direct'>, only for <interface type='hostdev'>. Are you saying that it should be supported in both? Should I open a bug for this as well?

https://review.openstack.org/#/c/364121/1/nova/network/linux_net.py

(I don't know my way around openstack code, but arrived at that page via clicking on links from a google search)





In OpenStack we set the MAC of the VF and the vlan using iproute2.

I just want to know whether setting the mac/vlan should be part of Libvirt's
job, or whether Libvirt just creates the macvtap interface and we should set
the mac/vlan ourselves.

libvirt *should* do it.







We have a WIP patch in OpenStack that also sets the MAC for the netdevice
of the VF [2]. I just wanted to know whether this is the correct approach.

Can you confirm that setting the VF netdevice mac in OpenStack is a reasonable workaround for the newer Libvirt versions?

If libvirt isn't getting the job done, and you can set it yourself, then that's a workaround. I don't know that I'd call it "reasonable" though. If everybody puts in special code to work around bugs in libvirt (which is apparently what's been done) rather than actually reporting the bug (what you're doing now - Thanks!) then we are tricked into thinking that either the code works, or that nobody is using it so it doesn't matter if it's broken.
Sure, I would like to get the fix into Libvirt, so I opened this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1372944
But then again, I can’t assume that everyone will use the latest Libvirt, so I will put a workaround in OpenStack with a TODO for its removal.

The *best* way of overcoming this problem is to fix libvirt so it does what it's supposed to do.

It's possible we can make it work by adding some operation after we send the RTM_SETLINK (maybe unbind the VF from its netdev driver, then re-bind, but that seems so drastic and time consuming!), or maybe we'll have to revert to using ioctl(SIOCSIFHWADDR), but of course that will fail if the interface has been used for hostdev assignment since the last host reboot.
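
To make the unbind/re-bind idea concrete, here is a rough sketch of what it could look like through the standard sysfs PCI driver interface (writing the device address to the driver's "unbind" and "bind" files). It is not what libvirt does; the function name, the sysfs_root parameter (there only so the logic can be tried against a fake directory tree), and the PCI address in the usage line are all assumptions for illustration. It would need root, and it briefly takes the VF netdev away, which is exactly the disruption mentioned above.

```python
import os


def rebind_pci_driver(pci_addr, sysfs_root="/sys"):
    """Unbind a PCI device from its current driver, then bind it again.

    Returns the name of the driver that was re-bound.
    """
    dev_dir = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr)
    # Resolve the driver directory first: on a real system the "driver"
    # symlink disappears as soon as the device is unbound.
    driver_dir = os.path.realpath(os.path.join(dev_dir, "driver"))
    if not os.path.isdir(driver_dir):
        raise RuntimeError(f"{pci_addr} is not bound to any driver")
    for action in ("unbind", "bind"):
        with open(os.path.join(driver_dir, action), "w") as f:
            f.write(pci_addr)
    return os.path.basename(driver_dir)


if __name__ == "__main__":
    # Hypothetical VF address; substitute the real one from lspci.
    print(rebind_pci_driver("0000:0e:10.1"))
```

Whether the VF driver then picks up the MAC stored via the PF is the behavior suggested earlier in this thread, not something this sketch verifies.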

It's interesting that openstack is apparently using the RTM_SETLINK method to set the mac address (afaik, that's what is used by the "ip link set $pf_ifname vf $vf_num mac $mac_addr vlan $vlanid" command that's shown in the bit of code from nova/network/linux_net.py at the link I posted above).