[libvirt-users] e1000 network interface takes a long time to set the link ready

Ihar Hrachyshka ihrachys at redhat.com
Thu May 10 21:44:14 UTC 2018


On Thu, May 10, 2018 at 2:07 PM, Laine Stump <laine at redhat.com> wrote:
> On 05/10/2018 02:53 PM, Ihar Hrachyshka wrote:
>> Hi,
>>
>> In kubevirt, we discovered [1] that whenever e1000 is used for the vNIC,
>> the link on the interface becomes ready several seconds after 'ifup' is
>> executed
>
> What is your definition of "becomes ready"? Are you looking at the
> output of "ip link show" in the guest? Or are you watching "brctl
> showstp" for the bridge device on the host? Or something else?

I was watching the guest dmesg for the following messages:

[    4.773275] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[    6.769235] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[    6.771408] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

For e1000, there are about 2 seconds between those messages; for
virtio, it's near instant. Interestingly, this only happens on the very
first ifup; when I run it a second time after the guest has booted, the
link comes up instantly.
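For reference, the gap can be pulled straight from the guest dmesg
timestamps with something like this (a rough sketch that assumes the
exact message strings above):

  dmesg | awk -F'[][]' '
    /eth0: link is not ready/  { t0 = $2 }
    /eth0: link becomes ready/ { printf "link-ready delay: %.3f s\n", $2 - t0 }'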

>
>> which for some buggy images like cirros may slow down the boot
>> process by up to 1 minute [2]. If we switch from e1000 to virtio, the
>> link is brought up and ready almost immediately.
>>
>> For the record, I am using the following versions:
>> - L0 kernel: 4.16.5-200.fc27.x86_64 #1 SMP
>> - libvirt: 3.7.0-4.fc27
>> - guest kernel: 4.4.0-28-generic #47-Ubuntu
>>
>> Is there something specific about e1000 that makes it initialize the
>> link so slowly on the libvirt or guest side?
>
> There isn't anything libvirt could do that would cause the link to come
> up (IFF_UP) any faster or slower, so if there is an issue it's elsewhere.
> Since switching to the virtio device eliminates the problem, my guess
> would be that it's something about the implementation of the emulated
> device in qemu that causes a delay in the e1000 driver in the guest.
> That's just a guess, though.
>
>>
>> [1] https://github.com/kubevirt/kubevirt/issues/936
>> [2] https://bugs.launchpad.net/cirros/+bug/1768955
>
> (I discount the idea of the STP delay timer having an effect, as
> suggested in one of the comments on GitHub that points to my explanation
> of STP in a libvirt bugzilla record, because that would cause the same
> problem for both e1000 and virtio.)

Yes, it's not STP, and I also tried explicitly setting all bridge
timers to 0, with no result. I also ran "tcpdump -i any" inside the
container that hosts the VM VIF, and there was no relevant traffic on
the tap device.
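For completeness, this is the kind of host-side zeroing and sniffing I
mean (br0 and vnet0 are placeholders for the actual bridge and tap
device names):

  # zero the bridge forward delay (the timer most often blamed)
  brctl setfd br0 0
  # iproute2 equivalent:
  ip link set dev br0 type bridge forward_delay 0

  # watch the VM's tap device while running ifup in the guest
  tcpdump -ni vnet0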

>
> I hesitate to suggest this, because the rtl8139 code in qemu is
> considered less well maintained and lower-performing than e1000, but
> have you tried setting that model to see how it behaves? You may be
> forced to make that the default when virtio isn't available.

Indeed, rtl8139 is near instant too:

[    4.156872] 8139cp 0000:07:01.0 eth0: link up, 100Mbps, full-duplex, lpa 0x05E1
[    4.177520] 8139cp 0000:07:01.0 eth0: link up, 100Mbps, full-duplex, lpa 0x05E1

Thanks for the tip; we will consider it too (also thanks for the
background info on the driver's support state).
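For anyone else following along: the model is selected per-interface in
the libvirt domain XML, along these lines (the source bridge name is a
placeholder):

  <interface type='bridge'>
    <source bridge='br0'/>
    <model type='rtl8139'/>  <!-- or 'e1000' / 'virtio' -->
  </interface>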

>
> Another thought - I guess the virtio driver in Cirros is always
> available? Perhaps kubevirt could use libosinfo to auto-decide what
> device to use for networking based on the guest OS.
>

That, or we can introduce explicit tags that specify which NIC model to
use for a given guest type.
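If we go the libosinfo route, the osinfo-query CLI is a quick way to
sanity-check what the database knows about a guest, e.g. (assuming
cirros is present in osinfo-db at all, and that 'cirros0.4.0' is its
short-id):

  osinfo-query os short-id=cirros0.4.0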

Thanks a lot for the reply,
Ihar



