[libvirt] NL_RECV RETURNED WITH ERROR: NO BUFFER SPACE AVAILABLE ERROR FOR IXGBE LOAD

Vaughan Cao vaughan.cao at oracle.com
Wed Jun 18 15:58:07 UTC 2014


On 2014年06月18日 20:24, Laine Stump wrote:
> On 06/17/2014 06:16 AM, vaughan wrote:
>> Hi experts,
>>
>> Release: OL7
>> Kernel: 3.10.0-121.el7.x86_64
>>
>> Noticed below error on OL7 server, while loading Intel 10gigabit nic
>> driver module , ixgbe  in syslog
>> --------------------------------------------------------------------
>> journal: nl_recv returned with error: No buffer space available
>> -------------------------------------------------------------------
>>
>> Complete syslog content for ixgbe module load :
>>
>> un 16 20:46:10 ca-ostest432 kernel: ixgbe: Intel(R) 10 Gigabit PCI
>> Express Network Driver - version 3.15.1-k
>> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe: Copyright (c) 1999-2013
>> Intel Corporation.
>> Jun 16 20:46:10 ca-ostest432 kvm: 1 guest now active
>> Jun 16 20:46:10 ca-ostest432 kvm: 0 guests now active
>> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: Multiqueue
>> Enabled: Rx Queue count = 16, Tx Queue count = 16
>> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: (PCI
>> Express:5.0GT/s:Width x8) 00:1b:21:c8:24:74
>> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: MAC: 2, PHY: 9,
>> SFP+: 3, PBA No: E70856-007
>> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: PCI Express
>> bandwidth of 32GT/s available
>> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: (Speed:5.0GT/s,
>> Width: x8, Encoding Loss:20%)
>> Jun 16 20:46:10 ca-ostest432 journal: nl_recv returned with error: No
>> buffer space available
> A very similar problem (probably the same, but you're showing the kernel
> error message rather than the error logged by libvirt) was reported and
Can we distinguish the two problem just from libvirtd.log? I turned it 
on, and it just output the following messages, like common errors. If 
you would like to see the whole log, I can attach it.

2014-06-18 14:16:40.901+0000: 31891: info : 
virNetlinkEventServiceStart:520 : starting netlink event service with 
protocol 15
2014-06-18 14:16:40.901+0000: 31891: debug : 
virEventPollInterruptLocked:710 : Skip interrupt, 0 0
2014-06-18 14:16:40.901+0000: 31891: debug : virEventPollAddHandle:136 : 
EVENT_POLL_ADD_HANDLE: watch=6 fd=14 events=1 cb=0x7fdff9397680 
opaque=0x7fdffb8ff960 ff=(nil)
2014-06-18 14:16:40.901+0000: 31891: debug : 
virNetlinkEventServiceStart:576 : netlink event listener on fd: 14 running
2014-06-18 14:16:40.901+0000: 31891: debug : virNetServerRun:1093 : 
srv=0x7fdffb8f0d70 quit=0
.........
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=0 w=1, f=6 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=1 w=2, f=8 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=2 w=3, f=11 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=3 w=4, f=12 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=4 w=5, f=13 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=5 w=6, f=14 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=6 w=7, f=15 e=0 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=7 w=8, f=15 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=8 w=9, f=17 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollCalculateTimeout:332 : Calculate expiry of 0 timers
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollCalculateTimeout:361 : Timeout at 0 due in -1 ms
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollRunOnce:627 : 
EVENT_POLL_RUN: nhandles=8 timeout=-1
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollRunOnce:638 : 
Poll got 1 event(s)
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchTimeouts:423 : Dispatch 0
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:468 : Dispatch 8
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:482 : i=0 w=1
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:482 : i=1 w=2
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:482 : i=2 w=3
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:482 : i=3 w=4
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:482 : i=4 w=5
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:482 : i=5 w=6
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:496 : EVENT_POLL_DISPATCH_HANDLE: watch=6 
events=1
2014-06-18 14:16:48.993+0000: 31891: error : virNetlinkEventCallback:344 
: nl_recv return ok length: 186: Success
2014-06-18 14:16:48.993+0000: 31891: debug : virNetlinkEventCallback:349 
: dispatching to max 0 clients, called from event watch 6
2014-06-18 14:16:48.993+0000: 31891: debug : virNetlinkEventCallback:362 
: event not handled.
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:482 : i=7 w=8
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:482 : i=8 w=9
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollCleanupTimeouts:514 : Cleanup 0
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollCleanupTimeouts:550 : Found 0 out of 0 timeout slots used, 
releasing 0
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollCleanupHandles:562 : Cleanup 9
2014-06-18 14:16:48.993+0000: 31891: debug : virEventRunDefaultImpl:270 
: running default event implementation
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollCleanupTimeouts:514 : Cleanup 0
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollCleanupTimeouts:550 : Found 0 out of 0 timeout slots used, 
releasing 0
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollCleanupHandles:562 : Cleanup 9
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=0 w=1, f=6 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=1 w=2, f=8 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=2 w=3, f=11 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=3 w=4, f=12 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=4 w=5, f=13 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=5 w=6, f=14 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=6 w=7, f=15 e=0 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=7 w=8, f=15 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 
: Prepare n=8 w=9, f=17 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollCalculateTimeout:332 : Calculate expiry of 0 timers
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollCalculateTimeout:361 : Timeout at 0 due in -1 ms
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollRunOnce:627 : 
EVENT_POLL_RUN: nhandles=8 timeout=-1
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollRunOnce:638 : 
Poll got 1 event(s)
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchTimeouts:423 : Dispatch 0
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:468 : Dispatch 8
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:482 : i=0 w=1
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:482 : i=1 w=2
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:482 : i=2 w=3
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:482 : i=3 w=4
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:482 : i=4 w=5
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:482 : i=5 w=6
2014-06-18 14:16:48.993+0000: 31891: debug : 
virEventPollDispatchHandles:496 : EVENT_POLL_DISPATCH_HANDLE: watch=6 
events=5
2014-06-18 14:16:48.993+0000: 31891: error : virNetlinkEventCallback:340 
: nl_recv returned with error, -5: No buffer space available


> addressed in libnl3 quite awhile back. libnl3 originally set the default
> buffer size to 4096, which wasn't enough for SRIOV cards with lots of
> VFs. So they increased it to 4 * 4096, which should be plenty for
> anybody. That libnl3 patch is present in RHEL7.0 (currently at 3.2.21.6).
>
> Can you verify the version of libnl3 you are running, and that it
> contains this code
>
>     if (page_size == 0)
>            page_size = getpagesize() * 4;
I'm using libnl3-3.2.21-6.0.1.el7.x86_64 which has the patch you mentioned,
* <E4><B8><89> 2<E6><9C><88> 26 2014 Thomas Graf <tgraf at redhat.com> - 
3.2.21-5
- nl-Increase-receive-buffer-size-to-4-pages.patch (rh #1040626)

> in the function lib/nl.c:nl_recv() (previously it was just "page_size =
> getpagesize();"). If you don't have that patch in your libnl3 package,
> please backport the upstream commit that makes that change. If you do
> have that patch in your libnl3, perhaps you have gotten a different
> ixgbe driver from somewhere (we did test against ixgbe with the maximum
> number of VFs, so there would have to be something different in your
> driver). It would be good to figure out the source of the problem before
> applying any fix anywhere - much better to understand the cause, and
> right now I don't think we do; what is creating the need for such a
> large buffer in your case, but not for others who use the same driver
> with the same number of VFs?).
Thank you to point this out. I admit that my patch is derived from the 
symptom, not the cause.
I'll dig in this direction.

filename: 
/lib/modules/3.10.0-121.el7.x86_64/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko
version: 3.15.1-k
license: GPL
description: Intel(R) 10 Gigabit PCI Express Network Driver
author: Intel Corporation, <linux.nics at intel.com>
srcversion: FC629607FA8BAB4CDE3BCF0
.....
depends: mdio,ptp,dca
intree: Y
vermagic: 3.10.0-121.el7.x86_64 SMP mod_unload modversions
signer: Red Hat Enterprise Linux kernel signing key
sig_key: E3:7B:3F:4E:75:78:02:B9:12:00:CF:F3:C7:81:1A:51:2E:94:83:EF
sig_hashalgo: sha256
parm: max_vfs:Maximum number of virtual functions to allocate per 
physical function - default is zero and maximum value is 63 (uint)
parm: allow_unsupported_sfp:Allow unsupported and untested SFP+ modules 
on 82599-based adapters (uint)
parm: debug:Debug level (0=none,...,16=all) (int)


>
>
> Up to now our position has been that this problem should be fixed in
> libnl, so we have preferred to not patch libvirt for it, but instead get
> libnl fixed. If we do decide to patch libvirt, I think it would be
> better to turn on message peeking for nl_recv
> (nl_socket_enable_msg_peek()), as that would solve the problem totally
> and permanently (the upstream maintainer of libnl is reluctant to turn
> that on by default due to potential performance problems in other users
> of libnl)
>
Affirmative.




More information about the libvir-list mailing list