[libvirt] [PATCH] nodedev_udev: Fix missing events when kernel report lots of udev events within a short time

Erik Skultety eskultet at redhat.com
Wed May 10 15:09:07 UTC 2017


On Wed, May 10, 2017 at 12:44:22PM +0100, Daniel P. Berrange wrote:
> On Wed, May 10, 2017 at 01:38:00PM +0200, Erik Skultety wrote:
> > On Tue, May 09, 2017 at 10:09:07AM +0800, ZhiPeng Lu wrote:
> > > From: "ning.bo" <ning.bo9 at zte.com.cn>
> > >
> > > When create Virtual Function for Inter XL710 use below commands:
> > > for i in `seq 0 1`; do
> > >         echo 63 > /sys/devices/pci0000:00/0000:00:03.2/0000:07:00.$i/sriov_numvfs
> > > done
> > > for i in `seq 0 3`; do
> > >         echo 31 > /sys/devices/pci0000:80/0000:80:02.2/0000:82:00.$i/sriov_numvfs
> > > done
> > >
> >
> > Hi, I think this is worth a BZ, so we can track and test the issue. Now, I'm
> > working on a similar issue which is very hard to reproduce and this looks like
> > it could be reproduced easily with 100% chance, but I don't have any HW to try
> > it on (it might take a while to get my hands on some). So, would you mind
> > opening a BZ for this, attaching not only libvirtd's debug logs, but preferably
> > udev debug logs as well (libudev API is kinda poorly designed in this aspect
> > and the only way to check for the real error is to enable the debugging and
> > look into the logs).
> >
> > > The libvirtd will missing some udev events, the result of libvirt-python API
> > > listDevices('pci') will not list all pci devices.
> > > The reason is that the buffer of udev monitor default size cann't save all udev
> > > events reported by kernel.
> > > So we need change buffer size so that we can receive as much events as possible
> > > whitin a short time.
> > >
> > > Signed-off-by: ning.bo <ning.bo9 at zte.com.cn>
> > > ---
> > >  src/node_device/node_device_udev.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c
> > > index 6e706a1..d813206 100644
> > > --- a/src/node_device/node_device_udev.c
> > > +++ b/src/node_device/node_device_udev.c
> > > @@ -1564,6 +1564,7 @@ static int nodeStateInitialize(bool privileged,
> > >      }
> > >
> > >      udev_monitor_enable_receiving(priv->udev_monitor);
> > > +    udev_monitor_set_receive_buffer_size(priv->udev_monitor, 128 * 1024 * 1024);
> > >
> >
> > The reason why I want to investigate the logs (ideally try it myself) is
> > rather than blindly increasing the buffer size, maybe we receive just too many
> > events, out of which some we might filter out and the problem would disappear.
> > Anyhow, isn't 129 MiB bit of an overkill for a buffer size? I mean, you
> > didn't provide any commentary on why you chose 128MiB specifically, I can only
> > guess it somehow relates to the fact, that XL710 is capable of 128 VFs??
>
> This matches what udevd sets for the buffer.

Ah, right.

>
> This method ends up setting the SO_RCVRBUFFORCE socket options. So we're
> not allocating 128 MB in userspace - IIUC it just lets the kernel expand
> it upto 128 MB if needed. This is a privileged operation however, so

Yeah, I just checked how kernel handles sockets and buffers associated with
them and you're absolutely right, when a datagram is to be delivered to a
socket, sk_rmem_alloc size is incremented by the size of the buffer holding the
datagram and in case the incremented size is to be bigger than sk_rcvbuf
(that's the limit set by SO_RCVBUF) the datagram is dropped and ENOMEM is
returned.

> we better make sure we don't call this when running non-root.
Yes, good point, but what about unprivileged daemon, you should be safe to use
SO_RCVBUF safely in that case, as long as it's less than the global rmem_max
limit. Now, since I'm dealing with a similar BZ, IMHO trying to come up with a
suitable buffer size is non-trivial and setting it according to the udevd's
manner is quite neat I have to admin, so the unprivileged case was just out of
curiousity whether we need to take account for that as well.

Thanks,
Erik

>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




More information about the libvir-list mailing list