[libvirt] [PATCH alternative 1] util: fix libvirtd startup failure due to netlink error

Eric Blake eblake at redhat.com
Tue May 1 21:41:28 UTC 2012


On 05/01/2012 01:10 PM, Laine Stump wrote:
> This patch is one alternative to solve the problem detailed in:
> 
>   https://bugzilla.redhat.com/show_bug.cgi?id=816465
> 
> Some other unidentified library in use by libvirtd (in another thread)
> is apparently temporarily binding to a NETLINK_ROUTE raw socket with
> an address of "pid of libvirtd" during startup. This is the same
> address used by libnl for the first netlink socket it binds, and the
> netlink socket allocated for virNetlinkEventServiceStart() happens to
> be that first socket; the result is that nl_connect() fails about
> 15-20% of the time (but apparently only if there is a guest running at
> the time libvirtd starts).
> 
> Testing has shown that in the case that nl_connect fails the first
> time, retrying it after a 500msec sleep leads to success 100% of the
> time, so this patch doubles that delay (which also has a 100% success
> rate).
> 

> +++ b/src/util/virnetlink.c
> @@ -355,9 +355,18 @@ virNetlinkEventServiceStart(void)
>      }
>  
>      if (nl_connect(srv->netlinknh, NETLINK_ROUTE) < 0) {
> -        virReportSystemError(errno,
> -                             "%s", _("cannot connect to netlink socket"));
> -        goto error_server;
> +        /* the address that libnl wants to use for this connect ("pid
> +         * of libvirtd") is sometimes temporarily in use by some other
> +         * unidentified code. Retrying after a 500msec sleep has
> +         * achieved 100% success rates, so we sleep for 1000msec and
> +         * retry.
> +         */
> +        usleep(1000000);

Sleeping for a full second is user-visible. If we go with this
approach, I'd rather see a retry loop that probes about once every
200ms for up to 5 tries (or something similar), for better response
time.
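
To make the suggestion concrete, here is a rough sketch of such a
bounded retry loop. It is not taken from the patch: the names
retry_connect(), sleep_ms(), connect_fn and flaky_connect() are
hypothetical, and the callback stands in for the real
nl_connect(srv->netlinknh, NETLINK_ROUTE) call so that the example
compiles without libnl. The 200ms/5-try figures are only the ballpark
numbers mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical callback type: returns 0 on success and a negative
 * value on failure, the same convention the patch checks for
 * nl_connect(). */
typedef int (*connect_fn)(void *opaque);

/* Sleep for ms milliseconds, resuming if interrupted by a signal. */
static void sleep_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    while (nanosleep(&ts, &ts) < 0 && errno == EINTR)
        ;
}

/* Call fn up to max_tries times, sleeping delay_ms after each failed
 * attempt except the last. Returns the result of the final attempt. */
static int retry_connect(connect_fn fn, void *opaque,
                         int max_tries, long delay_ms)
{
    int rc = -1;
    int i;

    assert(max_tries > 0);
    for (i = 0; i < max_tries; i++) {
        rc = fn(opaque);
        if (rc >= 0)
            return rc;              /* connected; no further delay */
        if (i + 1 < max_tries)
            sleep_ms(delay_ms);     /* address may still be in use */
    }
    return rc;                      /* every attempt failed */
}

/* Stand-in for the real connect call, for illustration only: fails
 * while *opaque (an int counter) is positive, then succeeds. */
static int flaky_connect(void *opaque)
{
    int *failures_left = opaque;

    if (*failures_left > 0) {
        (*failures_left)--;
        return -1;
    }
    return 0;
}
```

With something like retry_connect(fn, arg, 5, 200), the common case
(success on the first call) adds no delay at all, and the worst case
sleeps for 800ms in total before giving up, at which point the caller
would still report the error and goto error_server as before.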

-- 
Eric Blake   eblake at redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
