[libvirt] [PATCH] macvtap: Work-around failing nl_connect calls (weird problem)

Stefan Berger stefanb at linux.vnet.ibm.com
Mon Feb 14 22:22:46 UTC 2011


On 02/14/2011 03:30 PM, Stefan Berger wrote:
> On 02/14/2011 02:51 PM, Daniel P. Berrange wrote:
>>
>> This approach feels like a nasty hack to me and potentially still leaves
>> us with a problem in netcf which is also using netlink sockets. I think
>> we need to get a clearer picture of what the root cause is before going
>> for this kind of patch
> Correct, I am 'fixing' this in the wrong place. The issue is in the 
> call sequence
>
> nl_handle = nl_handle_alloc()
> nl_connect(nl_handle, NETLINK_ROUTE)
>
> with the second call failing even though its only input comes from the 
> first. These are obviously two libnl calls. Something is either not 
> using libnl or not using it correctly.
> Thanks for pointing out netcf. I looked at libnetcf code and found 
> this sequence here:
>
> [...]
> int netlink_init(struct netcf *ncf) {
>
>     ncf->driver->nl_sock = nl_handle_alloc();
>     if (ncf->driver->nl_sock == NULL)
>         goto error;
>     if (nl_connect(ncf->driver->nl_sock, NETLINK_ROUTE) < 0) {
>         goto error;
>     }
>
> This seems to be doing the same as I do. Maybe there is yet 'something 
> else' that's using netlink sockets.
> What's also strange is that the first 'virsh start' still works, but 
> the subsequent 'virsh destroy' then does not.

One definite problem in libnl is that its 'port allocation' 
(generate_local_port()) is not thread-safe. I think it is the library's 
responsibility to do the locking here, rather than libvirt introducing a 
lock that we would have to grab before calling into netcf and again in 
macvtap. Unless libnl fixes this, I believe there is no other way than 
retrying: one thread will eventually bind, which excludes a concurrent 
thread from binding, and the thread that lost the race has to try again.
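
To make the retry idea concrete, here is a minimal sketch in C. It is not
the patch itself and does not call libnl: retry_connect(), connect_fn,
fake_state and fake_connect() are hypothetical names made up for
illustration. In libvirt the callback would presumably wrap the
nl_handle_alloc() + nl_connect() sequence quoted above; here a stand-in
that fails a fixed number of times plays that role.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns 0 on success, negative on failure.
 * Assumption: a real caller would wrap nl_handle_alloc() + nl_connect(). */
typedef int (*connect_fn)(void *opaque);

/* Call fn up to max_attempts times, sleeping delay_ns nanoseconds
 * (must be below one second) between failed attempts.
 * Returns the number of attempts used on success, -1 if all failed. */
static int
retry_connect(connect_fn fn, void *opaque, int max_attempts, long delay_ns)
{
    assert(fn != NULL);

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (fn(opaque) == 0)
            return attempt;

        /* Back off briefly so a concurrent thread can finish binding. */
        if (delay_ns > 0 && attempt < max_attempts) {
            struct timespec ts = { .tv_sec = 0, .tv_nsec = delay_ns };
            nanosleep(&ts, NULL);
        }
    }
    return -1;
}

/* Stand-in for a connect call that loses the port race a few times. */
struct fake_state {
    int failures_left;
};

static int
fake_connect(void *opaque)
{
    struct fake_state *st = opaque;

    if (st->failures_left > 0) {
        st->failures_left--;
        return -1;
    }
    return 0;
}
```

Bounding the number of attempts keeps a persistent failure (one that is
not caused by the port race) from looping forever; the caller still has to
report an error when retry_connect() returns -1.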

Regards,
    Stefan



