[libvirt] [PATCH] macvtap: Work-around failing nl_connect calls (weird problem)
Stefan Berger
stefanb at linux.vnet.ibm.com
Mon Feb 14 22:22:46 UTC 2011
On 02/14/2011 03:30 PM, Stefan Berger wrote:
> On 02/14/2011 02:51 PM, Daniel P. Berrange wrote:
>>
>> This approach feels like a nasty hack to me and potentially still leaves
>> us with a problem in netcf which is also using netlink sockets. I think
>> we need to get a clearer picture of what the root cause is before going
>> for this kind of patch
> Correct, I am 'fixing' this in the wrong place. The issue is in the
> call sequence
>
> nl_handle = nl_handle_alloc()
> nl_connect(nl_handle, NETLINK_ROUTE)
>
> with the second one failing even though it merely takes its input from
> the first one. These are obviously two libnl calls. Something is either
> not using libnl or not using it correctly.
> Thanks for pointing out netcf. I looked at libnetcf code and found
> this sequence here:
>
> [...]
> int netlink_init(struct netcf *ncf) {
>
>     ncf->driver->nl_sock = nl_handle_alloc();
>     if (ncf->driver->nl_sock == NULL)
>         goto error;
>     if (nl_connect(ncf->driver->nl_sock, NETLINK_ROUTE) < 0) {
>         goto error;
>     }
>
> This seems to be doing the same as I do. Maybe there is yet 'something
> else' that's using netlink sockets.
> What's also strange is that the first 'virsh start' still works, but
> the subsequent 'virsh destroy' then does not.
One definite problem in libnl is that its port allocation
(generate_local_port()) is not thread-safe. I think it is the library's
responsibility to lock there, rather than libvirt's to introduce a lock
that we would need to grab before calling into netcf and again in macvtap.
Unless libnl fixes this, I believe there will be no other way than
retrying: one thread will eventually bind and exclude a concurrent thread
from binding.
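
A minimal sketch of the retry idea follows. It is an illustration only, not the actual patch. It assumes each attempt is wrapped in a callback that returns 0 on success and a negative errno-style value on failure. The names retry_connect, connect_attempt_fn and fail_n_times are hypothetical, and fail_n_times merely stands in for an nl_handle_alloc()/nl_connect() pair that loses the race a few times. Using -EADDRINUSE as the failure value is also an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: performs one connection attempt.
 * Returns 0 on success, a negative errno-style value on failure. */
typedef int (*connect_attempt_fn)(void *ctx);

/* Call 'attempt' until it succeeds or 'max_attempts' is exhausted.
 * Returns 0 on success, otherwise the error of the last attempt. */
static int retry_connect(connect_attempt_fn attempt, void *ctx,
                         int max_attempts)
{
    int rc = -EINVAL;

    assert(attempt != NULL && max_attempts > 0);

    for (int i = 0; i < max_attempts; i++) {
        rc = attempt(ctx);
        if (rc == 0)
            return 0;
    }
    return rc;
}

/* Stand-in for a racing connect (assumption, not libnl code):
 * fails '*remaining' more times, then succeeds. */
static int fail_n_times(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -EADDRINUSE;
    }
    return 0;
}
```

In real code the callback would allocate a fresh handle and try to connect it, freeing the handle again on failure, so that every attempt gets a newly generated local port. The attempt limit keeps a persistent failure from looping forever.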
Regards,
Stefan