[libvirt] [PATCH] macvtap: Work-around failing nl_connect calls (weird problem)
Stefan Berger
stefanb at linux.vnet.ibm.com
Tue Feb 15 04:43:29 UTC 2011
On 02/14/2011 05:22 PM, Stefan Berger wrote:
> On 02/14/2011 03:30 PM, Stefan Berger wrote:
>> On 02/14/2011 02:51 PM, Daniel P. Berrange wrote:
>>>
>>> This approach feels like a nasty hack to me and potentially still
>>> leaves
>>> us with a problem in netcf which is also using netlink sockets. I think
>>> we need to get a clearer picture of what the root cause is before going
>>> for this kind of patch
>> Correct, I am 'fixing' this in the wrong place. The issues is in the
>> call sequence
>>
>> nl_handle = nl_handle_alloc()
>> nl_connect(nl_handle, NETLINK_ROUTE)
>>
>> with the second one failing taking merely input from the 1st one.
>> These are obviously two libnl calls. Something is either not using
>> libn or not using it correctly.
>> Thanks for pointing out netcf. I looked at libnetcf code and found
>> this sequence here:
>>
>> [...]
>> int netlink_init(struct netcf *ncf) {
>>
>> ncf->driver->nl_sock = nl_handle_alloc();
>> if (ncf->driver->nl_sock == NULL)
>> goto error;
>> if (nl_connect(ncf->driver->nl_sock, NETLINK_ROUTE) < 0) {
>> goto error;
>> }
>>
>> This seems to be doing the same as I do. Maybe there is yet
>> 'something else' that's using netlink sockets.
>> What's also strange is that the first 'virsh start' still works, but
>> the subsequent 'virsh destroy' then does not.
>
> One definte problem in libnl is that the 'port allocation'
> (generate_local_port()) is not thread-safe, even though I think it's
> the library's responsibility to lock, not libvirt introducing a lock
> that we need to grab before calling into netcf and grabbing in
> macvtap. Unless libnl fixes this, I believe there will be no other way
> than retrying. One will eventually bind and exclude a concurrent
> thread from binding.
>
It's late but this doesn't look right even now in libnl
(libnl-debuginfo-1.1-12.fc14.x86_64):
port allocation (socket.c ; line 134):
used_ports_map[i] |= (1UL << n);
- that's going to set a bit
port deallocation (socket.c; line 156) :
used_ports_map[nr / 32] &= ~((nr % 32) + 1);
- that's going to produce garbage; no wonder things don't work
used_ports_map[nr / 32] &= ~(1 << (nr % 32));
or
used_ports_map[nr / 32] &= ~(1 << (nr & 0x1f));
- would probably be much better
Stefan
More information about the libvir-list
mailing list