[libvirt] [PATCH] macvtap: Work-around failing nl_connect calls (weird problem)

Stefan Berger stefanb at linux.vnet.ibm.com
Tue Feb 15 04:43:29 UTC 2011


On 02/14/2011 05:22 PM, Stefan Berger wrote:
> On 02/14/2011 03:30 PM, Stefan Berger wrote:
>> On 02/14/2011 02:51 PM, Daniel P. Berrange wrote:
>>>
>>> This approach feels like a nasty hack to me and potentially still 
>>> leaves
>>> us with a problem in netcf which is also using netlink sockets. I think
>>> we need to get a clearer picture of what the root cause is before going
>>> for this kind of patch
>> Correct, I am 'fixing' this in the wrong place. The issues is in the 
>> call sequence
>>
>> nl_handle = nl_handle_alloc()
>> nl_connect(nl_handle, NETLINK_ROUTE)
>>
>> with the second one failing taking merely input from the 1st one. 
>> These are obviously two libnl calls. Something is either not using 
>> libn or not using it correctly.
>> Thanks for pointing out netcf. I looked at libnetcf code and found 
>> this sequence here:
>>
>> [...]
>> int netlink_init(struct netcf *ncf) {
>>
>>     ncf->driver->nl_sock = nl_handle_alloc();
>>     if (ncf->driver->nl_sock == NULL)
>>         goto error;
>>     if (nl_connect(ncf->driver->nl_sock, NETLINK_ROUTE) < 0) {
>>         goto error;
>>     }
>>
>> This seems to be doing the same as I do. Maybe there is yet 
>> 'something else' that's using netlink sockets.
>> What's also strange is that the first 'virsh start' still works, but 
>> the subsequent 'virsh destroy' then does not.
>
> One definte problem in libnl is that the 'port allocation' 
> (generate_local_port()) is not thread-safe, even though I think it's 
> the library's responsibility to lock, not libvirt introducing a lock 
> that we need to grab before calling into netcf and grabbing in 
> macvtap. Unless libnl fixes this, I believe there will be no other way 
> than retrying. One will eventually bind and exclude a concurrent 
> thread from binding.
>
It's late but this doesn't look right even now in libnl 
(libnl-debuginfo-1.1-12.fc14.x86_64):

port allocation (socket.c ; line 134):
used_ports_map[i] |= (1UL << n);

- that's going to set a bit

port deallocation (socket.c; line 156) :
used_ports_map[nr / 32] &= ~((nr % 32) + 1);


- that's going to produce garbage; no wonder things don't work


used_ports_map[nr / 32] &= ~(1 << (nr % 32));

or

used_ports_map[nr / 32] &= ~(1 << (nr & 0x1f));

- would probably be much better


    Stefan




More information about the libvir-list mailing list