[libvirt] Internal error message from netcf when trying to start libvirtd

Mon Dec 2 17:53:37 UTC 2013

On 02.12.2013 18:40, Daniel P. Berrange wrote:
> On Mon, Dec 02, 2013 at 06:38:37PM +0100, Michal Privoznik wrote:
>> On 02.12.2013 18:24, Daniel P. Berrange wrote:
>>> On Mon, Dec 02, 2013 at 06:17:00PM +0100, Michal Privoznik wrote:
>>>> On 17.10.2013 14:40, Daniel P. Berrange wrote:
>>>>> On Tue, Oct 15, 2013 at 06:03:27PM +0200, Christophe Fergeau wrote:
>>>>>> Hey,
>>>>>>
>>>>>> Due to a configuration issue on my system, libvirtd is not starting on my
>>>>>> system (not complaining about this!):
>>>>>> 2013-10-15 15:40:51.024+0000: 10222: info : libvirt version: 1.1.3
>>>>>> 2013-10-15 15:40:51.024+0000: 10222: error : virNetTLSContextCheckCertFile:117 : Cannot read CA certificate
>>>>>> '/home/teuf/usr/etc/pki/CA/cacert.pem': No such file or directory
>>>>>>
>>>>>> However, before libvirtd exits, I get another error message from the netcf
>>>>>> code, which is unexpected this time:
>>>>>> 2013-10-15 15:49:18.361+0000: 10222: error : netcfStateCleanup:105 :
>>>>>> internal error: Attempt to close netcf state driver already closed
>>>>>>
>>>>>> This message comes from the call of virStateCleanup() at the end of main()
>>>>>> in libvirtd.c. virStateCleanup() should not be called before
>>>>>> daemonStateInit() has been called in main.
>>>>>> After this call, things get more ugly as daemonStateInit() calls
>>>>>> virStateInitialize() from a thread, so there's probably a small window for
>>>>>> virStateInitialize() and virStateCleanup() running concurrently if an error
>>>>>> occurs between the call to daemonStateInit() and the call to
>>>>>> virNetServerRun().
>>>>>>
>>>>>> I'm sending this email rather than a patch as I'm not sure what is the best
>>>>>> way to fix it. The easy way would be for virStateCleanup() to be a noop
>>>>>> when virStateInitialize() hasn't been called (iow remove the error message
>>>>>> from netcfStateCleanup). However, this would leave this small race
>>>>>> condition around (which is not that bad as it would only occurs in
>>>>>> situations when the daemon fails to start). So another approach would be to
>>>>>> set a vir_state_initialized boolean once the thread has called
>>>>>> ivrStateInitialize, and only call virStateCleanup() when it's set.
>>>>>>
>>>>>> Or maybe there's a 3rd way to fix this?
>>>>>>
>>>>>> Let me know if you have any guidance into the best way to fix this,
>>>>>
>>>>> Yeah, I've known about this issue for a while and its a bit horrible
>>>>> to solve. I think we probably need to have a global mutex to serialize
>>>>> the virStateInitialize, virStateReload and virStateCleanup calls so
>>>>> that none of them can run in parallel.
>>>>>
>>>>> I'd have virGlobalInit create the mutex instance, since we know that
>>>>> is run from thread safe context.
>>>>>
>>>>> Then make the virState* calls hold that mutex while executing.
>>>>>
>>>>> We don't want virStateCleanup to skip execution if virStateInitialize
>>>>> has failed though - every callback in virStateCleanup should be written
>>>>> to be safe if its corresponding init function hasn't run. Just the
>>>>> mutex serialization ought to be sufficient I think.
>>>>>
>>>>>
>>>>> Daniel
>>>>>
>>>>
>>>> Sorry for bringing up a dead thread, but as you both may have noticed I
>>>> was trying to solve this too:
>>>>
>>>> https://www.redhat.com/archives/libvir-list/2013-November/msg01311.html
>>>>
>>>> And I don't think Dan, your approach is gonna work. I mean, the reason
>>>> for running virStateInitialize in thread is - it may take ages to
>>>
>>> Actually no, the reason for using a thread is that the initialization
>>> code needs to have a working event loop - so you cannot run the
>>> virStateInitialize code from the same thread that is running the
>>> event loop.
>>
>> Okay, this might be the reason too.
>>
>>>
>>> You're probably thinking of the QEMU VM autostart code, whicih spawns a
>>> separate thread to auto-start QEMU VMs to avoid delaying use of the QEMU
>>> driver while (slow) autostart takes place.
>>>
>>
>> No. I am thinking of the stateAutostart member which is called from
>> within virStateInitialize(). In case of qemu driver it's
>> qemuStateAutoStart() what is called, which boils down to calling
>> qemuProcessStart over each autostart domain without spawning a separate
>> thread for it. The thread you have in mind is used for
>> qemuProcessReconnect() not qemuProcessStart().
>>
>>>> complete. Hence, serializing virState* will lead to main thread waiting
>>>> for the virStateInitialize to complete (which may not happen, i.e. in
>>>> case of deadlock somewhere in a driver).
>>>>
>>>> But I was thinking of rather refined locking. Turning each of
>>>> vir*DriverTab into virObjectLockable(). Then, each access would require
>>>> object to lock itself. Moreover, the virStateCleanup would unregister
>>>> the driver. Unfortunately, it could only unregister stateful drivers.
>>>
>>> This race condition scenario only hits us in two cases
>>>
>>>  - libvirtd startup is being aborted due to some failure
>>>    while StateInit is still running
>>>  - libvirtd reload is triggered by SIGHUP causing StateReload
>>>    to run
>>
>> I can see the third scenario: the daemon exiting the event loop, i.e. as
>> a result of timeout or SIGINT. In this case you don't want to wait for
>> virStateInitialize() to finish, do you?
> 
> If the event loop exits then in fact the virStateInitialize may never
> complete, since it requires a functioning event loop. So in this case
> then you cannot safely call virStateCleanup at all and we should just
> skip it entirely and exit with trying to cleanup. The cleanup code is
> pretty much worthless from a functional POV anyway - it is just there
> to make valgrind happy.

Yep. I vote for removing virStateCleanup(). Although, I'm not sure if
NWFilter is not doing something actually useful there. Other drivers
seems to just free() allocated memory (=making valgrind happy).

Michal