[libvirt] [Question]Libvirt doesn't care about qemu monitor event if fail to destroy qemu process

Michal Privoznik mprivozn at redhat.com
Mon Mar 5 14:01:22 UTC 2018


On 03/05/2018 01:21 PM, Cordius Wu wrote:
> 
>> -----Original Message-----
>> From: Michal Privoznik [mailto:mprivozn at redhat.com]
>> Sent: Monday, March 5, 2018 8:09 PM
>> To: Cordius Wu; 'Wuzongyong (Euler Dept)'; libvir-list at redhat.com
>> Cc: 'Wanzongshun (Vincent)'; 'weijinfen'
>> Subject: Re: [libvirt] [Question]Libvirt doesn't care about qemu monitor
>> event if fail to destroy qemu process
>>
>> On 03/05/2018 12:43 PM, Cordius Wu wrote:
>>>>>> On 03/05/2018 03:20 AM, Wuzongyong (Euler Dept) wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We unregister the qemu monitor after sending
>>>>>>> QEMU_PROCESS_EVENT_MONITOR_EOF to workerPool:
>>>>>>>
>>>>>>> static void
>>>>>>> qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
>>>>>>>                             virDomainObjPtr vm,
>>>>>>>                             void *opaque) {
>>>>>>>     virQEMUDriverPtr driver = opaque;
>>>>>>>     qemuDomainObjPrivatePtr priv;
>>>>>>>     struct qemuProcessEvent *processEvent;
>>>>>>>     ...
>>>>>>>     processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
>>>>>>>     processEvent->vm = vm;
>>>>>>>
>>>>>>>     virObjectRef(vm);
>>>>>>>     if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
>>>>>>>         ignore_value(virObjectUnref(vm));
>>>>>>>         VIR_FREE(processEvent);
>>>>>>>         goto cleanup;
>>>>>>>     }
>>>>>>>
>>>>>>>     /* We don't want this EOF handler to be called over and over
>>>>>>>      * while the thread is waiting for a job.
>>>>>>>      */
>>>>>>>     qemuMonitorUnregister(mon);
>>>>>>> ...
>>>>>>> }
>>>>>>>
>>>>>>> Then we handle QEMU_PROCESS_EVENT_MONITOR_EOF in the
>>>>>>> processMonitorEOFEvent function:
>>>>>>>
>>>>>>> static void
>>>>>>> processMonitorEOFEvent(virQEMUDriverPtr driver,
>>>>>>>                        virDomainObjPtr vm) {
>>>>>>>     ...
>>>>>>>     if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
>>>>>>>         return;
>>>>>>>     ...
>>>>>>> }
>>>>>>>
>>>>>>> Here, libvirt will report the vm state as running indefinitely
>>>>>>> if qemuProcessBeginStopJob returns -1, even though qemu may
>>>>>>> terminate or be killed later.
>>>>>>>
>>>>>>> So, maybe we should re-register the monitor when
>>>>>>> qemuProcessBeginStopJob fails?
>>>>>>
>>>>>> The fact that processMonitorEOFEvent() failed to grab DESTROY job
>>>>>> means that we screwed up earlier and now you're just seeing effects
>>>>>> of it. Threads should be able to acquire DESTROY job at any point,
>>>>>> regardless of other jobs set on the domain object.
>>>>>>
>>>>>> Can you please:
>>>>>> a) try to turn on debug logs [1] and tell us why acquiring DESTROY
>>>>>> job failed? You should see an error message like this:
>>>>>>
>>>>>>   error: cannot acquire state change lock ..
>>>>>>
>>>>>> b) tell us what is your libvirt version and if you're able to
>>>>>> reproduce this with the latest git HEAD?
>>>>>>
>>>>>
>>>>> By "qemuProcessBeginStopJob failed" I mean that:
>>>>
>>>> Oh, I thought that the message you've sent earlier is related to this:
>>>>
>>>> https://www.redhat.com/archives/libvir-list/2018-March/msg00148.html
>>>>
>>>> So you are not accidentally sending SIGKILL to qemu then?
>>>
>>> Yep, I send SIGKILL to qemu from outside libvirt. By 'accident' I
>>> mean that the scenario where libvirt reports the vm as running all
>>> the time is hard to reproduce. In the past month I have reproduced
>>> it only twice.
>>>
>>>
>>>
>>>>> we failed to kill the qemu process within 15 seconds (refer to
>>>>> virProcessKillPainfully). IOW, we send SIGTERM and SIGKILL but the
>>>>> qemu process doesn't exit within 15s, and then libvirt thinks qemu
>>>>> is still running even though qemu does indeed exit after the 15s
>>>>> loop in virProcessKillPainfully.
>>>>
>>>> What state is qemu process in then? I mean, how can we see EOF if the
>>>> process still exists?
>>>>
>>> I send SIGKILL to the qemu process, but the qemu process doesn't exit
>>> immediately. The command 'ps -ef | grep qemu' shows that the qemu
>>> process is in defunct state.
>>
>> Ah, so you can find the process, but it is in D state. Because I read
>> the email linked above like qemu is gone.
> 
> Yep
>>> Then, about 20s-30s after sending the SIGKILL, the qemu process exits
>>> and I can't find the qemu info through the ps command.
>>> So libvirt still thinks the qemu process is alive during the 15s loop
>>> in virProcessKillPainfully.
>>
>> Ah, so IIUC, qemu has closed the monitor but right after that it went to
>> the D state instead of quitting. Meanwhile, libvirt sees EOF on the
>> monitor but is unable to kill the process.
> 
> Right
>> Well, registering EOF handler back would be only a workaround, because
>> if you register EOF handler back the event loop will do a busy wait (in
>> each iteration it will see EOF), so eventually the
>> virProcessKillPainfully() will see the process gone and
>> qemuProcessBeginStopJob() would be able to return successfully.
>>
>> I'm unsure what the right fix might be though. Maybe, at EOF we can
>> check what state is qemu process in and if it's in D state don't try to
>> kill it and continue with BeginJob() call.
>>
>> Michal
> Hmm, I can't come up with a better solution for this problem, so I hope
> somebody can help to solve it.
> BTW, how can we check whether a process is in D state in libvirt?

By reading /proc/$pid/status, although this would work only on Linux,
not *BSD. On the other hand, I'm not sure *BSD has D state.
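
A minimal sketch of such a check, assuming Linux procfs. The helper name
exampleProcessGetState() is hypothetical and not an existing libvirt
function. It returns the one-letter code from the "State:" line of
/proc/$pid/status, where 'D' is uninterruptible sleep and 'Z' is a zombie
(what ps reports as defunct):

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not a libvirt API): return the one-letter state
 * code of @pid as reported by /proc/<pid>/status, or '\0' if the file
 * cannot be read or has no "State:" line. Linux only. */
static char
exampleProcessGetState(pid_t pid)
{
    char path[64];
    char line[256];
    char state = '\0';
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%lld/status", (long long) pid);

    if (!(fp = fopen(path, "r")))
        return '\0';

    while (fgets(line, sizeof(line), fp)) {
        /* The line looks like "State:\tD (disk sleep)" */
        if (strncmp(line, "State:", 6) == 0) {
            if (sscanf(line + 6, " %c", &state) != 1)
                state = '\0';
            break;
        }
    }

    fclose(fp);
    return state;
}
```

A caller at EOF time could then compare the result against 'D' before
deciding whether to try killing the process. The state can change right
after it is read, so this is only a hint, not a guarantee.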

Michal
