[libvirt] [Question]Libvirt doesn't care about qemu monitor event if fail to destroy qemu process

Wuzongyong (Euler Dept) cordius.wu at huawei.com
Tue Mar 6 12:09:24 UTC 2018



Thanks,
Zongyong Wu
> >> On 03/05/2018 12:43 PM, Cordius Wu wrote:
> >>>>>> On 03/05/2018 03:20 AM, Wuzongyong (Euler Dept) wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> We unregister the qemu monitor after sending QEMU_PROCESS_EVENT_MONITOR_EOF
> >>>>>>> to workerPool:
> >>>>>>>
> >>>>>>> static void
> >>>>>>> qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
> >>>>>>>                             virDomainObjPtr vm,
> >>>>>>>                             void *opaque) {
> >>>>>>>     virQEMUDriverPtr driver = opaque;
> >>>>>>>     qemuDomainObjPrivatePtr priv;
> >>>>>>>     struct qemuProcessEvent *processEvent;
> >>>>>>>     ...
> >>>>>>>     processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
> >>>>>>>     processEvent->vm = vm;
> >>>>>>>
> >>>>>>>     virObjectRef(vm);
> >>>>>>>     if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
> >>>>>>>         ignore_value(virObjectUnref(vm));
> >>>>>>>         VIR_FREE(processEvent);
> >>>>>>>         goto cleanup;
> >>>>>>>     }
> >>>>>>>
> >>>>>>>     /* We don't want this EOF handler to be called over and over while the
> >>>>>>>      * thread is waiting for a job.
> >>>>>>>      */
> >>>>>>>     qemuMonitorUnregister(mon);
> >>>>>>>     ...
> >>>>>>> }
> >>>>>>>
> >>>>>>> Then we handle QEMU_PROCESS_EVENT_MONITOR_EOF in the processMonitorEOFEvent
> >>>>>>> function:
> >>>>>>>
> >>>>>>> static void
> >>>>>>> processMonitorEOFEvent(virQEMUDriverPtr driver,
> >>>>>>>                        virDomainObjPtr vm) {
> >>>>>>>     ...
> >>>>>>>     if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
> >>>>>>>         return;
> >>>>>>>     ...
> >>>>>>> }
> >>>>>>>
> >>>>>>> Here, libvirt will keep reporting the VM as running if
> >>>>>>> qemuProcessBeginStopJob() returns -1, even though qemu may
> >>>>>>> terminate or be killed later.
> >>>>>>>
> >>>>>>> So, maybe we should re-register the monitor when
> >>>>>>> qemuProcessBeginStopJob() fails?
> >>>>>>
> >>>>>> The fact that processMonitorEOFEvent() failed to grab the DESTROY job
> >>>>>> means that we screwed up earlier and now you're just seeing the effects
> >>>>>> of it. Threads should be able to acquire the DESTROY job at any point,
> >>>>>> regardless of other jobs set on the domain object.
> >>>>>>
> >>>>>> Can you please:
> >>>>>> a) try to turn on debug logs [1] and tell us why acquiring
> >>>>>> DESTROY job failed? You should see an error message like this:
> >>>>>>
> >>>>>>   error: cannot acquire state change lock ..
> >>>>>>
> >>>>>> b) tell us what your libvirt version is and whether you're able to
> >>>>>> reproduce this with the latest git HEAD?
> >>>>>>
> >>>>>
> >>>>> When I said "qemuProcessBeginStopJob failed", I meant that:
> >>>>
> >>>> Oh, I thought that the message you sent earlier was related to this:
> >>>>
> >>>> https://www.redhat.com/archives/libvir-list/2018-March/msg00148.html
> >>>>
> >>>> So you are not accidentally sending SIGKILL to qemu then?
> >>>
> >>> Yep, I send SIGKILL to qemu from outside. By 'accident' I meant that
> >>> the situation where libvirt reports the VM as running indefinitely is
> >>> hard to reproduce. In the past month, I have reproduced it only twice.
> >>>
> >>>
> >>>
> >>>>> we failed to kill the qemu process within 15 seconds (refer to
> >>>>> virProcessKillPainfully). IOW, we send SIGTERM and SIGKILL but the qemu
> >>>>> process doesn't exit within 15s, and then libvirt thinks qemu is still
> >>>>> running even though qemu does exit after the 15s loop in
> >>>>> virProcessKillPainfully.
> >>>>
> >>>> What state is the qemu process in then? I mean, how can we see EOF if
> >>>> the process still exists?
> >>>>
> >>> I sent SIGKILL to the qemu process, but it didn't exit immediately;
> >>> 'ps -ef | grep qemu' showed that the qemu process was in the defunct
> >>> state.
> >>
> >> Ah, so you can find the process, but it is in D state. I read the email
> >> linked above as saying that qemu is gone.
> >
> > Yep
> >>> Then about 20s-30s after sending the SIGKILL, the qemu process exited
> >>> and I could no longer find it through the ps command.
> >>> So, libvirt still thinks the qemu process is alive during the 15s
> >>> loop in virProcessKillPainfully.
> >>
> >> Ah, so IIUC, qemu has closed the monitor but right after that it went
> >> to the D state instead of quitting. Meanwhile, libvirt sees EOF on the
> >> monitor but is unable to kill the process.
> >
> > Right
> >> Well, registering the EOF handler back would be only a workaround,
> >> because if you register the EOF handler back, the event loop will do a
> >> busy wait (in each iteration it will see EOF), so eventually
> >> virProcessKillPainfully() will see the process gone and
> >> qemuProcessBeginStopJob() would be able to return successfully.
> >>
> >> I'm unsure what the right fix might be, though. Maybe at EOF we can
> >> check what state the qemu process is in, and if it's in D state, not
> >> try to kill it but continue with the BeginJob() call.
> >>
> >> Michal
> > Hmm, I can't come up with a better solution for this problem, so I
> > hope somebody can help solve it.
> > BTW, how can libvirt check whether a process is in D state?
> 
> By reading /proc/$pid/status, although this would work only on Linux,
> not *BSD. On the other hand, I'm not sure *BSD has a D state.
> 
> Michal
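For reference, the escalation described earlier in the thread (SIGTERM, then SIGKILL, then polling until a deadline) can be sketched in plain C. This is a minimal sketch under stated assumptions, not libvirt's virProcessKillPainfully(): the helper name kill_with_timeout(), the 200 ms poll interval and escalating to SIGKILL at half the budget are all hypothetical. It shows why such a loop can report failure for a process that has already been sent SIGKILL: kill(pid, 0) keeps succeeding as long as the PID still exists, which includes a zombie that has not been reaped yet.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>   /* pid_t */

/* Hypothetical helper, NOT libvirt's virProcessKillPainfully():
 * send SIGTERM, escalate to SIGKILL once half of 'timeout_ms' has passed,
 * and poll every 200 ms. Returns 0 once the PID is gone (ESRCH),
 * -1 on timeout or on any other error. */
static int
kill_with_timeout(pid_t pid, unsigned int timeout_ms)
{
    const unsigned int step_ms = 200;
    const struct timespec tick = { 0, 200L * 1000 * 1000 };
    unsigned int waited = 0;
    bool sent_kill = false;

    if (kill(pid, SIGTERM) < 0)
        return errno == ESRCH ? 0 : -1;

    while (waited < timeout_ms) {
        /* Signal 0 only checks that the PID still exists. A zombie that has
         * not been reaped yet still exists, so this keeps succeeding. */
        if (kill(pid, 0) < 0)
            return errno == ESRCH ? 0 : -1;

        if (!sent_kill && waited >= timeout_ms / 2) {
            if (kill(pid, SIGKILL) < 0 && errno != ESRCH)
                return -1;
            sent_kill = true;
        }

        nanosleep(&tick, NULL);
        waited += step_ms;
    }

    /* Deadline reached and the PID is still there: the caller treats this
     * as "failed to kill", even if the process goes away shortly after. */
    return -1;
}
```

For example, a forked child that receives SIGTERM but is never reaped by its parent stays a zombie, so kill(pid, 0) keeps succeeding and the helper returns -1 after the deadline.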
Hmmm, isn't a process marked as defunct in Z state rather than D state?





More information about the libvir-list mailing list