[libvirt] [Question]Libvirt doesn't care about qemu monitor event if fail to destroy qemu process

Cordius Wu wuzongyo at mail.ustc.edu.cn
Mon Mar 5 12:21:05 UTC 2018


> -----Original Message-----
> From: Michal Privoznik [mailto:mprivozn at redhat.com]
> Sent: Monday, March 5, 2018 8:09 PM
> To: Cordius Wu; 'Wuzongyong (Euler Dept)'; libvir-list at redhat.com
> Cc: 'Wanzongshun (Vincent)'; 'weijinfen'
> Subject: Re: [libvirt] [Question]Libvirt doesn't care about qemu monitor
> event if fail to destroy qemu process
>
> On 03/05/2018 12:43 PM, Cordius Wu wrote:
> >>>> On 03/05/2018 03:20 AM, Wuzongyong (Euler Dept) wrote:
> >>>>> Hi,
> >>>>>
> >>>>> We unregister the qemu monitor after sending
> >>>>> QEMU_PROCESS_EVENT_MONITOR_EOF to workerPool:
> >>>>>
> >>>>> static void
> >>>>> qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
> >>>>>                             virDomainObjPtr vm,
> >>>>>                             void *opaque)
> >>>>> {
> >>>>>     virQEMUDriverPtr driver = opaque;
> >>>>>     qemuDomainObjPrivatePtr priv;
> >>>>>     struct qemuProcessEvent *processEvent;
> >>>>>     ...
> >>>>>     processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
> >>>>>     processEvent->vm = vm;
> >>>>>
> >>>>>     virObjectRef(vm);
> >>>>>     if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
> >>>>>         ignore_value(virObjectUnref(vm));
> >>>>>         VIR_FREE(processEvent);
> >>>>>         goto cleanup;
> >>>>>     }
> >>>>>
> >>>>>     /* We don't want this EOF handler to be called over and over
> >>>>>      * while the thread is waiting for a job.
> >>>>>      */
> >>>>>     qemuMonitorUnregister(mon);
> >>>>>     ...
> >>>>> }
> >>>>>
> >>>>> Then we handle QEMU_PROCESS_EVENT_MONITOR_EOF in the
> >>>>> processMonitorEOFEvent function:
> >>>>>
> >>>>> static void
> >>>>> processMonitorEOFEvent(virQEMUDriverPtr driver,
> >>>>>                        virDomainObjPtr vm)
> >>>>> {
> >>>>>     ...
> >>>>>     if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
> >>>>>         return;
> >>>>>     ...
> >>>>> }
> >>>>>
> >>>>> Here, libvirt will show the vm state as running indefinitely if
> >>>>> qemuProcessBeginStopJob returns -1, even though qemu may
> >>>>> terminate or be killed later.
> >>>>>
> >>>>> So, maybe we should re-register the monitor when
> >>>>> qemuProcessBeginStopJob fails?
> >>>>
> >>>> The fact that processMonitorEOFEvent() failed to grab DESTROY job
> >>>> means that we screwed up earlier and now you're just seeing effects
> >>>> of it. Threads should be able to acquire DESTROY job at any point,
> >>>> regardless of other jobs set on the domain object.
> >>>>
> >>>> Can you please:
> >>>> a) try to turn on debug logs [1] and tell us why acquiring DESTROY
> >>>> job failed? You should see an error message like this:
> >>>>
> >>>>   error: cannot acquire state change lock ..
> >>>>
> >>>> b) tell us what is your libvirt version and if you're able to
> >>>> reproduce this with the latest git HEAD?
> >>>>
> >>>
> >>> By "qemuProcessBeginStopJob failed" I mean that:
> >>
> >> Oh, I thought that the message you've sent earlier is related to this:
> >>
> >> https://www.redhat.com/archives/libvir-list/2018-March/msg00148.html
> >>
> >> So you are not accidentally sending SIGKILL to qemu then?
> >
> > Yep, I send SIGKILL to qemu from outside. The 'accident' means that
> > the situation where libvirt reports the vm as running all the time is
> > hard to reproduce. In the past month, I reproduced it only twice.
> >
> >
> >
> >>> we failed to kill the qemu process in 15 seconds (refer to
> >>> virProcessKillPainfully).
> >>> IOW, we send SIGTERM and SIGKILL but the qemu process doesn't exit
> >>> within 15s, and then libvirt will think qemu is still running even
> >>> though qemu does exit after the 15s loop in virProcessKillPainfully.
> >>
> >> What state is qemu process in then? I mean, how can we see EOF if the
> >> process still exists?
> >>
> > I sent SIGKILL to the qemu process, but it didn't exit immediately.
> > The command 'ps -ef | grep qemu' showed that the qemu process was
> > in defunct state.
>
> Ah, so you can find the process, but it is in D state. Because I read the
> email linked above like qemu is gone.

Yep
> > Then about
> > 20s-30s after sending the SIGKILL, the qemu process exited and I could
> > no longer find it with the ps command.
> > So, libvirt still thinks the qemu process is alive during the 15s loop
> > in virProcessKillPainfully.
>
> Ah, so IIUC, qemu has closed the monitor but right after that it went to
> the D state instead of quitting. Meanwhile, libvirt sees EOF on the
> monitor but is unable to kill the process.

Right
> Well, registering EOF handler back would be only a workaround, because
> if you register EOF handler back the event loop will do a busy wait (in
> each iteration it will see EOF), so eventually the
> virProcessKillPainfully() will see the process gone and
> qemuProcessBeginStopJob() would be able to return successfully.
>
> I'm unsure what the right fix might be though. Maybe, at EOF we can
> check what state is qemu process in and if it's in D state don't try to
> kill it and continue with BeginJob() call.
>
> Michal
Hmm, I can't come up with a better solution for this problem, so I hope
somebody can help to solve it.
BTW, how can we check whether a process is in D state in libvirt?
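
One possible approach on Linux, as a rough sketch only and not existing
libvirt code: read the state field from /proc/<pid>/stat. The helper
names procStatParseState() and procGetState() below are hypothetical.
Note that ps reports "defunct" for state 'Z' (zombie), while 'D' is
uninterruptible sleep, so a real check would probably need to handle
both.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not a libvirt API): extract the state character
 * from the contents of /proc/<pid>/stat. The comm field is wrapped in
 * parentheses and may itself contain spaces or ')', so search for the
 * LAST ')' and read the field that follows it.
 * Returns the state character, or '\0' on malformed input. */
char
procStatParseState(const char *line)
{
    const char *p = strrchr(line, ')');

    if (!p || p[1] != ' ' || p[2] == '\0')
        return '\0';
    return p[2];
}

/* Hypothetical helper: return the state of @pid ('R', 'S', 'D', 'Z', ...)
 * or '\0' if the process is gone or the file cannot be read.
 * Assumes a Linux-style /proc filesystem. */
char
procGetState(pid_t pid)
{
    char path[64];
    char buf[512];
    size_t n;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%lld/stat", (long long)pid);
    if (!(fp = fopen(path, "r")))
        return '\0';

    /* The state field comes right after comm, so the first 511 bytes
     * are enough; the remaining fields are numeric and contain no ')'. */
    n = fread(buf, 1, sizeof(buf) - 1, fp);
    buf[n] = '\0';
    fclose(fp);

    return procStatParseState(buf);
}
```

With something like this, the EOF path could call procGetState(vm->pid)
and compare the result against 'D' or 'Z' before deciding whether to
try killing the process. Where exactly such a check belongs is still
an open question.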

Thanks,
Wu Zongyong





