[libvirt] [PATCH REBASE 4/5] qemu: fix domain object wait to handle monitor errors

John Ferlan jferlan at redhat.com
Fri May 4 14:52:49 UTC 2018



On 05/03/2018 03:54 AM, Nikolay Shirokovskiy wrote:
> 
> 
> On 01.05.2018 01:03, John Ferlan wrote:
>>
>>
>> On 04/18/2018 10:44 AM, Nikolay Shirokovskiy wrote:
>>> Block job abort operation can not handle properly qemu crashes
>>> when waiting for abort/pivot completion. Deadlock scenario is next:
>>>
>>> - qemuDomainBlockJobAbort waits for pivot/abort completion
>>> - qemu crashes, then qemuProcessBeginStopJob broadcasts for VM condition and
>>>   then waits for job condition (taken by qemuDomainBlockJobAbort)
>>> - qemuDomainBlockJobAbort awakes but nothing really changed, VM is still
>>>   active (vm->def->id != -1) so thread starts waiting for completion again.
>>>   Now two threads are in deadlock.
>>>
>>> First let's add some condition besides domain active status so that waiting
>>> thread can check it when awakes. Second let's move signalling to the place
>>> where condition is set, namely monitor eof/error handlers. Having signalling
>>> in qemuProcessBeginStopJob is not useful.
>>>
>>> The patch copies monitor error to domain state because at time
>>> waiting thread awakes there can be no monitor and it is useful to
>>> report monitor error to user.
>>>
>>> The patch has a drawback that on destroying a domain when waiting for
>>> event from monitor we get not very convinient error message like
>>
>> convenient
>>
>>> 'EOF from monitor' from waiting API. On the other hand if qemu crashes
>>> this is more useful then 'domain is not running'. The first case
>>> will be addressed in another patch.
>>>
>>> The patch also fixes other places where we wait for event from qemu.
>>> Namely handling device removal and tray ejection. Other places which
>>> used virDomainObjWait (dump, migration, save) were good because of
>>> async jobs which allow concurrent destroy job.
>>>
>>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy at virtuozzo.com>
>>> ---
>>>  src/conf/domain_conf.c    | 43 -------------------------------------------
>>>  src/conf/domain_conf.h    |  3 ---
>>>  src/libvirt_private.syms  |  2 --
>>>  src/qemu/qemu_domain.c    | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>>  src/qemu/qemu_domain.h    |  5 ++++-
>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>  src/qemu/qemu_hotplug.c   |  4 ++--
>>>  src/qemu/qemu_migration.c | 12 ++++++------
>>>  src/qemu/qemu_process.c   | 27 ++++++++++++++++++++++-----
>>>  9 files changed, 82 insertions(+), 65 deletions(-)
>>>
>>
>> This no longer git am -3 applies and based on previous patch reviews, I
>> think perhaps this needs to be redone.
> 
> I'll resend as soon as we come to agreement on series. Now I'm not
> convinced to change much (except for using distinct flag to indicate error
> as mentioned in 2nd patch discussion, then I don't need 3d patch).
> 
> See comments below.
> 
>>
>> I don't believe moving/renaming virDomainObjWait is good/right, but I'm
>> sure there would be other opinions on that.  Yes, QEMU is currently the
>> only consumer... If it were to move, it should move to virdomainobjlist
>> since that's where innards of virDomainObjPtr are managed. The fact that
>> we look at ->parent.lock, well, <sigh>...
> 
> I would not move the function without reason. It gets new name qemuDomainObjWait
> becase it use qemu specific data, namely monError.

Today only qemu uses this generic virDomainObjWait which is generic
without the need to have/use qemu specific things. Other domain
consumers could use it if they chose.

I'm not in favor of moving, renaming, repurposing for a very specific
issue for what is a generically used function. Maybe that's just me - so
you could try to get a different reviewer if you want.


> 
>>
>> Anyway, you're attempting to special case something and perhaps you just
>> need to create a qemuDomainObjWait that would call the timeout version
>> of the virDomainObjWait and be able to handle whether some other error
>> occurred.  Wouldn't the LastError before the current wait return NULL
>> (as in no error), then during the timeout period if LastError is
>> something couldn't that indicate the failure you're looking for.
> 
> I introduced qemuDomainObjWait not to put virDomainObjWait and virDomainObjWaitUntil
> in the first place but to check monError. Then rather then keeping too functions
> it's look more simple to use only one and branch on given timeout.
> 

I would think if the goal was to have specific code for a specific
purpose for specific functions, then introduce the qemuDomainObjWait,
but have it call virDomainObjWait[Until] based on the need you have.
Which in this case appears to fiddle with monError in some way.

Again - that's just my opinion

John

>>
>> I didn't spend a lot of time thinking about alternatives and how the
>> code should change, but when you mention the patch has a drawback - that
>> immediately raises concern.
> 
> But ... I addressed this issue in next patch as I wrote.
> 


[...]




More information about the libvir-list mailing list