[libvirt] [PATCH 1/1] qemu: sync blockjob finishing in qemuDomainGetBlockJobInfo

Nikolay Shirokovskiy nshirokovskiy at virtuozzo.com
Thu Nov 28 11:25:42 UTC 2019



On 28.11.2019 14:11, Peter Krempa wrote:
> On Thu, Nov 28, 2019 at 10:51:56 +0000, Nikolay Shirokovskiy wrote:
>>
>>
>> On 28.11.2019 12:05, Peter Krempa wrote:
>>> On Thu, Nov 28, 2019 at 07:29:08 +0000, Nikolay Shirokovskiy wrote:
>>>>
>>>>
>>>> On 27.11.2019 17:56, Peter Krempa wrote:
>>>>> On Wed, Nov 27, 2019 at 17:19:18 +0300, Nikolay Shirokovskiy wrote:
>>>>>> Due to a race, qemuDomainGetBlockJobInfo can report that there is
>>>>>> no block job for a disk, yet a later call to spawn a new blockjob
>>>>>> can fail because libvirt has not yet processed the blockjob's
>>>>>> completion internally. Thus let's wait for the completion to be
>>>>>> processed before reporting that there is no more blockjob.
>>>>>
>>>>> Could you please elaborate on how this happened, e.g. provide some logs?
>>>>>
>>>>> Polling with qemuDomainGetBlockJobInfo will always be racy, and if the
>>>>> problem is that diskPriv->blockjob is allocated but qemu didn't report
>>>>> anything, I suspect the problem is somewhere else.
>>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy at virtuozzo.com>
>>>>>> ---
>>>>>>  src/qemu/qemu_driver.c | 18 +++++++++++++++++-
>>>>>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>>>>> index 669c12d6ca..b148df3a57 100644
>>>>>> --- a/src/qemu/qemu_driver.c
>>>>>> +++ b/src/qemu/qemu_driver.c
>>>>>> @@ -17785,12 +17785,26 @@ qemuDomainGetBlockJobInfo(virDomainPtr dom,
>>>>>>          goto endjob;
>>>>>>      }
>>>>>>  
>>>>>> +    qemuBlockJobSyncBegin(job);
>>>>>> +
>>>>>>      qemuDomainObjEnterMonitor(driver, vm);
>>>>>>      ret = qemuMonitorGetBlockJobInfo(qemuDomainGetMonitor(vm), job->name, &rawInfo);
>>>>>>      if (qemuDomainObjExitMonitor(driver, vm) < 0)
>>>>>>          ret = -1;
>>>>>> -    if (ret <= 0)
>>>>>> +    if (ret < 0)
>>>>>> +        goto endjob;
>>>>>> +
>>>>>> +    if (ret == 0) {
>>>>>
>>>>> So if qemu returns that there is no blockjob ...
>>>>>
>>>>>> +        qemuBlockJobUpdate(vm, job, QEMU_ASYNC_JOB_NONE);
>>>>>> +        while (qemuBlockJobIsRunning(job)) {
>>>>>
>>>>> but we think it's still running, one possibility is that the event
>>>>> arrived but wasn't processed yet. In that case this would help, but the
>>>>> outcome would not differ much from the scenario without this code, as
>>>>> the event would be processed later anyway.
>>>>>
>>>>>
>>>>>> +            if (virDomainObjWait(vm) < 0) {
>>>>>> +                ret = -1;
>>>>>> +                goto endjob;
>>>>>> +            }
>>>>>> +            qemuBlockJobUpdate(vm, job, QEMU_ASYNC_JOB_NONE);
>>>>>
>>>>> The other possibility is a bug: qemu doesn't know of any job while
>>>>> libvirt thinks there's one, and qemu is right. In that case this will
>>>>> lock up here, as the event will never arrive.
>>>>>
>>>>>> +        }
>>>>>>          goto endjob;
>>>>>
>>>>> My suggested solution would be to fake the job from the data in 'job'
>>>>> rather than attempt to wait in this case e.g.
>>>>>
>>>>> if (ret == 0) {
>>>>>     rawStats.type = job->type;
>>>>>     rawStats.ready = 0;
>>>>> }
>>>>>
>>>>> and then let it reach qemuBlockJobInfoTranslate, which sets some fake
>>>>> progress data so that the job looks active.
>>>>>
>>>>> That way you get a response that there's a job without the potential to
>>>>> deadlock.
>>>>>
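[For illustration, a self-contained mock of that fallback idea. The names
struct tracked_job and fake_block_job_info are made up here and are not
libvirt internals; virDomainBlockJobInfo is the real public struct.]

/* When qemu reports no job but libvirt still tracks one, report the
 * tracked job's type with synthetic progress so the caller sees an
 * active, not-yet-complete job instead of "no job". */
#include <stdio.h>
#include <libvirt/libvirt.h>

/* Simplified stand-in for libvirt's internal blockjob tracking state. */
struct tracked_job {
    int type;   /* virDomainBlockJobType, e.g. VIR_DOMAIN_BLOCK_JOB_TYPE_PULL */
};

/* Fill 'info' for the caller when qemu returned nothing for the disk. */
static void
fake_block_job_info(const struct tracked_job *job, virDomainBlockJobInfoPtr info)
{
    info->type = job->type;
    info->bandwidth = 0;
    /* cur < end makes the job look in-progress rather than complete. */
    info->cur = 0;
    info->end = 1;
}

int main(void)
{
    struct tracked_job job = { VIR_DOMAIN_BLOCK_JOB_TYPE_PULL };
    virDomainBlockJobInfo info;

    fake_block_job_info(&job, &info);
    printf("type=%d cur=%llu end=%llu\n", info.type,
           (unsigned long long)info.cur, (unsigned long long)info.end);
    return 0;
}
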
>>>>
>>>> Fake progress data does not look nice. Maybe we should cache the progress
>>>> on qemuDomainGetBlockJobInfo calls so that it does not go backwards with
>>>> such a patch?
>>>
>>> I agree, but it's always a tradeoff between the complexity of the code and
>>> the benefit it brings. If it's a very unlikely scenario, returning fake
>>> data is simpler than adding caching and persistence over restarts.
>>>
>>>> I can understand protecting against possible bugs when recovery is not
>>>> possible or difficult, as in the case of data corruption. But in this case
>>>> it is just a matter of restarting libvirtd, I guess.
>>>
>>> Well, that's why I'm asking what the original bug is. Maybe the problem
>>> lies somewhere else and there's a more elegant fix.
>>>
>>
>> I have a mgmt app polling for blockpull job completion. After completion,
>> failure/success is inferred by inspecting the domain xml. So even on
>> success, due to the race we can still see the backing chain that should
>> have been removed by the blockpull and wrongly decide the job failed.
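[For context, roughly the polling pattern in question, as a minimal sketch.
The connection URI, domain name "demo-vm" and disk target "vda" are
illustrative, and error handling is minimal.]

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    virDomainPtr dom = conn ? virDomainLookupByName(conn, "demo-vm") : NULL;
    virDomainBlockJobInfo info;
    char *xml;
    int ret;

    if (!dom)
        return 1;

    /* Poll until libvirt reports no job for the disk. */
    while ((ret = virDomainGetBlockJobInfo(dom, "vda", &info, 0)) > 0)
        usleep(100 * 1000);
    if (ret < 0)
        return 1;

    /* ret == 0: no job reported.  In the pre-blockdev case there is a
     * window here where libvirt has not yet updated the domain XML, so a
     * backing store removed by a successful pull may still be visible. */
    if ((xml = virDomainGetXMLDesc(dom, 0)) &&
        strstr(xml, "<backingStore type="))
        printf("backing chain still present; might wrongly conclude the pull failed\n");

    return 0;
}
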
> 
> The strongly preferred method of figuring out the success of a job is to
> use the VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2 event, which is delivered after
> the block job is finalized by libvirt.
> 
> This means that any backing chain modification/redetection is finished by
> the time it's delivered, and additionally the event itself reports the
> success state.
> 
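[For reference, a minimal client-side sketch of this event-driven approach,
again with illustrative URI, domain name and disk target.]

#include <stdio.h>
#include <libvirt/libvirt.h>

static int job_done;

static void
block_job_2_cb(virConnectPtr conn, virDomainPtr dom,
               const char *dst, int type, int status, void *opaque)
{
    /* With VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2 'dst' is the disk target name
     * (e.g. "vda") and 'status' carries the final result of the job. */
    if (status == VIR_DOMAIN_BLOCK_JOB_COMPLETED)
        printf("block job on %s completed\n", dst);
    else if (status == VIR_DOMAIN_BLOCK_JOB_FAILED)
        printf("block job on %s failed\n", dst);
    else if (status == VIR_DOMAIN_BLOCK_JOB_CANCELED)
        printf("block job on %s was cancelled\n", dst);
    else
        return;  /* e.g. VIR_DOMAIN_BLOCK_JOB_READY: keep waiting */
    job_done = 1;
}

int main(void)
{
    virConnectPtr conn;
    virDomainPtr dom;

    if (virEventRegisterDefaultImpl() < 0)
        return 1;
    if (!(conn = virConnectOpen("qemu:///system")))
        return 1;
    if (!(dom = virDomainLookupByName(conn, "demo-vm")))
        return 1;

    if (virConnectDomainEventRegisterAny(conn, dom,
                                         VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2,
                                         VIR_DOMAIN_EVENT_CALLBACK(block_job_2_cb),
                                         NULL, NULL) < 0)
        return 1;

    /* Start the pull and wait for the completion event instead of polling
     * virDomainGetBlockJobInfo; the event arrives only after libvirt has
     * finalized the job and updated the backing chain in the XML. */
    if (virDomainBlockPull(dom, "vda", 0, 0) < 0)
        return 1;

    while (!job_done && virEventRunDefaultImpl() == 0)
        ;

    return 0;
}
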
> Additionally, since libvirt 5.10 and qemu 4.2, blockdev is used, which
> impacts block jobs. With blockdev in use we persist the jobs in qemu until
> we have processed their state. This means that the above race is
> impossible with current qemu, as the block job would still be present in
> the output of query-block-jobs until we process the event from qemu and
> thus modify the backing chain in the XML, all under one lock.
> 
> In the pre-blockdev era I can see that there's a possibility that it could
> happen, but in my opinion it's extremely unlikely. The race window is
> between qemu removing the job and delivering the event. After the event
> is delivered we enqueue it in the job handling and thus it will be
> processed in order. This means that the code would have to
> call virDomainGetBlockJobInfo, where the info from qemu is no longer
> present, but that also means that qemu sent the event notifying about
> the finished job. The window where the virDomainGetXMLDesc call would
> have to take place is between those two actions.
> 
> Given that with the newest qemu and libvirt the situation will not happen,
> and that it's unlikely anyway, I'm against adding the complex code you
> proposed. We can go with the fake progress data if you insist on fixing
> the polling scenario, which is suboptimal by itself.
> 

Thanks for the elaborate answer. In the long term the problem is solved by
block jobs being reaped explicitly, so I'm ok with dismissing the patch.

Nikolay




