[libvirt] [PATCH v2] qemu: Introduce a wrapper over virFileWrapperFdClose

Michal Privoznik mprivozn at redhat.com
Thu Sep 21 15:20:29 UTC 2017


On 09/21/2017 04:18 PM, John Ferlan wrote:
> 
> 
> On 09/19/2017 09:56 AM, Michal Privoznik wrote:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1448268
>>
>> When migrating to a file (e.g. when doing 'virsh save file'),
>> couple of things are happening in the thread that is executing
>> the API:
>>
>> 1) the domain obj is locked
>> 2) iohelper is spawned as a separate process to handle all I/O
>> 3) the thread waits for iohelper to finish
>> 4) the domain obj is unlocked
>>
>> Now, the problem is that while the thread waits in step 3 for
>> iohelper to finish this may take ages because iohelper calls
>> fdatasync(). And unfortunately, we are waiting the whole time
>> with the domain locked. So if another thread wants to jump in and
>> say copy the domain name ('virsh list' for instance), they are
>> stuck.
>>
>> The solution is to unlock the domain whenever waiting for I/O and
>> lock it back again when it finished.
>>
>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>> ---
>>
>> diff to v1:
>> - check after locking the domain obj back if the domain is still alive,
>>   since it was unlocked it might have changed its state.
>>
>>  src/qemu/qemu_driver.c | 33 +++++++++++++++++++++++++++++----
>>  1 file changed, 29 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index e1a0dd553..6095a5ec5 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -3216,6 +3216,31 @@ qemuOpenFileAs(uid_t fallback_uid, gid_t fallback_gid,
>>      goto cleanup;
>>  }
>>  
>> +
>> +static int
>> +qemuFileWrapperFDClose(virDomainObjPtr vm,
>> +                       virFileWrapperFdPtr fd)
>> +{
>> +    int ret;
>> +
>> +    /* virFileWrapperFd uses iohelper to write data onto disk.
>> +     * However, iohelper calls fdatasync() which may take ages to
>> +     * finish. Therefore, we shouldn't be waiting with the domain
>> +     * object locked. */
>> +
>> +    virObjectUnlock(vm);
>> +    ret = virFileWrapperFdClose(fd);
>> +    virObjectLock(vm);
>> +    if (!virDomainObjIsActive(vm)) {
>> +        if (!virGetLastError())
>> +            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
>> +                           _("domain is no longer running"));
>> +        ret = -1;
>> +    }
>> +    return ret;
>> +}
>> +
>> +
>>  /* Helper function to execute a migration to file with a correct save header
>>   * the caller needs to make sure that the processors are stopped and do all other
>>   * actions besides saving memory */
>> @@ -3276,7 +3301,7 @@ qemuDomainSaveMemory(virQEMUDriverPtr driver,
>>          goto cleanup;
>>      }
>>  
>> -    if (virFileWrapperFdClose(wrapperFd) < 0)
>> +    if (qemuFileWrapperFDClose(vm, wrapperFd) < 0)
>>          goto cleanup;
>>  
>>      if ((fd = qemuOpenFile(driver, vm, path, O_WRONLY, NULL, NULL)) < 0 ||
>> @@ -3827,7 +3852,7 @@ doCoreDump(virQEMUDriverPtr driver,
>>                               path);
>>          goto cleanup;
>>      }
>> -    if (virFileWrapperFdClose(wrapperFd) < 0)
>> +    if (qemuFileWrapperFDClose(vm, wrapperFd) < 0)
>>          goto cleanup;
>>  
>>      ret = 0;
>> @@ -6687,7 +6712,7 @@ qemuDomainRestoreFlags(virConnectPtr conn,
>>  
>>      ret = qemuDomainSaveImageStartVM(conn, driver, vm, &fd, data, path,
>>                                       false, QEMU_ASYNC_JOB_START);
>> -    if (virFileWrapperFdClose(wrapperFd) < 0)
>> +    if (qemuFileWrapperFDClose(vm, wrapperFd) < 0)
>>          VIR_WARN("Failed to close %s", path);
> 
> This and the next one - may not be completely true if the -1 return is
> because the guest not active.
> 
> Furthermore for both if we had a failure because the domain was inactive
> the condition wouldn't be propagated as both only care that
> qemuDomainSaveImageStartVM succeeded.
> 
>>  
>>      qemuProcessEndJob(driver, vm);
>> @@ -6962,7 +6987,7 @@ qemuDomainObjRestore(virConnectPtr conn,
>>  
>>      ret = qemuDomainSaveImageStartVM(conn, driver, vm, &fd, data, path,
>>                                       start_paused, asyncJob);
>> -    if (virFileWrapperFdClose(wrapperFd) < 0)
>> +    if (qemuFileWrapperFDClose(vm, wrapperFd) < 0)
>>          VIR_WARN("Failed to close %s", path);
>>  
> 
> For this one ignoring an active failure means perhaps unlinking the
> managed save file and then running qemuProcessStart perhaps unexpectedly
> again...
> 
> 
> For these two cases there's just enough fear to not be fully supportive
> for an R-B.  For the Save and Dump cases, they're fine, but it would
> seem to me we really want these restore operations to be quite
> autonomous and not need to be impacted by some outside influence causing
> the domain to be inactive again. Besides, the BZ only mentions Save and
> Dump.
> 
> if:
> 
>    1. qemuFileWrapperFDClose is commented to indicate only using for the
> "non-restore" cases
>    2. Using virFileWrapperFdClose for the two restore cases
> 
> then:
> 
> Reviewed-by: John Ferlan <jferlan at redhat.com>

Okay. I'll go with this option. Pushed, thanks.

Michal




More information about the libvir-list mailing list