[libvirt] [PATCH 7/9] qemu: Grab modify job for changing domain XML

Michal Privoznik mprivozn at redhat.com
Tue Jun 4 14:23:09 UTC 2019


On 6/4/19 11:07 AM, Nikolay Shirokovskiy wrote:
> 
> 
> On 30.05.2019 12:34, Michal Privoznik wrote:
>> Changing domain definition must be guarded with acquiring modify
>> job. The problem is if there is a thread executing say
>> qemuDomainSetMemoryStatsPeriod() which is an API that acquires
>> modify job and then possibly unlock the domain object and locks
>> it again. However, becasue virDomainObjAssignDef() does not take
>> a job (only object lock) it may have changed the domain
>> definition while the other thread unlocked the domain object in
>> order to talk on the monitor. For instance:
>>
>> Thread1:
>> 1) lookup domain object and lock it
>> 2) acquire job
>> 3) get pointers to live and persistent defs
>> 4) unlock the domain object
>> 5) talk to qemu on the monitor
>>
>> Thread2 (Execute qemuDomainDefineXMLFlags):
>> 1) lookup domain object and lock it
>> 2) virDomainObjAssignDef() which overwrites persistent def and
>>     frees the old one
>> 3) unlock domain object
>>
>> Thread1:
>> 6) lock the domain object again
>> 7) try to access the persistent def via pointer stored in 3)
>>
>> Now this will crash because the pointer from step 3) points to a
>> memory that was freed.
>>
>> However, if Thread2 waited and acquired modify job (just like
>> Thread1) then these two would serialize and no harm would be
>> done.
>>
>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>> ---
>>   src/qemu/qemu_domain.c    |  3 ++-
>>   src/qemu/qemu_driver.c    | 29 +++++++++++++++++++++++++----
>>   src/qemu/qemu_migration.c |  7 +++++++
>>   3 files changed, 34 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
>> index c61d39b12b..fff2f1932e 100644
>> --- a/src/qemu/qemu_domain.c
>> +++ b/src/qemu/qemu_domain.c
>> @@ -8784,7 +8784,8 @@ qemuDomainRemoveInactive(virQEMUDriverPtr driver,
>>           return;
>>       }
>>   
>> -    qemuDomainRemoveInactiveCommon(driver, vm);
>> +    if (vm->def)
>> +        qemuDomainRemoveInactiveCommon(driver, vm);
>>   
>>       virDomainObjListRemove(driver->domains, vm);
> 
> Hmm, virDomainObjListRemoveLocked uses vm->def too.
> 
>>   }
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index 8bbac339e0..77cb7e4b87 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -1706,9 +1706,16 @@ static virDomainPtr qemuDomainCreateXML(virConnectPtr conn,
>>                                      VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE)))
>>           goto cleanup;
>>   
>> +    if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) {
>> +        qemuDomainRemoveInactive(driver, vm);
>> +        goto cleanup;
>> +    }
>> +
>>       virDomainObjAssignDef(vm, def, true, NULL);
>>       def = NULL;
>>   
>> +    qemuDomainObjEndJob(driver, vm);
>> +
> 
> Looks like we can begin VIR_DOMAIN_JOB_OPERATION_START from the start instead
> of acquiring job twice.

That is an async job. We shouldn't rely on async job when modifying a 
domain.

> 
>>       if (qemuProcessBeginJob(driver, vm, VIR_DOMAIN_JOB_OPERATION_START,
>>                               flags) < 0) {
>>           qemuDomainRemoveInactiveJob(driver, vm);
>> @@ -6976,6 +6983,11 @@ qemuDomainRestoreFlags(virConnectPtr conn,
>>                                      VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE)))
>>           goto cleanup;
>>   
>> +    if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) {
>> +        qemuDomainRemoveInactive(driver, vm);
>> +        goto cleanup;
>> +    }
>> +
>>       virDomainObjAssignDef(vm, def, true, NULL);
>>       def = NULL;
>>   
>> @@ -6988,6 +7000,7 @@ qemuDomainRestoreFlags(virConnectPtr conn,
>>           priv = vm->privateData;
>>           priv->hookRun = true;
>>       }> +    qemuDomainObjEndJob(driver, vm);
> 
> Same here
> 
>>   
>>       if (qemuProcessBeginJob(driver, vm, VIR_DOMAIN_JOB_OPERATION_RESTORE,
>>                               flags) < 0)
>> @@ -7651,6 +7664,11 @@ qemuDomainDefineXMLFlags(virConnectPtr conn,
>>                                      driver->xmlopt, 0)))
>>           goto cleanup;
>>   
>> +    if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) {
>> +        qemuDomainRemoveInactive(driver, vm);
>> +        goto cleanup;
>> +    }
>> +
>>       virDomainObjAssignDef(vm, def, false, &oldDef);
>>       def = NULL;
>>   
>> @@ -7673,7 +7691,7 @@ qemuDomainDefineXMLFlags(virConnectPtr conn,
>>               vm->persistent = 0;
>>               qemuDomainRemoveInactiveJob(driver, vm);
>>           }
>> -        goto cleanup;
>> +        goto endjob;
>>       }
>>   
>>       event = virDomainEventLifecycleNewFromObj(vm,
>> @@ -7685,6 +7703,9 @@ qemuDomainDefineXMLFlags(virConnectPtr conn,
>>       VIR_INFO("Creating domain '%s'", vm->def->name);
>>       dom = virGetDomain(conn, vm->def->name, vm->def->uuid, vm->def->id);
>>   
>> + endjob:
>> +    qemuDomainObjEndJob(driver, vm);
>> +
>>    cleanup:
>>       virDomainDefFree(oldDef);
>>       virDomainDefFree(def);
> 
> Additionally we need to replace qemuDomainRemoveInactiveJob with qemuDomainRemoveInactive
> as we have a job already.

I'm planning on wrapping what you suggested in your reply to the cover 
letter into a single function. This way, the change won't be that 
intrusive and a lot of these changes won't be made.

Michal




More information about the libvir-list mailing list