[libvirt] [PATCH] nwfilter: fix deadlock caused updating network device and nwfilter

Pavel Hrdina phrdina at redhat.com
Tue Nov 11 16:34:14 UTC 2014


On 11/11/2014 04:13 PM, John Ferlan wrote:
>
>
> On 11/05/2014 09:02 AM, Pavel Hrdina wrote:
>> Commit 6e5c79a1 tried to fix deadlock between nwfilter{Define,Undefine}
>> and starting of guest, but this same deadlock is also for
>> updating/attaching network device to domain.
>
> The referenced commit has a roadmap of sorts of what the lock is -
> perhaps you could add similar details here for before and after.

In that case I can do copy&paste of that roadmap as it's the same and
this patch only extend the referenced one.

>
> While more of a pain logistically - separating out the patches into
> single changes may make it easier to list the bad locking order followed
> by the order the commit changes.
>
> The ones that are most odd are the places where you take out the lock in
> the middle of a function.  That seems to go against the original commit
> which takes them out right after the flags check. In particular that's
> the qemuDomainRestoreFlags and qemuMigrationPrepareAny changes.

The order of the nwfilter lock and flags could be the same in all
functions, that's a good point. My intention was to lock the 
nwfilterRead right before the vm is locked because the deadlock is 
between the vm and nwfilter objects.

However I don't quiet understand why you would like to split the patch 
into a series. All the paths leads to the same nwfilter call 
"virDomainConfNWFilterInstantiate" and that's the only function callable 
outside of nwfilter that could cause the deadlock.

>
>>
>> The deadlock was introduced by removing global QEMU driver lock because
>> nwfilter was counting on this lock and ensure that all driver locks are
>> locked inside of nwfilter{Define,Undefine}.
>>
>> This patch extends usage of virNWFilterReadLockFilterUpdates to prevent
>> the deadlock for all possible paths in QEMU driver. LXC and UML drivers
>> still have global lock.
>>
>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1143780
>>
>> Signed-off-by: Pavel Hrdina <phrdina at redhat.com>
>> ---
>>
>> This is temporary fix for the deadlock issue as I'm planning to create global
>> libvirt jobs (similar to QEMU domain jobs) and use it for other drivers and
>> for example for nwfilters too.
>>
>>   src/qemu/qemu_driver.c    | 12 ++++++++++++
>>   src/qemu/qemu_migration.c |  3 +++
>>   src/qemu/qemu_process.c   |  4 ++++
>>   3 files changed, 19 insertions(+)
>>
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index 6acaea8..9e6f505 100644
>
> [1] Usage around virQEMUDriverGetConfig() calls isn't consistent. In
> particular qemuDomainAttachDeviceFlags() will lock/unlock in different
> order.
>
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -5937,6 +5937,8 @@ qemuDomainRestoreFlags(virConnectPtr conn,
>>           def = tmp;
>>       }
>>
>> +    virNWFilterReadLockFilterUpdates();
>> +
>
> So no other locks up to this point are taken? And no need perhaps to
> lock earlier to play follow the leader (eg, the original commit) where
> the lock was taken after the various flags checks.
>
>>       if (!(vm = virDomainObjListAdd(driver->domains, def,
>>                                      driver->xmlopt,
>>                                      VIR_DOMAIN_OBJ_LIST_ADD_LIVE |
>> @@ -5978,6 +5980,7 @@ qemuDomainRestoreFlags(virConnectPtr conn,
>>       virFileWrapperFdFree(wrapperFd);
>>       if (vm)
>>           virObjectUnlock(vm);
>> +    virNWFilterUnlockFilterUpdates();
>
> So this could get called without ever having set the lock - is that
> correct?  We can get to cleanup without calling the ReadLock - probably
> not a good thing since it's indeterminate what happens according to the
> man page.
>
>>       return ret;
>>   }
>>
>> @@ -7502,6 +7505,8 @@ static int qemuDomainAttachDeviceFlags(virDomainPtr dom, const char *xml,
>>
>>       affect = flags & (VIR_DOMAIN_AFFECT_LIVE | VIR_DOMAIN_AFFECT_CONFIG);
>>
>> +    virNWFilterReadLockFilterUpdates();
>> +
>
> [1] This one is done after the GetConfig
>
>>       if (!(caps = virQEMUDriverGetCapabilities(driver, false)))
>>           goto cleanup;
>>
>> @@ -7614,6 +7619,7 @@ static int qemuDomainAttachDeviceFlags(virDomainPtr dom, const char *xml,
>>           virObjectUnlock(vm);
>>       virObjectUnref(caps);
>>       virObjectUnref(cfg);
>
> [1] But the release is done after the cfg unref
>
>> +    virNWFilterUnlockFilterUpdates();
>
>>       return ret;
>>   }
>>
>> @@ -7644,6 +7650,8 @@ static int qemuDomainUpdateDeviceFlags(virDomainPtr dom,
>>                     VIR_DOMAIN_AFFECT_CONFIG |
>>                     VIR_DOMAIN_DEVICE_MODIFY_FORCE, -1);
>>
>> +    virNWFilterReadLockFilterUpdates();
>> +
>
> [1] This one is done before the GetConfig
>
>>       cfg = virQEMUDriverGetConfig(driver);
>>
>>       affect = flags & (VIR_DOMAIN_AFFECT_LIVE | VIR_DOMAIN_AFFECT_CONFIG);
>> @@ -7760,6 +7768,7 @@ static int qemuDomainUpdateDeviceFlags(virDomainPtr dom,
>>           virObjectUnlock(vm);
>>       virObjectUnref(caps);
>>       virObjectUnref(cfg);
>> +    virNWFilterUnlockFilterUpdates();
>
> [1] Release done after cfg unref
>
>>       return ret;
>>   }
>>
>> @@ -14510,6 +14519,8 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
>>        * and use of FORCE can cause multiple transitions.
>>        */
>>
>> +    virNWFilterReadLockFilterUpdates();
>> +
>
> [1] This one is done before GetConfig
>
>>       if (!(vm = qemuDomObjFromSnapshot(snapshot)))
>>           return -1;
>>
>> @@ -14831,6 +14842,7 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
>>           virObjectUnlock(vm);
>>       virObjectUnref(caps);
>>       virObjectUnref(cfg);
>> +    virNWFilterUnlockFilterUpdates();
>
> [1] Release done after cfg unref
>
>>
>>       return ret;
>>   }
>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
>> index 94a4cf6..18242ae 100644
>> --- a/src/qemu/qemu_migration.c
>> +++ b/src/qemu/qemu_migration.c
>> @@ -2666,6 +2666,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver,
>>               goto cleanup;
>>       }
>>
>> +    virNWFilterReadLockFilterUpdates();
>> +
>
> Lots of code before we get here - what's the locking
>
>>       if (!(vm = virDomainObjListAdd(driver->domains, *def,
>>                                      driver->xmlopt,
>>                                      VIR_DOMAIN_OBJ_LIST_ADD_LIVE |
>> @@ -2825,6 +2827,7 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver,
>>           qemuDomainEventQueue(driver, event);
>>       qemuMigrationCookieFree(mig);
>>       virObjectUnref(caps);
>> +    virNWFilterUnlockFilterUpdates();
>
> Yet another case where we could reach the cleanup without having ever
> obtained the ReadLock
>
>>       return ret;
>>
>>    stop:
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index 26d4948..409a672 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>
> This one seems OK - although I think I'd make it a separate patch and of
> course explain the lock ordering change
>
>
> John
>> @@ -3422,6 +3422,8 @@ qemuProcessReconnect(void *opaque)
>>
>>       VIR_FREE(data);
>>
>> +    virNWFilterReadLockFilterUpdates();
>> +
>>       virObjectLock(obj);
>>
>>       cfg = virQEMUDriverGetConfig(driver);
>> @@ -3573,6 +3575,7 @@ qemuProcessReconnect(void *opaque)
>>
>>       virObjectUnref(conn);
>>       virObjectUnref(cfg);
>> +    virNWFilterUnlockFilterUpdates();
>>
>>       return;
>>
>> @@ -3608,6 +3611,7 @@ qemuProcessReconnect(void *opaque)
>>       }
>>       virObjectUnref(conn);
>>       virObjectUnref(cfg);
>> +    virNWFilterUnlockFilterUpdates();
>>   }
>>
>>   static int
>>




More information about the libvir-list mailing list