[libvirt] [PATCH] nwfilter: fix deadlock caused updating network device and nwfilter
Pavel Hrdina
phrdina at redhat.com
Tue Nov 11 16:34:14 UTC 2014
On 11/11/2014 04:13 PM, John Ferlan wrote:
>
>
> On 11/05/2014 09:02 AM, Pavel Hrdina wrote:
>> Commit 6e5c79a1 tried to fix deadlock between nwfilter{Define,Undefine}
>> and starting of guest, but this same deadlock is also for
>> updating/attaching network device to domain.
>
> The referenced commit has a roadmap of sorts of what the lock is -
> perhaps you could add similar details here for before and after.
In that case I can do copy&paste of that roadmap as it's the same and
this patch only extend the referenced one.
>
> While more of a pain logistically - separating out the patches into
> single changes may make it easier to list the bad locking order followed
> by the order the commit changes.
>
> The ones that are most odd are the places where you take out the lock in
> the middle of a function. That seems to go against the original commit
> which takes them out right after the flags check. In particular that's
> the qemuDomainRestoreFlags and qemuMigrationPrepareAny changes.
The order of the nwfilter lock and flags could be the same in all
functions, that's a good point. My intention was to lock the
nwfilterRead right before the vm is locked because the deadlock is
between the vm and nwfilter objects.
However I don't quiet understand why you would like to split the patch
into a series. All the paths leads to the same nwfilter call
"virDomainConfNWFilterInstantiate" and that's the only function callable
outside of nwfilter that could cause the deadlock.
>
>>
>> The deadlock was introduced by removing global QEMU driver lock because
>> nwfilter was counting on this lock and ensure that all driver locks are
>> locked inside of nwfilter{Define,Undefine}.
>>
>> This patch extends usage of virNWFilterReadLockFilterUpdates to prevent
>> the deadlock for all possible paths in QEMU driver. LXC and UML drivers
>> still have global lock.
>>
>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1143780
>>
>> Signed-off-by: Pavel Hrdina <phrdina at redhat.com>
>> ---
>>
>> This is temporary fix for the deadlock issue as I'm planning to create global
>> libvirt jobs (similar to QEMU domain jobs) and use it for other drivers and
>> for example for nwfilters too.
>>
>> src/qemu/qemu_driver.c | 12 ++++++++++++
>> src/qemu/qemu_migration.c | 3 +++
>> src/qemu/qemu_process.c | 4 ++++
>> 3 files changed, 19 insertions(+)
>>
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index 6acaea8..9e6f505 100644
>
> [1] Usage around virQEMUDriverGetConfig() calls isn't consistent. In
> particular qemuDomainAttachDeviceFlags() will lock/unlock in different
> order.
>
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -5937,6 +5937,8 @@ qemuDomainRestoreFlags(virConnectPtr conn,
>> def = tmp;
>> }
>>
>> + virNWFilterReadLockFilterUpdates();
>> +
>
> So no other locks up to this point are taken? And no need perhaps to
> lock earlier to play follow the leader (eg, the original commit) where
> the lock was taken after the various flags checks.
>
>> if (!(vm = virDomainObjListAdd(driver->domains, def,
>> driver->xmlopt,
>> VIR_DOMAIN_OBJ_LIST_ADD_LIVE |
>> @@ -5978,6 +5980,7 @@ qemuDomainRestoreFlags(virConnectPtr conn,
>> virFileWrapperFdFree(wrapperFd);
>> if (vm)
>> virObjectUnlock(vm);
>> + virNWFilterUnlockFilterUpdates();
>
> So this could get called without ever having set the lock - is that
> correct? We can get to cleanup without calling the ReadLock - probably
> not a good thing since it's indeterminate what happens according to the
> man page.
>
>> return ret;
>> }
>>
>> @@ -7502,6 +7505,8 @@ static int qemuDomainAttachDeviceFlags(virDomainPtr dom, const char *xml,
>>
>> affect = flags & (VIR_DOMAIN_AFFECT_LIVE | VIR_DOMAIN_AFFECT_CONFIG);
>>
>> + virNWFilterReadLockFilterUpdates();
>> +
>
> [1] This one is done after the GetConfig
>
>> if (!(caps = virQEMUDriverGetCapabilities(driver, false)))
>> goto cleanup;
>>
>> @@ -7614,6 +7619,7 @@ static int qemuDomainAttachDeviceFlags(virDomainPtr dom, const char *xml,
>> virObjectUnlock(vm);
>> virObjectUnref(caps);
>> virObjectUnref(cfg);
>
> [1] But the release is done after the cfg unref
>
>> + virNWFilterUnlockFilterUpdates();
>
>> return ret;
>> }
>>
>> @@ -7644,6 +7650,8 @@ static int qemuDomainUpdateDeviceFlags(virDomainPtr dom,
>> VIR_DOMAIN_AFFECT_CONFIG |
>> VIR_DOMAIN_DEVICE_MODIFY_FORCE, -1);
>>
>> + virNWFilterReadLockFilterUpdates();
>> +
>
> [1] This one is done before the GetConfig
>
>> cfg = virQEMUDriverGetConfig(driver);
>>
>> affect = flags & (VIR_DOMAIN_AFFECT_LIVE | VIR_DOMAIN_AFFECT_CONFIG);
>> @@ -7760,6 +7768,7 @@ static int qemuDomainUpdateDeviceFlags(virDomainPtr dom,
>> virObjectUnlock(vm);
>> virObjectUnref(caps);
>> virObjectUnref(cfg);
>> + virNWFilterUnlockFilterUpdates();
>
> [1] Release done after cfg unref
>
>> return ret;
>> }
>>
>> @@ -14510,6 +14519,8 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
>> * and use of FORCE can cause multiple transitions.
>> */
>>
>> + virNWFilterReadLockFilterUpdates();
>> +
>
> [1] This one is done before GetConfig
>
>> if (!(vm = qemuDomObjFromSnapshot(snapshot)))
>> return -1;
>>
>> @@ -14831,6 +14842,7 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
>> virObjectUnlock(vm);
>> virObjectUnref(caps);
>> virObjectUnref(cfg);
>> + virNWFilterUnlockFilterUpdates();
>
> [1] Release done after cfg unref
>
>>
>> return ret;
>> }
>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
>> index 94a4cf6..18242ae 100644
>> --- a/src/qemu/qemu_migration.c
>> +++ b/src/qemu/qemu_migration.c
>> @@ -2666,6 +2666,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver,
>> goto cleanup;
>> }
>>
>> + virNWFilterReadLockFilterUpdates();
>> +
>
> Lots of code before we get here - what's the locking
>
>> if (!(vm = virDomainObjListAdd(driver->domains, *def,
>> driver->xmlopt,
>> VIR_DOMAIN_OBJ_LIST_ADD_LIVE |
>> @@ -2825,6 +2827,7 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver,
>> qemuDomainEventQueue(driver, event);
>> qemuMigrationCookieFree(mig);
>> virObjectUnref(caps);
>> + virNWFilterUnlockFilterUpdates();
>
> Yet another case where we could reach the cleanup without having ever
> obtained the ReadLock
>
>> return ret;
>>
>> stop:
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index 26d4948..409a672 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>
> This one seems OK - although I think I'd make it a separate patch and of
> course explain the lock ordering change
>
>
> John
>> @@ -3422,6 +3422,8 @@ qemuProcessReconnect(void *opaque)
>>
>> VIR_FREE(data);
>>
>> + virNWFilterReadLockFilterUpdates();
>> +
>> virObjectLock(obj);
>>
>> cfg = virQEMUDriverGetConfig(driver);
>> @@ -3573,6 +3575,7 @@ qemuProcessReconnect(void *opaque)
>>
>> virObjectUnref(conn);
>> virObjectUnref(cfg);
>> + virNWFilterUnlockFilterUpdates();
>>
>> return;
>>
>> @@ -3608,6 +3611,7 @@ qemuProcessReconnect(void *opaque)
>> }
>> virObjectUnref(conn);
>> virObjectUnref(cfg);
>> + virNWFilterUnlockFilterUpdates();
>> }
>>
>> static int
>>
More information about the libvir-list
mailing list