[libvirt] [PATCH 2/2] virDomainLockManagerAddImage: Recursively lock backing chain

Michal Privoznik mprivozn at redhat.com
Wed Sep 9 08:13:24 UTC 2015


On 08.09.2015 18:04, Daniel P. Berrange wrote:
> On Tue, Sep 08, 2015 at 05:59:45PM +0200, Peter Krempa wrote:
>> On Tue, Sep 08, 2015 at 17:17:19 +0200, Michal Privoznik wrote:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1192399
>>>
>>> It has been a known fact for a while now that we should lock not
>>> only the top layer of a backing chain but the rest of it too. And
>>> it is well known too that we are not doing that. Well, up until
>>> this commit. The reason is that while one guest can have, for
>>> instance, the following backing chain: A (top) -> B -> C (bottom),
>>> another can be started just over B or C. But libvirt should
>>> prevent that, as it will almost certainly mangle the backing chain
>>> for the former guest. On the other hand, we want to allow a guest with the
>>
>> Well, the corruption may still happen: if you run the guest that
>> uses image 'B' for writing without starting the one that uses 'A',
>> then image 'A' will be invalidated anyway.
> 
> Yep, the lock manager won't protect against you doing stupid
> stuff to the base image, which invalidates children, but that's
> out of scope really. We only need to consider the concurrent
> running VM problem here.


Agreed. There's no way for us to see whether the layer we are looking
at is the top layer or not. When a snapshot is created, the backing
layer is not modified, so there are no reverse records pointing at its
children. Unless we want to search every file on the system to see
whether there's an 'A' when starting from 'B', this is not an argument
against my patch.

> 
> 
>>> following backing chain: D (top) -> B -> C (bottom), because in
>>> both cases B and C are there just for reading. In order to
>>> achieve that we must lock the rest of backing chain with
>>> VIR_LOCK_MANAGER_RESOURCE_SHARED flag.
>>
>> See below ...
> 
>>> +
>>> +        if (!src->path)
>>> +            break;
>>> +
>>> +        VIR_DEBUG("Add disk %s", src->path);
>>> +        if (virLockManagerAddResource(lock,
>>> +                                      VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK,
>>> +                                      src->path,
>>> +                                      0,
>>> +                                      NULL,
>>> +                                      diskFlags) < 0) {
>>> +            VIR_DEBUG("Failed to add disk %s", src->path);
>>> +            goto cleanup;
>>> +        }
>>> +
>>> +        /* Now that we've added the first disk (top layer of backing chain)
>>> +         * into the lock manager, let's traverse the rest of the backing chain
>>> +         * and lock each layer for RO. This should prevent accidentally
>>> +         * starting a domain with RW disk from the middle of the chain. */
>>> +        diskFlags = VIR_LOCK_MANAGER_RESOURCE_SHARED;
>>
>> The VIR_LOCK_MANAGER_RESOURCE_SHARED flag means "shared, writable" mode
>> according to lock_driver.h, I think you want to lock it with
>> VIR_LOCK_MANAGER_RESOURCE_READONLY.
> 
> Yep, backing files should be marked readonly - 'shared' will allow
> multiple VMs to write to the image at once. So you've only provided
> mutual exclusion between an exclusive writer, vs a shared writer.

Except that both of our locking drivers treat
VIR_LOCK_MANAGER_RESOURCE_READONLY as a no-op.

But I think that since shared and exclusive writers are mutually
exclusive, it doesn't matter that the lower layers are locked as
shared writers: as soon as somebody wants to lock one of them
exclusively, they will fail. Neither virtlockd nor sanlock actually
enforces an RO vs RW policy (i.e. that no writes are allowed to a
file locked for RO); the locks merely track shared and exclusive
ownership.

Michal



