[libvirt] [PATCH v4 00/23] Introduce metadata locking

Thu Sep 27 12:04:11 UTC 2018

On 09/27/2018 01:16 PM, Bjoern Walk wrote:
> Michal Privoznik <mprivozn at redhat.com> [2018-09-27, 12:07PM +0200]:
>> On 09/27/2018 11:11 AM, Bjoern Walk wrote:
>>> I still don't understand why we need a timeout at all. If virtlockd is
>>> unable to get the lock, just bail and continue with what you did after
>>> the timeout runs out. Is this some kind of safety-measure? Against what?
>>
>> When there are two threads fighting over the lock. Imagine two threads
>> trying to set security label over the same file. Or imagine two daemons
>> on two distant hosts trying to set label on the same file on NFS.
>> virtlockd implements try-or-fail approach. So we need to call lock()
>> repeatedly until we succeed (or timeout).
> 
> One thread wins and the other fails with lock contention? I don't see
> how repeatedly trying to acquire the lock will improve things, but maybe
> I am not getting it right now.

The point of metadata locking is not to prevent metadata change, but to
be able to atomically change it. As I said in other thread, there is not
much point in this feature alone since chown() and setfcon() are atomic
themselves. But this is preparing the code for original label
remembering which will be stored in XATTRs at which point the whole
operation is not going to be atomic.

https://www.redhat.com/archives/libvir-list/2018-September/msg01349.html

Long story short, at the first chown() libvirt will record the original
owner of the file into XATTRs and then on the last restore it will use
that owner to return the file to instead of root:root. The locks are
there because reading XATTR, parsing it, increasing/decreasing
refcounter and possibly doing chown() is not atomic. But with locks it is.

> 
>>> There IS a lock held on the image, from the FIRST domain that we
>>> started. The second domain, which is using the SAME image, unshared,
>>> runs into the locking timeout. Sorry if I failed to describe this setup
>>> appropriately. I am starting to believe that this is expected behaviour,
>>> although it is not intuitive.
>>>
>>> 	# lslocks -u
>>> 	COMMAND            PID   TYPE SIZE MODE  M START        END PATH
>>> 	...
>>> 	virtlockd       199062  POSIX 1.5G WRITE 0     0          0 /var/lib/libvirt/images/u1604.qcow2
>>
>> But this should not clash, because metadata lock use different bytes:
>> the regular locking uses offset=0, metadata lock uses offset=1. Both
>> locks lock just one byte in length.
> 
> It's the only lock in the output for the image. Should I see the
> metadata lock at start=1 as well?

You should see it if you run lslocks at the right moment. But since
metadata locking happens only for a fraction of a second, it is very
unlikely that you'll be able to run it at the right moment.

> 
>> However, from the logs it looks like virtlockd doesn't try to actually
>> acquire the lock because it found in internal records that the path is
>> locked (even though it is locked with different offset). Anyway, now I
>> am finally able to reproduce the issue and I'll look into this too.
> 
> Good.
> 
>> Quick and dirty patch might look something like this:
>>
>> diff --git i/src/util/virlockspace.c w/src/util/virlockspace.c
>> index 79fa48d365..3c51d7926b 100644
>> --- i/src/util/virlockspace.c
>> +++ w/src/util/virlockspace.c
>> @@ -630,8 +630,9 @@ int virLockSpaceAcquireResource(virLockSpacePtr
>> lockspace,
>>      virMutexLock(&lockspace->lock);
>>
>>      if ((res = virHashLookup(lockspace->resources, resname))) {
>> -        if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) &&
>> -            (flags & VIR_LOCK_SPACE_ACQUIRE_SHARED)) {
>> +        if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED &&
>> +            flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) ||
>> +            start == 1) {
>>
>>              if (VIR_EXPAND_N(res->owners, res->nOwners, 1) < 0)
>>                  goto cleanup;
> 
> Had a quick test with this, but now it seems like that the whole
> metadata locking does nothing. When starting the second domain, I get an
> internal error from QEMU, failing to get the write lock. The SELinux
> labels of the image are changed as well. This is the same behaviour as
> with metadata locking disabled. Not entirely sure what should happen or
> if this is a error in my setup or not. I will have to think about this.

Metadata locking is not supposed to prevent you from running two domains
with the same disk. Nor to prevent chown() on the file.
But since you are using virtlockd to lock the disk contents, it should
prevent running such domains, not qemu. I wonder if this is a
misconfiguration or something. Perhaps one domain has disk RW and the
other has it RO + shared?

Michal