[libvirt] [RFC] secdrivers remembering original labels

Mon Aug 6 08:02:44 UTC 2018

On 07/27/2018 11:01 AM, Daniel P. Berrangé wrote:
> On Fri, Jul 27, 2018 at 09:56:23AM +0200, Michal Privoznik wrote:
>> Dear list,
>>
>> we have this very old bug [1] that I just keep pushing in front of me. I
>> made several attempts to fix it. However, none of them made into the
>> tree. I guess it's time to have discussion what to do about it. IIRC,
>> the algorithm that I implemented last was to keep original label in
>> XATTRs (among with some ref counter) and the last one to restore the
>> label will look there and find the original label. There was a problem
>> with two libvirtds fighting over a file on shared FS.
> 
> IIRC there was a second problem around security of XATTRs. eg, if we set
> an XATTR it was possible for QEMU to change it once we given QEMU privs
> on the image. Thus a malicious QEMU can cause libvirtd to change the
> image to an arbitrary user on shutdown.
> 
> I think it was possible to deal with that on Linux, because the kernel
> hsa some xattr namespacing that is reserved for only root. This protection
> is worthless though when using NFS images as you can't trust the remote NFS
> client to honour the same restriction.
> 
>> So I guess my question is can we come up with a design that would work?
>> Or at least work to the extent that we're satisfied with?
>>
>> Personally, I like the XATTRs approach. And to resolve the NFS race we
>> can invent yet another lockspace to guard labelling - I also have a bug
>> for that [2] (although, I'm not that familiar with lockspaces). I guess
>> doing disk metadata locking is not going to be trivial, is it?
> 
> Yeah, for sure we need two distinct locking regimes. Our current locking
> aims to protect disk content, and the lifetime of the lock is the entire
> duration of the QEMU process. For XATTRs, we only need write protection
> for the short time window in which the security drivers execute, which
> is at start, during hotplug/unplug, during shutdown, and finally migration.
> 
> I think we could achieve this with the virtlockd if we make it use the
> same locking file, but a different byte offset within the file. Just
> need to make sure it doesn't clash with the byte offset QEMU has chosen,
> nor the current offset we use.

So do we really need virtlockd for this? I mean, the whole purpose of it
is to hold locks so that libvirtd can restart independently, without
losing the lock attached. However, since the metadata lock will be held
only for a fraction of a second and will be not held through more than a
single API aren't we just fine with libvirtd holding the lock? The way I
imagine this to work is the following:

lock_for_metadata(vm->def); // locks an unique byte in all the disks

qemuSecuritySetAllLabel();

...

qemuProcessStartCPUs();  // disk content lock is acquired here

unlock_for_metadata(vm->def);

Another way to look at it is: it's libvirtd who relabels all the paths
and the metadata lock should guard just that. Speaking of which, we need
these locks to wait for each other, not just set-or-fail. Again,
something that goes against virtlockd view of locking.

Frankly, I've started implementing this with virtlockd already, but the
changes I made so far simply do not feel right, e.g. I have to change
lock_protocol.x so that virLockSpaceProtocolAcquireResourceArgs has this
'type' argument which tells virtlockd which byte we want to lock.

Michal