[libvirt] [PATCH v1 09/23] lock_protocol: Add two new remote procedures

John Ferlan jferlan at redhat.com
Mon Oct 19 13:12:02 UTC 2015

On 10/19/2015 01:49 AM, Michal Privoznik wrote:
> On 16.10.2015 21:58, John Ferlan wrote:
>> On 10/16/2015 10:54 AM, Michal Privoznik wrote:
>>> On 16.10.2015 09:17, Peter Krempa wrote:
>>>> On Mon, Oct 12, 2015 at 12:25:54 +0200, Michal Privoznik wrote:
>>>>> These procedures will be used to store and bring back security
>>>>> labels. So far, the idea is that the tuple (path, model, label) is
>>>>> enough. Well, certainly for DAC and SELinux. The functions are:
>>>> I'm afraid that a 'path' per se isn't enough to express storage
>>>> locations in a unique way. That said, a string is fine, but I'd
>>>> rather call it 'identifier' or something similar and explicitly
>>>> document some formats for other storage systems that might use
>>>> security labeling but don't have a local representation in the host.
>>>> I'd imagine something like:
>>> I can imagine this, okay.
>>>> and perhaps
>>>> HOST_UUID:/path for local files
>>> Hm... what good will it do to store the UUID along with the path? I
>>> mean, we can't know whether two paths from two different hosts refer
>>> to the same file anyway.
>>>> One other thing to consider: if this approach is used across
>>>> multiple hosts, paths that are equal string-wise might not refer to
>>>> the same file. Not honoring that would have security implications.
>>> What security implications do you have in mind?
>>> On the other hand, I just realized that this approach will not fly. I
>>> mean, virtlockd runs per host. So consider a file on an NFS share,
>>> and to make things more complicated assume it's accessible under
>>> different paths from two different hosts. There is still a race between
>>> those two virtlockd-s - they will not mutually exclude each other and
>>> can therefore store already chown()-ed ownership info. For instance:
>>> host1: stat()
>>> host1: store ownership (e.g. 100:20)
>>> host1: chown (e.g. to 20:20)
>>> host2: stat()
>>> host2: store ownership (20:20)
> host2: chown (e.g. to 20:40)
>>> host1: recall ownership (100:20)
>>> host1: restore ownership (100:20)
> (this will cut off host2)
>>> host2: recall ownership (20:20)
>>> host2: restore ownership (20:20)
>>> Both hosts think they are the last one to use the file, so each
>>> restores ownership. However, in the end the original ownership is not
>>> preserved.
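The interleaving above can be simulated in a few lines (a Python sketch; ownership is modeled as a (uid, gid) tuple instead of real stat()/chown() calls on a shared file):

```python
# Simulation of the race Michal describes: two hosts save and restore
# ownership of the same shared file without mutually excluding each other.

file_owner = (100, 20)        # original ownership of the shared file

h1_saved = file_owner         # host1: stat() + store ownership (100:20)
file_owner = (20, 20)         # host1: chown() to 20:20

h2_saved = file_owner         # host2: stat() + store ownership (20:20)
file_owner = (20, 40)         # host2: chown() to 20:40 (cuts off host1)

file_owner = h1_saved         # host1: recall + restore ownership (100:20)
file_owner = h2_saved         # host2: recall + restore ownership (20:20)

print(file_owner)             # (20, 20) - the original 100:20 is lost
```

Each host restored exactly what it stored, yet the file ends up owned by neither the original owner nor anyone who "restored" last but one.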
>> Ah - the age old problem of networked and shared storage...  And it's
>> compounded by two hosts that don't know they are modifying the same
>> resource. If realistically the server is the 'only' one to know, then
>> whose problem is this to solve?
>> To say nothing of choosing NFS...  For this one you also have to
>> consider root_squash vs. no_root_squash environments. Any attempt to
>> change anything is managed by the server, but for the sake of your
>> argument it's set up with "no_root_squash" - meaning clients can have
>> more control...
>> In your scenario, H2 never chown()'s... So was it trying to set
>> (20:20) - in which case, why would it do that, since the stat it would
>> get would show (20:20)... Or does it try to set, say, (40:20)?  If the
>> latter, then H1 is going to have access problems as soon as H2 is
>> successful. If H1 then goes to restore and finds the current stat is
>> (40:20), but it believes it set (20:20), then should it restore?  Or
>> does it "try to" restore anyway and let virtlockd hold onto the
>> restore until H2 is done? Eventually H2 will restore to (20:20), at
>> which time whatever is waiting to restore H1's value finds the (20:20)
>> and can finally restore back to (100:20)...  Conversely, if H2 did
>> save and chown (20:20), but when it goes to restore (20:20) it finds
>> (100:20), you wouldn't want some thread hanging around...  However, in
>> this case, if the "store" == "restore" and the current is something
>> else, then perhaps the restore gets dropped.
> Oh, yeah. H2 should chown() too. But not necessarily cutting off H1 (it
> can still have access via the user part while H2 has it via the group
> part of the permissions).
> And we should not consider just DAC labels here. The framework should be
> capable of storing SELinux labels too. In that world, different labels
> (or context ranges) do not necessarily mean EPERM like in DAC. Therefore
> I think we should not check whether the current label is the one we set
> earlier.

Understood that my thoughts were DAC specific... A comparison routine
for the specific driver could be called from a generic spot to determine
whether the label has changed - it was just easier to type it out
thinking only of DAC.

Also, consider that when using DAC with uid:gid mapping, the uid:gid on
the client*s* would need to match the uid:gid on the server, and the NFS
server would need to be configured properly to allow the client*s* to
make changes. Otherwise, there are some perhaps unintended consequences.

> Moreover, even if we do check on restore whether the seclabel has
> changed since the time we set it, and if it has do not restore - in my
> scenario the file would have the wrong ownership at the end anyway,
> wouldn't it? I mean, at restore, h1 finds the seclabel has changed
> (it's 20:40 and it had set it to 20:20), so it leaves the file alone.
> Then, h2 comes around and finds the seclabel has NOT changed, so it
> restores it to what it thinks was the original label (20:20).

The nuance for h1 is: can it, or should it, 'wait' until the label has
reverted to what it set?  Where the wait thread lives I hadn't thought
too much about, but perhaps it would be "no different" than some other
libvirt thread that waits for something and then sends an event when
done. It's not a foolproof method for sure, but I'm throwing spaghetti
at the wall to see if anything sticks.
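A rough sketch of that "park the restore" idea, just to make it concrete (Python; get_label/set_label are stand-ins for stat()/chown() or getfilecon()/setfilecon(), and restore_when_ours is a made-up helper, not anything libvirt has today):

```python
import time

# The file currently carries h2's label; h1 set "20:20" earlier and
# remembered the original "100:20".
current = {"label": "20:40"}

def get_label():
    return current["label"]

def set_label(label):
    current["label"] = label

def restore_when_ours(ours, original, attempts=5, delay=0.01):
    """Restore @original only once the file again carries @ours.

    If another host's label is on the file, park the restore and retry;
    give up after @attempts so no thread hangs around forever."""
    for _ in range(attempts):
        if get_label() == ours:     # nobody else's label is in the way
            set_label(original)
            return True
        time.sleep(delay)           # wait for the other host to restore
    return False                    # give up; leave the file alone

set_label("20:20")                  # h2 restores its saved label first
ok = restore_when_ours("20:20", "100:20")
print(ok, get_label())              # True 100:20
```

As noted above, this only helps for DAC-style labels where "changed" implies someone else is using the file; it also still races if the other host never restores.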

>> Maybe a few drinks will help the thought process more ;-)
>> John
>>> One way out of this could be the fact that we already require, in
>>> order for virtlockd to work properly, that the lockspace is on a
>>> shared mountpoint accessible by both hosts. So instead of keeping the
>>> seclabel space in memory, we can load & store it within a file on
>>> that mountpoint and guard accesses via flock(). But this is rather
>>> unpleasant, as we would have to parse and format the whole seclabel
>>> space just to update a refcounter on some entry.
>>> The other approach would be to have the seclabel space as a directory
>>> on the shared mountpoint and create one file (with internal
>>> structure) per path remembered. The file would then contain a tuple
>>> (model, seclabel, refcount) for each model. When updating the
>>> refcount, only that file would need to be flock()-ed, getting us much
>>> higher throughput. But we would still need to parse and format a file
>>> on each access (even though the file would be much smaller).
> I've given this some thought and even after a few beers (reaching the
> Ballmer peak) it still seems like a good idea.
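Just so we're reasoning about the same thing, the per-path variant could look roughly like this Python sketch - the directory name and the remember/recall helpers are made up for illustration, not libvirt API, and a real implementation would be C in virtlockd:

```python
import fcntl, hashlib, json, os

# Would live on the shared mountpoint both virtlockd-s can see.
SPACE_DIR = "/tmp/seclabelspace"
os.makedirs(SPACE_DIR, exist_ok=True)

def entry_path(path):
    # One file per remembered path; hashing avoids '/' in file names.
    return os.path.join(SPACE_DIR, hashlib.sha256(path.encode()).hexdigest())

def remember(path, model, label):
    """Store (model, label, refcount) for @path, bumping the refcount."""
    with open(entry_path(path), "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # serialize hosts via the shared FS
        f.seek(0)
        data = f.read()
        entry = json.loads(data) if data else {
            "model": model, "label": label, "refcount": 0}
        entry["refcount"] += 1
        f.seek(0)
        f.truncate()
        f.write(json.dumps(entry))      # lock drops when f is closed

def recall(path):
    """Drop one reference; return (model, label) only for the last user."""
    p = entry_path(path)
    with open(p, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        entry = json.loads(f.read())
        entry["refcount"] -= 1
        if entry["refcount"] == 0:
            os.unlink(p)                # last user: restore and forget
            return entry["model"], entry["label"]
        f.seek(0)
        f.truncate()
        f.write(json.dumps(entry))
        return None                     # still in use elsewhere

# Two domains (possibly on two hosts) using the same image:
remember("/var/lib/libvirt/images/a.img", "dac", "100:20")
remember("/var/lib/libvirt/images/a.img", "dac", "100:20")
first = recall("/var/lib/libvirt/images/a.img")   # None - still one user
last = recall("/var/lib/libvirt/images/a.img")    # ("dac", "100:20")
```

Only the one small per-path file is flock()-ed per update, which is the throughput win over a single monolithic seclabel-space file - though whether flock() actually excludes across hosts depends on the shared filesystem honoring it (NFS via lock manager, etc.).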

So a solution to the NFS problem (or other such shared filesystem) is to
carve out space on that file system? If it's NFS and root squashed -
under which uid:gid does that happen? How "large" is it?  How do you
"guarantee" something doesn't come along and reap it or change its
access?

I would think it should be up to the networked file system to provide
the mechanism for secure access, change, etc.  IOW: Is this really
libvirt's problem to solve? I'm not sure every nuance of every target
server file system could be dealt with properly by some generic method.

If someone wants to use NFS and they want this rollback, can we convince
them to run virtlockd on their NFS server, and then use that connection
to manage things if they really care about this label rollback feature?
Are we building something too complex to use and/or describe? Do we
really want to be the "owner" of the security problem?

