[libvirt] [PATCH v1 09/23] lock_protocol: Add two new remote procedures

Mon Oct 19 05:49:16 UTC 2015

On 16.10.2015 21:58, John Ferlan wrote:
> 
> 
> On 10/16/2015 10:54 AM, Michal Privoznik wrote:
>> On 16.10.2015 09:17, Peter Krempa wrote:
>>> On Mon, Oct 12, 2015 at 12:25:54 +0200, Michal Privoznik wrote:
>>>> These procedures will be used to store and bring back security
>>>> labels. So far, the idea is that tuple (path, model, label) is
>>>> enough. Well, certainly for DAC and SELinux. The functions are:
>>>
>>> I'm afraid that a 'path' per-se isn't enough to express storage
>>> locations in a unique way. Said that a string is fine though, but I'd
>>> rather call it 'identifier' or something similar and explicitly
>>> document some formats for possibly other storage systems that might use
>>> security labeling but don't have a local representation in the host.
>>>
>>> I'd imagine something like:
>>> STORAGE_SYSTEM_UUID:VOLUME_UUID
>>
>> I can imagine this, okay.
>>
>>> and perhaps
>>> HOST_UUID:/path for local files
>>
>> Hm... what good it will have to store UUID among with the path? I mean,
>> we can't know if two paths from two different hosts is the same file or
>> not anyway.
>>
>>>
>>> One other thing to consider is that if this approach will be used across
>>> multiple hosts the paths although equal string-wise might not result
>>> being the same file. Not honoring that would result into security
>>> implications.
>>
>> What security implications do you have in mind?
>>
>> On the other hand, I just realized that this approach will not fly. I
>> mean, virtlockd is running per host. So consider a file on NFS share,
>> and to make things more complicated assume it's accessible under
>> different paths from two different hosts. There is still a race between
>> those two virtlockd-s - they will not mutually exclude each other and
>> therefore can store already chown()-ed owership info. For instance:
>>
>> host1: stat()
>> host1: store ownership (e.g. 100:20)
>> host1: chown (e.g. to 20:20)
>>
>> host2: stat()
>> host2: store ownership (20:20)

host2: chown (e.g. to 20:40)

>>
>> host1: recall ownership (100:20)
>> host1: restore ownership (100:20)

(this will cut off host2)

>>
>> host2: recall ownership (20:20)
>> host2: restore ownership (20:20)
>>
>> Both hosts think that they are the last ones to use the file, therefore
>> they restore ownerships. However, after all the original ownership is
>> not preserved.
> 
> 
> Ah - the age old problem of networked and shared storage...  And it's
> compounded by two hosts that don't know they are modifying the same
> resource. If realistically the server is the 'only' one to know, then
> who's problem is this to solve?
> 
> Say nothing of choosing NFS...  For this one you have to also consider
> root_squash vs. non root squash environment. The attempt to change
> anything is managed by the server, but for the sake of your argument
> it's set up to allow "no_root_squash" - meaning clients can have more
> control...
> 
> In your scenario, H2 never chown()'s... So was it trying set to (20:20)
> - in which case, why would it do that since the stat it would get would
> show (20:20)... Or does it try to set say (40:20).  If the latter, then
> H1 is going to have access problems as soon as H2 is successful. If H1
> then goes to restore and finds the current stat of (40:20), but it
> believes it set (20:20), then should it restore?  Or does it "try to"
> restore anyway and then let virtlockd hold onto the restore until H2 is
> done? Eventually H2 will restore to (20:20) at which time whatever is
> waiting to restore H1's value finds the (20:20) and can finally
> successfully restore back to (100:20)...  Conversely if H2 did save and
> chown (20:20), but when it goes to restore (20:20) and finds (100:20),
> you wouldn't want some thread hanging around...  However, in this case
> if the "store" == "restore" and the current is something else, then
> perhaps the restore gets dropped.

Oh, yeah. H2 should chown() too. But not necessarily cutting off H1 (it
still can have access via user while H2 via group part of the permission).
And we should not consider just DAC labels here. The framework should be
capable of storing SELinux labels too. In that world, different labels
(or context ranges) do not necessarily mean EPERM like in DAC. Therefore
I think we should not check if  current label is the one we set earlier.

Moreover, even if we do check on restore if the seclabel has changed
since the time we set it and if it has do not restore - in my scenario
the file would has the wrong ownership at the end anyway, wouldn't it? I
mean, at restore, h1 finds the seclabel has changed (it's 20:40 and it
has set it to 20:20), so it leaves the file alone. Then, h2 comes around
and finds the seclabel has NOT changed, so it restores it to what it
thinks was the original label (20:20).

> 
> Maybe a few drinks will help the thought process more ;-)
> 
> John
>>
>> One way out of this could be that we already require virtlockd in order
>> to work properly that lockspace is on a shared mountpoint accessible by
>> both hosts. So instead of keeping seclabel space in memory, we can load
>> & store it within a file on that mountpoint and guard accesses via
>> flock(). But this is rather unpleasant as we would have to parse and
>> format the whole seclabel space just to update a refcounter to some entry.
>>
>> The other approach would be to have a seclabel space as a directory on
>> the shared mountpoint and create some file (with internal structure) per
>> path remembered. The internal structure of the file would then contain
>> tuple (model, seclabel, refcount) for each model. When updating the
>> refcount, only the file would need to be flock()-ed getting us much
>> higher throughput. But we would still need to parse and format a file on
>> each access (even though the file would be much smaller).

I've given this some thoughts and even after few beers (reaching Ballmer
peak) it still seems like good idea.

Michal