[libvirt] [PATCH v2 01/11] security: Always spawn process for transactions

Fri Oct 12 14:20:58 UTC 2018

On 10/12/18 4:45 AM, Michal Privoznik wrote:
> On 10/11/2018 11:06 PM, John Ferlan wrote:
>>
>>
>> On 10/11/18 8:03 AM, Michal Privoznik wrote:
>>> In the next commit the virSecurityManagerMetadataLock() is going
>>> to be turned thread unsafe. Therefore, we have to spawn a
>>> separate process for it. Always.
>>>
>>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>>> ---
>>>  src/security/security_dac.c     | 12 ++++++------
>>>  src/security/security_selinux.c | 12 ++++++------
>>>  2 files changed, 12 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/src/security/security_dac.c b/src/security/security_dac.c
>>> index da4a6c72fe..21db3b9684 100644
>>> --- a/src/security/security_dac.c
>>> +++ b/src/security/security_dac.c
>>> @@ -562,12 +562,12 @@ virSecurityDACTransactionCommit(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED,
>> ^^^^
>>
>> The virSecurityDACTransactionCommit description would need some
>> adjustment.... Well a lot of adjustment... and the caller
>> virSecurityManagerTransactionCommit would also require some adjustment.
>>
>>>          goto cleanup;
>>>      }
>>>  
>>> -    if ((pid == -1 &&
>>> -         virSecurityDACTransactionRun(pid, list) < 0) ||
>>> -        (pid != -1 &&
>>> -         virProcessRunInMountNamespace(pid,
>>> -                                       virSecurityDACTransactionRun,
>>> -                                       list) < 0))
>>> +    if (pid == -1)
>>> +        pid = getpid();
>>> +
>>> +    if (virProcessRunInMountNamespace(pid,
>>> +                                      virSecurityDACTransactionRun,
>>> +                                      list) < 0)
>>
>> As described in the v1 series cover letter, namespaces would need to be
>> enabled for this to work.
> 
> Not really. I mention namespaces just to show that on majority of setups
> we do fork() when relabeling paths anyway. Some might be worried that
> fork()-ing is very expensive and therefore might be against this
> approach. I am trying to prove them otherwise. In fact, this patch is
> just about that - fork() every time, even when namespaces are disabled.

Considering the number of fork's out there anyway, that expense wasn't
part of my consideration. It's a means to an end.

If someone disabled Namespace's for whatever reason, then how would this
path work?

    virProcessRunInMountNamespace => virProcessNamespaceHelper (child)

I get that it's "just" a fork, but my point is more that the namespace
is Namespace and if that's not true, then isn't the premise of the patch
not true?

And no IDC about non Linux (where Namespace wouldn't be available) since
non Linux wouldn't have DAC and/or SELinux either.

Perhaps the light dawning over marblehead moment for me is that the
security driver will use Namespaces similar to how the QEMU driver uses
them, but instead in the context of whatever process/thread that isn't
necessarily the domain process (in this case the libvirtd process).

Since it's been proven that the code works for the QEMU domain
processing, ergo it should "just work" for the libvirtd security driver
processing. That's very different from the commit message above and
perhaps the "assumption" I've been thinking under. There isn't a
"relationship" between the qemu.conf usage of namespaces and what's
being done here except for the libvirtd process as opposed to the qemu-*
process will use fork's and Namespaces.

Something perhaps for both the commit message *and* at least the two
modules that change here to more clearly describe the @pid usage. The
virSecurityManagerTransactionCommit will need adjustment too in order to
indicate that the underlying code will run in the Namespace specified by
the caller rather than the Namespace of the caller (a really vague, but
important difference).

> 
> Again, I have no numbers to support that it's the majority, but since
> namespaces are turned on by default (and no new bugs regarding
> namespaces were to be seen recently) I am assuming that is the case.
> 
>> Whether they are enabled true everywhere,
>> who's to say. I always assumed namespaces were used for a somewhat
>> narrow band of things and didn't pay that close attention to them.
> 
> No. Unless you take the extra step and disable them in qemu.conf you're
> using them. 

Well disabling isn't really well described in the qemu.conf description,
but I'm fairly certain it's by uncommenting the entry and removing the
"mount" from the [ ], leaving just "namespaces = [ ]"

For some more data points using the previously failing guest that now
works with the alteration below.

Running with the "default":

# nsenter -m -t $(pgrep qemu) ls -l /dev | wc -l
16
# ls -l /dev | wc -l
269
#

By changing to "namespaces = [ ]", I get:

# nsenter -m -t $(pgrep qemu) ls -l /dev | wc -l
269
# ls -l /dev | wc -l
269
#

Which I'm not sure I expected, but maybe that's right.

Next, if I run without the changes in this series and adding a couple of
debug messages to show when virProcessNamespaceHelper is called, when
namespaces = [ ] there are no calls to to the helper; whereas, there are
two using the default (I assume one each for DAC and SELinux).

With this series applied, when namespaces = [ ] there are 8 calls to to
the helper (all from the "parent" gdb pid); whereas, for the default the
last 2 come from the domain.

I assume that's all expected (at this point).

> And the fact, that you haven't noticed feels like pat on a
> shoulder. Good job :-) You can verify this by running the following:
> 
> # nsenter -m -t $(pgrep qemu) ls -l /dev
> 
> (works if there's just one qemu process, or just replace pgrep with qemu
> pid)
> 
> You will see very few items in /dev compared to how 'real' /dev looks:
> 
> # nsenter -m -t $(pgrep qemu) ls -l /dev | wc -l
> 15
> # ls -l /dev | wc -l
> 188
> 
>>
>> I will note that will a couple of well placed VIR_WARN()'s in each of
>> these functions, I can see DAC and SELinux transactionCommit's getting
>> called. Before the guest actually starts they're called with -1, then
>> once it starts there's a single call with the domain PID. So I guess I'm
>> using NS and didn't know.
> 
> Yes. However, during the cmd line construction process some files are
> created and qemuSecurityDomainSetPathLabel() is called even if there is
> no qemu running yet. Therefore, security driver sees pid == -1. For
> instance, qemuDomainWriteMasterKeyFile() is such function. But this is
> not a problem (from metadata locking POV).
> 
>>
>> However, looking at the qemu.conf namespace setting I note things like
>> "privileged" and "!defined(HAVE_SYS_ACL_H) || !defined(WITH_SELINUX)"
>> So this all then assumes MountNamespace is being used. And what if it
>> isn't? Does everything that's security related start failing? and the
>> only way to back that out is revert this?
> 
> No. Namespaces have nothing to do with this feature. Namespaces are
> Linux feature and therefore when qemu driver initializes itself it has
> to make decision on the defaults. And currently the default is to enable
> namespaces if supported by host (= Linux && (at least one of {ACL,
> SELinux} was enabled in the build) ). I'm not going to explain that
> process now as it has nothing to do with this feature.
> 
>>
>> I guess, what I don't understand is why this usage pattern cannot be
>> limited to meta locking. That is, if we're metalocking, then add the
>> pid; otherwise, business as usual.  At least that limits the scope.
> 
> Namespaces and metadata locking are orthogonal features.
> 

As I note above, I think I've had the light dawning moment now. I think
prior to this point I've been under the impression that the QEMU domain
setting for namespaces was somehow "affecting" or "related to" what's
being done/changed here. In reality it's making use or reusing the same
code that QEMU uses.

Still the one reservation is - what happens if setns isn't available.
For that I think you need to somehow decide that it isn't available and
just pass -1, unless you have some other grand idea.

Whether you build in some "disable" switch just in case is up to you.
You feel confident that things will just work and I can agree, but you
know there are pessimists that may think otherwise.

John

>>
>> FWIW:
>> In my "limited" testing, I have a guest that failed to start with this
>> code running:
>>
>> # virsh start f23
>> error: Failed to start domain f23
>> error: internal error: Child process (6767) unexpected fatal signal 11
> 
> Ouch. After some debugging I've found the problem. Previously, NULL
> paths were not passed to virSecurityManagerMetadataLock() because they
> were filtered out in DAC and SELinux drivers. But in v2 I've
> de-duplicated the code. And the first thing that
> virSecurityManagerMetadataLock() does is sort the paths (in order to
> avoid AB/BA locking issue). For that, I've used qsort() with plain
> strcmp() which can't handle NULL strings. The fix is quite easy:
> 
> diff --git i/src/security/security_manager.c
> w/src/security/security_manager.c
> index c054c74540..1f3e43b246 100644
> --- i/src/security/security_manager.c
> +++ w/src/security/security_manager.c
> @@ -1257,8 +1257,17 @@ struct _virSecurityManagerMetadataLockState {
>  static int
>  cmpstringp(const void *p1, const void *p2)
>  {
> +    const char *s1 = *(char * const *) p1;
> +    const char *s2 = *(char * const *) p2;
> +
> +    if (!s1 && !s2)
> +        return 0;
> +
> +    if (!s1 || !s2)
> +        return s2 ? -1 : 1;
> +
>      /* from man 3 qsort */
> -    return strcmp(* (char * const *) p1, * (char * const *) p2);
> +    return strcmp(s1, s2);
>  }
> 
>  #define METADATA_OFFSET 1
> 
> >
> I've squashed this into patch #2.
> 
> Michal
>