[libvirt] [PATCH 10/14] Avoid unsafe use of /proc/$PID/root in LXC disk hotplug

Eric Blake eblake at redhat.com
Fri Feb 7 18:22:12 UTC 2014


On 02/07/2014 08:33 AM, Daniel P. Berrange wrote:
> Rewrite lxcDomainAttachDeviceDiskLive function to use the
> virProcessRunInMountNamespace helper. This avoids risk of
> a malicious guest replacing /dev with a absolute symlink,
> tricking the driver into changing the host OS filesystem.
> 
> Signed-off-by: Daniel P. Berrange <berrange at redhat.com>
> ---
>  src/lxc/lxc_driver.c | 183 ++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 139 insertions(+), 44 deletions(-)
> 


> +    }
> +
> +
> +    /* Yes, the device name we're creating may not

Why two blank lines?

> +
> +    if (lxcContainerChown(data->vm->def, data->file) < 0)
> +        goto cleanup;

Hmm.  Calling _() can trigger malloc(), which is not async-safe in
general.  However, at least with glibc, I've never seen malloc deadlock
because of use after multi-threaded fork, and we already have other
spots in the code base that would also need cleaning if we truly want to
avoid malloc between fork and exec.

> +
> +    /* Labelling normally operates on src, but we need
> +     * to actually label the dst here, so hack the config */
> +    switch (data->def->type) {
> +    case VIR_DOMAIN_DEVICE_DISK: {
> +        virDomainDiskDefPtr def = data->def->data.disk;
> +        char *tmpsrc = def->src;
> +        def->src = data->file;
> +        if (virSecurityManagerSetImageLabel(data->driver->securityManager,
> +                                            data->vm->def, def) < 0) {

Wow, that's quite the call stack that gets pulled in, when auditing for
async safety.  In particular, I have no idea if setfilecon_raw() and
getfilecon_raw() are safe.  We might be able to argue that because you
grab virSecurityManagerPreFork, then no other libvirt threads will be
using getfilecon_raw() at the same time; and since this code is only run
in libvirtd, we don't have to worry about non-libvirt threads in the
context of a larger application that linked in libvirt.so.  So I _think_
we are safe.

> +            def->src = tmpsrc;
> +            goto cleanup;
> +        }
> +        def->src = tmpsrc;
> +    }   break;
> +

No real need to put def back to normal - we're in a forked child and
about to exit.  But doesn't hurt either, in case this gets copied and
pasted to something in parent context where it does matter.

> +static int lxcDomainAttachDeviceMknod(virLXCDriverPtr driver,

Style nit for breaking this to two lines, but not a showstopper.


>  
> -    if (lxcContainerChown(vm->def, dst) < 0)
> +    if (VIR_REALLOC_N(vm->def->disks, vm->def->ndisks+1) < 0)

spacing around +


> +    if (lxcDomainAttachDeviceMknod(driver,
> +                                   0700 | S_IFBLK,
> +                                   sb.st_rdev,
> +                                   vm,
> +                                   dev,
> +                                   file) < 0) {
> +        if (virCgroupDenyDevicePath(priv->cgroup, def->src,
> +                                    VIR_CGROUP_DEVICE_RWM) < 0)
> +            VIR_WARN("cannot deny device %s for domain %s",
> +                     def->src, vm->def->name);

This denies by path, while the allow was fixed to be by major:minor,
which means you aren't denying what you thought.  This needs to deny by
major:minor.

Do we need to audit this cleanup action?

>          goto cleanup;
>      }
>  
> @@ -3736,11 +3834,8 @@ lxcDomainAttachDeviceDiskLive(virLXCDriverPtr driver,
>      ret = 0;
>  
>  cleanup:
> -    def->src = tmpsrc;
>      virDomainAuditDisk(vm, NULL, def->src, "attach", ret == 0);

As written, you only audit the grant, not the corresponding deny on
failure.  That's a pre-existing issue, not made worse by your patch, but
something to consider whether it warrants a v2 or a followup patch to fix.

My overall thoughts:

If we had a way to do _just_ the mknod, then open the file, and pass the
fd back to the parent, then do labeling on the fd from the parent
context (rather than on the path in the child context), it would make
for a smaller child action easier to audit.  But I'm not sure that would
get the labeling right - it looks like we have to label the actual path
name in the child.  Or even if selinux took a leaf from openat() and
friends, and gave us the ability to do actions on a name relative to an
fd, then all we'd need to do is fork, change namespace, open the fd of
the container directory, pass that back, then do the remaining options
in the parent, where life is much easier.

But lacking either of those options, I think your approach is the best
we can do.  Please fix the cgroup deny, then I can give a reluctant:

ACK.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 604 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20140207/2c953c87/attachment-0001.sig>


More information about the libvir-list mailing list