[libvirt] [PATCH] LXC: create a bind mount for sysfs when enable userns but disable netns

Richard Weinberger richard.weinberger at gmail.com
Tue Mar 10 20:55:41 UTC 2015


On Mon, Jul 14, 2014 at 12:01 PM, Chen Hanxiao
<chenhanxiao at cn.fujitsu.com> wrote:
> kernel commit 7dc5dbc879bd0779924b5132a48b731a0bc04a1e
> forbid us doing a fresh mount for sysfs
> when enable userns but disable netns.
> This patch will create a bind mount in this scenario.

Sorry for exhuming an already merged patch but today I ran into a
nasty issue caused by it.

> Signed-off-by: Chen Hanxiao <chenhanxiao at cn.fujitsu.com>
> ---
>  src/lxc/lxc_container.c | 44 +++++++++++++++++++++++++++++++++-----------
>  1 file changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
> index 4d89677..8a27215 100644
> --- a/src/lxc/lxc_container.c
> +++ b/src/lxc/lxc_container.c
> @@ -815,10 +815,13 @@ static int lxcContainerSetReadOnly(void)
>  }
>
>
> -static int lxcContainerMountBasicFS(bool userns_enabled)
> +static int lxcContainerMountBasicFS(bool userns_enabled,
> +                                    bool netns_disabled)
>  {
>      size_t i;
>      int rc = -1;
> +    char* mnt_src = NULL;
> +    int mnt_mflags;
>
>      VIR_DEBUG("Mounting basic filesystems");
>
> @@ -826,8 +829,25 @@ static int lxcContainerMountBasicFS(bool userns_enabled)
>          bool bindOverReadonly;
>          virLXCBasicMountInfo const *mnt = &lxcBasicMounts[i];
>
> +        /* When enable userns but disable netns, kernel will
> +         * forbid us doing a new fresh mount for sysfs.
> +         * So we had to do a bind mount for sysfs instead.
> +         */
> +        if (userns_enabled && netns_disabled &&
> +            STREQ(mnt->src, "sysfs")) {
> +            if (VIR_STRDUP(mnt_src, "/sys") < 0) {
> +                goto cleanup;
> +            }

This is clearly broken and looks very untested to me.

It will issue this mount call:
mount("/sys", "/sys", "sysfs", MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_BIND, NULL)
because the code runs after pivot_root(2).
I.e., /sys will still be empty after that call, with no sysfs there at all.
As libvirt later remounts /sys read-only, creating a container
fails with the most useless error message:
Error: internal error: guest failed to start: Unable to create
directory /sys/fs/: Read-only file system
or
Error: internal error: guest failed to start: Unable to create
directory /sys/fs/cgroup: Read-only file system
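
To make the mount call quoted above easier to see, here is a minimal
sketch of how the patched branch ends up picking its arguments. It is
not the libvirt code: struct mount_plan, plan_sysfs_mount() and
is_self_bind() are hypothetical names for illustration, and the default
flags for the sysfs entry are assumed from the mount call shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper type (not in libvirt): the arguments of one mount(2) call. */
struct mount_plan {
    const char *src;
    const char *dst;
    const char *type;
    unsigned long flags;
};

/* Mirrors the patched branch for the "sysfs" entry only.
 * Assumption: the entry's default flags are MS_NOSUID|MS_NODEV|MS_NOEXEC. */
static struct mount_plan plan_sysfs_mount(bool userns_enabled, bool netns_disabled)
{
    struct mount_plan plan = {
        .src = "sysfs",
        .dst = "/sys",
        .type = "sysfs",
        .flags = MS_NOSUID | MS_NODEV | MS_NOEXEC,
    };

    if (userns_enabled && netns_disabled) {
        /* The patch swaps the source to "/sys" and adds MS_BIND. */
        plan.src = "/sys";
        plan.flags |= MS_BIND;
    }
    return plan;
}

/* True when the plan would bind a directory onto itself. After
 * pivot_root(2) that directory is the new root's empty /sys, so such a
 * bind cannot make a sysfs appear there. */
static bool is_self_bind(const struct mount_plan *plan)
{
    assert(plan != NULL);
    return (plan->flags & MS_BIND) != 0 && strcmp(plan->src, plan->dst) == 0;
}
```

With userns enabled and netns disabled, is_self_bind() is true for the
resulting plan, which is the case described above: source and target are
the same empty directory.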

Please note that changing "/sys" to "/.oldroot/sys" will not solve the
issue: this code already runs in the new namespace, so the old mount
tree is locked and MS_BIND is not allowed.

This brings me to the question: why do you handle the netns_disabled
case at all?
If no network is specified in the XML file, just create a new, empty
network namespace.
Bind-mounting /sys into the container is a security issue. This is why
mounting sysfs without a netns was disabled to begin with.
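
A rough sketch of what I mean, under stated assumptions: the function
name container_clone_flags() and its parameters are hypothetical, the
base flag set is only a plausible guess at what a container start path
passes to clone(2), and tying the extra CLONE_NEWNET to userns is one
possible reading of the suggestion, not a finished design.

```c
#include <assert.h>
#include <linux/sched.h>   /* CLONE_NEW* constants from the kernel headers */
#include <signal.h>        /* SIGCHLD */
#include <stdbool.h>

/* Hypothetical helper: compute clone(2) flags for a container.
 * Assumption: the PID, mount, UTS and IPC namespaces are always unshared. */
static int container_clone_flags(bool userns_enabled, bool network_configured)
{
    int flags = CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | SIGCHLD;

    if (userns_enabled)
        flags |= CLONE_NEWUSER;

    /* With a user namespace, always take a fresh (empty) network
     * namespace, even if the XML configures no network. A fresh sysfs
     * mount is then permitted and no bind mount of the host's /sys
     * is needed. */
    if (network_configured || userns_enabled)
        flags |= CLONE_NEWNET;

    /* Invariant of this sketch: userns never shares the host netns. */
    assert(!userns_enabled || (flags & CLONE_NEWNET) != 0);
    return flags;
}
```

In this sketch a container with userns and no configured network gets
both CLONE_NEWUSER and CLONE_NEWNET, so the netns_disabled special case
for sysfs never arises.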

P.S.: Sorry for the grumpy mail, I wasted almost the whole day
debugging that issue.

-- 
Thanks,
//richard



