[libvirt] [PATCH] LXC: create a bind mount for sysfs when enable userns but disable netns

Richard Weinberger richard at nod.at
Wed Mar 11 09:36:05 UTC 2015


Am 11.03.2015 um 03:30 schrieb Chen, Hanxiao:
>>> @@ -826,8 +829,25 @@ static int lxcContainerMountBasicFS(bool userns_enabled)
>>>          bool bindOverReadonly;
>>>          virLXCBasicMountInfo const *mnt = &lxcBasicMounts[i];
>>>
>>> +        /* When enable userns but disable netns, kernel will
>>> +         * forbid us doing a new fresh mount for sysfs.
>>> +         * So we had to do a bind mount for sysfs instead.
>>> +         */
>>> +        if (userns_enabled && netns_disabled &&
>>> +            STREQ(mnt->src, "sysfs")) {
>>> +            if (VIR_STRDUP(mnt_src, "/sys") < 0) {
>>> +                goto cleanup;
>>> +            }
>>
>> This is clearly broken and looks very untested to me.
>>
> It's broken now.
> But when I submitted this patch last year, it's not.

Are you sure?
Just built libvirt v1.2.6-222-ga86b621, head is
commit a86b6215a74b1feb2667204e214fbfd2f7decc5c
Author: Chen Hanxiao <chenhanxiao at cn.fujitsu.com>
Date:   Mon Jul 14 18:01:51 2014 +0800

    LXC: create a bind mount for sysfs when enable userns but disable netns

/sys is still an empty directory but as at this time (most likely due to another bug)
libvirt was able to create /sys/fs/cgroup and mounted groups there.
But no sysfs at all is at /sys.

I mean, how is this supposed to work? You bind mount /sys over /sys...

>> It will issue this mount call:
>> mount("/sys", "/sys", "sysfs", MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_BIND, NULL)
>> because the code runs after pivot_root(2).
>> i.e, /sys will be still empty after that and no sysfs at all there.
>> As libvirt will later remount /sys readonly creating a container will
>> fail with the most useless error message:
>> Error: internal error: guest failed to start: Unable to create
>> directory /sys/fs/: Read-only file system
>> or
>> Error: internal error: guest failed to start: Unable to create
>> directory /sys/fs/cgroup: Read-only file system
>>
>> Please note that changing "/sys" to "/.oldroot/sys" will not solve the
>> issue as this code runs already in the new
>> namespace and therefore the old mount tree is locked, thus MS_BIND is
>> not allowed.
>>
>> This brings me to the question, why do you handle the netns_disabled
>> case anyway?
> 
> Please check the discussion at:
> http://lists.linux-foundation.org/pipermail/containers/2014-July/034721.html
> 
>> If in the XML file no network is specified just create a new and empty
>> network namespace.
>> Bindmounting /sys into the container is a security issue. This is why
>> mounting sysfs without a netns
>> was disabled to begin with.
> 
> Yes, I tried to propose enable netns by default,
> but Dan thought that we should allow containers sharing the host's network:
> http://www.redhat.com/archives/libvir-list/2013-August/msg01025.html
> 
> So we should allow user create containers without netns,
> they should know what they do if they read libvirt's docs
> See docs patch describe security considerations:
> http://www.redhat.com/archives/libvir-list/2013-September/msg00562.html

Thanks for the links!
//richard




More information about the libvir-list mailing list