[libvirt-users] Libvirt-LXC + systemd + user namespace
Daniel P. Berrange
berrange at redhat.com
Wed Jan 29 11:40:22 UTC 2014
On Wed, Jan 29, 2014 at 12:35:25PM +0100, Piotr Bartosiewicz wrote:
>
> On 28.01.2014 12:46, Daniel P. Berrange wrote:
> >On Tue, Jan 28, 2014 at 12:32:41PM +0100, Jan Olszak wrote:
> >>Hi there!
> >>
> >>I am trying to turn on user namespaces by adding the following lines to the
> >>config:
> >>
> >> <idmap>
> >>   <uid start='0' target='0' count='100000'/>
> >>   <gid start='0' target='0' count='100000'/>
> >> </idmap>
> >>
> >>As you can see, root in the container is mapped to root outside. I
> >>expected to see no difference after adding these lines, but unfortunately
> >>there are some (see details below).
> >>
> >>Am I missing something or is there a problem with system, libvirt or kernel?
> >I've not had any chance to try LXC + user namespaces + systemd yet, but
> >based on the list of things which fail, it seems like it might not be
> >detecting that it is inside a container. Seems almost like it has still
> >got the CAP_MKNOD permission and so is trying to start things it should
> >not have like udev, and various filesystems.
> >
> >Daniel
>
> I was able to reduce the problem to a case that uses neither libvirt nor systemd.
>
> I've created a bash process inside a user namespace with the mapping
> root_inside<->root_outside.
> I've used a program from https://lwn.net/Articles/532593/ :
> ./userns_child_exec -U -M '0 0 1' -G '0 0 1' bash
> This program simply calls clone with the CLONE_NEWUSER flag and sets
> the proper uid_map and gid_map.
>
> The test commands are as follows:
> mkdir /test
> mount debugfs /test -t debugfs
>
> and strace shows:
> mount("debugfs", "/test", "debugfs", MS_MGC_VAL, NULL) = -1 EPERM (Operation not permitted)
>
>
> Now the question is:
> Is it a kernel bug or expected behavior, i.e. inside a user namespace we
> always have limited permissions even if uid=0 inside the container is
> mapped to uid=0 outside?

uid==0 inside the container will not have exactly the same
permissions as uid==0 in the host, even when it is mapped to
uid==0 outside. The reason is the way the kernel checks
capabilities. When a syscall requires CAP_SYS_ADMIN, for
example, the kernel will use either capable(CAP_SYS_ADMIN),
which only succeeds in the host (the initial user namespace),
or ns_capable(CAP_SYS_ADMIN), which is allowed to succeed in
the container.

Different filesystems have differing restrictions, but at
this time the vast majority of filesystems require that
capable(CAP_SYS_ADMIN) succeed, and thus you can only
mount them in the host.
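
To make the distinction concrete, here is a rough Python model of the two
checks. It is only a sketch of the idea, not the kernel's code: the class
names, the may_mount() helper and its fs_allows_userns_mount parameter are
made up for illustration, and the real ns_capable() takes the target
namespace as an explicit argument and has further rules (for example for
the owner of a namespace) that are left out here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

CAP_SYS_ADMIN = "CAP_SYS_ADMIN"


@dataclass(frozen=True, eq=False)
class UserNamespace:
    """Hypothetical stand-in for a kernel user namespace."""
    name: str
    parent: Optional["UserNamespace"] = None


# The host runs in the initial user namespace.
INIT_USER_NS = UserNamespace("init")


@dataclass(frozen=True, eq=False)
class Task:
    """Hypothetical stand-in for a process and its capability set."""
    user_ns: UserNamespace
    caps: FrozenSet[str]


def ns_capable(task: Task, target_ns: UserNamespace, cap: str) -> bool:
    """Simplified check: does the task hold `cap` relative to `target_ns`?

    Walk up from the target namespace. The capability only counts if the
    task's own namespace is the target or one of its ancestors.
    """
    ns: Optional[UserNamespace] = target_ns
    while ns is not None:
        if ns is task.user_ns:
            return cap in task.caps
        ns = ns.parent
    return False


def capable(task: Task, cap: str) -> bool:
    """Simplified check against the initial (host) user namespace only."""
    return ns_capable(task, INIT_USER_NS, cap)


def may_mount(task: Task, fs_allows_userns_mount: bool) -> bool:
    """Illustrative only: a filesystem either accepts the namespaced
    check or insists on the host-level one."""
    if fs_allows_userns_mount:
        return ns_capable(task, task.user_ns, CAP_SYS_ADMIN)
    return capable(task, CAP_SYS_ADMIN)


container_ns = UserNamespace("container", parent=INIT_USER_NS)
host_root = Task(INIT_USER_NS, frozenset({CAP_SYS_ADMIN}))
container_root = Task(container_ns, frozenset({CAP_SYS_ADMIN}))

if __name__ == "__main__":
    # A filesystem that insists on capable(), as debugfs did in the test above.
    print("host root:", may_mount(host_root, fs_allows_userns_mount=False))
    print("container root:", may_mount(container_root, fs_allows_userns_mount=False))
```

In this model container_root holds CAP_SYS_ADMIN, but only relative to its
own namespace, so the host-level check fails regardless of how uid 0 is
mapped. That is the same shape as the EPERM from the debugfs mount.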
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|