[libvirt] [PATCH] lxc: Cleaning up mount setup

Daniel P. Berrange berrange at redhat.com
Thu Jan 8 14:06:49 UTC 2015

On Thu, Jan 08, 2015 at 03:02:59PM +0100, Richard Weinberger wrote:
> Am 08.01.2015 um 14:45 schrieb Daniel P. Berrange:
> > On Thu, Jan 08, 2015 at 02:36:36PM +0100, Richard Weinberger wrote:
> >> Am 08.01.2015 um 14:02 schrieb Daniel P. Berrange:
> >>> We have historically done a number of things with LXC that are
> >>> somewhat questionable in retrospect
> >>>
> >>>  1. Mounted /proc/sys read-only, but then mounted
> >>>     /proc/sys/net/ipv* read-write again
> >>>  2. Mounted /sys read-only
> >>>  3. Mounted /sys/fs/cgroup/NNN/the/guest/dir to /sys/fs/cgroup/NNN
> >>>  4. FUSE mount on /proc/meminfo
> >>>
> >>> Items 1 & 2 are pointless as they offer no security benefit either
> >>> with or without user namespaces. Without userns it is always insecure,
> >>> with userns it is always secure, no matter what the mount state is.
> >>
> >> I agree. Thanks a lot for addressing this, Daniel!
> >>
> >>> Item 3 is somewhat dubious, since /proc/self/cgroup paths for
> >>> processes are now not visible at /sys/fs/cgroup. This really
> >>> confuses systemd inside the container making it create a broken
> >>> layout
> >>
> >> The question is, how to support systemd in containers?
> >>
> >> As of now I'm not aware of a working concept.
> >> With current libvirt it kind of works but recently I found a very nasty issue:
> >> See: https://www.redhat.com/archives/libvir-list/2014-November/msg01090.html
> > 
> > That reply from Lennart suggests systemd should pretty much work,
> > albeit in a hacky way.
> What hack do you mean?

Lennart's reply detailing their workaround hacks:

"Our current strategy for still being able to clean everything up is

 [snip details]

 Complex? Awful? Disgusting? Yes, absolutely. But as far as I can see
 it should actually be good enough to all cases I ran into."

> > I've not done much in anger with systemd in containers, but I have
> > found it sufficient for application containers - ie not full OS
> > containers with interactive sessions.
> My use case is different. I need most of the time at least an init.
> And if the distro is systemd based....

When I said application containers there, I meant running with systemd,
but set up so it only runs a specific set of unit files, just enough to launch
the desired app, rather than running the full default Fedora OS unit

> >> Can we have a new machine type which enforces user namespaces?
> > 
> > Hmm, I'm not sure that would work. Not least because we need a way to
> > assume the UID/GID mapping, and the filesystems used with the container
> > need to have the right UID/GID permissions set up. IOW I don't think
> > user ns is something we can transparently / automatically turn on.
> Yeah but we have to warn the user that she is doing something insecure
> if no mappings are set up.

Ultimately I think that's a docs problem, or something that a higher level
app needs to deal with. e.g. OpenStack should set up LXC such that user
namespaces are unconditionally enabled all the time, even if that's not
the case in libvirt itself. OpenStack manages the whole machine, so it
has enough context to do the setup that libvirt cannot do.
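
As a rough illustration of the kind of check a higher level app could do
before starting a guest, here is a minimal Python sketch. It assumes the
container is described by libvirt domain XML and that user namespaces are
requested through an <idmap> element with <uid> and <gid> children. The
function name missing_idmap and the sample documents are hypothetical and
are not part of libvirt or OpenStack.

```python
import xml.etree.ElementTree as ET


def missing_idmap(domain_xml):
    """Return the ID map kinds ('uid', 'gid') absent from a domain XML string.

    Hypothetical helper: an empty list means both mappings are configured,
    so the guest would run inside a user namespace.
    """
    root = ET.fromstring(domain_xml)
    idmap = root.find("idmap")
    missing = []
    for kind in ("uid", "gid"):
        # A mapping counts as present only if <idmap> has a matching child.
        if idmap is None or idmap.find(kind) is None:
            missing.append(kind)
    return missing


# Sample documents, made up for illustration only.
WITH_USERNS = """
<domain type='lxc'>
  <name>demo</name>
  <idmap>
    <uid start='0' target='100000' count='65536'/>
    <gid start='0' target='100000' count='65536'/>
  </idmap>
</domain>
"""

WITHOUT_USERNS = "<domain type='lxc'><name>demo</name></domain>"

if __name__ == "__main__":
    for label, xml in (("with", WITH_USERNS), ("without", WITHOUT_USERNS)):
        gaps = missing_idmap(xml)
        if gaps:
            print(label, "-> warning: no mapping for", ", ".join(gaps))
        else:
            print(label, "-> user namespace mappings configured")
```

A management layer could refuse to define the guest, or just log a
warning, when the returned list is not empty. Whether the mapped host
IDs actually match the ownership of the container filesystem is a
separate question that this sketch does not try to answer.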

|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
