[PATCH 00/34] fs: idmapped mounts

Thu Oct 29 16:12:31 UTC 2020

Hi Eric,

On Thu, Oct 29, 2020 at 10:47:49AM -0500, Eric W. Biederman wrote:
> Christian Brauner <christian.brauner at ubuntu.com> writes:
> 
> > Hey everyone,
> >
> > I vanished for a little while to focus on this work here so sorry for
> > not being available by mail for a while.
> >
> > For quite a long time we have had issues with sharing mounts between
> > multiple unprivileged containers with different id mappings, sharing a
> > rootfs between multiple containers with different id mappings, and also
> > sharing regular directories and filesystems between users with different
> > uids and gids. The latter use-cases have become even more important with
> > the availability and adoption of systemd-homed (cf. [1]) to implement
> > portable home directories.
> 
> Can you walk us through the motivating use case?
> 
> As of this year's LPC I had the distinct impression that the primary use
> case for such a feature was due to the RLIMIT_NPROC problem where two
> containers with the same users still wanted different uid mappings to
> the disk because the users were conflicting with each other because of
> the per user rlimits.
> 
> Fixing rlimits is straightforward to implement, and easier to manage
> for implementations and administrators.

Our use case is to have the same directory exposed to several
different containers which each have disjoint ID mappings.

> Reading up on systemd-homed it appears to be a way to have encrypted
> home directories.  Those home directories can either be encrypted at the
> fs or at the block level.  Those home directories appear to have the
> goal of being luggable between systems.  If the systems in question
> don't have common administration of uids and gids after lugging your
> encrypted home directory to another system chowning the files is
> required.
> 
> Is that the use case you are looking at: removing the need for
> systemd-homed to chown after lugging encrypted home directories
> from one system to another?  Why would it be desirable to avoid the
> chown?

Not just systemd-homed: LXD has to do this too, as does our
application at Cisco, and presumably others.

Several reasons:

* the chown is slow
* the chown requires somewhere to write the delta in metadata (e.g. an
  overlay workdir, or an LV or something), and there are N copies of
  this delta, one for each container.
* it means we need to have a writable filesystem at some point during
  execution.
* it's ugly :). Conceptually, the kernel solves the uid shifting
  problem for us for most other kernel subsystems (including in a
  limited way fscaps) by configuring a user namespace. It feels like
  we should be able to do the same with the VFS.
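
To make the disjoint-mapping case concrete, here is a rough sketch of
the translation a user namespace does, using the same
"inside outside count" triples that /proc/<pid>/uid_map exposes. The
helper names and the ID ranges below are made up for illustration; this
is not code from the patch set.

```python
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    inside: int   # first ID as seen inside the user namespace
    outside: int  # first ID as seen on the host (and on disk)
    count: int    # number of IDs in the range


def to_host(idmap: List[Extent], inside_id: int) -> Optional[int]:
    """Translate a namespace ID to a host ID, or None if unmapped."""
    for e in idmap:
        if e.inside <= inside_id < e.inside + e.count:
            return e.outside + (inside_id - e.inside)
    return None


def from_host(idmap: List[Extent], host_id: int) -> Optional[int]:
    """Translate a host ID to a namespace ID, or None if unmapped."""
    for e in idmap:
        if e.outside <= host_id < e.outside + e.count:
            return e.inside + (host_id - e.outside)
    return None


# Two containers with disjoint mappings (ranges are assumptions).
container_a = [Extent(inside=0, outside=100000, count=65536)]
container_b = [Extent(inside=0, outside=200000, count=65536)]

# uid 1000 in each container is a different uid on the host.
print(to_host(container_a, 1000))      # 101000
print(to_host(container_b, 1000))      # 201000

# A file owned by host uid 101000 is uid 1000 in A but unmapped in B,
# which is why a shared directory needs a chown (or a copy) per container.
print(from_host(container_a, 101000))  # 1000
print(from_host(container_b, 101000))  # None
```

Without some per-mount shifting in the VFS, the only way to make that
file look like uid 1000 in both containers is to keep two differently
owned copies of it.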

Tycho