[PATCH 00/34] fs: idmapped mounts

Thu Oct 29 16:37:23 UTC 2020

Aleksa Sarai <cyphar at cyphar.com> writes:

> On 2020-10-29, Eric W. Biederman <ebiederm at xmission.com> wrote:
>> Christian Brauner <christian.brauner at ubuntu.com> writes:
>> 
>> > Hey everyone,
>> >
>> > I vanished for a little while to focus on this work here so sorry for
>> > not being available by mail for a while.
>> >
>> > For quite a long time we have had issues with sharing mounts between
>> > multiple unprivileged containers with different id mappings, sharing a
>> > rootfs between multiple containers with different id mappings, and also
>> > sharing regular directories and filesystems between users with different
>> > uids and gids. The latter use cases have become even more important with
>> > the availability and adoption of systemd-homed (cf. [1]) to implement
>> > portable home directories.
>> 
>> Can you walk us through the motivating use case?
>> 
>> As of this year's LPC I had the distinct impression that the primary use
>> case for such a feature was due to the RLIMIT_NPROC problem where two
>> containers with the same users still wanted different uid mappings to
>> the disk, because those users were conflicting with each other over
>> the per-user rlimits.
>> 
>> Fixing rlimits is straightforward to implement, and easier to manage
>> for implementations and administrators.
>
> This is separate from the question of "isolated user namespaces" and
> managing different mappings between containers. This patchset is solving
> the same problem that shiftfs solved -- sharing a single directory tree
> between containers that have different ID mappings. Neither rlimits nor
> any of the other proposals we discussed at LPC will help with this problem.

First and foremost: A uid shift on write to a filesystem is a security
bug waiting to happen.  This is especially true in the context of
facilities like io_uring, which play very aggressive games with how
process context makes it to system calls.

The only reason containers were not immediately exploitable when io_uring
was introduced is that the mechanisms are built so that even if
something escapes containment the security properties still apply.
Changing the uid when writing to the filesystem does not have that
property.  The tiniest slip in containment will be a security issue.

This is not even the least bit theoretical.  I have seen reports of how
shiftfs+overlayfs created a situation where anyone could read
/etc/shadow.

If you are going to write using the same uid to disk from different
containers the question becomes why can't those containers configure
those users to use the same kuid?

Fixing rlimits removes one of the reasons that different containers
could not share the same kuid for users that want to write to disk with
the same uid.

I humbly suggest that it will be more secure, and easier to maintain for
both developers and users if we fix the reasons people want different
containers to have the same user running with different kuids.

If not, what are the reasons we fundamentally need the same on-disk user
to map to multiple kuids in the kernel?

Eric