[libvirt] [PATCHv2] lxc: give RW access to /proc/sys/net/ipv[46] to containers

Richard Weinberger richard at nod.at
Sat Dec 13 22:27:17 UTC 2014


On 12.12.2014 at 10:33, Daniel P. Berrange wrote:
> On Thu, Dec 11, 2014 at 10:06:40PM +0100, Richard Weinberger wrote:
>> On Tue, Dec 9, 2014 at 10:47 AM, Cédric Bosdonnat <cbosdonnat at suse.com> wrote:
>>> Some programs want to change some values for the network interfaces
>>> configuration in /proc/sys/net/ipv[46] folders. Giving RW access on them
>>> allows wicked to work on openSUSE 13.2+.
>>>
>>> In order to mount those folders RW but keep the rest of /proc/sys RO,
>>> we add temporary mounts for these folders before bind-mounting
>>> /proc/sys. Those mounts will be skipped if the container doesn't have
>>> its own network namespace.
>>>
>>> It may happen that one of the temporary mounts in /proc/ filesystem
>>> isn't available due to a missing kernel feature. We need not to fail
>>> in that case.
>>
>> IMHO we should drop the read-only /proc mount completely.
>> The idea behind having a read-only /proc was to make a container less
>> insecure because user namespaces did not exist yet.
> 
> Yep, read-only /proc was a (failed) attempt to predict the future - we
> originally expected we'd need that even when user namespaces arrived,
> but of course in the end it was a waste of time.

Correct. Let's cut our losses and not add more code. :-)

>> Now as user namespaces are mainline and considered stable we should
>> start dropping such hacks
>> instead of adding more of them.
> 
> I'm trying to think if there are any backwards compatibility problems
> if we got rid of read-only /proc but I can't imagine any app out there
> is actively checked for a read-only /proc, so we'd probably be safe
> to just switch it read-write.

Same here.
I'd be astonished if any application broke because /proc became rw.
BTW: While we are here, let's make /sys also rw.
Again, if an application can do bad things through a read-write /sys,
that is plainly a kernel bug.

>> As consequence of that libvirt has to decide what kind of container it
>> wants to support.
>> IMHO the only sane way is to enforce user namespaces to provide
>> reasonable isolation.
>> If an user can do bad things with a read-write /proc it need to be
>> fixed in the kernel
>> and not in libvirt.
>>
>> Containers without user namespaces and a root within are insecure and
>> broken by design.
> 
> Well addition of MAC can make them secure, but of course if you have
> MAC, there's again no need to make /proc mount read-only.

The MAC policy has to be *perfect* and has to use whitelisting.
Also, if you make your MAC policy too restrictive you'll break certain programs.
You need more than just denying access to some magic files in /sys and /proc.
If you deny mount(2), for example, many applications will break, most notably systemd.

I propose the following:
a) Make /sys and /proc read-write
b) If one creates a container without any uid/gid mapping, print a big fat warning
that such a container is not suitable for hostile guests.
If the user has a specific use case where he can trust all guests, fine. But we
have to document it clearly.
Maybe a new config flag a la <i_know_what_i_m_doing/> would help too. ;-)
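
To illustrate (b), here is a rough sketch of the check, written in Python
rather than libvirt's C. It assumes the container's domain XML declares its
user namespace mapping in an <idmap> element with <uid> and <gid> children.
The function names and the warning text are made up for illustration and are
not libvirt API.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def has_id_mapping(domain_xml: str) -> bool:
    """Return True if the domain XML declares both a uid and a gid mapping.

    Assumption: the mapping lives in <idmap><uid .../><gid .../></idmap>
    directly under the root <domain> element.
    """
    root = ET.fromstring(domain_xml)
    idmap = root.find("idmap")
    if idmap is None:
        return False
    return idmap.find("uid") is not None and idmap.find("gid") is not None


def idmap_warning(domain_xml: str) -> Optional[str]:
    """Return a warning string for containers without uid/gid mapping.

    Returns None when a mapping is present. The wording is hypothetical.
    """
    if has_id_mapping(domain_xml):
        return None
    return ("WARNING: container has no uid/gid mapping; "
            "it is not suitable for hostile guests")
```

A caller would pass the domain XML as a string and print the returned
message, if any, when the container is defined or started.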

Thanks,
//richard



