[libvirt] [PATCH v2] Add some notes about security considerations when using LXC

Thu Sep 12 03:22:18 UTC 2013

> -----Original Message-----
> From: Daniel P. Berrange [mailto:berrange at redhat.com]
> Sent: Wednesday, September 11, 2013 6:57 PM
> To: libvir-list at redhat.com
> Cc: Gao feng; Chen Hanxiao; Daniel P. Berrange
> Subject: [PATCH v2] Add some notes about security considerations when using LXC
> 
> From: "Daniel P. Berrange" <berrange at redhat.com>
> 
> Describe some of the issues to be aware of when configuring LXC
> guests with security isolation as a goal.
> 
> Signed-off-by: Daniel P. Berrange <berrange at redhat.com>
> ---
>  docs/drvlxc.html.in | 103 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 103 insertions(+)
> 
> In v2:
> 
>  - Clarify UNIX domain socket issues wrt filesystem & network namespaces
> 
> diff --git a/docs/drvlxc.html.in b/docs/drvlxc.html.in
> index 1e6aa1d..66d97e4 100644
> --- a/docs/drvlxc.html.in
> +++ b/docs/drvlxc.html.in
> @@ -168,6 +168,109 @@ Further block or character devices will be made available to containers
>  depending on their configuration.
>  </p>
> 
> +<h2><a name="security">Security considerations</a></h2>
> +
> +<p>
> +The libvirt LXC driver is fairly flexible in how it can be configured,
> +and as such does not enforce a requirement for strict security
> +separation between a container and the host. This allows it to be used
> +in scenarios where only resource control capabilities are important,
> +and resource sharing is desired. Applications wishing to ensure secure
> +isolation between a container and the host must ensure that they are
> +writing a suitable configuration.
> +</p>
> +
> +<h3><a name="securenetworking">Network isolation</a></h3>
> +
> +<p>
> +If the guest configuration does not list any network interfaces,
> +the <code>network</code> namespace will not be activated, and thus
> +the container will see all the host's network interfaces. This will
> +allow apps in the container to bind to/connect from TCP/UDP addresses
> +and ports from the host OS. It also allows applications to access
> +UNIX domain sockets associated with the host OS, which are in the
> +abstract namespace. If access to UNIX domain sockets in the abstract
> +namespace is not wanted, then applications should set the
> +<code>&lt;privnet/&gt;</code> flag in the
> +<code>&lt;features&gt;....&lt;/features&gt;</code> element.
> +</p>
> +

This section is very detailed and helpful for developers, but sysadmins may
not be aware of issues like reboot.
Some warning about the reboot issue may still be needed for sysadmins.

How about keeping the description from the v1 patch:
The lack of a <code>network</code> namespace would allow <code>root</code>
in the container to do anything, including shutting down the host OS.
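
For reference, here is a minimal sketch of where the flag from the quoted
paragraph sits in a guest definition. The domain name, memory size and init
path are placeholders chosen for illustration, not values from the patch:

```xml
<!-- Hypothetical guest: name, memory and init are placeholders -->
<domain type='lxc'>
  <name>demo</name>
  <memory unit='KiB'>524288</memory>
  <os>
    <type>exe</type>
    <init>/bin/sh</init>
  </os>
  <features>
    <!-- Request a private network namespace even though
         no <interface> devices are listed below -->
    <privnet/>
  </features>
  <devices>
    <console type='pty'/>
  </devices>
</domain>
```

With the flag set, the container no longer shares the host's network
namespace, so it should not see the host's interfaces or its abstract UNIX
domain sockets.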

Thanks

> +<h3><a name="securefs">Filesystem isolation</a></h3>
> +
> +<p>
> +If the guest configuration does not list any filesystems, then
> +the container will be set up with a root filesystem that matches
> +the host's root filesystem. As noted earlier, only a few locations
> +such as <code>/dev</code>, <code>/proc</code> and <code>/sys</code>
> +will be altered. This means that, in the absence of restrictions
> +from sVirt, a process running as user/group N:M inside the container
> +will be able to access almost exactly the same files as a process
> +running as user/group N:M in the host.
> +</p>
> +
> +<p>
> +There are multiple options for restricting this. It is possible to
> +simply map the existing root filesystem through to the container in
> +read-only mode. Alternatively a completely separate root filesystem
> +can be configured for the guest. In both cases, further sub-mounts
> +can be applied to customize the content that is made visible. Note
> +that in the absence of sVirt controls, it is still possible for the
> +root user in a container to unmount any sub-mounts applied. The user
> +namespace feature can also be used to restrict access to files based
> +on the UID/GID mappings.
> +</p>
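
A sketch of the two options described above, assuming the usual
<filesystem> device syntax. Only one root mapping would be used in a given
guest, and the directory /var/lib/containers/demo/rootfs is a made-up path:

```xml
<!-- Option 1: pass the host root through to the container, read-only -->
<filesystem type='mount'>
  <source dir='/'/>
  <target dir='/'/>
  <readonly/>
</filesystem>

<!-- Option 2: use a completely separate root filesystem instead
     (hypothetical path on the host) -->
<filesystem type='mount'>
  <source dir='/var/lib/containers/demo/rootfs'/>
  <target dir='/'/>
</filesystem>
```

Either element goes inside <devices>. As the quoted text notes, without
sVirt or a user namespace, root in the container can still unmount any
additional sub-mounts layered on top.
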
> +
> +<p>
> +Sharing the host filesystem tree also allows applications to access
> +UNIX domain sockets associated with the host OS, which are in the
> +filesystem namespace. It should be noted that a number of init
> +systems including at least <code>systemd</code> and <code>upstart</code>
> +have UNIX domain sockets which are used to control their operation.
> +Thus, if the directory/filesystem holding their UNIX domain socket is
> +exposed to the container, it will be possible for a user in the container
> +to invoke operations on the init service in the same way it could if
> +outside the container. This also applies to other applications in the
> +host which use UNIX domain sockets in the filesystem, such as DBus,
> +libvirtd, and many more. If this is not desired, then applications
> +should either specify the UID/GID mapping in the configuration to
> +enable user namespaces &amp; thus block access to the UNIX domain socket
> +based on permissions, or should ensure the relevant directories have
> +a bind mount to hide them. This is particularly important for the
> +<code>/run</code> or <code>/var/run</code> directories.
> +</p>
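
One way the bind mount mentioned above could look, as a rough sketch. The
source directory is a hypothetical, initially empty directory on the host
that exists only to cover the host's /run:

```xml
<!-- Goes inside <devices>. Mounts a private directory over /run so the
     host's init, DBus and libvirtd sockets are not visible in the guest.
     /var/lib/containers/demo/run is an assumed, made-up path. -->
<filesystem type='mount'>
  <source dir='/var/lib/containers/demo/run'/>
  <target dir='/run'/>
</filesystem>
```

If /var/run is a separate directory rather than a symlink to /run on the
host in question, it would need the same treatment.
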
> +
> +
> +<h3><a name="secureusers">User and group isolation</a></h3>
> +
> +<p>
> +If the guest configuration does not list any ID mapping, then the
> +user and group IDs used inside the container will match those used
> +outside the container. In addition, the capabilities associated with
> +a process in the container will confer the same privileges they would
> +for a process in the host. This has obvious implications for security,
> +since a root user inside the container will be able to access any
> +file owned by root that is visible to the container, and perform more
> +or less any privileged kernel operation. In the absence of additional
> +protection from sVirt, this means that the root user inside a container
> +is effectively as powerful as the root user in the host. There is no
> +security isolation of the root user.
> +</p>
> +
> +<p>
> +The ID mapping facility was introduced to allow for stricter control
> +over the privileges of users inside the container. It allows apps to
> +define rules such as "user ID 0 in the container maps to user ID 1000
> +in the host". In addition the privileges associated with capabilities
> +are somewhat reduced so that they cannot be used to escape from the
> +container environment. A full description of user namespaces is outside
> +the scope of this document; however, LWN has
> +<a href="https://lwn.net/Articles/532593/">a good write-up on the topic</a>.
> +From the libvirt point of view, the key thing to remember is that defining
> +an ID mapping for users and groups in the container XML configuration
> +causes libvirt to activate the user namespace feature.
> +</p>
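
The rule quoted above ("user ID 0 in the container maps to user ID 1000 in
the host") could be written roughly as follows. The count of 10 is an
arbitrary value picked for the example:

```xml
<!-- Child of <domain>. Container IDs 0-9 map to host IDs 1000-1009;
     the range size (count) is an illustrative choice. -->
<idmap>
  <uid start='0' target='1000' count='10'/>
  <gid start='0' target='1000' count='10'/>
</idmap>
```

Here start is the first ID as seen inside the container, target is the
corresponding first ID on the host, and count is how many consecutive IDs
are mapped. Per the quoted text, the presence of this element is what
causes libvirt to activate the user namespace.
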
> +
> +
>  <h2><a name="activation">Systemd Socket Activation Integration</a></h2>
> 
>  <p>
> --
> 1.8.3.1