[libvirt] [FW: An introduction to libvirt's LXC (LinuX Container) support]
Daniel P. Berrange
berrange at redhat.com
Wed Sep 17 15:35:09 UTC 2008
FYI, this is a mail I just sent to containers at lists.linux-foundation.org
where all the kernel container developers hang out.
Daniel
----- Forwarded message from "Daniel P. Berrange" <berrange at redhat.com> -----
> Date: Wed, 17 Sep 2008 16:06:35 +0100
> From: "Daniel P. Berrange" <berrange at redhat.com>
> To: containers at lists.linux-foundation.org
> Subject: An introduction to libvirt's LXC (LinuX Container) support
>
> This is a short^H^H^H^H^H long mail to introduce / walk-through some
> recent developments in libvirt to support native Linux hosted
> container virtualization using the kernel capabilities the people
> on this list have been adding in recent releases. We've been working
> on this for a few months, but haven't really publicised it until
> now, and I figure the people working on container virt extensions
> for Linux might be interested in how it is being used.
>
> For those who aren't familiar with libvirt, it provides a stable API
> for managing virtualization hosts and their guests. It started with
> a Xen driver, and over time has evolved to add support for QEMU, KVM,
> OpenVZ and most recently of all a driver we're calling "LXC" short
> for "LinuX Containers". The key is that no matter what hypervisor
> you are using, there is a consistent set of APIs, and standardized
> configuration format for userspace management applications in the
> host (and remote secure RPC to the host).
>
> The LXC driver is the result of a combined effort from a number of
> people in the libvirt community, most notably Dave Leskovec contributed
> the original code, and Dan Smith now leads development along with my
> own contributions to its architecture to better integrate with libvirt.
>
> We have a couple of goals in this work. Overall, libvirt wants to be
> the de facto standard, open source management API for all virtualization
> platforms and native Linux virtualization capabilities are a strong
> focus. The LXC driver is attempting to provide a general purpose
> management solution for two container virt use cases:
>
> - Application workload isolation
> - Virtual private servers
>
> In the first use case we want to provide the ability to run an
> application in the primary host OS with partial restrictions on its
> resource / service access. It will still run with the same root
> directory as the host OS, but its filesystem namespace may have
> some additional private mount points present. It may have a
> private network namespace to restrict its connectivity, and it
> will ultimately have restrictions on its resource usage (eg
> memory, CPU time, CPU affinity, I/O bandwidth).
>
> In the second use case, we want to provide a completely virtualized
> operating system in the container (running the host kernel of
> course), akin to the capabilities of OpenVZ / Linux-VServer. The
> container will have a totally private root filesystem, a private
> networking namespace, whatever other namespace isolation the
> kernel provides, and again resource restrictions. Some people
> like to think of this as 'a better chroot than chroot'.
>
> In terms of technical implementation, at its core is direct usage
> of the new clone() flags. By default all containers get created
> with CLONE_NEWPID, CLONE_NEWNS, CLONE_NEWUTS, CLONE_NEWUSER, and
> CLONE_NEWIPC. If private network config was requested they also
> get CLONE_NEWNET.
>
> For the workload isolation case, after creating the container we
> just add a number of filesystem mounts in the containers private
> FS namespace. In the VPS case, we'll do a pivot_root() onto the
> new root directory, and then add any extra filesystem mounts the
> container config requested.
>
> The stdin/out/err of the process leader in the container is bound
> to the slave end of a pseudo-TTY, with libvirt owning the master
> end so it can provide a virtual text console into the guest
> container. Once the basic container setup is complete, libvirt
> execs the so-called 'init' process. Things are set up such that
> when the 'init' process exits, the container is terminated /
> cleaned up.
>
> On the host side, the libvirt LXC driver creates what we call a
> 'controller' process for each container. This is done with a small
> binary /usr/libexec/libvirt_lxc. This is the process which owns the
> master end of the pseudo-TTY, along with a second pseudo-TTY pair.
> When the host admin wants to interact with the container, they use
> the command 'virsh console CONTAINER-NAME'. The LXC controller
> process takes care of forwarding I/O between the two slave PTYs,
> one slave opened by virsh console, the other being the container's
> stdin/out/err. If you kill the controller, then the container
> also dies. Basically you can think of the libvirt_lxc controller
> as serving the equivalent purpose to the 'qemu' command for full
> machine virtualization - it provides the interface between host
> and guest, in this case just the container setup, and access to
> text console - perhaps more in the future.
>
> For networking, libvirt provides two core concepts:
>
> - Shared physical device. A bridge containing one of your
> physical network interfaces on the host, along with one or
> more of the guest vnet interfaces. So the container appears
> as if it is directly on the LAN.
>
> - Virtual network. A bridge containing only guest vnet
> interfaces, and NO physical device from the host. IPtables
> and forwarding provide routed (+ optionally NATed)
> connectivity to the LAN for guests.
>
> The latter use case is particularly useful for machines without
> a permanent wired ethernet connection - eg laptops using wifi -
> as it lets guests talk to each other even when there's no active
> host network. Both of these network setups are fully supported
> in the LXC driver in the presence of a suitably new host kernel.
>
> That's a 100ft overview. The current functionality is working
> quite well from an architectural/technical point of view, but
> there is plenty more work we still need to do to provide a system
> which is mature enough for real world production deployment.
>
> - Integration with cgroups. Although I talked about resource
> restrictions, we've not implemented any of this yet. In the
> most immediate timeframe we want to use cgroups' device
> ACL support to prevent the container having any ability to
> access device nodes other than the usual suspects of
> /dev/{null,full,zero,console}, and possibly /dev/urandom.
> The other important one is to provide a memory cap across
> the entire container. CPU based resource control is lower
> priority at the moment.
>
> - Efficient query of resource utilization. We need to be able
> to get the cumulative CPU time of all the processes inside
> the container, without having to iterate over every PID's
> /proc/$PID/stat file. I'm not sure how we'll do this yet.
> We want to get this data both for all CPUs combined, and per-CPU.
>
> - devpts virtualization. libvirt currently just bind mounts the
> host's /dev/pts into the container. Clearly this isn't a
> serious implementation. We've been monitoring the devpts
> namespace patches and these look like they will provide the
> capabilities we need for the full virtual private server use
> case.
>
> - network sysfs virtualization. libvirt can't currently use the
> CLONE_NEWNET flag on most Linux distros, since the currently
> released kernels have this capability conflicting with SYSFS
> in KConfig. Again, we're looking forward to seeing this
> addressed in the next kernel release.
>
> - UID/GID virtualization. While we spawn all containers as root,
> applications inside the container may switch to unprivileged
> UIDs. We don't (necessarily) want users in the host with
> equivalent UIDs to be able to kill processes inside the
> container. It would also be desirable to allow unprivileged
> users to create containers without needing root on the host,
> but allowing them to be root & any other user inside their
> container. I'm not aware whether anyone is working on this
> kind of thing yet?
>
> There are probably more things Dan Smith is thinking of, but
> that list is a good starting point.
>
> Finally, a 30 second overview of actually using LXC with
> libvirt to create a simple VPS using busybox in its root fs...
>
> - Create a simple chroot environment using busybox
>
> mkdir /root/mycontainer
> mkdir /root/mycontainer/bin
> mkdir /root/mycontainer/sbin
> cp /sbin/busybox /root/mycontainer/sbin
> for cmd in sh ls chdir chmod rm cat vi
> do
>   ln -s ../sbin/busybox /root/mycontainer/bin/$cmd
> done
> cat > /root/mycontainer/sbin/init <<EOF
> #!/sbin/busybox
> sh
> EOF
> chmod +x /root/mycontainer/sbin/init
>
>
> - Create a simple libvirt configuration file for the
> container, defining the root filesystem, the network
> connection (bridged to br0 in this case), and the
> path to the 'init' binary (defaults to /sbin/init if
> omitted)
>
> # cat > mycontainer.xml <<EOF
> <domain type='lxc'>
> <name>mycontainer</name>
> <memory>500000</memory>
> <os>
> <type>exe</type>
> <init>/sbin/init</init>
> </os>
> <devices>
> <filesystem type='mount'>
> <source dir='/root/mycontainer'/>
> <target dir='/'/>
> </filesystem>
> <interface type='bridge'>
> <source bridge='br0'/>
> <mac address='00:11:22:34:34:34'/>
> </interface>
> <console type='pty' />
> </devices>
> </domain>
> EOF
>
> - Load the configuration into libvirt
>
> # virsh --connect lxc:/// define mycontainer.xml
> # virsh --connect lxc:/// list --inactive
> Id Name State
> ----------------------------------
> - mycontainer shutdown
>
>
>
> - Start the VM and query some information about it
>
> # virsh --connect lxc:/// start mycontainer
> # virsh --connect lxc:/// list
> Id Name State
> ----------------------------------
> 28407 mycontainer running
>
> # virsh --connect lxc:/// dominfo mycontainer
> Id: 28407
> Name: mycontainer
> UUID: 8369f1ac-7e46-e869-4ca5-759d51478066
> OS Type: exe
> State: running
> CPU(s): 1
> Max memory: 500000 kB
> Used memory: 500000 kB
>
>
> NB: the CPU/memory limits shown here are not enforced yet.
>
> - Interact with the container
>
> # virsh --connect lxc:/// console mycontainer
>
> NB: press Ctrl+] to exit when done.
>
> - Query the live config - eg to discover what PTY its
> console is connected to
>
>
> # virsh --connect lxc:/// dumpxml mycontainer
> <domain type='lxc' id='28407'>
> <name>mycontainer</name>
> <uuid>8369f1ac-7e46-e869-4ca5-759d51478066</uuid>
> <memory>500000</memory>
> <currentMemory>500000</currentMemory>
> <vcpu>1</vcpu>
> <os>
> <type arch='i686'>exe</type>
> <init>/sbin/init</init>
> </os>
> <clock offset='utc'/>
> <on_poweroff>destroy</on_poweroff>
> <on_reboot>restart</on_reboot>
> <on_crash>destroy</on_crash>
> <devices>
> <filesystem type='mount'>
> <source dir='/root/mycontainer'/>
> <target dir='/'/>
> </filesystem>
> <console type='pty' tty='/dev/pts/22'>
> <source path='/dev/pts/22'/>
> <target port='0'/>
> </console>
> </devices>
> </domain>
>
> - Shutdown the container
>
> # virsh --connect lxc:/// destroy mycontainer
>
> There is lots more I could say, but hopefully this serves as
> a useful introduction to the LXC work in libvirt and how it
> is making use of the kernel's container based virtualization
> support. For those interested in finding out more, all the
> source is in the libvirt CVS repo, the files being those
> named src/lxc_conf.c, src/lxc_container.c, src/lxc_controller.c
> and src/lxc_driver.c.
>
> http://libvirt.org/downloads.html
>
> or via the GIT mirror of our CVS repo
>
> git clone git://git.et.redhat.com/libvirt.git
>
> Regards,
> Daniel
> --
> |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
> _______________________________________________
> Containers mailing list
> Containers at lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
>
----- End forwarded message -----