[libvirt] [PATCH] Add docs about cgroups layout and usage

Daniel P. Berrange berrange at redhat.com
Fri May 3 16:49:26 UTC 2013


From: "Daniel P. Berrange" <berrange at redhat.com>

Describe the new cgroups layout, how to customize placement
of guests and what virsh commands are used to access the
parameters.

Signed-off-by: Daniel P. Berrange <berrange at redhat.com>
---
 docs/cgroups.html.in | 285 +++++++++++++++++++++++++++++++++++++++++++++++++++
 docs/sitemap.html.in |   4 +
 2 files changed, 289 insertions(+)
 create mode 100644 docs/cgroups.html.in

diff --git a/docs/cgroups.html.in b/docs/cgroups.html.in
new file mode 100644
index 0000000..3be0672
--- /dev/null
+++ b/docs/cgroups.html.in
@@ -0,0 +1,285 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <body>
+    <h1>Control Groups Resource Management</h1>
+
+    <ul id="toc"></ul>
+
+    <p>
+      The QEMU and LXC drivers make use of the Linux "Control Groups" facility
+      for applying resource management to their virtual machines and containers.
+    </p>
+
+    <h2><a name="requiredControllers">Required controllers</a></h2>
+
+    <p>
+      The control groups filesystem supports multiple "controllers". By default
+      the init system (such as systemd) should mount all controllers compiled
+      into the kernel at <code>/sys/fs/cgroup/$CONTROLLER-NAME</code>. Libvirt
+      will never attempt to mount any controllers itself, merely detect where
+      they are mounted.
+    </p>
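+
+    <p>
+      As a rough illustration (the exact mount points and mount options will
+      vary across distros and init systems), the mounted controllers can be
+      listed by inspecting <code>/proc/mounts</code>:
+    </p>
+
+    <pre>
+# grep cgroup /proc/mounts
+cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
+cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0
+cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
+cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
+...
+    </pre>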
+
+    <p>
+      The QEMU driver is capable of using the <code>cpuset</code>,
+      <code>cpu</code>, <code>memory</code>, <code>blkio</code> and
+      <code>devices</code> controllers. None of them are compulsory.
+      If any controller is not mounted, the resource management APIs
+      which use it will cease to operate. It is possible to explicitly
+      turn off use of a controller, even when mounted, via the
+      <code>/etc/libvirt/qemu.conf</code> configuration file.
+    </p>
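+
+    <p>
+      For example, the QEMU driver can be restricted to a subset of
+      controllers via the <code>cgroup_controllers</code> setting in
+      <code>/etc/libvirt/qemu.conf</code>. An illustrative snippet (the
+      exact list should match local policy) might look like:
+    </p>
+
+    <pre>
+# /etc/libvirt/qemu.conf
+#
+# Only use these controllers, even if more are mounted
+cgroup_controllers = [ "cpu", "memory", "blkio" ]
+    </pre>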
+
+    <p>
+      The LXC driver is capable of using the <code>cpuset</code>,
+      <code>cpu</code>, <code>freezer</code>,
+      <code>memory</code>, <code>blkio</code> and <code>devices</code>
+      controllers. The <code>cpuset</code>, <code>devices</code>
+      and <code>memory</code> controllers are compulsory. Without
+      them mounted, no containers can be started. If any of the
+      other controllers are not mounted, the resource management APIs
+      which use them will cease to operate.
+    </p>
+
+    <h2><a name="currentLayout">Current cgroups layout</a></h2>
+
+    <p>
+      In libvirt 1.0.5 and later, the cgroups layout created by libvirt has been
+      simplified, in order to facilitate the setup of resource control policies by
+      administrators / management applications. The layout is based on the concepts of
+      "partitions" and "consumers". Each virtual machine or container is a consumer,
+      and has a corresponding cgroup named <code>$VMNAME.libvirt-{qemu,lxc}</code>.
+      Each consumer is associated with exactly one partition, which also has a
+      corresponding cgroup usually named <code>$PARTNAME.partition</code>. The
+      exceptions to this naming rule are the three top level default partitions,
+      named <code>/system</code> (for system services), <code>/user</code> (for
+      user login sessions) and <code>/machine</code> (for virtual machines and
+      containers). By default every consumer will of course be associated with
+      the <code>/machine</code> partition. This leads to a hierarchy that looks
+      like:
+    </p>
+
+    <pre>
+$ROOT
+  |
+  +- system
+  |   |
+  |   +- libvirtd.service
+  |
+  +- machine
+      |
+      +- vm1.libvirt-qemu
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- vm2.libvirt-qemu
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- vm3.libvirt-qemu
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- container1.libvirt-lxc
+      |
+      +- container2.libvirt-lxc
+      |
+      +- container3.libvirt-lxc
+    </pre>
+
+    <p>
+      The default cgroups layout ensures that, when there is contention for
+      CPU time, it is shared equally between system services, user sessions
+      and virtual machines / containers. This prevents virtual machines from
+      locking the administrator out of the host, or impacting execution of
+      system services. Conversely, when there is no contention from
+      system services / user sessions, it is possible for virtual machines
+      to fully utilize the host CPUs.
+    </p>
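+
+    <p>
+      The equal sharing falls out of the fact that sibling cgroups each
+      receive the kernel's default CPU weighting of 1024 shares. As a rough
+      illustration (the controller mount point name will vary by distro),
+      the weights of the top level partitions can be inspected directly:
+    </p>
+
+    <pre>
+# cat /sys/fs/cgroup/cpu,cpuacct/machine/cpu.shares
+1024
+# cat /sys/fs/cgroup/cpu,cpuacct/system/cpu.shares
+1024
+    </pre>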
+
+    <h2><a name="customPartiton">Using custom partitions</a></h2>
+
+    <p>
+      If there is a need to apply resource constraints to groups of
+      virtual machines or containers, then the single default
+      partition <code>/machine</code> may not be sufficiently
+      flexible. The administrator may wish to sub-divide the
+      default partition, for example into "testing" and "production"
+      partitions, and then assign each guest to a specific
+      sub-partition. This is achieved via a small element addition
+      to the guest domain XML config, just below the main <code>domain</code>
+      element:
+    </p>
+
+    <pre>
+  ...
+  &lt;resource&gt;
+    &lt;partition&gt;/machine/production&lt;/partition&gt;
+  &lt;/resource&gt;
+  ...
+    </pre>
+
+    <p>
+      Libvirt will not auto-create the cgroups directory to back
+      this partition. In the future, libvirt / virsh will provide
+      APIs / commands to create custom partitions, but currently
+      this is left as an exercise for the administrator. For
+      example, given the XML config above, the admin would need
+      to create a cgroup named '/machine/production.partition':
+    </p>
+
+    <pre>
+# cd /sys/fs/cgroup
+# for i in blkio cpu,cpuacct cpuset devices freezer memory net_cls perf_event
+  do
+    mkdir $i/machine/production.partition
+  done
+# for i in cpuset.cpus  cpuset.mems
+  do
+    cat cpuset/machine/$i > cpuset/machine/production.partition/$i
+  done
+</pre>
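+
+    <p>
+      Once these directories exist, a guest configured with the XML snippet
+      above should have its cgroup created beneath them when it is started.
+      As a rough check (the controller mount name and guest name here are
+      purely illustrative):
+    </p>
+
+    <pre>
+# virsh start demo
+# ls -d /sys/fs/cgroup/cpu,cpuacct/machine/production.partition/*.libvirt-qemu
+/sys/fs/cgroup/cpu,cpuacct/machine/production.partition/demo.libvirt-qemu
+    </pre>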
+
+    <p>
+      <strong>Note:</strong> the cgroups directory is created with a ".partition"
+      suffix, but the XML config does not require this suffix.
+    </p>
+
+    <p>
+      <strong>Note:</strong> the ability to place guests in custom
+      partitions is only available with libvirt >= 1.0.5, using
+      the new cgroup layout. The legacy cgroups layout described
+      later did not support customization per guest.
+    </p>
+
+    <h2><a name="resourceAPIs">Resource management APIs/commands</a></h2>
+
+    <p>
+      Since libvirt aims to provide an API which is portable across
+      hypervisors, the concept of cgroups is not exposed directly
+      in the API or XML configuration. It is considered to be an
+      internal implementation detail. Instead libvirt provides a
+      set of APIs for applying resource controls, which are then
+      mapped to corresponding cgroup tunables.
+    </p>
+
+    <h3>Scheduler tuning</h3>
+
+    <p>
+     Parameters from the "cpu" controller are exposed via the
+     <code>schedinfo</code> command in virsh.
+    </p>
+
+    <pre>
+# virsh schedinfo demo
+Scheduler      : posix
+cpu_shares     : 1024
+vcpu_period    : 100000
+vcpu_quota     : -1
+emulator_period: 100000
+emulator_quota : -1</pre>
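+
+    <p>
+      The same command can also update the parameters; for example, to double
+      a guest's relative CPU weighting (illustrative value):
+    </p>
+
+    <pre>
+# virsh schedinfo demo --set cpu_shares=2048
+Scheduler      : posix
+cpu_shares     : 2048
+vcpu_period    : 100000
+vcpu_quota     : -1
+emulator_period: 100000
+emulator_quota : -1</pre>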
+
+
+    <h3>Block I/O tuning</h3>
+
+    <p>
+     Parameters from the "blkio" controller are exposed via the
+     <code>blkiotune</code> command in virsh.
+    </p>
+
+
+    <pre>
+# virsh blkiotune demo
+weight         : 500
+device_weight  : </pre>
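+
+    <p>
+      The overall weight can be changed with the same command; for example
+      (illustrative value):
+    </p>
+
+    <pre>
+# virsh blkiotune demo --weight 200</pre>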
+
+    <h3>Memory tuning</h3>
+
+    <p>
+     Parameters from the "memory" controller are exposed via the
+     <code>memtune</code> command in virsh.
+    </p>
+
+    <pre>
+# virsh memtune demo
+hard_limit     : 580192
+soft_limit     : unlimited
+swap_hard_limit: unlimited
+    </pre>
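+
+    <p>
+      Limits can be changed with the same command; for example, to cap a
+      guest at roughly 1 GiB of host memory (values are in KiB by default;
+      illustrative only):
+    </p>
+
+    <pre>
+# virsh memtune demo --hard-limit 1048576
+    </pre>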
+
+    <h3>Network tuning</h3>
+
+    <p>
+      The <code>net_cls</code> controller is not currently used. Instead traffic
+      filter policies are set directly against individual virtual
+      network interfaces.
+    </p>
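+
+    <p>
+      For example, bandwidth limits on an individual interface can be applied
+      or queried with the <code>domiftune</code> command in virsh (the
+      interface name and values below are purely illustrative):
+    </p>
+
+    <pre>
+# virsh domiftune demo vnet0 --inbound 1000,2000,4000 --outbound 1000
+    </pre>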
+
+    <h2><a name="legacyLayout">Legacy cgroups layout</a></h2>
+
+    <p>
+      Prior to libvirt 1.0.5, the cgroups layout created by libvirt was different
+      from that described above, and did not allow for administrator customization.
+      Libvirt used a fixed, 3-level hierarchy <code>libvirt/{qemu,lxc}/$VMNAME</code>
+      which was rooted at the point in the hierarchy where libvirtd itself was
+      located. So if libvirtd was placed at <code>/system/libvirtd.service</code>
+      by systemd, the groups for each virtual machine / container would be located
+      at <code>/system/libvirtd.service/libvirt/{qemu,lxc}/$VMNAME</code>. In addition
+      to this, the QEMU driver created further child groups for each vCPU thread and
+      the emulator thread(s). This led to a hierarchy that looked like:
+    </p>
+
+
+    <pre>
+$ROOT
+  |
+  +- system
+      |
+      +- libvirtd.service
+           |
+           +- libvirt
+               |
+               +- qemu
+               |   |
+               |   +- vm1
+               |   |   |
+               |   |   +- emulator
+               |   |   +- vcpu0
+               |   |   +- vcpu1
+               |   |
+               |   +- vm2
+               |   |   |
+               |   |   +- emulator
+               |   |   +- vcpu0
+               |   |   +- vcpu1
+               |   |
+               |   +- vm3
+               |       |
+               |       +- emulator
+               |       +- vcpu0
+               |       +- vcpu1
+               |
+               +- lxc
+                   |
+                   +- container1
+                   |
+                   +- container2
+                   |
+                   +- container3
+    </pre>
+
+    <p>
+      Although current releases are much improved, historically the use of deep
+      hierarchies has had a significant negative impact on kernel scalability.
+      The legacy libvirt cgroups layout highlighted these problems, to the detriment
+      of the performance of virtual machines and containers.
+    </p>
+  </body>
+</html>
diff --git a/docs/sitemap.html.in b/docs/sitemap.html.in
index 619e4a1..cb7cc5b 100644
--- a/docs/sitemap.html.in
+++ b/docs/sitemap.html.in
@@ -89,6 +89,10 @@
                 <span>Ensuring exclusive guest access to disks</span>
               </li>
               <li>
+                <a href="cgroups.html">CGroups</a>
+                <span>Control groups integration</span>
+              </li>
+              <li>
                 <a href="hooks.html">Hooks</a>
                 <span>Hooks for system specific management</span>
               </li>
-- 
1.8.1.4



