[libvirt] [PATCH] libvirt/hooks: Static and dynamic hugepage hooks

Alex Williamson alex.williamson at redhat.com
Wed Jun 24 18:57:33 UTC 2015


This patch provides scripts for hugepage allocation, as well as a bit
of infrastructure and a common hook config file that I hope may some
day be enabled by default in libvirt.  For now, we place the files in
/usr/share and ask users to install the config file and copy or link
the scripts, much like "contrib" scripts.

Two methods of hugepage allocation are provided, static and dynamic.
The static mechanism allocates pages at libvirt daemon startup and
releases them at shutdown.  It allows full size, locality, and policy
configuration.  For instance, if I want to allocate a set of 2M pages
exclusively on host NUMA node 1, it can do that, along with plenty
more.  This is especially useful for 1G hugepages on x86: they can
now be allocated dynamically, but doing so becomes impractical as
host memory fragments over time.  Systems dedicated to
hosting VMs are also likely to prefer static allocation.  Static
allocation requires explicit XML entries in the hook config file to
be activated.
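
As a rough illustration (not part of the patch), the following dry-run
sketch prints the command that a static allocation boils down to.  The
function name print_hugepage_cmd is hypothetical, and for simplicity it
takes the target page count directly, whereas addhugepages in the patch
adds the count to the current value first.

```shell
#!/bin/bash
# Dry-run sketch: print, rather than execute, the write that allocates
# hugepages.  print_hugepage_cmd is a hypothetical name for illustration.
print_hugepage_cmd() {
  local size_kb=$1 count=$2 nodes=${3:-} policy=${4:-strict}
  local dir="/sys/kernel/mm/hugepages/hugepages-${size_kb}kB"
  local flag

  if [ -z "$nodes" ]; then
    # No nodeset given: a plain write to the global counter.
    echo "echo $count > $dir/nr_hugepages"
    return 0
  fi

  # Map the libvirt-style mode to the matching numactl option.
  case "$policy" in
    strict)     flag="-m" ;;
    interleave) flag="-i" ;;
    preferred)  flag="--preferred" ;;
    *) echo "unknown policy: $policy" >&2; return 1 ;;
  esac
  echo "numactl $flag $nodes echo $count > $dir/nr_hugepages_mempolicy"
}

# 1024 2M pages, strictly on host node 1
print_hugepage_cmd 2048 1024 1
```

Running the real command requires root and numactl; the sketch only
prints it so the mapping from config values to the sysfs write is
visible.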

The dynamic method allocates hugepages only around the instantiation
of the VM.  This is enabled by adding an entry for the domain in the
config file and configuring the domain normally for hugepages.  The
dynamic hugepage script is activated via the QEMU domain prepare
hook, reads the domain XML and allocates hugepages as necessary.  On
domain shutdown, hugepages are freed via the release hook.  This
model is more appropriate for systems that are not dedicated VM
hosts and for guests that use hugepage sizes and quantities that are
likely to be dynamically allocatable when the VM is started.
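
The page accounting around prepare and release can be sketched as
below.  This is a simplified illustration, assuming sizes already
normalized to KiB; hugepage_delta is a hypothetical helper name, and
the integer division mirrors the COUNT calculation in the dynamic
script.

```shell
#!/bin/bash
# Sketch: how many hugepages to add (prepare) or remove (release) for a
# guest.  hugepage_delta is a hypothetical name; sizes are in KiB.
hugepage_delta() {
  local op=$1 mem_kib=$2 page_kib=$3
  # Integer division, rounding down, as the hook script does.
  local count=$(( mem_kib / page_kib ))

  case "$op" in
    prepare) echo "$count" ;;
    release) echo $(( -count )) ;;
    *)       echo 0 ;;          # other hook operations change nothing
  esac
}

# 4 GiB guest backed by 2 MiB pages
hugepage_delta prepare 4194304 2048   # prints 2048
hugepage_delta release 4194304 2048   # prints -2048
```

The signed result corresponds to the count argument handed to
addhugepages, which clamps the resulting total at zero.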

In addition to the documentation provided within each script, a README
file is provided with overall instructions and summaries of the
individual scripts.

Signed-off-by: Alex Williamson <alex.williamson at redhat.com>
---
 Makefile.am                        |    2 
 configure.ac                       |    3 
 hooks/Makefile.am                  |   27 ++++
 hooks/README                       |   47 +++++++
 hooks/daemon                       |   33 +++++
 hooks/daemon.d/Makefile.am         |   21 +++
 hooks/daemon.d/static-hugepages.sh |  100 +++++++++++++++
 hooks/functions.sh                 |  180 +++++++++++++++++++++++++++
 hooks/libvirt-hook-config.xml      |   38 ++++++
 hooks/qemu                         |   40 ++++++
 hooks/qemu.d/Makefile.am           |   21 +++
 hooks/qemu.d/dynamic-hugepages.sh  |  238 ++++++++++++++++++++++++++++++++++++
 libvirt.spec.in                    |   13 ++
 13 files changed, 761 insertions(+), 2 deletions(-)
 create mode 100644 hooks/Makefile.am
 create mode 100644 hooks/README
 create mode 100755 hooks/daemon
 create mode 100644 hooks/daemon.d/Makefile.am
 create mode 100755 hooks/daemon.d/static-hugepages.sh
 create mode 100755 hooks/functions.sh
 create mode 100644 hooks/libvirt-hook-config.xml
 create mode 100755 hooks/qemu
 create mode 100644 hooks/qemu.d/Makefile.am
 create mode 100755 hooks/qemu.d/dynamic-hugepages.sh

diff --git a/Makefile.am b/Makefile.am
index 9796069..e4a709a 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -24,7 +24,7 @@ SUBDIRS = . gnulib/lib include src daemon tools docs gnulib/tests \
   examples/dominfo examples/domsuspend examples/apparmor \
   examples/xml/nwfilter examples/openauth examples/systemtap \
   tools/wireshark examples/dommigrate \
-  examples/lxcconvert examples/domtop
+  examples/lxcconvert examples/domtop hooks
 
 ACLOCAL_AMFLAGS = -I m4
 
diff --git a/configure.ac b/configure.ac
index 93f9e38..b0dc035 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2812,7 +2812,8 @@ AC_CONFIG_FILES([\
         examples/xml/nwfilter/Makefile \
         examples/lxcconvert/Makefile \
         tools/wireshark/Makefile \
-        tools/wireshark/src/Makefile])
+        tools/wireshark/src/Makefile \
+        hooks/Makefile hooks/daemon.d/Makefile hooks/qemu.d/Makefile])
 AC_OUTPUT
 
 AC_MSG_NOTICE([])
diff --git a/hooks/Makefile.am b/hooks/Makefile.am
new file mode 100644
index 0000000..1fe6076
--- /dev/null
+++ b/hooks/Makefile.am
@@ -0,0 +1,27 @@
+## Copyright (C) 2015 Red Hat, Inc.
+##
+## This library is free software; you can redistribute it and/or
+## modify it under the terms of the GNU Lesser General Public
+## License as published by the Free Software Foundation; either
+## version 2.1 of the License, or (at your option) any later version.
+##
+## This library is distributed in the hope that it will be useful,
+## but WITHOUT ANY WARRANTY; without even the implied warranty of
+## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+## Lesser General Public License for more details.
+##
+## You should have received a copy of the GNU Lesser General Public
+## License along with this library.  If not, see
+## <http://www.gnu.org/licenses/>.
+
+SUBDIRS = daemon.d qemu.d
+
+hooksdir = $(pkgdatadir)/hooks
+hooks_DATA = \
+	daemon \
+	functions.sh \
+	libvirt-hook-config.xml \
+	qemu \
+	README
+
+EXTRA_DIST = $(hooks_DATA)
diff --git a/hooks/README b/hooks/README
new file mode 100644
index 0000000..8114c16
--- /dev/null
+++ b/hooks/README
@@ -0,0 +1,47 @@
+libvirt contributed and example hook scripts
+
+libvirt provides several static hooks for use around daemon startup and
+shutdown as well as various points around domain and network lifecycle.
+See https://www.libvirt.org/hooks.html for formal documentation.
+
+Summary of scripts available here:
+
+- daemon:
+	Supports sub-scripts in daemon.d directory
+- qemu:
+	Supports sub-scripts in qemu.d directory
+- daemon.d/static-hugepages.sh:
+	Supports hugepage allocations around libvirt daemon init
+	(Requires: xmllint numactl)
+- qemu.d/dynamic-hugepages.sh:
+	Supports hugepage allocations around QEMU domain instantiation
+	(Requires: xmllint numactl)
+- functions.sh:
+	Shared utility functions for scripts
+- libvirt-hook-config.xml:
+	Base config file for scripts
+
+Please consult each script for further details.
+
+To use:
+
+1) Copy libvirt-hook-config.xml to /etc/sysconfig/ and modify as desired
+
+2) If only a single sub-script per hook type is needed, it may be copied
+   directly to /etc/libvirt/hooks, Ex:
+	cp qemu.d/dynamic-hugepages.sh /etc/libvirt/hooks/qemu
+
+   If multiple sub-scripts per hook type are needed, or to support additional
+   sub-scripts later, use the stub qemu and daemon scripts here, create
+   qemu.d and daemon.d sub-directories under /etc/libvirt/hooks/ and copy
+   the desired scripts to the sub-directories, Ex:
+	cp qemu /etc/libvirt/hooks/
+	mkdir /etc/libvirt/hooks/qemu.d
+	cp qemu.d/dynamic-hugepages.sh /etc/libvirt/hooks/qemu.d/
+
+   To easily get all of the scripts, it's possible to simply soft link
+   the entire directory, Ex:
+	ln -s /usr/share/libvirt/hooks /etc/libvirt/hooks
+
+3) Restart the libvirt daemon to re-read hook scripts:
+	systemctl restart libvirtd.service
diff --git a/hooks/daemon b/hooks/daemon
new file mode 100755
index 0000000..7b8947d
--- /dev/null
+++ b/hooks/daemon
@@ -0,0 +1,33 @@
+#!/bin/sh
+
+# Copyright (C) 2015 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+# This script is intended to implement a libvirt /etc/libvirt/hooks/daemon
+# script and extend daemon execution out to all executable scripts in the
+# daemon.d subdirectory.  To use, copy to /etc/libvirt/hooks/ and place
+# any sub-scripts in a daemon.d subdirectory.
+
+DIR="/etc/libvirt/hooks/daemon.d/"
+
+if [ -d $DIR ]; then
+  for SCRIPT in $(find $DIR -type f -executable); do
+    "$SCRIPT" "$@"
+    if [ $? -ne 0 ]; then
+      exit 1
+    fi
+  done
+fi
diff --git a/hooks/daemon.d/Makefile.am b/hooks/daemon.d/Makefile.am
new file mode 100644
index 0000000..ad9d6cb
--- /dev/null
+++ b/hooks/daemon.d/Makefile.am
@@ -0,0 +1,21 @@
+## Copyright (C) 2015 Red Hat, Inc.
+##
+## This library is free software; you can redistribute it and/or
+## modify it under the terms of the GNU Lesser General Public
+## License as published by the Free Software Foundation; either
+## version 2.1 of the License, or (at your option) any later version.
+##
+## This library is distributed in the hope that it will be useful,
+## but WITHOUT ANY WARRANTY; without even the implied warranty of
+## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+## Lesser General Public License for more details.
+##
+## You should have received a copy of the GNU Lesser General Public
+## License along with this library.  If not, see
+## <http://www.gnu.org/licenses/>.
+
+daemonddir = $(pkgdatadir)/hooks/daemon.d
+daemond_DATA = \
+	static-hugepages.sh
+
+EXTRA_DIST = $(daemond_DATA)
diff --git a/hooks/daemon.d/static-hugepages.sh b/hooks/daemon.d/static-hugepages.sh
new file mode 100755
index 0000000..081209d
--- /dev/null
+++ b/hooks/daemon.d/static-hugepages.sh
@@ -0,0 +1,100 @@
+#!/bin/bash
+
+# Copyright (C) 2015 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+# This script makes use of the libvirt hook interface for the libvirt daemon
+# to allocate hugepages for domains during libvirt startup, releasing pages
+# at libvirt shutdown.  Hugepages stay allocated for as long as the
+# libvirt daemon is running.  This mechanism is predominantly useful for
+# hosts where the domains use very large hugepages or the majority of host
+# memory, making dynamic allocation unreliable.  The config file entry to
+# enable this hook includes some number of hugepages elements of the form:
+#
+# <config>
+#   <daemon>
+#     <static-hugepages>
+#       <hugepages size="2048" unit="k" mode="strict" nodeset="0-1">1024</hugepages>
+#       <hugepages size="1" unit="G" mode="preferred" nodeset="2">4</hugepages>
+#     </static-hugepages>
+#   </daemon>
+# </config>
+#
+# The above example allocates 1024 2M pages (2G total) strictly across nodes
+# 0-1, and 4 1G pages (4G total) with a preferred-node policy of node 2.  Like
+# the numatune elements in libvirt, the @mode may be one of "strict",
+# "interleave", or "preferred", matching the libnuma policy.  The default mode
+# is strict.  @size and @unit specify the size of hugepage, using the standard
+# libvirt unit values.  1KiB is the default unit size.  @nodeset makes use of
+# the standard libvirt range support, allowing sets, ranges, and exclusions.
+# NB, mode="preferred" requires a singleton nodeset.
+
+CONFIG="/etc/sysconfig/libvirt-hook-config.xml"
+
+source /usr/share/libvirt/hooks/functions.sh
+
+# Utilities required to parse our config file and set NUMA policy
+which xmllint > /dev/null 2>&1
+if [ $? -ne 0 ]; then
+  exit 0
+fi
+
+which numactl > /dev/null 2>&1
+if [ $? -ne 0 ]; then
+  exit 0
+fi
+
+if [ ! -e $CONFIG ]; then
+  exit 0
+fi
+
+case "$2" in
+  start | shutdown)
+    ;;
+  *)
+    exit 0
+    ;;
+esac
+
+XMLLINT="xmllint --xpath"
+
+BASE="/config/daemon/static-hugepages/hugepages"
+
+for i in $(seq 1 $($XMLLINT "count($BASE)" $CONFIG 2>/dev/null)); do
+  SIZE=$($XMLLINT "string($BASE[$i]/@size)" $CONFIG 2>/dev/null)
+  UNIT=$($XMLLINT "string($BASE[$i]/@unit)" $CONFIG 2>/dev/null)
+  if [ -z "$UNIT" ]; then
+    UNIT=KiB
+  fi
+  COUNT=$($XMLLINT "$BASE[$i]/text()" $CONFIG 2>/dev/null)
+
+  POLICY=$($XMLLINT "string($BASE[$i]/@mode)" $CONFIG 2>/dev/null)
+  NODESET=$($XMLLINT "string($BASE[$i]/@nodeset)" $CONFIG 2>/dev/null)
+  if [ -n "$NODESET" ] && [ -z "$POLICY" ]; then
+    POLICY=strict
+  fi
+
+  case "$2" in
+    start)
+      addhugepages $SIZE $UNIT $COUNT $NODESET $POLICY
+      ;;
+    shutdown)
+      addhugepages $SIZE $UNIT -$COUNT $NODESET $POLICY
+      ;;
+  esac
+done
+
+exit 0
diff --git a/hooks/functions.sh b/hooks/functions.sh
new file mode 100755
index 0000000..07f52db
--- /dev/null
+++ b/hooks/functions.sh
@@ -0,0 +1,180 @@
+#!/bin/bash
+
+# Copyright (C) 2015 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+# parserange($RANGE)
+#
+# This takes a libvirt defined range, of the form "1-7,^4,15-17" and
+# parses it into an explicit list "1 2 3 5 6 7 15 16 17".  The negations
+# are only allowed for single numbers, not ranges because that's what
+# libvirt does.  Ordering is also important, "1-7,^4" is not the same as
+# "^4,1-7".  We assume a well formed range since we get it from libvirt.
+function parserange {
+  declare -a BITMAP=()
+  CUR=0
+
+  while [ $CUR -lt ${#1} ]; do
+    unset NEGATE
+    START=0
+    END=0
+
+    if [ ${1:$CUR:1} == ',' ]; then
+      CUR=$(( $CUR + 1 ))
+      continue
+    fi
+
+    if [ ${1:$CUR:1} == '^' ]; then
+      CUR=$(( $CUR + 1 ))
+      NEGATE=1
+    fi
+
+    while [ $CUR -lt ${#1} ]; do
+      case ${1:$CUR:1} in
+        [0-9])
+          START=$(( ( $START * 10 ) + ${1:$CUR:1} ))
+          CUR=$(( $CUR + 1 ))
+          ;;
+        *)
+          break
+          ;;
+      esac
+    done
+
+    if [ $CUR -eq ${#1} ] || [ ${1:$CUR:1} == ',' ]; then
+      if [ -n "$NEGATE" ]; then
+        unset BITMAP[$START]
+      else
+        BITMAP[$START]=$START
+      fi
+    elif [ ${1:$CUR:1} == '-' ]; then
+      CUR=$(( $CUR + 1 ))
+      while [ $CUR -lt ${#1} ]; do
+        case ${1:$CUR:1} in
+          [0-9])
+            END=$(( ( $END * 10 ) + ${1:$CUR:1} ))
+            CUR=$(( $CUR + 1 ))
+            ;;
+          *)
+            break
+            ;;
+        esac
+      done
+
+      for i in $(seq $START $END); do
+        BITMAP[$i]=$i
+      done
+    fi
+  done
+
+  echo ${BITMAP[@]}
+}
+
+# parsesize($SIZE [$UNIT])
+#
+# This takes a $SIZE and optional $UNIT and returns bytes.  If a unit is not
+# provided, 1KiB is assumed.
+function parsesize {
+  MULTIPLIER=1024
+  if [ -n "$2" ]; then
+    case "$2" in
+      b)
+        MULTIPLIER=1
+        ;;
+      KB)
+        MULTIPLIER=1000
+        ;;
+      KiB | k | kB)
+        MULTIPLIER=1024
+        ;;
+      MB)
+        MULTIPLIER=1000000
+        ;;
+      MiB | M)
+        MULTIPLIER=$(( 1024 * 1024 ))
+        ;;
+      GB)
+        MULTIPLIER=1000000000
+        ;;
+      GiB | G)
+        MULTIPLIER=$(( 1024 * 1024 * 1024 ))
+        ;;
+      *)
+        echo "$0: Invalid hugepage unit: $2" >&2
+        exit 1
+    esac
+  fi
+
+  echo $(( ( $1 * $MULTIPLIER ) ))
+}
+
+# addhugepages($SIZE $UNIT $COUNT [$NODES [$POLICY]])
+#
+# Given a hugepage size, as defined by $SIZE and $UNIT, attempt to allocate
+# $COUNT, with an optional $NODES NUMA node range and $POLICY.  This uses
+# the raw sysfs hugepages interface rather than `virsh allocpages` in order
+# to avoid issues with calling libvirt from within a hook function.
+function addhugepages {
+  SYSFSNODE="/sys/kernel/mm/hugepages/hugepages-"
+
+  HUGE=$(parsesize $1 $2)
+  SYSFSNODE+=$(( $HUGE / 1024 ))
+  SYSFSNODE+="kB"
+
+  if [ ! -d $SYSFSNODE ]; then
+    echo "$0: Hugepage size $(( $HUGE / 1024 ))kB not supported by kernel" >&2
+    exit 1
+  fi
+
+  NR="nr_hugepages"
+  PREFIX=""
+  if [ -n "$4" ]; then
+    NR+="_mempolicy"
+
+    NODES=$(parserange "$4")
+    NODES=$(echo $NODES | sed -e 's/ /,/g')
+
+    if [ -z "$5" ]; then
+      PREFIX="numactl -m $NODES"
+    else
+      case "$5" in
+        strict)
+          PREFIX="numactl -m $NODES"
+          ;;
+        interleave)
+          PREFIX="numactl -i $NODES"
+          ;;
+        preferred)
+          if [ $(expr index "$NODES" ",") -ne 0 ]; then
+            echo "$0: \"preferred\" memory policy only supports single node" >&2
+            exit 1
+          fi
+          PREFIX="numactl --preferred $NODES"
+          ;;
+      esac
+    fi
+  fi
+
+  # NB: count ($3) can be negative!
+  NEW=$(( $(cat $SYSFSNODE/$NR) + $3 ))
+  if [ $NEW -lt 0 ]; then
+    NEW=0
+  fi
+  $PREFIX echo $NEW > $SYSFSNODE/$NR
+
+  # TBD: should we check for sufficient free pages or go on blindly?
+  # For now, the latter.
+}
diff --git a/hooks/libvirt-hook-config.xml b/hooks/libvirt-hook-config.xml
new file mode 100644
index 0000000..1d8bdce
--- /dev/null
+++ b/hooks/libvirt-hook-config.xml
@@ -0,0 +1,38 @@
+<!--
+     This is an example config file intended to be extensible to all
+     libvirt hooks by using namespaces specific to the type of hook and
+     the hook script name.  This file should be installed as:
+     /etc/sysconfig/libvirt-hook-config.xml
+     -->
+
+<config>
+  <daemon>
+    <static-hugepages>
+      <!--
+           See /usr/share/libvirt/hooks/daemon.d/static-hugepages.sh
+
+           Examples:
+           <hugepages size="2048" nodeset="0">1024</hugepages>
+           <hugepages size="2048" nodeset="1">2048</hugepages>
+
+           This allocates 1024 2M hugepages on host node 0 and 2048 2M
+           hugepages on host node 1 at libvirt daemon startup and releases
+           them on daemon shutdown.
+           -->
+    </static-hugepages>
+  </daemon>
+  <qemu>
+    <dynamic-hugepages>
+      <!--
+           See /usr/share/libvirt/hooks/qemu.d/dynamic-hugepages.sh
+
+           Examples:
+           <domain name="VM1"/>
+
+           The XML for domain VM1 will be parsed during the prepare and
+           release hooks, allocating and freeing any configured hugepages
+           for the domain around the instantiation.
+           -->
+    </dynamic-hugepages>
+  </qemu>
+</config>
diff --git a/hooks/qemu b/hooks/qemu
new file mode 100755
index 0000000..2652831
--- /dev/null
+++ b/hooks/qemu
@@ -0,0 +1,40 @@
+#!/bin/sh
+
+# Copyright (C) 2015 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+# This script is intended to implement a libvirt /etc/libvirt/hooks/qemu
+# script and extend qemu execution out to all executable scripts in the
+# qemu.d subdirectory.  To use, copy to /etc/libvirt/hooks/ and place
+# any sub-scripts in a qemu.d subdirectory.
+
+DIR="/etc/libvirt/hooks/qemu.d/"
+
+# Domain xml is passed as stdin to the script, supply the same for sub-scripts
+XML=$(mktemp)
+cat - > $XML
+
+if [ -d $DIR ]; then
+  for SCRIPT in $(find $DIR -type f -executable); do
+    "$SCRIPT" "$@" < "$XML"
+    if [ $? -ne 0 ]; then
+      rm $XML
+      exit 1
+    fi
+  done
+fi
+
+rm $XML
diff --git a/hooks/qemu.d/Makefile.am b/hooks/qemu.d/Makefile.am
new file mode 100644
index 0000000..64fe64d
--- /dev/null
+++ b/hooks/qemu.d/Makefile.am
@@ -0,0 +1,21 @@
+## Copyright (C) 2015 Red Hat, Inc.
+##
+## This library is free software; you can redistribute it and/or
+## modify it under the terms of the GNU Lesser General Public
+## License as published by the Free Software Foundation; either
+## version 2.1 of the License, or (at your option) any later version.
+##
+## This library is distributed in the hope that it will be useful,
+## but WITHOUT ANY WARRANTY; without even the implied warranty of
+## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+## Lesser General Public License for more details.
+##
+## You should have received a copy of the GNU Lesser General Public
+## License along with this library.  If not, see
+## <http://www.gnu.org/licenses/>.
+
+qemuddir = $(pkgdatadir)/hooks/qemu.d
+qemud_DATA = \
+	dynamic-hugepages.sh
+
+EXTRA_DIST = $(qemud_DATA)
diff --git a/hooks/qemu.d/dynamic-hugepages.sh b/hooks/qemu.d/dynamic-hugepages.sh
new file mode 100755
index 0000000..438c0c9
--- /dev/null
+++ b/hooks/qemu.d/dynamic-hugepages.sh
@@ -0,0 +1,238 @@
+#!/bin/bash
+
+# Copyright (C) 2015 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+# This script makes use of the libvirt hook interface for qemu in order to
+# attempt to allocate hugepages for a domain around instantiation.  We will
+# only try to allocate hugepages for VMs both configured for hugepages and
+# listed in our config file as enabled for dynamic hugepages.  The config file
+# entry is simply a list of domain names enabled for dynamic hugepages, ex:
+#
+# <config>
+#   <qemu>
+#     <dynamic-hugepages>
+#       <domain name="domain1"/>
+#       <domain name="domain2"/>
+#     </dynamic-hugepages>
+#   </qemu>
+# </config>
+#
+# The host system will often be able to accommodate hugepage requests so
+# long as memory isn't too fragmented.  Domains making use of a significant
+# portion of host memory or very large hugepages may be better served using
+# static hugepage hooks.
+
+CONFIG="/etc/sysconfig/libvirt-hook-config.xml"
+
+source /usr/share/libvirt/hooks/functions.sh
+
+# Utilities required to parse our config file and set NUMA policy
+which xmllint > /dev/null 2>&1
+if [ $? -ne 0 ]; then
+  exit 0
+fi
+
+which numactl > /dev/null 2>&1
+if [ $? -ne 0 ]; then
+  exit 0
+fi
+
+if [ ! -e $CONFIG ]; then
+  exit 0
+fi
+
+case "$2" in
+  prepare | release)
+    ;;
+  *)
+    exit 0
+    ;;
+esac
+
+XMLLINT="xmllint --xpath"
+
+BASE="/config/qemu/dynamic-hugepages/domain"
+
+# Is this domain enabled in the config file?
+$XMLLINT "$BASE[@name=\"$1\"]/@name" $CONFIG > /dev/null 2>&1
+if [ $? -ne 0 ]; then
+  exit 0
+fi
+
+# Domain XML is provided via stdin, save it somewhere
+XML=$(mktemp)
+cat - > $XML
+
+# Abort if the domain isn't configured for hugepages
+$XMLLINT "/domain/memoryBacking/hugepages" $XML > /dev/null 2>&1
+if [ $? -ne 0 ]; then
+  rm $XML
+  exit 0
+fi
+
+# The easy non-NUMA guest case first.  If no page size is specified, read the
+# default hugepage size from /proc/meminfo.  NB, our size parsing needs to
+# handle "kB" in addition to the libvirt defined units for this.  If page size
+# is specified, there can be only one, so parse @size, @unit, and @nodeset and
+# do the allocation/deallocation.
+
+$XMLLINT "/domain/cpu/numa" $XML > /dev/null 2>&1
+if [ $? -ne 0 ]; then
+  BASE="/domain/memoryBacking/hugepages/page"
+  $XMLLINT "$BASE/@size" $XML > /dev/null 2>&1
+  if [ $? -ne 0 ]; then
+    # grep Hugepagesize /proc/meminfo
+    # Hugepagesize:       2048 kB
+    HPAGESIZE=$(grep Hugepagesize /proc/meminfo | awk '{print $2}')
+    HPAGEUNIT=$(grep Hugepagesize /proc/meminfo | awk '{print $3}')
+  else
+    HPAGESIZE=$($XMLLINT "string($BASE/@size)" $XML 2>/dev/null)
+    HPAGEUNIT=$($XMLLINT "string($BASE/@unit)" $XML 2>/dev/null)
+    HPAGENODE=$($XMLLINT "string($BASE/@nodeset)" $XML 2>/dev/null)
+  fi
+
+
+  MEMORY=$($XMLLINT "/domain/memory/text()" $XML 2>/dev/null)
+  UNIT=$($XMLLINT "string(/domain/memory/@unit)" $XML 2> /dev/null)
+  MEMORY=$(parsesize $MEMORY $UNIT)
+  HSIZE=$(parsesize $HPAGESIZE $HPAGEUNIT)
+  COUNT=$(( $MEMORY / $HSIZE ))
+  case "$2" in
+    prepare)
+      addhugepages $HPAGESIZE $HPAGEUNIT $COUNT $HPAGENODE
+      ;;
+    release)
+      addhugepages $HPAGESIZE $HPAGEUNIT -$COUNT $HPAGENODE
+      ;;
+  esac
+
+  rm $XML
+  exit 0
+fi
+
+# The harder case, get the guest NUMA id and node memory size from the cpu/numa
+# tags.  Cross reference this to numatune options to get the host nodeset for
+# the memory.  Then, figure out which page sizes are allowed on which host
+# nodes.  The libvirt xml is not well specified here, a numatune nodeset can
+# span host nodes, which are not required to use the same page size.  We can
+# also still have hugepages that are not associated with a node.  This code
+# assumes that lack of a nodeset matches anything (we don't look for the "best"
+# match) and that the numatune nodeset must be fully contained within a host
+# hugepages size set, or else we skip it.
+for ID in $($XMLLINT "/domain/cpu/numa/cell/@id" $XML 2>/dev/null); do
+  BASE="/domain/cpu/numa/cell"
+  MEMORY=$($XMLLINT "string($BASE[@$ID]/@memory)" $XML 2>/dev/null)
+  UNIT=$($XMLLINT "string($BASE[@$ID]/@unit)" $XML 2>/dev/null)
+  if [ -z "$UNIT" ]; then
+    UNIT=KiB
+  fi
+  CELLID=$($XMLLINT "string($BASE[@$ID]/@id)" $XML 2>/dev/null)
+  CELLID=$(echo cellid=\"$CELLID\")
+
+  unset POLICY
+  unset CELLNODE
+
+  # First choice, use a memnode matching this cellid
+  BASE="/domain/numatune/memnode"
+  $XMLLINT "$BASE" $XML > /dev/null 2>&1
+  if [ $? -eq 0 ]; then
+    $XMLLINT "$BASE[@$CELLID]" $XML > /dev/null 2>&1
+    if [ $? -eq 0 ]; then
+      POLICY=$($XMLLINT "string($BASE[@$CELLID]/@mode)" $XML 2>/dev/null)
+      CELLNODE=$($XMLLINT "string($BASE[@$CELLID]/@nodeset)" $XML 2>/dev/null)
+      if [ -n "$CELLNODE" ] && [ -z "$POLICY" ]; then
+        POLICY=strict
+      fi
+    fi
+  fi
+
+  # Second choice, use the memory element as the default (if exists)
+  if [ -z "$POLICY" ]; then
+    BASE="/domain/numatune/memory"
+    $XMLLINT "$BASE" $XML > /dev/null 2>&1
+    if [ $? -eq 0 ]; then
+      POLICY=$($XMLLINT "string($BASE/@mode)" $XML 2>/dev/null)
+      CELLNODE=$($XMLLINT "string($BASE/@nodeset)" $XML 2>/dev/null)
+      if [ -n "$CELLNODE" ] && [ -z "$POLICY" ]; then
+        POLICY=strict
+      fi
+    fi
+  fi
+
+  FOUND=0
+  BASE="/domain/memoryBacking/hugepages/page"
+  $XMLLINT "$BASE/@size" $XML > /dev/null 2>&1
+  if [ $? -ne 0 ]; then
+    HPAGESIZE=$(grep Hugepagesize /proc/meminfo | awk '{print $2}')
+    HPAGEUNIT=$(grep Hugepagesize /proc/meminfo | awk '{print $3}')
+    FOUND=1
+  else
+    for SIZE in $($XMLLINT "$BASE/@size" $XML 2>/dev/null); do
+      HPAGESIZE=$($XMLLINT "string($BASE[@$SIZE]/@size)" $XML 2>/dev/null)
+      HPAGEUNIT=$($XMLLINT "string($BASE[@$SIZE]/@unit)" $XML 2>/dev/null)
+      HPAGENODE=$($XMLLINT "string($BASE[@$SIZE]/@nodeset)" $XML 2>/dev/null)
+
+      # If the host hugepage size has no node association, we assume it's
+      # global.  If the guest node has no hostnode association take the first
+      # available hugepage entry.
+      if [ -z "$HPAGENODE" ] || [ -z "$CELLNODE" ]; then
+        FOUND=1
+        break
+      fi
+
+      # Create an array for the nodeset for the hugepages
+      declare -a NODEMAP=()
+      for i in $(parserange $HPAGENODE); do
+        NODEMAP[$i]=$i
+      done
+
+      FAIL=0
+      # Verify that each node of the numatune nodeset is accounted for
+      for i in $(parserange $CELLNODE); do
+        if [ -z "${NODEMAP[$i]}" ]; then
+          FAIL=1
+          break
+        fi
+      done
+      if [ $FAIL -ne 0 ]; then
+        continue
+      fi
+
+      FOUND=1
+      break
+    done
+  fi
+
+  if [ $FOUND -eq 0 ]; then
+    continue
+  fi
+
+  MEMORY=$(parsesize $MEMORY $UNIT)
+  HSIZE=$(parsesize $HPAGESIZE $HPAGEUNIT)
+  COUNT=$(( $MEMORY / $HSIZE ))
+  case "$2" in
+    prepare)
+      addhugepages $HPAGESIZE $HPAGEUNIT $COUNT $CELLNODE $POLICY
+      ;;
+    release)
+      addhugepages $HPAGESIZE $HPAGEUNIT -$COUNT $CELLNODE $POLICY
+      ;;
+  esac
+done
+
+rm $XML
+exit 0
diff --git a/libvirt.spec.in b/libvirt.spec.in
index dcd174a..ac9f07b 100644
--- a/libvirt.spec.in
+++ b/libvirt.spec.in
@@ -2248,6 +2248,9 @@ exit 0
 
 %dir %{_datadir}/libvirt/
 %dir %{_datadir}/libvirt/schemas/
+%dir %{_datadir}/libvirt/hooks/
+%dir %{_datadir}/libvirt/hooks/daemon.d/
+%dir %{_datadir}/libvirt/hooks/qemu.d/
 
 %{_datadir}/libvirt/schemas/basictypes.rng
 %{_datadir}/libvirt/schemas/capability.rng
@@ -2268,6 +2271,16 @@ exit 0
 %{_datadir}/libvirt/cpu_map.xml
 %{_datadir}/libvirt/libvirtLogo.png
 
+%{_datadir}/libvirt/hooks/README
+%{_datadir}/libvirt/hooks/libvirt-hook-config.xml
+%{_datadir}/libvirt/hooks/functions.sh
+%attr(0755, root, root) %{_datadir}/libvirt/hooks/daemon
+%attr(0755, root, root) %{_datadir}/libvirt/hooks/qemu
+
+%attr(0755, root, root) %{_datadir}/libvirt/hooks/daemon.d/static-hugepages.sh
+
+%attr(0755, root, root) %{_datadir}/libvirt/hooks/qemu.d/dynamic-hugepages.sh
+
 %if %{with_systemd}
 %{_unitdir}/libvirt-guests.service
 %else



