[libvirt] [PATCH 09/11] Add initial docs about the lock managers

Daniel P. Berrange berrange at redhat.com
Mon Jan 24 15:13:08 UTC 2011


---
 docs/internals-locking.html.in |  301 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 301 insertions(+), 0 deletions(-)
 create mode 100644 docs/internals-locking.html.in

diff --git a/docs/internals-locking.html.in b/docs/internals-locking.html.in
new file mode 100644
index 0000000..90054f0
--- /dev/null
+++ b/docs/internals-locking.html.in
@@ -0,0 +1,301 @@
+<html>
+  <body>
+    <h1>Resource Lock Manager</h1>
+
+    <ul id="toc"></ul>
+
+    <p>
+      This page describes the design of the resource lock manager
+      that is used for locking disk images with the QEMU driver.
+    </p>
+
+    <h2><a name="goals">Goals</a></h2>
+
+    <p>
+      The high level goal is to prevent the same disk image being
+      used by more than one QEMU instance at a time (unless the
+      disk is marked as shareable, or readonly). The scenarios
+      to be prevented are thus:
+    </p>
+
+    <ol>
+      <li>
+	Two different running guests configured to point at the
+	same disk image.
+      </li>
+      <li>
+	One guest being started more than once on two different
+	machines due to an admin mistake.
+      </li>
+      <li>
+	One guest being started more than once on a single machine
+	due to a libvirt driver bug.
+      </li>
+    </ol>
+
+    <h2><a name="requirement">Requirements</a></h2>
+
+    <p>
+      The high level goal leads to a set of requirements
+      for the lock manager design:
+    </p>
+
+    <ol>
+      <li>
+	A lock must be held on a disk whenever a QEMU process
+	has the disk open.
+      </li>
+      <li>
+	The lock scheme must allow QEMU to be configured with
+	readonly, shared write, or exclusive writable disks.
+      </li>
+      <li>
+	A lock must be held on a disk whenever libvirtd makes
+	changes to user/group ownership and SELinux labelling.
+      </li>
+      <li>
+	At least one locking implementation must allow use of libvirtd
+	on a single host without any admin config tasks.
+      </li>
+      <li>
+	A lock handover must be performed during the migration
+	process where 2 QEMU processes will have the same disk
+	open concurrently.
+      </li>
+      <li>
+	The lock manager must be able to identify and kill the
+	process accessing the resource if the lock is revoked.
+      </li>
+    </ol>
+
+    <h2><a name="design">Design</a></h2>
+
+    <p>
+      The requirements call for a design with two distinct lockspaces:
+    </p>
+
+    <ol>
+      <li>
+	The <strong>primary lockspace</strong> is used to protect the content of
+	disk images. This will honour the disk sharing modes to
+	allow readonly/shared disk to be assigned to multiple
+	guests concurrently.
+      </li>
+      <li>
+	The <strong>secondary lockspace</strong> is used to protect the metadata
+	of disk images. This lock will be held whenever file
+	permissions / ownership / attributes are changed, and
+	is always exclusive, regardless of sharing mode. The
+	primary lock will be held prior to obtaining the secondary
+	lock.
+      </li>
+    </ol>
+
+    <p>
+      Within each lockspace the following operations will need to be
+      supported:
+    </p>
+
+    <ul>
+      <li>
+	<strong>Acquire object lock</strong>:
+	Acquire locks on all resources initially
+	registered against an object.
+      </li>
+      <li>
+	<strong>Release object lock</strong>:
+	Release locks on all resources currently
+	registered against an object.
+      </li>
+      <li>
+	<strong>Associate object lock</strong>:
+	Associate the current process with an existing
+	set of locks for an object.
+      </li>
+      <li>
+	<strong>Disassociate object lock</strong>:
+	Disassociate the current process from an
+	existing set of locks for an object.
+      </li>
+      <li>
+	<strong>Register resource</strong>:
+	Register an initial resource against an object.
+      </li>
+      <li>
+	<strong>Get object lock state</strong>:
+	Obtain a representation of the current object
+	lock state.
+      </li>
+      <li>
+	<strong>Acquire a resource lock</strong>:
+	Register and acquire a lock for a resource
+	to be added to a locked object.
+      </li>
+      <li>
+	<strong>Release a resource lock</strong>:
+	Deregister and release a lock for a resource
+	to be removed from a locked object.
+      </li>
+    </ul>
+
+    <h2><a name="impl">Plugin Implementations</a></h2>
+
+    <p>
+      Lock manager implementations are provided as LGPLv2+
+      licensed, dlopen()able library modules. A different
+      lock manager implementation may be used
+      for the primary and secondary lockspaces. With the
+      QEMU driver, these can be configured via the
+      <code>/etc/libvirt/qemu.conf</code> configuration
+      file by specifying the lock manager name.
+    </p>
+
+    <pre>
+      contentLockManager="fcntl"
+      metadataLockManager="fcntl"
+    </pre>
+
+    <p>
+      Lock manager implementations are free to support
+      both content and metadata locks. However, if the
+      plugin author is only able to handle one lockspace,
+      the other can be delegated to the standard fcntl
+      lock manager. The QEMU driver will load the lock
+      manager plugin binaries from the following location:
+    </p>
+
+    <pre>
+/usr/{lib,lib64}/libvirt/lock_manager/$NAME.so
+</pre>
+
+    <p>
+      The lock manager plugin must export a single ELF
+      symbol named <code>virLockDriverImpl</code>, which is
+      a static instance of the <code>virLockDriver</code>
+      struct. The struct is defined in the header file:
+    </p>
+
+    <pre>
+      #include &lt;libvirt/plugins/lock_manager.h&gt;
+    </pre>
+
+    <p>
+      All callbacks in the struct must be initialized
+      to non-NULL pointers. The semantics of each
+      callback are defined in the API docs embedded
+      in the previously mentioned header file.
+    </p>
+
+    <h2><a name="usagePatterns">Lock usage patterns</a></h2>
+
+    <p>
+      The following pseudo code illustrates the common
+      patterns of operations invoked on the lock
+      manager plugin callbacks.
+    </p>
+
+    <h3><a name="usageLockAcquire">Lock acquisition</a></h3>
+
+    <p>
+      Lock acquisition will always be performed from the
+      process that is to own the lock. This is typically
+      the QEMU child process, in between the fork+exec
+      pairing, but the lock may occasionally be held directly
+      by libvirtd.
+    </p>
+
+    <pre>
+      mgr = virLockManagerNew(lockPlugin,
+                              VIR_LOCK_MANAGER_MODE_CONTENT,
+                              VIR_LOCK_MANAGER_TYPE_DOMAIN);
+      virLockManagerSetParameter(mgr, "uuid", $uuid);
+      virLockManagerSetParameter(mgr, "name", $name);
+
+      foreach (initial disks)
+          virLockManagerAddResource(mgr,
+                                    VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK,
+                                    $path, $flags);
+
+      if (virLockManagerAcquireObject(mgr) &lt; 0)
+        ...abort...
+    </pre>
+
+    <p>
+      The lock is implicitly released when the process
+      that acquired it exits. However, a process may
+      voluntarily give up the lock by running:
+    </p>
+
+    <pre>
+      virLockManagerReleaseObject(mgr);
+    </pre>
+
+    <h3><a name="usageLockAttach">Lock attachment</a></h3>
+
+    <p>
+      Any time a process needs to do work on behalf of
+      another process that holds a lock, it will associate
+      itself with the existing lock. This sequence is
+      identical to the previous one, except for the
+      last step.
+    </p>
+
+
+    <pre>
+      mgr = virLockManagerNew(contentLock,
+                              VIR_LOCK_MANAGER_MODE_CONTENT,
+                              VIR_LOCK_MANAGER_TYPE_DOMAIN);
+      virLockManagerSetParameter(mgr, "uuid", $uuid);
+      virLockManagerSetParameter(mgr, "name", $name);
+
+      foreach (current disks)
+          virLockManagerAddResource(mgr,
+                                    VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK,
+                                    $path, $flags);
+
+      if (virLockManagerAttachObject(mgr, $pid) &lt; 0)
+        ...abort...
+    </pre>
+
+    <p>
+      A lock association will always be explicitly broken
+      by running:
+    </p>
+
+    <pre>
+      virLockManagerDetachObject(mgr, $pid);
+    </pre>
+
+
+    <h3><a name="usageLiveResourceChange">Live resource changes</a></h3>
+
+    <p>
+      When adding a resource to an existing locked object (e.g. to
+      hotplug a disk into a VM), the lock manager will first
+      attach to the locked object, acquire a lock on the
+      new resource, then detach from the locked object.
+    </p>
+
+    <pre>
+      ... initial glue ...
+      if (virLockManagerAttachObject(mgr, $pid) &lt; 0)
+        ...abort...
+
+      if (virLockManagerAcquireResource(mgr,
+                                        VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK,
+                                        $path, $flags) &lt; 0)
+        ...abort...
+
+      ...assign resource to object...
+
+      virLockManagerDetachObject(mgr, $pid);
+    </pre>
+
+    <p>
+      Removing a resource from an existing object is an identical
+      process, but with <code>virLockManagerReleaseResource</code>
+      invoked instead.
+    </p>
+
+  </body>
+</html>
-- 
1.7.3.4
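
A note on the sharing modes in the Requirements and Design sections of the docs above: one way an fcntl-based content lock could honour readonly, shared and exclusive disks is to map them onto POSIX read and write record locks. The following is only a minimal sketch under that assumption, not code from this series. The enum and the two function names are hypothetical; only fcntl(), struct flock, F_SETLK, F_RDLCK and F_WRLCK are real POSIX interfaces.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical disk sharing modes; these names do not come from libvirt. */
enum disk_mode { DISK_READONLY, DISK_SHARED, DISK_EXCLUSIVE };

/*
 * Assumed mapping: readonly and shared disks take a read lock, so many
 * holders can coexist; exclusive disks take a write lock, which conflicts
 * with every other lock on the same file held by another process.
 */
static short lock_type_for(enum disk_mode mode)
{
    return mode == DISK_EXCLUSIVE ? F_WRLCK : F_RDLCK;
}

/*
 * Try to lock the whole file without blocking.
 * Returns 0 on success, -1 (with errno set) if another process holds a
 * conflicting lock or the request is otherwise refused.
 */
static int acquire_disk_lock(int fd, enum disk_mode mode)
{
    struct flock fl = {0};

    assert(fd >= 0);

    fl.l_type = lock_type_for(mode);
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;   /* a length of 0 means "through end of file" */

    return fcntl(fd, F_SETLK, &fl);
}
```

POSIX record locks are dropped by the kernel when the owning process exits, which lines up with the implicit release described in the "Lock acquisition" section. They are also per-process, so two requests made from the same process never conflict with each other; a conflict is only visible between distinct processes.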



