[Libguestfs] [RFC nbdkit PATCH] multi-conn: New filter

Wed Feb 24 17:33:00 UTC 2021

Implement a TODO item of emulating multi-connection consistency via
multiple plugin flush calls to allow a client to assume that a flush
on a single connection is good enough.  This also gives us some
fine-tuning over whether to advertise the bit, including some setups
that are unsafe but may be useful in timing tests.

Testing is interesting: I used the sh plugin to implement a server
that intentionally keeps a per-connection cache.

Note that this filter assumes that multiple connections will still
share the same data (other than caching effects); effects are not
guaranteed when trying to mix it with more exotic plugins like info
that violate that premise.
---

I'm still working on the test; the sh plugin is good enough that it
does what I want when playing with it manually, but I still need to
write up various scenarios in test-multi-conn.sh to match what I've
played with manually.

I'm open to feedback on the set of options I've exposed during .config
(too many, not enough, better names?)  Right now, it is:
multi-conn-mode=auto|plugin|disable|emulate|unsafe
multi-conn-track-dirty=fast|connection|off

 .../multi-conn/nbdkit-multi-conn-filter.pod   | 169 +++++++
 configure.ac                                  |   4 +-
 filters/multi-conn/Makefile.am                |  68 +++
 tests/Makefile.am                             |  11 +-
 filters/multi-conn/multi-conn.c               | 467 ++++++++++++++++++
 tests/test-multi-conn-plugin.sh               | 121 +++++
 tests/test-multi-conn.sh                      |  85 ++++
 TODO                                          |   7 -
 8 files changed, 923 insertions(+), 9 deletions(-)
 create mode 100644 filters/multi-conn/nbdkit-multi-conn-filter.pod
 create mode 100644 filters/multi-conn/Makefile.am
 create mode 100644 filters/multi-conn/multi-conn.c
 create mode 100755 tests/test-multi-conn-plugin.sh
 create mode 100755 tests/test-multi-conn.sh

diff --git a/filters/multi-conn/nbdkit-multi-conn-filter.pod b/filters/multi-conn/nbdkit-multi-conn-filter.pod
new file mode 100644
index 00000000..ae2873df
--- /dev/null
+++ b/filters/multi-conn/nbdkit-multi-conn-filter.pod
@@ -0,0 +1,169 @@
+=head1 NAME
+
+nbdkit-multi-conn-filter - nbdkit multi-conn filter
+
+=head1 SYNOPSIS
+
+ nbdkit --filter=multi-conn plugin \
+   [multi-conn-mode=MODE] [multi-conn-track-dirty=LEVEL] [plugin-args...]
+
+=head1 DESCRIPTION
+
+C<nbdkit-multi-conn-filter> is a filter that enables alterations to
+the server's advertisement of NBD_FLAG_MULTI_CONN.  When a server
+permits multiple simultaneous clients, and sets this flag, a client
+may assume that all connections see a consistent view (after getting
+the server reply from a write in one connection, sending a flush
+command on a single connection and waiting for that reply then
+guarantees that all connections will then see the just-written data).
+If the flag is not advertised, a client must presume that separate
+connections may have utilized independent caches, and where a flush on
+one connection does not affect the cache of a second connection.
+
+The main use of this filter is to emulate consistent semantics across
+multiple connections when not already provided by a plugin, although
+it also has additional modes useful for evaluating performance and
+correctness of client and plugin multi-conn behaviors.  This filter
+assumes that multiple connections to a plugin will eventually share
+data, other than any caching effects; it is not suitable for use with
+a plugin that produces completely independent data per connection.
+
+Additional control over the behavior of client flush commands is
+possible by combining this filter with L<nbdkit-fua-filter(1)>.
+
+=head1 PARAMETERS
+
+=over 4
+
+=item B<multi-conn-mode=auto>
+
+This filter defaults to B<auto> mode.  If the plugin advertises
+multi-conn, then this filter behaves the same as B<plugin> mode;
+otherwise, this filter behaves the same as B<emulate> mode.  Either
+way, this mode advertises NBD_FLAG_MULTI_CONN to the client.
+
+=item B<multi-conn-mode=emulate>
+
+When B<emulate> mode is chosen, then this filter tracks all parallel
+connections.  When a client issues a flush command over any one
+connection (including a write command with the FUA (force unit access)
+flag set), the filter then replicates that flush across each
+connection to the plugin (although the amount of plugin calls can be
+tuned by adjusting B<multi-conn-track-dirty>).  This assumes that
+flushing each connection is enough to clear any per-connection cached
+data, in order to give each connection a consistent view of the image;
+therefore, this mode advertises NBD_FLAG_MULTI_CONN to the client.
+
+Note that in this mode, a client will be unable to connect if the
+plugin lacks support for flush, as there would be no way to emulate
+cross-connection consistency.
+
+=item B<multi-conn-mode=disable>
+
+When B<disable> mode is chosen, this filter disables advertisement of
+NBD_FLAG_MULTI_CONN to the client, even if the plugin supports it, and
+does not replicate flush commands across connections.  This is useful
+for testing whether a client with multiple connections properly sends
+multiple flushes in order to overcome per-connection caching.
+
+=item B<multi-conn-mode=plugin>
+
+When B<plugin> mode is chosen, the filter does not change whether
+NBD_FLAG_MULTI_CONN is advertised by the plugin, and does not
+replicate flush commands across connections; but still honors
+B<multi-conn-track-dirty> for minimizing the number of flush commands
+passed on to the plugin.
+
+=item B<multi-conn-mode=unsafe>
+
+When B<unsafe> mode is chosen, this filter blindly advertises
+NBD_FLAG_MULTI_CONN to the client even if the plugin lacks support.
+This is dangerous, and risks data corruption if the client makes
+assumptions about data consistency that were not actually met.
+
+=item B<multi-conn-track-dirty=fast>
+
+When dirty tracking is set to B<fast>, the filter tracks whether any
+connection has caused the image to be dirty (any write, zero, or trim
+commands since the last flush, regardless of connection); if all
+connections are clean, a client flush command is ignored rather than
+sent on to the plugin.  In this mode, a flush action on one connection
+marks all other connections as clean, regardless of whether the filter
+actually advertised NBD_FLAG_MULTI_CONN, which can result in less
+activity when a client sends multiple flushes rather than taking
+advantage of multi-conn semantics.  This is safe with
+B<multi-conn-mode=emulate>, but potentially unsafe with
+B<multi-conn-mode=plugin> when the plugin did not advertise
+multi-conn, as it does not track whether a read may have cached stale
+data prior to a flush.
+
+=item B<multi-conn-track-dirty=connection>
+
+Dirty tracking is set to B<connection> by default, where the filter
+tracks whether a given connection is dirty (any write, zero, or trim
+commands since the last flush on the given connection, and any read
+since the last flush on any other connection); if the connection is
+clean, a flush command to that connection (whether directly from the
+client, or replicated by B<multi-conn-mode=emulate> is ignored rather
+than sent on to the plugin.  This mode may result in more flush calls
+than B<multi-conn-track-dirty=fast>, but in turn is safe to use with
+B<multi-conn-mode=plugin>.
+
+=item B<multi-conn-track-dirty=off>
+
+When dirty tracking is set to B<off>, all flush commands from the
+client are passed on to the plugin, regardless of whether the flush
+would be needed for consistency.  Note that when combined with
+B<multi-conn-mode=emulate>, a client which disregards
+NBD_FLAG_MULTI_CONN by flushing on each connection itself results in a
+quadratic number of flush operations on the plugin.
+
+=back
+
+=head1 EXAMPLES
+
+Provide consistent cross-connection flush semantics on top of a plugin
+that lacks it natively:
+
+ nbdkit --filter=multi-conn split file.part1 file.part2
+
+Minimize the number of expensive flush operations performed when
+utilizing a plugin that has multi-conn consistency from a client that
+blindly flushes across every connection:
+
+ nbdkit --filter=multi-conn file multi-conn-mode=plugin \
+   multi-conn-track-dirty=fast disk.img
+
+=head1 FILES
+
+=over 4
+
+=item F<$filterdir/nbdkit-multi-conn-filter.so>
+
+The filter.
+
+Use C<nbdkit --dump-config> to find the location of C<$filterdir>.
+
+=back
+
+=head1 VERSION
+
+C<nbdkit-multi-conn-filter> first appeared in nbdkit 1.26.
+
+=head1 SEE ALSO
+
+L<nbdkit(1)>,
+L<nbdkit-file-plugin(1)>,
+L<nbdkit-filter(3)>,
+L<nbdkit-fua-filter(1)>,
+L<nbdkit-nocache-filter(1)>,
+L<nbdkit-noextents-filter(1)>,
+L<nbdkit-nozero-filter(1)>.
+
+=head1 AUTHORS
+
+Eric Blake
+
+=head1 COPYRIGHT
+
+Copyright (C) 2018-2021 Red Hat Inc.
diff --git a/configure.ac b/configure.ac
index cb18dd88..2b3e214e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1,5 +1,5 @@
 # nbdkit
-# Copyright (C) 2013-2020 Red Hat Inc.
+# Copyright (C) 2013-2021 Red Hat Inc.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions are
@@ -128,6 +128,7 @@ filters="\
         ip \
         limit \
         log \
+	multi-conn \
         nocache \
         noextents \
         nofilter \
@@ -1259,6 +1260,7 @@ AC_CONFIG_FILES([Makefile
                  filters/ip/Makefile
                  filters/limit/Makefile
                  filters/log/Makefile
+		 filters/multi-conn/Makefile
                  filters/nocache/Makefile
                  filters/noextents/Makefile
                  filters/nofilter/Makefile
diff --git a/filters/multi-conn/Makefile.am b/filters/multi-conn/Makefile.am
new file mode 100644
index 00000000..778b8947
--- /dev/null
+++ b/filters/multi-conn/Makefile.am
@@ -0,0 +1,68 @@
+# nbdkit
+# Copyright (C) 2021 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+include $(top_srcdir)/common-rules.mk
+
+EXTRA_DIST = nbdkit-multi-conn-filter.pod
+
+filter_LTLIBRARIES = nbdkit-multi-conn-filter.la
+
+nbdkit_multi_conn_filter_la_SOURCES = \
+	multi-conn.c \
+	$(top_srcdir)/include/nbdkit-filter.h \
+	$(NULL)
+
+nbdkit_multi_conn_filter_la_CPPFLAGS = \
+	-I$(top_srcdir)/include \
+	-I$(top_srcdir)/common/include \
+	-I$(top_srcdir)/common/utils \
+	$(NULL)
+nbdkit_multi_conn_filter_la_CFLAGS = $(WARNINGS_CFLAGS)
+nbdkit_multi_conn_filter_la_LIBADD = \
+	$(top_builddir)/common/utils/libutils.la \
+	$(IMPORT_LIBRARY_ON_WINDOWS) \
+	$(NULL)
+nbdkit_multi_conn_filter_la_LDFLAGS = \
+	-module -avoid-version -shared $(NO_UNDEFINED_ON_WINDOWS) \
+	-Wl,--version-script=$(top_srcdir)/filters/filters.syms \
+	$(NULL)
+
+if HAVE_POD
+
+man_MANS = nbdkit-multi-conn-filter.1
+CLEANFILES += $(man_MANS)
+
+nbdkit-multi-conn-filter.1: nbdkit-multi-conn-filter.pod
+	$(PODWRAPPER) --section=1 --man $@ \
+	    --html $(top_builddir)/html/$@.html \
+	    $<
+
+endif HAVE_POD
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 70898f20..4b3ee65c 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -1,5 +1,5 @@
 # nbdkit
-# Copyright (C) 2013-2020 Red Hat Inc.
+# Copyright (C) 2013-2021 Red Hat Inc.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions are
@@ -1538,6 +1538,15 @@ EXTRA_DIST += \
 	test-log-script-info.sh \
 	$(NULL)

+# multi-conn filter test.
+TESTS += \
+	test-multi-conn.sh \
+	$(NULL)
+EXTRA_DIST += \
+	test-multi-conn-plugin.sh \
+	test-multi-conn.sh \
+	$(NULL)
+
 # nofilter test.
 TESTS += test-nofilter.sh
 EXTRA_DIST += test-nofilter.sh
diff --git a/filters/multi-conn/multi-conn.c b/filters/multi-conn/multi-conn.c
new file mode 100644
index 00000000..3b244cb7
--- /dev/null
+++ b/filters/multi-conn/multi-conn.c
@@ -0,0 +1,467 @@
+/* nbdkit
+ * Copyright (C) 2021 Red Hat Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * * Neither the name of Red Hat nor the names of its contributors may be
+ * used to endorse or promote products derived from this software without
+ * specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+ * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <config.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdbool.h>
+#include <assert.h>
+#include <pthread.h>
+
+#include <nbdkit-filter.h>
+
+#include "cleanup.h"
+#include "vector.h"
+
+/* Track results of .config */
+static enum MultiConnMode {
+  AUTO,
+  EMULATE,
+  PLUGIN,
+  UNSAFE,
+  DISABLE,
+} mode;
+
+static enum TrackDirtyMode {
+  CONN,
+  FAST,
+  OFF,
+} track;
+
+enum dirty {
+  WRITE = 1, /* A write may have populated a cache */
+  READ = 2, /* A read may have populated a cache */
+};
+
+/* Coordination between connections. */
+static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
+
+/* The list of handles to active connections. */
+struct handle {
+  struct nbdkit_next_ops *next_ops;
+  void *nxdata;
+  enum MultiConnMode mode; /* Runtime resolution of mode==AUTO */
+  enum dirty dirty; /* What aspects of this connection are dirty */
+};
+DEFINE_VECTOR_TYPE(conns_vector, struct handle *);
+static conns_vector conns = empty_vector;
+static bool dirty; /* True if any connection is dirty */
+
+/* Accept 'multi-conn-mode=mode' and 'multi-conn-track-dirty=level' */
+static int
+multi_conn_config (nbdkit_next_config *next, void *nxdata,
+                   const char *key, const char *value)
+{
+  if (strcmp (key, "multi-conn-mode") == 0) {
+    if (strcmp (value, "auto") == 0)
+      mode = AUTO;
+    else if (strcmp (value, "emulate") == 0)
+      mode = EMULATE;
+    else if (strcmp (value, "plugin") == 0)
+      mode = PLUGIN;
+    else if (strcmp (value, "disable") == 0)
+      mode = DISABLE;
+    else if (strcmp (value, "unsafe") == 0)
+      mode = UNSAFE;
+    else {
+      nbdkit_error ("unknown multi-conn mode '%s'", value);
+      return -1;
+    }
+    return 0;
+  }
+  else if (strcmp (key, "multi-conn-track-dirty") == 0) {
+    if (strcmp (value, "connection") == 0 ||
+        strcmp (value, "conn") == 0)
+      track = CONN;
+    else if (strcmp (value, "fast") == 0)
+      track = FAST;
+    else if (strcmp (value, "off") == 0)
+      track = OFF;
+    else {
+      nbdkit_error ("unknown multi-conn track-dirty setting '%s'", value);
+      return -1;
+    }
+    return 0;
+  }
+  return next (nxdata, key, value);
+}
+
+#define multi_conn_config_help \
+  "multi-conn-mode=<MODE>          'emulate' (default), 'plugin', 'disable',\n" \
+  "                                or 'unsafe'.\n" \
+  "multi-conn-track-dirty=<LEVEL>  'conn' (default), 'fast', or 'off'.\n"
+
+static void *
+multi_conn_open (nbdkit_next_open *next, void *nxdata,
+                 int readonly, const char *exportname, int is_tls)
+{
+  struct handle *h;
+
+  if (next (nxdata, readonly, exportname) == -1)
+    return NULL;
+
+  /* Allocate here, but populate and insert into list in .prepare */
+  h = calloc (1, sizeof *h);
+  if (h == NULL) {
+    nbdkit_error ("calloc: %m");
+    return NULL;
+  }
+  return h;
+}
+
+static int
+multi_conn_prepare (struct nbdkit_next_ops *next_ops, void *nxdata,
+                    void *handle, int readonly)
+{
+  struct handle *h = handle;
+  int r;
+
+  h->next_ops = next_ops;
+  h->nxdata = nxdata;
+  if (mode == AUTO) {
+    r = next_ops->can_multi_conn (nxdata);
+    if (r == -1)
+      return -1;
+    if (r == 0)
+      h->mode = EMULATE;
+    else
+      h->mode = PLUGIN;
+  }
+  else
+    h->mode = mode;
+  if (h->mode == EMULATE && next_ops->can_flush (nxdata) != 1) {
+    nbdkit_error ("emulating multi-conn requires working flush");
+    return -1;
+  }
+
+  ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
+  conns_vector_append (&conns, h);
+  return 0;
+}
+
+static int
+multi_conn_finalize (struct nbdkit_next_ops *next_ops, void *nxdata,
+                     void *handle)
+{
+  struct handle *h = handle;
+
+  ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
+  assert (h->next_ops == next_ops);
+  assert (h->nxdata == nxdata);
+
+  /* XXX should we add a config param to flush if the client forgot? */
+  for (size_t i = 0; i < conns.size; i++) {
+    if (conns.ptr[i] == h) {
+      conns_vector_remove (&conns, i);
+      break;
+    }
+  }
+  return 0;
+}
+
+static void
+multi_conn_close (void *handle)
+{
+  free (handle);
+}
+
+static int
+multi_conn_can_fua (struct nbdkit_next_ops *next_ops, void *nxdata,
+                    void *handle)
+{
+  /* If the backend has native FUA support but is not multi-conn
+   * consistent, and we have to flush on every connection, then we are
+   * better off advertising emulated fua rather than native.
+   */
+  struct handle *h = handle;
+  int fua = next_ops->can_fua (nxdata);
+
+  assert (h->mode != AUTO);
+  if (fua == NBDKIT_FUA_NATIVE && h->mode == EMULATE)
+    return NBDKIT_FUA_EMULATE;
+  return fua;
+}
+
+static int
+multi_conn_can_multi_conn (struct nbdkit_next_ops *next_ops, void *nxdata,
+                           void *handle)
+{
+  struct handle *h = handle;
+
+  switch (h->mode) {
+  case EMULATE:
+    return 1;
+  case PLUGIN:
+    return next_ops->can_multi_conn (nxdata);
+  case DISABLE:
+    return 0;
+  case UNSAFE:
+    return 1;
+  case AUTO: /* Not possible, see .prepare */
+  default:
+    abort ();
+  }
+}
+
+static void
+mark_dirty (struct handle *h, bool is_read)
+{
+  /* No need to grab lock here: the NBD spec is clear that a client
+   * must wait for the response to a flush before sending the next
+   * command that expects to see the result of that flush, so any race
+   * in accessing dirty can be traced back to the client improperly
+   * sending a flush in parallel with other live commands.
+   */
+  switch (track) {
+  case CONN:
+    h->dirty |= is_read ? READ : WRITE;
+    break;
+  case FAST:
+    if (!is_read)
+      dirty = true;
+    break;
+  case OFF:
+    break;
+  default:
+    abort ();
+  }
+}
+
+static int
+multi_conn_flush (struct nbdkit_next_ops *next_ops, void *nxdata,
+                  void *handle, uint32_t flags, int *err);
+
+static int
+multi_conn_pread (struct nbdkit_next_ops *next_ops, void *nxdata,
+                  void *handle, void *buf, uint32_t count, uint64_t offs,
+                  uint32_t flags, int *err)
+{
+  struct handle *h = handle;
+
+  mark_dirty (h, true);
+  return next_ops->pread (nxdata, buf, count, offs, flags, err);
+}
+
+static int
+multi_conn_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata,
+                   void *handle, const void *buf, uint32_t count,
+                   uint64_t offs, uint32_t flags, int *err)
+{
+  struct handle *h = handle;
+  bool need_flush = false;
+
+  if (flags & NBDKIT_FLAG_FUA) {
+    if (h->mode == EMULATE) {
+      mark_dirty (h, false);
+      need_flush = true;
+      flags &= ~NBDKIT_FLAG_FUA;
+    }
+  }
+  else
+    mark_dirty (h, false);
+
+  if (next_ops->pwrite (nxdata, buf, count, offs, flags, err) == -1)
+    return -1;
+  if (need_flush)
+    return multi_conn_flush (next_ops, nxdata, h, 0, err);
+  return 0;
+}
+
+static int
+multi_conn_zero (struct nbdkit_next_ops *next_ops, void *nxdata,
+                 void *handle, uint32_t count, uint64_t offs, uint32_t flags,
+                 int *err)
+{
+  struct handle *h = handle;
+  bool need_flush = false;
+
+  if (flags & NBDKIT_FLAG_FUA) {
+    if (h->mode == EMULATE) {
+      mark_dirty (h, false);
+      need_flush = true;
+      flags &= ~NBDKIT_FLAG_FUA;
+    }
+  }
+  else
+    mark_dirty (h, false);
+
+  if (next_ops->zero (nxdata, count, offs, flags, err) == -1)
+    return -1;
+  if (need_flush)
+    return multi_conn_flush (next_ops, nxdata, h, 0, err);
+  return 0;
+}
+
+static int
+multi_conn_trim (struct nbdkit_next_ops *next_ops, void *nxdata,
+                 void *handle, uint32_t count, uint64_t offs, uint32_t flags,
+                 int *err)
+{
+  struct handle *h = handle;
+  bool need_flush = false;
+
+  if (flags & NBDKIT_FLAG_FUA) {
+    if (h->mode == EMULATE) {
+      mark_dirty (h, false);
+      need_flush = true;
+      flags &= ~NBDKIT_FLAG_FUA;
+    }
+  }
+  else
+    mark_dirty (h, false);
+
+  if (next_ops->trim (nxdata, count, offs, flags, err) == -1)
+    return -1;
+  if (need_flush)
+    return multi_conn_flush (next_ops, nxdata, h, 0, err);
+  return 0;
+}
+
+static int
+multi_conn_cache (struct nbdkit_next_ops *next_ops, void *nxdata,
+                  void *handle, uint32_t count, uint64_t offs, uint32_t flags,
+                  int *err)
+{
+  struct handle *h = handle;
+
+  mark_dirty (h, true);
+  return next_ops->cache (nxdata, count, offs, flags, err);
+}
+
+static int
+multi_conn_flush (struct nbdkit_next_ops *next_ops, void *nxdata,
+                  void *handle, uint32_t flags, int *err)
+{
+  struct handle *h = handle, *h2;
+  size_t i;
+
+  if (h->mode == EMULATE) {
+    /* Optimize for the common case of a single connection: flush all
+     * writes on other connections, then flush both read and write on
+     * the current connection, then finally flush all other
+     * connections to avoid reads seeing stale data, skipping the
+     * flushes that make no difference according to dirty tracking.
+     */
+    bool updated = h->dirty & WRITE;
+
+    ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
+    for (i = 0; i < conns.size; i++) {
+      h2 = conns.ptr[i];
+      if (h == h2)
+        continue;
+      if (dirty || h2->dirty & WRITE) {
+        if (h2->next_ops->flush (h2->nxdata, flags, err) == -1)
+          return -1;
+        h2->dirty &= ~WRITE;
+        updated = true;
+      }
+    }
+    if (dirty || updated) {
+      if (next_ops->flush (nxdata, flags, err) == -1)
+        return -1;
+    }
+    h->dirty = 0;
+    dirty = false;
+    for (i = 0; i < conns.size; i++) {
+      h2 = conns.ptr[i];
+      if (updated && h2->dirty & READ) {
+        assert (h != h2);
+        if (h2->next_ops->flush (h2->nxdata, flags, err) == -1)
+          return -1;
+      }
+      h2->dirty &= ~READ;
+    }
+  }
+  else {
+    /* !EMULATE: Check if the image is clean, allowing us to skip a flush. */
+    switch (track) {
+    case CONN:
+      if (!h->dirty)
+        return 0;
+      break;
+    case FAST:
+      if (!dirty)
+        return 0;
+      break;
+    case OFF:
+      break;
+    default:
+      abort ();
+    }
+    /* Perform the flush, then update dirty tracking. */
+    if (next_ops->flush (nxdata, flags, err) == -1)
+      return -1;
+    switch (track) {
+    case CONN:
+      if (next_ops->can_multi_conn (nxdata) == 1) {
+        ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
+        for (i = 0; i < conns.size; i++)
+          conns.ptr[i]->dirty = 0;
+      }
+      else
+        h->dirty = 0;
+      break;
+    case FAST:
+      dirty = false;
+      break;
+    case OFF:
+      break;
+    default:
+      abort ();
+    }
+  }
+  return 0;
+}
+
+static struct nbdkit_filter filter = {
+  .name              = "multi-conn",
+  .longname          = "nbdkit multi-conn filter",
+  .config            = multi_conn_config,
+  .config_help       = multi_conn_config_help,
+  .open              = multi_conn_open,
+  .prepare           = multi_conn_prepare,
+  .finalize          = multi_conn_finalize,
+  .close             = multi_conn_close,
+  .can_fua           = multi_conn_can_fua,
+  .can_multi_conn    = multi_conn_can_multi_conn,
+  .pread             = multi_conn_pread,
+  .pwrite            = multi_conn_pwrite,
+  .trim              = multi_conn_trim,
+  .zero              = multi_conn_zero,
+  .cache             = multi_conn_cache,
+  .flush             = multi_conn_flush,
+};
+
+NBDKIT_REGISTER_FILTER(filter)
diff --git a/tests/test-multi-conn-plugin.sh b/tests/test-multi-conn-plugin.sh
new file mode 100755
index 00000000..c580b89a
--- /dev/null
+++ b/tests/test-multi-conn-plugin.sh
@@ -0,0 +1,121 @@
+#!/usr/bin/env bash
+# nbdkit
+# Copyright (C) 2018-2021 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+# Test plugin used by test-multi-conn.sh.
+# This plugin purposefully maintains a per-connection cache.
+# An optional parameter tightfua=true controls whether FUA acts on
+# just the given region, or on all pending ops in the current connection.
+# Note that an earlier cached write on one connection can overwrite a later
+# FUA write on another connection - this is okay (the client is buggy if
+# it ever sends overlapping writes without coordinating flushes and still
+# expects any particular write to occur last).
+
+fill_cache() {
+    if test ! -f "$tmpdir/$1"; then
+        cp "$tmpdir/0" "$tmpdir/$1"
+    fi
+}
+do_fua() {
+    case ,$4, in
+        *,fua,*)
+            if test -f "$tmpdir/strictfua"; then
+                dd of="$tmpdir/0" if="$tmpdir/$1" skip=$3 seek=$3 count=$2 \
+                  conv=notrunc iflag=count_bytes,skip_bytes oflag=seek_bytes
+            else
+                do_flush $1
+            fi ;;
+    esac
+}
+do_flush() {
+    if test -f "$tmpdir/$1-replay"; then
+        while read cnt off; do
+            dd of="$tmpdir/0" if="$tmpdir/$1" skip=$off seek=$off count=$cnt \
+               conv=notrunc iflag=count_bytes,skip_bytes oflag=seek_bytes
+        done < "$tmpdir/$1-replay"
+    fi
+    rm -f "$tmpdir/$1" "$tmpdir/$1-replay"
+}
+case "$1" in
+    config)
+        case $2 in
+            strictfua)
+                case $3 in
+                    true | on | 1) touch "$tmpdir/strictfua" ;;
+                    false | off | 0) ;;
+                    *) echo "unknown value for strictfua $3" >&2; exit 1 ;;
+                esac ;;
+            *) echo "unknown config key $2" >&2; exit 1 ;;
+        esac
+        ;;
+    get_ready)
+        printf "%-32s" 'Initial contents' > "$tmpdir/0"
+        echo 0 > "$tmpdir/counter"
+        ;;
+    get_size)
+        echo 32
+        ;;
+    can_write | can_zero | can_trim | can_flush)
+        exit 0
+        ;;
+    can_fua | can_cache)
+        echo native
+        ;;
+    open)
+        read i < "$tmpdir/counter"
+        echo $((i+1)) | tee "$tmpdir/counter"
+        ;;
+    pread)
+        fill_cache $2
+        dd if="$tmpdir/$2" skip=$4 count=$3 iflag=count_bytes,skip_bytes
+        ;;
+    cache)
+        fill_cache $2
+        ;;
+    pwrite)
+        fill_cache $2
+        dd of="$tmpdir/$2" seek=$4 conv=notrunc oflag=seek_bytes
+        echo $3 $4 >> "$tmpdir/$2-replay"
+        do_fua $2 $3 $4 $5
+        ;;
+    zero | trim)
+        fill_cache $2
+        dd of="$tmpdir/$2" if="/dev/zero" seek=$4 conv=notrunc oflag=seek_bytes
+        echo $3 $4 >> "$tmpdir/$2-replay"
+        do_fua $2 $3 $4 $5
+        ;;
+    flush)
+        do_flush $2
+        ;;
+    *)
+        exit 2
+        ;;
+esac
diff --git a/tests/test-multi-conn.sh b/tests/test-multi-conn.sh
new file mode 100755
index 00000000..01efd108
--- /dev/null
+++ b/tests/test-multi-conn.sh
@@ -0,0 +1,85 @@
+#!/usr/bin/env bash
+# nbdkit
+# Copyright (C) 2018-2021 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+# Demonstrate various multi-conn filter behaviors.
+
+source ./functions.sh
+set -e
+set -x
+
+requires_plugin sh
+requires nbdsh -u "nbd://nosuch" --version
+requires dd iflag=count_bytes </dev/null
+
+files="test-multi-conn.out test-multi-conn.stat"
+rm -f $files
+cleanup_fn rm -f $files
+
+fail=0
+export handles preamble handles uri
+uri= # will be set by --run later
+handles=2
+preamble='
+import os
+
+uri = os.environ["uri"]
+handles = int(os.environ["handles"])
+h = []
+for i in range(handles):
+  h.append(nbd.NBD())
+  h[i].connect_uri(uri)
+print(h[0].can_multi_conn())
+'
+
+# Demonstrate the caching required with plugin alone
+nbdkit -vf -U - sh test-multi-conn-plugin.sh \
+  --run 'nbdsh -c "$preamble" -c "
+"' > test-multi-conn.out || fail=1
+diff -u <(cat <<\EOF
+False
+EOF
+	 ) test-multi-conn.out || fail=1
+
+# Demonstrate that FUA alone does not have to sync full disk
+ # TODO
+# Demonstrate multi-conn defaults
+ # TODO
+# Use --filter=stats to show track-dirty effects
+nbdkit -vf -U - sh test-multi-conn-plugin.sh \
+  --filter=stats statsfile=test-multi-conn.stat \
+  --run 'nbdsh -c "$preamble" -c "
+h[0].flush()
+"' > test-multi-conn.out || fail=1
+cat test-multi-conn.stat
+grep 'flush: 1 ops' test-multi-conn.stat || fail=1
+
+exit $fail
diff --git a/TODO b/TODO
index d8dd7ef2..e41e38e8 100644
--- a/TODO
+++ b/TODO
@@ -206,13 +206,6 @@ Suggestions for filters
 * masking plugin features for testing clients (see 'nozero' and 'fua'
   filters for examples)

-* multi-conn filter to adjust advertisement of multi-conn bit. In
-  particular, if the plugin lacks .can_multi_conn, then .open/.close
-  track all open connections, and .flush and FUA flag will call
-  next_ops->flush() on all of them.  Conversely, if plugin supports
-  multi-conn, we can cache whether the image is dirty, and avoid
-  expense of next_ops->flush when it is clean.
-
 * "bandwidth quota" filter which would close a connection after it
   exceeded a certain amount of bandwidth up or down.

-- 
2.30.1