[Libguestfs] [nbdkit PATCH] cache: Advertise flush/fua/multi_conn if cache=unsafe
Richard W.M. Jones
rjones at redhat.com
Fri Feb 19 19:37:14 UTC 2021
On Thu, Feb 18, 2021 at 05:23:20PM -0600, Eric Blake wrote:
> When using cache=unsafe, we are already stating that the plugin will
> not necessarily ever get our cached data written back, but locally
> everything is already consistent (we grab a shared lock before
> consulting the cache), so we can behave as if client flush requests
> always works in a multi-conn safe manner, even without bothering with
> fdatasync() on our cache file.
>
> When using cache=writethrough, our behavior under multiple connections
> is at the mercy of the plugin (if connection A and B both write, but
> only B flushes, we can only guarantee that the data from A was flushed
> if the plugin advertises multi-conn; as otherwise, the write by
> connection A may still be in a cache that was not flushed by B's
> command). But when using cache=writeback, we are guaranteed that only
> one connection is ever writing back to the server at a time (that is,
> once we flush, we are flushing data from ALL connections), so we can
> always advertise multi-conn.
> ---
>
> This is fallout from IRC discussion we had earlier today about whether
> it was safe to enable multi-conn on the ssh plugin.
>
> After re-reading
> https://sourceforge.net/p/nbd/mailman/nbd-general/?viewmonth=201610&viewday=3,
> my take is that the original goal for MULTI_CONN is that when clear,
> if connection A and B both write, then both must FLUSH to guarantee
> that their writes reached persistent storage (if only one flushes, the
> other connection may still have cached data that remains unflushed).
> But when MULTI_CONN is set, only one of the two connections has to
> issue a flush after both connections have received replies to all
> writes that are intended to be made persistent. And the way we share
> our cache for cache=unsafe and cache=writeback meets that goal.
>
> filters/cache/cache.c | 56 +++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 54 insertions(+), 2 deletions(-)
>
> diff --git a/filters/cache/cache.c b/filters/cache/cache.c
> index b979f256..9e5a3e80 100644
> --- a/filters/cache/cache.c
> +++ b/filters/cache/cache.c
> @@ -257,6 +257,53 @@ cache_can_fast_zero (struct nbdkit_next_ops *next_ops, void *nxdata,
> return 1;
> }
>
> +/* Override the plugin's .can_flush, if we are cache=unsafe */
> +static int
> +cache_can_flush (struct nbdkit_next_ops *next_ops, void *nxdata,
> + void *handle)
> +{
> + if (cache_mode == CACHE_MODE_UNSAFE)
> + return 1;
> + return next_ops->can_flush (nxdata);
> +}
> +
> +
> +/* Override the plugin's .can_fua, if we are cache=unsafe */
> +static int
> +cache_can_fua (struct nbdkit_next_ops *next_ops, void *nxdata,
> + void *handle)
> +{
> + if (cache_mode == CACHE_MODE_UNSAFE)
> + return NBDKIT_FUA_NATIVE;
> + return next_ops->can_fua (nxdata);
> +}
> +
> +/* Override the plugin's .can_multi_conn, if we are not cache=writethrough */
> +static int
> +cache_can_multi_conn (struct nbdkit_next_ops *next_ops, void *nxdata,
> + void *handle)
> +{
> + /* For CACHE_MODE_NONE, we always advertise a no-op flush because
> + * our local cache access is consistent between connections, and we
> + * don't care about persisting the data to the underlying plugin.
> + *
> + * For CACHE_MODE_WRITEBACK, things are more subtle: we only write
> + * to the plugin during NBD_CMD_FLUSH, at which point that one
> + * connection writes back ALL cached blocks regardless of which
> + * connection originally wrote them, so a client can be assured that
> + * blocks from all connections have reached the plugin's permanent
> + * storage with only one connection having to send a flush.
> + *
> + * But for CACHE_MODE_WRITETHROUGH, we are at the mercy of the
> + * plugin; data written by connection A is not guaranteed to be made
> + * persistent by a flush from connection B unless the plugin itself
> + * supports multi-conn.
> + */
> + if (cache_mode !== CACHE_MODE_WRITETHROUGH)
> + return 1;
> + return next_ops->can_multi_conn (nxdata);
> +}
> +
> /* Read data. */
> static int
> cache_pread (struct nbdkit_next_ops *next_ops, void *nxdata,
> @@ -352,7 +399,8 @@ cache_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata,
> }
>
> if ((flags & NBDKIT_FLAG_FUA) &&
> - next_ops->can_fua (nxdata) == NBDKIT_FUA_EMULATE) {
> + (cache_mode == CACHE_MODE_UNSAFE ||
> + next_ops->can_fua (nxdata) == NBDKIT_FUA_EMULATE)) {
> flags &= ~NBDKIT_FLAG_FUA;
> need_flush = true;
> }
> @@ -442,7 +490,8 @@ cache_zero (struct nbdkit_next_ops *next_ops, void *nxdata,
>
> flags &= ~NBDKIT_FLAG_MAY_TRIM;
> if ((flags & NBDKIT_FLAG_FUA) &&
> - next_ops->can_fua (nxdata) == NBDKIT_FUA_EMULATE) {
> + (cache_mode == CACHE_MODE_UNSAFE ||
> + next_ops->can_fua (nxdata) == NBDKIT_FUA_EMULATE)) {
> flags &= ~NBDKIT_FLAG_FUA;
> need_flush = true;
> }
> @@ -639,6 +688,9 @@ static struct nbdkit_filter filter = {
> .get_size = cache_get_size,
> .can_cache = cache_can_cache,
> .can_fast_zero = cache_can_fast_zero,
> + .can_flush = cache_can_flush,
> + .can_fua = cache_can_fua,
> + .can_multi_conn = cache_can_multi_conn,
> .pread = cache_pread,
> .pwrite = cache_pwrite,
> .zero = cache_zero,
> --
> 2.30.1
ACK
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
More information about the Libguestfs
mailing list