[Libguestfs] [nbdkit PATCH v2] Introduce cacheextents filter

Martin Kletzander mkletzan at redhat.com
Tue Jun 11 14:31:03 UTC 2019


On Tue, Jun 11, 2019 at 08:49:37AM -0500, Eric Blake wrote:
>On 6/11/19 3:49 AM, Martin Kletzander wrote:
>> This filter caches the result of the last extents() call, offering a nice
>> speed-up for clients that only request one extent at a time (req_one=1) in
>> combination with plugins like vddk, where a single extents call is very
>> time-consuming but returning information for multiple extents in that one
>> call adds no extra overhead.
>>
>> A quick test showed that on a fast connection, with a sparsely allocated
>> 16G disk with an OS installed, `qemu-img map` runs in 16s instead of 33s
>> (of which the first extents request takes 5s).  For a 100G disk with no
>> data on it, i.e. one hole extent spanning the whole disk (where there is
>> no room for improvement), this does not add any noticeable overhead.
>>
>> Signed-off-by: Martin Kletzander <mkletzan at redhat.com>
>> ---
>>
>> Notes:
>>     v2:
>>
>>      - Clearing the REQ_ONE when forwarding the request to plugin
>>
>>      - I hate the wording of the comment explaining it, please suggest a
>>        reword (or just fix it)
>
>I'll give it a shot.
>
>>
>>      - Tests now use a custom shell plugin that logs all accesses, plus
>>        qemu-io to test all other expected behaviour
>>
>
>> +++ b/filters/cacheextents/cacheextents.c
>
>> +
>> +/* Cached extents from the last extents () call and its start and end for the
>> +   sake of simplicity. */
>
>The prevailing style seems to be the use of leading ' *' on subsequent
>comment lines, and */ on its own line.
>
>> +struct nbdkit_extents *cache_extents;
>> +static uint64_t cache_start;
>> +static uint64_t cache_end;
>> +
>> +static void
>> +cacheextents_unload (void)
>> +{
>> +  nbdkit_extents_free (cache_extents);
>> +}
>> +
>> +static int
>> +cacheextents_add (struct nbdkit_extents *extents, int *err)
>> +{
>> +  size_t i = 0;
>> +
>> +  for (i = 0; i < nbdkit_extents_count (cache_extents); i++) {
>> +    struct nbdkit_extent ex = nbdkit_get_extent (cache_extents, i);
>> +    if (nbdkit_add_extent (extents, ex.offset, ex.length, ex.type) == -1) {
>> +      *err = errno;
>> +      return -1;
>> +    }
>> +  }
>> +
>> +  return 0;
>> +}
>> +
>> +static int
>> +cacheextents_fill (struct nbdkit_extents *extents, int *err)
>> +{
>> +  size_t i = 0;
>> +  size_t count = nbdkit_extents_count (extents);
>> +  struct nbdkit_extent first = nbdkit_get_extent (extents, 0);
>> +  struct nbdkit_extent last = nbdkit_get_extent (extents, count - 1);
>> +
>> +  nbdkit_extents_free (cache_extents);
>> +  cache_start = first.offset;
>> +  cache_end = last.offset + last.length;
>> +  cache_extents = nbdkit_extents_new (cache_start, cache_end);
>
>Needs a check for failure.
>
>Hmm - my earlier comments about appending to the cache are hard if we
>cap it to cache_end; I may still play with tweaking the cache algorithm
>to allow appending, but it would be a separate patch on top.
>

Well, it just means that if this is to be rewritten, it cannot use the
provided nbdkit_extents struct; an internal one (which could grow in both
directions, and maybe also be non-contiguous) would have to be used here.
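To illustrate, such an internal cache could be a plain sorted array kept
independently of the nbdkit_extents API.  This is only a sketch; the names
and layout are purely illustrative and not part of nbdkit:

```c
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical internal cache entry, decoupled from struct nbdkit_extent
 * so the cache can grow at either end (and later even hold gaps). */
struct cached_extent {
  uint64_t offset;
  uint64_t length;
  uint32_t type;
};

struct extents_cache {
  struct cached_extent *ex;   /* kept sorted by offset */
  size_t count;
  size_t alloc;
};

/* Insert an extent keeping the array sorted by offset.  Returns 0 on
 * success, -1 on allocation failure (caller sets *err from errno). */
static int
cache_insert (struct extents_cache *c, uint64_t offset,
              uint64_t length, uint32_t type)
{
  size_t i = 0;

  if (c->count == c->alloc) {
    size_t new_alloc = c->alloc ? c->alloc * 2 : 16;
    struct cached_extent *p = realloc (c->ex, new_alloc * sizeof *p);
    if (p == NULL)
      return -1;
    c->ex = p;
    c->alloc = new_alloc;
  }

  /* Find the insertion point (linear scan, for clarity only). */
  while (i < c->count && c->ex[i].offset < offset)
    i++;
  memmove (&c->ex[i + 1], &c->ex[i], (c->count - i) * sizeof c->ex[0]);
  c->ex[i].offset = offset;
  c->ex[i].length = length;
  c->ex[i].type = type;
  c->count++;
  return 0;
}
```

Because insertion keeps the array sorted, extending the cache before
cache_start or after cache_end is just another insert, with coalescing of
adjacent ranges as a possible later refinement.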

>> +
>> +  for (i = 0; i < count; i++) {
>> +    struct nbdkit_extent ex = nbdkit_get_extent (extents, i);
>> +    nbdkit_debug ("cacheextents: updating cache with"
>> +                  ": offset=%" PRIu64
>> +                  ": length=%" PRIu64
>> +                  "; type=%x",
>
>Punctuation looks a bit funny; I'll probably normalize to
>

A really weird leftover, thanks for noticing.

>updating cache with: offset=512 length=512 type=0
>
>to match...
>
>> +                  ex.offset, ex.length, ex.type);
>> +    if (nbdkit_add_extent (cache_extents, ex.offset, ex.length, ex.type) == -1) {
>> +      *err = errno;
>> +      return -1;
>> +    }
>> +  }
>> +
>> +  return 0;
>> +}
>> +
>> +static int
>> +cacheextents_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
>> +                      void *handle, uint32_t count, uint64_t offset, uint32_t flags,
>> +                      struct nbdkit_extents *extents,
>> +                      int *err)
>> +{
>> +  ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
>> +
>> +  nbdkit_debug ("cacheextents:"
>> +                " cache_start=%" PRIu64
>> +                " cache_end=%" PRIu64
>> +                " cache_extents=%p",
>
>...this style.
>
>> +                cache_start, cache_end, cache_extents);
>> +
>> +  if (cache_extents &&
>> +      offset >= cache_start && offset < cache_end) {
>> +    nbdkit_debug ("cacheextents: returning from cache");
>> +    return cacheextents_add (extents, err);
>> +  }
>> +
>> +  nbdkit_debug ("cacheextents: cache miss");
>> +  /* Clearing the REQ_ONE here in order to get (possibly) more data from the
>> +   * plugin.  This should not increase the cost as plugins can still return just
>> +   * one extent if it's too costly to return more. */
>> +  flags &= ~(NBDKIT_FLAG_REQ_ONE);
>> +  if (next_ops->extents (nxdata, count, offset, flags, extents, err) == -1)
>> +    return -1;
>> +
>> +  return cacheextents_fill (extents, err);
>> +}
>> +
>> +/* Any changes to the data need to clean the cache.
>> + *
>> + * Similarly to readahead filter this could be more intelligent, but there would
>> + * be very little benefit.
>> + */
>> +
>> +static void
>> +kill_cacheextents (void)
>
>> +++ b/filters/cacheextents/nbdkit-cacheextents-filter.pod
>> @@ -0,0 +1,47 @@
>> +=head1 NAME
>> +
>> +nbdkit-cacheextents-filter - cache extents
>> +
>> +=head1 SYNOPSIS
>> +
>> + nbdkit --filter=cacheextents plugin
>> +
>> +=head1 DESCRIPTION
>> +
>> +C<nbdkit-cacheextents-filter> is a filter that caches the result of last
>> +extents() call.
>> +
>> +A common use for this filter is to accelerate returning extents data for
>> +clients which ask for extents with the REQ_ONE flag set (like
>> +S<C<qemu-img convert>>) and is especially useful in combination with
>> +plugins that report multiple extents in one call, but with high latency
>> +for each of those calls (like L<nbdkit-vddk-plugin(1)>).  For example:
>
>You asked for a re-word; how about this (with proper markup):
>
>A common use for this filter is to improve performance when using a
>client performing a linear pass over the entire image while asking for
>only one extent at a time (such as qemu-img convert), but where the
>plugin can provide multiple extents for the same high latency as a
>single extent (such as nbdkit-vddk-plugin).
>

Yes, this reads better, although the one I really wanted to reword was the comment in cacheextents_extents():

+  /* Clearing the REQ_ONE here in order to get (possibly) more data from the
+   * plugin.  This should not increase the cost as plugins can still return just
+   * one extent if it's too costly to return more. */
+  flags &= ~(NBDKIT_FLAG_REQ_ONE);

...
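One possible rewording, just a sketch with the wording still up for grabs:

```c
/* Clear NBDKIT_FLAG_REQ_ONE so that the plugin may return as many
 * extents as it can cheaply provide in one call, making the result
 * worth caching.  This should not add cost: a plugin for which
 * computing further extents is expensive is still free to return
 * just one. */
flags &= ~NBDKIT_FLAG_REQ_ONE;
```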

>> +
>> + nbdkit -U - --filter=cacheextents --run 'qemu-img map $nbd' vddk ...
>> +
>> +For files with big extents (when it is unlikely for one extents() call
>> +to return multiple different extents) this does not slow down the
>> +access.
>
>I'd also like to see a mention of nbdkit-cache-filter, probably worded as:
>
>Note that this caches only plugin metadata; this filter can be combined
>with L<nbdkit-cache-filter(1)> to also cache data.
>
>and a parallel sentence added to nbdkit-cache-filter.pod.
>

And nbdkit-filters, yes; I forgot to do this even though you already pointed
it out in the previous version.  I just got back to this after too long.
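For the combined-filters case, the example could look something like this
(a sketch only; it assumes the usual nbdkit rule that filters apply in the
order given, the first being closest to the client):

```
nbdkit -U - --filter=cacheextents --filter=cache vddk ... \
       --run 'qemu-img convert "$nbd" -O qcow2 out.qcow2'
```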

>
>> diff --git a/tests/test-cacheextents.sh b/tests/test-cacheextents.sh
>> new file mode 100755
>> index 000000000000..5b0667ad2b95
>> --- /dev/null
>> +++ b/tests/test-cacheextents.sh
>> @@ -0,0 +1,110 @@
>
>> +requires grep --version
>> +requires qemu-img --version
>> +requires qemu-io --version
>> +
>> +sock="$(mktemp -u)"
>> +sockurl="nbd+unix:///?socket=$sock"
>> +pidfile="test-cacheextents.pid"
>> +accessfile="test-cacheextents-access.log"
>> +accessfile_full="$(pwd)/test-cacheextents-access.log"
>
>$PWD saves a process compared to $(pwd)
>
>> +files="$pidfile $sock"
>> +rm -f $files $accessfile
>> +cleanup_fn rm -f $files
>> +
>> +# Intentionally using EOF rather than 'EOF' so that we can pass in the
>> +# $accessfile_full
>> +start_nbdkit \
>> +    -P $pidfile \
>> +    -U $sock \
>> +    --filter=cacheextents \
>> +    sh - <<EOF
>> +echo "Call: \$@" >>$accessfile_full
>> +size=4M
>> +block_size=\$((1024*1024))
>> +case "\$1" in
>> +  get_size) echo \$size ;;
>> +  can_extents) ;;
>> +  extents)
>> +    echo "extents request: \$@" >>$accessfile_full
>> +    offset=\$((\$4/\$block_size))
>> +    count=\$((\$3/\$block_size))
>> +    length=\$((\$offset+\$count))
>
>Adding spaces around the operators would make this more legible.
>
>> +    for i in \$(seq \$offset \$length); do
>> +      echo \${i}M \$block_size \$((i%4)) >>$accessfile_full
>> +      echo \${i}M \$block_size \$((i%4))
>> +    done
>> +    ;;
>> +  pread) dd if=/dev/zero count=\$3 iflag=count_bytes ;;
>> +  can_write) ;;
>> +  pwrite) dd of=/dev/null ;;
>> +  can_trim) ;;
>> +  trim) ;;
>> +  can_zero) ;;
>> +  trim) ;;
>
>zero)
>
>> +  *) exit 2 ;;
>> +esac
>> +EOF
>> +
>> +
>> +test_me() {
>> +    num_accesses=$1
>> +    shift
>> +
>> +    qemu-io -f raw "$@" "$sockurl"
>> +    test "$(grep -c "^extents request: " $accessfile)" -eq "$num_accesses"
>> +    ret=$?
>> +    rm -f "$accessfile"
>> +    return $ret
>> +}
>> +
>> +# First one causes caching, the rest should be returned from cache.
>> +test_me 1 -c 'map' -c 'map' -c 'map'
>> +# First one is still cached from last time, discard should kill the cache, then
>> +# one request should go through.
>> +test_me 1 -c 'map' -c 'discard 0 1' -c 'map'
>> +# Same as above, only this time the cache is killed before all the operations as
>> +# well.  This is used from now on to clear the cache as it seems nicer and
>> +# faster than running new nbdkit for each test.
>> +test_me 2 -c 'discard 0 1' -c 'map' -c 'discard 0 1' -c 'map'
>> +# Write should kill the cache as well.
>> +test_me 2 -c 'discard 0 1' -c 'map' -c 'write 0 1' -c 'map'
>> +# Alloc should use cached data from map
>> +test_me 1 -c 'discard 0 1' -c 'map' -c 'alloc 0'
>> +# Read should not kill the cache
>> +test_me 1 -c 'discard 0 1' -c 'map' -c 'read 0 1' -c 'map' -c 'alloc 0'
>>
>
>Looks good. Unless you have objections, I'll go ahead and make the
>tweaks mentioned, and push this.
>

Yes, all of them make sense, thank you very much.
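For the record, the fixed part of the test's shell plugin would read roughly
as below.  This standalone sketch substitutes example values for the
arguments nbdkit would pass (method in $1, count in bytes in $3, offset in
bytes in $4), adds spaces around the arithmetic operators, and renames the
duplicated "trim)" arm to "zero)":

```shell
block_size=$(( 1024 * 1024 ))
# Example values standing in for what nbdkit passes: method handle count offset.
set -- extents handle $(( 2 * 1024 * 1024 )) $(( 1024 * 1024 ))
case "$1" in
  extents)
    offset=$(( $4 / block_size ))
    count=$(( $3 / block_size ))
    length=$(( offset + count ))
    for i in $(seq "$offset" "$length"); do
      echo "${i}M $block_size $(( i % 4 ))"
    done
    ;;
  can_zero) ;;
  zero) ;;   # was a second "trim)" by mistake
esac
```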

>--
>Eric Blake, Principal Software Engineer
>Red Hat, Inc.           +1-919-301-3226
>Virtualization:  qemu.org | libvirt.org
>


