[Libguestfs] [nbdkit PATCH v2] Introduce cacheextents filter
Martin Kletzander
mkletzan at redhat.com
Tue Jun 11 14:31:03 UTC 2019
On Tue, Jun 11, 2019 at 08:49:37AM -0500, Eric Blake wrote:
>On 6/11/19 3:49 AM, Martin Kletzander wrote:
>> This filter caches the result of the last extents() call and offers a nice
>> speed-up for clients that only support req_one=1, in combination with plugins
>> like vddk, for which returning information for multiple extents in one call
>> adds no overhead, but each call is very time-consuming.
>>
>> A quick test showed that on a fast connection, with a sparsely allocated 16G
>> disk with an OS installed, `qemu-img map` runs in 16s instead of 33s (out of
>> which 5s are spent before the first extents request). For a 100G disk with no
>> data on it, that is, one hole extent spanning the whole disk (where there is
>> no room for improvement), this does not add any noticeable overhead.
>>
>> Signed-off-by: Martin Kletzander <mkletzan at redhat.com>
>> ---
>>
>> Notes:
>> v2:
>>
>> - Clearing the REQ_ONE flag when forwarding the request to the plugin
>>
>> - I hate the wording of the comment explaining it, please suggest a reword (or
>> just fix it)
>
>I'll give it a shot.
>
>>
>> - Tests are now using a custom shell plugin that logs all accesses, and also
>> qemu-io to test all other expected behaviour
>>
>
>> +++ b/filters/cacheextents/cacheextents.c
>
>> +
>> +/* Cached extents from the last extents () call and its start and end for the
>> + sake of simplicity. */
>
>The prevailing style seems to be the use of leading ' *' on subsequent
>comment lines, and */ on its own line.
>
>> +static struct nbdkit_extents *cache_extents;
>> +static uint64_t cache_start;
>> +static uint64_t cache_end;
>> +
>> +static void
>> +cacheextents_unload (void)
>> +{
>> + nbdkit_extents_free (cache_extents);
>> +}
>> +
>> +static int
>> +cacheextents_add (struct nbdkit_extents *extents, int *err)
>> +{
>> + size_t i = 0;
>> +
>> + for (i = 0; i < nbdkit_extents_count (cache_extents); i++) {
>> + struct nbdkit_extent ex = nbdkit_get_extent (cache_extents, i);
>> + if (nbdkit_add_extent (extents, ex.offset, ex.length, ex.type) == -1) {
>> + *err = errno;
>> + return -1;
>> + }
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int
>> +cacheextents_fill (struct nbdkit_extents *extents, int *err)
>> +{
>> + size_t i = 0;
>> + size_t count = nbdkit_extents_count (extents);
>> + struct nbdkit_extent first = nbdkit_get_extent (extents, 0);
>> + struct nbdkit_extent last = nbdkit_get_extent (extents, count - 1);
>> +
>> + nbdkit_extents_free (cache_extents);
>> + cache_start = first.offset;
>> + cache_end = last.offset + last.length;
>> + cache_extents = nbdkit_extents_new (cache_start, cache_end);
>
>Needs a check for failure.
>
>Hmm - my earlier comments about appending to the cache are hard if we
>cap it to cache_end; I may still play with tweaking the cache algorithm
>to allow appending, but it would be a separate patch on top.
>
Well, it just means that if this is about to be rewritten, it cannot use the
provided nbdkit_extents struct; an internal one (that could grow in both
directions, maybe also being non-contiguous) would be used here instead.
>> +
>> + for (i = 0; i < count; i++) {
>> + struct nbdkit_extent ex = nbdkit_get_extent (extents, i);
>> + nbdkit_debug ("cacheextents: updating cache with"
>> + ": offset=%" PRIu64
>> + ": length=%" PRIu64
>> + "; type=%x",
>
>Punctuation looks a bit funny; I'll probably normalize to
>
A really weird leftover, thanks for noticing.
>updating cache with: offset=512 length=512 type=0
>
>to match...
>
>> + ex.offset, ex.length, ex.type);
>> + if (nbdkit_add_extent (cache_extents, ex.offset, ex.length, ex.type) == -1) {
>> + *err = errno;
>> + return -1;
>> + }
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int
>> +cacheextents_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
>> + void *handle, uint32_t count, uint64_t offset, uint32_t flags,
>> + struct nbdkit_extents *extents,
>> + int *err)
>> +{
>> + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
>> +
>> + nbdkit_debug ("cacheextents:"
>> + " cache_start=%" PRIu64
>> + " cache_end=%" PRIu64
>> + " cache_extents=%p",
>
>...this style.
>
>> + cache_start, cache_end, cache_extents);
>> +
>> + if (cache_extents &&
>> + offset >= cache_start && offset < cache_end) {
>> + nbdkit_debug ("cacheextents: returning from cache");
>> + return cacheextents_add (extents, err);
>> + }
>> +
>> + nbdkit_debug ("cacheextents: cache miss");
>> + /* Clearing the REQ_ONE here in order to get (possibly) more data from the
>> + * plugin. This should not increase the cost as plugins can still return just
>> + * one extent if it's too costly to return more. */
>> + flags &= ~(NBDKIT_FLAG_REQ_ONE);
>> + if (next_ops->extents (nxdata, count, offset, flags, extents, err) == -1)
>> + return -1;
>> +
>> + return cacheextents_fill (extents, err);
>> +}
>> +
>> +/* Any changes to the data need to clean the cache.
>> + *
>> + * Similarly to readahead filter this could be more intelligent, but there would
>> + * be very little benefit.
>> + */
>> +
>> +static void
>> +kill_cacheextents (void)
>
>> +++ b/filters/cacheextents/nbdkit-cacheextents-filter.pod
>> @@ -0,0 +1,47 @@
>> +=head1 NAME
>> +
>> +nbdkit-cacheextents-filter - cache extents
>> +
>> +=head1 SYNOPSIS
>> +
>> + nbdkit --filter=cacheextents plugin
>> +
>> +=head1 DESCRIPTION
>> +
>> +C<nbdkit-cacheextents-filter> is a filter that caches the result of last
>> +extents() call.
>> +
>> +A common use for this filter is to accelerate returning extents data for
>> +clients which ask for extents with the REQ_ONE flag set (like
>> +S<C<qemu-img convert>>) and is especially useful in combination with
>> +plugins that report multiple extents in one call, but with high latency
>> +for each of those calls (like L<nbdkit-vddk-plugin(1)>). For example:
>
>You asked for a re-word; how about this (with proper markup):
>
>A common use for this filter is to improve performance when using a
>client performing a linear pass over the entire image while asking for
>only one extent at a time (such as qemu-img convert), but where the
>plugin can provide multiple extents for the same high latency as a
>single extent (such as nbdkit-vddk-plugin).
>
Yes, this reads better, although the one I really wanted to reword was the
comment in cacheextents_extents():
+ /* Clearing the REQ_ONE here in order to get (possibly) more data from the
+ * plugin. This should not increase the cost as plugins can still return just
+ * one extent if it's too costly to return more. */
+ flags &= ~(NBDKIT_FLAG_REQ_ONE);
...
>> +
>> + nbdkit -U - --filter=cacheextents --run 'qemu-img map $nbd' vddk ...
>> +
>> +For files with big extents (when it is unlikely for one extents() call
>> +to return multiple different extents) this does not slow down access.
>
>I'd also like to see a mention of nbdkit-cache-filter, probably worded as:
>
>Note that this caches only plugin metadata; this filter can be combined
>with L<nbdkit-cache-filter(1)> to also cache data.
>
>and a parallel sentence added to nbdkit-cache-filter.pod.
>
And nbdkit-filters, yes; I forgot to do this even though you already pointed
it out in the previous version. I only got back to this after too long.
>
>> diff --git a/tests/test-cacheextents.sh b/tests/test-cacheextents.sh
>> new file mode 100755
>> index 000000000000..5b0667ad2b95
>> --- /dev/null
>> +++ b/tests/test-cacheextents.sh
>> @@ -0,0 +1,110 @@
>
>> +requires grep --version
>> +requires qemu-img --version
>> +requires qemu-io --version
>> +
>> +sock="$(mktemp -u)"
>> +sockurl="nbd+unix:///?socket=$sock"
>> +pidfile="test-cacheextents.pid"
>> +accessfile="test-cacheextents-access.log"
>> +accessfile_full="$(pwd)/test-cacheextents-access.log"
>
>$PWD saves a process compared to $(pwd)
>
>> +files="$pidfile $sock"
>> +rm -f $files $accessfile
>> +cleanup_fn rm -f $files
>> +
>> +# Intentionally using EOF rather than 'EOF' so that we can pass in the
>> +# $accessfile_full
>> +start_nbdkit \
>> + -P $pidfile \
>> + -U $sock \
>> + --filter=cacheextents \
>> + sh - <<EOF
>> +echo "Call: \$@" >>$accessfile_full
>> +size=4M
>> +block_size=\$((1024*1024))
>> +case "\$1" in
>> + get_size) echo \$size ;;
>> + can_extents) ;;
>> + extents)
>> + echo "extents request: \$@" >>$accessfile_full
>> + offset=\$((\$4/\$block_size))
>> + count=\$((\$3/\$block_size))
>> + length=\$((\$offset+\$count))
>
>Adding spaces around the operators would make this more legible.
>
>> + for i in \$(seq \$offset \$length); do
>> + echo \${i}M \$block_size \$((i%4)) >>$accessfile_full
>> + echo \${i}M \$block_size \$((i%4))
>> + done
>> + ;;
>> + pread) dd if=/dev/zero count=\$3 iflag=count_bytes ;;
>> + can_write) ;;
>> + pwrite) dd of=/dev/null ;;
>> + can_trim) ;;
>> + trim) ;;
>> + can_zero) ;;
>> + trim) ;;
>
>zero)
>
>> + *) exit 2 ;;
>> +esac
>> +EOF
>> +
>> +
>> +test_me() {
>> + num_accesses=$1
>> + shift
>> +
>> + qemu-io -f raw "$@" "$sockurl"
>> + test "$(grep -c "^extents request: " $accessfile)" -eq "$num_accesses"
>> + ret=$?
>> + rm -f "$accessfile"
>> + return $ret
>> +}
>> +
>> +# First one causes caching, the rest should be returned from cache.
>> +test_me 1 -c 'map' -c 'map' -c 'map'
>> +# First one is still cached from last time, discard should kill the cache, then
>> +# one request should go through.
>> +test_me 1 -c 'map' -c 'discard 0 1' -c 'map'
>> +# Same as above, only this time the cache is killed before all the operations as
>> +# well. This is used from now on to clear the cache as it seems nicer and
>> +# faster than running new nbdkit for each test.
>> +test_me 2 -c 'discard 0 1' -c 'map' -c 'discard 0 1' -c 'map'
>> +# Write should kill the cache as well.
>> +test_me 2 -c 'discard 0 1' -c 'map' -c 'write 0 1' -c 'map'
>> +# Alloc should use cached data from map
>> +test_me 1 -c 'discard 0 1' -c 'map' -c 'alloc 0'
>> +# Read should not kill the cache
>> +test_me 1 -c 'discard 0 1' -c 'map' -c 'read 0 1' -c 'map' -c 'alloc 0'
>>
>
>Looks good. Unless you have objections, I'll go ahead and make the
>tweaks mentioned, and push this.
>
Yes, all of them make sense, thank you very much.
>--
>Eric Blake, Principal Software Engineer
>Red Hat, Inc. +1-919-301-3226
>Virtualization: qemu.org | libvirt.org
>