[Libguestfs] [PATCH nbdkit v2 1/2] New filter: nbdkit-retry-request-filter

Fri Oct 15 15:12:26 UTC 2021

On Fri, Oct 15, 2021 at 03:17:16PM +0100, Richard W.M. Jones wrote:
> Similar to nbdkit-retry-filter, but this only retries single failing
> requests rather than reopening the whole plugin.  This is intended to
> be used with the curl filter to solve:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=2013000
> ---

> +++ b/filters/retry-request/nbdkit-retry-request-filter.pod
> @@ -0,0 +1,74 @@
> +=head1 NAME
> +
> +nbdkit-retry-request-filter - retry single requests on error
> +
> +=head1 SYNOPSIS
> +
> + nbdkit --filter=retry-request PLUGIN
> +                   [retry-request-retries=N] [retry-request-delay=N]
> +                   [retry-request-open=false]
> +
> +=head1 DESCRIPTION
> +
> +C<nbdkit-retry-request-filter> is a filter for nbdkit that
> +transparently retries single requests if they fail.  This is useful
> +for plugins that are not completely reliable or have random behaviour.
> +For example L<nbdkit-curl-plugin(1)> might behave this way if pointed
> +at a load balancer which sometimes redirects to web server that is not

to a web

> +responsive.
> +
> +An alternative filter with different trade-offs is
> +L<nbdkit-retry-filter(1)>.  That filter is more heavyweight because it
> +always reopens the whole plugin connection on failure.
> +
> +=head1 PARAMETERS
> +
> +=over 4
> +
> +=item B<retry-request-retries=>N
> +
> +The number of times any single request will be retried before we give
> +up and fail the operation.  The default is 2.
> +
> +=item B<retry-request-delay=>N
> +
> +The number of seconds to wait before retrying.  The default is 2
> +seconds.
> +
> +=item B<retry-request-open=false>
> +
> +If set to false, do not retry opening the plugin.  The default is to
> +treat plugin open in the same way as other requests.
> +
> +++ b/filters/retry-request/retry-request.c
> @@ -0,0 +1,278 @@

> +
> +/* These macros encapsulate the logic of retrying.  The code between
> + * RETRY_START...RETRY_END must set r to 0 or -1 on success or
> + * failure.
> + */
> +#define RETRY_START                                                     \
> +  {                                                                     \
> +    unsigned i;                                                         \
> +                                                                        \
> +    r = -1;                                                             \
> +    for (i = 0; r == -1 && i <= retries; ++i) {                         \
> +      if (i > 0) {                                                      \
> +        nbdkit_debug ("retry %u: waiting %u seconds before retrying",   \
> +                      i, delay);                                        \
> +        if (nbdkit_nanosleep (delay, 0) == -1) {                        \
> +          if (*err == 0)                                                \
> +            *err = errno;                                               \

You documented 'r', but not 'err' needing to be in scope as well.

> +          break;                                                        \
> +        }                                                               \
> +      }                                                                 \
> +      do
> +#define RETRY_END                                                       \
> +      while (0);                                                        \
> +    }                                                                   \
> +  }
> +
> +static void *
> +retry_request_open (nbdkit_next_open *next, nbdkit_context *nxdata,
> +                    int readonly, const char *exportname, int is_tls)
> +{
> +  int r;
> +
> +  if (retry_open_call) {
> +    int *err = &errno;          /* used by the RETRY* macros */
> +
> +    RETRY_START
> +      r = next (nxdata, readonly, exportname);
> +    RETRY_END;

Most other calls into the plugin are obviously safe, but there are
lots of asserts around calls into nbdkit_next_open obeying
expectations - if you didn't trip any when .open happened to hit the
bad mirror out of the round robin, then I think we're okay here on
life-cycle.

> +  }
> +  else {
> +    r = next (nxdata, readonly, exportname);
> +  }
> +
> +  return r == 0 ? NBDKIT_HANDLE_NOT_NEEDED : NULL;
> +}
> +
> +static int
> +retry_request_pread (nbdkit_next *next,
> +                     void *handle, void *buf, uint32_t count, uint64_t offset,
> +                     uint32_t flags, int *err)
> +{
> +  int r;
> +
> +  RETRY_START
> +    r = next->pread (next, buf, count, offset, flags, err);
> +  RETRY_END;

The simple case is still legible,...

> +static int
> +retry_request_extents (nbdkit_next *next,
> +                       void *handle,
> +                       uint32_t count, uint64_t offset, uint32_t flags,
> +                       struct nbdkit_extents *extents, int *err)
> +{
> +  CLEANUP_EXTENTS_FREE struct nbdkit_extents *extents2 = NULL;
> +  int r;
> +
> +  RETRY_START {
> +    /* Each retry must begin with extents reset to the right beginning. */
> +    nbdkit_extents_free (extents2);
> +    extents2 = nbdkit_extents_new (offset, next->get_size (next));
> +    if (extents2 == NULL) {
> +      *err = errno;
> +      return -1; /* Not worth a retry after ENOMEM. */
> +    }
> +    r = next->extents (next, count, offset, flags, extents2, err);
> +  } RETRY_END;

...and the complex case seems much nicer. Yeah, I like this layout better.

> +
> +  if (r == 0) {
> +    size_t i;
> +
> +    /* Transfer the successful extents back to the caller. */
> +    for (i = 0; i < nbdkit_extents_count (extents2); ++i) {
> +      struct nbdkit_extent e = nbdkit_get_extent (extents2, i);
> +
> +      if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1) {
> +        *err = errno;
> +        return -1;

This is another spot where failure is not due to a flaky plugin, but
to internal errors (probably ENOMEM); I agree with your analysis that
retrying the transfer is not worthwhile.

> +++ b/tests/test-retry-request.sh
> @@ -0,0 +1,94 @@
> +#!/usr/bin/env bash

> +
> +requires_plugin sh
> +requires nbdcopy --version
> +requires dd iflag=count_bytes </dev/null
> +
> +files="retry-request.img retry-request-count"
> +rm -f $files
> +cleanup_fn rm -f $files
> +
> +touch retry-request-count
> +start_t=$SECONDS
> +
> +# Create a custom plugin which will test retrying requests.
> +nbdkit -v -U - \
> +       sh - \
> +       --filter=retry-request retry-request-retries=3 retry-request-delay=1 \
> +       --run 'nbdcopy --synchronous "$uri" retry-request.img' <<'EOF'
> +#!/usr/bin/env bash
> +case "$1" in
> +    get_size) echo 512 ;;
> +    pread)
> +        # Fail 3 times then succeed.
> +        read i < retry-request-count
> +        ((i++))
> +        echo $i > retry-request-count
> +        if [ $i -le 3 ]; then
> +            echo "EIO pread failed" >&2
> +            exit 1
> +        else
> +            dd if=/dev/zero count=$3 iflag=count_bytes
> +        fi
> +        ;;
> +
> +    *) exit 2 ;;
> +esac
> +EOF
> +
> +# In this test we should see 3 failures:
> +# pread FAILS
> +# retry and wait 1 second
> +# pread FAILS
> +# retry and wait 1 second
> +# pread FAILS
> +# retry and wait 1 second
> +# pread succeeds
> +
> +# The minimum time for the test should be 3 seconds.
> +end_t=$SECONDS
> +if [ $((end_t - start_t)) -lt 3 ]; then
> +    echo "$0: test ran too quickly"
> +    exit 1
> +fi
> +
> +# Check retry-request-count = 4 (because we write #retries+1 to the file)
> +read retry_count < retry-request-count
> +if [ $retry_count -ne 4 ]; then
> +    echo "$0: retry-request-count ($retry_count) != 4"
> +    exit 1
> +fi

Looks okay. But we should also test a retry of .open failing, since
that one goes through enough difference in code path to be worth
checking closely.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org