[Libguestfs] [PATCH libnbd v3 3/3] copy/copy-nbd-error.sh: Make this test non-stochastic

Wed Jun 29 17:21:08 UTC 2022

On Wed, Jun 29, 2022 at 06:01:54PM +0200, Laszlo Ersek wrote:
> On 06/29/22 15:36, Richard W.M. Jones wrote:
> > Because the test previously used error rates of 50%, it could
> > sometimes "fail to fail".  This is noticable if you run the test
> > repeatedly:
> > 
> > $ while make -C copy check TESTS=copy-nbd-error.sh >& /tmp/log ; do echo -n . ; done
> > 
> > This now happens more often because of the larger requests made by the
> > new multi-threaded loop, resulting in fewer calls to the error filter,
> > so a greater chance that a series of 50% coin tosses will come up all
> > heads in the test.
> > 
> > Fix this by making the test non-stocastic.
> > 
> > Fixes: commit 8d444b41d09a700c7ee6f9182a649f3f2d325abb
> > ---
> >  copy/copy-nbd-error.sh | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/copy/copy-nbd-error.sh b/copy/copy-nbd-error.sh
> > index 0088807f54..01524a890c 100755
> > --- a/copy/copy-nbd-error.sh
> > +++ b/copy/copy-nbd-error.sh
> > @@ -40,7 +40,7 @@ $VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \
> >  # Failure to read should be fatal
> >  echo "Testing read failures on non-sparse source"
> >  $VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \
> > -    error-pread-rate=0.5 ] null: && fail=1
> > +    error-pread-rate=1 ] null: && fail=1
> >  
> >  # However, reliable block status on a sparse image can avoid the need to read
> >  echo "Testing read failures on sparse source"
> > @@ -51,7 +51,7 @@ $VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error null 5M \
> >  echo "Testing write data failures on arbitrary destination"
> >  $VG nbdcopy -- [ nbdkit --exit-with-parent -v pattern 5M ] \
> >      [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \
> > -        memory 5M error-pwrite-rate=0.5 ] && fail=1
> > +        memory 5M error-pwrite-rate=1 ] && fail=1
> >  
> >  # However, writing zeroes can bypass the need for normal writes
> >  echo "Testing write data failures from sparse source"
> > 
> 
> Wasn't the original intent of the 50% error rate that the first error
> manifest usually at a different offset every time? If we change the
> error rate to 1, the tests will fail upon the first access, which kind
> of breaks the original intent.
> 
> I wonder if we could determine a random offset in advance, and make sure
> that the read or write access fails 100%, but only if the request covers
> that offset.

It wasn't very clear why the error rate was originally set to 50%, but
it could be as you say.  It is possible to inject an error at a
specific offset (though not with the error filter):

----------------------------------------------------------------------
#!/bin/bash

case "$1" in
    get_size) echo 64M ;;
    pread)
        if [ $4 -le 100000 ] && [ $(( $4+$3 )) -gt 100000 ]; then
            echo EIO Bad block >&2
            exit 1
        else
            dd if=/dev/zero count=$3 iflag=count_bytes
        fi ;;
    *) exit 2 ;;
esac
----------------------------------------------------------------------

$ nbdkit sh read-error.sh

- - -

Anyway, looking closer at the bug, I now understand why it happens
much more frequently after applying this patch series.

It's because the test uses the null: destination
(libnbd.git/copy/null-ops.c).  This advertises a preferred block size
of 4M, because I reasoned that we should prefer large blocks for this
target (since it throws away the data).

This causes request-size to be increased to 4M, so we're copying 4M
blocks (versus 256K before).

With 5M total disk size, we're reading 4M + 1M in two requests, so in
1 in 4 cases we're going to get two heads in a row, and the test will
fail to fail.

Although there is still definitely a bug in this test, I think I will
adjust the null: target so it sets a small preferred block size,
eg. 4K, so that copying to null: doesn't have this unexpected effect
on request size.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org