[Libguestfs] [PATCH libnbd v3 3/3] copy/copy-nbd-error.sh: Make this test non-stochastic

Laszlo Ersek lersek at redhat.com
Wed Jun 29 16:01:54 UTC 2022


On 06/29/22 15:36, Richard W.M. Jones wrote:
> Because the test previously used error rates of 50%, it could
> sometimes "fail to fail".  This is noticable if you run the test
> repeatedly:
> 
> $ while make -C copy check TESTS=copy-nbd-error.sh >& /tmp/log ; do echo -n . ; done
> 
> This now happens more often because of the larger requests made by the
> new multi-threaded loop, resulting in fewer calls to the error filter,
> so a greater chance that a series of 50% coin tosses will come up all
> heads in the test.
> 
> Fix this by making the test non-stochastic.
> 
> Fixes: commit 8d444b41d09a700c7ee6f9182a649f3f2d325abb
> ---
>  copy/copy-nbd-error.sh | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/copy/copy-nbd-error.sh b/copy/copy-nbd-error.sh
> index 0088807f54..01524a890c 100755
> --- a/copy/copy-nbd-error.sh
> +++ b/copy/copy-nbd-error.sh
> @@ -40,7 +40,7 @@ $VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \
>  # Failure to read should be fatal
>  echo "Testing read failures on non-sparse source"
>  $VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \
> -    error-pread-rate=0.5 ] null: && fail=1
> +    error-pread-rate=1 ] null: && fail=1
>  
>  # However, reliable block status on a sparse image can avoid the need to read
>  echo "Testing read failures on sparse source"
> @@ -51,7 +51,7 @@ $VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error null 5M \
>  echo "Testing write data failures on arbitrary destination"
>  $VG nbdcopy -- [ nbdkit --exit-with-parent -v pattern 5M ] \
>      [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \
> -        memory 5M error-pwrite-rate=0.5 ] && fail=1
> +        memory 5M error-pwrite-rate=1 ] && fail=1
>  
>  # However, writing zeroes can bypass the need for normal writes
>  echo "Testing write data failures from sparse source"
> 

Wasn't the original intent of the 50% error rate that the first error
would usually manifest at a different offset each time? If we change
the error rate to 1, the test will fail on the very first access,
which somewhat defeats the original intent.

I wonder if we could pick a random offset in advance, and make the
read or write access fail 100% of the time, but only when the request
covers that offset.
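
For example (completely untested; the script name, the MAGIC_OFFSET
variable and the offset value below are made up just for
illustration), a tiny nbdkit sh plugin could refuse exactly those
reads that cover a pre-selected offset:

  #!/usr/bin/env bash
  # fail-at-offset.sh -- hypothetical nbdkit sh plugin (sketch only).
  # Exports a 5M disk of zeroes; any pread whose range covers byte
  # $MAGIC_OFFSET fails with EIO, so exactly one offset is "poisoned".

  magic=${MAGIC_OFFSET:-1234567}   # the caller picks this at random

  case "$1" in
      get_size)
          echo 5M
          ;;
      pread)
          # sh plugin calling convention: $2=handle, $3=count, $4=offset
          count=$3
          offset=$4
          if [ "$offset" -le "$magic" ] &&
             [ "$magic" -lt $(( offset + count )) ]; then
              echo "EIO request covers poisoned offset $magic" >&2
              exit 1
          fi
          head -c "$count" /dev/zero
          ;;
      *)
          # any other method: tell nbdkit it is not implemented
          exit 2
          ;;
  esac

The test could then pick a random offset below 5M (e.g. with shuf)
and expect the copy to fail deterministically:

  MAGIC_OFFSET=$(shuf -i 0-5242879 -n 1) \
  $VG nbdcopy -- [ nbdkit --exit-with-parent -v sh ./fail-at-offset.sh ] \
      null: && fail=1

(The write-error case would need a similar pwrite method on the
destination side.)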

...

The probability that n subsequent accesses *don't* fail is
(1-error_rate)^n. (The probability that at least one access fails is
1-(1-error_rate)^n.)

And n is given by (I think?) image_size/request_size. So, if we change
the request_size, n changes too, and we can recalculate the error rate
so that the test "fails to fail" with the same probability as before.

  (1-err1)^(imgsz/rsz1) = (1-err2)^(imgsz/rsz2)

take the imgsz'th root of both sides

  (1-err1)^(1/rsz1) = (1-err2)^(1/rsz2)

raise both sides to the rsz2'nd power

  (1-err1)^(rsz2/rsz1) = 1-err2

  err2 = 1 - (1-err1)^(rsz2/rsz1)

I know that err1=0.5, but don't know rsz2 and rsz1 (the request sizes
after, and before, the last patch in the series). Assuming (just
guessing!) we increased the request size 8-fold, we'd have to go from
error rate 0.5 to:

  err2 = 1 - (1-0.5)^8
       = 1 - (1/2)^8
       = 1 - (1 / 256)
       = 255/256
       = 0.99609375
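
The arithmetic is easy to double-check from the shell (bc's ^ wants an
integer exponent, which the assumed 8-fold ratio conveniently is):

  $ echo '1 - (1 - 0.5)^8' | bc -l
  .99609375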

We basically group every eight coin tosses into one super-toss, and want
the latter to show "failure" with the same probability as *at least one*
of the original 8 tosses failing.

Laszlo

