[Libguestfs] [PATCH nbdkit 5/5] New filter: evil

Richard W.M. Jones rjones at redhat.com
Tue May 16 16:24:58 UTC 2023


On Tue, May 16, 2023 at 10:55:25AM -0500, Eric Blake wrote:
> > +static int
> > +evil_config (nbdkit_next_config *next, nbdkit_backend *nxdata,
> > +             const char *key, const char *value)
> > +{
> > +  if (strcmp (key, "evil") == 0 || strcmp (key, "evil-mode") == 0) {
> > +    if (strcmp (value, "cosmic-rays") == 0 ||
> > +        strcmp (value, "cosmic") == 0) {
> 
> Do we want strcasecmp on these?  Do we want to allow _ as a synonym to -?

We're a bit random about this.  There are a few places where we use
ascii_strcasecmp, but mostly we use strcmp ...
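For what it's worth, a tolerant matcher would just normalize case and
underscores before comparing.  A sketch of the idea in Python (the
helper name key_matches is mine, not anything in nbdkit):

```python
def key_matches(key, *candidates):
    """Match a config key case-insensitively, treating '_' and '-'
    as equivalent (for ASCII keys, like ascii_strcasecmp)."""
    norm = key.lower().replace("_", "-")
    return norm in candidates

key_matches("Evil_Mode", "evil", "evil-mode")  # True
```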

[...]
> > +  /* Choose the block size based on the probability, so that at least
> > +   * 100 bits are expected to be corrupted in the block.  Block size
> > +   * must be a power of 2.
> > +   */
> > +  block_size = next_power_of_2 ((uint64_t) (100. / evil_probability));
> 
> ...and I'm worried that your block_size computation reaches a point
> where it does not do what we want if the probability gets too close to
> zero.  Even if it is not division by zero, division by a subnormal
> float can easily produce overflow to infinity, which converts to
> UINT64_MAX, and next_power_of_2(UINT64_MAX) was untested in patch 3's
> unit tests, but appears like it will be '1 << (64 - 64)' == 1, which
> isn't really desirable.

Yes, this is definitely a bug.  I meant to avoid division by zero, and
did in one place, but not everywhere.

I think it's worthwhile allowing the probability to be zero, not least
because it avoids a special case for the user.  However I will need to
add a test to make sure I've avoided all the problems.

Also, as you say, very small but non-zero probabilities can make
block_size blow up.

Let me fix all of that & add tests.
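The safe version of that computation needs both a zero check and a
clamp before the float-to-integer conversion.  A sketch of the
intended logic in Python (the 1G cap here is an arbitrary choice of
mine, not necessarily what the filter will use):

```python
import math

def choose_block_size(probability, max_block_size=2**30):
    """Pick a power-of-2 block size so that ~100 corrupted bits are
    expected per block, without dividing by zero or overflowing."""
    if probability <= 0:
        # Zero probability: nothing is corrupted, any block size works.
        return max_block_size
    target = 100.0 / probability     # may be huge (or inf) for tiny p ...
    if target >= max_block_size:
        # ... so clamp before converting to an integer.
        return max_block_size
    # Round up to the next power of 2.
    return 1 << max(0, math.ceil(math.log2(target)))
```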

> > +
> > +  uint64_t offs, intvl, i, rand;
> > +  const uint64_t dinvp = (uint64_t) (2.0 * (1.0 / evil_probability));
> 
> [1] Ah, I see - you did mean multiplication, not exponentiation.  Like
> you commented elsewhere, it is not quite a true uniform distribution,
> but certainly seems sane enough for the intent.

I will clarify the comment.
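To spell out the intent here: dinvp is 2/p, and a value drawn
uniformly from [0, dinvp) has mean dinvp/2 = 1/p, so corrupted bits
end up spaced 1/p apart on average.  A quick empirical check of that
claim:

```python
import random

p = 1.0 / 800000
dinvp = int(2.0 * (1.0 / p))      # about 1600000, as in the filter

random.seed(1)
n = 200000
mean_gap = sum(random.randrange(dinvp) for _ in range(n)) / n
# mean_gap comes out close to 1/p == 800000
```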

> > +++ b/tests/test-evil-cosmic.sh
> 
> > +
> > +f="test-evil-cosmic.out"
> > +rm -f $f
> > +cleanup_fn rm -f $f
> > +
> > +# 80 million zero bits in the backing disk, and the filter will
> > +# randomly flip (ie. set high) 1 in 800,000 bits, or about 100.
> > +
> > +# XXX Actually the number of set bits clusters around 80.  There could
> > +# be a mistake in my calculations or the interval algorithm we use
> > +# might be biased.
> 
> I have not closely inspected the math yet to see if there's an obvious
> bias (treat this as a first-pass quick review looking for obvious
> code/shell bugs, as opposed to more insidious deep analysis bugs).
> But that comment does mean we probably ought to double-check things
> prior to v2.

Interestingly, the other tests are nicely distributed around the
expected values; it's just this one test where things are out of
whack.  I'll add a bit of debugging to the interval calculation etc.
to see if I can see what's going on.
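The interval logic is easy enough to model on its own, which should
show whether the clustering around 80 comes from the algorithm or
from how the filter applies it.  An untested simulation sketch (this
models only the gap-drawing, not the filter's actual PRNG):

```python
import random

def simulate_flips(span_bits, probability, seed=42):
    """Count how many bits the interval algorithm would flip
    across a span of the given size."""
    dinvp = int(2.0 * (1.0 / probability))
    rng = random.Random(seed)
    pos = rng.randrange(dinvp)        # offset of the first flip
    count = 0
    while pos < span_bits:
        count += 1
        pos += rng.randrange(dinvp)   # mean gap is ~1/probability
    return count

# Expected count is span_bits * probability, i.e. ~100 flips for
# 80 million bits at p = 1/800000.
```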

> > +export f
> > +nbdkit -U - null 10000000 \
> > +       --filter=evil --filter=noextents \
> > +       evil=cosmic-rays evil-probability=1/800000 \
> > +       --run 'nbdcopy "$uri" $f'
> > +
> > +# This will give an approximate count of the number of set bits.
> > +
> > +zbytes="$( od -A n -w1 -v -t x1 < $f | sort | uniq -c |
> > +           $SED -n -E -e 's/([0-9]+)[[:space:]]+00[[:space:]]*$/\1/p' )"
> > +nzbits=$(( 10000000 - zbytes )); # looks wrong but actually correct ...
> 
> Interesting computation!

I can probably change this to use Python code.  Because I wanted to
use nbdcopy (to allow the filter to be hit by multiple threads), I
wasn't using nbdsh, so Python wasn't immediately available ...

> > +# Expect about 50 stuck-high bits.
> > +buf = h.pread(10000000, 0)
> > +bits = count_bits(buf)
> > +print("stuck high bits: %d (expected 50)" % bits)
> > +assert(bits > 20 and bits < 80)
> > +
> > +# If we read subsets they should match the contents of the buffer.
> > +buf1 = h.pread(1000, 1000)
> > +assert(buf1 == buf[1000:2000])
> > +
> > +buf1 = h.pread(10000, 999)
> > +assert(buf1 == buf[999:10999])
> 
> Should we also assert that there is at least one stuck bit in those
> two ranges, and/or pick a different or larger range to improve the
> chances of that being the case?

So I think you mean: pick a range where we know there is a stuck bit
before doing the read + assert.  I will see.  With a 1:800,000 chance,
it's not likely the ranges as written would have stuck bits, so we're
comparing zeroes ...
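If we want a guaranteed hit, the test could locate a stuck bit first
and then read a range around it.  A sketch (buf and h as in the test
above; first_set_byte is a helper I'm inventing here):

```python
def first_set_byte(buf):
    """Return the offset of the first nonzero byte, or None."""
    for i, b in enumerate(buf):
        if b:
            return i
    return None

# Hypothetical usage against the test's handle:
# off = first_set_byte(buf)
# if off is not None:
#     start = max(0, off - 500)
#     assert h.pread(1000, start) == buf[start:start + 1000]
```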

> > +++ b/tests/test-evil-stuck-wires.sh
> 
> > +
> > +# Run nbdkit with the evil filter.
> > +start_nbdkit -P $pidfile -U $sock \
> > +             --filter=evil --filter=noextents \
> > +             null 1G evil=stuck-wires evil-probability=1/10000
> > +
> > +# Reads from the filter should have 1:10,000 bits stuck high or low.
> > +# However we don't see stuck low bits are we are always reading
> 
> s/are we are/since we are/

Will fix.

> Overall looks like an interesting idea for a filter.

Thanks,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit

