[Libguestfs] [PATCH nbdkit 5/5] New filter: evil

Eric Blake eblake at redhat.com
Tue May 16 15:55:25 UTC 2023

On Tue, May 16, 2023 at 01:12:19PM +0100, Richard W.M. Jones wrote:
> 
> This filter adds random data corruption when reading from the
> underlying plugin.
> ---
>  filters/error/nbdkit-error-filter.pod |   4 +
>  filters/evil/nbdkit-evil-filter.pod   | 159 ++++++++++++
>  configure.ac                          |   2 +
>  filters/evil/Makefile.am              |  77 ++++++
>  tests/Makefile.am                     |  14 +
>  filters/evil/evil.c                   | 353 ++++++++++++++++++++++++++
>  tests/test-evil-cosmic.sh             |  76 ++++++
>  tests/test-evil-stuck-high-bits.sh    |  86 +++++++
>  tests/test-evil-stuck-low-bits.sh     |  79 ++++++
>  tests/test-evil-stuck-wires.sh        |  85 +++++++
>  10 files changed, 935 insertions(+)

Interesting idea.  Might also be interesting to see how this filter
behaves with qemu's quorum device (that is, feed two or more copies of
the same block source, one or both through nbdkit's evil filter, into
qemu's quorum to see if those corruptions are flagged).

> +++ b/filters/evil/nbdkit-evil-filter.pod
> @@ -0,0 +1,159 @@
> +=head1 NAME
> +
> +nbdkit-evil-filter - add random data corruption to reads
> +
> +=head1 SYNOPSIS
> +
> + nbdkit --filter=evil PLUGIN [PLUGIN-ARGS...]
> +        evil=[cosmic-rays|stuck-bits|stuck-wires]
> +        [evil-probability=PROB] [evil-stuck-probability=PROB]
> +        [evil-seed=SEED]
> +
> +=head1 DESCRIPTION
> +
> +nbdkit-evil-filter is a Byzantine filter for L<nbdkit(1)> that
> +randomly corrupts data when reading from the underlying plugin.  This
> +can be used for testing filesystem checksums.  Note that it does not
> +change write operations, so the underlying plugin contains the correct
> +data.

That's actually a cool description for something so evil!

> +
> +=head2 Extents
> +
> +Plugins can be sparse.  This filter only corrupts bits in non-sparse
> +parts of the backing disk and it leaves sparse regions unchanged
> +(which is realistic behaviour).  If you wish to use this filter to
> +corrupt sparse regions, then combine this filter with
> +L<nbdkit-noextents-filter(1)>.  For example:
> +
> + nbdkit --filter=evil --filter=noextents memory 1G

Makes sense in that the sparse portions of a file don't read from the
underlying storage and therefore don't get the stuck read effects
(assuming the metadata itself was not also corrupted by stuck bits
when determining where the sparse portions of the file were).  Of
course, the more physically realistic we make the filter, the more
complicated it gets.

> +++ b/filters/evil/evil.c

> +
> +static int
> +evil_config (nbdkit_next_config *next, nbdkit_backend *nxdata,
> +             const char *key, const char *value)
> +{
> +  if (strcmp (key, "evil") == 0 || strcmp (key, "evil-mode") == 0) {
> +    if (strcmp (value, "cosmic-rays") == 0 ||
> +        strcmp (value, "cosmic") == 0) {

Do we want strcasecmp on these?  Do we want to allow _ as a synonym for -?
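
For instance (an untested sketch; the helper and its exact behaviour
are my invention, not something in the patch):

  #include <ctype.h>

  /* Hypothetical: match a mode name ignoring case and treating '_'
   * as '-'.  Canonical names are assumed lower case with '-'.
   */
  static int
  mode_eq (const char *value, const char *canonical)
  {
    for (; *value && *canonical; value++, canonical++) {
      char c = *value == '_' ? '-' : tolower ((unsigned char) *value);
      if (c != *canonical)
        return 0;
    }
    return *value == *canonical;  /* match only if both hit the NUL */
  }

so that the strcmp chains above could become e.g.
mode_eq (value, "cosmic-rays") || mode_eq (value, "cosmic").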

> +      evil_mode = COSMIC_RAYS;
> +      return 0;
> +    }
> +    else if (strcmp (value, "stuck-bits") == 0 ||
> +             strcmp (value, "stuck-bit") == 0 ||
> +             strcmp (value, "stuck") == 0) {
> +      evil_mode = STUCK_BITS;
> +      return 0;
> +    }
> +    else if (strcmp (value, "stuck-wires") == 0 ||
> +             strcmp (value, "stuck-wire") == 0) {
> +      evil_mode = STUCK_WIRES;
> +      return 0;
> +    }
> +    else {
> +      nbdkit_error ("evil: unknown mode: %s", value);
> +      return -1;
> +    }
> +  }
> +  else if (strcmp (key, "evil-probability") == 0) {
> +    if (nbdkit_parse_probability ("evil-probability", value,
> +                                  &evil_probability) == -1)
> +      return -1;
> +    if (evil_probability < 0 || evil_probability > 1) {
> +      nbdkit_error ("%s: probability out of range, should be [0..1]", key);

Do we want to explicitly support 0 for control purposes (no
corruption, but testing the effect of the filter as a passthrough)?
Right now, you permit the probability...

> +
> +/* This is the heart of the algorithm, the function which corrupts
> + * the buffer after reading it from the plugin.
> + *
> + * The observation is that if we have a block of (eg) size 10^6 bits
> + * and our probability of finding a corrupt bit is (eg) 1/10^4, then
> + * we expect approximately 100 bits in the block to be corrupted.
> + *
> + * For stuck bits we want the corrupted bits to be the same on each
> + * access, either relative to the backing disk (STUCK_BITS) or to the
> + * request (STUCK_WIRES).
> + *
> + * Instead of creating an expensive bitmap ahead of time covering the
> + * whole disk, we can use the random number generator with a fixed
> + * seed derived from the offset of the start of the block.  We can
> + * then choose a random number uniformly in the range [0..2*(1/P)] (in
> + * the example [0..2*10^4]) as the distance to the next corrupt bit.

Is '**' the intended operator in these two comments (exponentiation,
rather than multiplication)? [1]

> + * We jump forwards, corrupt that bit, and repeat until we reach
> + * the end of the block.

...but 2**(1/0) is either going to result in infinity or NaN,...

> + *
> + * "Corrupted" in this case can mean flipped by cosmic rays or stuck,
> + * depending on the filter mode.
> + *
> + * On average this will choose the right number of bits in the block.
> + * (Although their distribution will be suboptimal.  In a uniform
> + * distribution it should be possible for two corrupted bits to be
> + * greater than 2*(1/P) apart, but the above algorithm would not do
> + * this.  In practice this probably doesn't matter.)
> + *
> + * Note that "block" != "buffer", especially in the STUCK_BITS mode.
> + * We iterate over blocks as above, but only corrupt a bit when it
> + * happens to coincide with the buffer we have just read.
> + *
> + * We choose the block size adaptively so that at least 100 bits in
> + * the block will be corrupted.  The block size must be a power of 2.
> + * The block size thus depends on the probability.
> + */
> +enum corruption_type { FLIP, STUCK };
> +
> +static uint64_t block_size;     /* in bytes */
> +static struct random_state state; /* only used for cosmic-rays */
> +
> +static int
> +evil_thread_model (void)
> +{
> +  switch (evil_mode) {
> +  case COSMIC_RAYS:
> +    /* Because cosmic-rays uses the global random state we need to
> +     * tighten the thread model.
> +     */
> +    return NBDKIT_THREAD_MODEL_SERIALIZE_REQUESTS;
> +
> +  case STUCK_BITS:
> +  case STUCK_WIRES:
> +    return NBDKIT_THREAD_MODEL_PARALLEL;
> +  }
> +  abort ();
> +}
> +
> +static int
> +evil_get_ready (int thread_model)
> +{
> +  switch (evil_mode) {
> +  case COSMIC_RAYS:
> +    xsrandom ((uint64_t) evil_seed, &state);
> +    break;
> +
> +  case STUCK_BITS:
> +  case STUCK_WIRES:
> +    ;
> +  }
> +
> +  /* Choose the block size based on the probability, so that at least
> +   * 100 bits are expected to be corrupted in the block.  Block size
> +   * must be a power of 2.
> +   */
> +  block_size = next_power_of_2 ((uint64_t) (100. / evil_probability));

...and I'm worried that your block_size computation reaches a point
where it does not do what we want if the probability gets too close
to zero.  Even if it is not division by zero, division by a subnormal
float can easily produce overflow to infinity, which converts to
UINT64_MAX, and next_power_of_2(UINT64_MAX) was untested in patch 3's
unit tests, but looks like it will be '1 << (64 - 64)' == 1, which
isn't really desirable.
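
Perhaps get_ready should reject or clamp degenerate probabilities up
front.  An untested sketch (the clamp values are my invention, not
from the patch):

  /* In evil_get_ready: avoid feeding inf/UINT64_MAX into
   * next_power_of_2 when the probability is zero or tiny.
   */
  if (evil_probability <= 0)
    block_size = 4096;            /* never consulted when P == 0 */
  else {
    double blocks = 100. / evil_probability;
    if (blocks >= 0x1p62)         /* clamp before the cast overflows */
      block_size = UINT64_C (1) << 62;
    else
      block_size = next_power_of_2 ((uint64_t) blocks);
  }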

> +
> +  return 0;
> +}
> +
> +static uint8_t
> +corrupt_one_bit (uint8_t byte, unsigned bit,
> +                 uint64_t rand, enum corruption_type ct)
> +{
> +  const unsigned mask = 1 << bit;
> +
> +  switch (ct) {
> +  case FLIP:
> +    byte ^= mask;
> +    break;
> +  case STUCK:
> +    rand &= 0xffffffff;
> +    if (evil_stuck_probability * 0x100000000 > rand) {
> +      if (rand & 1) /* stuck high or low? */
> +        byte |= mask;
> +      else
> +        byte &= ~mask;
> +    }
> +  }
> +  return byte;
> +}
> +
> +static void
> +corrupt_buffer (uint8_t *buf, uint32_t count, uint64_t offset_in_block,
> +                struct random_state *rs, enum corruption_type ct)
> +{
> +  if (evil_probability == 0)
> +    /* No corruption, and avoids a divide by zero below. */
> +    return;

Okay, you are trying to avoid division by zero elsewhere, but this
avoidance may come too late, since get_ready has already settled on a
block size by this point.

> +
> +  uint64_t offs, intvl, i, rand;
> +  const uint64_t dinvp = (uint64_t) (2.0 * (1.0 / evil_probability));

[1] Ah, I see - you did mean multiplication, not exponentiation.  Like
you commented elsewhere, it is not quite a true uniform distribution,
but certainly seems sane enough for the intent.

> +
> +  assert ((offset_in_block & ~(block_size-1)) == 0);
> +
> +  /* Iterate over the whole block from the start. */
> +  for (offs = 0; offs < offset_in_block + count; ) {
> +    /* Choose the length of the interval to the next corrupted bit, by
> +     * picking a random number in [0..2*(1/P)].
> +     *
> +     * Remember this is in bits!
> +     */
> +    intvl = xrandom (rs) % dinvp;
> +
> +    /* Consume one more random state.  We may or may not use this.
> +     * But we need to always consume two random states per iteration
> +     * to make the output predictable.
> +     */
> +    rand = xrandom (rs);
> +
> +    /* Adjust offs to that byte. */
> +    offs += intvl / 8;
> +
> +    /* If we have gone past the end of buffer, stop. */
> +    if (offs >= offset_in_block + count) break;
> +
> +    /* If the current offs lies within the buffer, corrupt a bit. */
> +    if (offs >= offset_in_block) {
> +      i = offs - offset_in_block;
> +      assert (i < count);
> +      buf[i] = corrupt_one_bit (buf[i], intvl & 7, rand, ct);
> +    }
> +  }
> +}
> +
> +/* Read data. */
> +static int
> +evil_pread (nbdkit_next *next,
> +            void *handle, void *buf, uint32_t count, uint64_t offset,
> +            uint32_t flags, int *err)
> +{
> +  uint64_t seed, bstart, len;
> +  struct random_state local_state;
> +
> +  if (next->pread (next, buf, count, offset, flags, err) == -1)
> +    return -1;
> +
> +  switch (evil_mode) {
> +  case COSMIC_RAYS:
> +    /* Use the global random state because we want to flip bits at random. */
> +    corrupt_buffer (buf, count, 0, &state, FLIP);
> +    break;
> +
> +  case STUCK_BITS:
> +    /* Split the request to align with blocks. */
> +    bstart = offset & ~(block_size-1);
> +    while (count > 0) {
> +      /* Set the seed so we corrupt the same bits relative to the offset. */
> +      seed = (int64_t) evil_seed + bstart;
> +      xsrandom (seed, &local_state);
> +      /* If the buffer straddles two blocks, shorten to just the part
> +       * inside the first block.
> +       */
> +      len = MIN (count, bstart + block_size - offset);
> +      corrupt_buffer (buf, len, offset - bstart, &local_state, STUCK);
> +      bstart += block_size;
> +      offset += len;
> +      buf += len;
> +      count -= len;
> +    }
> +    break;
> +
> +  case STUCK_WIRES:
> +    /* Set the seed so we corrupt the same bits in every request. */
> +    seed = (int64_t) evil_seed;
> +    xsrandom (seed, &local_state);
> +    corrupt_buffer (buf, count, 0, &local_state, STUCK);
> +    break;
> +  }

Thanks to the ability to set a fixed seed, you can write unit tests
checking that the different bit-corruption behaviors are indeed
repeatable regardless of client access size or offset.
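
For example, the same idea the shell tests below implement with nbdsh
could be done at the C level with libnbd (a minimal sketch, pointing
at an already-running stuck-bits instance; most error checking
trimmed):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <libnbd.h>

  int
  main (int argc, char *argv[])
  {
    struct nbd_handle *nbd = nbd_create ();
    static char big[65536], small[512];

    if (argc != 2 || nbd == NULL ||
        nbd_connect_unix (nbd, argv[1]) == -1)  /* argv[1]: Unix socket */
      exit (EXIT_FAILURE);

    /* Read a large buffer, then re-read a sub-range with a different
     * granularity; with stuck bits the corruptions must line up.
     */
    nbd_pread (nbd, big, sizeof big, 0, 0);
    nbd_pread (nbd, small, sizeof small, 1024, 0);
    if (memcmp (small, big + 1024, sizeof small) != 0) {
      fprintf (stderr, "stuck bits not repeatable\n");
      exit (EXIT_FAILURE);
    }
    nbd_close (nbd);
    exit (EXIT_SUCCESS);
  }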

> +++ b/tests/test-evil-cosmic.sh

> +
> +f="test-evil-cosmic.out"
> +rm -f $f
> +cleanup_fn rm -f $f
> +
> +# 80 million zero bits in the backing disk, and the filter will
> +# randomly flip (ie. set high) 1 in 800,000 bits, or about 100.
> +
> +# XXX Actually the number of set bits clusters around 80.  There could
> +# be a mistake in my calculations or the interval algorithm we use
> +# might be biased.

I have not closely inspected the math yet to see if there's an obvious
bias (treat this as a first-pass quick review looking for obvious
code/shell bugs, as opposed to more insidious deep analysis bugs).
But that comment does mean we probably ought to double-check things
prior to v2.
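
If it helps with that double-check, here is a quick standalone
simulation of just the interval walk (my approximations: drand48()
stands in for xrandom(), and the 256K request size is a guess at how
nbdcopy splits the copy).  Since the walk restarts at offset 0 for
every pread, each request boundary discards the tail of one interval,
which could plausibly account for a count below 100:

  #include <stdio.h>
  #include <stdint.h>
  #include <stdlib.h>

  /* Corruptions the interval walk makes in one request of 'bytes'. */
  static uint64_t
  hits (uint64_t bytes, uint64_t dinvp)
  {
    uint64_t offs, intvl, n = 0;

    for (offs = 0; offs < bytes; ) {
      intvl = (uint64_t) (drand48 () * dinvp);  /* gap in bits */
      offs += intvl / 8;
      if (offs >= bytes) break;
      n++;
    }
    return n;
  }

  int
  main (void)
  {
    const uint64_t dinvp = 2 * 800000; /* 2*(1/P), P = 1/800000 */
    const uint64_t disk = 10000000;    /* 10^7 bytes, as in the test */
    const uint64_t req = 262144;       /* hypothetical 256K requests */
    const int runs = 1000;
    uint64_t whole = 0, split = 0, o, len;
    int i;

    srand48 (1);
    for (i = 0; i < runs; i++) {
      whole += hits (disk, dinvp);
      for (o = 0; o < disk; o += req) {  /* fresh walk per request */
        len = disk - o < req ? disk - o : req;
        split += hits (len, dinvp);
      }
    }
    printf ("one request:   mean %.1f bits\n", (double) whole / runs);
    printf ("256K requests: mean %.1f bits\n", (double) split / runs);
    return 0;
  }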

> +
> +export f
> +nbdkit -U - null 10000000 \
> +       --filter=evil --filter=noextents \
> +       evil=cosmic-rays evil-probability=1/800000 \
> +       --run 'nbdcopy "$uri" $f'
> +
> +# This will give an approximate count of the number of set bits.
> +
> +zbytes="$( od -A n -w1 -v -t x1 < $f | sort | uniq -c |
> +           $SED -n -E -e 's/([0-9]+)[[:space:]]+00[[:space:]]*$/\1/p' )"
> +nzbits=$(( 10000000 - zbytes )); # looks wrong but actually correct ...

Interesting computation!  Since a byte with more than one flipped bit
is vanishingly rare at this probability, counting nonzero bytes is a
good approximation to counting set bits - hence the "looks wrong but
actually correct" comment.

> +
> +if [ $nzbits -lt 20 ] || [ $nzbits -gt 180 ]; then
> +    echo "ERROR: $0: unexpected number of non-zero bits: $nzbits"
> +    echo "       (expecting about 100)"
> +    exit 1
> +fi

And that seems like a safe enough fudge factor that we are unlikely
to get CI runs that fail merely from unlucky randomness.

> diff --git a/tests/test-evil-stuck-high-bits.sh b/tests/test-evil-stuck-high-bits.sh
> new file mode 100755
> index 000000000..8f6db2ea0
> --- /dev/null
> +++ b/tests/test-evil-stuck-high-bits.sh
> @@ -0,0 +1,86 @@

> +# Run nbdkit with the evil filter.
> +start_nbdkit -P $pidfile -U $sock \
> +             --filter=evil --filter=noextents \
> +             null 1G evil-probability=1/800000
> +
> +# Since 1 in 800,000 bits are stuck (on average), for every 100,000
> +# bytes that we read we expect about 1 stuck bit.  Note however that
> +# bits are stuck randomly low or high, and against the null filter you
> +# cannot see a stuck low bit, so in fact we expect to see only 1 stuck
> +# bit per 200,000 bytes.
> +#
> +# There is a separate test for stuck low bits (test-evil-stuck-low-bits.sh).
> +#
> +# Also stuck bits should be consistent across reads.
> +
> +nbdsh -u "nbd+unix://?socket=$sock" \
> +      -c - <<EOF
> +def count_bits(buf):
> +    r = 0
> +    for i in range(0, len(buf)-1):
> +        if buf[i] != 0:
> +            r += bin(buf[i]).count("1")
> +    return r

Interesting change to count by embedded python instead of shell code.
One nit: range(0, len(buf)-1) stops one byte short of the end of the
buffer; range(len(buf)) is probably what was intended, although with
buffers this large it won't change the statistics.  (The same pattern
appears again in test-evil-stuck-wires.sh below.)

> +
> +# Expect about 50 stuck-high bits.
> +buf = h.pread(10000000, 0)
> +bits = count_bits(buf)
> +print("stuck high bits: %d (expected 50)" % bits)
> +assert(bits > 20 and bits < 80)
> +
> +# If we read subsets they should match the contents of the buffer.
> +buf1 = h.pread(1000, 1000)
> +assert(buf1 == buf[1000:2000])
> +
> +buf1 = h.pread(10000, 999)
> +assert(buf1 == buf[999:10999])

Should we also assert that there is at least one stuck bit in those
two ranges, and/or pick a different or larger range to improve the
chances of that being the case?

> +++ b/tests/test-evil-stuck-wires.sh

> +
> +# Run nbdkit with the evil filter.
> +start_nbdkit -P $pidfile -U $sock \
> +             --filter=evil --filter=noextents \
> +             null 1G evil=stuck-wires evil-probability=1/10000
> +
> +# Reads from the filter should have 1:10,000 bits stuck high or low.
> +# However we don't see stuck low bits are we are always reading

s/are we are/since we are/

> +# zeroes, so we only expect about 1:20,000 bits stuck high.
> +#
> +# If we read 10,000,000 bytes (80,000,000 bits) we would expect about
> +# 4000 stuck bits.
> +#
> +# No matter where we read from the pattern of stuck bits should be the
> +# same (stuck wires, not backing bits).
> +
> +nbdsh -u "nbd+unix://?socket=$sock" \
> +      -c - <<EOF
> +def count_bits(buf):
> +    r = 0
> +    for i in range(0, len(buf)-1):
> +        if buf[i] != 0:
> +            r += bin(buf[i]).count("1")
> +    return r
> +
> +buf1 = h.pread(10000000, 0)
> +bits = count_bits(buf1)
> +print("stuck high bits: %d (expected 4000)" % bits)
> +assert(bits > 3000 and bits < 5000)
> +
> +# These buffers should be identical.
> +buf2 = h.pread(10000000, 1024)
> +buf3 = h.pread(10000000, 32*1024*1024 - 9999)
> +assert(buf1 == buf2)
> +assert(buf1 == buf3)
> +

Overall looks like an interesting idea for a filter.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

