[Libguestfs] [nbdkit PATCH] RFC: blocksize: Add test for sharding behavior

Thu May 26 16:15:39 UTC 2022

On Thu, May 26, 2022 at 09:58:50AM +0100, Richard W.M. Jones wrote:
> 
> Is there any way to do this without the literal sleeps?  Gitlab CI in
> particular appears to be very contended (I guess it runs in parallel
> on huge systems with vast numbers of unrelated containers).  I've seen
> threads being created that are so starved they never run at all even
> in tests running for many tens of seconds.
> 
> I'm thinking some kind of custom plugin which orders operations using
> its own clock that is incremented as certain requests arrive, but I
> don't have a fully-formed idea of how it would work.

It might be possible to avoid literal sleeps, but I'm struggling with
the best approach.

In most cases, when we kick off two parallel aio requests, nbdkit will
start processing those two requests in the same order as libnbd sent
them.  But under heavy load, it is indeed possible that the threads
that process the requests wake up out of order.

For the test of parallel aligned writes, it's easy: our custom plugin
can distinguish the two writes by their content, and hold off replying
to either request until both have arrived at the plugin.  It is also
easy enough to guarantee that we send our replies in a desired order,
provided we coordinate via external means to learn when the first
reply has arrived at the client.
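
As a rough sketch of that rendezvous (plain Python threads standing in
for the two requests, not real nbdkit plugin code): the class name
Rendezvous, the run() helper, and the b"1"/b"2" payloads are all
hypothetical, and the first_done event merely stands in for whatever
external signal tells us the first reply reached the client.

```python
import threading


class Rendezvous:
    """Hold two writes until both have arrived, then reply in a fixed order."""

    def __init__(self, timeout=5.0):
        self.timeout = timeout
        # Released only once both writes are inside pwrite().
        self.both_arrived = threading.Barrier(2)
        # Stand-in for the external "first reply reached the client" signal.
        self.first_done = threading.Event()
        self.replies = []
        self.lock = threading.Lock()

    def pwrite(self, buf):
        tag = buf[0:1]  # the two writes are told apart by their content
        # Do not reply to either write until both have arrived.
        self.both_arrived.wait(self.timeout)
        if tag == b"2":
            # The second reply is held back until the first has gone out.
            if not self.first_done.wait(self.timeout):
                raise TimeoutError("first reply never completed")
        with self.lock:
            self.replies.append(tag)
        if tag == b"1":
            self.first_done.set()


def run(arrival_order):
    """Start one thread per write and return the order of the replies."""
    r = Rendezvous()
    threads = [threading.Thread(target=r.pwrite, args=(buf,))
               for buf in arrival_order]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return r.replies
```

The reply order comes out the same no matter which thread wakes up
first, and no sleep is involved; the timeouts exist only so that a
broken test fails instead of hanging.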

But for overlapping aligned/unaligned requests, anything other than a
sleep looks prone to deadlock.  We can detect whether the read (for
the unaligned request) or the write (for the aligned request) arrives
first.  But unless we know whether locking is in place, it's hard to
tell if the blocksize filter is properly serializing the overlapping
requests from the client.  When the bug is present, both requests
will hit the plugin eventually without any need to advance a clock
(just more slowly on a more heavily loaded system).  But when the bug
is fixed, whichever request landed in the blocksize filter second will
be blocked until the plugin responds to the first, so a plugin that
waits for both requests to be active at the same time would deadlock.
And the simplest lock (a rwlock that prevents any other operation
while a single RMW is in effect) means we can't even try a parallel
request to an unrelated offset as proof of progress.
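
To make the deadlock concrete, here is a toy model (again plain Python
threads, not nbdkit code).  The function plugin_sees_both() is
hypothetical, a plain mutex stands in for the filter's rwlock, and the
barrier timeout is there only so that the demonstration terminates
instead of hanging.

```python
import threading


def plugin_sees_both(serialize, timeout=0.5):
    """Return True if both requests are inside the 'plugin' at once.

    serialize=False models the buggy filter (no locking);
    serialize=True models a fixed filter that lets only one of the
    overlapping requests through to the plugin at a time.
    """
    filter_lock = threading.Lock()   # stand-in for the filter's lock
    barrier = threading.Barrier(2)   # plugin waits for both requests
    results = []

    def plugin():
        try:
            barrier.wait(timeout)
            return True
        except threading.BrokenBarrierError:
            # Without the timeout, this wait would never return.
            return False

    def request():
        if serialize:
            with filter_lock:        # second request blocks here
                results.append(plugin())
        else:
            results.append(plugin())

    threads = [threading.Thread(target=request) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(results) == 2 and all(results)
```

With serialize=False both threads meet at the barrier; with
serialize=True the first thread holds the lock while waiting for a
partner that cannot get past the filter, which is exactly the deadlock
a "wait for both requests" plugin would hit once the bug is fixed.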

I'm still thinking about this...

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org