[Libguestfs] [nbdkit PATCH v2 0/4] enable parallel nbd forwarding

Eric Blake eblake at redhat.com
Tue Nov 21 21:48:26 UTC 2017


On 11/21/2017 03:21 PM, Richard W.M. Jones wrote:
> This works OK on x86_64, but fails on our fast new Amberwing (aarch64)
> machine.  I've attached the test-suite.log file, but I'm not very sure
> what's going wrong from that.
> 

I'll see what I can spot...


> FAIL: test-parallel-file.sh
> ===========================
> 
> read 1/1 bytes at offset 0
> 1 bytes, 1 ops; 0.0007 sec (1.340 KiB/sec and 1371.7421 ops/sec)
> wrote 1/1 bytes at offset 1
> 1 bytes, 1 ops; 0.0203 sec (49.193231 bytes/sec and 49.1932 ops/sec)
> /var/tmp/nbdkit-1.1.18/src/nbdkit -v -t 1 -U -
> /var/tmp/nbdkit-1.1.18/plugins/file/.libs/nbdkit-file-plugin.so file=file-data rdelay=2 wdelay=1 --run 
>   $QEMU_IO -f raw -c "aio_read 0 1" -c "aio_write -P 2 1 1" -c aio_flush $nbd
> 

> nbdkit: file[1]: debug: handshake complete, processing requests serially

When run serially,

> read 1/1 bytes at offset 0
> 1 bytes, 1 ops; 0:00:02.00 (0.499933 bytes/sec and 0.4999 ops/sec)
> wrote 1/1 bytes at offset 1
> 1 bytes, 1 ops; 0:00:03.01 (0.331378 bytes/sec and 0.3314 ops/sec)

the read takes 2 seconds, and the write takes an additional second.
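
(Quick arithmetic on that serial baseline: with rdelay=2 and wdelay=1, the
write cannot complete until roughly rdelay + wdelay = 3 seconds after the
commands are issued, which matches the 2.00s/3.01s figures above.  With real
parallelism the 1-second write ought to finish well before the 2-second read,
and that ordering is presumably what separates a PASS from a FAIL below.)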

> nbdkit: debug: /var/tmp/nbdkit-1.1.18/plugins/file/.libs/nbdkit-file-plugin.so: unload
> /var/tmp/nbdkit-1.1.18/src/nbdkit -v -U -
> /var/tmp/nbdkit-1.1.18/plugins/file/.libs/nbdkit-file-plugin.so file=file-data rdelay=2 wdelay=1 --run 
>   $QEMU_IO -f raw -c "aio_read 0 1" -c "aio_write -P 2 1 1" -c aio_flush $nbd
> 

When run with threads,

> nbdkit: file[1]: debug: handshake complete, processing requests with 16 threads
> nbdkit: debug: starting worker thread file.0
> nbdkit: debug: starting worker thread file.1
> nbdkit: file.0: debug: acquire unload prevention lock
> nbdkit: file.0: debug: pread count=1 offset=0
> nbdkit: debug: starting worker thread file.2
> nbdkit: file.1: debug: acquire unload prevention lock
> nbdkit: file.1: debug: pwrite count=1 offset=1

there are definitely two threads that both obtain the unload prevention
lock at once (so we do have parallel processing),

> nbdkit: file.1: debug: flush
> nbdkit: file.0: debug: release unload prevention lock
> nbdkit: file.1: debug: release unload prevention lock
> nbdkit: file.2: debug: acquire unload prevention lock
> nbdkit: file.2: debug: flush
> nbdkit: file.2: debug: release unload prevention lock
> nbdkit: file.4: debug: acquire unload prevention lock
> nbdkit: file.4: debug: flush
> nbdkit: file.4: debug: release unload prevention lock

and then, because we have not wired up nbdkit to expose FUA support to
the plugins, we are calling the flush callback in parallel through three
threads at once (probably way more flushing than we need, but that doesn't
seem to be the cause of the failure).  I don't know how many of those
flushes were triggered by nbdkit connections.c and how many were
triggered by NBD_CMD_FLUSH from qemu-io.
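
For what it's worth, if we ever do expose FUA to plugins, I'd expect the
pwrite callback to grow a flags argument, so a write with FUA set can be
made durable inside the plugin instead of being followed by a separate
flush over the wire.  A rough sketch of what the file-plugin side might
look like (the flags parameter and the NBDKIT_FLAG_FUA name are
hypothetical here, not today's API):

#include <unistd.h>
#include <stdint.h>

/* Hypothetical flag - nbdkit does not expose FUA to plugins today. */
#define NBDKIT_FLAG_FUA 1

/* Sketch of a file-plugin-style pwrite callback if a flags argument existed. */
static int
file_pwrite (int fd, const void *buf, uint32_t count, uint64_t offset,
             uint32_t flags)
{
  while (count > 0) {
    ssize_t r = pwrite (fd, buf, count, offset);
    if (r == -1)
      return -1;                  /* a real plugin would call nbdkit_error() */
    buf = (const char *) buf + r;
    count -= r;
    offset += r;
  }

  /* With FUA plumbed through, only this write has to reach stable storage,
   * so the client never needs to send a separate NBD_CMD_FLUSH afterwards.  */
  if ((flags & NBDKIT_FLAG_FUA) && fdatasync (fd) == -1)
    return -1;

  return 0;
}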

> nbdkit: file.0: debug: client closed input socket, closing connection
> nbdkit: file.0: debug: exiting worker thread file.0
> read 1/1 bytes at offset 0
> 1 bytes, 1 ops; 0:00:02.47 (0.403374 bytes/sec and 0.4034 ops/sec)
> wrote 1/1 bytes at offset 1
> 1 bytes, 1 ops; 0:00:02.55 (0.391644 bytes/sec and 0.3916 ops/sec)

qemu-io finally records the replies it receives, and says the read took
2.47 seconds, with the write completing 0.08 seconds after the read.  The
two commands were obviously in flight in parallel (the wdelay is only
1 second, and the write finished well under the 3 seconds the earlier
serial run needed), but there must be enough lock contention or other
interaction between the threads that we did not manage to complete the
write before the read.  When I run the same test locally, and get a PASS
result, my qemu-io results look more like:

wrote 1/1 bytes at offset 1
1 bytes, 1 ops; 0:00:01.00 (0.994919 bytes/sec and 0.9949 ops/sec)
read 1/1 bytes at offset 0
1 bytes, 1 ops; 0:00:02.00 (0.499855 bytes/sec and 0.4999 ops/sec)

Hmm - I wonder - could it be extra delay induced by the flush that makes
the effective write delay longer than the read's?  Should I swap the
commands (have read be 2 seconds, write be 1 second, so that any
flushing tied to the write is more likely to still finish before the read)?

Also, this makes it VERY obvious that without timestamps in our nbdkit
-v output, it's hard to tell WHEN things are happening, or to see where
the bottlenecks are.

> FAIL: test-parallel-nbd.sh
> ==========================
> 

> nbdkit: file[1]: debug: release unload prevention lock
> nbdkit: file[1]: debug: handshake complete, processing requests with 16 threads
> nbdkit: nbd[1]: nbdkit: debug: debug: acquire unload prevention lockstarting worker thread file.0
> 
> nbdkit: nbd[1]: nbdkit: debug: debug: get_sizestarting worker thread file.2
> 
> nbdkit: nbd[1]: debug: can_write

Here, we have a nice demonstration that commit d02d9c9d works for
messages from one process (my debugging was worse without the mutex in
errors.c), but makes no difference when interleaving messages from
multiple processes.  If we're that worried about it, we could make
errors.c assemble the entire message (prologue, body, and \n) into a
single buffer, and only then issue a single stdio call on that buffer,
making cross-process interleaving less likely at the expense of more
memory manipulation to build up the error message in a temporary buffer.
I'm not too fussed, though - you aren't often trying to debug two nbdkit
processes writing to the same file at once.
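
If we did go that route, the change would be roughly this (a sketch only;
the function and buffer names are made up, and the real prologue includes
the program name, thread name, and severity):

#define _GNU_SOURCE
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch: format prologue + body + "\n" into one buffer, then write it
 * with a single stdio call, so another process's output is less likely
 * to land in the middle of our message.  */
static void
emit_message (const char *prologue, const char *fs, va_list args)
{
  char body[1024];
  char *msg = NULL;

  vsnprintf (body, sizeof body, fs, args);
  if (asprintf (&msg, "%s%s\n", prologue, body) == -1)
    return;                      /* fall back to something simpler in real code */
  fputs (msg, stderr);           /* one call, not several */
  free (msg);
}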


> /var/tmp/nbdkit-1.1.18/src/nbdkit -v -U -
> /var/tmp/nbdkit-1.1.18/plugins/nbd/.libs/nbdkit-nbd-plugin.so socket=test-parallel-nbd.sock --run 
>   $QEMU_IO -f raw -c "aio_read 0 1" -c "aio_write -P 2 1 1" -c aio_flush $nbd
> 

And when we run the parallel nbd plugin over the parallel file plugin, we see:

> nbdkit: nbd[1]: debug: handshake complete, processing requests with 16 threads
> nbdkit: debug: starting worker thread file.1
> nbdkit: debug: nbdkit: starting worker thread file.10debug: 
> starting worker thread nbd.0
> nbdkit: debug: starting worker thread nbd.2
> nbdkit: debug: starting worker thread nbd.3
> nbdkit: nbdkit: debug: debug: starting worker thread file.4starting worker thread nbd.1
> 
> nbdkit: nbd.3: debug: acquire unload prevention lock
> nbdkit: nbd.3nbdkit: : debug: debug: starting worker thread file.12pread count=1 offset=0
> 
> nbdkit: nbd.3: debug: sending request with type 0 and cookie 0xa00000009

nbd plugin sends the read request first,

> nbdkit: debug: nbdkit: starting worker thread file.6debug: 
> starting worker thread nbd.4
> nbdkit: nbdkit: file.0debug: : starting worker thread nbd.6debug: 
> acquire unload prevention lock
> nbdkit: file.0: debug: pread count=1 offset=0

which the file plugin starts processing;

> nbdkit: nbd.0: debug: acquire unload prevention lock
> nbdkit: nbdkit: debug: nbd.0starting worker thread file.15: 
> debug: pwrite count=1 offset=1
> nbdkit: debug: starting worker thread nbd.5
> nbdkit: debug: starting worker thread file.8nbdkit: 
> nbd.0: debug: sending request with type 1 and cookie 0xc0000000b

then in another thread nbd sends the write request,

> nbdkit: debug: starting worker thread file.14
> nbdkit: file.6: nbdkit: debug: debug: acquire unload prevention lockstarting worker thread nbd.7
> 
> nbdkit: file.6: debug: pwrite count=1 offset=1

which the file plugin starts processing. So both processes are indeed
interleaved, as desired.
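
(As an aside for anyone reading the log: the interleaving is possible
because the nbd plugin tags every outgoing request with a cookie and
matches each reply back to whichever thread is waiting on that cookie -
those are the 0xa00000009/0xc0000000b values above.  Conceptually it is
something like the following, as a sketch rather than the plugin's actual
data structures:)

#include <pthread.h>
#include <stdint.h>

/* Sketch: one in-flight transaction per outstanding NBD request.  The
 * sending thread records its cookie; the thread reading replies looks
 * the cookie up and wakes whoever is waiting on it.  */
struct transaction {
  uint64_t cookie;               /* handle echoed back in the NBD reply */
  int error;                     /* result handed to the waiting thread */
  pthread_cond_t done;
  struct transaction *next;      /* list of requests still in flight */
};

static pthread_mutex_t trans_lock = PTHREAD_MUTEX_INITIALIZER;
static struct transaction *in_flight;

/* Called by the reply-reading thread: find the matching request. */
static struct transaction *
find_transaction (uint64_t cookie)
{
  struct transaction *t;

  pthread_mutex_lock (&trans_lock);
  for (t = in_flight; t != NULL; t = t->next)
    if (t->cookie == cookie)
      break;
  pthread_mutex_unlock (&trans_lock);
  return t;
}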

> nbdkit: debug: starting worker thread nbd.8
> nbdkit: debug: starting worker thread nbd.9
> nbdkit: debug: starting worker thread nbd.10
> nbdkit: debug: starting worker thread nbd.12
> nbdkit: debug: starting worker thread nbd.13
> nbdkit: debug: starting worker thread nbd.14
> nbdkit: debug: starting worker thread nbd.11
> nbdkit: debug: starting worker thread nbd.15
> nbdkit: file.6: debug: release unload prevention lock
> nbdkit: debug: received reply for cookie 0xc0000000b

And the write completes first, as desired.

> nbdkit: nbd.0: debug: flush
> nbdkit: nbd.0: debug: sending request with type 3 and cookie 0xc0000000b
> nbdkit: file.5: debug: acquire unload prevention lock
> nbdkit: file.5: debug: flush
> nbdkit: file.0: debug: release unload prevention lock
> nbdkit: debug: received reply for cookie 0xa00000009
> nbdkit: nbd.3: debug: release unload prevention lock
> nbdkit: file.5: debug: release unload prevention lock
> nbdkit: debug: received reply for cookie 0xc0000000b
> nbdkit: nbd.0: debug: release unload prevention lock
> nbdkit: nbd.6: debug: acquire unload prevention lock
> nbdkit: nbd.6: debug: flush
> nbdkit: nbd.6: debug: sending request with type 3 and cookie 0xa00000009
> nbdkit: file.7: debug: acquire unload prevention lock
> nbdkit: file.7: debug: flush
> nbdkit: file.7: debug: release unload prevention lock
> nbdkit: debug: received reply for cookie 0xa00000009
> nbdkit: nbd.6: debug: release unload prevention lock
> nbdkit: nbd.2: debug: acquire unload prevention lock
> nbdkit: nbd.2: debug: flush
> nbdkit: nbd.2: debug: sending request with type 3 and cookie 0xa00000009
> nbdkit: file.9: debug: acquire unload prevention lock
> nbdkit: file.9: debug: flush
> nbdkit: file.9: debug: release unload prevention lock
> nbdkit: debug: received reply for cookie 0xa00000009

but we have a lot of traffic on those flush requests

> nbdkit: nbd.2: debug: release unload prevention lock
> nbdkit: nbd.4: debug: client sent disconnect command, closing connection
> nbdkit: nbd.4: debug: exiting worker thread nbd.4
> nbdkit: nbd.1: debug: client closed input socket, closing connection
> read 1/1 bytes at offset 0
> 1 bytes, 1 ops; 0:00:02.47 (0.403408 bytes/sec and 0.4034 ops/sec)
> wrote 1/1 bytes at offset 1
> 1 bytes, 1 ops; 0:00:02.56 (0.390498 bytes/sec and 0.3905 ops/sec)

so that qemu-io sees the read take 2.47 seconds and the write complete
0.09 seconds after the read.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
