[Libguestfs] FYI: perf commands I'm using to benchmark nbdcopy

Richard W.M. Jones rjones at redhat.com
Wed May 26 13:03:08 UTC 2021


On Wed, May 26, 2021 at 02:50:32PM +0300, Nir Soffer wrote:
> Basically all give very similar results.
> 
> # hyperfine "./copy-libev $SRC $DST" "qemu-img convert -n -W -m 16 -S
> 1048576 $SRC $DST" "../copy/nbdcopy --sparse=1048576
> --request-size=1048576 --flush --requests=16 --connections=1 $SRC
> $DST"
> Benchmark #1: ./copy-libev nbd+unix:///?socket=/tmp/src.sock
> nbd+unix:///?socket=/tmp/dst.sock
>   Time (mean ± σ):     103.514 s ±  0.836 s    [User: 7.153 s, System: 19.422 s]
>   Range (min … max):   102.265 s … 104.824 s    10 runs
> 
> Benchmark #2: qemu-img convert -n -W -m 16 -S 1048576
> nbd+unix:///?socket=/tmp/src.sock nbd+unix:///?socket=/tmp/dst.sock
>   Time (mean ± σ):     103.104 s ±  0.899 s    [User: 2.897 s, System: 25.204 s]
>   Range (min … max):   101.958 s … 104.499 s    10 runs
> 
> Benchmark #3: ../copy/nbdcopy --sparse=1048576 --request-size=1048576
> --flush --requests=16 --connections=1
> nbd+unix:///?socket=/tmp/src.sock nbd+unix:///?socket=/tmp/dst.sock
>   Time (mean ± σ):     104.085 s ±  0.977 s    [User: 7.188 s, System: 19.965 s]
>   Range (min … max):   102.314 s … 105.153 s    10 runs

In my testing, nbdcopy is a clear 4x faster than qemu-img convert,
and 4 also happens to be the default number of connections/threads in
nbdcopy.  Why use --connections=1?  That completely disables threads
in nbdcopy.  Also I'm not sure that --flush is a fair comparison (it
depends on whether qemu-img also flushes at the end, which I don't
know).
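
For comparison it would be interesting to see the same test with the
default of 4 connections, something like this (untested sketch, just
reusing your $SRC/$DST sockets):

$ hyperfine "../copy/nbdcopy --sparse=1048576 --request-size=1048576 --flush --requests=16 --connections=4 $SRC $DST"

although I think nbdcopy will drop back to a single connection anyway
if the destination doesn't advertise multi-conn, which ties in with
the multi-conn question further down.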

The other interesting things are the qemu-img convert flags you're using:

 -m 16  number of coroutines, default is 8

 -W     out-of-order writes, but the manual says "This is only recommended
        for preallocated devices like host devices or other raw block
        devices", which is a very unclear recommendation to me.
        What's special about host devices versus (eg) files or
        qcow2 files that means -W wouldn't always be recommended?
        (See the aside just below for a way to separate the effect
        of -W from -m.)
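
(As an aside, to see how much of any qemu-img improvement comes from
-W as opposed to -m 16, the sparse-random benchmark below could be
run with the two flags toggled separately, something like:

$ hyperfine 'nbdkit -U - sparse-random size=100G --run "qemu-img convert -W \$uri \$uri"' 'nbdkit -U - sparse-random size=100G --run "qemu-img convert -m 16 \$uri \$uri"'

The numbers below only test them combined.)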

Anyway I tried various settings to see if I could improve the
performance of qemu-img convert vs nbdcopy using the sparse-random
test harness.  The results seem to confirm what has been said in this
thread so far.

libnbd-1.7.11-1.fc35.x86_64
nbdkit-1.25.8-2.fc35.x86_64
qemu-img-6.0.0-1.fc35.x86_64

$ hyperfine 'nbdkit -U - sparse-random size=100G --run "qemu-img convert \$uri \$uri"' 'nbdkit -U - sparse-random size=100G --run "qemu-img convert -m 16 -W \$uri \$uri"' 'nbdkit -U - sparse-random size=100G --run "nbdcopy \$uri \$uri"' 'nbdkit -U - sparse-random size=100G --run "nbdcopy --request-size=1048576 --requests=16 \$uri \$uri"'
Benchmark #1: nbdkit -U - sparse-random size=100G --run "qemu-img convert \$uri \$uri"
  Time (mean ± σ):     17.245 s ±  1.004 s    [User: 28.611 s, System: 7.219 s]
  Range (min … max):   15.711 s … 18.930 s    10 runs

Benchmark #2: nbdkit -U - sparse-random size=100G --run "qemu-img convert -m 16 -W \$uri \$uri"
  Time (mean ± σ):      8.618 s ±  0.266 s    [User: 33.091 s, System: 7.331 s]
  Range (min … max):    8.130 s …  8.943 s    10 runs

Benchmark #3: nbdkit -U - sparse-random size=100G --run "nbdcopy \$uri \$uri"
  Time (mean ± σ):      5.227 s ±  0.153 s    [User: 34.299 s, System: 30.136 s]
  Range (min … max):    5.049 s …  5.439 s    10 runs

Benchmark #4: nbdkit -U - sparse-random size=100G --run "nbdcopy --request-size=1048576 --requests=16 \$uri \$uri"
  Time (mean ± σ):      4.198 s ±  0.197 s    [User: 32.105 s, System: 24.562 s]
  Range (min … max):    3.868 s …  4.474 s    10 runs

Summary
  'nbdkit -U - sparse-random size=100G --run "nbdcopy --request-size=1048576 --requests=16 \$uri \$uri"' ran
    1.25 ± 0.07 times faster than 'nbdkit -U - sparse-random size=100G --run "nbdcopy \$uri \$uri"'
    2.05 ± 0.12 times faster than 'nbdkit -U - sparse-random size=100G --run "qemu-img convert -m 16 -W \$uri \$uri"'
    4.11 ± 0.31 times faster than 'nbdkit -U - sparse-random size=100G --run "qemu-img convert \$uri \$uri"'
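
It might also be interesting to add a --connections=1 variant to
separate the effect of nbdcopy's multiple connections/threads from
the larger request size, eg (untested):

$ hyperfine 'nbdkit -U - sparse-random size=100G --run "nbdcopy --connections=1 --request-size=1048576 --requests=16 \$uri \$uri"'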

> ## Compare nbdcopy request size with 16 requests and one connection

I think this is actually testing 4 connections (the nbdcopy default),
despite the "one connection" in the heading?  Or is the destination
not advertising multi-conn?
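
One way to check, if nbdinfo from libnbd is installed, is to point it
at the destination socket from your example and look at the
can_multi_conn field that it reports, eg:

$ nbdinfo "nbd+unix:///?socket=/tmp/dst.sock" | grep multi_conn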

> ## Compare number of requests with multiple connections
>
> To enable multiple connections to the destination, I hacked nbdcopy
> to ignore the destination's can_multi_conn and always use multiple
> connections. This is how we use qemu-nbd with imageio in RHV.

So qemu-nbd doesn't advertise multi-conn?  I'd prefer if we fixed qemu-nbd.

Anyway, interesting stuff, thanks.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html