
Re: [Libguestfs] FYI: perf commands I'm using to benchmark nbdcopy

On Wed, May 26, 2021 at 04:49:50PM +0300, Nir Soffer wrote:
> On Wed, May 26, 2021 at 4:03 PM Richard W.M. Jones <rjones@redhat.com> wrote:
> > In my testing, nbdcopy is a clear 4x faster than qemu-img convert, with
> > 4 also happening to be the default number of connections/threads.
> > Why use nbdcopy --connections=1?  That completely disables threads in
> > nbdcopy.
> Because qemu-nbd does not advertise multi-conn when writing, so in
> practice you get only one NBD handle for writing.

Let's see if we can fix that.  Crippling nbdcopy because of a missing
feature in qemu-nbd isn't right.  I wonder what Eric's reasoning is
for multi-conn not being safe here.

> > Also I'm not sure if --flush is fair (it depends on what
> > qemu-img does, which I don't know).
> qemu is flushing at the end of the operation. Not flushing is cheating :-)

That's fair enough.  I will add that flag to my future tests.

I also pushed these commits to disable malloc checking outside tests:


> > The other interesting things are the qemu-img convert flags you're using:
> >
> >  -m 16  number of coroutines, default is 8
> We use 8 in RHV since the difference is very small, and when running
> concurrent copies it does not matter. Since we use up to 64 concurrent
> requests in nbdcopy, it is useful to compare a similar setup in qemu.

I'm not really clear on the relationship (in qemu-img) between number
of coroutines, number of pthreads and number of requests in flight.
At this rate I'm going to have to look at the source :-)
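The usual relationship (a sketch of the general pattern, not qemu's actual code) is that the coroutine pool size caps the number of requests in flight, however many requests are queued.  A toy model with an asyncio semaphore standing in for the pool:

```python
# Toy model: a pool of "coroutines" (-m in qemu-img) bounds in-flight
# requests.  Not qemu's implementation, just the general pattern.
import asyncio

async def copy_with_coroutines(num_requests, num_coroutines):
    sem = asyncio.Semaphore(num_coroutines)
    in_flight = 0
    peak = 0

    async def one_request(i):
        nonlocal in_flight, peak
        async with sem:                # block while the pool is busy
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)     # stand-in for a read+write
            in_flight -= 1

    await asyncio.gather(*(one_request(i) for i in range(num_requests)))
    return peak

# 64 queued requests, pool of 8: concurrency never exceeds 8
peak = asyncio.run(copy_with_coroutines(64, 8))
print(peak)
```

If qemu-img follows this pattern, -m 8 vs -m 16 only matters once the storage is fast enough to keep more than 8 requests in flight.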

> >  -W     out of order writes, but the manual says "This is only recommended
> >         for preallocated devices like host devices or other raw block
> >         devices" which is a very unclear recommendation to me.
> >         What's special about host devices versus (eg) files or
> >         qcow2 files which means -W wouldn't always be recommended?
> This is how RHV uses qemu-img convert when copying to raw preallocated
> volumes. Using -W can be up to 6x faster. We use the same for imageio
> for any type of disk, which is why I tested this way.
> -W is equivalent to the nbdcopy multithreaded copy using a single connection.
> qemu-img does N concurrent reads. If you don't specify -W, it writes
> the data in the right order (based on offset). If a read has not
> finished, the copy loop waits until the read completes before
> writing. This ensures exactly one concurrent write, and it is much
> slower.

Thanks - interesting.  Still not sure why you wouldn't want to use
this flag all the time.
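Nir's description can be sketched as a toy model (my own illustration, not qemu's code): reads complete in an arbitrary order; without -W the writer stalls until the read for the next offset is done, so writes happen strictly by offset, while -W writes each block as soon as its read finishes.

```python
# Toy model of qemu-img convert's write ordering.  Reads complete in a
# shuffled order; compare the resulting write order with and without -W.
import random

def simulate(out_of_order, num_blocks=8, seed=1):
    random.seed(seed)
    completion_order = list(range(num_blocks))
    random.shuffle(completion_order)     # order in which reads finish

    if out_of_order:
        # -W: write each block as soon as its read completes
        return completion_order[:]

    # default: write strictly by ascending offset, waiting on reads
    writes = []
    done = set()
    pending = iter(completion_order)
    next_offset = 0
    while next_offset < num_blocks:
        while next_offset not in done:
            done.add(next(pending))      # wait for more reads to finish
        writes.append(next_offset)
        next_offset += 1
    return writes

print(simulate(out_of_order=False))      # always [0, 1, 2, ..., 7]
print(simulate(out_of_order=True))       # whatever order reads finished
```

The in-order path serializes writes behind the slowest outstanding read, which matches Nir's observation that -W can be much faster on storage where write ordering doesn't matter.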

See also:

> This shows that nbdcopy works better when the latency is
> (practically) zero, copying data from memory to memory. This is
> useful for minimizing overhead in nbdcopy, but when copying real
> images with real storage with direct I/O the time to write the data
> to storage hides everything else.
> Would it be useful to add latency in the sparse-random plugin, so it
> behaves more like real storage? (or maybe it is already possible
> with a filter?)

We could use one of these filters:

Something like "--filter=delay wdelay=1ms" might be more realistic.
To simulate NVMe we might need to be able to specify microseconds there.
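As a back-of-envelope check of why the delay value matters (my own arithmetic, not from the nbdkit docs): with a fixed number of requests in flight, Little's law caps throughput at depth / latency.  The queue depth of 64 is nbdcopy's figure from the thread; the 256 KiB request size is purely illustrative.

```python
# Little's law upper bound: at most (queue depth / per-request latency)
# requests can complete per second, ignoring all other overheads.
def max_iops(queue_depth, latency_s):
    return queue_depth / latency_s

block = 256 * 1024                     # illustrative request size
for wdelay in (0.001, 0.0001):         # 1 ms (disk-ish) vs 100 us (NVMe-ish)
    iops = max_iops(64, wdelay)        # 64 requests in flight, as in nbdcopy
    print(f"wdelay={wdelay * 1e6:.0f}us -> {iops:.0f} IOPS max, "
          f"~{iops * block / 1e6:.0f} MB/s upper bound")
```

So a 1ms wdelay bounds the benchmark well below memory-to-memory speeds, while simulating NVMe-class latency does indeed need microsecond granularity in the filter.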


Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
