
Re: [Libguestfs] FYI: perf commands I'm using to benchmark nbdcopy

On Wed, May 26, 2021 at 04:49:50PM +0300, Nir Soffer wrote:
> On Wed, May 26, 2021 at 4:03 PM Richard W.M. Jones <rjones@redhat.com> wrote:
> > In my testing, nbdcopy is a clear 4x faster than qemu-img convert, with
> > 4 also happening to be the default number of connections/threads.
> > Why use nbdcopy --connections=1?  That completely disables threads in
> > nbdcopy.
> Because qemu-nbd does not advertise multi-conn when writing, so in
> practice you get only one NBD handle for writing.

Let's see if we can fix that.  Crippling nbdcopy because of a missing
feature in qemu-nbd isn't right.  I wonder what Eric's reasoning is
for multi-conn not being safe here.

> > Also I'm not sure if --flush is fair (it depends on what
> > qemu-img does, which I don't know).
> qemu is flushing at the end of the operation. Not flushing is cheating :-)

That's fair enough.  I will add that flag to my future tests.

I also pushed these commits to disable malloc checking outside tests:


> > The other interesting things are the qemu-img convert flags you're using:
> >
> >  -m 16  number of coroutines, default is 8
> We use 8 in RHV since the difference is very small, and when running
> concurrent copies it does not matter. Since we use up to 64 concurrent
> requests in nbdcopy, it is useful to compare a similar setup in qemu.

I'm not really clear on the relationship (in qemu-img) between number
of coroutines, number of pthreads and number of requests in flight.
At this rate I'm going to have to look at the source :-)
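The usual relationship (a sketch of the general pattern, not qemu's actual code) is that the coroutine pool size caps the number of requests in flight, however many requests are queued.  A toy model with an asyncio semaphore standing in for the pool:

```python
# Toy model: a pool of "coroutines" (-m in qemu-img) bounds in-flight
# requests.  Not qemu's implementation, just the general pattern.
import asyncio

async def copy_with_coroutines(num_requests, num_coroutines):
    sem = asyncio.Semaphore(num_coroutines)
    in_flight = 0
    peak = 0

    async def one_request(i):
        nonlocal in_flight, peak
        async with sem:                # block while the pool is busy
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)     # stand-in for a read+write
            in_flight -= 1

    await asyncio.gather(*(one_request(i) for i in range(num_requests)))
    return peak

# 64 queued requests, pool of 8: concurrency never exceeds 8
peak = asyncio.run(copy_with_coroutines(64, 8))
print(peak)
```

If qemu-img follows this pattern, -m 8 vs -m 16 only matters once the storage is fast enough to keep more than 8 requests in flight.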

> >  -W     out of order writes, but the manual says "This is only recommended
> >         for preallocated devices like host devices or other raw block
> >         devices" which is a very unclear recommendation to me.
> >         What's special about host devices versus (eg) files or
> >         qcow2 files which means -W wouldn't always be recommended?
> This is how RHV uses qemu-img convert when copying to raw preallocated
> volumes. Using -W can be up to 6x faster. We use the same for imageio
> for any type of disk, which is why I tested this way.
> -W is equivalent to the nbdcopy multithreaded copy using a single connection.
> qemu-img does N concurrent reads. If you don't specify -W, it writes
> the data in the right order (based on offset). If a read has not
> finished, the copy loop waits until the read completes before
> writing. This ensures exactly one concurrent write, and it is much
> slower.

Thanks - interesting.  Still not sure why you wouldn't want to use
this flag all the time.
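Nir's description can be sketched as a toy model (my own illustration, not qemu's code): reads complete in an arbitrary order; without -W the writer stalls until the read for the next offset is done, so writes happen strictly by offset, while -W writes each block as soon as its read finishes.

```python
# Toy model of qemu-img convert's write ordering.  Reads complete in a
# shuffled order; compare the resulting write order with and without -W.
import random

def simulate(out_of_order, num_blocks=8, seed=1):
    random.seed(seed)
    completion_order = list(range(num_blocks))
    random.shuffle(completion_order)     # order in which reads finish

    if out_of_order:
        # -W: write each block as soon as its read completes
        return completion_order[:]

    # default: write strictly by ascending offset, waiting on reads
    writes = []
    done = set()
    pending = iter(completion_order)
    next_offset = 0
    while next_offset < num_blocks:
        while next_offset not in done:
            done.add(next(pending))      # wait for more reads to finish
        writes.append(next_offset)
        next_offset += 1
    return writes

print(simulate(out_of_order=False))      # always [0, 1, 2, ..., 7]
print(simulate(out_of_order=True))       # whatever order reads finished
```

The in-order path serializes writes behind the slowest outstanding read, which matches Nir's observation that -W can be much faster on storage where write ordering doesn't matter.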

See also:

> This shows that nbdcopy works better when the latency is
> (practically) zero, copying data from memory to memory. This is
> useful for minimizing overhead in nbdcopy, but when copying real
> images with real storage with direct I/O the time to write the data
> to storage hides everything else.
> Would it be useful to add latency in the sparse-random plugin, so it
> behaves more like real storage? (or maybe it is already possible
> with a filter?)

We could use one of these filters:

Something like "--filter=delay wdelay=1ms" might be more realistic.
To simulate NVMe we might need to be able to specify microseconds there.
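As a back-of-envelope check of why the delay value matters (my own arithmetic, not from the nbdkit docs): with a fixed number of requests in flight, Little's law caps throughput at depth / latency.  The queue depth of 64 is nbdcopy's figure from the thread; the 256 KiB request size is purely illustrative.

```python
# Little's law upper bound: at most (queue depth / per-request latency)
# requests can complete per second, ignoring all other overheads.
def max_iops(queue_depth, latency_s):
    return queue_depth / latency_s

block = 256 * 1024                     # illustrative request size
for wdelay in (0.001, 0.0001):         # 1 ms (disk-ish) vs 100 us (NVMe-ish)
    iops = max_iops(64, wdelay)        # 64 requests in flight, as in nbdcopy
    print(f"wdelay={wdelay * 1e6:.0f}us -> {iops:.0f} IOPS max, "
          f"~{iops * block / 1e6:.0f} MB/s upper bound")
```

So a 1ms wdelay bounds the benchmark well below memory-to-memory speeds, while simulating NVMe-class latency does indeed need microsecond granularity in the filter.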


Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
