[Libguestfs] Some questions about nbdkit vs qemu performance affecting virt-v2v

Richard W.M. Jones rjones at redhat.com
Tue Jul 27 11:16:59 UTC 2021


Hi Eric, a couple of questions below about nbdkit performance.

Modular virt-v2v will use disk pipelines everywhere.  The input
pipeline looks something like this:

  socket <- cow filter <- cache filter <-   nbdkit
                                           curl|vddk
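
For the curl case that pipeline corresponds to an nbdkit command line
along these lines (the URL here is only an example):

$ nbdkit --filter=cow --filter=cache curl url=https://example.com/disk.img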

We found there's a notable slowdown in at least one case: when the
source plugin is very slow (e.g. the curl plugin fetching from a slow,
remote website, or VDDK in general), everything else runs very slowly
too.

I made a simple test case to demonstrate this:

$ virt-builder fedora-33
$ time ./nbdkit --filter=cache --filter=delay file /var/tmp/fedora-33.img \
     delay-read=500ms \
     --run 'virt-inspector --format=raw -a "$uri" -vx'

This uses a local file with the delay filter layered on top, injecting
half-second delays into every read.  It "feels" a lot like the slow case we
were observing.  Virt-v2v also does inspection as a first step when
converting an image, so using virt-inspector is somewhat realistic.

Unfortunately this runs far too slowly for me to wait for it to finish
- at least 30 mins, and probably a lot longer.  That compares with
only 7 seconds if you remove the delay filter.
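
For reference, the undelayed baseline is just the same command with
the delay filter removed (and without the -vx debug flags):

$ time ./nbdkit --filter=cache file /var/tmp/fedora-33.img \
     --run 'virt-inspector --format=raw -a "$uri"'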

Reducing the delay to 50ms at least lets it finish in a reasonable time:

$ time ./nbdkit --filter=cache --filter=delay file /var/tmp/fedora-33.img \
     delay-read=50ms \
     --run 'virt-inspector --format=raw -a "$uri"'

real    5m16.298s
user    0m0.509s
sys     0m2.894s

In the above scenario the cache filter is not actually doing anything
(since virt-inspector does not write).  Adding cache-on-read=true lets
us cache the reads, avoiding going through the "slow" plugin in many
cases, and the result is a lot better:

$ time ./nbdkit --filter=cache --filter=delay file /var/tmp/fedora-33.img \
     delay-read=50ms cache-on-read=true \
     --run 'virt-inspector --format=raw -a "$uri"'

real    0m27.731s
user    0m0.304s
sys     0m1.771s

However this is still slower than the old method, which used qcow2 +
qemu's copy-on-read.  It's harder to demonstrate this, but I modified
virt-inspector to use the copy-on-read setting (which it doesn't
normally do).  On top of nbdkit with a 50ms delay and no other
filters:

qemu + copy-on-read backed by nbdkit delay-read=50ms file:
real    0m23.251s

So 23s is the time to beat.  (I believe that with longer delays, the
gap between qemu and nbdkit increases in favour of qemu.)
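
For reference, the "copy-on-read setting" is libguestfs's copyonread
flag on the drive.  Roughly the same thing can be tried by hand with
guestfish instead of patching virt-inspector; this is only an untested
sketch and the add-drive line in particular may need adjusting:

$ time ./nbdkit --filter=delay file /var/tmp/fedora-33.img delay-read=50ms \
     --run 'guestfish \
              add-drive "" format:raw protocol:nbd \
                        server:"unix:$unixsocket" \
                        readonly:true copyonread:true : \
              run : \
              inspect-os'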

Q1: What other ideas could we explore to improve performance?

- - -

In real scenarios we'll actually want to combine cow + cache, where
cow is caching writes, and cache is caching reads.

  socket <- cow filter <- cache filter   <-  nbdkit
                       cache-on-read=true   curl|vddk

The cow filter is necessary to prevent changes being written back to
the pristine source image.

This is actually surprisingly efficient, making no noticeable
difference in this test:

time ./nbdkit --filter=cow --filter=cache --filter=delay \
     file /var/tmp/fedora-33.img \
     delay-read=50ms cache-on-read=true \
     --run 'virt-inspector --format=raw -a "$uri"' 

real	0m27.193s
user	0m0.283s
sys	0m1.776s

Q2: Should we consider adding a "cow-on-read" flag to the cow filter
(thus removing the need to use the cache filter at all)?
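
For illustration, with such a flag the cow + cache combination above
would collapse to a single filter - something like this (hypothetical,
since the flag doesn't exist yet):

$ time ./nbdkit --filter=cow --filter=delay file /var/tmp/fedora-33.img \
     delay-read=50ms cow-on-read=true \
     --run 'virt-inspector --format=raw -a "$uri"'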


Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/



