[Libguestfs] nbdkit, VDDK, extents, readahead, etc

Richard W.M. Jones rjones at redhat.com
Mon Apr 15 14:49:05 UTC 2019


On Mon, Apr 15, 2019 at 04:30:47PM +0200, Martin Kletzander wrote:
> On Fri, Apr 12, 2019 at 04:30:02PM +0100, Richard W.M. Jones wrote:
> > +-- nbdkit monitoring process
> >   |
> >   +-- first child = nbdkit
> >   |
> >   +-- second child = ‘--run’ command
> >
> >so when the second child exits, the monitoring process (which is doing
> >nothing except waiting for the second child to exit) can kill nbdkit.
> >
>
> Oh, I thought the "monitoring process" would just be a signal
> handler.  If the monitoring process is just checking those two
> underlying ones, how come the PID changes for the APIs?  Is the Init
> called before the first child forks off?

Right.  For convenience the configuration steps (ie. .config and
.config_complete in [1]) are done before we fork, either to act as a
server or to run the --run command, and the VDDK plugin does its
initialization in .config_complete, which is the only sensible place
to do it.

While this is specific to the --run option, I assume the same thing
would also happen if nbdkit forks into the background to become a
server.  But if you run nbdkit without --run and with --foreground
then it stays in the foreground and the hang doesn't occur.

[1] https://github.com/libguestfs/nbdkit/blob/master/docs/nbdkit-plugin.pod
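
For what it's worth the workaround looks something like this (only a
sketch: the VDDK parameters are the same placeholders as in the
command quoted below, and the socket path is arbitrary):

  LD_LIBRARY_PATH=vmware-vix-disklib-distrib/lib64 \
  ./nbdkit -f -r -U /tmp/nbd.sock \
           vddk file="[datastore1] Fedora 28/Fedora 28.vmdk" \
           libdir=vmware-vix-disklib-distrib \
           server=vmware user=root password=+/tmp/passwd \
           thumbprint=xyz \
           vm=moref=3

  # then in another shell, once nbdkit is listening:
  qemu-img convert -p nbd:unix:/tmp/nbd.sock /var/tmp/out

With -f (--foreground) there is no monitoring process and no --run
child at all: the process which did the VDDK initialization is the
same one which serves the NBD requests.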

> >If VDDK cannot handle this situation (and I'm just guessing that this
> >is the bug) then VDDK has a bug.
> >
> 
> Sure, but having a workaround could be nice, if it's not too much work.

Patches welcome, but I suspect there's not a lot we can do in nbdkit.

> >>>(3) Using nbdkit-noextents-filter and nbdkit-stats-filter we can
> >>>nicely measure the benefits of extents:
> >>>
> >>>With noextents (ie. force full copy):
> >>>
> >>> elapsed time: 323.815 s
> >>> read: 8194 ops, 17179869696 bytes, 4.24437e+08 bits/s
> >>>
> >>>Without noextents (ie. rely on qemu-img skipping sparse bits):
> >>>
> >>> elapsed time: 237.41 s
> >>> read: 833 ops, 1734345216 bytes, 5.84423e+07 bits/s
> >>> extents: 70 ops, 135654246400 bytes, 4.57114e+09 bits/s
> >>>
> >>>Note if you deduct 120 seconds (see point (1) above) from these times
> >>>then it goes from 203s -> 117s, about a 40% saving.  We can likely do
> >>>better by having > 32 bit requests and qemu not using
> >>>NBD_CMD_FLAG_REQ_ONE.
> >>>
> >>How did you run qemu-img?
> >
> >The full command was:
> >
> >LD_LIBRARY_PATH=vmware-vix-disklib-distrib/lib64 \
> >./nbdkit -r -U - vddk file="[datastore1] Fedora 28/Fedora 28.vmdk" \
> >                     libdir=vmware-vix-disklib-distrib \
> >                     server=vmware user=root password=+/tmp/passwd \
> >                     thumbprint=xyz \
> >                     vm=moref=3 \
> >                     --filter=stats statsfile=/dev/stderr \
> >                     --run '
> >       unset LD_LIBRARY_PATH
> >       /home/rjones/d/qemu/qemu-img convert -p $nbd /var/tmp/out
> >   '
> >
> >(with extra filters added to the command line as appropriate for each
> >test).
> >
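(To be clear, the two runs above differed only in the filter stack,
something like this; a sketch, with the rest of the command line
unchanged:

    with noextents:     --filter=noextents --filter=stats statsfile=/dev/stderr
    without noextents:  --filter=stats statsfile=/dev/stderr
)
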
> >>I think on slow CPU and fast disk this might be even bigger
> >>difference if qemu-img can write whatever it gets and not searching
> >>for zeros.
> >
> >This is RHEL 8 so /var/tmp is XFS.  The hardware is relatively new and
> >the disk is an SSD.
> >
>
> Why I'm asking is because what you are measuring above still
> includes QEMU looking for zero blocks in the data.  I haven't found
> a way to make qemu write the sparse data it reads without explicitly
> sparsifying even more by checking for zeros and not creating a fully
> allocated image.

While qemu-img is still trying to detect zeroes, it won't find many
because the image is thin provisioned.  However, I take your point
that when copying a snapshot using the "single link" flag you don't
want qemu-img to do this, because it may then omit parts of the
snapshot that happen to be zero.  It would still be good to see the
output of ‘qemu-img map --output=json’ to check whether qemu is
really sparsifying the zeroes or is actually writing them out as
allocated (non-hole) zero blocks, which is IMO correct behaviour and
shouldn't cause any problems.
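
Something like this, using the /var/tmp/out destination from the
command above (the JSON field names are standard qemu-img map output,
not anything specific to this setup):

  qemu-img map --output=json /var/tmp/out

Roughly: extents reported with "data": false and "zero": true are
holes (the zeroes were sparsified), whereas "data": true together
with "zero": true means the zeroes were actually written to allocated
blocks.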

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
