[Libguestfs] nbdkit, VDDK, extents, readahead, etc

Martin Kletzander mkletzan at redhat.com
Tue Apr 16 14:26:50 UTC 2019


On Mon, Apr 15, 2019 at 03:49:05PM +0100, Richard W.M. Jones wrote:
>On Mon, Apr 15, 2019 at 04:30:47PM +0200, Martin Kletzander wrote:
>> On Fri, Apr 12, 2019 at 04:30:02PM +0100, Richard W.M. Jones wrote:
>> > +-- nbdkit monitoring process
>> >   |
>> >   +-- first child = nbdkit
>> >   |
>> >   +-- second child = ‘--run’ command
>> >
>> >so when the second child exits, the monitoring process (which is doing
>> >nothing except waiting for the second child to exit) can kill nbdkit.
>> >
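
Just to make sure I read that right, the structure is roughly equivalent to
this shell sketch (the commands are only placeholders, not what nbdkit
actually execs):

    nbdkit --foreground ... &       # first child: the NBD server
    server=$!
    sh -c "$run_command" &          # second child: the '--run' command
    runner=$!
    wait "$runner"                  # the monitor waits only for '--run'...
    kill "$server"                  # ...and then kills nbdkit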
>>
>> Oh, I thought the "monitoring process" would just be a signal
>> handler.  If the monitoring process is just checking those two
>> underlying ones, how come the PID changes for the APIs?  Is the Init
>> called before the first child forks off?
>
>Right, for convenience reasons the configuration steps (ie. .config,
>.config_complete in [1]) are done before we fork either to act as a
>server or to run commands, and the VDDK plugin does the initialization
>in .config_complete which is the only sensible place to do it.
>
>While this is specific to using the --run option, it would also, I assume,
>happen if nbdkit forks into the background to become a server.
>But if you run nbdkit without --run and with --foreground then it
>remains in the foreground and the hang doesn't occur.
>

Yes.  Also, the delay I noticed was amplified by qemu-img's use of req_one.
Since I am testing this on a 100G file, and each extents request appears to be
capped at roughly 2G, there are 50 requests for extents just to check the
allocation of the image and then another 50 requests when actually "copying
the data".  I changed the script to use --exit-with-parent and it still takes
a significant amount of time, although it's roughly 2 minutes faster ;)
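
Concretely, the changed script does roughly this (the socket path and the
plugin parameters here are illustrative, not the exact ones I use):

    ./nbdkit --exit-with-parent -r -U /tmp/vddk.sock vddk ... &
    while [ ! -S /tmp/vddk.sock ]; do sleep 0.1; done   # wait for the socket
    qemu-img convert -p 'nbd+unix:///?socket=/tmp/vddk.sock' /var/tmp/out
    # nbdkit exits on its own once this script (its parent) goes away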

>[1] https://github.com/libguestfs/nbdkit/blob/master/docs/nbdkit-plugin.pod
>
>> >If VDDK cannot handle this situation (and I'm just guessing that this
>> >is the bug) then VDDK has a bug.
>> >
>>
>> Sure, but having a workaround could be nice, if it's not too much work.
>
>Patches welcome, but I suspect there's not a lot we can do in nbdkit.
>
>> >>>(3) Using nbdkit-noextents-filter and nbdkit-stats-filter we can
>> >>>nicely measure the benefits of extents:
>> >>>
>> >>>With noextents (ie. force full copy):
>> >>>
>> >>> elapsed time: 323.815 s
>> >>> read: 8194 ops, 17179869696 bytes, 4.24437e+08 bits/s
>> >>>
>> >>>Without noextents (ie. rely on qemu-img skipping sparse bits):
>> >>>
>> >>> elapsed time: 237.41 s
>> >>> read: 833 ops, 1734345216 bytes, 5.84423e+07 bits/s
>> >>> extents: 70 ops, 135654246400 bytes, 4.57114e+09 bits/s
>> >>>
>> >>>Note if you deduct 120 seconds (see point (1) above) from these times
>> >>>then it goes from 203s -> 117s, about a 40% saving.  We can likely do
>> >>>better by having > 32 bit requests and qemu not using
>> >>>NBD_CMD_FLAG_REQ_ONE.
>> >>>
>> >>How did you run qemu-img?
>> >
>> >The full command was:
>> >
>> >LD_LIBRARY_PATH=vmware-vix-disklib-distrib/lib64 \
>> >./nbdkit -r -U - vddk file="[datastore1] Fedora 28/Fedora 28.vmdk" \
>> >                     libdir=vmware-vix-disklib-distrib \
>> >                     server=vmware user=root password=+/tmp/passwd \
>> >                     thumbprint=xyz \
>> >                     vm=moref=3 \
>> >                     --filter=stats statsfile=/dev/stderr \
>> >                     --run '
>> >       unset LD_LIBRARY_PATH
>> >       /home/rjones/d/qemu/qemu-img convert -p $nbd /var/tmp/out
>> >   '
>> >
>> >(with extra filters added to the command line as appropriate for each
>> >test).
>> >
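
For the noextents case I assume that means just stacking the extra filter
next to stats, something like this (rest of the parameters as in your command
above; the filter order is shown only as an illustration):

    ./nbdkit -r -U - vddk ... \
             --filter=noextents \
             --filter=stats statsfile=/dev/stderr \
             --run '...'
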
>> >>I think on slow CPU and fast disk this might be even bigger
>> >>difference if qemu-img can write whatever it gets and not searching
>> >>for zeros.
>> >
>> >This is RHEL 8 so /var/tmp is XFS.  The hardware is relatively new and
>> >the disk is an SSD.
>> >
>>
>> The reason I'm asking is that what you are measuring above still
>> includes QEMU looking for zero blocks in the data.  I haven't found
>> a way to make qemu write out the sparse data just as it reads it,
>> without it explicitly sparsifying even more by checking for zeros
>> rather than creating a fully allocated image.
>
>While qemu-img is still trying to detect zeroes, it won't find too
>many because the image is thin provisioned.  However I take your point
>that when copying a snapshot using the "single link" flag you don't
>want qemu-img to do this because that means it may omit parts of the
>snapshot that happen to be zero.  It would still be good to see the
>output of ‘qemu-img map --output=json’ to see if qemu is really
>sparsifying the zeroes or is actually writing them as zero non-holes
>(which is IMO correct behaviour and shouldn't cause any problem).
>

I *thought* it was not writing them as zero data, nor punching holes.  I
tried with both raw and qcow2 images (with the options -n -W -C and various
combinations of them).  Then I realized that the single-link patch is
incomplete, so it reads more zeroes than it actually should.  That means it
might just work, but I need to finish the patch and test it out.  And each
test takes an infuriating amount of time.  Not that it takes *so* long, but
waiting just to see that it failed is a bad enough experience on its own.
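
For the record, once the patch is finished the check will be roughly what you
suggest above, i.e. something like (file name illustrative):

    qemu-img map --output=json /var/tmp/out

and then seeing whether the zero ranges come back as holes ("data": false) or
as allocated zeroes ("zero": true, "data": true).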

>Rich.
>
>-- 
>Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
>Read my programming and virtualization blog: http://rwmj.wordpress.com
>virt-builder quickly builds VMs from scratch
>http://libguestfs.org/virt-builder.1.html