[Libguestfs] nbdkit, VDDK, extents, readahead, etc

Martin Kletzander mkletzan at redhat.com
Fri Apr 12 13:52:58 UTC 2019


On Thu, Apr 11, 2019 at 06:55:55PM +0100, Richard W.M. Jones wrote:
>As I've spent really too long today investigating this, I want to
>document this in a public email, even though there's nothing really
>that interesting here.  One thing you find from searching Google for
>VDDK 6.7 / VixDiskLib_QueryAllocatedBlocks issues is that we must be
>one of the very few users out there.  And the other thing is that it's
>quite broken.
>
>All testing was done using two baremetal servers connected back to
>back through a gigabit ethernet switch.  I used upstream qemu and
>nbdkit from git as of today.  I used a single test Fedora guest with a
>16G thin provisioned disk with about 1.6G allocated.
>
>Observations:
>
>(1) VDDK hangs for a really long time when using the nbdkit --run
>    option.
>
>It specifically hangs for exactly 120 seconds doing:
>
>  nbdkit: debug: VixDiskLib: Resolve host.
>
>This seems to be a bug in VDDK, possibly connected with the fact that
>we fork after initializing VDDK but before doing the
>VixDiskLib_ConnectEx.  I suspect it's something to do with the PID
>changing.
>
>It would be fair to deduct 2 minutes from all timings below.
>
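For context, the --run pattern being discussed looks something like this (a sketch only: the libdir, server, credentials, thumbprint and moref below are placeholders, not the actual test setup):

```shell
# nbdkit loads and initializes VDDK first, then forks to serve clients
# and run the captured command, so VixDiskLib_ConnectEx executes under a
# different PID -- which is where the 120 s "Resolve host" hang appears.
nbdkit -r vddk \
    libdir=/usr/lib64/vmware-vix-disklib \
    server=esxi.example.com user=root password=+/tmp/passwd \
    thumbprint='xx:xx:...' vm=moref=2 \
    file='[datastore1] Fedora 28/Fedora 28.vmdk' \
    --run 'qemu-img info $nbd'
```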

Does the PID change because you exec from the parent (where the init is
done) while all the other calls are made in the child?  Is that so that
nbdkit stays part of the process that someone spawned?  I'm asking just to
find out whether something can be done about it.

>(2) VDDK cannot use VixDiskLib_QueryAllocatedBlocks if the disk is
>opened for writes.  It fails with this uninformative error:
>
>  nbdkit: vddk[1]: error: [NFC ERROR] NfcFssrvrProcessErrorMsg: received NFC error 13 from server: NfcFssrvrOpen: Failed to open '[datastore1] Fedora 28/Fedora 28.vmdk'
>  nbdkit: vddk[1]: error: [NFC ERROR] NfcFssrvrClientOpen: received unexpected message 4 from server
>  nbdkit: vddk[1]: debug: VixDiskLib: Detected DiskLib error 290 (NBD_ERR_GENERIC).
>  nbdkit: vddk[1]: debug: VixDiskLib: VixDiskLibQueryBlockList: Fail to start query process. Error 1 (Unknown error) (DiskLib error 290: NBD_ERR_GENERIC) at 543.
>  nbdkit: vddk[1]: debug: can_extents: VixDiskLib_QueryAllocatedBlocks test failed, extents support will be disabled: original error: Unknown error
>
>The last debug statement is from nbdkit itself indicating that because
>VixDiskLib_QueryAllocatedBlocks didn't work, extents support is
>disabled.
>
>To work around this you can use nbdkit --readonly.  However I don't
>understand why that would be necessary, except perhaps it's just an
>undocumented limitation of VDDK.  For all the cases _we_ care about
>we're using --readonly, so that's lucky.
>

It might have been a safety measure against multiple concurrent accesses, or
something similar.  Or a symptom of "we'll implement that later".

>(3) Using nbdkit-noextents-filter and nbdkit-stats-filter we can
>nicely measure the benefits of extents:
>
>With noextents (ie. force full copy):
>
>  elapsed time: 323.815 s
>  read: 8194 ops, 17179869696 bytes, 4.24437e+08 bits/s
>
>Without noextents (ie. rely on qemu-img skipping sparse bits):
>
>  elapsed time: 237.41 s
>  read: 833 ops, 1734345216 bytes, 5.84423e+07 bits/s
>  extents: 70 ops, 135654246400 bytes, 4.57114e+09 bits/s
>
>Note if you deduct 120 seconds (see point (1) above) from these times
>then it goes from 203s -> 117s, about a 40% saving.  We can likely do
>better by having > 32 bit requests and qemu not using
>NBD_CMD_FLAG_REQ_ONE.
>
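The exact measurement commands aren't shown; they were presumably something along these lines (a sketch: the VDDK connection parameters are elided and the output path is hypothetical):

```shell
# With extents: the stats filter reports the per-request-type counts
# quoted above.  (server/user/password/thumbprint/vm options omitted.)
nbdkit -r --filter=stats vddk statsfile=/dev/stderr \
    file='[datastore1] Fedora 28/Fedora 28.vmdk' \
    --run 'qemu-img convert -p $nbd -O raw /var/tmp/fedora28.img'

# Full-copy baseline: the noextents filter masks extents support, so
# qemu-img must read every byte.
nbdkit -r --filter=stats --filter=noextents vddk statsfile=/dev/stderr \
    file='[datastore1] Fedora 28/Fedora 28.vmdk' \
    --run 'qemu-img convert -p $nbd -O raw /var/tmp/fedora28.img'
```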

How did you run qemu-img?  I think on a slow CPU with a fast disk the
difference might be even bigger, if qemu-img can write whatever it gets
rather than searching for zeroes.
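Incidentally, the arithmetic behind the "about a 40% saving" figure quoted above, with the fixed 120-second hang deducted from both runs:

```shell
awk 'BEGIN {
    full   = 323.815 - 120   # full copy (noextents), minus the hang
    sparse = 237.41  - 120   # extents-assisted copy, minus the hang
    printf "full=%ds sparse=%ds saving=%d%%\n",
           int(full), int(sparse), int(100 * (full - sparse) / full)
}'
# prints: full=203s sparse=117s saving=42%
```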

>(4) We can also add nbdkit-readahead-filter in both cases to see if
>that helps or not:
>
>With noextents and readahead:
>
>  elapsed time: 325.358 s
>  read: 265 ops, 17179869184 bytes, 4.22423e+08 bits/s
>
>As expected the readahead filter greatly reduces the number of iops.
>But in this back-to-back configuration VDDK requests are relatively
>cheap so no time is saved.
>
>Without noextents, with readahead:
>
>  elapsed time: 252.608 s
>  read: 96 ops, 1927282688 bytes, 6.10363e+07 bits/s
>  extents: 70 ops, 135654246400 bytes, 4.29612e+09 bits/s
>
>Readahead is detrimental in this case, as expected because this filter
>works best when reads are purely sequential, and if not it will tend
>to prefetch extra data.  Notice that the number of bytes read is
>larger here than in the earlier test.
>
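The readahead runs just layer one more filter onto the same command line (again a sketch with the VDDK connection parameters elided):

```shell
# readahead sits in front of stats, so stats counts the merged,
# prefetched reads that actually reach VDDK.
nbdkit -r --filter=readahead --filter=stats vddk statsfile=/dev/stderr \
    file='[datastore1] Fedora 28/Fedora 28.vmdk' \
    --run 'qemu-img convert -p $nbd -O raw /var/tmp/fedora28.img'
```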

Really good write-up, thanks for sharing.

>Rich.
>
>-- 
>Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
>Read my programming and virtualization blog: http://rwmj.wordpress.com
>virt-df lists disk usage of guests without needing to install any
>software inside the virtual machine.  Supports Linux and Windows.
>http://people.redhat.com/~rjones/virt-df/