[Libguestfs] More parallelism in VDDK driver

Eric Blake eblake at redhat.com
Wed Aug 5 14:00:11 UTC 2020


On 8/5/20 7:47 AM, Richard W.M. Jones wrote:
> 
> Here are some results anyway.  The command I'm using is:
> 
> $ ./nbdkit -r -U - vddk \
>      libdir=/path/to/vmware-vix-disklib-distrib \
>      user=root password='***' \
>      server='***' thumbprint=aa:bb:cc:... \
>      vm=moref=3 \
>      file='[datastore1] Fedora 28/Fedora 28.vmdk' \
>      --run 'time /var/tmp/threaded-reads $unixsocket'
> 
> Source for threaded-reads is attached.
> 

> 
> Tests (1) and (2) are about the same within noise.
> 
> Test (3) is making 8 times as many requests as test (1), so I think
> it's fair to compare against 8 x the time taken by test (1) (ie. the
> time it would have taken to make 80,000 requests):
> 
>    Test (1) * 8 = 11m28
>    Test (3)     =  7m11
> 
> So if we had a client which could actually use multi-conn then this
> would be a reasonable win.  It seems like there's still a lot of
> locking going on somewhere, perhaps inside VDDK or in the server.
> It's certainly nowhere near a linear speedup.

If I'm reading 
https://code.vmware.com/docs/11750/virtual-disk-development-kit-programming-guide/GUID-6BE903E8-DC70-46D9-98E4-E34A2002C2AD.html 
correctly, we cannot issue two parallel requests on a single VDDK 
handle, but we CAN have two VDDK handles open, with requests in flight 
on both handles at the same time.  That's what SERIALIZE_REQUESTS buys 
us: a client that opens multiple connections (taking advantage of 
multi-conn) ends up with two NBD handles and therefore two VDDK 
handles, and reads spread across those two handles can proceed in 
parallel.
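
To make that concrete, here is roughly what the existing one handle 
per connection mapping looks like.  This is only a sketch, assuming 
simplified names (connection, filename) and omitting the real plugin's 
error reporting:

#include <stdlib.h>
#include <stdint.h>

#include "vixDiskLib.h"

struct handle {
  VixDiskLibHandle vddk;    /* exactly one VDDK handle per NBD connection */
};

/* Assumed to exist elsewhere in the plugin. */
extern VixDiskLibConnection connection;
extern const char *filename;

/* nbdkit .open: called once per NBD connection.  A multi-conn client
 * that opens two NBD connections therefore ends up with two independent
 * VDDK handles, and requests on the two handles can run in parallel,
 * even though each handle only ever sees one request at a time. */
static void *
vddk_open (int readonly)
{
  struct handle *h = malloc (sizeof *h);
  uint32_t flags = readonly ? VIXDISKLIB_FLAG_OPEN_READ_ONLY : 0;
  VixError err;

  if (h == NULL)
    return NULL;
  err = VixDiskLib_Open (connection, filename, flags, &h->vddk);
  if (err != VIX_OK) {
    free (h);
    return NULL;
  }
  return h;
}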

But I also don't see anything that prevents a single NBD connection 
from opening multiple VDDK handles under the hood, or even from having 
all of those handles opened through a single coordinating helper 
thread.  That is, if nbdkit provided a way for a plugin to learn the 
maximum number of threads that will be used in parallel, then vddk's 
.after_fork could spawn a dedicated thread for running VDDK handle 
open/close requests (protected by a mutex); the .open callback would 
take the mutex and ask the helper thread to open N handles; the thread 
model would be advertised as PARALLEL; and all other calls (.pread, 
.pwrite, ...) would pick whichever of that NBD connection's N handles 
is not currently in use.  The client application would not even have 
to take advantage of multi-conn, but would still get the full benefit 
of out-of-order request handling and a parallel speedup.
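
Something like the following is what I have in mind for that handle 
pool; it is only a sketch, not a patch.  N_HANDLES, pool_get and 
pool_put are invented names, N_HANDLES would ideally come from the 
maximum thread count reported by nbdkit, and alignment checks and 
error reporting are omitted:

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#include "vixDiskLib.h"

#define N_HANDLES 8     /* placeholder; ideally nbdkit's max thread count */

/* The per-connection handle returned by .open. */
struct pool {
  pthread_mutex_t lock;
  pthread_cond_t avail;
  VixDiskLibHandle handles[N_HANDLES];   /* all opened during .open */
  bool busy[N_HANDLES];
};

/* Borrow any handle that is not currently in use, waiting if all are busy. */
static VixDiskLibHandle
pool_get (struct pool *p, size_t *slot)
{
  pthread_mutex_lock (&p->lock);
  for (;;) {
    for (size_t i = 0; i < N_HANDLES; i++) {
      if (!p->busy[i]) {
        p->busy[i] = true;
        *slot = i;
        pthread_mutex_unlock (&p->lock);
        return p->handles[i];
      }
    }
    pthread_cond_wait (&p->avail, &p->lock);
  }
}

static void
pool_put (struct pool *p, size_t slot)
{
  pthread_mutex_lock (&p->lock);
  p->busy[slot] = false;
  pthread_cond_signal (&p->avail);
  pthread_mutex_unlock (&p->lock);
}

/* .pread with NBDKIT_THREAD_MODEL_PARALLEL: two requests arriving on the
 * same NBD connection simply land on two different VDDK handles.
 * (Assumes count and offset are already sector-aligned.) */
static int
vddk_pread (void *handle, void *buf, uint32_t count, uint64_t offset)
{
  struct pool *p = handle;
  size_t slot;
  VixDiskLibHandle vddk = pool_get (p, &slot);
  VixError err = VixDiskLib_Read (vddk,
                                  offset / VIXDISKLIB_SECTOR_SIZE,
                                  count / VIXDISKLIB_SECTOR_SIZE,
                                  buf);
  pool_put (p, slot);
  return err == VIX_OK ? 0 : -1;
}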

> 
> The patch does at least seem stable.  I'll post it in a minute.

Whether we run all VixDiskLib_Open calls on a single dedicated helper 
thread created during .after_fork, or instead rely on a pthread mutex 
so that at most one .open is calling Open or Close at a time, is a 
separate question from whether we open multiple VDDK handles per NBD 
connection in PARALLEL mode, vs. one VDDK handle per NBD connection in 
SERIALIZE_REQUESTS (offloading the parallelism to a multi-conn client). 
We could probably do both.  Opening multiple VDDK handles per NBD 
connection appears to be fully compliant with the VDDK docs (patch not 
written yet), whereas the mutex approach (serializing open calls, but 
not guaranteeing they all happen on the same thread) is indeed risky. 
And having the speedup available to all clients, not just multi-conn 
aware ones, would be nice to have, even if the speedup is not linear.
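
For completeness, the dedicated open/close thread would look roughly 
like this.  Again only a sketch with invented names (open_request, 
submit_and_wait); a real version would need shutdown handling and 
support for more than one pending request:

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#include "vixDiskLib.h"

struct open_request {
  const char *path;          /* in (open only) */
  uint32_t flags;            /* in (open only) */
  bool close_op;             /* true: Close instead of Open */
  VixDiskLibHandle handle;   /* in for close, out for open */
  VixError err;              /* out */
  bool done;
  pthread_mutex_t lock;      /* initialized by the caller */
  pthread_cond_t cond;
};

extern VixDiskLibConnection connection;   /* assumed global */

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_cond = PTHREAD_COND_INITIALIZER;
static struct open_request *pending;      /* at most one request queued */

/* Thread body, started once from .after_fork.  Every VixDiskLib_Open and
 * VixDiskLib_Close in the process happens on this one thread. */
static void *
open_close_worker (void *unused)
{
  for (;;) {
    pthread_mutex_lock (&queue_lock);
    while (pending == NULL)
      pthread_cond_wait (&queue_cond, &queue_lock);
    struct open_request *req = pending;
    pending = NULL;
    pthread_cond_broadcast (&queue_cond);  /* the slot is free again */
    pthread_mutex_unlock (&queue_lock);

    if (req->close_op)
      req->err = VixDiskLib_Close (req->handle);
    else
      req->err = VixDiskLib_Open (connection, req->path, req->flags,
                                  &req->handle);

    pthread_mutex_lock (&req->lock);
    req->done = true;
    pthread_cond_signal (&req->cond);
    pthread_mutex_unlock (&req->lock);
  }
  return NULL;
}

/* Called from .open/.close: hand the request to the worker and wait. */
static VixError
submit_and_wait (struct open_request *req)
{
  pthread_mutex_lock (&queue_lock);
  while (pending != NULL)
    pthread_cond_wait (&queue_cond, &queue_lock);
  pending = req;
  pthread_cond_broadcast (&queue_cond);
  pthread_mutex_unlock (&queue_lock);

  pthread_mutex_lock (&req->lock);
  while (!req->done)
    pthread_cond_wait (&req->cond, &req->lock);
  pthread_mutex_unlock (&req->lock);
  return req->err;
}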

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



