[Libguestfs] [PATCH] v2v/v2v.ml: Use larger request size for -o rhv-upload

Richard W.M. Jones rjones at redhat.com
Sun Feb 13 17:44:22 UTC 2022


On Sun, Feb 13, 2022 at 05:13:37PM +0200, Nir Soffer wrote:
> On Sun, Feb 13, 2022 at 11:41 AM Richard W.M. Jones <rjones at redhat.com> wrote:
> >
> > On Sat, Feb 12, 2022 at 10:49:42PM +0200, Nir Soffer wrote:
> > > The rhv-upload plugin translates every NBD command to an HTTP
> > > request, which is translated back to an NBD command on the imageio
> > > server. The HTTP client and server, and the NBD client on the
> > > imageio server side, are synchronous and implemented in Python, so
> > > they have high overhead per request. To get good performance we
> > > need to use a larger request size.
> > >
> > > Testing shows that a request size of 8MiB is best, speeding up the
> > > copy disk phase from 14.7 seconds to 7.7 seconds (1.9x faster).
> >
> > Unfortunately this will break VDDK since it cannot handle very large
> > requests (I think 4M is about the max without reconfiguring the
> > server).
> 
> Are you sure it will break VDDK?
> 
> The request size limit is in the VDDK API, not in the nbdkit plugin.
> When you issue an 8M request, the VDDK plugin should allocate an 8M
> buffer and issue multiple calls to the VDDK APIs, using VDDK's
> maximum request size, to fill the buffer.
> 
> If the VDDK plugin does not do this, it is a bug in the plugin, since
> the plugin must respect the limits of the underlying API.

nbdkit-vddk-plugin doesn't do this at the moment.

With VDDK it's kind of complicated because it's actually a maximum
memory setting for a particular daemon on the server side (hostd), so
the limit is roughly

  number of parallel requests x size of each request <= memory limit - overhead

Also this can be adjusted on the server side by the administrator (and
we recommend virt-v2v users do this) but there's no way to query the
setting from the client.  More here:

https://vdc-download.vmware.com/vmwb-repository/dcr-public/48f32c72-7f7f-49da-9887-63b7ae1fd6f0/b820bcb4-f4bd-4141-bb9e-68ca9aded55d/nbd-perf.html
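
If we did teach the plugin to split oversized requests, the loop is
conceptually simple.  Here's an untested sketch in Python for clarity
(the real plugin is C calling VixDiskLib_Read; MAX_VDDK_REQUEST and
vddk_read() are hypothetical stand-ins for the configured server
limit and the real read call):

    # Untested sketch: split one large NBD pread into several
    # VDDK-sized reads.  MAX_VDDK_REQUEST and vddk_read() are
    # hypothetical stand-ins, not the real plugin API.
    MAX_VDDK_REQUEST = 4 * 1024 * 1024   # assumed server limit

    def pread(buf, offset):
        done = 0
        while done < len(buf):
            n = min(len(buf) - done, MAX_VDDK_REQUEST)
            # Each underlying call stays within the server's limit.
            vddk_read(memoryview(buf)[done:done + n], offset + done)
            done += n

Note that splitting requests this way still does nothing about the
"number of parallel requests" term in the limit above.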

> If 8M does break the VDDK plugin, we can use 4M; it is only a little
> slower than 8M but still much faster than 256k.
> 
> > Also larger requests have adverse performance effects in
> > other configurations, although I understand this patch tries to
> > restrict the change to when the output mode is rhv-upload.
> 
> Yes, this affects only -o rhv-upload.
> 
> > We need to think of some other approach, but I'm not sure what it is.
> > I'd really like to be able to talk to imageio's NBD server directly!
> 
> We have an RFE to implement a local nbd socket, which should be easy,
> but the only attention we got so far was an attempt to close it ;-)
> 
> Even if we have a local-only nbd socket, let's say in oVirt 4.6, it
> will not help existing users.
> 
> > Other relevant commits:
> > https://github.com/libguestfs/virt-v2v/commit/7ebb2c8db9d4d297fbbef116a9828a9dde700de6
> > https://github.com/libguestfs/virt-v2v/commit/08e764959ec9dadd71a95d22d3d88d647a18d165
> 
> This looks like a nicer way to implement this change.
> 
> >
> > [...]
> > > This is an ugly hack; the preferred request size should be a function of
> > > the output module that only output_rhv_upload will override, but I don't
> > > know how to implement this with the current code.
> >
> > Just add a new value to output/output.ml{,i}.  There is no
> > superclassing (this is not OO) so you'll have to add the value to
> > every output module implementation, defaulting to None.
> 
> Sounds reasonable, we have only a few outputs.
> 
> > However I'd like to think of another approach first.
> >
> >  - Have nbdcopy split and combine requests so request size for input
> >    and output can be different?  Sounds complicated but might be
> >    necessary one day to support minimum block size.
> 
> I think this is not needed for the case of large request sizes,
> since on the nbd client/server side it is easy to handle large
> requests regardless of the underlying API limits.
> 
> >  - More efficient Python plugin that might combine requests?  Also
> >    complicated ...
> 
> This will be slower and complicated.
> 
> First we need to solve the issue of getting the right connection to
> handle the nbd command; currently every command is handled by a
> random connection from the pool.
> 
> Assuming we solved connection pooling, we can keep a buffer for the
> next request, and on every pwrite() fill this buffer. When the buffer
> is full, or when receiving a zero()/flush()/close() request, we need
> to send the buffer. This introduces an additional copy, which will
> slow down the transfer.
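> 
> Very roughly, the buffering would look like this (an illustrative
> sketch, assuming each request fits in the buffer; flush_buffer()
> stands in for the actual HTTP PUT of buf[:buf_len] at offset
> buf_start, which also resets buf_len to 0):
> 
>     BUF_SIZE = 8 * 1024 * 1024
>     buf = bytearray(BUF_SIZE)
>     buf_start = 0   # disk offset the buffered data belongs to
>     buf_len = 0     # number of bytes currently buffered
> 
>     def pwrite(data, offset):
>         global buf_start, buf_len
>         # Flush on a non-contiguous write or when data would not fit.
>         if buf_len and (offset != buf_start + buf_len
>                         or buf_len + len(data) > BUF_SIZE):
>             flush_buffer()
>         if buf_len == 0:
>             buf_start = offset
>         buf[buf_len:buf_len + len(data)] = data   # the extra copy
>         buf_len += len(data)
>         if buf_len == BUF_SIZE:
>             flush_buffer()
> 
>     def flush(flags):
>         flush_buffer()   # zero() and close() must do the same first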
> 
> Finally there is the issue of reporting errors - since we buffer nbd
> command data, we cannot report errors for every command; we need to
> report errors later, either in the middle of a command (when the
> buffer becomes full) or when the next command is received.
> 
> So this will be slower, very hard to implement, and hard to debug.
> 
> The way we solved this in the imageio client - copying data between
> nbd and http backends - is to implement the copy loop in the http
> client.
> 
> In imageio we have this (simplified) copy loop:
> 
>     for extent in extents:
>         if zero:
>             dst.zero(extent)
>         else:
>             copy(src, dst, extent)
> 
> copy() checks if the src or dst can do an efficient copy, and if not
> it falls back to a generic copy:
> 
>     if hasattr(dst, "read_from"):
>         dst.read_from(src, extent)
>     elif hasattr(src, "write_to"):
>         src.write_to(dst, extent)
>     else:
>         for chunk in extent:
>             src.read(chunk)
>             dst.write(chunk)
> 
> The http backend implements read_from like this:
> 
>     send put request headers
>     for chunk in extent:
>         src.read(chunk)
>         socket.write(chunk)
> 
> In this way we can use the most efficient request size for the nbd
> input (256k) and for the http socket (256k is good there too), while
> minimizing the number of http requests.
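> 
> In Python stdlib terms the idea looks roughly like this (an
> illustrative sketch, not the real imageio code; conn is an already
> connected http.client.HTTPConnection to the imageio server, and
> src.read() stands in for the 256k nbd pread):
> 
>     import http.client
> 
>     def read_from(conn, path, src, offset, length, chunk=256 * 1024):
>         # One HTTP PUT request carrying many small NBD reads.
>         conn.putrequest("PUT", path)
>         conn.putheader("Content-Length", str(length))
>         conn.endheaders()
>         for pos in range(offset, offset + length, chunk):
>             n = min(chunk, offset + length - pos)
>             conn.send(src.read(pos, n))   # 256k nbd read -> socket
>         resp = conn.getresponse()
>         resp.read()   # drain the response so conn can be reused
>         assert resp.status in (200, 204)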
> 
> Of course this will be harder with the async API, but it is possible.
> We can keep the pread requests in a FIFO queue, and send each
> chunk in the right order. The Go aio_copy example does this:
> https://gitlab.com/nbdkit/libnbd/-/blob/master/golang/examples/aio_copy/aio_copy.go#L106
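> 
> The same idea with the libnbd Python aio API would look something
> like this (a sketch; dst stands for the open HTTP socket from the
> read_from example above):
> 
>     from collections import deque
>     import nbd
> 
>     queue = deque()   # in-flight reads, oldest first
> 
>     def start_read(h, offset, length):
>         buf = nbd.Buffer(length)
>         cookie = h.aio_pread(buf, offset)
>         queue.append((cookie, buf))
> 
>     def finish_ready(h, dst):
>         # Retire only the head of the queue, so chunks are written
>         # to the HTTP socket in the order they were requested.
>         while queue and h.aio_command_completed(queue[0][0]):
>             cookie, buf = queue.popleft()
>             dst.write(buf.to_bytes())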
> 
> This can be done in nbdcopy by adding http-ops. The pipeline in the
> rhv-upload case would be:
> 
>     nbdkit (any input) -> nbdcopy (nbd-ops/http-ops) -> imageio server
> 
> Another way is to drop nbdcopy for rhv-upload - if we provide nbd
> input, we can use the imageio client to read from the nbd input and
> write to the imageio server. The pipeline will be:
> 
>     nbdkit (any input) -> imageio client -> imageio server
> 
> This means that the v2v output can optionally do the copy phase,
> if it knows how to read from the nbdkit input.
> 
> Yet another way is to rewrite imageio in Go, using the libnbd async
> API, which would eliminate most of the overhead of the http requests.
> I started this work on the imageio client side. You can check what we
> have so far here:
> https://github.com/oVirt/ovirt-imageio/tree/master/ovirt-img
> 
> For the long term, I think improving nbdcopy to support additional
> input/output backends is the right way and will make it more useful
> for other purposes.
> 
> For now I will post a better v2. It would be nice to have this in
> RHEL 9.0.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top



