[Libguestfs] Performance regression in -o rhv-upload questions

Richard W.M. Jones rjones at redhat.com
Tue Feb 8 14:48:55 UTC 2022


On Tue, Feb 08, 2022 at 04:44:33PM +0200, Nir Soffer wrote:
> On Tue, Feb 8, 2022 at 3:27 PM Richard W.M. Jones <rjones at redhat.com> wrote:
> > virt-v2v 1.44.2:
> > Complete log: http://oirase.annexia.org/tmp/virt-v2v-1.44.2-rhv-upload.log
> >
> >   [  63.1] Copying disk 1/1
> >   ...
> >   transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a is finalizing_success
> >   transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a is finished_success
> >   transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a finalized in 6.105 seconds
> >   [  89.2] Creating output metadata
> >   [  89.8] Finishing off
> >
> > virt-v2v 1.45.97:
> > Complete log: http://oirase.annexia.org/tmp/virt-v2v-1.45.97-rhv-upload.log
> >
> >   [  71.0] Copying disk 1/1
> >   [  82.7] Creating output metadata
> >   ...
> >   transfer 6ea3d724-16f9-4bda-a33e-69a783480abc is finalizing_success
> >   transfer 6ea3d724-16f9-4bda-a33e-69a783480abc is finished_success
> >   transfer 6ea3d724-16f9-4bda-a33e-69a783480abc finalized in 61.552 seconds
> >   [ 144.9] Finishing off
>
> This happens because virt-v2v try to finalize the transfer *before*
> closing the connections to imageio server.
>
> Current imageio release mark the a ticket as canceled, but will not
> remove it if the ticket has open connections from clients. If the
> clients are idle, the connection closes after 60 seconds, so when
> engine tries again to remove the ticket, the operation succeeds.

Ah ha!  I should be able to fix this easily enough ...  Let me try a
few things and then I'll come back to the rest of your message.

Rich.

> Upstream version improved this flow to wait only for ongoing requests. If there
> are no ongoing requests the ticket is removed immediately, ignoring the open
> connections. The connections will be closed when the client closes the
> connection
> on when reading from the socket times out.
> 
> You can test upstream version from here:
> https://github.com/oVirt/ovirt-imageio/actions/runs/1811058094
> 
> But if you want to be compatible with released imageio version, you
> must close the connections to imageio *before* finalizing the transfers.
> 
> We already discussed this issue last year, when we handled the bug
> when vddk blocks for many minutes during block status and imageio
> connection is droped, and you opened a bug for this.
> 
> > It's not a completely like-for-like comparison because the rhv-upload
> > backend changed a lot between these versions.  In particular if I was
> > going to pick some suspicious change it would be my refactoring here
> > which was supposed to be neutral but maybe isn't:
> >
> >   https://github.com/libguestfs/virt-v2v/commit/143a22860216b94d3a81706193088d50c03fc35c
> >
> > Unfortunately this commit is almost impossible to revert because of
> > deep restructuring of surrounding code.
> 
> The refactoring is not the issue, the issue is that we do not terminate
> the output nbdkit before finalizing.
> 
> For some output we cannot do this since we want to query nbdkit for
> more info (e.g block status) but for rhv upload we don't have anything
> to do with nbkdit instance and we must close the connection before
> we finalize, so we should close the output right after we finish the copy.
> 
> > Another idea:
> >
> > Old virt-v2v uses qemu-img convert which does not flush by default.
> > New virt-v2v uses nbdcopy with the --flush option, so it will call
> > imageio PATCH /path ... "op":"flush" at the end.  However removing
> > nbdcopy --flush didn't help very much (only a couple of seconds off).
> 
> Calling flush is right, we cannot remove it. Although there is bug in nbdkit,
> and if flushes during close() flow even if you did not send a flush.
> 
> > Do you have any other ideas?
> >
> > What exactly does imageio do during the finalizing_success state?
> 
> The flow is this:
> - engine send request to vdsm to delete the ticket
> - vdsm connects to imageio control socket and send DELETE /tickets/{ticket-id}
> - imageio mark the ticket as canceled, so no new request can succeed
>   and no new connection to attach to the ticket
> - imageio mark all ongoing requests as canceled, so if the request is in
>   read/write loop, it will abort on the the next read/write complete
> - upstream: imageio waits until all ongoing operations complete
> - release: imageio waits until all connections are closed
> - if the ticket could not be removed in 60 seconds, imageio returns 409 Conflict
> - imageio returns 200 OK
> - engine retries the delete if it failed
> - when engine succeeds and finish other finalization tasks, the image
> transfer will
>   be reported as finished.
> 
> I would expect the flow to time out on virt-v2v side, since we use 60 seconds
> timeout, and when removing a ticket times out, it takes 60 seconds, so you
> should not succeed to finalize in 61 seconds.  There is a bug in ovirt
> engine that cause this flow to succeed.
> 
> Nir

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org




More information about the Libguestfs mailing list