[Libguestfs] [PATCH v3] v2v: rhv-upload-plugin - improve wait logic after finalize (RHBZ#1680361)

Nir Soffer nsoffer at redhat.com
Thu Aug 8 23:58:18 UTC 2019


On Thu, Aug 8, 2019 at 5:05 PM Richard W.M. Jones <rjones at redhat.com> wrote:

> On Mon, Mar 18, 2019 at 06:51:26PM +0200, Daniel Erez wrote:
> > After invoking transfer_service.finalize, check operation
> > status by examining DiskStatus.
> > This is done instead of failing after a predefined timeout
> > regardless the status.
> >
> > * not verified *
> >
> > Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1680361
> > ---
> >  v2v/rhv-upload-plugin.py | 19 +++++++++++++------
> >  1 file changed, 13 insertions(+), 6 deletions(-)
> >
> > diff --git a/v2v/rhv-upload-plugin.py b/v2v/rhv-upload-plugin.py
> > index 2a950c5ed..2cefe94b0 100644
> > --- a/v2v/rhv-upload-plugin.py
> > +++ b/v2v/rhv-upload-plugin.py
> > @@ -523,16 +523,23 @@ def close(h):
> >          # waiting for the transfer object to cease to exist, which
> >          # falls through to the exception case and then we can
> >          # continue.
> > -        endt = time.time() + timeout
> > +        disk_id = disk.id
> > +        start = time.time()
> >          try:
> >              while True:
> >                  time.sleep(1)
> > -                tmp = transfer_service.get()
> > -                if time.time() > endt:
> > -                    raise RuntimeError("timed out waiting for transfer "
> > -                                       "to finalize")
> > +                disk_service = h['disk_service']
> > +                disk = disk_service.get()
> > +                if disk.status == types.DiskStatus.LOCKED:
> > +                    if time.time() > start + timeout:
> > +                        raise RuntimeError("timed out waiting for
> transfer "
> > +                                           "to finalize")
> > +                    continue
> > +                if disk.status == types.DiskStatus.OK:
> > +                    debug("finalized after %s seconds", time.time() -
> start)
>

Maybe this was the real error? (using , instead of %)


> > +                    break
> >          except sdk.NotFoundError:
> > -            pass
> > +            raise RuntimeError("transfer failed: disk %s not found" %
> disk_id)
> >
> >          # Write the disk ID file.  Only do this on successful
> completion.
> >          with builtins.open(params['diskid_file'], 'w') as fp:
>
> Ilanit tested this patch for us.  Unfortunately it fails for reasons I
> don't really understand but seem to be deep inside the RHV API:
>
> nbdkit: python[1]: error: /var/tmp/v2v.sZm0my/rhv-upload-plugin.py: close:
> error: ['Traceback (most recent call last):
> ', '  File "/var/tmp/v2v.sZm0my/rhv-upload-plugin.py", line 577, in close
> ', '  File "/var/tmp/v2v.sZm0my/rhv-upload-plugin.py", line 516, in
> delete_disk_on_failure
>

This looks like failure flow - but there is no info here on the real
failure. We need to understand
what was that failure.


> ', '  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py",
> line 37334, in remove
>     self._internal_remove(headers, query, wait)
> ', '  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line
> 271, in _internal_remove
>     return future.wait() if wait else future
> ', '  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line
> 55, in wait
>     return self._code(response)
> ', '  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line
> 268, in callback
>     self._check_fault(response)
> ', '  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line
> 132, in _check_fault
>     self._raise_error(response, body)
> ', '  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line
> 118, in _raise_error
>     raise error
> ', 'Error: Fault reason is "Operation Failed". Fault detail is "[Cannot
> remove Virtual Disk. Related operation is currently in progress. Please try
> again later.]". HTTP response code is 409.
> ']
>

I think this error means that the disk is currently busy, probably because
of the real error.

Ilanit, can you share complete virt-v2v and daemon logs?

There is a surprising behavior in the image transfer API - you create a
disk, and start
a transfer session with that disk. However once you started a the transfer,
it owns
the disk, and it will delete it in case of failures, so you don't need to
delete it yourself.

I guess you need something like

    h["owns_disk"] = True

After creating the disk, and once the transfer was started, set it to False.
In your cleanup flow, delete the disk only if you still own it.

The other issue here is that the error during close is hiding the real
error. This is
a common problem in python 2, and the solution is typically to catch all
exceptions
in the cleanup flow and log them, and make sure the original error is
raised and
logged by the top error handler.

Nir
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/libguestfs/attachments/20190809/02002f1f/attachment.htm>


More information about the Libguestfs mailing list