[Libguestfs] Dealing with ImageIO problems

Richard W.M. Jones rjones at redhat.com
Thu Nov 21 11:37:26 UTC 2019


[Adding the mailing list address back]

On Thu, Nov 21, 2019 at 12:44:38PM +0200, Nir Soffer wrote:
> On Thu, Nov 21, 2019 at 12:16 PM Richard W.M. Jones <rjones at redhat.com> wrote:
> >
> > On Wed, Nov 20, 2019 at 06:00:52PM +0200, Nir Soffer wrote:
> > > 5. can_zero fails, not sure why (Maybe Richard knows more about this)
> > >
> > > nbdkit: python[1]: debug: python: can_zero
> > > nbdkit: python[1]: debug: sending error reply: Operation not supported
> > >
> > > rhv-upload-plugin always reports that it can zero, I cannot explain this.
> >
> > I believe this is a bug in nbdkit’s python bindings.  Will take a
> > closer look and reply in a new thread.  I don't believe it affects
> > anything here however.
> >
> > > 6. writing the first sector fails
> > >
> > > python[1]: nbdkit: debug: vddk[3]python: pwrite count=65536 offset=0 fua=0:
> > > ...
> > > nbdkit: python[1]: error:
> > > /var/tmp/rhvupload.RShGz0/rhv-upload-plugin.py: pwrite: error:
> > > ['Traceback (most recent call last):\n', '  File
> > > "/var/tmp/rhvupload.RShGz0/rhv-upload-plugin.py", line 189, in
> > > pwrite\n    http.endheaders()\n', '  File
> > > "/usr/lib64/python3.7/http/client.py", line 1247, in endheaders\n
> > > self._send_output(message_body, encode_chunked=encode_chunked)\n', '
> > > File "/usr/lib64/python3.7/http/client.py", line 1026, in
> > > _send_output\n    self.send(msg)\n', '  File
> > > "/usr/lib64/python3.7/http/client.py", line 966, in send\n
> > > self.connect()\n', '  File
> > > "/var/tmp/rhvupload.RShGz0/rhv-upload-plugin.py", line 357, in
> > > connect\n    self.sock.connect(self.path)\n', 'ConnectionRefusedError:
> > > [Errno 111] Connection refused\n']
> > >
> > > But it fails in send(), and we don't handle errors proeply around
> > > send(), only around getresponse().
> > > So the error is forgotten by the plugin.
> >
> > That sounds bad?
> 
> Indeed, but we never had this issue in real environment.
>
> Basically we should handle any exception in the plugin as failure,
> but I'm not sure where we should fix it.
>
> We can add error handler around all the entry points
> (e.g. pwrite()), setting h['failed'] = True. But this makes it
> harder to write correct plugins; everyone has to ensure correct
> error handling instead of getting it right by default.

In nbdkit-python-plugin, any exception which escapes will be caught in
the C code and turned into an NBD error on the wire.  The function
which does this is:

  https://github.com/libguestfs/nbdkit/blob/a52ebb4e8d18a73b9ae009cd768bfab18505f59a/plugins/python/python.c#L189

Since each plugin method corresponds* (* = mostly) to a single NBD
command on the wire, returning an error from the method signals an
error back to the NBD client.  If the NBD client is ‘qemu-img convert’
then it will exit the first time it sees an error.

(The retry filter http://libguestfs.org/nbdkit-retry-filter.1.html
changes this behaviour, but we are not using that for output to
rhv-upload.)

> We can handle this in the python plugin (e.g. h['failed'] = True)
> but the handle is controlled by the plugin, so this does not sound
> like a good way.

The close() method is always called, whether or not there was an
error.  The same connection handle (‘h’) is passed back to the close
method.  It's quite safe and normal to store state in this handle.

nbdkit itself doesn't save the fact that there was an error during the
connection (it can't since it doesn't know if the error was
recoverable or not).  But rhv-upload-plugin needs to handle failed vs
successful transfers differently.  Thus we save the failure state in
the handle the first time we see an error.

> Maybe we can change close() to:
> 
>     def close(failed=True):
>         if failed:
>             # cleanup after some call failed
>             ...
> 
>         # Cleanup after successful run
> 
> The python plugin will remember unhandled errors and will call:
> 
>     close(failed=True)
> 
> To write correct plugin author must not handle fatal errors in the plugin.

This changes how the Python plugin would work versus other plugins,
and the C code can't know if the NBD connection failed, because some
NBD errors are recoverable.  Also the client does not send us any
error indication when it disconnects.

The answer is probably to wrap every method in a big try...except
which sets the failed flag in the handle.

> How do we handle errors in other plugins?

It's kind of easier to study this using the sh plugin.  Here's an
example showing the order in which callbacks happen on error:

----------------------------------------------------------------------
nbdkit sh -fv - --run 'qemu-img convert $nbd /tmp/out' <<'EOF'
case "$1" in
get_size) echo 512 ;;
pread) echo 'EIO error' >&2; exit 1 ;;
*) exit 2 ;;
esac
EOF
----------------------------------------------------------------------

Running this gives the following output (filtered a bit to show the
key points):

  $ bash test.sh 
  nbdkit: debug: sh: load
  nbdkit: debug: sh: load: tmpdir: /tmp/nbdkitshB6KCs0
  nbdkit: debug: sh: config key=script, value=-
  nbdkit: debug: sh: config_complete
  nbdkit: sh[1]: debug: sh: open readonly=0
  nbdkit: sh[1]: debug: calling: /tmp/nbdkitshB6KCs0/inline-script.sh open false ""
  nbdkit: sh[1]: debug: completed: /tmp/nbdkitshB6KCs0/inline-script.sh open: status 2
  nbdkit: sh[1]: debug: sh: open returned handle 0x7face4002070
  nbdkit: sh[1]: debug: sh: get_size
  nbdkit: sh[1]: debug: calling: /tmp/nbdkitshB6KCs0/inline-script.sh get_size ""
  nbdkit: sh[1]: debug: completed: /tmp/nbdkitshB6KCs0/inline-script.sh get_size: status 0
  nbdkit: sh[1]: debug: sh: can_write
  nbdkit: sh[1]: debug: calling: /tmp/nbdkitshB6KCs0/inline-script.sh can_write ""
  nbdkit: sh[1]: debug: completed: /tmp/nbdkitshB6KCs0/inline-script.sh can_write: status 2
...
  nbdkit: sh[1]: debug: sh: pread count=512 offset=0
  nbdkit: sh[1]: debug: calling: /tmp/nbdkitshB6KCs0/inline-script.sh pread "" 512 0
  nbdkit: sh[1]: debug: completed: /tmp/nbdkitshB6KCs0/inline-script.sh pread: status 1
  nbdkit: sh[1]: error: /tmp/nbdkitshB6KCs0/inline-script.sh: error
  nbdkit: sh[1]: debug: sending error reply: Input/output error
  nbdkit: sh[1]: debug: client sent NBD_CMD_DISC, closing connection
  qemu-img: Could not open 'nbd:localhost:10809': Could not read image for determining its format: Input/output error
  nbdkit: sh[1]: debug: sh: close
  nbdkit: sh[1]: debug: calling: /tmp/nbdkitshB6KCs0/inline-script.sh close ""
  nbdkit: sh[1]: debug: completed: /tmp/nbdkitshB6KCs0/inline-script.sh close: status 2
  nbdkit: debug: sh: unload plugin
  nbdkit: debug: calling: /tmp/nbdkitshB6KCs0/inline-script.sh unload
  nbdkit: debug: completed: /tmp/nbdkitshB6KCs0/inline-script.sh unload: status 2

Notice in particular how close() and unload() are both called in the
plugin even though qemu-img convert has already exited with an error.

Also the same handle (empty string above) is passed back to close().

> We probably should continue to the mailing list.
> 
> Nir

Thanks, Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v




More information about the Libguestfs mailing list