[Libguestfs] Thoughts on nbdkit automatic reconnection

Martin Kletzander mkletzan at redhat.com
Thu Sep 19 06:14:05 UTC 2019


On Wed, Sep 18, 2019 at 01:59:01PM +0100, Richard W.M. Jones wrote:
>We have a running problem with the nbdkit VDDK plugin where the VDDK
>side apparently disconnects or the network connection is interrupted.
>During a virt-v2v conversion this causes the entire operation to fail,
>and since v2v conversions take many hours that's not a happy outcome.
>
>(Aside: I should say that we see many cases where it's claimed that
>the connection was dropped, but often when we examine them in detail
>the cause is something else.  But it seems like this disconnection
>thing does happen sometimes.)
>
>To put this in concrete terms which don't involve v2v, let's say
>you were doing something like:
>
>  nbdkit ssh host=remote /var/tmp/test.iso \
>    --run 'qemu-img convert -p -f raw $nbd -O qcow2 test.qcow2'
>
>which copies a file over ssh to local.  If /var/tmp/test.iso is very
>large and/or the connection is very slow, and the network connection
>is interrupted then the whole operation fails.  If nbdkit could
>retry/reconnect on failure then the operation might succeed.
>
>There are lots of parameters associated with retrying, eg:
>
> - how often should you retry before giving up?
>
> - how long should you wait between retries?
>
> - which errors should cause a retry, which are a hard failure?
>
>So I had an idea we could implement this as a generic "retry" filter,
>like:
>
>  nbdkit ssh ... --filter=retry retries=5 retry-delay=5 retry-exponential=yes
>
>This cannot be implemented with the current design of filters because
>a filter would have to call the plugin .close and .open methods, but
>filters don't have access to those from regular data functions, and in
>any case this would cause a new plugin handle to be allocated.
>
>We could probably do it if we added a special .reopen method to
>plugins.  We could either require plugins which support the concept of
>retrying to implement this, or we could have a generic implementation
>in server/backend.c which would call .close, .open and cope with the
>new handle.
>
>Another way to do this would be to modify each plugin to add the
>feature.  nbdkit-nbd-plugin has this for a very limited case, but no
>others do, and it's quite complex to implement in plugins.  As far as
>I can see it involves checking the return value of any data call that
>the plugin makes and performing the reconnection logic, while not
>changing the handle (so just calling self->close, self->open isn't
>going to work).
>
>If anyone has any thoughts about this I'd be happy to hear them.
>

Maybe this is a special case where it would be easier to implement this in
nbdkit itself rather than in a filter or a plugin.  But to be honest, when I
read the first paragraph my first two thoughts were:

 1) this needs to be configurable and

 2) having this as a filter would be really cool.
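
To make the filter idea a bit more concrete, here is a rough sketch in Python
(not nbdkit code, and not a claim about how nbdkit would do it) of a wrapper
that retries a data call and reopens the underlying connection in between,
while the caller keeps holding the same outer object.  The names
RetryingHandle, open_fn, delays and the sleep argument are made up for
illustration; retries, retry_delay and retry_exponential simply mirror the
parameters suggested in the quoted mail, and I assume "exponential" means
doubling the delay each time.

```python
import time


def delays(retries, retry_delay, retry_exponential):
    """Return the list of waits (in seconds) before each retry attempt."""
    if retry_exponential:
        # Assumption: "exponential" means the delay doubles on every retry.
        return [retry_delay * (2 ** i) for i in range(retries)]
    return [retry_delay] * retries


class RetryingHandle:
    """Hypothetical wrapper: stable outer handle, replaceable inner handle."""

    def __init__(self, open_fn, retries=5, retry_delay=5,
                 retry_exponential=True, sleep=time.sleep):
        self._open_fn = open_fn      # callable returning a fresh inner handle
        self._delays = delays(retries, retry_delay, retry_exponential)
        self._sleep = sleep          # injectable so tests do not have to wait
        self._inner = open_fn()

    def pread(self, count, offset):
        """Read via the inner handle; on OSError close, wait, reopen, retry."""
        waits = iter(self._delays)
        while True:
            try:
                return self._inner.pread(count, offset)
            except OSError:
                wait = next(waits, None)
                if wait is None:
                    raise            # retries exhausted: report a hard failure
                self._reopen(wait)

    def _reopen(self, wait):
        """Replace the inner handle; a failing reopen is not retried here."""
        try:
            self._inner.close()
        except OSError:
            pass                     # the old connection may already be dead
        self._sleep(wait)
        self._inner = self._open_fn()
```

The point of the sketch is only that the reopen happens behind a handle that
does not change from the caller's point of view, which is the part the quoted
mail says plain .close/.open cannot give you today.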

I don't know how much the implementation would need to change.  But if
plugins report proper error codes, then the "to retry or not to retry" decision
could also be made configurable based on the errno.  Just food for thought.
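
As a rough illustration of the errno idea, again in Python rather than nbdkit
code: the user passes a list of errno names and only those errors trigger a
retry.  The parameter format ("EIO,ETIMEDOUT") and the function names
parse_errno_list and should_retry are hypothetical, nothing like this exists in
nbdkit as far as I know.

```python
import errno


def parse_errno_list(spec):
    """Turn a string like "EIO,ETIMEDOUT" into a set of errno numbers."""
    codes = set()
    for name in spec.split(","):
        name = name.strip()
        if not name:
            continue
        code = getattr(errno, name, None)
        if not isinstance(code, int):
            raise ValueError(f"unknown errno name: {name}")
        codes.add(code)
    return codes


def should_retry(exc, retry_on):
    """Retry only when the error carries an errno the user opted in to."""
    return isinstance(exc, OSError) and exc.errno in retry_on
```

For example, with retry_on = parse_errno_list("EIO,ETIMEDOUT") an
OSError carrying EIO would be retried, while one carrying ENOSPC would be
treated as a hard failure.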

Have a nice day,
Martin

>Rich.
>
>-- 
>Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
>Read my programming and virtualization blog: http://rwmj.wordpress.com
>virt-builder quickly builds VMs from scratch
>http://libguestfs.org/virt-builder.1.html