[Libguestfs] [libnbd PATCH 4/4] copy: Ignore extents if we encountered earlier error

Wed Feb 2 08:42:46 UTC 2022

On 02/01/22 20:44, Eric Blake wrote:
> In practice, the nbd extents callback will only be called once per
> nbd_aio_block_status, because we have only requested exactly one
> metacontext id, and because compliant servers send only one reply
> chunk per id.  In theory, a server could reply first with an error
> chunk (perhaps meaning "I'm unable to get status beyond offset X")
> followed by an extent chunk ("but here's what I was able to get")
> while still being compliant.  And a non-compliant server can send
> unexpected chunks, just to try and confuse us.  Right now, it would
> take a custom NBD server to actually trigger any scenario where *error
> would ever be non-zero on entry to next_extent() (nbdkit does not have
> that much power).  So while *error ever being non-zero is unlikely,
> our easiest course is to handle it the same way we would for a server
> sending us an unexpected id: ignore the rest of the extent callback.
> The later completion callback will handle the overall command failing.

This commit message mostly answers my question under the cover letter,
and indeed it's quite the opposite of what I expected. I'd have expected
the command (the exchange with the server) completing, upon which time
the completion callback would have run; then the results (a series of
extent callbacks) would be passed to the extent callback.

So, that's precisely *not* the case in reality. :/

> ---
>  copy/nbd-ops.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/copy/nbd-ops.c b/copy/nbd-ops.c
> index dfc62f20..ab30badd 100644
> --- a/copy/nbd-ops.c
> +++ b/copy/nbd-ops.c
> @@ -1,5 +1,5 @@
>  /* NBD client library in userspace.
> - * Copyright (C) 2020 Red Hat Inc.
> + * Copyright (C) 2020-2022 Red Hat Inc.
>   *
>   * This library is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU Lesser General Public
> @@ -337,7 +337,7 @@ add_extent (void *vp, const char *metacontext,
>    extent_list *ret = vp;
>    size_t i;
> 
> -  if (strcmp (metacontext, "base:allocation") != 0)
> +  if (strcmp (metacontext, "base:allocation") != 0 || *error)
>      return 0;
> 
>    for (i = 0; i < nr_entries; i += 2) {
> 

So, I don't understand this. First, here we have a non-aio (i.e.,
synchronous) nbd_block_status call. in case add_extent (the extents
callback) is entered with *error !=0, that means (per docs) "the error
parameter contains the errno value of any previously detected error".

In case we responded by returning -1, then the following (per docs)
would apply:

"If the callback returns -1, and no earlier error has been detected,
then the overall block status command will fail with any non-zero value
stored into the callback's error parameter"

But we don't do that, we say "everything is OK, I'll just ignore this",
like we ignore metadata contexts that we don't need.

So how is the "later completion callback" going to fail the entire
nbd_block_status() call? First, because this is not an AIO
nbd_block_status() call, we don't provide the completion callback
ourselves -- I guess it is some default function somewhere in libnbd.

Or is it the following: upon getting (*error != 0) on entry, we just
decline looking at the (likely nonsensical) "entries" array, and permit
the error we got via *error to "survive" through all further
add_extent() calls, and to finally fail the outermost nbd_block_status()
call?

Next question: even if that's the case, should we *first* check *error,
and only check "metacontext" if *error is zero? What says that
"metacontext" is defined if *error is nonzero?

Thanks,
Laszlo