[Libguestfs] [libnbd PATCH v2 02/12] api: Fix nbd_can_meta_context if NBD_OPT_SET_META_CONTEXT fails

Laszlo Ersek lersek at redhat.com
Thu Sep 1 09:04:09 UTC 2022


On 08/31/22 16:39, Eric Blake wrote:
> If a server replies to NBD_OPT_SET_META_CONTEXT first with
> NBD_REP_META_CONTEXT, then with NBD_REP_ERR_*, we have already
> modified h->meta_contexts, but fail to clean it up before moving on to
> OPT_GO in the OPT_META_CONTEXT.CHECK_REPLY's default: handler (this
> corner-case bug has been present since we got meta context support
> working back in commits a97e2779 and 1b560b62).  Per the NBD spec,
> because the overall SET_META_CONTEXT did not succeed, we should
> therefore not attempt NBD_CMD_BLOCK_STATUS; but because
> h->meta_contexts was left non-empty, we mistakenly report
> nbd_can_meta_context(first_string) as true, and allow
> nbd_block_status() to proceed; this forms a protocol violation not
> pre-filtered by our usual litany of nbd_set_strict_mode() checks, and
> therefore we can trigger unspecified server behavior.
> 
> Rather than open-code a cleanup loop on all error paths, I instead fix
> the problem by adding a meta_valid witness that is only set to true in
> select places.  The obvious place is when handling NBD_REP_ACK; but it
> also makes sense if we didn't call NBD_OPT_SET_META_CONTEXT at all: if
> negotiating structured replies failed (including oldstyle servers), or
> if the user never called nbd_add_meta_context() and therefore doesn't
> care (blindly returning 0 to nbd_can_meta_context() in those cases is
> fine; our existing tests/oldstyle and tests/newstyle-limited cover
> that).  However, whereas we previously always returned a 0/1 answer
> for nbd_can_meta_context once we are in transmission phase, this new
> witness now means that in the corner case of explicit server failure
> to NBD_OPT_SET_META_CONTEXT, nbd_can_meta_context() returns -1 to call
> attention to the earlier server failure (although it is something that
> is unlikely enough in practice that I doubt anyone will experience it
> in the wild).
> 
> The change in nbd_can_meta_context() to use h->meta_valid instead of
> h->eflags has an additional latent semantic difference not observable
> at this time (because both h->meta_valid and h->eflags are set in
> tandem by successful nbd_opt_go()/nbd_opt_info() in their current
> two-command sequence), but which matters to future patches adding new
> APIs.  On the one hand, once we add nbd_set_request_meta_context() to
> stop nbd_opt_go() from performing NBD_OPT_SET_META_CONTEXT, we want
> nbd_can_meta_context() to fail during transmission phase if the
> context was not set elsewhere during negotiation, rather than blindly
> returning 0.  On the other hand, when manually calling
> nbd_opt_set_meta_context() during negotiation, we do not want it to
> touch h->eflags, but do want nbd_can_meta_context() to start working
> right away.
> 
> Note that the use of a witness flag also allows me to slightly change
> when the list of previously-negotiated contexts is freed - instead of
> doing it in nbd_internal_reset_size_and_flags(), that function now
> simply marks any existing vector as invalid; and the actual wipe is
> done when starting a new NBD_OPT_SET_META_CONTEXT or when closing
> struct nbd_handle.  There are still some oddities to be cleaned up in
> later patches by moving when the flag is marked invalid (for example,
> we really want nbd_set_export_name() to wipe both flags and meta
> contexts, but the act of NBD_OPT_GO should not wipe contexts, and the
> act of NBD_OPT_SET_META_CONTEXT should not wipe flags), but I'm
> leaving those further tweaks for later patches to make this one easier
> to review.
> 
> Testing this is surprisingly tricky; compliant servers don't do this
> (hence I don't really have anything automatic to add to libnbd's
> testsuite).  However, I came up with the following temporary hack to
> nbdkit to demonstrate the problem:
> 
> | diff --git i/server/protocol-handshake-newstyle.c w/server/protocol-handshake-newstyle.c
> | index fedee48f..b3f34c98 100644
> | --- i/server/protocol-handshake-newstyle.c
> | +++ w/server/protocol-handshake-newstyle.c
> | @@ -884,7 +884,7 @@ negotiate_handshake_newstyle_options (void)
> |              opt_index += querylen;
> |              nr_queries--;
> |            }
> | -          if (send_newstyle_option_reply (option, NBD_REP_ACK) == -1)
> | +          if (send_newstyle_option_reply (option, NBD_REP_ERR_POLICY) == -1)
> |              return -1;
> |          }
> |          debug ("newstyle negotiation: %s: reply complete", optname);
> | diff --git i/server/protocol.c w/server/protocol.c
> | index 2ac77055..97235bda 100644
> | --- i/server/protocol.c
> | +++ w/server/protocol.c
> | @@ -191,7 +191,7 @@ validate_request (uint16_t cmd, uint16_t flags, uint64_t offset, uint32_t count,
> |
> |    /* Block status allowed? */
> |    if (cmd == NBD_CMD_BLOCK_STATUS) {
> | -    if (!conn->structured_replies) {
> | +    if (!conn->structured_replies || 1) {
> |        nbdkit_error ("invalid request: "
> |                      "%s: structured replies was not negotiated",
> |                      name_of_nbd_cmd (cmd));
> 
> Coupled with this command line:
> 
> $ ./run /path/to/nbdkit -U- memory 1 --run 'nbdsh --base -u "$uri" -c "
> try:
>  print(h.can_meta_context(nbd.CONTEXT_BASE_ALLOCATION))
> except nbd.Error as ex:
>  print(ex.string)
> def f(c, o, e, er):
>  pass
> try:
>  print(h.block_status(1, 0, f))
> except nbd.Error as ex:
>  print(ex.string)
> "'
> 
> Pre-patch, we see can_meta_context succeed, and that we let
> NBD_CMD_BLOCK_STATUS leak through causing an nbdkit error message:
> 
> True
> nbdkit: memory.0: error: invalid request: NBD_CMD_BLOCK_STATUS: structured replies was not negotiated
> nbd_block_status: block-status: command failed: Invalid argument
> 
> Post-patch, can_meta_context diagnoses the problem, and block_status
> is blocked client-side with no messages needed from nbdkit:
> 
> nbd_can_meta_context: need a successful server meta context request first: Invalid argument
> nbd_block_status: did not negotiate any metadata contexts, either you did not call nbd_add_meta_context before connecting or the server does not support it: Operation not supported
> 
> Fixes: a97e2779 ("states: Negotiate base:allocation with the server.", v0.1)
> ---
>  lib/internal.h                               |  1 +
>  generator/API.ml                             |  4 +++-
>  generator/states-newstyle-opt-meta-context.c |  9 +++++++--
>  generator/states-reply-structured.c          |  1 +
>  lib/flags.c                                  | 14 ++++++++------
>  lib/handle.c                                 |  5 +++++
>  lib/rw.c                                     |  2 +-
>  7 files changed, 26 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/internal.h b/lib/internal.h
> index d601d5d..8aaff15 100644
> --- a/lib/internal.h
> +++ b/lib/internal.h
> @@ -180,6 +180,7 @@ struct nbd_handle {
>    bool structured_replies;      /* If we negotiated NBD_OPT_STRUCTURED_REPLY */
> 
>    /* Vector of negotiated metadata contexts. */
> +  bool meta_valid;
>    meta_vector meta_contexts;
> 
>    /* The socket or a wrapper if using GnuTLS. */
> diff --git a/generator/API.ml b/generator/API.ml
> index 3e948aa..62e2d54 100644
> --- a/generator/API.ml
> +++ b/generator/API.ml
> @@ -1785,7 +1785,9 @@   "can_meta_context", {
>      longdesc = "\
>  Returns true if the server supports the given meta context
>  (see L<nbd_add_meta_context(3)>).  Returns false if
> -the server does not.
> +the server does not.  It is possible for this command to fail if
> +meta contexts were requested but there is a missing or failed
> +attempt at NBD_OPT_SET_META_CONTEXT during option negotiation.
> 
>  The single parameter is the name of the metadata context,
>  for example C<LIBNBD_CONTEXT_BASE_ALLOCATION>.
> diff --git a/generator/states-newstyle-opt-meta-context.c b/generator/states-newstyle-opt-meta-context.c
> index 8f4dee2..a6a5271 100644
> --- a/generator/states-newstyle-opt-meta-context.c
> +++ b/generator/states-newstyle-opt-meta-context.c
> @@ -29,6 +29,9 @@ STATE_MACHINE {
>     */
>    assert (h->gflags & LIBNBD_HANDSHAKE_FLAG_FIXED_NEWSTYLE);
>    nbd_internal_reset_size_and_flags (h);
> +  for (i = 0; i < h->meta_contexts.len; ++i)
> +    free (h->meta_contexts.ptr[i].name);
> +  meta_vector_reset (&h->meta_contexts);

This is code movement out of nbd_internal_reset_size_and_flags(), OK.

>    if (h->opt_current == NBD_OPT_LIST_META_CONTEXT) {
>      assert (h->opt_mode);
>      assert (h->structured_replies);
> @@ -44,7 +47,7 @@ STATE_MACHINE {
>      }
>    }
> 
> -  assert (h->meta_contexts.len == 0);
> +  assert (!h->meta_valid);

OK, this is the new predicate that nbd_internal_reset_size_and_flags()
now ensures.

> 
>    /* Calculate the length of the option request data. */
>    len = 4 /* exportname len */ + strlen (h->export_name) + 4 /* nr queries */;
> @@ -194,8 +197,10 @@ STATE_MACHINE {
>        CALL_CALLBACK (h->opt_cb.completion, &err);
>        nbd_internal_free_option (h);
>      }
> -    else
> +    else {
>        SET_NEXT_STATE (%^OPT_GO.START);
> +      h->meta_valid = true;
> +    }
>      break;
>    case NBD_REP_META_CONTEXT:  /* A context. */
>      if (len > maxpayload)

IIUC, this modifies the NBD_REP_ACK handling for
NBD_OPT_SET_META_CONTEXT. IIUC we've consumed all the mappings, so we
can now set the witness to "OK". Seems OK.

> diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
> index aaf75b8..f18c999 100644
> --- a/generator/states-reply-structured.c
> +++ b/generator/states-reply-structured.c
> @@ -445,6 +445,7 @@ STATE_MACHINE {
>      assert (CALLBACK_IS_NOT_NULL (cmd->cb.fn.extent));
>      assert (h->bs_entries);
>      assert (length >= 12);
> +    assert (h->meta_valid);
> 
>      /* Need to byte-swap the entries returned, but apart from that we
>       * don't validate them.

This seems to express that we may only have initiated a block status
request (for which we are now processing the replies --
REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES) in case we had a valid meta
context list first. Seems plausible. (A bit lower down in the context,
not quoted, we "Look up the context ID" in the meta context vector, so
yes, very sensible assertion.)

> diff --git a/lib/flags.c b/lib/flags.c
> index 87c20ee..91efc1a 100644
> --- a/lib/flags.c
> +++ b/lib/flags.c
> @@ -37,9 +37,7 @@ nbd_internal_reset_size_and_flags (struct nbd_handle *h)
> 
>    h->exportsize = 0;
>    h->eflags = 0;
> -  for (i = 0; i < h->meta_contexts.len; ++i)
> -    free (h->meta_contexts.ptr[i].name);
> -  meta_vector_reset (&h->meta_contexts);

Moved to NEWSTYLE.OPT_META_CONTEXT.START; OK.

(This is the only place we reset the meta context vector, pre-patch.)

> +  h->meta_valid = false;

Replacing the above logic, OK.

>    h->block_minimum = 0;
>    h->block_preferred = 0;
>    h->block_maximum = 0;
> @@ -75,6 +73,11 @@ nbd_internal_set_size_and_flags (struct nbd_handle *h,
>      eflags &= ~NBD_FLAG_SEND_FAST_ZERO;
>    }
> 
> +  if (!h->structured_replies || h->request_meta_contexts.len == 0) {
> +    assert (h->meta_contexts.len == 0);
> +    h->meta_valid = true;
> +  }
> +
>    h->exportsize = exportsize;
>    h->eflags = eflags;
>    return 0;

Hmmm. This gives me a run for my money...

I think the condition here describes the case when we're never going to
send an NBD_OPT_SET_META_CONTEXT -- meaning that we're never going to
reach the "meta_valid = true" assignment in the ACK handling for
NBD_OPT_SET_META_CONTEXT. Still we need to qualify our empty metacontext
vector as valid somewhere, so nbd_internal_set_size_and_flags() is being
proposed as the location for that.

The question is where nbd_internal_set_size_and_flags() is called
*from*. I find:

- NEWSTYLE.OPT_EXPORT_NAME.CHECK_REPLY
- NBD_REP_INFO | NBD_INFO_EXPORT
- OLDSTYLE.CHECK

We also have a comment on nbd_internal_set_size_and_flags() saying

/* Set the export size and eflags on the handle, validating them.
 * This is called from the state machine when either the newstyle or
 * oldstyle negotiation reaches the point that these are available.
 */

Aha! So, also in accordance with the commit message, I need to replace
"we're never going to reach the ACK for NBD_OPT_SET_META_CONTEXT" with
"we could never have reached the ACK for NBD_OPT_SET_META_CONTEXT".

I'm not sure if this is the *only place* (or the *best place*) to set
the witness [*], but it does look like a "good one".

[*] The commit message speaks about a "latent semantic difference" which
is currently masked by our managing meta_valid and eflags strictly in
tandem. Hopefully I'll understand that paragraph better when looking at
later patches in the series.

> @@ -194,9 +197,8 @@ nbd_unlocked_can_meta_context (struct nbd_handle *h, const char *name)
>  {
>    size_t i;
> 
> -  if (h->eflags == 0) {
> -    set_error (EINVAL, "server has not returned export flags, "
> -               "you need to connect to the server first");
> +  if (h->request_meta_contexts.len && !h->meta_valid) {
> +    set_error (EINVAL, "need a successful server meta context request first");
>      return -1;
>    }

This is not easy to understand at first, but I kind of convinced myself
with the following argument: the condition for successfully querying the
context list is

  h->request_meta_contexts.len == 0 || h->meta_valid

i.e., we never needed to ask the server, or the server responded (and we
parsed the response successfully). The *result* list may or may not be
empty at this point (it could be empty for two reasons: we never asked
for any contexts, or we did but the server supports none); either way,
the result list is valid for searching.

OK.

> 
> diff --git a/lib/handle.c b/lib/handle.c
> index 8713e18..03f45a4 100644
> --- a/lib/handle.c
> +++ b/lib/handle.c
> @@ -115,6 +115,8 @@ nbd_create (void)
>  void
>  nbd_close (struct nbd_handle *h)
>  {
> +  size_t i;
> +
>    nbd_internal_set_error_context ("nbd_close");
> 
>    if (h == NULL)
> @@ -127,6 +129,9 @@ nbd_close (struct nbd_handle *h)
> 
>    free (h->bs_entries);
>    nbd_internal_reset_size_and_flags (h);
> +  for (i = 0; i < h->meta_contexts.len; ++i)
> +    free (h->meta_contexts.ptr[i].name);
> +  meta_vector_reset (&h->meta_contexts);
>    nbd_internal_free_option (h);
>    free_cmd_list (h->cmds_to_issue);
>    free_cmd_list (h->cmds_in_flight);

OK.

> diff --git a/lib/rw.c b/lib/rw.c
> index 81f8f35..2ab85f3 100644
> --- a/lib/rw.c
> +++ b/lib/rw.c
> @@ -482,7 +482,7 @@ nbd_unlocked_aio_block_status (struct nbd_handle *h,
>        return -1;
>      }
> 
> -    if (h->meta_contexts.len == 0) {
> +    if (!h->meta_valid || h->meta_contexts.len == 0) {
>        set_error (ENOTSUP, "did not negotiate any metadata contexts, "
>                   "either you did not call nbd_add_meta_context before "
>                   "connecting or the server does not support it");
> 

Conversely, here the condition for passing is

  h->meta_valid && h->meta_contexts.len > 0

meaning we asked the server, and it supports at least one of the
requested meta contexts.

I think the patch is correct, but whether it is *complete* as well is
something I can't easily evaluate. With that caveat:

Reviewed-by: Laszlo Ersek <lersek at redhat.com>


More information about the Libguestfs mailing list