[Libguestfs] [libnbd PATCH 1/2] api: Better use of nbd_internal_reset_size_and_flags

Eric Blake eblake at redhat.com
Thu Aug 25 15:05:18 UTC 2022


On Thu, Aug 25, 2022 at 12:13:42PM +0200, Laszlo Ersek wrote:

[I wrote my reply out-of-order in various spurts, apologies if I
repeat myself, or if I make an argument earlier in the mail that
doesn't make sense until you read more background later in the mail]

...
> 
> Thanks for writing this up; it's a lot to digest. I think I understand
> it now.
> 
> There's a whole lot of state here, which is something I don't really
> like. (I don't know the design background however.) FWIW I've found this
> stuff easier to understand by thinking about it in terms of
> requests/responses, and by considering the client-side state only a
> "lightweight state", solely existing for implementing the
> request/response patterns. In other words, I kind of consider the
> client-side state "ad hoc" (whereas the server-side state is
> heavy-weight state).

The design background does play a big role.  When we first started
libnbd, we did not have an easy way to write an API that would take a
string vector as a single argument that the generator could then
translate nicely into C, OCaml, and Python APIs (this was before we
later added golang translations).  So instead of having a single API:

nbd_connect_with_contexts(uri, vector_of_contexts)

it was thus easier given the limitation of our generator at the time
to have:

nbd_add_meta_context(a) [appends to h->request_meta_contexts]
nbd_add_meta_context(b) [appends to h->request_meta_contexts]
nbd_connect(uri) [implicitly uses h->request_meta_contexts from earlier adds]

And we definitely don't have a clean way for a single API to produce a
direct output of a string vector (easy in Python, but hard in C); our
approach there was initially using client-side state (build the list
internally to the nbd_handle, let the user query it after the fact:
nbd_connect_* populates h->meta_contexts followed by
nbd_can_meta_context() querying it), although it has also proven
viable to utilize a callback function (let the user build their own
vector as we pass in one string at a time: nbd_opt_list()
nbd_opt_list_meta_context() both use the same callback signature).

Also, when we first started, libnbd did everything in nbd_connect_*
from opening the socket to NBD_OPT_GO; you could only fine-tune the
handshake by what was set prior to nbd_connect_.  In libnbd 1.4, we
added nbd_set_opt_mode and nbd_opt_* to try and make it more
fine-tuned, giving more control to a power user over aspects of
handshaking (most obvious is nbdinfo that wants to run NBD_OPT_LIST to
learn the list of available exports, then NBD_OPT_INFO and
NBD_OPT_SET/LIST_META_CONTEXT on each of those exports in turn), while
still keeping the common case simple (if you already know your
connection name, nbd_add_meta_context("base:allocation") +
nbd_connect_uri(uri) going all the way to NBD_OPT_GO really is nicer
than forcing you to spell out all the individual steps).  So
backwards-compatibility is an issue - the older an API is, the more
steps it tends to do by default; we have been introducing new API to
break it into smaller independent chunks where that is useful, but
still have to keep the old complexity in place to not break existing
code bases.  So this is client-side state in the reverse direction -
now we have lots of APIs for each knob we have added over the years to
break large APIs into smaller tasks (see
nbd_set_request_structured_replies,
nbd_get_request_structured_replies, as well as my proposal to add
nbd_set_request_meta_context, ...), which have nothing to do with
server state but do impact how other nbd_* APIs will behave.

Over time, our generator has gotten better (we've added support for
more parameter types), so it may be worth adding API(s) that take the
context name list as explicit parameters rather than reusing the
implicit list registered one-at-a-time.  Where we currently have:

nbd_add_meta_context(a)
nbd_add_meta_context(b)
nbd_opt_list_meta_context(export, callback)

it may indeed be worth adding:

nbd_opt_list_meta_context_from_list(export, vector, callback)

that uses an explicit vector of queries rather than the implied vector
previously set by nbd_add_meta_context().  At the same time, our set
of nbd_connect_* calls has grown, so adding yet another variant to
each of those that takes a context vector may be easy in the
generator, but scale exponentially in the number of APIs added and
needing unit tests.

You are correct that h->request_meta_contexts is light-weight client
state (our hack to work around the lack of a single vector argument,
when passing a query to NBD_OPT_LIST_META_CONTEXT or
NBD_OPT_SET_META_CONTEXT), while h->meta_contexts is more heavy-weight
(both the client and the server must agree on the same name-to-id
mapping of which contexts were set by the last successful
NBD_OPT_SET_META_CONTEXT and not wiped out by other state-clearing
commands like NBD_OPT_STARTTLS or NBD_OPT_GO with a different export
name; because that name-to-id mapping is essential to later
NBD_CMD_BLOCK_STATUS).

Thus, we have several conflated issues.
- Can we reduce ad hoc client state?  Yes, if we extend the generator
and add new APIs that take the query vector as a direct argument
instead of the indirect h->request_meta_contexts, and/or provide a
callback argument to let the caller build up their own reply vector
instead of having to query an internal result vector after the fact
(so far, not something I have tried, but now you have me curious on
whether it is worth it).
- Can we get more fine-grained control over negotiation?  Yes,
nbd_opt_info() and nbd_opt_go() currently default to a sequence of 2
server commands (NBD_OPT_SET_META_CONTEXT, NBD_OPT_INFO/GO), but v2 of
this series will add nbd_set_request_meta_context() which cuts out the
first half, as well as nbd_opt_set_meta_context() which runs the first
half in isolation.
- Are we properly tracking when server state changes?  No - in some
cases we weren't clearing state even when the server does (teach
nbd_set_export_name() to clear state); in other cases we clear state
even though we don't have to (teach nbd_opt_list_meta_context() to not
clear state).

v2 is taking me more time to polish than I thought, because I've been
trying to break it down into even more individual fixes with
accompanying tests, as well as finish the API additions, but I'm
liking the direction your review has set me on.  In particular, I've
noticed that there are really two different groups of client states to
track after server interaction:

export information (nbd_get_size, nbd_is_read_only, nbd_can_trim, ...)
 - gated by h->eflags being non-zero, but tracked in multiple
   variables such as h->eflags, h->exportsize, h->block_minimum, ...
 - should be made valid by successful NBD_OPT_INFO or NBD_OPT_GO
 - should be wiped by:
   - failed NBD_OPT_INFO or NBD_OPT_GO
   - any NBD_OPT_STARTTLS (but I have to add nbd_opt_starttls to allow
     that much fine-grained control; on my list of things to add)
   - nbd_set_export_name() changing names (not directly caused by
     server state, but because of how many of our interfaces
     implicitly reuse h->export_name, and where the server would
     return different state for NBD_OPT_INFO on a different name;
     currently broken, v1 tried to fix it)
 - should persist over:
   - NBD_OPT_LIST_META_CONTEXT (this is stateless in the server, so
     nbd_opt_list_meta_context() should be stateless in the client;
     currently broken, v1 tried to fix it)
   - NBD_OPT_SET_META_CONTEXT (setting contexts does not change the
     export size, so my new nbd_opt_set_meta_context() should not wipe
     it)
context mappings (nbd_can_meta_context)
 - currently gated by h->eflags being non-zero, but tracked by
   h->meta_contexts (v2 will change the gating)
 - should be made valid (returns 0/1 for all names, according to
   h->meta_contexts lookup) by:
   - successful NBD_OPT_SET_META_CONTEXT
   - if nbd_set_request_meta_context() is true (default once the API
     is added), a successful nbd_opt_info() or nbd_opt_go() that was
     unable to set contexts (server was oldstyle, or structured
     replies are lacking, or client didn't register any names with
     nbd_add_meta_context; so h->meta_contexts is empty)
 - should be wiped (returns -1 for all names) by:
   - failed NBD_OPT_SET_META_CONTEXT (regardless if triggered by
     nbd_opt_go() or by my new nbd_opt_set_meta_context(), currently
     broken and unaddressed in v1, see also [1] below)
   - any NBD_OPT_STARTTLS
   - nbd_set_export_name() changing names (same story as export name
     change wiping export information: currently broken, v1 tried to
     fix it)
 - should persist over:
  - bare NBD_OPT_GO, NBD_OPT_LIST (untestable until adding
    nbd_set_request_meta_context(false) to teach nbd_opt_info() and
    nbd_opt_go() to skip their first half)
  - NBD_OPT_LIST_META_CONTEXT (currently broken, v1 tried to fix it)

> 
> The overlap between nbd_opt_list_meta_context() and nbd_opt_info() is
> especially confusing. For the following reasons:
> 
> - both functions perform the same listing at the base level (build an
> intersection between the client's desired meta context list and the
> server's supported meta context list)
> 
> - the intersection is consumed differently on the client (the first
> function reports results through a callback, the second one requires
> subsequent, repeated calls to nbd_can_meta_context()).

nbd_can_meta_context() is probing the name-to-id table returned by
NBD_OPT_SET_META_CONTEXT; but NBD_OPT_LIST_META_CONTEXT does not
provide a name-to-id table (the NBD spec says that LIST should use an
id of 0 for everything, partly because only the LIST form supports
wildcards in both client and server directions; note that a wildcard
has no valid id, but 0 can be a valid id under SET).

> 
> - the first function is stateless (considering only the server-side
> state "state" here), the second is stateful. While this is well
> reflected by the opcode names LIST vs. SET, those opcodes are hidden
> from the function names. In particular, the second function says "info"
> in the name, which I would not expect to change state on the server.

Maybe this will help. NBD_OPT_LIST_META_CONTEXT says "what context
names does the server support, where the server can disregard the
export name if that makes sense, and where both directions can use
wildcards to allow the server to advertise a family of export names
and the client to ask for more details within that family".  With
qemu-nbd as server, LIST_META_CONTEXT([], "") (the empty vector) might
currently cause the server to reply with ["base:allocation",
"qemu:dirty-bitmap:a", "qemu:dirty-bitmap:b"] (all the contexts it
specifically supports on the export ""), while
LIST_META_CONTEXT(["qemu:"], "") would only reply with
["qemu:dirty-bitmap:a", "qemu:dirty-bitmap:b"].  But it would also
have been possible for qemu-nbd to reply to the empty vector query
with ["base:allocation", "qemu:"] ("base:allocation" is supported on
all exports, regardless of name, but instead of spelling out
individual "qemu:..." contexts available for the specific context,
returning the wildcard "qemu:" means that the client must do a further
LIST_META_CONTEXT to learn about specific contexts for a given
export).

Conversely, NBD_OPT_SET_META_CONTEXT does not use wildcards, in either
the client or the server direction.  It is documented as strictly "the
client wants these contexts for this export; the server must supply a
context-to-id mapping for the ones it supports, and that mapping is
then valid until the next NBD_OPT_SET_META_CONTEXT, NBD_OPT_STARTTLS,
or use of NBD_OPT_GO with a different export name".

The nbd_opt_list() function is trying to learn as much information
about a specific export: what it its size and readonly status (from
NBD_OPT_LIST), and what contexts can it support (here,
NBD_OPT_SET_META_CONTEXT guarantees correct answers, but is rather
heavy-handed in setting server state; NBD_OPT_LIST_META_CONTEXT might
return wildcards instead of exact answers, but for nbd_opt_info(),
that's probably good enough).  Then again, since I wrote it without a
callback function, the only way to query which contexts is via
nbd_can_meta_context(), which requires the name-to-id mapping, which
we get from NBD_OPT_SET_META_CONTEXT.

Of course, once I add nbd_set_request_meta_context(), we can skip the
NBD_OPT_SET_META_CONTEXT done by nbd_opt_info(), and let the user
manually use nbd_opt_list_meta_context() for its own stateless
NBD_OPT_LIST_META_CONTEXT query of the supported contexts.

So it may also be worth a new API:

nbd_opt_info_with_contexts(export_name, vector_of_queries, callback)

to be used in place of

nbd_add_meta_context(context)
nbd_opt_info(export_name)
nbd_can_meta_context(context)

for less stateful pollution.

> 
> - NBD_OPT_INFO in particular looks like a useless opcode.

Useless when you already know what export you want to connect to.  But
useful during the 'nbdinfo' app when you want to learn as much as
possible about every export exposed by a single nbd server, while
still remaining in negotiation after each export thus queried.

> NBD_OPT_SET_META_CONTEXT handles the filtering (intersection) and
> mapping to numeric IDs. NBD_OPT_GO transitions the server to the "ready"
> state (completes negotiation). So those two look useful. I don't
> understand what the precise responsibility is that *only* NBD_OPT_INFO
> can cover. The manual at <https://libguestfs.org/nbd_opt_info.3.html>
> says that nbd_opt_info() enables export size querying (for example), but
> that kind of querying is just as possible without nbd_opt_info(); only
> nbd_set_opt_mode() is needed -- see the example code at
> <https://libguestfs.org/nbd_set_opt_mode.3.html>.

That example is only able to print what NBD_OPT_LIST returns (the
export name, and any description the server provides) - some servers
might provide the export size in the human-readable description, but
the only way to get a machine-readable readout of each size is to
further call NBD_OPT_INFO on each name returned by NBD_OPT_LIST.
nbd_set_opt_mode() is what lets it be possible to call nbd_opt_list(),
but it still requires nbd_opt_list() before you can use nbd_get_size()
while still remaining in negotiation mode.

> 
> > 
> > The promised upcoming patch will add:
> > 
> > nbd_set_request_meta_context(false) => Skip the
> > NBD_OPT_SET_META_CONTEXT portion in nbd_opt_info()/nbd_opt_go()
> 
> This makes sense for nbd_opt_go(), as it will then retain a single
> responsibility: transitioning the server to the ready state.
> 
> However, with SET_META_CONTEXT skipped, nbd_opt_go() appears to be
> reduced to sending the NBD_OPT_INFO opcode -- and that one looks
> entirely useless to me (see above).

The state machine for OPT_GO handles both NBD_OPT_INFO and NBD_OPT_GO
under the hood (the two have identical query and response layouts, the
only real difference is that the choice of opcode determines whether
the state machine transitions on to transmission phase or remains back
in negotiation phase).

Similarly, the state machine for OPT_META_CONTEXT handles both
NBD_OPT_LIST_META_CONTEXT and NBD_OPT_SET_META_CONTEXT (same query and
response layouts).

> 
> > 
> > nbd_opt_set_meta_context() => Issue NBD_OPT_SET_META_CONTEXT to the
> > server using h->request_meta_contexts as its query list and
> > h->export_name, and collect the results into both the user's callback
> > and h->meta_contexts at the same time.  Stateful - wipes out any ids
> > returned by an earlier SET_META_CONTEXT, but the ids returned by the
> > server can survive a number of other NBD_OPT_ commands prior to
> > NBD_OPT_GO, provided that NBD_OPT_GO uses the same export name as
> > NBD_OPT_SET_META_CONTEXT.
> 
> Right, so nbd_opt_set_meta_context() is the "first half" extracted from
> the current behavior of nbd_opt_info()/nbd_opt_go(). (Again with my
> personal confusion that the "second half" of nbd_opt_info(), i.e., the
> sending of the NB_OPT_INFO opcode, looks useless.)

For nbd_opt_info(), the second half is NBD_OPT_INFO; for nbd_opt_go(),
the second half is NBD_OPT_GO, but both use the same state machine for
both halves.  NBD_OPT_INFO is useless if you want to do reads or
writes from the export, but useful if you want to merely learn
information about what multiple exports that the server is letting you
choose from.

> 
> > 
> > In that manner, it will become possible for manual option haggling to
> > set the meta contexts at a different point in the negotiation phase,
> > useful when trying to trigger various server corner cases.
> > 
> > The server wipes meta context state any time it gets
> > NBD_OPT_SET_META_CONTEXT or NBD_OPT_GO with a different export name
> > than before.  But since we don't pass in export name as a direct
> > parameter to either nbd_opt_go() or the upcoming
> > nbd_opt_set_meta_context(), we instead need to wipe the state when we
> > change h->export_name (that is, at nbd_set_export_name()).  This does
> > mean that if a client calls nbd_set_export_name(nbd, "a"),
> > nbd_set_meta_context(...), nbd_set_export_name(nbd, "b"),
> > nbd_set_export_name(nbd, "a"), nbd_opt_go(nbd), that the server will
> > not observe the name change, and will actually support the
> > previously-negotiated contexts, but libnbd will have wiped access to
> > those, and trigger client-side failure on attempts to call
> > nbd_block_status() on the grounds that it doesn't remember any
> > negotiated contexts.  But a client actually doing this is unlikely.
> 
> So this is the problem with stateful protocols; we need to remember and
> recognize the points at which the state goes stale and must be cleared.
> I wonder why the NBD protocol was not designed to take as many
> parameters as possible per-request (minimizing the state in both client
> and server), and only turn the kind of information into actual "state"
> that is required by the *bulk* operations (basically the block-level
> operations on the device, such as read, write, zero/trim, etc) where the
> parameter repetitions would have significant bandwidth and computation
> (formatting/parsing) cost. The cost of the negotiation phase looks
> trivial in comparison to the cost of the "ready state" work (IIRC the
> "ready state"); all the cached information does not seem to buy us much.

We have been trying to keep the NBD protocol to track as little state
as possible.  However, at the time we added NBD_CMD_BLOCK_STATUS, the
protocol did not have a way to indicate payload size on transmission
requests, so the solution in the protocol was to have a single
context-to-id stateful mapping set during negotiation by
NBD_OPT_SET_META_CONTEXT that is then used for all subsequent
NBD_CMD_BLOCK_STATUS; if you negotiate multiple contexts, the server
is then obligated to give you an answer for each of those contexts on
every block status request, even if your reason for calling block
status was to only find out information on just one of those contexts
(then again, many clients only negotiate a single context of
"base:allocation", so for those clients, it's not as onerous as it
sounds).

Part of my (ongoing) work at adding 64-bit extensions to the NBD
protocol is extending the transmission phase to actually advertise
payload lengths from the client request, at which point it DOES become
possible to have a variant of NBD_CMD_BLOCK_STATUS where the client
can ask for just one context on a given block status request, even
though both the client and server are aware of more contexts being
available.  But whether that filtering should be in the form of
passing in a string name (that the server must parse on every
NBD_CMD_BLOCK_STATUS which includes a filter payload), or whether the
server should rely on its name-to-id mapping set up once during
NBD_OPT_SET_META_CONTEXT, is still something we could debate when I
post my 64-bit extension patches.

> 
> > 
> >>
> >>> it would become actively dangerous: negotiating meta
> >>> contexts for one export, then switching to a different export name
> >>> before nbd_opt_go, can lead us into calling nbd_cmd_block_status() when
> >>> the server is not expecting it.
> >>
> >> (3) What is nbd_cmd_block_status()? I can only find nbd_block_status().
> >> Do you mean the NBD extent callback that is called once by metacontext?
> > 
> > Typo on my part.  The API nbd_block_status() issues
> > NBD_CMD_BLOCK_STATUS to the server.
> > 
> >>
> >> (4) The word "server" seems like a typo; libnbd is used by NBD clients.
> > 
> > Here, it is intended.  The NBD spec says a client should not issue
> > NBD_OPT_BLOCK_STATUS to the server unless it successfully negotiated
> > NBD_OPT_SET_META_CONTEXT with the same export name later passed to
> > NBD_OPT_GO, without intervening commands that might wipe that state.
> > The client code in nbd_block_status() tries to enforce that it won't
> > confuse the server by checking whether h->meta_contexts is non-NULL
> > before it will issue NBD_CMD_BLOCK_STATUS.  But once we allow manual
> > control over setting meta contexts, rather than tightly coupling it
> > with NBD_OPT_GO, we need to make sure we wipe h->meta_contexts in at
> > least all places the server wipes state (even if we also happen to
> > wipe it in more places, as in my "a"/set/"b"/"a"/go example).
> 
> OK. The wording of your commit message ("actively dangerous") confused
> me, as I didn't expect the client to be able to break the server (for
> any definition of "break"). However, if the spec uses the "SHOULD NOT"
> term, then preventing that sequence of operations in the library makes
> sense.

[1] As part of my v2 patch, I have written a 2-line nbdkit hack patch
that makes nbdkit behave as a server that exposes yet another
pre-existing bug in libnbd: if NBD_OPT_SET_META_CONTEXT starts
replying with a valid name-to-id mapping (NBD_REP_META_CONTEXT), then
fails the overall command for an unrelated reason (NBD_REP_ERR...)
rather than concluding with NBD_REP_ACK; the NBD spec says that in
this case, the client should assume no name-to-id mapping is valid and
NBD_CMD_BLOCK_STATUS should not be attempted; but current libnbd
forgets to clear h->meta_contexts state.  As mentioned in my musings
above about export state being different from context state, it is
nice that nbd_can_meta_context() normally returns 0 after nbd_opt_go()
for all unknown names, even for a server that lacked
NBD_OPT_SET_META_CONTEXT; but I think it makes sense to have
nbd_can_meta_context() actively return -1 for all names even after
nbd_opt_go() if NBD_OPT_SET_META_CONTEXT failed for any reason other
than NBD_REP_ERR_UNSUP (this is a rare server corner case that does
not crop up in normal practice - but is worth diagnosing if it does,
as with my 2-line hacked nbdkit).  It also means I'm also leaning
towards:

nbd_set_opt_mode(true)
nbd_set_request_meta_context(false)
nbd_connect_...()
nbd_opt_go()
nbd_can_meta_context(anything) => -1

where we explicitly indicate that the user asked to take control over
context negotiation, but then failed to call
nbd_opt_set_meta_context(), and therefore nbd_can_meta_context()
should be an error rather than successfully claiming false.

> 
> > 
> >>
> >>>
> >>> While fixing this, code things to check whether we are actually
> >>> swapping export names; an idempotent nbd_set_export_name() is not
> >>> observable by the server, and therefore does not need to invalidate
> >>> what has already been negotiated.
> >>>
> >>
> >> OK.
> >>
> >>> Conversely, calling nbd_opt_list_meta_context() does not need to wipe
> >>> the list - it does not touch h->meta_contexts, but relies on the
> >>> user's callback instead.
> >>
> >> (5) I kind of wish this were a separate patch, but I see the point of
> >> keeping it unified, too.
> > 
> > Maybe I can separate it; the test would be calling nbd_opt_info() to
> > set h->meta_contexts, nbd_can_meta_context() to verify that
> > "base:allocation" is supported by the server,
> > nbd_opt_list_meta_context() to do a query, then nbd_can_meta_context()
> > again to see whether "base:allocation" is still remembered or was
> > accidentally wiped.
> 
> .... Again, anything around nbd_opt_info() keeps confusing me.
> 
> What we actually need here is the NBD_OPT_SET_META_CONTEXT. Currently,
> we can do that in two ways, with nbd_opt_info() and nbd_opt_go(). The
> latter is no good, as it kicks us out of the negotiation phase. The
> former I *dislike*, because it does something *in addition* to
> NBD_OPT_SET_META_CONTEXT that I don't understand -- namely the
> NBD_OPT_INFO opcode.
> 
> Therefore, the new function nbd_opt_set_meta_context() would be *just
> right* for this step, as it exposes NBD_OPT_SET_META_CONTEXT (and the
> population of h->meta_contexts) in isilation. Perhaps introduce the new
> function before updating the test case?

I had originally planned nbd_set_request_meta_context() and
nbd_opt_set_meta_context() to be added in the same patch, but in what
I've written above, I now have sufficient test-case proof that I can
separate those API additions into separate patches for easier review
of incremental improvements.

> 
> > 
> >>
> >> (6) I don't understand the "but". I see no conflict (or even: relevance)
> >> with the user's callback. Listing the meta contexts does not touch
> >> "h->meta_contexts", so we should not clear that list, period. I don't
> >> understand what specifically we rely on the user's callback *for*, for
> >> this particular discussion. If it's important, please elaborate;
> >> otherwise I'm just getting confused! :)
> > 
> > Fair enough.  Basically, nbd_opt_list_meta_context() hands the
> > server's answers to the user's callback instead of by modifications to
> > h->meta_contexts, so it does not impact future nbd_can_meta_context().
> > I'll see if I can reword that a bit.
> 
> Upon re-reading the commit message, after your explanation, I think the
> "but" is fine, but benefits from a bit of expanding:
> 
>     Conversely, calling nbd_opt_list_meta_context() does not need to
>     wipe the list - it does not touch h->meta_contexts, but relies on
>     the user's callback instead, *for collecting the list of those
>     metacontexts that the server acknowledges*.
> 
> How does it sound?

If that wording survives anywhere into my (larger) v2 series, I like
your improved wording and will utilize it.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


More information about the Libguestfs mailing list