[Libguestfs] [PATCH libnbd 4/4] lib: Atomically update h->state when leaving the locked region.

Wed Jun 5 15:23:57 UTC 2019

On 6/5/19 6:15 AM, Richard W.M. Jones wrote:
> Split h->state into:
> 
>  - h->state = the state on entry to the locked region
> 
>  - h->next_state = the current state and what the "publicly visible"
>    state will become when we leave the locked region

The rest of this thread discusses potential other names, such as
h->public_state for the state visible to unlocked code and h->state for
internal while still locked; I trust that once you pick a naming scheme
you like, you can redo this patch accordingly. I'll go ahead and review
the changes as spelled here to at least see if I can spot any problems.

> 
> Some calls to get_state become calls to get_next_state depending on
> which of these they are trying to read.  Calls to set_state become
> set_next_state because that is what gets updated.
> 
> When we leave the locked region we update h->state.
> 
> The purpose of this patch is to make it easier to reason about the
> state in lockless code.

Lockless code (including nbd_aio_get_direction) can no longer see
intermediate states that did not block, which is a lot nicer for
deciding whether we will accidentally let the user's poll() loop block
forever if it didn't see that we were blocked on POLLIN or POLLOUT.

There's still always a TOCTTOU race (anywhere the user reads the public
state while another thread holds the lock, their future actions based on
the observed state may be unexpected by the time they actually grab the
lock because the other thread changed state), as well as the issue I
mentioned in on 3/4 about any consecutive user calls to nbd_aio_is_*
having a race where the two calls can be done on different public states.

I'm still thinking that for any functions that grab the lock but want to
do gating on state MUST save the public state check until after grabbing
the lock, not beforehand, to avoid the TOCTTOU race.

> ---
>  generator/generator | 23 +++++++++++++----------
>  lib/connect.c       | 10 +++++-----
>  lib/disconnect.c    |  8 ++++----
>  lib/handle.c        |  2 +-
>  lib/internal.h      | 15 +++++++++++++--
>  lib/rw.c            |  4 ++--
>  6 files changed, 38 insertions(+), 24 deletions(-)
> 

> @@ -2462,7 +2462,7 @@ let generate_lib_states_c () =
>    pr "  }\n";
>    pr "\n";
>    pr "  set_error (EINVAL, \"external event %%d is invalid in state %%s\",\n";
> -  pr "             ev, nbd_internal_state_short_string (get_state (h)));\n";
> +  pr "             ev, nbd_internal_state_short_string (get_next_state (h)));\n";

I'm also wondering if we should treat external events more like
interrupts, where calling nbd_aio_notify_read() in an incorrect state
succeeds by setting a flag that is not cleared until we later reach a
state that can actually care about the flag, rather than by rejecting
the call out-right for being in the wrong state.  (That is, states where
we are NOT expecting a notify_read() are the same as setting an IRQ mask
that we don't want to be interrupted by a a read notification yet, but
we don't want to lose the interrupt when we later clear the mask)

In fact, CmdIssue is somewhat already like this - the code determines if
the state machine is unmasked (in the READY state) to kick it off
immediately; or is masked (in a processing state) to queue things but
leave the interrupt set (the next time we enter READY, we unmask,
observe the queue is non-empty, and so fire off the next command).

Another thought - if we ever want to allow the user the ability to send
custom NBD_OPT_ commands during handshake, or even to emulate 'qemu-nbd
--list' where the client can make queries to see what the server
supports before finally settling on whether to run NBD_OPT_GO or
NBD_OPT_ABORT, we'll need to add an external event after nbd_aio_connect
but before nbd_aio_is_connected for doing those additional handshake
steps. It's easier to think about adding a mandatory nbd_aio_go() called
in between nbd_aio_connect*() and the first nbd_aio_pread now, before we
have to start worrying about back-compat issues to all existing AIO clients.

> @@ -2866,8 +2866,11 @@ let generate_lib_api_c () =

As said above, I think you're still missing a change at the beginning of
the wrapper that grabs the lock to do the state check after the lock.

>      let argnames = List.flatten (List.map name_of_arg args) in
>      List.iter (pr ", %s") argnames;
>      pr ");\n";
> -    if is_locked then
> -      pr "  pthread_mutex_unlock (&h->lock);\n";
> +    if is_locked then (
> +      pr "  if (h->state != h->next_state)\n";
> +      pr "    h->state = h->next_state;\n";
> +      pr "  pthread_mutex_unlock (&h->lock);\n"
> +    );
>      pr "  return ret;\n";
>      pr "}\n";
>      pr "\n";
> diff --git a/lib/connect.c b/lib/connect.c
> index b889f80..4e3141f 100644
> --- a/lib/connect.c

> +++ b/lib/internal.h
> @@ -80,7 +80,17 @@ struct nbd_handle {
>    /* Linked list of close callbacks. */
>    struct close_callback *close_callbacks;
>  
> -  _Atomic enum state state;     /* State machine. */
> +  /* State machine.
> +   *
> +   * The actual current state is ‘next_state’.  ‘state’ is updated
> +   * before we release the lock.
> +   *
> +   * Note don't access these fields directly, use the SET_NEXT_STATE
> +   * macro in generator/states* code, or the set_next_state,
> +   * get_next_state and get_state macros in regular code.
> +   */
> +  _Atomic enum state state;
> +  enum state next_state;
>  

Otherwise the patch looks like you caught the right places.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libguestfs/attachments/20190605/2c50b844/attachment.sig>