[Libguestfs] [PATCH libnbd v2 1/6] api: Synchronous connect waits til all connections are connected.

Eric Blake eblake at redhat.com
Thu May 23 02:21:30 UTC 2019


On 5/22/19 4:50 AM, Richard W.M. Jones wrote:
> If not using multi-conn then obviously the synchronous connection
> calls ‘nbd_connect_unix’, ‘nbd_connect_tcp’ and ‘nbd_connect_command’
> should only return when the (one) connection object is connected.
> 
> In the multi-conn case it's not very clear what these synchronous
> calls should do.  Previously I had it so that they would return as
> soon as at least one connection was connected.  However this is a
> problem if you are using them as a convenient way to set up a
> multi-threaded main loop, because it can be that some of them have not
> finished connecting, but then you issue commands on those connections
> and that will fail.  The failure is pernicious because most of the
> time you won't see it, only if one connection is slow.  So it's
> (probably) better that the synchronous ‘nbd_connect_unix’ and
> ‘nbd_connect_tcp’ should connect every connection object before
> returning.
> 
> For ‘nbd_connect_command’, it essentially ignored multi-conn anyway,
> and I changed it so that it waits for conn[0] to get connected and
> returns, the other connections (if they exist) being ignored.  It
> should probably be an error for the user to enable multi-conn on the
> handle and then use nbd_connect_command, but I did not make that
> change yet.

Ouch - I just realized that we created a different problem. If the
server does not add multi-conn because it does NOT permit parallel
connections (such as 'nbdkit --filter=noparallel memory 1m
serialize=connections'), then a client requesting
nbd_set_multi_conn(nbd, 2) will now hang because we will never be able
to move all our connections through handshake. While we can still try to
fire off multiple connections at once, we need to add some logic in
nbd_connect_unix() and nbd_connect_tcp() to check can_multi_conn after
the FIRST connection completes (whichever one wins), and if
!can_multi_conn(), it would be nice if we could automatically kill off
the losers, swap the winning connection into conns[0], and then behave
as if we implicitly called set_multi_conn(nbd, 1), so that the
synchronous connect command will succeed after all. In such a case, the
user can learn after the fact via nbd_can_multi_conn and
nbd_get_multi_conn that things were changed because multi_conn was not
supported after all.

Or maybe we want both 'set_multi_conn' (connect hard-fails if the FIRST
connection reports !can_multi_conn) vs. 'request_multi_conn' (connect
downgrades gracefully), similar to how request_meta_context downgrades
gracefully if the server lacks SR or a particular meta context.

On IRC we were chatting about how multi-conn > 1 makes no sense with
connect_command() (where we are limited to exactly one channel over
stdin/stdout), so it may also be worth adding some logic to have
connect_command fail if set_multi_conn is > 1, and to likewise have
set_multi_conn(2) fail if the existing connection is already started in
command mode (do we even track which mode a commands started in, and/or
insist that all connections used the same nbd_connect_*? Or can you
mix-and-match a TCP and a Unix socket into a multi-conn struct nbd_handle?)

On the other hand, a user that is aware of multi-conn being optional may
manually try to make one connection, check can_multi_conn, and THEN call
set_multi_conn(2), with some expectation of being able to then connect
the remaining slots (again, if we stored the nbd_connect_* used by the
FIRST connection, then all the remaining connections requested could
automatically be fired off to start connecting to the same server - and
you'd need a synchronous way to wait for them all to reach stable
state).  Perhaps that's even a reason to have two separate API - one
that is callable only while you are in created state (a request of what
to try if the server supports it), and the other callable only after
your single connection has completed handshake (to then expand to
actually get the additional connections now that it is known to be
safe).  At any rate, I don't think we're done with the design in this area.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libguestfs/attachments/20190522/c7ae7b18/attachment.sig>


More information about the Libguestfs mailing list