[Libguestfs] [PATCH libnbd v2 1/6] api: Synchronous connect waits til all connections are connected.
Richard W.M. Jones
rjones at redhat.com
Thu May 23 08:05:29 UTC 2019
On Wed, May 22, 2019 at 09:21:30PM -0500, Eric Blake wrote:
> On 5/22/19 4:50 AM, Richard W.M. Jones wrote:
> > If not using multi-conn then obviously the synchronous connection
> > calls ‘nbd_connect_unix’, ‘nbd_connect_tcp’ and ‘nbd_connect_command’
> > should only return when the (one) connection object is connected.
> >
> > In the multi-conn case it's not very clear what these synchronous
> > calls should do. Previously I had it so that they would return as
> > soon as at least one connection was connected. However this is a
> > problem if you are using them as a convenient way to set up a
> > multi-threaded main loop, because it can be that some of them have not
> > finished connecting, but then you issue commands on those connections
> > and that will fail. The failure is pernicious because most of the
> > time you won't see it, only if one connection is slow. So it's
> > (probably) better that the synchronous ‘nbd_connect_unix’ and
> > ‘nbd_connect_tcp’ should connect every connection object before
> > returning.
> >
> > For ‘nbd_connect_command’, it essentially ignored multi-conn anyway,
> > and I changed it so that it waits for conn[0] to get connected and
> > returns, the other connections (if they exist) being ignored. It
> > should probably be an error for the user to enable multi-conn on the
> > handle and then use nbd_connect_command, but I did not make that
> > change yet.
>
> Ouch - I just realized that we created a different problem. If the
> server does not add multi-conn because it does NOT permit parallel
> connections (such as 'nbdkit --filter=noparallel memory 1m
> serialize=connections'), then a client requesting
> nbd_set_multi_conn(nbd, 2) will now hang because we will never be able
> to move all our connections through handshake. While we can still try to
> fire off multiple connections at once, we need to add some logic in
> nbd_connect_unix() and nbd_connect_tcp() to check can_multi_conn after
> the FIRST connection completes (whichever one wins), and if
> !can_multi_conn(), it would be nice if we could automatically kill off
> the losers, swap the winning connection into conns[0], and then behave
> as if we implicitly called set_multi_conn(nbd, 1), so that the
> synchronous connect command will succeed after all. In such a case, the
> user can learn after the fact via nbd_can_multi_conn and
> nbd_get_multi_conn that things were changed because multi_conn was not
> supported after all.
It's somewhat unavoidable given the way that the NBD protocol itself
works. While it's a pain, below is the logic that I came up with in
the forthcoming nbdtool. nr_multi_conn is a command line parameter to
select multi_conn, with the desired behaviour being to fall back to
single threaded if multi-conn is not supported. Also this code will
become shorter and simpler when we get URL support:
static void
connect_to (struct nbd_handle *nbd, char *arg)
{
  char *p = strchr (arg, ':');
  int r;

  if (p) {
    *p = '\0';
    r = nbd_connect_tcp (nbd, arg, p+1);
  }
  else
    r = nbd_connect_unix (nbd, arg);
  if (r == -1) NBD_ERROR_EXIT;

  /* Check if the connected handle supports multi-conn, and enable or
   * disable it.
   */
  if (nr_multi_conn == 1)
    return;
  if (nbd_can_multi_conn (nbd) == 1) {
    if (nbd_set_multi_conn (nbd, nr_multi_conn) == -1)
      NBD_ERROR_EXIT;
    /* Now make the other connections on the handle. */
    if (p)
      r = nbd_connect_tcp (nbd, arg, p+1);
    else
      r = nbd_connect_unix (nbd, arg);
    if (r == -1) NBD_ERROR_EXIT;
  }
  else
    nr_multi_conn = 1;
}
(In fact it was this code which provoked the error that required
this fix:
https://www.redhat.com/archives/libguestfs/2019-May/msg00134.html )
However that still leaves this problem:
> Or maybe we want both 'set_multi_conn' (connect hard-fails if the FIRST
> connection reports !can_multi_conn) vs. 'request_multi_conn' (connect
> downgrades gracefully), similar to how request_meta_context downgrades
> gracefully if the server lacks SR or a particular meta context.
IOW:
* What if the user sets multi-conn > 1, does NOT check can_multi_conn
after connecting, and the server allows multiple connections but
does not give the true multi-conn guarantees?
That results in a data corruption issue. So I think that connections
should fail if we get to the point where we have read the eflags,
multi-conn is not set, but h->multi_conn > 1. I've put this on my
to-do list.
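The handshake-time check described above can be sketched as a pure
predicate over the export flags.  This is only an illustration of the
rule, not libnbd's actual internals; the function name is invented,
but NBD_FLAG_CAN_MULTI_CONN is the real transmission flag (bit 8)
from the NBD protocol:

```c
#include <stdint.h>
#include <stdbool.h>

/* From the NBD protocol: transmission flag bit 8. */
#define NBD_FLAG_CAN_MULTI_CONN (1 << 8)

/* Hypothetical sketch: after the export flags (eflags) have been read
 * during handshake, fail the connection if the client requested
 * multi-conn (requested > 1) but the server did not advertise the
 * multi-conn guarantee.
 */
static bool
multi_conn_handshake_ok (uint16_t eflags, unsigned requested)
{
  if (requested <= 1)
    return true;                /* a single connection is always fine */
  return (eflags & NBD_FLAG_CAN_MULTI_CONN) != 0;
}
```

Failing hard here, rather than silently proceeding, is what prevents
the data-corruption scenario: the caller finds out at connect time
that the guarantee is absent.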
Requesting multi-conn optionally is harder: we could set
h->multi_conn = 1, swap the winning connection object into
h->conns[0] and close the others, but I think that's very tricksy.
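The "swap the winner into conns[0]" idea can be illustrated in
isolation.  The types and function below are invented for the sketch
and bear no relation to libnbd's real internals; the point is only the
shape of the graceful downgrade:

```c
#include <stddef.h>

/* Invented stand-in for a connection object. */
struct sketch_conn { int fd; };

/* Hypothetical sketch: one connection (the winner) finished handshake
 * first and the server turned out not to support multi-conn.  Move the
 * winner into slot 0 and drop the rest, leaving a handle that behaves
 * as if set_multi_conn (nbd, 1) had been called.  Returns the new
 * connection count.
 */
static size_t
downgrade_to_single (struct sketch_conn **conns, size_t nconns,
                     size_t winner)
{
  struct sketch_conn *w = conns[winner];

  /* Real code would close conns[i] for every i != winner here. */
  conns[0] = w;
  for (size_t i = 1; i < nconns; ++i)
    conns[i] = NULL;
  return 1;
}
```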
> On IRC we were chatting about how multi-conn > 1 makes no sense with
> connect_command() (where we are limited to exactly one channel over
> stdin/stdout), so it may also be worth adding some logic to have
> connect_command fail if set_multi_conn is > 1,
Also added to my to-do list.
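That check could amount to a one-line guard at the top of
nbd_connect_command.  The struct and function below are invented for
the sketch (libnbd's real handle and error reporting differ); it shows
only the intended rule, that multi-conn > 1 makes no sense over a
single stdin/stdout channel:

```c
#include <errno.h>

/* Invented stand-in for the parts of the handle the check needs. */
struct sketch_handle {
  unsigned multi_conn;          /* value from set_multi_conn */
};

/* Hypothetical sketch: reject connect_command when multi-conn was
 * requested, since a command server gives us exactly one channel.
 */
static int
sketch_connect_command_check (const struct sketch_handle *h)
{
  if (h->multi_conn > 1) {
    errno = EINVAL;             /* multi-conn cannot apply here */
    return -1;
  }
  return 0;                     /* OK to proceed with the connect */
}
```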
> and to likewise have
> set_multi_conn(2) fail if the existing connection is already started in
> command mode (do we even track which mode a commands started in, and/or
> insist that all connections used the same nbd_connect_*? Or can you
> mix-and-match a TCP and a Unix socket into a multi-conn struct nbd_handle?)
We don't track any of this, and yes it's possible.
> On the other hand, a user that is aware of multi-conn being optional may
> manually try to make one connection, check can_multi_conn, and THEN call
> set_multi_conn(2), with some expectation of being able to then connect
> the remaining slots (again, if we stored the nbd_connect_* used by the
> FIRST connection, then all the remaining connections requested could
> automatically be fired off to start connecting to the same server - and
> you'd need a synchronous way to wait for them all to reach stable
> state). Perhaps that's even a reason to have two separate API - one
> that is callable only while you are in created state (a request of what
> to try if the server supports it), and the other callable only after
> your single connection has completed handshake (to then expand to
> actually get the additional connections now that it is known to be
> safe). At any rate, I don't think we're done with the design in this area.
That's what the code does above.
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top