[Libguestfs] [PATCH libnbd 2/2] api: Implement local command with systemd socket activation.
Eric Blake
eblake at redhat.com
Thu Sep 26 20:56:27 UTC 2019
On 9/26/19 11:40 AM, Richard W.M. Jones wrote:
> This adds new APIs for running a local NBD server and connecting to it
> using systemd socket activation (instead of stdin/stdout).
>
> This includes interop tests against nbdkit and qemu-nbd which I
> believe are the only NBD servers supporting socket activation. (If we
> find others then we can add more interop tests in future.)
>
> The upstream spec for systemd socket activation is here:
> http://0pointer.de/blog/projects/socket-activation.html
> ---
> +++ b/generator/states-connect.c
> @@ -37,6 +37,9 @@
> #include <sys/socket.h>
> #include <sys/un.h>
>
> +/* This is baked into the systemd socket activation API. */
> +#define FIRST_SOCKET_ACTIVATION_FD 3
> +
> /* Disable Nagle's algorithm on the socket, but don't fail. */
> static void
> disable_nagle (int sock)
> @@ -292,4 +295,110 @@ disable_nagle (int sock)
> SET_NEXT_STATE (%^MAGIC.START);
> return 0;
>
> + CONNECT_SA.START:
> + /* This is in some ways similar to nbd_aio_connect_command above,
> + * but we must pass a listening socket to the child process and
> + * we must set LISTEN_PID and LISTEN_FDS environment variables.
> + */
> + size_t len;
> + int s;
> + struct sockaddr_un addr;
> + pid_t pid;
> + int flags;
> + char listen_pid[16];
> +
> + assert (!h->sock);
> + assert (h->argv);
> + assert (h->argv[0]);
> +
> + /* Use /tmp instead of TMPDIR because we must ensure the path is
> + * short enough to store in the sockaddr_un. On some platforms this
> + * may cause problems so we may need to revisit it. XXX
> + */
Is the use of socketpair() any better than creating a socket under /tmp?
> + h->sa_tmpdir = strdup ("/tmp/libnbdXXXXXX");
> + h->sa_sockpath = strdup ("/tmp/libnbdXXXXXX/sock");
> + if (h->sa_tmpdir == NULL || h->sa_sockpath == NULL) {
> + SET_NEXT_STATE (%.DEAD);
> + set_error (errno, "strdup");
> + return 0;
> + }
> +
> + if (mkdtemp (h->sa_tmpdir) == NULL) {
> + SET_NEXT_STATE (%.DEAD);
> + set_error (errno, "mkdtemp");
> + return 0;
> + }
> + len = strlen (h->sa_tmpdir);
> + memcpy (h->sa_sockpath, h->sa_tmpdir, len);
> +
> + s = socket (AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0);
> + if (s == -1) {
> + SET_NEXT_STATE (%.DEAD);
> + set_error (errno, "socket");
> + return 0;
> + }
If we fail here or later, do/should we try to clean up the
/tmp/libnbdXXX directory created earlier?
/me reads ahead - nbd_close tries to address it
Still, if we fail at this point, h->sa_sockpath is set but not yet
created [1]
> +
> + addr.sun_family = AF_UNIX;
> + memcpy (addr.sun_path, h->sa_sockpath, strlen (h->sa_sockpath) + 1);
> + if (bind (s, (struct sockaddr *) &addr, sizeof addr) == -1) {
> + SET_NEXT_STATE (%.DEAD);
> + set_error (errno, "bind: %s", h->sa_sockpath);
> + return 0;
> + }
> +
> + if (listen (s, 1) == -1) {
> + SET_NEXT_STATE (%.DEAD);
> + set_error (errno, "listen");
> + return 0;
> + }
> +
> + pid = fork ();
> + if (pid == -1) {
> + SET_NEXT_STATE (%.DEAD);
> + set_error (errno, "fork");
> + close (s);
> + return 0;
> + }
> + if (pid == 0) { /* child - run command */
> + if (s != FIRST_SOCKET_ACTIVATION_FD) {
> + dup2 (s, FIRST_SOCKET_ACTIVATION_FD);
No check for errors? But that's probably okay in this instance (we know
that we won't fail with EBADF, for example).
> + close (s);
> + }
> + else {
> + /* We must unset CLOEXEC on the fd. (dup2 above does this
> + * implicitly because CLOEXEC is set on the fd, not on the
> + * socket).
> + */
> + flags = fcntl (s, F_GETFD, 0);
> + if (flags == -1) {
> + perror ("fcntl: F_GETFD");
perror is not async-signal-safe. Calling it in a child of a
potentially-multi-threaded parent is therefore prone to deadlock, if
perror() triggers a request to grab any resource that was left locked by
a different thread holding the lock but stranded by the fork.
> + _exit (EXIT_FAILURE);
> + }
> + if (fcntl (s, F_SETFD, flags & ~FD_CLOEXEC) == -1) {
> + perror ("fcntl: F_SETFD");
> + _exit (EXIT_FAILURE);
> + }
About all we can _safely_ do is let our _exit() status inform the parent
process that something failed, and let the parent process print the
error message. But even passing errno as the _exit() value is not
portable (GNU Hurd errno values are intentionally larger than what fit
in 8-bit returns that the parent would see). It's also possible to set
up a cloexec pipe (in addition to everything else) so that the child can
write() error details into the pipe, but that's a lot of effort compared
to just blindly stating that the child failed but declaring full details
of why as lost.
> + }
> +
> + snprintf (listen_pid, sizeof listen_pid, "%ld", (long) getpid ());
> + setenv ("LISTEN_PID", listen_pid, 1);
> + setenv ("LISTEN_FDS", "1", 1);
snprintf() and setenv() are also not async-signal-safe. Which is a
bummer, since you really don't know the child pid until in the child.
You could open-code the translation of a decimal number into a buffer
without snprintf, but you also have to figure out how to safely put it
into the environ seen by the child. How does systemd do it?
We _could_ create a helper executable, where we fork() into the child
process, exec() the helper to overwrite the child pid with something
that is no longer constrained by the limits of async-safety, so that the
helper app can then use snprintf/setenv at will before re-execing again
with the real target executable. The next question is how worried are
we about actually being hit by an non-async-signal-safe hang. What you
have works 99.9% of the time, but if we are trying to be a library that
is usable by any application, we have to decide if that is good enough.
> +
> + /* Restore SIGPIPE back to SIG_DFL. */
> + signal (SIGPIPE, SIG_DFL);
> +
> + execvp (h->argv[0], h->argv);
> + perror (h->argv[0]);
and another unsafe perror.
> + _exit (EXIT_FAILURE);
Is EXIT_FAILURE the best here, or should we consider exiting with status
126/127 to match what other programs (sh, env, nice, nohup, ...) do when
execvp() fails?
> + }
> +
> + /* Parent. */
> + close (s);
> + h->pid = pid;
> +
> + h->connaddrlen = sizeof addr;
> + memcpy (&h->connaddr, &addr, h->connaddrlen);
> + SET_NEXT_STATE (%^CONNECT.START);
> + return 0;
> +
> } /* END STATE MACHINE */
> +++ b/interop/socket-activation.c
> +#define SIZE 1048576
> +
> +int
> +main (int argc, char *argv[])
> +{
> + struct nbd_handle *nbd;
> + char tmpfile[] = "/tmp/nbdXXXXXX";
> + int fd;
> + int64_t actual_size;
> + char buf[512];
> + int r = -1;
> +
> + /* Create a large sparse temporary file. */
> + fd = mkstemp (tmpfile);
> + if (fd == -1 ||
> + ftruncate (fd, SIZE) == -1 ||
> + close (fd) == -1) {
> + perror (tmpfile);
> + goto out;
> + }
Is it worth populating anything in the temp file...
> +
> + nbd = nbd_create ();
> + if (nbd == NULL) {
> + fprintf (stderr, "%s\n", nbd_get_error ());
> + goto out;
> + }
> +
> + char *args[] = { SERVER, SERVER_PARAMS, NULL };
> + if (nbd_connect_socket_activation (nbd, args) == -1) {
> + fprintf (stderr, "%s\n", nbd_get_error ());
> + goto out;
> + }
> +
> + actual_size = nbd_get_size (nbd);
> + if (actual_size == -1) {
> + fprintf (stderr, "%s\n", nbd_get_error ());
> + goto out;
> + }
> + if (actual_size != SIZE) {
> + fprintf (stderr, "%s: actual size %" PRIi64 " <> expected size %d",
> + argv[0], actual_size, SIZE);
> + goto out;
> + }
> +
> + if (nbd_pread (nbd, buf, sizeof buf, 0, 0) == -1) {
> + fprintf (stderr, "%s\n", nbd_get_error ());
> + goto out;
> + }
...and checking that we read it back out? But even a successful read
without checking contents is pretty good evidence that the socket
activation worked.
> +
> + if (nbd_shutdown (nbd, 0) == -1) {
> + fprintf (stderr, "%s\n", nbd_get_error ());
> + goto out;
> + }
> +
> + nbd_close (nbd);
> +
> + r = 0;
> + out:
> + unlink (tmpfile);
> +
> + exit (r == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
> +}
> diff --git a/lib/connect.c b/lib/connect.c
> index f98bcdb..c1cbef7 100644
> --- a/lib/connect.c
> +++ b/lib/handle.c
> @@ -129,6 +129,16 @@ nbd_close (struct nbd_handle *h)
> free_cmd_list (h->cmds_in_flight);
> free_cmd_list (h->cmds_done);
> nbd_internal_free_string_list (h->argv);
> + if (h->sa_sockpath) {
> + if (h->pid > 0)
> + kill (h->pid, SIGTERM);
Are we sure that SIGTERM Is always going to be sufficient? Or do we
need a tiered approach where we try SIGTERM, but followup with SIGKILL
if too much time elapses? I guess it is the same situation for
nbd_connect_command, does that mean we should have a common helper
function for doing appropriate cleanup?
Do we document that nbd_close() may block?
> + unlink (h->sa_sockpath);
[1] we can end up calling unlink() on a path that does not exist. But
since we aren't checking error status here, we are effectively ignoring
the ENOENT in that case.
> + free (h->sa_sockpath);
> + }
> + if (h->sa_tmpdir) {
> + rmdir (h->sa_tmpdir);
> + free (h->sa_tmpdir);
> + }
> free (h->unixsocket);
> free (h->hostname);
> free (h->port);
> diff --git a/lib/internal.h b/lib/internal.h
> index bdb0e83..bf8b9aa 100644
> --- a/lib/internal.h
> +++ b/lib/internal.h
> @@ -188,6 +188,12 @@ struct nbd_handle {
> char **argv;
> pid_t pid;
>
> + /* When using systemd socket activation, this directory and socket
> + * must be deleted, and the pid above must be killed.
> + */
> + char *sa_tmpdir;
> + char *sa_sockpath;
> +
> /* When connecting to Unix domain socket. */
> char *unixsocket;
>
>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
More information about the Libguestfs
mailing list