[Libguestfs] [PATCH libnbd 2/2] api: Implement local command with systemd socket activation.

Fri Sep 27 12:28:10 UTC 2019

On Thu, Sep 26, 2019 at 03:56:27PM -0500, Eric Blake wrote:
> >+  /* Use /tmp instead of TMPDIR because we must ensure the path is
> >+   * short enough to store in the sockaddr_un.  On some platforms this
> >+   * may cause problems so we may need to revisit it.  XXX
> >+   */
> 
> Is the use of socketpair() any better than creating a socket under /tmp?

AIUI socketpair() can't be used to create a listening socket.  Systemd
socket activation has two modes -- one where you pass an already
accepted socket, and one where you pass a listening socket and the
server accepts connections until it is killed.  In nbdkit and qemu-nbd
we are using the latter one.

...
> If we fail here or later, do/should we try to clean up the
> /tmp/libnbdXXX directory created earlier?
> 
> /me reads ahead - nbd_close tries to address it
> 
> Still, if we fail at this point, h->sa_sockpath is set but not yet
> created [1]

I modified the code so that it doesn't try to delete literal
/tmp/libnbdXXXXXX on some error paths and should be more robust.

> >+      close (s);
> >+    }
> >+    else {
> >+      /* We must unset CLOEXEC on the fd.  (dup2 above does this
> >+       * implicitly because CLOEXEC is set on the fd, not on the
> >+       * socket).
> >+       */
> >+      flags = fcntl (s, F_GETFD, 0);
> >+      if (flags == -1) {
> >+        perror ("fcntl: F_GETFD");
> 
> perror is not async-signal-safe.  Calling it in a child of a
> potentially-multi-threaded parent is therefore prone to deadlock, if
> perror() triggers a request to grab any resource that was left
> locked by a different thread holding the lock but stranded by the
> fork.

I suspect these sort of problems are unsolvable.  I've studied enough
customer bug reports where we've had to reverse engineer exactly how
something failed given only a slim error log to know that error
messages are always useful and I wouldn't want to remove these.
However because of the chance of deadlock, how about a deadlock-free
function that write(2)'s the message + errno to stderr?

> >+        _exit (EXIT_FAILURE);
> >+      }
> >+      if (fcntl (s, F_SETFD, flags & ~FD_CLOEXEC) == -1) {
> >+        perror ("fcntl: F_SETFD");
> >+        _exit (EXIT_FAILURE);
> >+      }
> 
> About all we can _safely_ do is let our _exit() status inform the
> parent process that something failed, and let the parent process
> print the error message.  But even passing errno as the _exit()
> value is not portable (GNU Hurd errno values are intentionally
> larger than what fit in 8-bit returns that the parent would see).
> It's also possible to set up a cloexec pipe (in addition to
> everything else) so that the child can write() error details into
> the pipe, but that's a lot of effort compared to just blindly
> stating that the child failed but declaring full details of why as
> lost.
> 
> >+    }
> >+
> >+    snprintf (listen_pid, sizeof listen_pid, "%ld", (long) getpid ());
> >+    setenv ("LISTEN_PID", listen_pid, 1);
> >+    setenv ("LISTEN_FDS", "1", 1);

And we could use similar code here.

> snprintf() and setenv() are also not async-signal-safe.  Which is a
> bummer, since you really don't know the child pid until in the
> child. You could open-code the translation of a decimal number into
> a buffer without snprintf, but you also have to figure out how to
> safely put it into the environ seen by the child.  How does systemd
> do it?

For setenv(), systemd does this by building a char **environ which is
passed to execve.  The code is complicated to say the least:

https://github.com/systemd/systemd/blob/5ac1530eca652cd16819853fe06e76e156f5cf5e/src/core/dbus-execute.c

> >+    _exit (EXIT_FAILURE);
> 
> Is EXIT_FAILURE the best here, or should we consider exiting with
> status 126/127 to match what other programs (sh, env, nice, nohup,
> ...) do when execvp() fails?

Yes (this is also a bug elsewhere).  Not sure if 126 or 127 is
right:

https://www.gnu.org/software/bash/manual/html_node/Exit-Status.html

...
> Do we document that nbd_close() may block?

We should do.  I've added it in a separate patch.

I'll have a rethink about this patch and come back with something
better in a while.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW