[Libguestfs] [libnbd PATCH v4 1/3] lib/utils: introduce xwritel() as a more robust and convenient write()

Eric Blake eblake at redhat.com
Wed Mar 15 14:01:40 UTC 2023


On Wed, Mar 15, 2023 at 12:01:55PM +0100, Laszlo Ersek wrote:
> While nbd_internal_fork_safe_perror() must indeed call write(), and
> arguably justifiedly ignores the return value of write(), we can still
> make the write operations slightly more robust and convenient. Let's do
> that by introducing xwritel():
> 
> - Let the caller pass a list of NUL-terminated strings, via stdarg /
>   ellipsis notation in the parameter list.
> 
> - Handle partial writes.
> 
> - Cope with EINTR and EAGAIN errors. (A busy loop around EAGAIN on a
>   non-blocking file is not great in the general case, but it's good enough
>   for writing out diagnostics before giving up.)
> 
> - In the common case, handle an nbd_internal_fork_safe_perror() call with
>   a single xwritel() -> writev() call chain, rather than with four
>   separate write() calls. In practice, this tends to make the error
>   message written to a regular file contiguous, even if other threads are
>   writing to the same file. Multiple separate write() calls tend to
>   interleave chunks of data from different threads.
> 
> As a side bonus, remove the path in nbd_internal_fork_safe_perror() where
> at least one of the first two write() syscalls fails, and overwrites
> "errno", before we get to formatting the error string from "errno".

All nice benefits, even if we don't normally exercise the code.

> 
> Thanks to Eric Blake for helping me understand the scope of Austin Group
> bug reports.
> 
> Signed-off-by: Laszlo Ersek <lersek at redhat.com>
> ---
> 
> Notes:
>     v4:
>     
>     - Rework with <stdarg.h> and writev().
>     
>     - Don't split the output into chunks of SSIZE_MAX bytes. In v3, the goal
>       of that chunking was to avoid implementation-defined behavior.
>       However, POSIX requires writev() to fail cleanly when more than
>       SSIZE_MAX bytes would be transfered in a single call. Hence the
>       original goal (avoiding implementation-defined behavior) is ensured
>       simply by switching to writev(). The SSIZE_MAX limit is not expected
>       to be hit in practice (_POSIX_SSIZE_MAX is 32767).

Concur.

>     
>     - As a "bonus", remove the pre-patch possibility to trample "errno"
>       before formatting the error string.

Nice find; and one I missed in my earlier review.

>     
>     - Refresh the commit message.
>     
>     - The "contiguous output" from a single xwritel() -> writev() call (as
>       opposed to the "interleaved output" from multiple xwrite() -> write()
>       calls in v3) is easily testable in practice (on my end) with the
>       following small patch, even though this "contiguity" is of course not
>       guaranteed:
>     
>     > diff --git a/generator/states-connect-socket-activation.c b/generator/states-connect-socket-activation.c
>     > index e4b3b565ae2e..c66c638d331f 100644
>     > --- a/generator/states-connect-socket-activation.c
>     > +++ b/generator/states-connect-socket-activation.c
>     > @@ -179,6 +179,8 @@  CONNECT_SA.START:
>     >         * socket).
>     >         */
>     >        int flags = fcntl (s, F_GETFD, 0);
>     > +      flags = -1;
>     > +      errno = EBADF;
>     >        if (flags == -1) {
>     >          nbd_internal_fork_safe_perror ("fcntl: F_GETFD");
>     >          _exit (126);
>     
>       It results in the following snippet in
>       "tests/connect-systemd-socket-activation.log":
>     
>     > libnbd: debug: nbd1: nbd_connect_systemd_socket_activation: transition: CONNECT.CONNECTING -> MAGIC.START
>     > fcntl: F_GETFD: Bad file descriptor
>     > libnbd: debug: nbd1: nbd_connect_systemd_socket_activation: transition: MAGIC.START -> MAGIC.RECV_MAGIC
>     
>       Note that the child process's output is on an isolated line.

Without the patch, it had a high likelihood (but not guarantee) of
interleaving.  And writev() is not a bulletproof avoidance of
interleaving (if we hit a short write and retry the tail, interleaving
is possible) - but we are very unlikely to see that in practice.

>     
>     - Do not pick up R-b tags from Eric and Rich due to significant changes
>       in v4.
>     
>     context:-U5
> 
>  lib/utils.c | 87 +++++++++++++++++++-
>  1 file changed, 83 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/utils.c b/lib/utils.c
> index 6df4f14ce9f4..62b4bfdda5c3 100644
> --- a/lib/utils.c
> +++ b/lib/utils.c
> @@ -23,11 +23,14 @@
>  #include <string.h>
>  #include <unistd.h>
>  #include <ctype.h>
>  #include <errno.h>
>  #include <fcntl.h>
> +#include <stdarg.h>
> +#include <sys/uio.h>
>  
> +#include "array-size.h"
>  #include "minmax.h"
>  
>  #include "internal.h"
>  
>  void
> @@ -179,33 +182,109 @@ nbd_internal_fork_safe_itoa (long v, char *buf, size_t bufsize)
>  #if defined (__GNUC__)
>  #pragma GCC diagnostic push
>  #pragma GCC diagnostic ignored "-Wunused-result"
>  #endif
>  
> +/* "Best effort" function for writing out a list of NUL-terminated strings to a
> + * file descriptor (without the NUL-terminators). The list is terminated with
> + * (char *)NULL. Partial writes, and EINTR and EAGAIN failures are handled
> + * internally. No value is returned; only call this function for writing
> + * diagnostic data on error paths, when giving up on a higher-level action
> + * anyway.
> + *
> + * No more than 16 strings, excluding the NULL terminator, will be written. (As
> + * of POSIX Issue 7 + TC2, _XOPEN_IOV_MAX is 16.)
> + *
> + * The function is supposed to remain async-signal-safe.
> + *
> + * (The va_*() macros, while not marked async-signal-safe in Issue 7 + TC2, are
> + * considered such, per <https://www.austingroupbugs.net/view.php?id=711>, which
> + * is binding for Issue 7 implementations via the Interpretations Track.
> + *
> + * Furthermore, writev(), while also not marked async-signal-safe in Issue 7 +
> + * TC2, is considered such, per
> + * <https://www.austingroupbugs.net/view.php?id=1455>, which is slated for
> + * inclusion in Issue 7 TC3 (if there's going to be a TC3), and in Issue 8.)

Correct summary, and matches our off-list collaboration on determining
what the Austin Group guarantees (while still waiting for their
release of POSIX Issue 8 later this year).

> + */
> +static void
> +xwritel (int fildes, ...)
> +{

Good candidate for __attribute__ ((sentinel)), so that -Wformat marks
callers that fail to supply a trailing NULL argument.

> +  /* Open-code the current value of _XOPEN_IOV_MAX, in order to contain stack
> +   * footprint, should _XOPEN_IOV_MAX grow in the future.
> +   */
> +  struct iovec iovec[16], *filled, *end, *pos;
> +  va_list ap;
> +  char *arg;
> +
> +  /* Translate the variable argument list to IO vectors. Note that we cast away
> +   * const-ness intentionally.
> +   */
> +  filled = iovec;
> +  end = iovec + ARRAY_SIZE (iovec);
> +  va_start (ap, fildes);
> +  while (filled < end && (arg = va_arg (ap, char *)) != NULL)
> +    *filled++ = (struct iovec){ .iov_base = arg, .iov_len = strlen (arg) };
> +  va_end (ap);
> +
> +  /* Write out the IO vectors. */
> +  pos = iovec;
> +  while (pos < filled) {
> +    ssize_t written;
> +
> +    /* Skip any empty vectors at the front. */
> +    if (pos->iov_len == 0) {
> +      ++pos;
> +      continue;
> +    }

In practice, writev() will handle 0-length iovs on all file types; but
I agree with your effort to skip them since POSIX only guarantees
behavior on regular files (and our typical fd of stderr is not always
a regular file).

> +
> +    /* Write out the vectors. */
> +    do
> +      written = writev (fildes, pos, filled - pos);
> +    while (written == -1 && (errno == EINTR || errno == EAGAIN));
> +
> +    if (written == -1)
> +      return;
> +
> +    /* Consume the vectors that have been written out (fully, or in part). Note
> +     * that "written" is positive here.

The POSIX wording is a bit tricky to read in this regards, but I think
you are correct that write() (and therefore writev()) will never
return 0 if passed a non-zero length on input: either a short write
happens because of a signal before anything is written (return is -1,
errno is EINTR or EAGAIN), or the short write occurred after partial
write (return must be positive); the only time return can be 0 is if
length was 0 but we don't have that issue.

> +     */
> +    do {
> +      size_t advance;
> +
> +      advance = MIN (written, pos->iov_len);
> +      /* Note that "advance" is positive here iff "pos->iov_len" is positive. */
> +      pos->iov_base = (char *)pos->iov_base + advance;
> +      pos->iov_len -= advance;
> +      written -= advance;
> +
> +      /* At least one of "written" and "pos->iov_len" is zero here. */
> +      if (pos->iov_len == 0)
> +        ++pos;
> +    } while (written > 0);
> +  }
> +}

Nice!  Took me more than one pass to fully understand it, but I agree
that your documented loop invariants hold, and that it does indeed do
a best effort vectored write.

> +
>  /* Fork-safe version of perror.  ONLY use this after fork and before
>   * exec, the rest of the time use set_error().
>   */
>  void
>  nbd_internal_fork_safe_perror (const char *s)
>  {
>    const int err = errno;
>    const char *m = NULL;
>    char buf[32];
>  
> -  write (2, s, strlen (s));
> -  write (2, ": ", 2);
>  #ifdef HAVE_STRERRORDESC_NP
>    m = strerrordesc_np (errno);
>  #else
>  #if HAVE_SYS_ERRLIST /* NB Don't use #ifdef */
>    m = errno >= 0 && errno < sys_nerr ? sys_errlist[errno] : NULL;
>  #endif
>  #endif
>    if (!m)
>      m = nbd_internal_fork_safe_itoa ((long) errno, buf, sizeof buf);

The bonus bug you fixed here could be independently fixed by
s/errno/err/ without the rest of your patch, but now that nothing else
touches errno prior to assigning to m, I don't see the need to make
that change now.

> -  write (2, m, strlen (m));
> -  write (2, "\n", 1);
> +  xwritel (STDERR_FILENO, s, ": ", m, "\n", (char *)NULL);

I also like the change of s/2/STDERR_FILENO/ that you snuck in here.

The only change I recommend is the addition of the __attribute__; but
with or without it, I'm happy with:

Reviewed-by: Eric Blake <eblake at redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


More information about the Libguestfs mailing list