[Libguestfs] [libnbd PATCH v4 1/3] lib/utils: introduce xwritel() as a more robust and convenient write()
Eric Blake
eblake at redhat.com
Wed Mar 15 14:01:40 UTC 2023
On Wed, Mar 15, 2023 at 12:01:55PM +0100, Laszlo Ersek wrote:
> While nbd_internal_fork_safe_perror() must indeed call write(), and
> arguably justifiedly ignores the return value of write(), we can still
> make the write operations slightly more robust and convenient. Let's do
> that by introducing xwritel():
>
> - Let the caller pass a list of NUL-terminated strings, via stdarg /
> ellipsis notation in the parameter list.
>
> - Handle partial writes.
>
> - Cope with EINTR and EAGAIN errors. (A busy loop around EAGAIN on a
> non-blocking file is not great in the general case, but it's good enough
> for writing out diagnostics before giving up.)
>
> - In the common case, handle an nbd_internal_fork_safe_perror() call with
> a single xwritel() -> writev() call chain, rather than with four
> separate write() calls. In practice, this tends to make the error
> message written to a regular file contiguous, even if other threads are
> writing to the same file. Multiple separate write() calls tend to
> interleave chunks of data from different threads.
>
> As a side bonus, remove the path in nbd_internal_fork_safe_perror() where
> at least one of the first two write() syscalls fails, and overwrites
> "errno", before we get to formatting the error string from "errno".
All nice benefits, even if we don't normally exercise the code.
>
> Thanks to Eric Blake for helping me understand the scope of Austin Group
> bug reports.
>
> Signed-off-by: Laszlo Ersek <lersek at redhat.com>
> ---
>
> Notes:
> v4:
>
> - Rework with <stdarg.h> and writev().
>
> - Don't split the output into chunks of SSIZE_MAX bytes. In v3, the goal
> of that chunking was to avoid implementation-defined behavior.
> However, POSIX requires writev() to fail cleanly when more than
> SSIZE_MAX bytes would be transfered in a single call. Hence the
> original goal (avoiding implementation-defined behavior) is ensured
> simply by switching to writev(). The SSIZE_MAX limit is not expected
> to be hit in practice (_POSIX_SSIZE_MAX is 32767).
Concur.
>
> - As a "bonus", remove the pre-patch possibility to trample "errno"
> before formatting the error string.
Nice find; and one I missed in my earlier review.
>
> - Refresh the commit message.
>
> - The "contiguous output" from a single xwritel() -> writev() call (as
> opposed to the "interleaved output" from multiple xwrite() -> write()
> calls in v3) is easily testable in practice (on my end) with the
> following small patch, even though this "contiguity" is of course not
> guaranteed:
>
> > diff --git a/generator/states-connect-socket-activation.c b/generator/states-connect-socket-activation.c
> > index e4b3b565ae2e..c66c638d331f 100644
> > --- a/generator/states-connect-socket-activation.c
> > +++ b/generator/states-connect-socket-activation.c
> > @@ -179,6 +179,8 @@ CONNECT_SA.START:
> > * socket).
> > */
> > int flags = fcntl (s, F_GETFD, 0);
> > + flags = -1;
> > + errno = EBADF;
> > if (flags == -1) {
> > nbd_internal_fork_safe_perror ("fcntl: F_GETFD");
> > _exit (126);
>
> It results in the following snippet in
> "tests/connect-systemd-socket-activation.log":
>
> > libnbd: debug: nbd1: nbd_connect_systemd_socket_activation: transition: CONNECT.CONNECTING -> MAGIC.START
> > fcntl: F_GETFD: Bad file descriptor
> > libnbd: debug: nbd1: nbd_connect_systemd_socket_activation: transition: MAGIC.START -> MAGIC.RECV_MAGIC
>
> Note that the child process's output is on an isolated line.
Without the patch, it had a high likelihood (but not guarantee) of
interleaving. And writev() is not a bulletproof avoidance of
interleaving (if we hit a short write and retry the tail, interleaving
is possible) - but we are very unlikely to see that in practice.
>
> - Do not pick up R-b tags from Eric and Rich due to significant changes
> in v4.
>
> context:-U5
>
> lib/utils.c | 87 +++++++++++++++++++-
> 1 file changed, 83 insertions(+), 4 deletions(-)
>
> diff --git a/lib/utils.c b/lib/utils.c
> index 6df4f14ce9f4..62b4bfdda5c3 100644
> --- a/lib/utils.c
> +++ b/lib/utils.c
> @@ -23,11 +23,14 @@
> #include <string.h>
> #include <unistd.h>
> #include <ctype.h>
> #include <errno.h>
> #include <fcntl.h>
> +#include <stdarg.h>
> +#include <sys/uio.h>
>
> +#include "array-size.h"
> #include "minmax.h"
>
> #include "internal.h"
>
> void
> @@ -179,33 +182,109 @@ nbd_internal_fork_safe_itoa (long v, char *buf, size_t bufsize)
> #if defined (__GNUC__)
> #pragma GCC diagnostic push
> #pragma GCC diagnostic ignored "-Wunused-result"
> #endif
>
> +/* "Best effort" function for writing out a list of NUL-terminated strings to a
> + * file descriptor (without the NUL-terminators). The list is terminated with
> + * (char *)NULL. Partial writes, and EINTR and EAGAIN failures are handled
> + * internally. No value is returned; only call this function for writing
> + * diagnostic data on error paths, when giving up on a higher-level action
> + * anyway.
> + *
> + * No more than 16 strings, excluding the NULL terminator, will be written. (As
> + * of POSIX Issue 7 + TC2, _XOPEN_IOV_MAX is 16.)
> + *
> + * The function is supposed to remain async-signal-safe.
> + *
> + * (The va_*() macros, while not marked async-signal-safe in Issue 7 + TC2, are
> + * considered such, per <https://www.austingroupbugs.net/view.php?id=711>, which
> + * is binding for Issue 7 implementations via the Interpretations Track.
> + *
> + * Furthermore, writev(), while also not marked async-signal-safe in Issue 7 +
> + * TC2, is considered such, per
> + * <https://www.austingroupbugs.net/view.php?id=1455>, which is slated for
> + * inclusion in Issue 7 TC3 (if there's going to be a TC3), and in Issue 8.)
Correct summary, and matches our off-list collaboration on determining
what the Austin Group guarantees (while still waiting for their
release of POSIX Issue 8 later this year).
> + */
> +static void
> +xwritel (int fildes, ...)
> +{
Good candidate for __attribute__ ((sentinel)), so that -Wformat marks
callers that fail to supply a trailing NULL argument.
> + /* Open-code the current value of _XOPEN_IOV_MAX, in order to contain stack
> + * footprint, should _XOPEN_IOV_MAX grow in the future.
> + */
> + struct iovec iovec[16], *filled, *end, *pos;
> + va_list ap;
> + char *arg;
> +
> + /* Translate the variable argument list to IO vectors. Note that we cast away
> + * const-ness intentionally.
> + */
> + filled = iovec;
> + end = iovec + ARRAY_SIZE (iovec);
> + va_start (ap, fildes);
> + while (filled < end && (arg = va_arg (ap, char *)) != NULL)
> + *filled++ = (struct iovec){ .iov_base = arg, .iov_len = strlen (arg) };
> + va_end (ap);
> +
> + /* Write out the IO vectors. */
> + pos = iovec;
> + while (pos < filled) {
> + ssize_t written;
> +
> + /* Skip any empty vectors at the front. */
> + if (pos->iov_len == 0) {
> + ++pos;
> + continue;
> + }
In practice, writev() will handle 0-length iovs on all file types; but
I agree with your effort to skip them since POSIX only guarantees
behavior on regular files (and our typical fd of stderr is not always
a regular file).
> +
> + /* Write out the vectors. */
> + do
> + written = writev (fildes, pos, filled - pos);
> + while (written == -1 && (errno == EINTR || errno == EAGAIN));
> +
> + if (written == -1)
> + return;
> +
> + /* Consume the vectors that have been written out (fully, or in part). Note
> + * that "written" is positive here.
The POSIX wording is a bit tricky to read in this regards, but I think
you are correct that write() (and therefore writev()) will never
return 0 if passed a non-zero length on input: either a short write
happens because of a signal before anything is written (return is -1,
errno is EINTR or EAGAIN), or the short write occurred after partial
write (return must be positive); the only time return can be 0 is if
length was 0 but we don't have that issue.
> + */
> + do {
> + size_t advance;
> +
> + advance = MIN (written, pos->iov_len);
> + /* Note that "advance" is positive here iff "pos->iov_len" is positive. */
> + pos->iov_base = (char *)pos->iov_base + advance;
> + pos->iov_len -= advance;
> + written -= advance;
> +
> + /* At least one of "written" and "pos->iov_len" is zero here. */
> + if (pos->iov_len == 0)
> + ++pos;
> + } while (written > 0);
> + }
> +}
Nice! Took me more than one pass to fully understand it, but I agree
that your documented loop invariants hold, and that it does indeed do
a best effort vectored write.
> +
> /* Fork-safe version of perror. ONLY use this after fork and before
> * exec, the rest of the time use set_error().
> */
> void
> nbd_internal_fork_safe_perror (const char *s)
> {
> const int err = errno;
> const char *m = NULL;
> char buf[32];
>
> - write (2, s, strlen (s));
> - write (2, ": ", 2);
> #ifdef HAVE_STRERRORDESC_NP
> m = strerrordesc_np (errno);
> #else
> #if HAVE_SYS_ERRLIST /* NB Don't use #ifdef */
> m = errno >= 0 && errno < sys_nerr ? sys_errlist[errno] : NULL;
> #endif
> #endif
> if (!m)
> m = nbd_internal_fork_safe_itoa ((long) errno, buf, sizeof buf);
The bonus bug you fixed here could be independently fixed by
s/errno/err/ without the rest of your patch, but now that nothing else
touches errno prior to assigning to m, I don't see the need to make
that change now.
> - write (2, m, strlen (m));
> - write (2, "\n", 1);
> + xwritel (STDERR_FILENO, s, ": ", m, "\n", (char *)NULL);
I also like the change of s/2/STDERR_FILENO/ that you snuck in here.
The only change I recommend is the addition of the __attribute__; but
with or without it, I'm happy with:
Reviewed-by: Eric Blake <eblake at redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
More information about the Libguestfs
mailing list