[Libguestfs] [PATCH 2/2] build: Replace ./nbdkit with a C program.

Eric Blake eblake at redhat.com
Wed Nov 14 02:10:34 UTC 2018


On 11/13/18 4:51 PM, Richard W.M. Jones wrote:
> There are advantages to having the same code parse the options in the
> ./nbdkit wrapper as in the real nbdkit:
> 
> - We can parse options in exactly the same way as the real program.
> 
> - Use the more accurate ‘is_short_name’ test for unadorned
>    plugin/filter names on the command line.
> 
> - Fixes the FreeBSD problem with shebangs caused because FreeBSD
>    refuses the use a shell script as a shebang path.

s/the/to/

> 
> Apart from the above, this is a straightforward translation of the
> original shell script into C and preserves all the existing features
> such as valgrind and gdb support.
> ---
>   Makefile.am  |  11 ++-
>   README       |   2 +-
>   configure.ac |   2 -
>   nbdkit.in    | 160 ----------------------------------
>   wrapper.c    | 237 +++++++++++++++++++++++++++++++++++++++++++++++++++
>   5 files changed, 248 insertions(+), 164 deletions(-)
> 
> diff --git a/Makefile.am b/Makefile.am
> index d5ef59f..cd41132 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -46,7 +46,16 @@ EXTRA_DIST = \
>   
>   CLEANFILES += html/*.html
>   
> -noinst_SCRIPTS = nbdkit
> +# NB: This is not the real nbdkit binary.  It's a wrapper that allows
> +# you to run nbdkit from the build directory before it is installed.
> +noinst_PROGRAMS = nbdkit

Maybe you could build the executable as 'wrapper' and then hard- or 
sym-link the name 'nbdkit' to the executable? Don't know if that would 
reduce or add to the confusion of having two separate binaries named 
nbdkit in the build tree.  Not a show-stopper, and the idea of a wrapper 
binary rather than a wrapper script seems interesting, regardless of 
what naming you settle on.

> +++ b/wrapper.c

> +
> +/*------------------------------------------------------------
> + * This wrapper lets you run nbdkit from the source directory.
> + *
> + * You can use either:
> + * ./nbdkit file [arg=value] [arg=value] ...
> + * or:
> + *   /path/to/nbdkit file [arg=value] [arg=value] ...
> + *
> + * Or you can set $PATH to include the nbdkit source directory and run
> + * the bare "nbdkit" command without supplying the full path.
> + *
> + * The wrapper modifies the bare plugin name (eg. "file") to be the
> + * full path to the locally compiled plugin.  If you don't use this
> + * program and run src/nbdkit directly then it will pick up the
> + * installed plugins which is not usually what you want.
> + *
> + * This program is also used to run the tests (make check).
> + *------------------------------------------------------------

Comment matches the script version, with appropriate rewording.

> + */
> +
> +#include <config.h>
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <stdarg.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <getopt.h>
> +#include <limits.h>
> +
> +#include "options.h"
> +
> +/* Construct an array of parameters passed through to real nbdkit. */
> +static const char **cmd;
> +static size_t len;
> +
> +static void
> +passthru (const char *s)
> +{
> +  cmd = realloc (cmd, (len+1) * sizeof (const char *));
> +  if (cmd == NULL)
> +    abort ();
> +  cmd[len] = s;
> +  ++len;
> +}
> +
> +static void
> +passthru_format (const char *fs, ...)

I'd give this a gcc __attribute__((printf...)), so that the compiler can 
help diagnose misuse.

> +{
> +  va_list args;
> +  char *str;
> +
> +  va_start (args, fs);
> +  if (vasprintf (&str, fs, args) == -1)
> +    abort ();
> +  va_end (args);
> +  passthru (str);
> +}
> +
> +static void
> +end_passthru (void)
> +{
> +  passthru (NULL);
> +}

Phew. I didn't see this on my first glance through, and wondered whether 
your realloc was passing trailing garbage to exec.

> +
> +static void
> +print_command (void)
> +{
> +  size_t i;
> +
> +  if (len > 0)
> +    fprintf (stderr, "%s", cmd[0]);
> +  for (i = 1; i < len && cmd[i] != NULL; ++i)

Is the 'i < len' check wasted effort, given that 'cmd[i] != NULL' is 
sane after end_passthru(), and you don't print prior to then? Or is it 
intended that you could use 'p print_command' while in a gdb session?

> +    fprintf (stderr, " %s", cmd[i]);

Ambiguous output if the user passed in whitespace in an argument, but 
that's not too horrible (in bash, 'printf %q' would solve the problem; 
in C, it's actually more work to figure out how to quote just the things 
that need quoting, rather than to print everything the same way whether 
or not that can be reparsed the same as the original input)

> +  fprintf (stderr, "\n");
> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  int c;
> +  int long_index;
> +  int verbose = 0;

Worth using <stdbool.h> and making this one bool?

> +  size_t i;
> +  char *s;
> +
> +  /* If NBDKIT_VALGRIND=1 is set in the environment, then we run the
> +   * program under valgrind.  This is used by the tests.  Similarly if
> +   * NBDKIT_GDB=1 is set, we run the program under GDB, useful during
> +   * development.
> +   */
> +  s = getenv ("NBDKIT_VALGRIND");
> +  if (s && strcmp (s, "1") == 0) {

The shell did this if NBDKIT_VALGRIND was set to anything non-empty; now 
you are requiring it to be exactly "1". Intentional, or do you really 
want 'if (s && *s)'?

> +    passthru (VALGRIND);
> +    passthru ("--vgdb=no");
> +    passthru ("--leak-check=full");
> +    passthru ("--error-exitcode=119");
> +    passthru_format ("--suppressions=%s/valgrind-suppressions", srcdir);
> +    passthru ("--trace-children=no");
> +    passthru ("--child-silent-after-fork=yes");
> +    passthru ("--run-libc-freeres=no");
> +    passthru ("--num-callers=20");
> +  }
> +  else {
> +    s = getenv ("NBDKIT_GDB");
> +    if (s && strcmp (s, "1") == 0) {

Ditto.

> +      passthru ("gdb");
> +      passthru ("--args");
> +    }
> +  }
> +
> +  /* Absolute path of the real nbdkit command. */
> +  passthru_format ("%s/src/nbdkit", builddir);
> +
> +  /* Option parsing.  We don't really parse options here.  We are only
> +   * interested in which options have arguments and which need
> +   * rewriting.
> +   */
> +  for (;;) {
> +    long_index = -1;

Why do we need this?

/me reads ahead...

Oh, because you are taking the lazy way out by rewriting all user's 
input back in long-opt form, regardless of how the user spelled it on 
input, rather than duplicating a switch statement that has to be 
maintained in parallel with main.c.

> +    c = getopt_long (argc, argv, short_options, long_options, &long_index);
> +    if (c == -1)
> +      break;
> +
> +    /* long_index is only set if it's an actual long option.  However
> +     * in nbdkit all short options have long equivalents so we can
> +     * associate those with a long_index, but we must do it ourselves.
> +     * https://stackoverflow.com/a/34070441
> +     */
> +    if (long_index == -1) {
> +      for (i = 0; long_options[i].name != NULL; ++i) {
> +        if (long_options[i].val == c) {
> +          long_index = i;
> +          break;
> +        }
> +      }
> +      if (long_index == -1)
> +        abort ();

I can abort this, by passing in a garbage argument - at which point 
getopt_long returns '?', but that doesn't map back to long_options[].

$ ./nbdkit -1
./nbdkit: invalid option -- '1'
Aborted (core dumped)
$ ./nbdkit --huh
./nbdkit: unrecognized option '--huh'
Aborted (core dumped)
$ ./nbdkit --filter
./nbdkit: option '--filter' requires an argument
Aborted (core dumped)

Best is probably to just exit(1) if c == '?' (you've already diagnosed 
the option error, so passing through to the real nbdkit would diagnose 
it again), or MAYBE to do the equivalent of "src/nbdkit --help" (but 
then still exit with non-zero status - that gets trickier, unless you 
also move the usage text to options.h in patch 1) to get the effects of 
normal nbdkit printing usage after improper options.

> +    }

Still, I wonder if this loop is necessary - if long_index is -1, you 
know the user passed a short opt, AND you know that printf("-%c", c) 
will spell that short opt (except for c == '?' - but you should already 
be special-casing that). So is it really necessary to map all the user's 
short options into a long option before doing the passthrough?

> +
> +    if (c == FILTER_OPTION) {
> +      /* Filters can be rewritten if purely alphanumeric. */
> +      if (is_short_name (optarg)) {
> +        passthru_format ("--%s", /* --filter */
> +                         long_options[long_index].name);

Here, you know that there is no short option for --filter, but you ALSO 
know that the option is spelled exactly "--filter", so why bother with 
formatting the reverse lookup into long_options when you can just 
hardcode passthru("--format")?

> +        passthru_format ("%s/filters/%s/.libs/nbdkit-%s-filter.so",
> +                         builddir, optarg, optarg);
> +      }
> +      else goto opt_with_arg;
> +    }
> +    else if (c == 'v') {
> +      /* Verbose is special because we will print the final command. */
> +      verbose = 1;
> +      passthru_format ("--%s", long_options[long_index].name);

Okay, here if you don't reverse map long_index when the user passed a 
short option, then you have to do a runtime decision between "--%s" or 
"-%c",

> +    }
> +    else if (long_options[long_index].has_arg) {
> +      /* Remaining options that take an argument are passed through. */
> +    opt_with_arg:
> +      passthru_format ("--%s", long_options[long_index].name);

and repeat that runtime decision,

> +      passthru (optarg);
> +    }
> +    else {
> +      /* Remaining options that do not take an argument. */
> +      passthru_format ("--%s", long_options[long_index].name);

and repeat again.  At the end of the day, it may still be slightly more 
efficient to avoid the reverse lookup loop, but I don't know if it 
becomes any easier to maintain (you'd be best off adding in a helper 
function rather than open-coding the decision).  Offhand,

void passthru_option (int c, int long_index)
{
   assert (c != '?');
   if (long_index == -1)
     passthru_format ("--%s", long_options[long_index].name);
   else {
     assert (c <= CHAR_MAX)
     passthru_format ("-%c", c);
   }
}

> +    }
> +  }
> +
> +  /* Are there any non-option arguments? */
> +  if (optind < argc) {
> +    /* The first non-option argument is the plugin name.  If it is
> +     * purely alphanumeric then rewrite it.
> +     */
> +    if (is_short_name (argv[optind])) {

is_short_name(" ") and is_short_name("-") both return true, leading to 
some unusual messages:

$ ./nbdkit -
nbdkit: /home/eblake/nbdkit/plugins/-/.libs/nbdkit---plugin.so: 
/home/eblake/nbdkit/plugins/-/.libs/nbdkit---plugin.so: cannot open 
shared object file: No such file or directory
$ ./nbdkit ' '
nbdkit: /home/eblake/nbdkit/plugins/ /.libs/nbdkit- -plugin.so: 
/home/eblake/nbdkit/plugins/ /.libs/nbdkit- -plugin.so: cannot open 
shared object file: No such file or directory

but that's pre-existing even with the real nbdkit, and not something you 
need to worry about in this patch other than the comment.  Among other 
things, the comment lies (while I can allow "-" as a stretch that is 
nearly alphanumeric, " " certainly doesn't fit the bill).  Note that it 
IS different from the shell script, where you did the rewrite only after 
checking the regex ^[a-zA-Z0-9]+$ (no '_'?) and was therefore purely 
alphanumeric - but by being consistent with src/main.c, now you can 
rewrite is_short_name() in a separate patch to get both the main binary 
and the wrapper to change their rules on how to treat the plugin name.

> +      /* Special plugins written in Perl. */
> +      if (strcmp (argv[optind], "example4") == 0 ||
> +          strcmp (argv[optind], "tar") == 0) {
> +        passthru_format ("%s/plugins/perl/.libs/nbdkit-perl-plugin.so",
> +                         builddir);
> +        passthru_format ("%s/plugins/%s/nbdkit-%s-plugin",
> +                         builddir, argv[optind], argv[optind]);
> +      }
> +      else {
> +        passthru_format ("%s/plugins/%s/.libs/nbdkit-%s-plugin.so",
> +                         builddir, argv[optind], argv[optind]);
> +      }
> +      ++optind;
> +    }

Hmm. If the user calls 'nbdkit file -- -my-file' as an alternative to 
'nbdkit file ./-my-file', you end up not repeating the "--" argument. 
You probably want to do passthru("--") right here, whether or not it was 
present in the caller's command line, to ensure that the real nbdkit 
doesn't see any remaining arguments as options.

> +
> +    /* Everything else is passed through without rewriting. */
> +    while (optind < argc) {
> +      passthru (argv[optind]);
> +      ++optind;
> +    }
> +  }
> +
> +  end_passthru ();
> +  if (verbose)
> +    print_command ();

The rewrite from './nbdkit -f' into '/path/to/src/nbdkit --foreground' 
looks weird; I argued that a helper function to preserve the short 
option spelling when possible might be nicer (you'll still rewrite 
'./nbdkit -vf' into '/path/to/src/nbdkit -v -f' even if you preserve 
short options, but that's not as drastic).  Similarly, the rewrite from 
'./nbdkit --filter=foo' into '/path/to/src/nbdkit --filter foo' is 
reasonable, as is './nbdkit -Pfile.pid' into '/path/to/src/nbdkit -P 
file.pid'.

> +
> +  /* Run the final command. */
> +  execvp (cmd[0], (char **) cmd);
> +  perror (cmd[0]);
> +  exit (EXIT_FAILURE);
> +}
> 

Hmm - should 'nbdkit --help --garbage' warn about --garbage being 
unrecognized, or should it unconditionally print help regardless of the 
rest of the command line?  Right now it does the former; but if it 
switches to the latter, then you have to special-case --help in 
wrapper.c to get the same effect.  Ditto for --version.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




More information about the Libguestfs mailing list