[Libguestfs] [PATCH FOR DISCUSSION ONLY] Rewrite libguestfs-supermin-helper in C.

Jim Meyering jim at meyering.net
Fri Mar 12 11:06:24 UTC 2010


Richard W.M. Jones wrote:
> Subject: [PATCH] Rewrite libguestfs-supermin-helper in C.
>
> libguestfs-supermin-helper was previously a shell script.  Although
> we had steadily optimized it, there were a number of intractable
> hot spots:
>
>   (1) cpio still reads input files in 512 byte chunks; this is *very*
>     pessimal behaviour, particularly when SELinux is enabled.
>   (2) the hostfiles globbing was done very inefficiently by the shell,
>     with the shell rereading the same directory over and over again.
>
> This is a rewrite of this shell script in C.  It is approximately
> 3 times faster without SELinux, and has an even greater speed difference
> with SELinux.
>
> The main features are:
>
>   (a) It never frees memory, making it simpler.  The program is designed
>     to run and exit in sub-second times, so this is acceptable.
>   (b) It caches directory reads, making the globbing of host files much
>     faster (measured this as ~ 4 x speed up).
>   (c) It doesn't use external cpio, but instead contains code to write
>     newc format cpio files, which is all that the kernel can read.  Unlike
>     cpio, this code uses large buffers for reads and writes.
>   (d) Ignores missing or unreadable hostfiles, whereas cpio gave a
>     warning.
>   (e) Checks all return values from system calls.
>   (f) With --verbose flag, it will print messages timing itself.

Hi Rich,

This looks nice.
Mostly nit-picky robustness and style-related suggestions below:


> +static void
> +copy_hostfiles (const char *sourcedir, const char *hostcpu, const char *repo)
> +{
> +  char tmp[PATH_MAX];
> +  snprintf (tmp, sizeof tmp, "%s/initramfs.%s.%s.supermin.hostfiles",
> +            sourcedir, repo, hostcpu);

IMHO, it's very often better to use asprintf.
Two reasons:
  - eliminates wasteful (and not-so-portable) [PATH_MAX] use.
  - eliminates risk of misbehavior if/when some combination of inputs
      leads to a truncated buffer.

> +  char **hostfiles = load_file (tmp);
> +
> +  /* Hostfiles list can contain "." before each path - ignore it.
> +   * It also contains each directory name before we enter it.  But
> +   * we don't read that until we see a wildcard for that directory.
> +   */
> +  size_t i, j;
> +  for (i = 0; hostfiles[i] != NULL; ++i) {
> +    char *hostfile = hostfiles[i];
> +    if (hostfile[0] == '.')
> +      hostfile++;
> +
> +    struct stat statbuf;
> +
> +    /* Is it a wildcard? */
> +    if (strchr (hostfile, '*') || strchr (hostfile, '?')) {
> +      char *dirname = strdup (hostfile);

We need to detect strdup failure here.
Just use xstrdup (see below, re gnulib's xalloc module).

> +      char *patt = strrchr (dirname, '/');
> +      assert (patt);
> +      *patt++ = '\0';
> +
> +      char **files = read_dir (dirname);
> +      files = filter_fnmatch (files, patt, FNM_NOESCAPE);
> +
> +      /* Add matching files. */
> +      for (j = 0; files[j] != NULL; ++j) {
> +        snprintf (tmp, sizeof tmp, "%s/%s", dirname, files[j]);

For improved robustness (but we're getting into really low probability
for abuse here), you'd want to use asprintf or detect snprintf failure.

> +        if (verbose >= 2)
> +          fprintf (stderr, "including host file %s (matches %s)\n", tmp, patt);
> +
> +        cpio_append (tmp);
> +      }
> +    }
> +    /* Else does this file/directory/whatever exist? */
> +    else if (lstat (hostfile, &statbuf) == 0) {
> +      if (verbose >= 2)
> +        fprintf (stderr, "including host file %s (directly referenced)\n",
> +                 hostfile);
> +
> +      cpio_append_stat (hostfile, &statbuf);
> +    } /* Ignore files that don't exist. */
> +  }
> +}
> +
> +/*----------*/
> +/* Helper functions.  *NOTE* we never bother to free anything. */
> +
> +static void
> +add_string (char ***argv, size_t *size, size_t *alloc, const char *str)
> +{

For a variable that counts items as this "size" does,
I'd prefer to use a name like n_used, n_items or even just "n",
since "size" tends to evoke "buffer size", which might be misleading here.

Similar, s/alloc/n_alloc/ would make me slightly happier ;-)

> +  char **new_argv;
> +  char *new_str;
> +
> +  if (*size >= *alloc) {
> +    *alloc += 64;
> +    new_argv = realloc (*argv, *alloc * sizeof (char *));
> +    if (new_argv == NULL) {
> +      perror ("realloc");

Here, I'd suggest using gnulib's xnrealloc function (via the xalloc module).
Not only does it encapsulate the allocation and diagnose+exit-upon-failure,
but it also detects integer overflow (for which the above would misbehave):

       new_argv = xnrealloc (*argv, *alloc, sizeof **argv);

For extra credit, see xalloc.h's comment with sample use of x2nrealloc.
Using that would let you remove the += 64 code above,
and make this function even more concise.

> +      exit (EXIT_FAILURE);
> +    }
> +    *argv = new_argv;
> +  }
> +
> +  if (str) {
> +    new_str = strdup (str);

Similarly, use xstrdup.

> +    if (new_str == NULL) {
> +      perror ("strdup");
> +      exit (EXIT_FAILURE);
> +    }
> +  } else
> +    new_str = NULL;
> +
> +  (*argv)[*size] = new_str;
> +
> +  (*size)++;
> +}
...
> +static char **
> +read_dir (const char *name)
> +{
> +  static Hash_table *ht = NULL;
> +
> +  if (!ht)
> +    ht = hash_initialize (1024, NULL, dir_cache_hash, dir_cache_compare, NULL);
> +
> +  struct dir_cache key = { .path = (char *) name };
> +  struct dir_cache *p = hash_lookup (ht, &key);
> +  if (p)
> +    return p->files;
> +
> +  char **files = NULL;
> +  size_t size = 0, alloc = 0;
> +
> +  DIR *dir = opendir (name);
> +  if (!dir) {
> +    /* If it fails to open, that's OK, skip to the end. */
> +    /*perror (name);*/
> +    goto done;
> +  }
> +
> +  struct dirent *d;
> +  for (;;) {
> +    errno = 0;
> +    d = readdir (dir);

It'd be nice to move the declaration of "d" down into the loop.
That'd save a line and ensure that there is no accidental use
outside of the scope where it's intended to be used.

     for (;;) {
       errno = 0;
       struct dirent *d = readdir (dir);


> +    if (d == NULL) {
> +      if (errno != 0) {
> +        /* But if it fails here, after opening and potentially reading
> +         * part of the directory, that's a proper failure - inform the
> +         * user and exit.
> +         */
> +        perror (name);
> +        exit (EXIT_FAILURE);

Personally, I have a bias for using the "error" function to report diagnostics.
Then, they always start with "program_name: ", and that makes it easier
to attribute a diagnostic to its source.

Another advantage is readability: it combines the common, 2-line fprintf;exit
idiom into a single statement, so you often end up being able to avoid
the curly braces, too.

> +      }
> +      break;
> +    }
> +
> +    add_string (&files, &size, &alloc, d->d_name);
> +  }
> +
> +  if (closedir (dir) == -1) {
> +    perror (name);
> +    exit (EXIT_FAILURE);
> +  }
> +
> + done:
> +  /* NULL-terminate the array. */
> +  add_string (&files, &size, &alloc, NULL);
> +
> +  /* Add it to the hash for next time. */
> +  p = malloc (sizeof *p);
> +  if (p == NULL) {
> +    perror ("malloc");
> +    exit (EXIT_FAILURE);
> +  }
> +  p->path = (char *) name;
> +  p->files = files;
> +  p = hash_insert (ht, p);
> +  assert (p != NULL);
> +
> +  return files;
> +}
> +
> +/* Filter a list of strings and return only those matching the wildcard. */
> +static char **
> +filter_fnmatch (char **strings, const char *patt, int flags)
> +{
> +  char **out = NULL;
> +  size_t size = 0, alloc = 0;
> +
> +  int i, r;
> +  for (i = 0; strings[i] != NULL; ++i) {
> +    r = fnmatch (patt, strings[i], flags);
> +    if (r == 0)
> +      add_string (&out, &size, &alloc, strings[i]);
> +    else if (r != FNM_NOMATCH) {
> +      fprintf (stderr, "libguestfs-supermin-helper: internal error: fnmatch ('%s', '%s', %d) returned unexpected non-zero value %d\n",

The string, "libguestfs-supermin-helper" appears enough that I'd
factor it out.  But if you go with using the "error" function,
you'll get that for free.

> +               patt, strings[i], flags, r);
> +      exit (EXIT_FAILURE);
> +    }
> +  }
> +
> +  add_string (&out, &size, &alloc, NULL);
> +  return out;
> +}
> +
> +/* Filter a list of strings and return only those which DON'T contain sub. */
> +static char **
> +filter_notmatching_substring (char **strings, const char *sub)
> +{
> +  char **out = NULL;
> +  size_t size = 0, alloc = 0;
> +
> +  int i;
> +  for (i = 0; strings[i] != NULL; ++i) {
> +    if (strstr (strings[i], sub) == NULL)
> +      add_string (&out, &size, &alloc, strings[i]);
> +  }
> +
> +  add_string (&out, &size, &alloc, NULL);
> +  return out;
> +}
> +
> +/* Sort a list of strings, in place, with the comparison function supplied. */
> +static void
> +sort (char **strings, int (*compare) (const void *, const void *))
> +{
> +  qsort (strings, count_strings (strings), sizeof (char *), compare);
> +}
> +
> +/* Return true iff path exists and is a directory.  This version
> + * follows symlinks.
> + */
> +static int
> +isdir (const char *path)
> +{
> +  struct stat statbuf;
> +
> +  if (stat (path, &statbuf) == -1)
> +    return 0;
> +
> +  return S_ISDIR (statbuf.st_mode);
> +}
> +
> +/* Copy contents of buffer to out_fd and keep out_offset correct. */
> +static void
> +copy_to_fd (const void *buffer, size_t len)

You might prefer write_to_fd, since "write" is usually what you
do when sending the contents of a buffer to an output stream/device.

> +{
> +  if (full_write (out_fd, buffer, len) != len) {
> +    perror ("libguestfs-supermin-helper: write");
> +    exit (EXIT_FAILURE);
> +  }
> +  out_offset += len;
> +}
> +
> +/* Copy contents of file to out_fd. */
> +static void
> +copy_file_to_fd (const char *filename)
> +{
> +  char buffer[BUFFER_SIZE];
> +  int fd2;
> +  ssize_t r;

You know my bias is to move declarations, like the above two,
"down" to first use, whenever possible.

> +  if (verbose >= 2)
> +    fprintf (stderr, "copy_file_to_fd %s -> %d\n", filename, out_fd);
> +
> +  fd2 = open (filename, O_RDONLY);
> +  if (fd2 == -1) {
> +    perror (filename);
> +    exit (EXIT_FAILURE);
> +  }
> +  for (;;) {
> +    r = read (fd2, buffer, sizeof buffer);
> +    if (r == 0)
> +      break;
> +    if (r == -1) {
> +      if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
> +        continue;
> +      perror (filename);
> +      exit (EXIT_FAILURE);
> +    }
> +    copy_to_fd (buffer, r);
> +  }
> +
> +  if (close (fd2) == -1) {
> +    perror (filename);
> +    exit (EXIT_FAILURE);
> +  }
> +}
> +
> +/* Copy file of given length to output, and fail if the file has
> + * changed size.
> + */
> +static void
> +copy_file_len_to_fd (const char *filename, size_t len)
> +{
> +  char buffer[BUFFER_SIZE];
> +  int fd2;
> +  size_t count = 0;
> +  ssize_t r;
> +
> +  if (verbose >= 2)
> +    fprintf (stderr, "copy_file_to_fd %s -> %d\n", filename, out_fd);
> +
> +  fd2 = open (filename, O_RDONLY);
> +  if (fd2 == -1) {
> +    perror (filename);
> +    exit (EXIT_FAILURE);
> +  }
> +  for (;;) {
> +    r = read (fd2, buffer, sizeof buffer);
> +    if (r == 0)
> +      break;
> +    if (r == -1) {
> +      if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
> +        continue;
> +      perror (filename);
> +      exit (EXIT_FAILURE);
> +    }
> +    copy_to_fd (buffer, r);
> +    count += r;
> +    if (count > len) {
> +      fprintf (stderr, "libguestfs-supermin-helper: copy_file_len_to_fd: %s: file has increased in size\n", filename);

long lines

> +      exit (EXIT_FAILURE);
> +    }
> +  }
> +
> +  if (close (fd2) == -1) {
> +    perror (filename);
> +    exit (EXIT_FAILURE);
> +  }
> +
> +  if (count != len) {
> +    fprintf (stderr, "libguestfs-supermin-helper: copy_file_len_to_fd: %s: file has changed size\n", filename);
> +    exit (EXIT_FAILURE);
> +  }
> +}
> +
> +/* Load in a file, returning a list of lines. */
> +static char **
> +load_file (const char *filename)
> +{
> +  char **lines = 0;
> +  size_t size = 0, alloc = 0;
> +
> +  FILE *fp;
> +  fp = fopen (filename, "r");
> +  if (fp == NULL) {
> +    perror (filename);
> +    exit (EXIT_FAILURE);
> +  }
> +
> +  char line[4096];
> +  while (fgets (line, sizeof line, fp)) {
> +    size_t len = strlen (line);
> +    if (len > 0 && line[len-1] == '\n')
> +      line[len-1] = '\0';
> +    add_string (&lines, &size, &alloc, line);
> +  }
> +
> +  add_string (&lines, &size, &alloc, NULL);
> +  return lines;
> +}
> +
> +/* Append the file pointed to by FTSENT to the cpio output. */
> +static void
> +cpio_append_fts_entry (FTSENT *entry)
> +{
> +  if (entry->fts_info & FTS_NS || entry->fts_info & FTS_NSOK)
> +    cpio_append (entry->fts_path);
> +  else
> +    cpio_append_stat (entry->fts_path, entry->fts_statp);
> +}
> +
> +/* Append the file named 'filename' to the cpio output. */
> +static void
> +cpio_append (const char *filename)
> +{
> +  struct stat statbuf;
> +
> +  if (lstat (filename, &statbuf) == -1) {
> +    perror ("lstat");
> +    exit (EXIT_FAILURE);
> +  }
> +  cpio_append_stat (filename, &statbuf);
> +}
> +
> +/* Append the file to the cpio output. */
> +#define PADDING(len) ((((len) + 3) & ~3) - (len))
> +
> +#define CPIO_HEADER_LEN (6 + 13*8)
> +
> +static void
> +cpio_append_stat (const char *filename, struct stat *statbuf)
> +{
> +  const char *orig_filename = filename;
> +
> +  if (*filename == '/')
> +    filename++;
> +  if (*filename == '\0')
> +    filename = ".";
> +
> +  if (verbose >= 2)
> +    fprintf (stderr, "cpio_append_stat %s 0%o -> %d\n",
> +             orig_filename, statbuf->st_mode, out_fd);
> +
> +  /* Regular files and symlinks are the only ones that have a "body"
> +   * in this cpio entry.
> +   */
> +  int has_body = S_ISREG (statbuf->st_mode) || S_ISLNK (statbuf->st_mode);
> +
> +  size_t len = strlen (filename) + 1;
> +
> +  char header[CPIO_HEADER_LEN + 1];
> +  snprintf (header, sizeof header,
> +            "070701"            /* magic */
> +            "%08X"              /* inode */
> +            "%08X"              /* mode */
> +            "%08X" "%08X"       /* uid, gid */
> +            "%08X"              /* nlink */
> +            "%08X"              /* mtime */
> +            "%08X"              /* file length */
> +            "%08X" "%08X"       /* device holding file major, minor */
> +            "%08X" "%08X"       /* for specials, device major, minor */
> +            "%08X"              /* name length (including \0 byte) */
> +            "%08X",             /* checksum (not used by the kernel) */
> +            (unsigned) statbuf->st_ino, statbuf->st_mode,
> +            statbuf->st_uid, statbuf->st_gid,
> +            (unsigned) statbuf->st_nlink, (unsigned) statbuf->st_mtime,
> +            has_body ? (unsigned) statbuf->st_size : 0,
> +            major (statbuf->st_dev), minor (statbuf->st_dev),
> +            major (statbuf->st_rdev), minor (statbuf->st_rdev),
> +            (unsigned) len, 0);
> +
> +  /* Write the header. */
> +  copy_to_fd (header, CPIO_HEADER_LEN);
> +
> +  /* Follow with the filename, and pad it. */
> +  copy_to_fd (filename, len);
> +  size_t padding_len = PADDING (CPIO_HEADER_LEN + len);
> +  copy_padding (padding_len);
> +
> +  /* Follow with the file or symlink content, and pad it. */
> +  if (has_body) {
> +    if (S_ISREG (statbuf->st_mode))
> +      copy_file_len_to_fd (orig_filename, statbuf->st_size);
> +    else if (S_ISLNK (statbuf->st_mode)) {
> +      char tmp[PATH_MAX];

Why bother reading the symlink dest. name?
It might well be yet another symlink.

> +      if (readlink (orig_filename, tmp, sizeof tmp) == -1) {
> +        perror ("readlink");
> +        exit (EXIT_FAILURE);
> +      }
> +      copy_to_fd (tmp, statbuf->st_size);
> +    }
> +
> +    padding_len = PADDING (statbuf->st_size);
> +    copy_padding (padding_len);
> +  }
> +}
...

------------------------------------
This is actually out of order...

> +static const char *
> +create_kernel (const char *hostcpu, const char *kernel)
> ...
> +    /* /lib/modules/<version> */
> +    char *modpath = malloc (strlen (MODULESDIR) + strlen (version) + 2);
> +    if (modpath == NULL) {
> +      perror ("malloc");
> +      exit (1);
> +    }
> +    sprintf (modpath, MODULESDIR "/%s", version);

This is the classic case in which using asprintf is better.
Or actually, better still, xasprintf, since you want to exit upon failure.
You can replace the above with this one line:

    char *modpath = xasprintf (MODULESDIR "/%s", version);


> +static void
> +print_timestamped_message (const char *fs, ...)
> +{
> +  struct timeval tv;
> +  gettimeofday (&tv, NULL);

Technically, gettimeofday *may* fail.  Solaris documents it:

     Additionally, the gettimeofday() function will fail for 32-bit
     interfaces if:
     EOVERFLOW
           The system time has progressed beyond 2038, thus the
           size of the tv_sec member of the timeval structure
           pointed to by tp is insufficient to hold the current
           time in seconds.




More information about the Libguestfs mailing list