[Libguestfs] [PATCH nbdkit 4/4] Add linuxdisk plugin.

Tue Feb 19 15:21:10 UTC 2019

On 2/19/19 1:49 AM, Richard W.M. Jones wrote:
> From: "Richard W.M. Jones" <rjones at redhat.com>
> 
> This plugin allows you to create a complete ext2 filesystem in a GPT
> partitioned disk image.  This can be attached as a disk to a Linux
> virtual machine.  It is implemented using libext2fs (the same as
> supermin).
> 
> Although there is some overlap with nbdkit-iso-plugin and
> nbdkit-floppy-plugin, the implementations and use cases of all three
> plugins are sufficiently different that it seems to make sense to add
> another plugin rather than attempting to extend one of the existing
> plugins.
> 
> Largely to avoid user error this plugin is read-only.  This is a major
> difference from the floppy plugin: that plugin allows files to be
> modified (but not resized or created) and writes those changes through
> to the backing filesystem.  While this plugin could easily be made
> writable, this would cause almost certain disk corruption when someone
> connected two clients at the same time.  In any case it doesn't make
> much sense for it to be writable by default since the expectation that
> writes would somehow modify the original directory on the host
> filesystem cannot be satisfied by this or any reasonable
> implementation.  Users can add the cow filter on top if they really
> want writes and know what they are doing: instructions plus disclaimer
> about this are included in the man page.
> 
> As mentioned above, this implementation is based on the same idea as
> the appliance creation code in supermin.  Eventually we could replace
> that supermin code with this plugin, but there are some missing
> features that would need to be implemented first.
> ---

> +++ b/plugins/floppy/nbdkit-floppy-plugin.pod
> @@ -74,6 +74,7 @@ important, and it simplifies the implementation greatly.
>  L<nbdkit(1)>,
>  L<nbdkit-plugin(3)>,
>  L<nbdkit-file-plugin(1)>,
> +L<nbdkit-linuxdisk-plugin(1)>,
>  L<nbdkit-iso-plugin(1)>.
>  
>  =head1 AUTHORS
> diff --git a/plugins/iso/nbdkit-iso-plugin.pod b/plugins/iso/nbdkit-iso-plugin.pod
> index 4d9cf41..90e26f0 100644

Should this have as much text pointing to the other disk plugins...

> --- a/plugins/iso/nbdkit-iso-plugin.pod
> +++ b/plugins/iso/nbdkit-iso-plugin.pod
> @@ -17,8 +17,9 @@ read-only over the NBD protocol.
>  This plugin uses L<genisoimage(1)> or L<mkisofs(1)> to create the ISO
>  content.
>  
> -To create a virtual floppy disk instead of a CD, see
> -L<nbdkit-floppy-plugin(1)>.
> +To create a FAT-formatted virtual floppy disk instead of a CD, see
> +L<nbdkit-floppy-plugin(1)>.  To create a Linux compatible virtual
> +disk, see L<nbdkit-linuxdisk-plugin(1)>.

...as what you did here?

> +++ b/plugins/linuxdisk/filesystem.c

> +int
> +create_filesystem (struct virtual_disk *disk)
> +{

> +    /* Add 20% to the estimate to account for the overhead of
> +     * filesystem metadata.  Also set a minimum size.  Note we are
> +     * only wasting virtual space (since this will be stored sparsely
> +     * under $TMDIR) so we can be generous here.

TMPDIR

> +  /* Create the filesystem file. */
> +  tmpdir = getenv ("TMPDIR");
> +  if (tmpdir == NULL)
> +    tmpdir = LARGE_TMPDIR;
> +  if (asprintf (&filename, "%s/linuxdiskXXXXXX", tmpdir) == -1) {
> +    nbdkit_error ("asprintf: %m");
> +    return -1;
> +  }
> +
> +  disk->fd = mkstemp (filename);
> +  if (disk->fd == -1) {
> +    nbdkit_error ("mkstemp: %s: %m", filename);
> +    free (filename);
> +    return -1;
> +  }
> +  if (ftruncate (disk->fd, size) == -1) {
> +    nbdkit_error ("ftruncate: %s: %m", filename);
> +    free (filename);

Missing an unlink().  Also, does the caller properly use close(disk->fd)
on early exits like this, or does it not matter because we're just going
to exit()?

> +    return -1;
> +  }
> +
> +  /* Create the filesystem. */
> +  if (mke2fs (filename) == -1) {
> +    unlink (filename);
> +    free (filename);
> +    return -1;

Another example of early exit leaving disk->fd open.

> +  }
> +
> +  /* Open the filesystem. */
> +  err = ext2fs_open (filename, EXT2_FLAG_RW|EXT2_FLAG_64BITS, 0, 0,
> +                     unix_io_manager, &fs);
> +  unlink (filename);
> +  free (filename);
> +  if (err) {
> +    nbdkit_error ("ext2fs_open: %s", error_message (err));
> +    return -1;
> +  }
> +
> +  err = ext2fs_read_bitmaps (fs);
> +  if (err) {
> +    nbdkit_error ("ext2fs_read_bitmaps: %s", error_message (err));
> +    ext2fs_close (fs);
> +    return -1;
> +  }
> +
> +  /* First pass: This creates subdirectories in the filesystem and
> +   * also builds a list of files so we can identify hard links.
> +   */
> +  for (i = 0; i < nr_dirs; ++i) {
> +    if (visit (dirs[i], fs, EXT2_ROOT_INO, &files, &nr_files) == -1)
> +      return -1;
> +  }
> +
> +  if (nr_files > 0) {
> +    assert (files != NULL);
> +
> +    /* Sort the files by device and inode number to identify hard links. */
> +    qsort (files, nr_files, sizeof (struct file), compare_inodes);
> +
> +    /* Second pass: Copy/create the files. */
> +    last_file = NULL;
> +    for (i = 0; i < nr_files; ++i) {
> +      hardlinked = last_file && compare_inodes (last_file, &files[i]) == 0;
> +
> +      if (!hardlinked) {
> +        /* Normal case: creating a new inode. */
> +        last_file = &files[i];
> +        assert (!S_ISDIR (files[i].statbuf.st_mode));
> +
> +        if (e2copyfile (fs, &files[i]) == -1)
> +          return -1;
> +
> +        if (linuxdisk_debug_filesystem)
> +          nbdkit_debug ("%s: <%" PRIu32 ">/%s -> <%" PRIu32 ">",
> +                        files[i].type,
> +                        files[i].dir_ino, files[i].name, files[i].ino);
> +      }
> +      else {
> +        /* Creating a hard link to an existing inode. */
> +        if (e2hardlink (fs, last_file, &files[i]) == -1)
> +          return -1;

More early exits; this time, skipping...

> +
> +        if (linuxdisk_debug_filesystem)
> +          nbdkit_debug ("hard link: <%" PRIu32 ">/%s -> <%" PRIu32 ">",
> +                        files[i].dir_ino, files[i].name, last_file->ino);
> +      }
> +    }
> +
> +    for (i = 0; i < nr_files; ++i) {
> +      free (files[i].pathname);
> +      free (files[i].name);
> +    }
> +    free (files);

...the memory cleanup.  Again, I guess that's okay if we know the caller
is going to exit() on our failure.

> +  }
> +
> +  /* Close the filesystem.  Note we don't bother to sync it because
> +   * it's a private temporary file which only we will read.
> +   */
> +  ext2fs_close2 (fs, EXT2_FLAG_FLUSH_NO_SYNC);
> +  return 0;
> +}
> +
> +/* Use ‘du’ to estimate the size of the filesystem quickly.
> + *
> + * Typical output from ‘du -cs dir1 dir2’ is:
> + *
> + * 12345   dir1
> + * 34567   dir2
> + * 46912   total
> + *
> + * We ignore everything except the first number on the last line.
> + */
> +static int64_t
> +estimate_size (void)
> +{

> +  fprintf (fp, "du -c -k -s");

'du -c' is not specified by POSIX, but is reasonable to expect (we can't
build this plugin without ext2 support, which implies we are probably on
a Linux system to begin with).  If so, should we rely on having GNU
coreutils' du, where we can use -B1 to force output in bytes...

> +  /* Run the command. */
> +  nbdkit_debug ("%s", command);
> +  fp = popen (command, "r");
> +  free (command);
> +  if (fp == NULL) {
> +    nbdkit_error ("du command failed: %m");
> +    return -1;
> +  }
> +
> +  /* Ignore everything up to the last line. */
> +  len = 0;
> +  while (getline (&line, &len, fp) != -1)
> +    /* empty */;
> +  if (ferror (fp)) {
> +    nbdkit_error ("getline failed: %m");
> +    free (line);
> +    fclose (fp);

You should really use pclose(fp) here, to avoid libc leaking internal
data associated with the pipe.

> +    return -1;
> +  }
> +
> +  fclose (fp);

and again here, and maybe even check WIFEXITED() and WEXITSTATUS() of
the results.

> +
> +  /* Parse the last line. */
> +  if (sscanf (line, "%" SCNi64, &ret) != 1 || ret < 0) {
> +    nbdkit_error ("could not parse last line of output: %s", line);
> +    free (line);
> +    return -1;
> +  }
> +  free (line);
> +
> +  /* Result is in 1K blocks, convert it to bytes. */
> +  ret *= 1024;

...rather than having to scale the output of -k?

> +  return ret;
> +}
> +
> +static int
> +mke2fs (const char *filename)
> +{

> +  /* Run the command. */
> +  nbdkit_debug ("%s", command);
> +  r = system (command);
> +  free (command);
> +
> +  if (WIFEXITED (r) && WEXITSTATUS (r) != 0) {
> +    nbdkit_error ("mke2fs command failed with exit code %d", WEXITSTATUS (r));
> +    return -1;
> +  }
> +  else if (WIFSIGNALED (r)) {
> +    nbdkit_error ("mke2fs command was killed by signal %d", WTERMSIG (r));
> +    return -1;
> +  }
> +  else if (WIFSTOPPED (r)) {
> +    nbdkit_error ("mke2fs command was stopped by signal %d", WSTOPSIG (r));
> +    return -1;
> +  }

Is WIFSTOPPED() even a possible result of system()? 'man waitpid' says
this status is only possible for a call made with WUNTRACED, which seems
contradictory to system() running the command to completion without tracing.

> +
> +  return 0;
> +}
> +
> +static int
> +visit (const char *dir, ext2_filsys fs, ext2_ino_t dir_ino,
> +       struct file **files, size_t *nr_files)
> +{

> + error2:
> +  closedir (DIR);
> + error1:
> +  err = errno;
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wunused-result"
> +  chdir (origdir);
> +#pragma GCC diagnostic pop

Failure to return to the original directory may cause the rest of nbdkit
to misbehave, shouldn't we check the result after all and exit on error?

> +
> +/* To identify hard links we sort the files array by device and inode
> + * number (thus hard linked files will be next to each other after the
> + * sort).  Hence this odd sorting function.
> + */
> +static int
> +compare_inodes (const void *vp1, const void *vp2)
> +{
> +  const struct file *f1 = vp1;
> +  const struct file *f2 = vp2;
> +
> +  if (f1->statbuf.st_dev < f2->statbuf.st_dev)
> +    return -1;
> +  else if (f1->statbuf.st_dev > f2->statbuf.st_dev)
> +    return 1;
> +  else {
> +    if (f1->statbuf.st_ino < f2->statbuf.st_ino)
> +      return -1;
> +    else if (f1->statbuf.st_ino > f2->statbuf.st_ino)
> +      return 1;
> +    else
> +      return 0;

The sort is not a complete order (two files with the same inode have an
arbitrary order), but I don't think that matters.  And it's not that odd
to visit files in inode order - GNU coreutils has discovered that when
deleting a large directory, performance is often faster if you visit
files in inode order rather than in readdir() order.

> +static int
> +e2emptyinode (ext2_filsys fs, ext2_ino_t dir_ino,
> +              const char *name, const struct stat *statbuf,
> +              int ino_flags, ext2_ino_t *ino)
> +{
> +  errcode_t err;
> +  struct ext2_inode inode;
> +  int major_, minor_;
> +
> +  err = ext2fs_new_inode (fs, dir_ino, statbuf->st_mode, 0, ino);
> +  if (err) {
> +    nbdkit_error ("ext2fs_new_inode: %s", error_message (err));
> +    return -1;
> +  }
> +
> +  memset (&inode, 0, sizeof inode);
> +  inode.i_mode = statbuf->st_mode;
> +  inode.i_uid = statbuf->st_uid;
> +  inode.i_gid = statbuf->st_gid;
> +  inode.i_blocks = 0;
> +  inode.i_links_count = 1;
> +  /* XXX nanosecond times? */
> +  inode.i_ctime = statbuf->st_ctime;
> +  inode.i_atime = statbuf->st_atime;
> +  inode.i_mtime = statbuf->st_mtime;
> +  inode.i_size = 0;

Not for this patch, but now that newer Linux has the statx() call that
can expose birthtime, should we be worrying about that (well, ext4
supports birthtime, but if you are targetting just ext2, it probably
doesn't matter).

Otherwise looks reasonable.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org