[Libguestfs] [PATCH nbdkit 2/2] tar: Rewrite the tar plugin (again), this time in C.

Richard W.M. Jones rjones at redhat.com
Mon Jul 6 20:44:11 UTC 2020


On Mon, Jul 06, 2020 at 01:58:35PM -0500, Eric Blake wrote:
> so if we like the idea, we'd have to allow the user to specify
> mutually-exclusive config parameters: either file= to something
> within the file, or exportname=on to allow the client to choose,
> where we then validate that exactly one of those two options is
> configured.

Right, I think that would be the way to do it, to make it fully
general, and to avoid any possibility that someone turns on the
exportname feature by accident (which could be a security issue).

> >+++ b/plugins/tar/Makefile.am
> >@@ -1,5 +1,5 @@
> >  # nbdkit
> >-# Copyright (C) 2013-2020 Red Hat Inc.
> >+# Copyright (C) 2018-2020 Red Hat Inc.
> 
> Interesting change in dates.

I think that was a copy/paste error effectively.  I copied the
Makefile from an existing C plugin, rather than attempting to rewrite
the old Perl-based one.

> >+static int
> >+tar_config_complete (void)
> >+{
> >+  if (tarfile == NULL || file == NULL) {
> >+    nbdkit_error ("you must supply the tar=<TARFILE> and file=<FILENAME> "
> >+                  "parameters");
> >+    return -1;
> >+  }
> >+
> >+  return 0;
> >+}
> >+
> >+#define tar_config_help \
> >+  "[tar=]<TARBALL>     (required) The name of the tar file.\n" \
> >+  "file=<FILENAME>                The path inside the tar file to server."
> 
> Should '(required)' be listed on both lines?  (Not necessarily on
> the second, if we go with the exportname=on option)

Yes, that's a bug (until exportname is implemented) - will fix soon.

> >+
> >+static int
> >+tar_get_ready (void)
> >+{
> >+  FILE *fp;
> >+  CLEANUP_FREE char *cmd = NULL;
> >+  size_t len = 0;
> >+  bool scanned_ok;
> >+  char s[256];
> >+
> >+  /* Construct the tar command to examine the tar file. */
> >+  fp = open_memstream (&cmd, &len);
> >+  if (fp == NULL) {
> >+    nbdkit_error ("open_memstream: %m");
> >+    return -1;
> >+  }
> >+  fprintf (fp, "LANG=C tar --no-auto-compress -tRvf ");
> 
> Use of --no-auto-compress is specific to GNU tar, do we care?

I copied that from the existing Perl plugin, but note that -R is also
not supported by non-GNU tar.

> Should we allow a 'tar=/path/to/tar' parameter during .config to
> allow a user to point to an alternative tar?

Yes, this is a potential enhancement, especially where GNU tar might
be called "gtar" (as it is on FreeBSD).

> >+  shell_quote (tarfile, fp);
> >+  fputc (' ', fp);
> >+  shell_quote (file, fp);
> >+  if (fclose (fp) == EOF) {
> >+    nbdkit_error ("memstream failed: %m");
> >+    return -1;
> >+  }
> >+
> >+  /* Run the command and read the first line of the output. */
> >+  nbdkit_debug ("%s", cmd);
> >+  fp = popen (cmd, "r");
> >+  if (fp == NULL) {
> >+    nbdkit_error ("tar: %m");
> >+    return -1;
> >+  }
> >+  scanned_ok = fscanf (fp, "block %" SCNu64 ": %*s %*s %" SCNu64,
> >+                       &offset, &size) == 2;
> 
> fscanf() parsing an integer is subject to undefined behavior on
> overflow.  (Yes, I know it is a pre-existing and prevalent issue...)
> 
> >+  /* We have to now read and discard the rest of the output until EOF. */
> >+  while (fread (s, sizeof s, 1, fp) > 0)
> >+    ;
> >+  if (pclose (fp) != 0) {
> >+    nbdkit_error ("tar subcommand failed, "
> >+                  "check that the file really exists in the tarball");
> >+    return -1;
> >+  }
> 
> If we add exportname support, we'll have to defer checking for file
> existence until during .open.
> 
> >+
> >+  if (!scanned_ok) {
> >+    nbdkit_error ("unexpected output from the tar subcommand");
> >+    return -1;
> >+  }
> >+
> >+  /* Adjust the offset: Add 1 for the tar header, then multiply by the
> >+   * block size.
> >+   */
> >+  offset = (offset+1) * 512;
> 
> Is 512 always correct, or can a tar created with -b > 1 cause issues?

Nir actually raised this issue before and I convinced myself at the
time that the block size is always 512 bytes.  I didn't notice the -b
option til now however, but I don't know if it affects this block size
of not, but I'd say it's vanishingly unlikely that there exist any OVA
files in the wild that use this feature (and if they do, even less
likely that anything can read them).

> >+  h->fd = open (tarfile, (readonly ? O_RDONLY : O_RDWR) | O_CLOEXEC);
> 
> Should we really be opening a different fd per client, or can we get
> away with opening only one fd during .get_ready and sharing it among
> all clients?

The original code actually opened the file once, but it had the
disadvantage that you had to always open the file for writes (because
you don't know if the client will request read-only or not).  But that
breaks if the tar file itself cannot be opened for writing.

So I believe it is necessary to open it per handle, unless you can
think of another way to get around that..

> The unguarded use of O_CLOEXEC makes sense, but may cause
> compilation issues on Haiku.

I'll check this.

> >+/* Write data to the file. */
> >+static int
> >+tar_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offs)
> >+{
> >+  struct handle *h = handle;
> >+
> >+  offs += offset;
> >+
> >+  while (count > 0) {
> >+    ssize_t r = pwrite (h->fd, buf, count, offs);
> 
> Does this always work even when the tar file was created with sparse
> file support?  For that matter, if the tar file was created with
> sparse file contents, are we even guaranteed that a contiguous
> offset of the tar file is blindly usable as the data to serve?

We've used the same technique on thousands of OVA files without
hitting any problem so far, so I guess that ...

> $ truncate --size=1M large
> $ echo 'hi' >> large
> $ tar cSf f.tar large
> $ ls -l large f.tar
> -rw-rw-r--. 1 eblake eblake   10240 Jul  6 13:57 f.tar
> -rw-rw-r--. 1 eblake eblake 1048579 Jul  6 13:48 large
> $ LANG=C tar --no-auto-compress -tRvf f.tar large
> block 0: -rw-rw-r-- eblake/eblake 1048579 2020-07-06 13:48 large
> block 2: ** Block of NULs **
> 
> Yep, we need to special-case sparse tar files :(

.. OVAs don't use this feature.  I wonder if it's a GNU feature?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/




More information about the Libguestfs mailing list