[Libguestfs] [PATCH libnbd] nbdfuse: New tool to present a network block device in a FUSE filesystem.

Mon Oct 14 21:06:27 UTC 2019

On 10/12/19 9:21 AM, Richard W.M. Jones wrote:
> This program allows you to turn a network block device source into a
> FUSE filesystem containing a virtual file:
> 
>    $ nbdkit memory 128M
>    $ mkdir mp
>    $ nbdfuse mp/ramdisk nbd://localhost &
>    $ ls -l mp
>    total 0
>    -rw-rw-rw-. 1 rjones rjones 134217728 Oct 12 15:09 ramdisk
>    $ dd if=/dev/urandom bs=1M count=128 of=mp/ramdisk conv=notrunc,nocreat
>    128+0 records in
>    128+0 records out
>    134217728 bytes (134 MB, 128 MiB) copied, 3.10171 s, 43.3 MB/s
>    $ fusermount -u mp
> 

Cool!

> There are still some shortcomings, such as lack of zero and trim
> support.  These are documented in the TODO file.
> 

> +++ b/README
> @@ -82,6 +82,8 @@ Optional:
>   
>    * Python >= 3.3 to build the Python 3 bindings and NBD shell (nbdsh).
>   
> + * FUSE to build the nbdfuse program.

Minimum version?

> +++ b/docs/libnbd.pod
> @@ -840,6 +840,7 @@ L<https://github.com/NetworkBlockDevice/nbd/blob/master/doc/uri.md>.
>   
>   L<libnbd-security(3)>,
>   L<nbdsh(1)>,
> +L<nbdfuse(1)>,

Worth sorting these two alphabetically?

>   L<qemu(1)>.
>   

> +++ b/fuse/nbdfuse.c

> +
> +#define FUSE_USE_VERSION 26
> +
> +#include <fuse.h>
> +#include <fuse_lowlevel.h>
> +
> +#include <libnbd.h>
> +
> +#define MAX_REQUEST_SIZE (64 * 1024 * 1024)

Although this works with nbdkit, qemu-nbd doesn't like more than 32M. 
(We really should find time to teach nbdkit/libnbd about block size 
reporting, but that's a bigger project...)
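
In the meantime, clamping each request is cheap.  A minimal sketch, 
assuming a 32M cap is what we want for qemu-nbd; the helper names 
next_chunk() and count_chunks() are made up for illustration and are 
not part of the patch:

```c
#include <stddef.h>

/* Assumed cap: 32M, based on qemu-nbd rejecting larger requests. */
#define MAX_REQUEST_SIZE (32 * 1024 * 1024)

/* Return how many bytes of a request with 'remaining' bytes left
 * may be sent to the server in a single NBD command.
 */
static size_t
next_chunk (size_t remaining)
{
  return remaining > MAX_REQUEST_SIZE ? MAX_REQUEST_SIZE : remaining;
}

/* Count the NBD commands needed to cover 'count' bytes when every
 * command is limited to next_chunk() bytes.
 */
static size_t
count_chunks (size_t count)
{
  size_t n = 0;

  while (count > 0) {
    count -= next_chunk (count);
    n++;
  }
  return n;
}
```

With this cap, a 64M read or write would go out as two commands 
rather than one.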

> +
> +static struct nbd_handle *nbd;
> +static bool readonly = false;

Looks funny to initialize a static variable to 0 in a block of static 
variables with no initializers (C guarantees 0-initialization even if 
you aren't explicit).

> +static char *mountpoint, *filename;
> +static const char *pidfile;
> +static char *fuse_options;
> +static struct fuse_chan *ch;
> +static struct fuse *fuse;
> +static struct timespec start_t;
> +static uint64_t size;
> +

> +static void __attribute__((noreturn))
> +usage (FILE *fp, int exitcode)
> +{
> +  fprintf (fp,
> +"    nbdfuse [-r] MOUNTPOINT[/FILENAME] URI\n"

Do we want to use any #ifdefs to avoid advertising URI support on the 
command line when libnbd is compiled without libxml2?

> +"Other modes:\n"
> +"    nbdfuse MOUNTPOINT[/FILENAME] --command CMD [ARGS ...]\n"
> +"    nbdfuse MOUNTPOINT[/FILENAME] --socket-activation CMD [ARGS ...]\n"
> +"    nbdfuse MOUNTPOINT[/FILENAME] --fd N\n"
> +"    nbdfuse MOUNTPOINT[/FILENAME] --tcp HOST PORT\n"
> +"    nbdfuse MOUNTPOINT[/FILENAME] --unix SOCKET\n"

No mention of nbdfuse -o or -P.

> +"\n"
> +"Please read the nbdfuse(1) manual page for full usage.\n"
> +);
> +  exit (exitcode);

nbdfuse --help > /dev/full

exits with status 0 because we didn't check for error on stdout/stderr. 
That's a corner case, and many programs don't care about it, but it's 
worth deciding if we want to care.
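
If we do decide to care, the check is small.  A sketch only; 
flush_checked() is a hypothetical helper, not something in the patch:

```c
#include <stdio.h>

/* Flush 'fp' and report whether all buffered output reached its
 * destination.  Returns 0 on success, -1 if a write error occurred.
 */
static int
flush_checked (FILE *fp)
{
  if (fflush (fp) == EOF || ferror (fp))
    return -1;
  return 0;
}
```

usage() could then call this just before exit() and switch to 
EXIT_FAILURE when it returns -1.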

> +}
> +
> +static void
> +display_version (void)
> +{
> +  printf ("%s %s\n", PACKAGE_NAME, PACKAGE_VERSION);
> +}
> +
> +static void
> +fuse_help (const char *prog)
> +{
> +  static struct fuse_operations null_operations;
> +  const char *tmp_argv[] = { prog, "--help", NULL };
> +  fuse_main (2, (char **) tmp_argv, &null_operations, NULL);
> +  exit (EXIT_SUCCESS);
> +}
> +
> +static bool
> +is_directory (const char *path)
> +{
> +  struct stat statbuf;
> +
> +  if (stat (path, &statbuf) == -1)
> +    return false;
> +  return S_ISDIR (statbuf.st_mode);

Accepts a symlink-to-directory, but that's fine by me.

> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  enum {
> +    MODE_URI,
> +    MODE_COMMAND,
> +    MODE_FD,
> +    MODE_SOCKET_ACTIVATION,
> +    MODE_TCP,
> +    MODE_UNIX,
> +  } mode = MODE_URI;
> +  enum {
> +    HELP_OPTION = CHAR_MAX + 1,
> +    FUSE_HELP_OPTION,
> +  };
> +  /* Note the "+" means we stop processing as soon as we get to the
> +   * first non-option argument (the mountpoint) and then we parse the
> +   * rest of the command line without getopt.
> +   */
> +  const char *short_options = "+o:P:rV";
> +  const struct option long_options[] = {
> +    { "fuse-help",          no_argument,       NULL, FUSE_HELP_OPTION },
> +    { "help",               no_argument,       NULL, HELP_OPTION },
> +    { "pidfile",            required_argument, NULL, 'P' },
> +    { "pid-file",           required_argument, NULL, 'P' },
> +    { "readonly",           no_argument,       NULL, 'r' },
> +    { "read-only",          no_argument,       NULL, 'r' },
> +    { "version",            no_argument,       NULL, 'V' },

Worth a long-option synonym for -o?

> +  /* The next parameter is either a URI or a mode switch. */
> +  if (strcmp (argv[optind], "--command") == 0 ||
> +      strcmp (argv[optind], "--cmd") == 0) {
> +    mode = MODE_COMMAND;
> +    optind++;
> +  }

Is it worth using getopt_long() in this section for allowing unambiguous 
prefix spellings (--c for example) and/or a short option (-c for example)?
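
A rough, untested-in-tree sketch of what I have in mind.  It assumes a 
second getopt_long() call is made once optind points just past the 
mountpoint; parse_mode() and MODE_ERROR are names invented here, and no 
short options are defined:

```c
#include <getopt.h>
#include <stddef.h>

enum mode {
  MODE_URI,
  MODE_COMMAND,
  MODE_FD,
  MODE_SOCKET_ACTIVATION,
  MODE_TCP,
  MODE_UNIX,
  MODE_ERROR,               /* hypothetical: unknown mode switch */
};

/* Examine argv[optind].  If it is a mode switch (or an unambiguous
 * prefix of one), consume it and return the mode.  If it is not an
 * option at all, leave optind alone and return MODE_URI.
 */
static enum mode
parse_mode (int argc, char *argv[])
{
  /* Synonyms share a val.  Whether a prefix common to two synonyms
   * (such as --c) counts as unambiguous is left to the getopt_long
   * implementation.
   */
  static const struct option mode_options[] = {
    { "command",                   no_argument, NULL, MODE_COMMAND },
    { "cmd",                       no_argument, NULL, MODE_COMMAND },
    { "fd",                        no_argument, NULL, MODE_FD },
    { "socket-activation",         no_argument, NULL, MODE_SOCKET_ACTIVATION },
    { "systemd-socket-activation", no_argument, NULL, MODE_SOCKET_ACTIVATION },
    { "tcp",                       no_argument, NULL, MODE_TCP },
    { "unix",                      no_argument, NULL, MODE_UNIX },
    { NULL, 0, NULL, 0 }
  };
  int c;

  opterr = 0;                   /* caller prints its own message */
  /* "+" stops at the first non-option, so CMD's own options survive. */
  c = getopt_long (argc, argv, "+", mode_options, NULL);
  if (c == -1)
    return MODE_URI;
  if (c == '?')
    return MODE_ERROR;
  return (enum mode) c;
}
```

Only one call is made, so everything after the mode switch (including 
options meant for CMD) is left for the existing code to pick up from 
argv[optind].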

> +  else if (strcmp (argv[optind], "--socket-activation") == 0 ||
> +           strcmp (argv[optind], "--systemd-socket-activation") == 0) {
> +    mode = MODE_SOCKET_ACTIVATION;
> +    optind++;
> +  }

On the same theme, '--socket' as a synonym is easier to type than 
--socket-activation.

> +  else if (strcmp (argv[optind], "--fd") == 0) {
> +    mode = MODE_FD;
> +    optind++;
> +  }
> +  else if (strcmp (argv[optind], "--tcp") == 0) {
> +    mode = MODE_TCP;
> +    optind++;
> +  }
> +  else if (strcmp (argv[optind], "--unix") == 0) {
> +    mode = MODE_UNIX;
> +    optind++;
> +  }
> +  else if (argv[optind][0] == '-') {
> +    fprintf (stderr, "%s: unknown mode: %s\n\n", argv[0], argv[optind]);
> +    usage (stderr, EXIT_FAILURE);
> +  }
> +
> +  /* Check there are enough parameters following given the mode. */
> +  switch (mode) {
> +  case MODE_URI:
> +  case MODE_FD:
> +  case MODE_UNIX:
> +    if (argc - optind != 1)
> +      usage (stderr, EXIT_FAILURE);
> +    break;
> +  case MODE_TCP:
> +    if (argc - optind != 2)
> +      usage (stderr, EXIT_FAILURE);
> +    break;
> +  case MODE_COMMAND:
> +  case MODE_SOCKET_ACTIVATION:
> +    if (argc - optind < 1)
> +      usage (stderr, EXIT_FAILURE);
> +    break;
> +  }
> +  /* At this point we know the command line is valid, and so can start
> +   * opening FUSE and libnbd.
> +   */
> +
> +  /* Create the libnbd handle. */
> +  nbd = nbd_create ();
> +  if (nbd == NULL) {
> +    fprintf (stderr, "%s\n", nbd_get_error ());
> +    exit (EXIT_FAILURE);
> +  }
> +
> +  /* Connect to the NBD server synchronously. */
> +  switch (mode) {

> +
> +  case MODE_FD:
> +    if (sscanf (argv[optind], "%d", &fd) != 1) {

Overflow is undetected.
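
strtol() with errno and range checks would catch it.  A minimal sketch; 
parse_fd() is a hypothetical helper name:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a non-negative file descriptor from 'str'.
 * Returns 0 and stores the value in *fd on success; returns -1 on
 * empty input, trailing garbage, negative values or overflow.
 */
static int
parse_fd (const char *str, int *fd)
{
  char *end;
  long v;

  errno = 0;
  v = strtol (str, &end, 10);
  if (errno != 0 || end == str || *end != '\0')
    return -1;
  if (v < 0 || v > INT_MAX)
    return -1;
  *fd = (int) v;
  return 0;
}
```

This also rejects trailing junk such as "3x", which the sscanf() 
version silently accepts.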

> +      fprintf (stderr, "%s: could not parse file descriptor: %s\n\n",
> +               argv[0], argv[optind]);
> +      exit (EXIT_FAILURE);
> +    }
> +    if (nbd_connect_socket (nbd, fd) == -1) {
> +      fprintf (stderr, "%s\n", nbd_get_error ());
> +      exit (EXIT_FAILURE);
> +    }
> +    break;
> +

> +
> +  /* Create the FUSE args. */
> +  if (fuse_opt_add_arg (&fuse_args, argv[0]) == -1) {
> +  fuse_opt_error:
> +    perror ("fuse_opt_add_arg");
> +    exit (EXIT_FAILURE);
> +  }
> +
> +  if (fuse_options) {
> +    if (fuse_opt_add_arg (&fuse_args, "-o") == -1 ||
> +        fuse_opt_add_arg (&fuse_args, fuse_options) == -1)
> +      goto fuse_opt_error;
> +  }
> +
> +  /* Create the FUSE mountpoint. */
> +  ch = fuse_mount (mountpoint, &fuse_args);
> +  if (ch == NULL) {
> +    fprintf (stderr,
> +             "%s: fuse_mount failed: see error messages above", argv[0]);
> +    exit (EXIT_FAILURE);
> +  }
> +
> +  /* Set F_CLOEXEC on the channel.  Some versions of libfuse don't do
> +   * this.
> +   */
> +  fd = fuse_chan_fd (ch);
> +  if (fd >= 0) {
> +    int flags = fcntl (fd, F_GETFD, 0);
> +    if (flags >= 0)
> +      fcntl (fd, F_SETFD, flags & ~FD_CLOEXEC);

Doesn't check for (unlikely) error.
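
Also, if I read the comment right, the intent is to set the flag, which 
needs 'flags | FD_CLOEXEC'; the hunk above masks it off instead.  A 
sketch of a helper that does both (set_cloexec() is a made-up name):

```c
#include <fcntl.h>

/* Set FD_CLOEXEC on 'fd', preserving any other descriptor flags.
 * Returns 0 on success, -1 on error with errno set by fcntl().
 */
static int
set_cloexec (int fd)
{
  int flags = fcntl (fd, F_GETFD, 0);

  if (flags == -1)
    return -1;
  if (fcntl (fd, F_SETFD, flags | FD_CLOEXEC) == -1)
    return -1;
  return 0;
}
```

The caller can then decide whether a failure here is fatal or merely 
worth a perror().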

> +  }
> +
> +  /* Create the FUSE handle. */
> +  fuse = fuse_new (ch, &fuse_args,
> +                   &fuse_operations, sizeof fuse_operations, NULL);
> +  if (!fuse) {
> +    perror ("fuse_new");
> +    exit (EXIT_FAILURE);
> +  }
> +  fuse_opt_free_args (&fuse_args);
> +
> +  /* Catch signals since they can leave the mountpoint in a funny
> +   * state.  To exit the program callers must use ‘fusermount -u’.  We
> +   * also must be careful not to call exit(2) in this program until we
> +   * have unmounted the filesystem below.
> +   */
> +  memset (&sa, 0, sizeof sa);
> +  sa.sa_handler = SIG_IGN;
> +  sa.sa_flags = SA_RESTART;
> +  sigaction (SIGPIPE, &sa, NULL);
> +  sigaction (SIGINT, &sa, NULL);
> +  sigaction (SIGQUIT, &sa, NULL);
> +
> +  /* Ready to serve, write pidfile. */
> +  if (pidfile) {
> +    fp = fopen (pidfile, "w");
> +    if (fp) {
> +      fprintf (fp, "%ld", (long) getpid ());
> +      fclose (fp);
> +    }
> +  }
> +
> +  /* Enter the main loop. */
> +  r = fuse_loop (fuse);
> +  if (r != 0)
> +    perror ("fuse_loop");
> +
> +  /* Close FUSE. */
> +  fuse_unmount (mountpoint, ch);
> +  fuse_destroy (fuse);
> +
> +  /* Close NBD handle. */
> +  nbd_close (nbd);
> +
> +  free (mountpoint);
> +  free (filename);
> +  free (fuse_options);
> +
> +  exit (r == 0 ? EXIT_SUCCESS : EXIT_FAILURE);

Looks deceptively simple :)

> +}
> +
> +/* Wraps calls to libnbd functions and automatically checks for a
> + * returns errors in the format required by FUSE.  It also prints out

Missing a word or two after 'checks for a'

> + * the full error message on stderr, so that we don't lose it.
> + */
> +#define CHECK_NBD_ERROR(CALL)                                   \
> +  do { if ((CALL) == -1) return check_nbd_error (); } while (0)
> +static int
> +check_nbd_error (void)
> +{
> +  int err;
> +
> +  fprintf (stderr, "%s\n", nbd_get_error ());
> +  err = nbd_get_errno ();
> +  if (err != 0)
> +    return -err;
> +  else
> +    return -EIO;
> +}
> +
> +static int
> +nbdfuse_getattr (const char *path, struct stat *statbuf)
> +{
> +  const int mode = readonly ? 0444 : 0666;
> +
> +  memset (statbuf, 0, sizeof (struct stat));
> +
> +  /* We're probably making some Linux-specific assumptions here, but
> +   * this file is not compiled on non-Linux systems.
> +   */
> +  statbuf->st_atim = start_t;
> +  statbuf->st_mtim = start_t;
> +  statbuf->st_ctim = start_t;
> +  statbuf->st_uid = geteuid ();
> +  statbuf->st_gid = getegid ();

The comment is interesting if true.  However, a Google search for 'man 
fuse_main' pulls up https://man.openbsd.org/fuse_main.3 as its first 
hit, so FUSE appears to have graduated to non-Linux systems and we may 
have to revisit this later.

> +static int
> +nbdfuse_readdir (const char *path, void *buf,
> +                 fuse_fill_dir_t filler,
> +                 off_t offset, struct fuse_file_info *fi)
> +{
> +  if (strcmp (path, "/") != 0)
> +    return -ENOENT;
> +
> +  filler (buf, ".", NULL, 0);
> +  filler (buf, "..", NULL, 0);
> +  filler (buf, filename, NULL, 0);
> +

Does FUSE have a way to populate d_type during readdir (DT_DIR for '.', 
'..', DT_REG for filename)?
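
From memory of the libfuse 2.x sources (an assumption worth 
double-checking), the third argument to filler() is a struct stat whose 
st_mode file type bits end up in the dirent type field, so passing a 
minimal stat instead of NULL may be all it takes.  A sketch of building 
such a stat, plus the Linux mode-to-d_type mapping for reference; both 
helper names are invented here:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for DT_DIR and DT_REG with glibc */
#endif
#include <dirent.h>
#include <string.h>
#include <sys/stat.h>

/* Fill 'st' with just enough for a directory listing: only the file
 * type bits of st_mode are set, everything else is zero.
 */
static void
type_only_stat (struct stat *st, mode_t type)
{
  memset (st, 0, sizeof *st);
  st->st_mode = type & S_IFMT;
}

/* The d_type value a Linux dirent would carry for 'mode', assuming
 * the usual (mode & S_IFMT) >> 12 encoding.
 */
static unsigned char
mode_to_dtype (mode_t mode)
{
  return (unsigned char) ((mode & S_IFMT) >> 12);
}
```

nbdfuse_readdir() would then call type_only_stat (&st, S_IFDIR) before 
the '.' and '..' entries, type_only_stat (&st, S_IFREG) before the 
filename entry, and pass &st to filler() in place of NULL.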

> +static int
> +nbdfuse_write (const char *path, const char *buf,
> +               size_t count, off_t offset,
> +               struct fuse_file_info *fi)
> +{
> +  /* Probably shouldn't happen because of nbdfuse_open check. */
> +  if (readonly)
> +    return -EACCES;

Is EROFS any better here?

> +++ b/fuse/nbdfuse.pod
> @@ -0,0 +1,262 @@
> +=head1 NAME
> +
> +nbdfuse - present a network block device in a FUSE filesystem
> +
> +=head1 SYNOPSIS
> +
> + nbdfuse [-o FUSE-OPTION] [-P PIDFILE] [-r]
> +         MOUNTPOINT[/FILENAME] URI

This synopsis looks better than the one in usage().

> +
> +The NBD device itself can be local or remote and is specified by an
> +NBD URI (like C<nbd://localhost>, see L<nbd_connect_uri(3)>) or
> +various other modes.
> +
> +Use C<fusermount -u MOUNTPOINT> to unmount the filesystem after you
> +have used it.

Does umount(8) call into fusermount correctly?

> +
> +This program is similar in concept to L<nbd-client(8)> (which turns
> +NBD into F</dev/nbdX> device nodes), except:

Is it worth mentioning qemu-nbd(8) alongside nbd-client(8)?

> +
> +=over 4
> +
> +=item *
> +
> +nbd-client is faster because it uses a special kernel module
> +
> +=item *
> +
> +nbd-client requires root, but nbdfuse can be used by any user
> +
> +=item *
> +
> +nbdfuse virtual files can be mounted anywhere in the filesystem
> +
> +=item *
> +
> +nbdfuse uses libnbd to talk to the NBD server
> +
> +=item *
> +
> +nbdfuse requires FUSE support in the kernel
> +
> +=back

Decent list.

> +
> +=head1 EXAMPLES
> +
> +=head2 Present a remote NBD server as a local file
> +
> +If there is a remote NBD server running on C<example.com> at the
> +default NBD port number (10809) then you can turn it into a local file
> +by doing:
> +
> + $ mkdir dir
> + $ nbdfuse dir nbd://example.com &
> + $ ls -l dir/
> + total 0
> + -rw-rw-rw-. 1 nbd nbd 1073741824 Jan  1 10:10 nbd
> +
> +The file is called F<dir/nbd> and you can read and write to it as if
> +it is a normal file.  Note that writes to the file will write to the
> +remote NBD server.  After using it, unmount it:
> +
> + $ fusermount -u dir
> + $ rmdir dir
> +
> +=head2 Use nbdkit to create a file backed by a temporary RAM disk
> +
> +L<nbdkit(1)> has an I<-s> option allowing it to serve over
> +stdin/stdout.  You can combine this with nbdfuse as follows:
> +
> + $ mkdir dir
> + $ nbdfuse dir/ramdisk --command nbdkit -s memory 1G &
> + $ ls -l dir/
> + total 0
> + -rw-rw-rw-. 1 nbd nbd 1073741824 Jan  1 10:10 ramdisk
> + $ dd if=/dev/urandom bs=1M count=100 of=mp/ramdisk conv=notrunc,nocreat
> + 100+0 records in
> + 100+0 records out
> + 104857600 bytes (105 MB, 100 MiB) copied, 2.08319 s, 50.3 MB/s
> +
> +When you have finished with the RAM disk, you can unmount it as below
> +which will cause nbdkit to exit and the RAM disk contents to be
> +discarded:
> +
> + $ fusermount -u dir
> + $ rmdir dir

What a fun way to use memory :)

> +
> +=head2 Use qemu-nbd to read and modify a qcow2 file
> +
> +L<qemu-nbd(8)> cannot serve over stdin/stdout, but it can use systemd
> +socket activation.  You can combine this with nbdfuse and use it to
> +open any file format which qemu understands:
> +
> + $ mkdir dir
> + $ nbdfuse dir/file.raw \
> +           --socket-activation qemu-nbd -f qcow2 file.qcow2 &
> + $ ls -l dir/
> + total 0
> + -rw-rw-rw-. 1 nbd nbd 1073741824 Jan  1 10:10 file.raw
> +
> +File F<dir/file.raw> is in raw format, backed by F<file.qcow2>.  Any
> +changes made to F<dir/file.raw> are reflected into the qcow2 file.  To
> +unmount the file do:
> +
> + $ fusermount -u dir
> + $ rmdir dir
> +

The real power shines through - we have used the FUSE kernel module for 
user-space mounting of a qcow2 image, instead of the nbd kernel module 
for root-only mounting of a qcow2 image ;)

> +Some potentially useful FUSE options:
> +
> +=over 4
> +
> +=item B<-o> B<allow_other>
> +
> +Allow other users to see the filesystem.  This option has no effect
> +unless you enable it globally in F</etc/fuse.conf>.
> +
> +=item B<-o> B<kernel_cache>
> +
> +Allow the kernel to cache files (reduces the number of reads that have
> +to go through the L<libnbd(3)> API).  This is generally a good idea if
> +you can afford the extra memory usage.
> +
> +=item B<-o> B<uid=>N B<-o> B<gid=>N
> +
> +Use these options to map UIDs and GIDs.

Does this line up with the stats we reported earlier in getattr()?

> +
> +=back
> +
> +=item B<-P> PIDFILE
> +
> +=item B<--pidfile> PIDFILE
> +
> +When nbdfuse is ready to serve, write the nbdfuse process ID (PID) to
> +F<PIDFILE>.  This can be used in scripts to wait until nbdfuse is
> +ready.  Note you mustn't try to kill nbdfuse.  Use C<fusermount -u> to
> +unmount the mountpoint which will cause nbdfuse to exit cleanly.
> +
> +=item B<-r>
> +
> +=item B<--readonly>
> +
> +Access the network block device read-only.  The virtual file will have
> +read-only permissions, and any writes will return errors.
> +
> +=item B<--socket-activation> CMD [ARGS ...]
> +
> +Select systemd socket activation mode.  This is similar to
> +I<--command>, but is used for servers like L<qemu-nbd(8)> which
> +support systemd socket activation.  See L</EXAMPLES> above and
> +L<nbd_connect_systemd_socket_activation(3)>.
> +
> +=item B<--tcp> HOST PORT
> +
> +Select TCP mode.  Connect to an NBD server on a host and port over an
> +unencrypted TCP socket.  See also L<nbd_connect_tcp(3)>.

How hard would it be to support encryption?  Obviously, the fuse-mounted 
file will be unencrypted, but having libnbd connect to an encrypted NBD 
server could prove useful.

> +
> +=item B<--unix> SOCKET
> +
> +Select Unix mode.  Connect to an NBD server on a Unix domain socket.
> +See also L<nbd_connect_unix(3)>.
> +
> +=item B<-V>
> +
> +=item B<--version>
> +
> +Display the package name and version and exit.
> +
> +=back
> +
> +=head1 NOTES
> +
> +=head2 Loop mounting
> +
> +It is tempting (and possible) to loop mount the file.  However this
> +will be very slow and may sometimes deadlock.  Better alternatives are
> +to use either L<nbd-client(8)>, or more securely L<libguestfs(3)>,

Worth mentioning qemu-nbd(8) alongside nbd-client(8)?

> +L<guestfish(1)> or L<guestmount(1)> which can all access NBD servers.
> +
> +=head2 As a way to access NBD servers
> +
> +You can use this to access NBD servers, but it is usually better (and
> +definitely much faster) to use L<libnbd(3)> directly instead.  To
> +access NBD servers from the command line, look at L<nbdsh(1)>.
> +

Overall looks like a fun wrapper, to demonstrate how many layers we can 
shuffle data through to produce/consume it in the format of interest ;)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org