[Libguestfs] [nbdkit PATCH 0/2] bind .zero to more languages

Tue Jan 24 15:16:45 UTC 2017

On Tue, Jan 24, 2017 at 08:51:05AM -0600, Eric Blake wrote:
> On 01/24/2017 05:38 AM, Richard W.M. Jones wrote:
> > On Mon, Jan 23, 2017 at 09:13:23PM -0600, Eric Blake wrote:
> >> Begin the language binding followups to my new .zero callback, since
> >> Rich was indeed correct that we want them.
> >>
> >> I'm more familiar with python and perl (at least to the point that
> >> I was able to modify the appropriate example files and prove to
> >> myself that the bindings worked), so I've started with those.
> >> I'm less familiar with ruby and ocaml, so I've left those for
> >> tomorrow (it will rely heavily on copy-and-paste); it is also a
> >> chance to review if my decision of requiring a return value in
> >> order to trigger the graceful fallback is the best approach.
> > 
> > The other methods (eg. trim) deal with this by having a separate
> > method called "can_*" (eg. can_trim).  Can we have a can_zero callback
> > instead?
> 
> I thought about that, but there's several problems:
> 
> 1) the can_trim and similar functions are required in order to alter
> what the server advertises to the guest during the handshake phase; at
> which point the guest is out-of-spec if it tries to use the command that
> the plugin did not advertise. But while those other commands have no
> real impact to the amount of wire traffic, we want to always advertise
> WRITE_ZEROES support to the client (because it is so much more efficient
> to pass a WRITE_ZEROES command over the wire than a regular WRITE with a
> payload of zeroes), even when the plugin doesn't support it.  Since
> having a can_zero callback does not affect what we advertise, the only
> other reason to implement it would be if can affect what we ask the
> plugin to do.
> 
> 2) the fallocate() system call does not have any introspectible way to
> tell if it will work on a given file system, other than to try it and
> see if it fails with EOPNOTSUPP.  On the C side, I thought it was
> elegant to let the errno value be a deciding factor on whether the
> attempt was fatal or whether to gracefully fall back to calling the
> pwrite callback, while still centralizing the calloc() logic in the
> plugin.c file rather than making every single C implementation of .zero
> repeat the logic.  Since it is not easy to introspect whether
> fallocate() will fail on a given file, it's much harder to write a
> can_zero() function than it is to just try fallocate() and have a
> graceful fallback.
> 
> 3) the NBD_CMD_WRITE_ZEROES command has a flag value that affects
> operation.  The existing pread and pwrite callbacks do NOT have a flag
> argument (arguably, a hole in those specifications if we ever come up
> with a reason that we need to pass flags to those commands, but not
> necessarily something to worry about today).  But the write zeroes
> command, per spec, has a way to state whether the server may use trim
> (provided that the hole reads back as zero) or must leave things
> allocated, giving the client some rudimentary control over whether the
> server will create a sparse file.  The spec says that the server is not
> required to trim (so falling back to pwrite is always appropriate), but
> the way the spec is worded, using fallocate(FALLOC_FL_PUNCH_HOLE) is
> only appropriate when the may_trim flag is set in the existing C
> interface (which in turn corresponds to a lack of the
> NBD_CMD_FLAG_NO_HOLE flag on the wire).  Having a boolean can_zero()
> callback is insufficient to tell whether the plugin would behave
> differently for may_trim=0 vs. may_trim=1
> 
> So here's some ideas for alternative implementations in the language
> bindings:
> 
> 1) give the binding a way to request graceful fallback to write (what I
> implemented in this version)
> 
> 2) provide a completely different interface to the language bindings,
> but keeping the C interface as already committed.  The language binding
> would only implement a single new entry point, perhaps named
> trim_reads_as_zero().  On the C side of the binding, when plugin.c calls
> lang_zero(, may_trim=1), C checks if trim_reads_as_zero() is defined and
> returns true, and if so calls the existing trim() binding; if
> trim_reads_as-zero() is not defined, or returns false, or trim() is not
> defined, then C calls the existing pwrite() binding.  Thus, languages do
> NOT need a zero() binding.  But doing this may lose opportunities for
> optimizing an all-zero write even where fallocate(FALLOC_FL_PUNCH_HOLE)
> is inappropriate, as newer Linux has added
> fallocate(FALLOC_FL_ZERO_RANGE) [hmm, maybe I should do a followup patch
> to file.c to add the use of that option, but that's a story for another
> patch]
> 
> 3) adjust the C interface to match the bindings interface proposed in 2,
> by making trim_reads_as_zero() part of the overall callback interface.
> We haven't released yet, so now's the time to tweak the interfaces, but
> like option 2, it makes it harder for an implementation to take
> advantage of faster ways to write large blocks of zeroes while still
> avoiding holes.
> 
> 4) document that the language bindings can't use automatic fallback to
> write - they must implement it themselves.  But unlike C having to do a
> verbose reimplementation of a call to calloc() and free(), the languages
> we bind to tend to have more compact representations for quickly
> creating a buffer of all zeros (perl's "\0" x count, python's
> bytearray(count), etc).
> 
> I'm probably leaning towards number 4, since you expressed concern about
> number 1, but would love further opinions before I spend any more time
> churning out patches that aren't needed.

Another option -- simpler, I think -- is to review the way that
plugins set errors.

Currently there is only one call (nbdkit_error) which takes a string.
Thus if the plugin error comes with an errno, it is effectively lost
(or at best, turned into a string, but not in a way that callers can
use).

However we could add another API:

  nbdkit_set_errno (int errno);

This allows the plugin to send the errno back to the core code
explicitly.  It's effectively stored as a global in the core code.

Note that the correct calling sequence from C is:

  nbdkit_set_error (errno);  // this call is optional and may be omitted
  nbdkit_error ("oops: %s", strerror (errno));
  return -1;

Mapping these to the language bindings should be easy.  We just need
to add a binding to nbdkit_set_error, but the exception mechanism for
indicating an error and sending the string can stay the same.

In Perl, old code to raise an error would look like:

  sub pread
  {
     my $h = shift;
     my $count = shift;
     my $offset = shift;
     my $ret;
     read ($FH, $ret, $count, $offset) || die "read: $!"
     return $ret;
  }

(Note that die does not exit the program, it raises an exception which
is caught and turned into nbdkit_error in the Perl nbdkit plugin).

This would continue to work fine, but new code which cared about the
errno could do this instead:

  sub pread
  {
     my $h = shift;
     my $count = shift;
     my $offset = shift;
     my $ret;
     read ($FH, $ret, $count, $offset) || {
       nbdkit_set_errno (POSIX::errno ());
       die "read: $!"
     }
     return $ret;
  }

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org