[Libguestfs] Supporting sparse disks in nbdkit

Richard W.M. Jones rjones at redhat.com
Fri Mar 8 13:48:46 UTC 2019


I've posted a couple of patches towards the ultimate goal of
implementing NBD_CMD_BLOCK_STATUS / base:allocation in nbdkit.  Before
I can do the final patch I think we need to discuss how this would be
exposed to plugins since at the end of the day they need to implement
the feature.

Background reading:
 - preparatory patches:
     https://www.redhat.com/archives/libguestfs/2019-March/msg00013.html
     https://www.redhat.com/archives/libguestfs/2019-March/msg00016.html
 - NBD protocol, see in particular NBD_CMD_BLOCK_STATUS and
     NBD_REPLY_TYPE_BLOCK_STATUS:
     https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md

I think we shouldn't modify the pread() callback.  If we decide to
implement Structured Replies properly at some point in the future we
might need to do that, but it's not necessary now.

We could introduce a new call ‘extents’ to return the list of extents.
I believe it would look like this:

  struct nbdkit_extent {
    uint64_t offset;
    uint32_t length;  // XXX is 32 bit right here?
    uint32_t flag;    // hole, zero, data ... more in future?
  };

  int extents (void *handle, uint32_t count, uint64_t offset,
               uint32_t flags /* always 0? */,
               size_t *nr_extents, struct nbdkit_extent *extents);

The function is meant to scan [offset, offset+count-1] and return a
list of all extents overlapping this range, and their status (hole,
zero, data).

To make writing plugins easier we could say that extents don't need to
be returned in order, and may include extents which don't actually
overlap the requested range.  Also, regions missing from the list
would mean "hole" (which makes writing the VDDK plugin easier), and
adjacent extents of the same type would be coalesced automatically.
But it would be an error if returned extents overlap each other.

nbdkit would need to do some massaging on this to get it into the
right format for NBD_CMD_BLOCK_STATUS.  (I'm very confused about what
NBD_CMD_FLAG_REQ_ONE is supposed to do.)

We will also need a corresponding ‘can_extents’, which is analogous to
‘can_write’ etc and is what would control the output of
NBD_OPT_{SET,LIST}_META_CONTEXT.

For nbdkit-file-plugin:

 - Fairly simple implementation using SEEK_HOLE/SEEK_DATA.

 - Not sure how we detect zeroes without reading the file.
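
For reference, a SEEK_HOLE/SEEK_DATA scan might look roughly like the
sketch below.  This is only an illustration under assumptions: the
function name file_extents, the EXTENT_* values and the output array
convention are invented here, and the fallback #defines use the Linux
values.  It reports only hole and data, since lseek gives no way to
tell zeroes from other data.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for SEEK_DATA and SEEK_HOLE on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#if !defined(SEEK_DATA) && defined(__linux__)
#define SEEK_DATA 3             /* Linux values, used only as a fallback */
#define SEEK_HOLE 4
#endif

/* Hypothetical flag values; the proposal does not fix them. */
enum { EXTENT_DATA = 0, EXTENT_HOLE = 1 };

struct nbdkit_extent {
  uint64_t offset;
  uint32_t length;
  uint32_t flag;
};

/* Describe [offset, offset+count) of fd as alternating data and hole
 * extents.  Stops early if out[] fills up, so the list may cover only
 * a prefix of the range.  Anything past end of file is reported as a
 * hole.  Returns the number of extents written, or -1 on error.
 */
int
file_extents (int fd, uint64_t offset, uint32_t count,
              struct nbdkit_extent *out, size_t max_out)
{
  off_t pos = (off_t) offset;
  off_t end = pos + (off_t) count;
  size_t n = 0;

  assert (out != NULL || max_out == 0);

  while (pos < end && n < max_out) {
    off_t data = lseek (fd, pos, SEEK_DATA);
    off_t next;
    uint32_t flag;

    if (data == -1) {
      if (errno != ENXIO)
        return -1;
      data = end;               /* no more data: the rest is a hole */
    }
    if (data > pos) {           /* pos is inside a hole */
      flag = EXTENT_HOLE;
      next = data;
    }
    else {                      /* pos is inside data: find its end */
      flag = EXTENT_DATA;
      next = lseek (fd, pos, SEEK_HOLE);
      if (next == -1)
        return -1;
    }
    if (next > end)
      next = end;
    out[n].offset = (uint64_t) pos;
    out[n].length = (uint32_t) (next - pos);
    out[n].flag = flag;
    n++;
    pos = next;
  }
  return (int) n;
}
```

On filesystems which don't track holes, lseek reports the whole file
as data, so this degrades to a single data extent rather than failing.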

For nbdkit-memory-plugin:

 - Pretty simple implementation, which can even detect non-hole zeroes.
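
To illustrate how an in-memory store could tell the three states
apart, here is a small sketch.  It assumes a hypothetical layout in
which the disk is an array of page pointers and an unallocated page is
NULL; the real plugin's data structure may well differ, and
MEM_PAGE_SIZE, classify_page and the EXTENT_* values are invented for
this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_PAGE_SIZE 4096      /* hypothetical page size */

/* Hypothetical flag values; the proposal does not fix them. */
enum { EXTENT_DATA = 0, EXTENT_HOLE = 1, EXTENT_ZERO = 2 };

/* A buffer is all zeroes if its first byte is zero and every byte
 * equals the byte after it, which one memcmp can check.
 */
static bool
is_zero (const uint8_t *buf, size_t len)
{
  assert (buf != NULL || len == 0);
  return len == 0 || (buf[0] == 0 && memcmp (buf, buf + 1, len - 1) == 0);
}

/* NULL means the page was never allocated (a hole).  An allocated
 * page that contains only zero bytes is a non-hole zero.
 */
uint32_t
classify_page (const uint8_t *page)
{
  if (page == NULL)
    return EXTENT_HOLE;
  if (is_zero (page, MEM_PAGE_SIZE))
    return EXTENT_ZERO;
  return EXTENT_DATA;
}
```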

For VDDK:

 - VixDiskLib_QueryAllocatedBlocks can return allocated blocks, but
   doesn't return holes separately (they are assumed from what is
   omitted from the list).  No support for detecting zeroes that I can
   see.

Some existing filters would have to be modified to correctly adjust
‘extents’ offsets:

 - nbdkit-offset-filter

 - nbdkit-partition-filter

 - nbdkit-truncate-filter (? maybe not)

 - nbdkit-xz-filter is complicated: XZ files support sparseness so in
   theory we should try to return this data
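
For the simple offset-style filters, the adjustment could be sketched
as below.  This assumes the filter has already asked the underlying
plugin for extents at (offset + filter_offset) and now needs to
translate the answer back into the client's view.  shift_extents and
filter_offset are hypothetical names, not part of any existing filter
API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct nbdkit_extent {
  uint64_t offset;
  uint32_t length;
  uint32_t flag;
};

/* Translate extents from the plugin's address space into the filter's
 * by subtracting filter_offset.  Since plugins may return extents
 * outside the requested range, anything wholly before filter_offset
 * is dropped and anything straddling it is clipped.  Works in place
 * and returns the new number of extents.
 */
size_t
shift_extents (struct nbdkit_extent *extents, size_t nr,
               uint64_t filter_offset)
{
  size_t j = 0;

  assert (extents != NULL || nr == 0);

  for (size_t i = 0; i < nr; i++) {
    uint64_t start = extents[i].offset;
    uint64_t end = start + extents[i].length;

    if (end <= filter_offset)
      continue;                 /* not visible through the filter */
    if (start < filter_offset)
      start = filter_offset;    /* clip the part before the window */
    extents[j].offset = start - filter_offset;
    extents[j].length = (uint32_t) (end - start);
    extents[j].flag = extents[i].flag;
    j++;
  }
  return j;
}
```

A filter with an upper bound (such as a partition) would also need to
clip at the end of its window, which is omitted here.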

Your thoughts on this appreciated,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top



