[Libguestfs] [PATCH] file: Zero support for block devices and NFS 4.2

Eric Blake eblake at redhat.com
Mon Jul 30 15:01:47 UTC 2018


On 07/29/2018 07:04 AM, Nir Soffer wrote:
> If we may not trim, we try ZERO_RANGE, but this is not well supported
> yet; for example, it is not available on NFS 4.2. ZERO_RANGE and
> PUNCH_HOLE are now supported on block devices, but not on RHEL 7, so we
> fall back to slow manual zeroing there.
> 
> Change the logic to support block devices on RHEL 7, and file systems
> that do not support ZERO_RANGE.
> 
> The new logic:
> - If we may trim, try PUNCH_HOLE
> - If we can zero range, try ZERO_RANGE
> - If we can punch hole and fallocate, try fallocate(PUNCH_HOLE) followed
>    by fallocate(0).
> - If underlying file is a block device, try ioctl(BLKZEROOUT)
> - Otherwise fall back to manual zeroing
> 
> The handle now keeps the underlying file capabilities, so once we
> discover that an operation is not supported, we never try it again.
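
For readers following along, the "never try it again" behaviour can be sketched roughly as below. This is only an illustration under assumptions, not the code from the patch: `struct caps`, `try_op` and `always_unsupported` are hypothetical names, and the three-way return value is a choice made for the sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-handle capability flags; the names are illustrative
   and are not taken from plugins/file/file.c. */
struct caps {
  bool can_zero_range;
  bool can_punch_hole;
};

/* Hypothetical operation type: returns 0 on success, -1 with errno set. */
typedef int (*zero_op) (void *opaque);

/* Try one operation if its capability flag is still set.
   Returns 0 on success, -1 (errno set) on a real failure, and 1 when the
   operation is unsupported. In the unsupported case the flag is cleared,
   so the operation is skipped on every later call. */
static int
try_op (bool *capable, zero_op op, void *opaque)
{
  if (!*capable)
    return 1;
  if (op (opaque) == 0)
    return 0;
  if (errno != EOPNOTSUPP)
    return -1;
  *capable = false;
  return 1;
}

/* Stub used only for illustration: counts calls, reports "unsupported". */
static int
always_unsupported (void *opaque)
{
  ++*(int *) opaque;
  errno = EOPNOTSUPP;
  return -1;
}
```

A caller would walk the list of methods in order and move to the next one whenever `try_op` returns 1, falling through to manual zeroing at the end.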
> 

> 
> Issues:
> - ioctl(BLKZEROOUT) will fail if offset or count are not aligned to
>    logical sector size. I'm not sure if nbdkit or qemu-img ensure this.

qemu-img tends to default to 512-byte alignment, but can be told to 
follow 4k alignment instead. nbdkit includes a filter that can force 4k 
alignment on top of any plugin, regardless of client alignment.

Someday, I'd like to enhance nbdkit to support block size advertisement 
(qemu-img already knows how to honor such advertisements). It's on my 
todo queue, but lower in priority than getting incremental backups 
working in libvirt.

> - Need testing with NFS
> ---
>   plugins/file/file.c | 126 ++++++++++++++++++++++++++++++++++++--------
>   1 file changed, 103 insertions(+), 23 deletions(-)

> +++ b/plugins/file/file.c
> @@ -33,6 +33,7 @@
>   
>   #include <config.h>
>   
> +#include <stdbool.h>
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <string.h>
> @@ -42,6 +43,8 @@
>   #include <sys/stat.h>
>   #include <errno.h>
>   #include <linux/falloc.h>   /* For FALLOC_FL_* on RHEL, glibc < 2.18 */
> +#include <sys/ioctl.h>
> +#include <linux/fs.h>

Does this need a configure-time probe to see if it exists, since it will 
break compilation on BSD systems?  Same question for linux/falloc.h. 
Actually, linux/falloc.h doesn't see any use in the current nbdkit.git; 
does this email depend on another thread being applied first?
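
To make the question concrete, here is a rough sketch of what guarded includes might look like. It assumes that configure defines HAVE_LINUX_FALLOC_H and HAVE_LINUX_FS_H (for instance via an autoconf header check such as AC_CHECK_HEADERS); those macro names and the two helper functions are assumptions for illustration, not something that exists in nbdkit today.

```c
/* HAVE_LINUX_FALLOC_H and HAVE_LINUX_FS_H are assumed to be provided by a
   configure-time header check; this snippet does not define them. */
#ifdef HAVE_LINUX_FALLOC_H
#include <linux/falloc.h>   /* FALLOC_FL_* */
#endif
#ifdef HAVE_LINUX_FS_H
#include <sys/ioctl.h>
#include <linux/fs.h>       /* BLKZEROOUT */
#endif

/* Hypothetical helpers reporting which fast paths were compiled in. */
static int
compiled_with_punch_hole (void)
{
#ifdef FALLOC_FL_PUNCH_HOLE
  return 1;
#else
  return 0;
#endif
}

static int
compiled_with_blkzeroout (void)
{
#ifdef BLKZEROOUT
  return 1;
#else
  return 0;
#endif
}
```

With this shape, a BSD build simply compiles without the Linux-only paths instead of failing on a missing header.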


> +
> +#ifdef FALLOC_FL_PUNCH_HOLE
> +  /* If we can punch hole but may not trim, we can combine punching hole and
> +     fallocate to zero a range. This is much more efficient than writing zeros
> +     manually. */

s/is/can be/ (it's two syscalls instead of one, and may not be as 
efficient as we'd like - but does indeed stand a chance of being more 
efficient than manual efforts)
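
As a standalone illustration of the two-syscall approach (a sketch under assumptions, not the patch's code), the following zeroes a range by punching a hole and then reallocating it. The function names and the temporary file path are made up for the example, and it assumes Linux with a libc that exposes fallocate() under _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Zero [offset, offset+count) by deallocating the range and then
   allocating it again. Returns 0 on success, -1 with errno set. */
static int
zero_by_punch_and_allocate (int fd, off_t offset, off_t count)
{
#ifdef FALLOC_FL_PUNCH_HOLE
  if (fallocate (fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                 offset, count) == -1)
    return -1;
  return fallocate (fd, 0, offset, count);
#else
  (void) fd; (void) offset; (void) count;
  errno = EOPNOTSUPP;
  return -1;
#endif
}

/* Usage sketch: fill a temporary file with 0xff, zero the middle, and
   read it back. Returns 1 if the range reads back as zeros and the
   neighbouring bytes are untouched, 0 if the zeroing call failed on this
   filesystem, -1 on setup failure or unexpected data. */
static int
demo_zero_tmpfile (void)
{
  char path[] = "/tmp/zero-demo-XXXXXX";   /* hypothetical location */
  unsigned char buf[4096];
  int ret = -1;
  int fd = mkstemp (path);

  if (fd == -1)
    return -1;
  unlink (path);
  memset (buf, 0xff, sizeof buf);
  if (pwrite (fd, buf, sizeof buf, 0) != (ssize_t) sizeof buf)
    goto out;
  if (zero_by_punch_and_allocate (fd, 1024, 2048) == -1) {
    ret = 0;
    goto out;
  }
  if (pread (fd, buf, sizeof buf, 0) != (ssize_t) sizeof buf)
    goto out;
  ret = 1;
  for (size_t i = 1024; i < 3072; i++)
    if (buf[i] != 0)
      ret = -1;
  if (buf[0] != 0xff || buf[3072] != 0xff)
    ret = -1;
out:
  close (fd);
  return ret;
}
```

Whether this beats writing zeros depends on the filesystem, which is why "can be" is the safer wording for the comment.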

> +  if (h->can_punch_hole && h->can_fallocate) {
> +    r = do_fallocate (h->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> +                      offset, count);
> +    if (r == 0) {
> +      r = do_fallocate(h->fd, 0, offset, count);
> +      if (r == 0)
> +        return 0;
> +
> +      if (errno != EOPNOTSUPP) {
> +        nbdkit_error ("zero: %m");
> +        return r;
> +      }
> +
> +      h->can_fallocate = false;
> +    } else {
> +      if (errno != EOPNOTSUPP) {
> +        nbdkit_error ("zero: %m");
> +        return r;
> +      }
> +
> +      h->can_punch_hole = false;
> +    }
> +  }
> +#endif
> +
> +  /* For block devices, we can use BLKZEROOUT.
> +     NOTE: count and offset must be aligned to logical block size. */
> +  if (h->is_block_device) {
> +    uint64_t range[2] = {offset, count};

Is it worth attempting the ioctl only when you have aligned values?
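
Something along these lines would do as the check; it is a sketch only, with a hypothetical helper name. It assumes the logical sector size has already been obtained elsewhere (on Linux, ioctl(fd, BLKSSZGET, ...) with a pointer to int reports it).

```c
#include <stdbool.h>
#include <stdint.h>

/* True if both offset and count are multiples of sector_size.
   sector_size is assumed to be the device's logical sector size;
   a size of 0 is rejected rather than dividing by zero. */
static bool
is_sector_aligned (uint64_t offset, uint64_t count, uint32_t sector_size)
{
  if (sector_size == 0)
    return false;
  return offset % sector_size == 0 && count % sector_size == 0;
}
```

An unaligned request would then skip the ioctl and fall through to manual zeroing instead of failing outright.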

> +
> +    r = ioctl(h->fd, BLKZEROOUT, &range);

This portion of the code should be conditional on whether BLKZEROOUT is defined.
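
Roughly what I have in mind, as a hedged sketch rather than a drop-in replacement: `zero_block_device` is a hypothetical name, the `__linux__` test stands in for whatever configure probe ends up being used, and the fstat() check replaces the patch's h->is_block_device flag only to keep the example self-contained.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#ifdef __linux__
#include <linux/fs.h>   /* BLKZEROOUT, where the kernel headers provide it */
#endif

/* Zero a range on a block device with BLKZEROOUT when it is compiled in.
   Returns 0 on success, -1 with errno set. errno is EOPNOTSUPP when the
   ioctl is not available at build time or fd is not a block device, so
   the caller can fall back to writing zeros. */
static int
zero_block_device (int fd, uint64_t offset, uint64_t count)
{
#ifdef BLKZEROOUT
  struct stat st;
  uint64_t range[2] = { offset, count };

  if (fstat (fd, &st) == -1)
    return -1;
  if (!S_ISBLK (st.st_mode)) {
    errno = EOPNOTSUPP;
    return -1;
  }
  return ioctl (fd, BLKZEROOUT, range);
#else
  (void) fd; (void) offset; (void) count;
  errno = EOPNOTSUPP;
  return -1;
#endif
}
```

On a system without BLKZEROOUT the function still compiles and simply reports EOPNOTSUPP.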

> +    if (r == 0)
> +      return 0;
> +
> +    nbdkit_error("zero: %m");
> +    return r;
> +  }
> +
>     /* Trigger a fall back to writing */
>     errno = EOPNOTSUPP;
> -#endif
>   
>     return r;
>   }
> 



-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



