[Libguestfs] [PATCH] file: Zero support for block devices and NFS 4.2
Eric Blake
eblake at redhat.com
Mon Jul 30 15:01:47 UTC 2018
On 07/29/2018 07:04 AM, Nir Soffer wrote:
> If we may not trim, we try ZERO_RANGE, but this is not well supported
> yet; for example, it is not available on NFS 4.2. ZERO_RANGE and
> PUNCH_HOLE are supported now on block devices, but not on RHEL 7, so we
> fall back to slow manual zeroing there.
>
> Change the logic to support block devices on RHEL 7, and file systems
> that do not support ZERO_RANGE.
>
> The new logic:
> - If we may trim, try PUNCH_HOLE
> - If we can zero range, try ZERO_RANGE
> - If we can punch hole and fallocate, try fallocate(PUNCH_HOLE) followed
> by fallocate(0).
> - If underlying file is a block device, try ioctl(BLKZEROOUT)
> - Otherwise fall back to manual zeroing
>
> The handle now keeps the underlying file capabilities, so once we
> discover that an operation is not supported, we never try it again.
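To make sure I'm reading the intent correctly, the "never try it again"
behaviour amounts to something like the sketch below. This is only an
illustration, not code from the patch: try_cached, zero_op and the
op_unsupported stub are hypothetical names, and the three-way return value
is my own convention.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical signature for one zeroing method (returns 0 or -1/errno). */
typedef int (*zero_op) (void);

/* Try OP only while *CAP is still true.
   Returns 0 on success, -1 on a hard error, and 1 when the caller
   should move on to the next method.  On EOPNOTSUPP the capability
   is cleared, so later calls skip OP without issuing a syscall. */
static int
try_cached (bool *cap, zero_op op)
{
  if (!*cap)
    return 1;
  if (op () == 0)
    return 0;
  if (errno != EOPNOTSUPP)
    return -1;
  *cap = false;
  return 1;
}

/* Demo stub standing in for an unsupported fallocate mode. */
static int calls;

static int
op_unsupported (void)
{
  calls++;
  errno = EOPNOTSUPP;
  return -1;
}
```

With a stub like op_unsupported, the second call to try_cached returns 1
without invoking the operation at all, which is the saving the commit
message is after.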
>
>
> Issues:
> - ioctl(BLKZEROOUT) will fail if offset or count are not aligned to
> logical sector size. I'm not sure if nbdkit or qemu-img ensure this.
qemu-img tends to default to 512-byte alignment, but can be told to
follow 4k alignment instead. nbdkit includes a filter that can force 4k
alignment on top of any plugin, regardless of client alignment.
Someday, I'd like to enhance nbdkit to support block size advertisement
(qemu-img already knows how to honor such advertisements). It's on my
todo queue, but lower in priority than getting incremental backups
working in libvirt.
> - Need testing with NFS
> ---
> plugins/file/file.c | 126 ++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 103 insertions(+), 23 deletions(-)
> +++ b/plugins/file/file.c
> @@ -33,6 +33,7 @@
>
> #include <config.h>
>
> +#include <stdbool.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> @@ -42,6 +43,8 @@
> #include <sys/stat.h>
> #include <errno.h>
> #include <linux/falloc.h> /* For FALLOC_FL_* on RHEL, glibc < 2.18 */
> +#include <sys/ioctl.h>
> +#include <linux/fs.h>
Does this need a configure-time probe to see if it exists, since it will
break compilation on BSD systems? Same question for linux/falloc.h.
Actually, linux/falloc.h doesn't see any use in the current nbdkit.git;
does this email depend on another thread being applied first?
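On the C side, the guarded includes could look roughly like this. It is a
sketch under assumptions: HAVE_LINUX_FS_H and HAVE_LINUX_FALLOC_H are the
macros that AC_CHECK_HEADERS would define in config.h if such a probe were
added (it is not there today), and since this snippet has no config.h it
falls back on __linux__ so that it compiles standalone. The have_* helpers
are hypothetical and only show which code paths end up enabled.

```c
#include <stdbool.h>

/* Stand-in for config.h: assume the headers exist on Linux only. */
#if defined __linux__ && !defined HAVE_LINUX_FS_H
#define HAVE_LINUX_FS_H 1
#endif
#if defined __linux__ && !defined HAVE_LINUX_FALLOC_H
#define HAVE_LINUX_FALLOC_H 1
#endif

#ifdef HAVE_LINUX_FALLOC_H
#include <linux/falloc.h>       /* FALLOC_FL_* */
#endif

#ifdef HAVE_LINUX_FS_H
#include <sys/ioctl.h>
#include <linux/fs.h>           /* BLKZEROOUT */
#endif

/* True if the BLKZEROOUT code path can be compiled in. */
static bool
have_blkzeroout (void)
{
#ifdef BLKZEROOUT
  return true;
#else
  return false;
#endif
}

/* True if the PUNCH_HOLE code path can be compiled in. */
static bool
have_punch_hole (void)
{
#ifdef FALLOC_FL_PUNCH_HOLE
  return true;
#else
  return false;
#endif
}
```

On a BSD build neither header is pulled in, both helpers return false, and
the plugin would go straight to the write-zeroes fallback.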
> +
> +#ifdef FALLOC_FL_PUNCH_HOLE
> +  /* If we can punch hole but may not trim, we can combine punching hole and
> +     fallocate to zero a range. This is much more efficient than writing zeros
> +     manually. */
s/is/can be/ (it's two syscalls instead of one, and may not be as
efficient as we'd like - but does indeed stand a chance of being more
efficient than manual efforts)
> +  if (h->can_punch_hole && h->can_fallocate) {
> +    r = do_fallocate (h->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> +                      offset, count);
> +    if (r == 0) {
> +      r = do_fallocate(h->fd, 0, offset, count);
> +      if (r == 0)
> +        return 0;
> +
> +      if (errno != EOPNOTSUPP) {
> +        nbdkit_error ("zero: %m");
> +        return r;
> +      }
> +
> +      h->can_fallocate = false;
> +    } else {
> +      if (errno != EOPNOTSUPP) {
> +        nbdkit_error ("zero: %m");
> +        return r;
> +      }
> +
> +      h->can_punch_hole = false;
> +    }
> +  }
> +#endif
> +
> +  /* For block devices, we can use BLKZEROOUT.
> +     NOTE: count and offset must be aligned to logical block size. */
> +  if (h->is_block_device) {
> +    uint64_t range[2] = {offset, count};
Is it worth attempting the ioctl only when you have aligned values?
> +
> +    r = ioctl(h->fd, BLKZEROOUT, &range);
This portion of the code should be conditional on whether BLKZEROOUT is
defined.
> +    if (r == 0)
> +      return 0;
> +
> +    nbdkit_error("zero: %m");
> +    return r;
> +  }
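Putting my two comments on this hunk together (only attempt the ioctl with
aligned values, and only compile it where BLKZEROOUT exists), I'm thinking
of something along these lines. Treat it as a sketch: is_aligned and
zero_block_range are hypothetical names, and I'm assuming the caller has
already obtained the logical sector size (for example via BLKSSZGET at open
time) and passes it in.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#ifdef __linux__
#include <sys/ioctl.h>
#include <linux/fs.h>           /* BLKZEROOUT, where the headers have it */
#endif

/* True if OFFSET and COUNT are both multiples of SECTOR_SIZE. */
static bool
is_aligned (uint64_t offset, uint64_t count, unsigned sector_size)
{
  assert (sector_size > 0);
  return offset % sector_size == 0 && count % sector_size == 0;
}

/* Zero a range on a block device.  Returns 0 on success or -1 with
   errno set.  errno == EOPNOTSUPP tells the caller to fall back to
   writing zeroes; it is used both when BLKZEROOUT is not compiled in
   and when the request is not sector-aligned. */
static int
zero_block_range (int fd, uint64_t offset, uint64_t count,
                  unsigned sector_size)
{
#ifdef BLKZEROOUT
  uint64_t range[2] = { offset, count };

  if (is_aligned (offset, count, sector_size))
    return ioctl (fd, BLKZEROOUT, &range);
#endif
  errno = EOPNOTSUPP;
  return -1;
}
```

Reporting an unaligned request as EOPNOTSUPP rather than letting the ioctl
fail means a client that sends unaligned requests still gets a working
(if slow) zero instead of an error.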
> +
>    /* Trigger a fall back to writing */
>    errno = EOPNOTSUPP;
> -#endif
>
>    return r;
>  }
>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org