[Libguestfs] [PATCH nbdkit] cache, cow: Do not round size down.
Eric Blake
eblake at redhat.com
Wed Feb 17 14:57:36 UTC 2021
On 2/17/21 6:57 AM, Richard W.M. Jones wrote:
> Both the cache and cow filters have the annoying behaviour that the
> size of the underlying plugin is rounded down to the block size used
> by the filters (ie. 4K). This is especially visible for single sector
> disk images:
>
> $ ./nbdkit memory size=512 --filter=cow --run 'nbdinfo --size $uri'
> 0
> $ ./nbdkit memory size=512 --filter=cache --run 'nbdinfo --size $uri'
> 0
>
> but affects all users of these filters somewhat randomly and
> unexpectedly. (For example I had a GPT partitioned disk image from a
> customer which did not have a 4K aligned size, which was unexpectedly
> "corrupted" when I tried to open it using the cow filter - the reason
> for the corruption was the backup partition table at the end of the
> disk being truncated.)
>
> Getting rid of this assumption is awkward: the general idea is that we
> round up the size of the backing file / bitmap by a full block. But
> whenever we are asked to read or write to the underlying plugin we
> handle the tail case carefully.
>
> This also tests various corner cases.
> ---
> +++ b/filters/cache/nbdkit-cache-filter.pod
> @@ -25,12 +25,6 @@ does not have effective caching, or (with C<cache=unsafe>) to defeat
> flush requests from the client (which is unsafe and can cause data
> loss, as the name suggests).
>
> -Note that the use of this filter rounds the image size down to a
> -multiple of the caching granularity (the larger of 4096 or the
> -C<f_bsize> field of L<fstatvfs(3)>), to ease the implementation. If
> -you need to round the image size up instead to access the last few
> -bytes, combine this filter with L<nbdkit-truncate-filter(1)>.
Nice to see this restriction go.
Do you also want to tweak the sentence in nbdkit-truncate-filter.pod
that mentions cache and cow filters?
> +++ b/filters/cache/blk.c
> int
> blk_set_size (uint64_t new_size)
> {
> - if (bitmap_resize (&bm, new_size) == -1)
> + size = new_size;
> +
> + if (bitmap_resize (&bm, size) == -1)
> return -1;
>
> - if (ftruncate (fd, new_size) == -1) {
> + if (ftruncate (fd, ROUND_UP (size, blksize)) == -1) {
So the underlying fd is aligned...
> nbdkit_error ("ftruncate: %m");
> return -1;
> }
>
> - if (lru_set_size (new_size) == -1)
> + if (lru_set_size (size) == -1)
...while the lru size is still the original size.
> @@ -199,9 +207,22 @@ blk_read (struct nbdkit_next_ops *next_ops, void *nxdata,
> "unknown");
>
> if (state == BLOCK_NOT_CACHED) { /* Read underlying plugin. */
> - if (next_ops->pread (nxdata, block, blksize, offset, 0, err) == -1)
> + unsigned n = blksize, tail = 0;
> +
> + if (offset + n > size) {
> + tail = offset + n - size;
> + n -= tail;
> + }
> +
> + if (next_ops->pread (nxdata, block, n, offset, 0, err) == -1)
> return -1;
Good, you were careful to never read beyond EOF of the underlying plugin.
>
> + /* Normally we're reading whole blocks, but at the very end of the
> + * file we might read a partial block. Deal with that case by
> + * zeroing the tail.
> + */
> + memset (block + n, 0, tail);
> +
I don't know if that tail can ever leak out to the client, but agree
that it is safer to zero it anyway.
> /* If cache-on-read, copy the block to the cache. */
> if (cache_on_read) {
> nbdkit_debug ("cache: cache-on-read block %" PRIu64
> @@ -247,9 +268,22 @@ blk_cache (struct nbdkit_next_ops *next_ops, void *nxdata,
>
> if (state == BLOCK_NOT_CACHED) {
> /* Read underlying plugin, copy to cache regardless of cache-on-read. */
> - if (next_ops->pread (nxdata, block, blksize, offset, 0, err) == -1)
> + unsigned n = blksize, tail = 0;
> +
> + if (offset + n > size) {
> + tail = offset + n - size;
> + n -= tail;
> + }
> +
> + if (next_ops->pread (nxdata, block, n, offset, 0, err) == -1)
> return -1;
>
> + /* Normally we're reading whole blocks, but at the very end of the
> + * file we might read a partial block. Deal with that case by
> + * zeroing the tail.
> + */
> + memset (block + n, 0, tail);
> +
Duplicated code, but I'm not sure if it warrants a helper function.
> nbdkit_debug ("cache: cache block %" PRIu64 " (offset %" PRIu64 ")",
> blknum, (uint64_t) offset);
>
> @@ -281,6 +315,12 @@ blk_writethrough (struct nbdkit_next_ops *next_ops, void *nxdata,
> int *err)
> {
> off_t offset = blknum * blksize;
> + unsigned n = blksize, tail = 0;
> +
> + if (offset + n > size) {
> + tail = offset + n - size;
> + n -= tail;
> + }
>
> reclaim (fd, &bm);
>
> @@ -293,7 +333,7 @@ blk_writethrough (struct nbdkit_next_ops *next_ops, void *nxdata,
> return -1;
> }
>
> - if (next_ops->pwrite (nxdata, block, blksize, offset, flags, err) == -1)
> + if (next_ops->pwrite (nxdata, block, n, offset, flags, err) == -1)
> return -1;
Okay, you're careful not to write beyond EOF of the plugin, regardless
of whatever extra data lives in the tail of the local fd.
LGTM
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org