[Libguestfs] [PATCH nbdkit] New filter: gzip
Eric Blake
eblake at redhat.com
Fri Jul 10 14:20:29 UTC 2020
On 7/10/20 6:50 AM, Richard W.M. Jones wrote:
> Turn the existing nbdkit-gzip-plugin into a filter so it can be
> applied on top of files or other sources:
>
> nbdkit file --filter=gzip file.gz
> nbdkit curl --filter=gzip https://example.com/disk.gz
>
> Because of the nature of the gzip format which is not blocked based
> and thus not seekable, this filter caches the whole uncompressed file
> in a hidden temporary file. This is required in order to implement
> .get_size. See this link for a more detailed explanation:
> https://stackoverflow.com/a/9213826
>
> This commit deprecates nbdkit-gzip-plugin and suggests removal in
> nbdkit 1.26.
> ---
Nice - this one seems like a fairly straight-forward conversion.
> +++ b/filters/tar/nbdkit-tar-filter.pod
> @@ -42,11 +42,13 @@ server use:
> nbdkit -r curl https://example.com/file.tar \
> --filter=tar tar-entry=disk.img
>
> -=head2 Open an xz-compressed tar file (read-only)
> +=head2 Open an gzip-compressed tar file (read-only)
Should this heading s/gzip-// ?
>
> This filter cannot handle compressed tar files itself, but you can
> -combine it with L<nbdkit-xz-filter(1)>:
> +combine it with L<nbdkit-gzip-filter(1)> or L<nbdkit-xz-filter(1)>:
>
> + nbdkit file filename.tar.gz \
> + --filter=tar tar-entry=disk.img --filter=gzip
> nbdkit file filename.tar.xz \
> --filter=tar tar-entry=disk.img --filter=xz
>
> +++ b/tests/Makefile.am
> @@ -557,23 +557,6 @@ EXTRA_DIST += test-floppy.sh
> TESTS += test-full.sh
> EXTRA_DIST += test-full.sh
>
> -# gzip plugin test.
> -if HAVE_MKE2FS_WITH_D
> -if HAVE_ZLIB
> -LIBGUESTFS_TESTS += test-gzip
> -check_DATA += disk.gz
> -CLEANFILES += disk.gz
Is it worth keeping this around until we actually retire the plugin?
> +++ b/filters/gzip/gzip.c
> @@ -0,0 +1,347 @@
> +/* The first thread to call gzip_prepare uncompresses the whole plugin. */
> +static int
> +do_uncompress (struct nbdkit_next_ops *next_ops, void *nxdata)
> +{
> + int64_t compressed_size;
> + z_stream strm;
> + int zerr;
> + const char *tmpdir;
> + size_t len;
> + char *template;
> + CLEANUP_FREE char *in_block = NULL, *out_block = NULL;
> +
> + /* This was the same buffer size as used in the old plugin. As far
> + * as I know it was chosen at random.
> + */
> + const size_t block_size = 128 * 1024;
> +
> + assert (size == -1);
> +
> + /* Get the size of the underlying plugin. */
> + compressed_size = next_ops->get_size (nxdata);
If we wanted, we could save this...
> +/* Similar to above, whatever the plugin says, extents are not
> + * supported.
> + */
> +static int
> +gzip_can_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
> + void *handle)
Well, since we stored it as a temp file, we could actually report
extents off of lseek(SEEK_HOLE). But that can be added separately, if
we want it (it goes back to your notion of a library for file-based
accessor functions).
> +{
> + return 0;
> +}
> +
> +/* We are already operating as a cache regardless of the plugin's
> + * underlying .can_cache, but it's easiest to just rely on nbdkit's
> + * behavior of calling .pread for caching.
> + */
> +static int
> +gzip_can_cache (struct nbdkit_next_ops *next_ops, void *nxdata,
> + void *handle)
> +{
> + return NBDKIT_CACHE_EMULATE;
> +}
> +
> +/* Get the file size. */
> +static int64_t
> +gzip_get_size (struct nbdkit_next_ops *next_ops, void *nxdata,
> + void *handle)
> +{
> + /* This must be true because gzip_prepare must have been called. */
> + assert (size >= 0);
> +
> + /* We must call underlying get_size even though we don't use the
> + * result, because it caches the plugin size in server/backend.c.
> + */
> + if (next_ops->get_size (nxdata) == -1)
> + return -1;
...and verify here that the underlying plugin isn't changing size. But
probably not necessary.
With the tar filter, you HAD to call next_ops->get_size() per
connection, because you called next_ops->pread() per connection. But
with this filter, you don't actually have to call next_ops->get_size(),
because your only use of next_ops->pread() is during the initial
decompression, and the rest of this filter reads from local cache rather
than deferring to the plugin.
Overall looks good.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
More information about the Libguestfs
mailing list