[Libguestfs] [PATCH nbdkit] New filter: gzip

Eric Blake eblake at redhat.com
Fri Jul 10 14:20:29 UTC 2020


On 7/10/20 6:50 AM, Richard W.M. Jones wrote:
> Turn the existing nbdkit-gzip-plugin into a filter so it can be
> applied on top of files or other sources:
> 
>    nbdkit file --filter=gzip file.gz
>    nbdkit curl --filter=gzip https://example.com/disk.gz
> 
> Because the gzip format is not block-based and thus not seekable,
> this filter caches the whole uncompressed file
> in a hidden temporary file.  This is required in order to implement
> .get_size.  See this link for a more detailed explanation:
> https://stackoverflow.com/a/9213826
> 
> This commit deprecates nbdkit-gzip-plugin and suggests removal in
> nbdkit 1.26.
> ---

Nice - this one seems like a fairly straightforward conversion.


> +++ b/filters/tar/nbdkit-tar-filter.pod
> @@ -42,11 +42,13 @@ server use:
>    nbdkit -r curl https://example.com/file.tar \
>           --filter=tar tar-entry=disk.img
>   
> -=head2 Open an xz-compressed tar file (read-only)
> +=head2 Open an gzip-compressed tar file (read-only)

Should this heading s/gzip-// ?

>   
>   This filter cannot handle compressed tar files itself, but you can
> -combine it with L<nbdkit-xz-filter(1)>:
> +combine it with L<nbdkit-gzip-filter(1)> or L<nbdkit-xz-filter(1)>:
>   
> + nbdkit file filename.tar.gz \
> +        --filter=tar tar-entry=disk.img --filter=gzip
>    nbdkit file filename.tar.xz \
>           --filter=tar tar-entry=disk.img --filter=xz
>   

> +++ b/tests/Makefile.am
> @@ -557,23 +557,6 @@ EXTRA_DIST += test-floppy.sh
>   TESTS += test-full.sh
>   EXTRA_DIST += test-full.sh
>   
> -# gzip plugin test.
> -if HAVE_MKE2FS_WITH_D
> -if HAVE_ZLIB
> -LIBGUESTFS_TESTS += test-gzip
> -check_DATA += disk.gz
> -CLEANFILES += disk.gz

Is it worth keeping this around until we actually retire the plugin?

> +++ b/filters/gzip/gzip.c
> @@ -0,0 +1,347 @@

> +/* The first thread to call gzip_prepare uncompresses the whole plugin. */
> +static int
> +do_uncompress (struct nbdkit_next_ops *next_ops, void *nxdata)
> +{
> +  int64_t compressed_size;
> +  z_stream strm;
> +  int zerr;
> +  const char *tmpdir;
> +  size_t len;
> +  char *template;
> +  CLEANUP_FREE char *in_block = NULL, *out_block = NULL;
> +
> +  /* This was the same buffer size as used in the old plugin.  As far
> +   * as I know it was chosen at random.
> +   */
> +  const size_t block_size = 128 * 1024;
> +
> +  assert (size == -1);
> +
> +  /* Get the size of the underlying plugin. */
> +  compressed_size = next_ops->get_size (nxdata);

If we wanted, we could save this...


> +/* Similar to above, whatever the plugin says, extents are not
> + * supported.
> + */
> +static int
> +gzip_can_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
> +                  void *handle)

Well, since we stored it as a temp file, we could actually report 
extents off of lseek(SEEK_HOLE).  But that can be added separately, if 
we want it (it goes back to your notion of a library for file-based 
accessor functions).

> +{
> +  return 0;
> +}
> +
> +/* We are already operating as a cache regardless of the plugin's
> + * underlying .can_cache, but it's easiest to just rely on nbdkit's
> + * behavior of calling .pread for caching.
> + */
> +static int
> +gzip_can_cache (struct nbdkit_next_ops *next_ops, void *nxdata,
> +                void *handle)
> +{
> +  return NBDKIT_CACHE_EMULATE;
> +}
> +
> +/* Get the file size. */
> +static int64_t
> +gzip_get_size (struct nbdkit_next_ops *next_ops, void *nxdata,
> +               void *handle)
> +{
> +  /* This must be true because gzip_prepare must have been called. */
> +  assert (size >= 0);
> +
> +  /* We must call underlying get_size even though we don't use the
> +   * result, because it caches the plugin size in server/backend.c.
> +   */
> +  if (next_ops->get_size (nxdata) == -1)
> +    return -1;

...and verify here that the underlying plugin isn't changing size.  But 
probably not necessary.

With the tar filter, you HAD to call next_ops->get_size() per 
connection, because you called next_ops->pread() per connection.  But 
with this filter, you don't actually have to call next_ops->get_size(), 
because your only use of next_ops->pread() is during the initial 
decompression, and the rest of this filter reads from local cache rather 
than deferring to the plugin.

Overall looks good.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



