[Libguestfs] [PATCH nbdkit 8/8] file: Implement extents.

Eric Blake eblake at redhat.com
Sat Mar 23 19:45:44 UTC 2019


On 3/20/19 5:11 PM, Richard W.M. Jones wrote:
> This uses lseek SEEK_DATA/SEEK_HOLE to search for allocated data and
> holes in the underlying file.
> ---
>  plugins/file/file.c | 139 ++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 129 insertions(+), 10 deletions(-)
> 

> -int file_debug_zero;            /* to enable: -D file.zero=1 */
> +/* Any callbacks using lseek must be protected by this lock. */
> +static pthread_mutex_t lseek_lock = PTHREAD_MUTEX_INITIALIZER;
> +
> +/* to enable: -D file.zero=1 */
> +int file_debug_zero;
>  
>  static void
>  file_unload (void)
> @@ -220,6 +227,21 @@ file_close (void *handle)
>  
>  #define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL
>  
> +/* For block devices, stat->st_size is not the true size. */
> +static int64_t
> +block_device_size (int fd)
> +{
> +  off_t size;
> +
> +  size = lseek (fd, 0, SEEK_END);

Calling lseek without the lock? I'm not sure if you can guarantee that
.size won't be called in parallel with some other .extents.
/me reads ahead
Oh, the caller has the lock. I might add a comment to this function that
it expects the caller to grab the lock.

>  
> +#ifdef SEEK_HOLE
> +/* Extents. */
> +
> +static int
> +file_can_extents (void *handle)
> +{
> +  struct handle *h = handle;
> +  off_t r;
> +
> +  /* A simple test to see whether SEEK_HOLE etc is likely to work on
> +   * the current filesystem.
> +   */
> +  pthread_mutex_lock (&lseek_lock);
> +  r = lseek (h->fd, 0, SEEK_HOLE);
> +  pthread_mutex_unlock (&lseek_lock);
> +  if (r == -1) {
> +    nbdkit_debug ("extents disabled: lseek: SEEK_HOLE: %m");
> +    return 0;

In fact, if lseek() succeeds but returns the current file size, you know
there are no holes in the file at all, in which case, it might be more
efficient to store a variable stating that .extents should avoid lseek()
altogether and just report the entire file as data.

Hmm - if we just make .extents return false in this case, then nbdkit
won't advertise block status to the guest even though we have valid
status available (knowing there are no holes is nicer than merely
assuming there are no holes because block status was unavailable). But
if we return true, then WE have to do that optimization ourselves. Maybe
.can_extent should be a tri-state return, just like .can_fua, where a
plugin can choose NBDKIT_EXTENT_NONE to force no block status
advertisement, NBDKIT_EXTENT_EMULATE (default if .extents is missing) to
let nbdkit do the work of reporting the entire disk as data without
calling .extents, and NBDKIT_EXTENT_NATIVE (requires .extents, and lets
the plugin do all the work). Then, when lseek(SEEK_HOLE) returns EOF, we
can return NBDKIT_EXTENT_EMULATE instead of having to track our
optimization of skipping lseek in .extents ourselves.


> +static int
> +file_extents (void *handle, uint32_t count, uint64_t offset,
> +              uint32_t flags, struct nbdkit_extents *extents)
> +{
> +  int r;
> +
> +  pthread_mutex_lock (&lseek_lock);
> +  r = do_extents (handle, count, offset, flags, extents);
> +  pthread_mutex_unlock (&lseek_lock);
> +

But this would be the spot where we could optimize by returning all data
when our initial lseek() probe proved the file is not sparse.

Otherwise, looks good.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libguestfs/attachments/20190323/3bc6f6da/attachment.sig>


More information about the Libguestfs mailing list