[Libguestfs] [PATCH nbdkit 8/8] file: Implement extents.
Richard W.M. Jones
rjones at redhat.com
Mon Mar 25 17:45:16 UTC 2019
On Sat, Mar 23, 2019 at 02:45:44PM -0500, Eric Blake wrote:
> On 3/20/19 5:11 PM, Richard W.M. Jones wrote:
> > This uses lseek SEEK_DATA/SEEK_HOLE to search for allocated data and
> > holes in the underlying file.
> > ---
> > plugins/file/file.c | 139 ++++++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 129 insertions(+), 10 deletions(-)
> >
>
> > -int file_debug_zero; /* to enable: -D file.zero=1 */
> > +/* Any callbacks using lseek must be protected by this lock. */
> > +static pthread_mutex_t lseek_lock = PTHREAD_MUTEX_INITIALIZER;
> > +
> > +/* to enable: -D file.zero=1 */
> > +int file_debug_zero;
> >
> > static void
> > file_unload (void)
> > @@ -220,6 +227,21 @@ file_close (void *handle)
> >
> > #define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL
> >
> > +/* For block devices, stat->st_size is not the true size. */
> > +static int64_t
> > +block_device_size (int fd)
> > +{
> > + off_t size;
> > +
> > + size = lseek (fd, 0, SEEK_END);
>
> Calling lseek without the lock? I'm not sure if you can guarantee that
> .size won't be called in parallel with some other .extents.
> /me reads ahead
> Oh, the caller has the lock. I might add a comment to this function that
> it expects the caller to grab the lock.
Yes I'll add a comment as it wasn't even obvious to me when I looked
back at the code.
> > +#ifdef SEEK_HOLE
> > +/* Extents. */
> > +
> > +static int
> > +file_can_extents (void *handle)
> > +{
> > + struct handle *h = handle;
> > + off_t r;
> > +
> > + /* A simple test to see whether SEEK_HOLE etc is likely to work on
> > + * the current filesystem.
> > + */
> > + pthread_mutex_lock (&lseek_lock);
> > + r = lseek (h->fd, 0, SEEK_HOLE);
> > + pthread_mutex_unlock (&lseek_lock);
> > + if (r == -1) {
> > + nbdkit_debug ("extents disabled: lseek: SEEK_HOLE: %m");
> > + return 0;
>
> In fact, if lseek() succeeds but returns the current file size, you know
> there are no holes in the file at all, in which case, it might be more
> efficient to store a variable stating that .extents should avoid lseek()
> altogether and just report the entire file as data.
>
> Hmm - if we just make .extents return false in this case, then nbdkit
> won't advertise block status to the guest even though we have valid
> status available (knowing there are no holes is nicer than merely
> assuming there are no holes because block status was unavailable). But
> if we return true, then WE have to do that optimization ourselves. Maybe
> .can_extent should be a tri-state return, just like .can_fua, where a
> plugin can choose NBDKIT_EXTENT_NONE to force no block status
> advertisement, NBDKIT_EXTENT_EMULATE (default if .extents is missing) to
> let nbdkit do the work of reporting the entire disk as data without
> calling .extents, and NBDKIT_EXTENT_NATIVE (requires .extents, and lets
> the plugin do all the work). Then, when lseek(SEEK_HOLE) returns EOF, we
> can return NBDKIT_EXTENT_EMULATE instead of having to track our
> optimization of skipping lseek in .extents ourselves.
I'm not really sure I understand this. However in the latest
iteration of the series can_extents is disconnected from what we
advertise back to clients. It is solely used to decide if the server
calls .extents or bypasses it (returning fully allocated).
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
More information about the Libguestfs
mailing list