[libvirt] [PATCH] Use posix_fallocate() to allocate disk space

Daniel P. Berrange berrange at redhat.com
Tue Feb 24 11:58:31 UTC 2009


On Tue, Feb 24, 2009 at 05:09:31PM +0530, Amit Shah wrote:
> Hi,
> 
> This is an untested patch to make disk allocations faster and
> non-fragmented. I'm using posix_fallocate() now but relying on glibc
> really calling fallocate() if it exists for the file system to be the
> fastest.
> 
> - This fails build because libutil needs to be added as a dependency?
> 
> ../src/.libs/libvirt_driver_storage.a(storage_backend_fs.o): In function
> `virStorageBackendFileSystemVolCreate':
> /home/amit/src/libvirt/src/storage_backend_fs.c:1023: undefined
> reference to `safezero'

You'd need to add 'safezero' to src/libvirt_private.syms to allow it
to be linked to by the storage driver.

> - What's vol->capacity? Why is ftruncate() needed after the call to
>   (current) safewrite()? My assumption is that the user can specify some
>   max. capacity and wish to allocate only a chunk off it at create-time.
>   Is that correct?

"allocation" refers to the current physical usage of the volume

"capacity" refers to the logical size of the volume

So, you can have a raw sparse file of size 4 GB, but not allocate any disk
upfront - just allocated on demand when guest writes to it. Or you can 
allocate 1 GB upfront, and leave the rest unallocated. So this code is
first filling out the upfront allocation the user requested, and then using
ftruncate() to extend to a (possibly larger) logical size.

Similarly for qcow files, capacity refers to the logical disk size
but qcow is grow on demand, so allocation will be much lower.

Usually allocation <= capacity, but if the volume format has metadata
overhead, you can get to a place where allocation > capacity if the
entire volume has been written to.

> The best case to get a non-fragmented VM image is to have it allocated
> completely at create-time with fallocate().

The main problem with this change is that it'll make it harder for
us to provide incremental feedback. As per the comment in the code, 
it is our intention to make the volume creation API run as a background
job which provides feedback on progress of allocation, and the ability
to cancel the job. Since posix_fallocate() is an all-or-nothing kind of
API it wouldn't be very helpful. 

What sort of performance boost does this give you ?  Would we perhaps
be able to get close to it by writing in bigger chunks than 4k, or 
mmap'ing the file and then doing a memset across it ?

> >From dfe4780f5990571f026e02e6187cb64505c982c1 Mon Sep 17 00:00:00 2001
> From: Amit Shah <amit.shah at redhat.com>
> Date: Tue, 24 Feb 2009 16:55:58 +0530
> Subject: [PATCH] Use posix_fallocate() to allocate disk space
> 
> Using posix_fallocate() to allocate disk space and fill it with zeros is faster
> than writing the zeros block-by-block.
> 
> Also, for backing file systems that support the fallocate() syscall, this
> operation will give us a big speed boost.
> 
> The biggest advantage of using this is the file will not be fragmented for the
> allocated chunks.
> 
> Signed-off-by: Amit Shah <amit.shah at redhat.com>
> ---
>  src/storage_backend_fs.c |   23 ++++++++---------------
>  src/util.c               |    5 +++++
>  src/util.h               |    1 +
>  3 files changed, 14 insertions(+), 15 deletions(-)
> 
> diff --git a/src/storage_backend_fs.c b/src/storage_backend_fs.c
> index 240de96..74b0fda 100644
> --- a/src/storage_backend_fs.c
> +++ b/src/storage_backend_fs.c
> @@ -1019,21 +1019,14 @@ virStorageBackendFileSystemVolCreate(virConnectPtr conn,
>          /* XXX slooooooooooooooooow.
>           * Need to add in progress bars & bg thread somehow */
>          if (vol->allocation) {
> -            unsigned long long remain = vol->allocation;
> -            static char const zeros[4096];
> -            while (remain) {
> -                int bytes = sizeof(zeros);
> -                if (bytes > remain)
> -                    bytes = remain;
> -                if ((bytes = safewrite(fd, zeros, bytes)) < 0) {
> -                    virReportSystemError(conn, errno,
> -                                         _("cannot fill file '%s'"),
> -                                         vol->target.path);
> -                    unlink(vol->target.path);
> -                    close(fd);
> -                    return -1;
> -                }
> -                remain -= bytes;
> +            int r;
> +            if ((r = safezero(fd, 0, 0, vol->allocation)) < 0) {
> +                virReportSystemError(conn, r,
> +                                     _("cannot fill file '%s'"),
> +                                     vol->target.path);
> +                unlink(vol->target.path);
> +                close(fd);
> +                return -1;
>              }
>          }
>  
> diff --git a/src/util.c b/src/util.c
> index 990433a..1bee7f0 100644
> --- a/src/util.c
> +++ b/src/util.c
> @@ -117,6 +117,11 @@ ssize_t safewrite(int fd, const void *buf, size_t count)
>          return nwritten;
>  }
>  
> +int safezero(int fd, int flags, off_t offset, off_t len)
> +{
> +        return posix_fallocate(fd, offset, len);
> +}
> +
>  #ifndef PROXY
>  
>  int virFileStripSuffix(char *str,
> diff --git a/src/util.h b/src/util.h
> index a79cfa7..acaabb1 100644
> --- a/src/util.h
> +++ b/src/util.h
> @@ -31,6 +31,7 @@
>  
>  int saferead(int fd, void *buf, size_t count);
>  ssize_t safewrite(int fd, const void *buf, size_t count);
> +int safezero(int fd, int flags, off_t offset, off_t len);
>  
>  enum {
>      VIR_EXEC_NONE   = 0,
> -- 


Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|




More information about the libvir-list mailing list