[Virtio-fs] [virtiofsd PATCH v6 5/6] virtiofsd: implement file size based dax policy

Vivek Goyal vgoyal at redhat.com
Wed Oct 20 17:54:25 UTC 2021


On Mon, Oct 11, 2021 at 11:09:37AM +0800, Jeffle Xu wrote:
> When DAX window is fully utilized and needs to be expanded to avoid the
> performance fluctuation triggered by DAX window recaliming, it may not
> be wise to allocate DAX window for files with size smaller than some
> specific point, considering from the perspective of reducing memory
> overhead.
> 
> To maintain one DAX window chunk (e.g., 2MB in size), 32KB
> (512 * 64 bytes) memory footprint will be consumed for page descriptors
> inside guest. Thus it'd better disable DAX for those files smaller than
> 32KB, to reduce the demand for DAX window and thus avoid the unworthy
> memory overhead.
> 
> Thus only flag the file with FUSE_ATTR_DAX when the file size is greater
> than 32 KB. The guest will enable DAX only for those files flagged with
> FUSE_ATTR_DAX, when virtiofs is mounted with '-o dax=inode'.
> 
> To be noted that both FUSE_LOOKUP and FUSE_READDIRPLUS are affected, and
> will convey FUSE_ATTR_DAX flag to the guest.
> 
> This policy will be used when '-o dax=server' option is specified for
> virtiofsd.

So enabling DAX based on size is just one type of policy. Tomorrow one
could think of enabling DAX on some other attribute. So only enable it
for non-executable files etc. 

We could be more specific and define a policy say "-o dax=filesize". This
will allow defining other policies easily later. 

But problem with this option is that defining more complex policies
will be hard. Say I want to enable DAX on non-executable files which
have size greater than 32K. Being able to combine multiple policies
will be easy if we implement separate options/knobs to control them
and that will allow turning on multiple policies together.

Say, -o dax_size_threshold=1MB -o dax_no_exec_file

So I guess a generic "-o dax=server" is not a bad idea. As of now this
will implement a default size based policy of 32K size. But one can add
options later to tweak this.

Vivek

> 
> Signed-off-by: Jeffle Xu <jefflexu at linux.alibaba.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index b4de86e317..a5a5061906 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -54,6 +54,7 @@
>  #include <sys/syscall.h>
>  #include <sys/wait.h>
>  #include <sys/xattr.h>
> +#include <sys/user.h>
>  #include <syslog.h>
>  #include <linux/fs.h>
>  
> @@ -61,6 +62,19 @@
>  #include "passthrough_helpers.h"
>  #include "passthrough_seccomp.h"
>  
> +/*
> + * One page descriptor (64 bytes in size) needs to be maintained for every page
> + * in the DAX window chunk, i.e., there is certain guest memory overhead when
> + * DAX is enabled. Thus disable DAX for those files smaller than this certain
> + * memory overhead if virtiofs is mounted in per-file DAX mode, in which case
> + * the guest page cache will consume less memory when DAX is disabled.
> + */
> +#define FUSE_DAX_SHIFT	21
> +#define PAGE_DESC_SHIFT	6
> +#define FUSE_PERFILE_DAX_SHIFT \
> +    (FUSE_DAX_SHIFT - PAGE_SHIFT + PAGE_DESC_SHIFT)
> +#define FUSE_PERFILE_DAX_POINT	(1 << FUSE_PERFILE_DAX_SHIFT)
> +
>  /* Keep track of inode posix locks for each owner. */
>  struct lo_inode_plock {
>      uint64_t lock_owner;
> @@ -986,6 +1000,20 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
>      return 0;
>  }
>  
> +static bool lo_should_enable_dax(struct lo_data *lo,
> +                                 struct fuse_entry_param *e)
> +{
> +    if (lo->dax == DAX_NONE || !S_ISREG(e->attr.st_mode)) {
> +        return false;
> +    }
> +
> +    if (lo->dax & DAX_SERVER) {
> +        return e->attr.st_size > FUSE_PERFILE_DAX_POINT;
> +    }
> +
> +    return false;
> +}
> +
>  /*
>   * Increments nlookup on the inode on success. unref_inode_lolocked() must be
>   * called eventually to decrement nlookup again. If inodep is non-NULL, the
> @@ -1076,6 +1104,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      }
>      e->ino = inode->fuse_ino;
>  
> +    if (lo_should_enable_dax(lo, e)) {
> +        e->attr_flags |= FUSE_ATTR_DAX;
> +    }
> +
>      /* Transfer ownership of inode pointer to caller or drop it */
>      if (inodep) {
>          *inodep = inode;
> -- 
> 2.27.0
> 




More information about the Virtio-fs mailing list