[Linux-cluster] slow GFS2 stat() performance

Steven Whitehouse swhiteho at redhat.com
Wed Mar 17 16:27:04 UTC 2010


Hi,

On Wed, 2010-03-17 at 10:16 +0100, Sven Karlsson wrote:
> On Mon, Mar 15, 2010 at 10:50 AM, Steven Whitehouse <swhiteho at redhat.com> wrote:
> > On Mon, 2010-03-15 at 00:54 +0100, Sven Karlsson wrote:
> >> Hello,
> 
> > This is probably down to the access pattern. Assuming that your find
> > test is the only thing running on the cluster, do you find that it goes
> > a lot faster the second time it is run from the same node?
> 
> Yes, then it's down to millisecond scale. I assumed that the Linux
> directory cache mechanism was responsible for this.
> 
Yes, in combination with the other caches.

> > I'd be surprised if running find from GFS/GFS2 on a single node,
> > lock_nolock basis would be much slower than any other fs, particularly
> > once the required disk blocks have been cached after the initial run.
> 
> Even with a lock manager performance is great after caching.
> 
> The problem is that there is a total of about a million files
> (expected to grow), so if 4,000 files take about 17 seconds, the
> full set takes about 70 minutes.
> 
> I have not tried two runs of these, but my assumption is that the
> cache would be invalidated due to either some cache time constraint
> or some cache space constraint. Otherwise the cache could be kept
> hot, I guess, by refreshing continuously. I've started a test to
> see what happens.
> 
The dcache, inode cache and page cache depend upon the VM to ask
them to drop part of their content under memory pressure. So the
constraint is basically an LRU one, but with some tweaks.

> Either way, 70 minutes just for traversing the filesystem to figure
> out which files to backup, and then maybe another 70 minutes when
> doing the actual backup, just for stat() calls, seems a bit much to
> me.
> 
Yes, and one solution to that is to distribute the backup process so
that each node backs up the files which are that node's normal
"working set". That way you can back up in parallel and make the
maximum use of the cache. I know that backup programs are often not
designed that way, but it does make things much faster with
filesystems of this type.
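
To illustrate, a rough sketch only (not from the thread): each node
walks just its own subtree, which stands in here for that node's
working set, and the selection criterion is simply mtime. The program
name and arguments are made up for illustration.

/* newer.c: walk one subtree and print files modified since a given
 * epoch timestamp - the stat() traversal a backup selection does.
 * Usage (illustrative): ./newer 1268000000 /gfs2/nodeA-data
 */
#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>

static time_t since;

static int visit(const char *path, const struct stat *st,
                 int type, struct FTW *ftw)
{
	if (type == FTW_F && st->st_mtime >= since)
		printf("%s\n", path);
	return 0;	/* keep walking */
}

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <epoch> <subtree>\n", argv[0]);
		return 1;
	}
	since = (time_t)atol(argv[1]);
	/* 32 = max fds held open during the walk; FTW_PHYS means
	 * don't follow symlinks */
	return nftw(argv[2], visit, 32, FTW_PHYS) ? 1 : 0;
}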

There are other possible solutions. Some people have used the VFS/VM
drop_caches interface on the nodes before the backup, and on the
backup node after the backup, in order to flush the caches. It's a
bit of a hack and it doesn't work well if the workload is running at
the same time as the backup.
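
For reference, a minimal sketch of that hack (the usual form is
simply running sync and then writing "3" to /proc/sys/vm/drop_caches
as root; this does the same thing programmatically):

/* Flush dirty data, then ask the kernel to drop clean pagecache,
 * dentries and inodes. Must run as root. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd;

	sync();	/* drop_caches only discards clean pages, so flush first */

	fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
	if (fd < 0) {
		perror("open /proc/sys/vm/drop_caches");
		return 1;
	}
	/* "1" = pagecache, "2" = dentries and inodes, "3" = both */
	if (write(fd, "3", 1) != 1) {
		perror("write");
		close(fd);
		return 1;
	}
	close(fd);
	return 0;
}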

Also, it's possible to use fsync() and madvise() to give hints about
which things should be dropped from the cache, and which will be
required in the future. These have the advantage of being generic and
working with any filesystem.
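
A minimal sketch of that idea, assuming a backup that reads files
via open()/read(). Note one substitution: madvise() is the hint call
for mmap()ed regions, so for unmapped files I use the analogous
posix_fadvise(POSIX_FADV_DONTNEED) instead; the helper name is made
up for illustration.

#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Call after the backup has finished reading one file, so its
 * cached pages can be evicted rather than pushing more useful
 * data out of the cache. */
static int backup_done_with_file(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	fsync(fd);	/* flush any dirty data for this file first */

	/* offset 0, len 0 means "to end of file": hint that none of
	 * this file's cached pages will be needed again */
	posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

	close(fd);
	return 0;
}

int main(int argc, char **argv)
{
	int i;

	for (i = 1; i < argc; i++)
		if (backup_done_with_file(argv[i]) < 0)
			perror(argv[i]);
	return 0;
}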

> Performance for normal usage is fine, i.e. accessing files directly
> and bulk reads/writes perform as expected (20-30 MB/s for writing),
> but we'd really like to reduce the fs traversal time.
> 
> > Which version of GFS2 are you using?
> 
> # rpm -qa "*gfs*"
> kmod-gfs-0.1.34-2.el5
> gfs-utils-0.1.20-1.el5
> gfs2-utils-0.1.62-1.el5
> 
Ok, just checking that this is up to date. Really I was after the
kernel version, but it looks like RHEL5.? anyway.

Steve.




