[Linux-cluster] slow GFS2 stat() performance

Sven Karlsson karlesven at gmail.com
Wed Mar 17 09:16:02 UTC 2010


On Mon, Mar 15, 2010 at 10:50 AM, Steven Whitehouse <swhiteho at redhat.com> wrote:
> On Mon, 2010-03-15 at 00:54 +0100, Sven Karlsson wrote:
>> Hello,

> This is probably down to the access pattern. Assuming that your find
> test is the only thing running on the cluster, do you find that it goes
> a lot faster the second time it is run from the same node?

Yes, then it's down to millisecond scale. I assumed the Linux
directory entry (dentry) cache was responsible for this.

> I'd be surprised if running find from GFS/GFS2 on a single node,
> lock_nolock basis would be much slower than any other fs, particularly
> once the required disk blocks have been cached after the initial run.

Even with a lock manager performance is great after caching.

The problem is that there are about a million files in total (and the
number is expected to grow), so if 4000 files take about 17 seconds, a
full traversal takes about 70 minutes.
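For reference, the extrapolation works out roughly like this (a quick back-of-the-envelope sketch; the 4000-file and 17-second figures are the measured ones above):

```shell
# Scale the measured rate (4000 files in ~17 s) up to ~1,000,000 files.
awk 'BEGIN { files = 1000000; secs_per_file = 17 / 4000;
             printf "%.0f minutes\n", files * secs_per_file / 60 }'
```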

I have not tried two consecutive runs, but my assumption is that the
cache would be invalidated eventually, either by some time limit or by
memory pressure. Otherwise the cache could presumably be kept hot by
refreshing it continuously. I've started a test to see what happens.
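The cold-versus-warm comparison can be sketched like this (the mount point is a placeholder for the real GFS2 mount; the mktemp fallback just keeps the commands runnable anywhere):

```shell
# MNT is a placeholder; point GFS2_MNT at the real GFS2 mount.
MNT=${GFS2_MNT:-$(mktemp -d)}

time find "$MNT" -type f > /dev/null   # cold run: each inode needs a lock grant plus a disk read
time find "$MNT" -type f > /dev/null   # warm run: served from the dentry/inode caches
```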

Either way, 70 minutes just for traversing the filesystem to figure
out which files to back up, and then perhaps another 70 minutes of
stat() calls during the actual backup, seems a bit much to me.

Performance for normal usage is fine: direct file access and bulk
reads/writes perform as expected (20-30 MB/s for writes), but we'd
really like to reduce the filesystem traversal time.

> Which version of GFS2 are you using?

# rpm -qa "*gfs*"
kmod-gfs-0.1.34-2.el5
gfs-utils-0.1.20-1.el5
gfs2-utils-0.1.62-1.el5


Regards
 SK



