[Linux-cluster] How does caching work in GFS1?

Peter Schobel pschobel at 1iopen.net
Wed Aug 11 19:27:45 UTC 2010


Hi Jeff,

I chose the 2G directory for testing since it is the directory with
the largest number of files in our directory tree.

[testuser@buildmgmt-000 testdir]$ i=0;for FILE in `find ./ -type f`; do ((i+=1)); done; echo $i
64423
[testuser@buildmgmt-000 testdir]$ cd ../main0/
[testuser@buildmgmt-000 main0]$ i=0;for FILE in `find ./ -type f`; do ((i+=1)); done; echo $i
164812
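
(For what it's worth, an equivalent count -- assuming no whitespace in the
file names -- is simply:

  find ./ -type f | wc -l

which counts the same files without expanding the whole list on the shell
command line.)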

It does seem that the time is a linear function of the number of files:
testdir holds about 39% as many files as main0 (64,423 vs. 164,812) and
its first du run takes about 38% as long (2m10s vs. 5m42s). However, the
caching speedup is not linear, since the percentage improvement on
subsequent runs is much higher on the smaller directory.

Increasing demote_secs did not seem to have an appreciable effect.
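
For reference, I've been adjusting it with gfs_tool settune; the mount
point and value here are just placeholders, not our real settings:

  # gfs_tool settune /mnt/gfs demote_secs 600

As far as I know these tunables are not persistent, so they have to be
re-applied after every remount.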

The du command is a simplification of the use case. Our developers run
scripts which generate tags in source code directories, which requires
stat'ing the files. They also use integrated development environments
which perform autocompletion of filenames etc., so when editing a file
they literally have to go have a coffee and come back 5 minutes later
after their entire environment unfreezes. We've had similar performance
problems in the past with other recursive commands such as rm -r. Some of
this has been resolved by piping file lists through xargs, but while it's
possible for us to modify our internal scripts, we can't modify
third-party software.
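
As an example of the xargs workaround (the path here is made up), instead
of a plain rm -r over a build tree we do something along the lines of:

  find /path/to/buildtree -type f -print0 | xargs -0 rm -f
  find /path/to/buildtree -depth -type d -print0 | xargs -0 rmdir

which removes files in large batches rather than recursing entry by entry,
but that only helps in the scripts we control.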

What I am looking for is a cache speedup on the large directory that is
proportional to the speedup on the smaller directory, and I believe that
would likely resolve our issues. I'm not sure why we're not seeing the
same speedup and can only surmise that there is a limit on the amount of
lock information that can be cached. I thought that increasing
reclaim_limit might help, but so far I can't see any appreciable effect.
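
In case it helps to see exactly what I've been doing, this is roughly how
I've been inspecting and changing those tunables (the mount point and the
values are illustrative, not a recommendation):

  # gfs_tool gettune /mnt/gfs | egrep 'demote_secs|glock_purge|reclaim_limit'
  # gfs_tool settune /mnt/gfs glock_purge 50
  # gfs_tool settune /mnt/gfs reclaim_limit 10000

My understanding is that glock_purge is a percentage of unused glocks to
trim, so 50 here means purge about half of them.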

Thanks,

Peter
~

> -----Original Message-----
> From: linux-cluster-bounces redhat com
[mailto:linux-cluster-bounces redhat com]
> On Behalf Of Peter Schobel
> Sent: Wednesday, August 11, 2010 2:04 PM
> To: linux clustering
> Subject: [Linux-cluster] How does caching work in GFS1?
>
> I am having an issue with a GFS1 cluster in some use cases. Mainly,
> running du on a directory takes an unusually long time. I have the
> filesystem mounted with noatime and nodiratime, and statfs_fast is
> turned on. Running du on a 2G directory takes about 2 minutes, and each
> subsequent run takes about the same amount of time.

A stat() call over GFS is slow, period.  How many files are in the 2GB
directory?  I would expect the time to be a linear function of the
number of files, not the file sizes.

The problem with du isn't that it's reading the directory (which is
quite fast) but that it needs to stat() each file and directory it finds
in order to compute a total size.
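
If you want to see this for yourself, a syscall summary from strace makes
it fairly obvious (the directory name is just an example):

  strace -c du -s testdir > /dev/null

The lstat() (or fstatat()) count in the summary should be roughly the
number of files and directories in the tree, and on GFS each of those
calls can mean acquiring a glock.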

We have seen similar performance with a GFS filesystem over which we
regularly rsync entire directory trees.

> I have been trying to tweak tunables such as glock_purge and
> reclaim_limit but to no avail.

All I found that helped me was increasing demote_secs.  I believe that
causes glocks to be held for a longer period of time, so the initial
directory traversal is still slow, but subsequent traversals are fast.

If however you are running "du" on multiple cluster nodes at the same
time, I don't think it'll help at all.

> If I could get the same
> speedup on the 30G directory as I'm getting on the 2G directory I
> would be very happy and so would the users on the cluster.

Out of sheer curiosity, do your users literally need to run "du" commands
routinely, or is that just a simplification of the actual use case?

Depending on what your application does, there may be strategies in
software that would optimize your performance on GFS.

-Jeff

On Wed, Aug 11, 2010 at 11:03 AM, Peter Schobel <pschobel at 1iopen.net> wrote:
> Hi,
>
> I am having an issue with a GFS1 cluster in some use cases. Mainly,
> running du on a directory takes an unusually long time. I have the
> filesystem mounted with noatime and nodiratime, and statfs_fast is
> turned on. Running du on a 2G directory takes about 2 minutes, and each
> subsequent run takes about the same amount of time. Following a tip
> that I got, I switched the kernel I/O scheduler to noop (echo noop >
> /sys/block/sdc/queue/scheduler) and after I did so, I discovered that
> the initial run of du took the same amount of time but subsequent runs
> were very fast, presumably due to some glock caching benefit (see
> results below).
>
> [testuser@buildmgmt-000 testdir]$ for ((i=0;i<=3;i++)); do time du >/dev/null; done
>
> real    2m10.133s
> user    0m0.193s
> sys     0m14.579s
>
> real    0m1.948s
> user    0m0.043s
> sys     0m1.048s
>
> real    0m0.277s
> user    0m0.034s
> sys     0m0.240s
>
> real    0m0.274s
> user    0m0.033s
> sys     0m0.239s
>
> This looked very promising but then I discovered that the same speedup
> benefit was not realized when traversing our full directory tree.
> Following are the results for a 30G directory tree on the same
> filesystem.
>
> [testuser@buildmgmt-000 main0]$ for ((i=0;i<=3;i++)); do time du >/dev/null; done
>
> real    5m41.908s
> user    0m0.596s
> sys     0m36.141s
>
> real    3m45.757s
> user    0m0.574s
> sys     0m43.868s
>
> real    3m17.756s
> user    0m0.484s
> sys     0m44.666s
>
> real    3m15.267s
> user    0m0.535s
> sys     0m45.981s
>
> I have been trying to tweak tunables such as glock_purge and
> reclaim_limit but to no avail. I assume that I am running up against
> some kind of cache size limit but I'm not sure how to circumvent it.
> There are no other cluster nodes accessing the same test data so there
> should not be any lock contention issues. If I could get the same
> speedup on the 30G directory as I'm getting on the 2G directory I
> would be very happy and so would the users on the cluster. Any help
> would be appreciated.
>
> Regards,
>
> --
> Peter Schobel
> ~
>



-- 
Peter Schobel
~



