[Linux-cluster] How does caching work in GFS1?

Thu Aug 12 16:25:49 UTC 2010

In this cluster the nodes only have 4G of RAM and 4G of swap. Running
top indicates that there are about 3G used and 1G free and nothing is
swapping.

So running gfs_tool counters shows me that there are around 200000
locks and around 150000-160000 locks held.

Glocks reclaimed is 22764386 and the rate is around 0-1000/s.

When running the test on the large directory, I see that the number of
locks and locks held stays pretty much the same (below 200000). The
number of glocks reclaimed fluctuates between 0-15000/s. The number of
glock nq calls and glock dq calls is between 3000-6000/s.
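
For anyone who wants to reproduce these rates: a minimal sketch of how a
per-second figure can be derived from two samples of the cumulative
counters. It assumes the "gfs_tool counters" output is one "name value"
pair per line with the value as the last field; that layout is an
assumption here, as are the helper names parse_counters and
rate_per_second. The sample values are illustrative, apart from the
first glocks reclaimed figure, which is the one quoted above.

```python
def parse_counters(text):
    """Parse 'name value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        # The value is assumed to be the last whitespace-separated field.
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue
        name, value = parts[0].strip(), parts[1]
        if value.isdigit():  # skip non-integer fields such as percentages
            counters[name] = int(value)
    return counters


def rate_per_second(before, after, name, interval_secs):
    """Average per-second change of a cumulative counter between samples."""
    return (after[name] - before[name]) / interval_secs


# Two hypothetical samples taken 5 seconds apart (illustrative values).
SAMPLE_1 = """
                  locks 198000
             locks held 155000
         log space used 0.10%
       glocks reclaimed 22764386
         glock nq calls 1000000
"""
SAMPLE_2 = """
                  locks 198500
             locks held 155200
         log space used 0.10%
       glocks reclaimed 22769386
         glock nq calls 1022500
"""

if __name__ == "__main__":
    first = parse_counters(SAMPLE_1)
    second = parse_counters(SAMPLE_2)
    for counter in ("glocks reclaimed", "glock nq calls"):
        print(counter, rate_per_second(first, second, counter, 5))
```

Comparing the locks and locks held values across samples in the same way
shows whether they plateau while the reclaim counter keeps climbing.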

Currently I have glock_purge = 0 and reclaim_limit = 500000 so I don't
understand why there are any glocks being reclaimed at all. This is
what I have been struggling with since I don't feel as though the
tunable parameter changes are doing anything.

When running the test on the smaller directory, I see that the usage
patterns are pretty much the same initially, but after that the glocks
reclaimed rate drops to between 0-5000/s. On subsequent runs, the du
completes so quickly that I can't see any glocks being reclaimed at
all, which is the behavior I would expect to see.

It's a bit perplexing.

Peter

On Wed, Aug 11, 2010 at 2:35 PM, Jeff Sturm <jeff.sturm at eprize.com> wrote:
>> -----Original Message-----
>> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com]
>> On Behalf Of Peter Schobel
>> Sent: Wednesday, August 11, 2010 3:28 PM
>> To: linux clustering
>> Subject: Re: [Linux-cluster] How does caching work in GFS1?
>>
>> Increasing demote_secs did not seem to have an appreciable effect.
>
> We run some hosts with demote_secs=86400, for what it's worth.  They
> tend to go through a "cold start" each morning, but are responsive for
> the remainder of the day.
>
>> The du command is a simplification of the use case. Our developers run
>> scripts which make tags in source code directories which require
>> stat'ing the files.
>
> Gotcha.  I don't have many good suggestions for version control, but I
> can offer commiseration.  Some systems are worse than others.
> (Subversion for instance tends to create lots of little lock files, and
> performs very poorly on just about every filesystem we've tried.)
>
> How much RAM do you have?  All filesystems like plenty of cache.
>
> One thing you can do is run "gfs_tool counters <mount-point>" a few
> times during your 20GB test, that may give you some insight.  For
> example, does the number of locks increase steadily or does it plateau?
> Does it abruptly drop following the test?  Does the "glocks reclaimed"
> number accumulate rapidly?  When locks are held, stat() operations tend
> to be very fast.  When a lock has to be obtained, that's when they are
> slow.
>
> (Any cluster engineers out there, feel free to tell me if any of this is
> right or wrong--I've had to base my understanding of GFS on a lot of
> experimentation and empirical evidence, not on a deep understanding of
> the software.)
>
> -Jeff

-- 
Peter Schobel