[Linux-cluster] gfs2_tool settune demote_secs

Steven Whitehouse swhiteho at redhat.com
Mon Oct 12 10:07:44 UTC 2009


Hi,

On Fri, 2009-10-09 at 10:57 -0700, Scooter Morris wrote:
> Steve,
>     Thanks for the prompt reply.  Like Kaerka, I'm running on
> large-memory servers and decreasing demote_secs from 300 to 20
> resulted in significant performance improvements because locks get
> freed much more quickly (I assume), resulting in much better response.
> It could certainly be that changing demote_secs was a workaround for a
> different bug that has now been fixed, which would be great.  I'll try
> some tests today and see how "rm -rf" on a large directory behaves.
> 
> -- scooter
> 
The question, though, is why that should result in a better response. It
doesn't really make sense, since the caching of the "locks" (really
caching of data and metadata controlled by a lock) should improve
performance by allowing more time to write out the dirty data.

Doing an "rm -fr" is also a very different workload from reading all
the files in the filesystem once (for backup purposes, for example),
since the "rm -fr" requires writing to the fs and the backup process
doesn't do any writing.

How long it takes to remove a file also depends to a large extent on its
size.

In both cases, however, it would improve performance if you could
arrange to remove, or read, inodes in inode-number order. Both GFS and
GFS2 return inodes from getdents64 (readdir) in a pseudo-random order
based on the hash of the filename. You can gain a lot of performance if
these results are sorted before they are scanned.
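
For example, here is a rough sketch of that idea for the removal case
(the path is only an example, and this simple version doesn't cope with
newlines in filenames or remove the directories afterwards):

# List the files with their inode numbers, sort numerically by inode,
# then remove them in that order rather than in readdir order.
find /mnt/gfs2/bigdir -type f -printf '%i %p\n' | sort -n |
while read -r ino path; do
    rm -f "$path"
done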

Ideally we'd return them from the fs in sorted order. Unfortunately, a
design decision made a long time ago, in combination with the design of
the Linux VFS, prevents us from doing that.

If there is a problem with a node caching the whole filesystem after it
has been scanned, then it is still possible to solve this issue:

echo 3 > /proc/sys/vm/drop_caches
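
(Writing 3 drops both the page cache and the dentry/inode caches. It
only throws away clean cached data, so it is safe to run, though it is
worth doing a "sync" first so that as much as possible is clean.)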

I guess I should also point out that it is a good idea to mount with the
noatime mount option if there is going to be a read-only scan of the
complete filesystem on a regular basis, since that will prevent it from
becoming a "write to every inode" scan. That will also make a big
performance difference. Note that it's ok (in recent kernels) to mount a
GFS2 filesystem more than once with different atime flags (using bind
mounts) in case you have an application which requires atime, but you
want to avoid it when running a backup.
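
As a sketch of the bind mount arrangement (the device and paths are
made up, and the remount step needs a kernel with per-mountpoint atime
flags):

# Normal mount for the application which wants atime.
mount -t gfs2 /dev/clustervg/lv_gfs2 /data

# Give the backup job its own noatime view of the same filesystem.
mount --bind /data /data-backup
mount -o remount,bind,noatime /data-backup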

There is also /proc/sys/vm/vfs_cache_pressure, which may help you
optimise for your workload.
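
For example (the value here is only illustrative; the default is 100,
and larger values make the kernel reclaim dentries and inodes, and
hence the glocks, more aggressively):

echo 200 > /proc/sys/vm/vfs_cache_pressure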

... and if all that fails, then the next thing to do is to use
blktrace/seekwatcher to find out what's really going on on the disk and
send the results so that we can have a look and see if we can improve
the disk I/O. Better still if you can combine that with a trace from the
gfs2 tracepoints, so we can see the locking at the same time.
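
Something along these lines should do it (the device path is an
example, and the gfs2 tracepoints are only available on kernels which
carry them):

# Trace the block device under the filesystem while the slow workload
# runs (stop blktrace with ctrl-c when the run is complete), then turn
# the result into a seek graph.
blktrace -d /dev/mapper/clustervg-lv_gfs2 -o gfs2trace
seekwatcher -t gfs2trace -o gfs2trace.png

# Capture the gfs2 tracepoints at the same time via ftrace.
echo 1 > /sys/kernel/debug/tracing/events/gfs2/enable
cat /sys/kernel/debug/tracing/trace_pipe > gfs2-trace.txt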

Steve.

> Kaerka Phillips wrote: 
> > If in gfs2 glocks are purged based upon memory constraints, what
> > happens if it is run on a box with large amounts of memory? i.e.
> > RHEL5.x with 128gb ram?  We ended up having to move away from GFS2
> > due to serious performance issues with this exact setup, and our
> > performance issues were largely centered around commands like ls or
> > rm against gfs2 filesystems with large directory structures and
> > millions of files in them.
> > 
> > In our case, something as simple as copying a whole filesystem to
> > another filesystem would cause a load avg of 50 or more, and would
> > take 8+ hours to complete.  The same thing on NFS or ext3 would take
> > usually 1 to 2 hours.  Netbackup of 10 of those filesystems took ~40
> > hours to complete, so we were getting maybe 1 good backup per week,
> > and in some cases the backup itself caused cluster crash.
> > 
> > We are still using our GFS1 clusters, since as long as their network
> > is stable, their performance is very good, but we are phasing out
> > most of our GFS2 clusters to NFS instead.
> > 
> > On Fri, Oct 9, 2009 at 1:01 PM, Steven Whitehouse
> > <swhiteho at redhat.com> wrote:
> >         Hi,
> >         
> >         On Fri, 2009-10-09 at 09:55 -0700, Scooter Morris wrote:
> >         > Hi all,
> >         >     On RHEL 5.3/5.4(?) we had changed the value of
> >         demote_secs to
> >         > significantly improve the performance of our gfs2
> >         filesystem for certain
> >         > tasks (notably rm -r on large directories).  I recently
> >         noticed that
> >         > that tuning value is no longer available (part of a recent
> >         update, or
> >         > part of 5.4?).  Can someone tell me what, if anything
> >         replaces this?  Is
> >         > it now a mount option, or is there some other way to tune
> >         this value?
> >         >
> >         > Thanks in advance.
> >         >
> >         > -- scooter
> >         >
> >         
> >         
> >         Nothing replaces it. The glocks are disposed of
> >         automatically on an LRU
> >         basis when there is enough memory pressure to require it.
> >         You can alter
> >         the amount of memory pressure on the VFS caches (including
> >         the glocks)
> >         but not specifically the glocks themselves.
> >         
> >         The idea is that it should be self-tuning now, adjusting
> >         itself to the
> >         conditions prevailing at the time. If there are any
> >         remaining
> >         performance issues though, we'd like to know so that they
> >         can be
> >         addressed,
> >         
> >         Steve.
> >         
> >         
> > 
> > 



