[Linux-cluster] gfs2_tool settune demote_secs

Scooter Morris scooter at cgl.ucsf.edu
Mon Oct 12 12:57:54 UTC 2009


Steve,
    Thanks for the informative and detailed response -- it really helps
to understand what might be happening.  We're not mounting with noatime,
and it sounds like that would be a good first step.

Thanks!

-- scooter

Steven Whitehouse wrote:
> Hi,
>
> On Fri, 2009-10-09 at 10:57 -0700, Scooter Morris wrote:
>   
>> Steve,
>>     Thanks for the prompt reply.  Like Kaerka, I'm running on
>> large-memory servers and decreasing demote_secs from 300 to 20
>> resulted in significant performance improvements because locks get
>> freed much more quickly (I assume), resulting in much better response.
>> It could certainly be that changing demote_secs was a workaround for a
>> different bug that has now been fixed, which would be great.  I'll try
>> some tests today and see how "rm -rf" on a large directory behaves.
>>
>> -- scooter
>>
>>     
> The question, though, is why that should result in a better response. It
> doesn't really make sense, since the caching of the "locks" (really the
> caching of data and metadata controlled by a lock) should improve
> performance by allowing more time to write out the dirty data.
>
> Doing an "rm -fr" is also a very different workload to that of reading
> all the files in the filesystem once (for backup purposes for example)
> since the "rm -fr" requires writing to the fs and the backup process
> doesn't do any writing.
>
> How long it takes to remove a file also depends to a large extent on its
> size.
>
> In both cases, however, it would improve performance if you could
> arrange to remove, or read, inodes in inode-number order. Both GFS and
> GFS2 return inodes from getdents64 (readdir) in a pseudo-random order
> based on the hash of the filename. You can gain a lot of performance if
> these results are sorted before they are scanned.
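>
> A rough illustration of the idea as a shell pipeline (the directory name
> is made up, and this assumes GNU find/xargs and filenames without
> embedded newlines):
>
>   # list entries with their inode numbers, sort numerically, then
>   # remove them in inode order instead of readdir (hash) order
>   find /gfs2/bigdir -maxdepth 1 -type f -printf '%i\t%p\n' \
>       | sort -n | cut -f2- | xargs -d '\n' rm -f
>
> The same sort-by-inode trick applies to a read-only scan: collect the
> inode numbers first, sort them, then open the files in that order.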
>
> Ideally we'd return them from the fs in sorted order. Unfortunately, a
> design decision made a long time ago, in combination with the design of
> the Linux VFS, prevents us from doing that.
>
> If there is a problem with a node caching the whole filesystem after it
> has been scanned, then it is still possible to solve this issue:
>
> echo 3 > /proc/sys/vm/drop_caches
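>
> (Since drop_caches only discards clean, unreferenced objects, it is
> worth running "sync" first so that dirty data is written back and can
> actually be freed:)
>
>   sync
>   echo 3 > /proc/sys/vm/drop_caches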
>
> I guess I should also point out that it is a good idea to mount with the
> noatime mount option if there is going to be a read-only scan of the
> complete filesystem on a regular basis, since that will prevent it
> becoming a "write to every inode" scan. That will also make a big
> performance difference. Note that it's OK (in recent kernels) to mount a
> GFS2 filesystem more than once with different atime flags (using bind
> mounts) in case you have an application which requires atime but you
> want to avoid it when running a backup.
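>
> A rough sketch of the bind-mount arrangement (the device and paths are
> just examples):
>
>   mount -t gfs2 /dev/my_vg/my_lv /data          # normal mount, atime kept
>   mkdir -p /backup/data
>   mount --bind /data /backup/data               # second view of the same fs
>   mount -o remount,bind,noatime /backup/data    # backup job reads from here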
>
> There is also /proc/sys/vm/vfs_cache_pressure, which may help optimise
> your workload.
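>
> (The default is 100; values above that make the kernel reclaim dentries
> and inodes -- and with them the glocks -- more aggressively. The right
> value is workload dependent, so 200 below is only an arbitrary starting
> point:)
>
>   cat /proc/sys/vm/vfs_cache_pressure
>   echo 200 > /proc/sys/vm/vfs_cache_pressure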
>
> ... and if all that fails, then the next thing to do is to use
> blktrace/seekwatcher to find out what's really going on, on the disk,
> and send the results so that we can have a look and see if we can
> improve the disk I/O. Better still if you can combine that with a trace
> from the gfs2 tracepoints, so we can see the locking at the same time.
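>
> Something along these lines (the device name and output files are
> placeholders, and this assumes blktrace, seekwatcher and a kernel with
> the gfs2 tracepoints are available):
>
>   # capture a block trace while reproducing the slow workload
>   blktrace -d /dev/mapper/my_vg-my_lv -o gfs2run &
>   #   ... run the workload here ...
>   kill %1                                  # stop blktrace (Ctrl-C also works)
>   seekwatcher -t gfs2run -o gfs2run.png    # render the seek/I/O graph
>
>   # and collect the gfs2 tracepoint output at the same time
>   mount -t debugfs none /sys/kernel/debug 2>/dev/null
>   echo 1 > /sys/kernel/debug/tracing/events/gfs2/enable
>   cat /sys/kernel/debug/tracing/trace_pipe > gfs2-trace.txt   # Ctrl-C when done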
>
> Steve.
>
>   
>> Kaerka Phillips wrote: 
>>     
>>> If GFS2 glocks are purged based upon memory pressure, what happens
>>> when it runs on a box with a large amount of memory, e.g. RHEL 5.x
>>> with 128 GB of RAM?  We ended up having to move away from GFS2 due to
>>> serious performance issues with this exact setup, and our performance
>>> issues were largely centered around commands like ls or rm against
>>> gfs2 filesystems with large directory structures and millions of
>>> files in them.
>>>
>>> In our case, something as simple as copying a whole filesystem to
>>> another filesystem would cause a load avg of 50 or more, and would
>>> take 8+ hours to complete.  The same thing on NFS or ext3 would
>>> usually take 1 to 2 hours.  Netbackup of 10 of those filesystems took
>>> ~40 hours to complete, so we were getting maybe 1 good backup per
>>> week, and in some cases the backup itself caused a cluster crash.
>>>
>>> We are still using our GFS1 clusters, since their performance is very
>>> good as long as their network is stable, but we are phasing out most
>>> of our GFS2 clusters in favour of NFS instead.
>>>
>>> On Fri, Oct 9, 2009 at 1:01 PM, Steven Whitehouse
>>> <swhiteho at redhat.com> wrote:
>>>         Hi,
>>>         
>>>         On Fri, 2009-10-09 at 09:55 -0700, Scooter Morris wrote:
>>>         > Hi all,
>>>         >     On RHEL 5.3/5.4(?) we had changed the value of
>>>         > demote_secs to significantly improve the performance of our
>>>         > gfs2 filesystem for certain tasks (notably rm -r on large
>>>         > directories).  I recently noticed that that tuning value is
>>>         > no longer available (part of a recent update, or part of
>>>         > 5.4?).  Can someone tell me what, if anything, replaces
>>>         > this?  Is it now a mount option, or is there some other way
>>>         > to tune this value?
>>>         >
>>>         > Thanks in advance.
>>>         >
>>>         > -- scooter
>>>         >
>>>         
>>>         Nothing replaces it. The glocks are disposed of automatically
>>>         on an LRU basis when there is enough memory pressure to
>>>         require it. You can alter the amount of memory pressure on
>>>         the VFS caches (including the glocks) but not specifically
>>>         the glocks themselves.
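>>>         
>>>         (For reference, the tunable being discussed was the per-fs
>>>         setting changed with something like the following -- the
>>>         mount point here is only an example:
>>>         
>>>           gfs2_tool settune /mnt/gfs2 demote_secs 20
>>>         
>>>         There is no direct equivalent now; generic VM knobs such as
>>>         /proc/sys/vm/vfs_cache_pressure are the closest thing to a
>>>         replacement.)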
>>>         
>>>         The idea is that it should be self-tuning now, adjusting
>>>         itself to the conditions prevailing at the time. If there are
>>>         any remaining performance issues, though, we'd like to know
>>>         so that they can be addressed,
>>>         
>>>         Steve.
>>>         
>>     
>
