[Linux-cluster] gfs2_tool settune demote_secs
Scooter Morris
scooter at cgl.ucsf.edu
Mon Oct 12 12:57:54 UTC 2009
Steve,
Thanks for the informative and detailed response -- it really helps to
understand what might be happening. We're not mounting with noatime, and
it sounds like adding that would be a good first step.
Thanks!
-- scooter
Steven Whitehouse wrote:
> Hi,
>
> On Fri, 2009-10-09 at 10:57 -0700, Scooter Morris wrote:
>
>> Steve,
>> Thanks for the prompt reply. Like Kaerka, I'm running on
>> large-memory servers, and decreasing demote_secs from 300 to 20 gave a
>> significant performance improvement because (I assume) locks are freed
>> much more quickly, which results in much better response. It could
>> certainly be that changing demote_secs was a workaround for a
>> different bug that has now been fixed, which would be great. I'll run
>> some tests today and see how "rm -rf" on a large directory behaves.
>>
>> -- scooter
>>
>>
> The question, though, is why that should result in a better response.
> It doesn't really make sense, since the caching of the "locks" (really
> the caching of data and metadata controlled by a lock) should improve
> performance by allowing more time to write out the dirty data.
>
> Doing an "rm -fr" is also a very different workload from reading all
> the files in the filesystem once (for backup purposes, for example),
> since "rm -fr" writes to the filesystem and the backup process does no
> writing.
>
> How long it takes to remove a file also depends to a large extent on its
> size.
>
> In both cases, however, it would improve performance if you could
> arrange to remove or read inodes in inode-number order. Both GFS and
> GFS2 return inodes from getdents64 (readdir) in a pseudo-random order
> based on the hash of the filename, so you can gain a lot of performance
> if those results are sorted before they are scanned.
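As a minimal sketch of that sort-before-scan idea (not GFS2-specific --
the scratch directory here just stands in for a large directory):

```shell
# List entries in raw readdir order with their inode numbers (-iU),
# sort numerically by inode, then strip the inode column so the names
# come out in inode order.
dir=$(mktemp -d)                 # stand-in for a large GFS2 directory
touch "$dir/c" "$dir/a" "$dir/b"

ls -1iU "$dir" | sort -n | awk '{print $2}'

rm -r "$dir"                     # clean up the scratch directory
```

The same pipeline, fed into something like xargs rm, would drive a
removal in inode order instead of a scan.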
>
> Ideally we'd return them from the fs in sorted order. Unfortunately, a
> design decision made a long time ago, in combination with the design of
> the Linux VFS, prevents us from doing that.
>
> If there is a problem with a node caching the whole filesystem after
> it has been scanned, then it is still possible to solve this issue by
> dropping the caches once the scan is complete:
>
> echo 3 > /proc/sys/vm/drop_caches
>
> (writing 3 drops both the page cache and the dentry/inode caches).
>
> I guess I should also point out that it is a good idea to mount with
> the noatime option if there is going to be a regular read-only scan of
> the complete filesystem, since that prevents the scan from becoming a
> "write to every inode" scan. That also makes a big performance
> difference. Note that it's OK (in recent kernels) to mount a GFS2
> filesystem more than once with different atime flags (using bind
> mounts), in case you have an application which requires atime but you
> want to avoid it when running a backup.
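One way to set that up might look like the following (an untested
sketch; the mount points are placeholders, and per-mount atime flags on
bind mounts need a reasonably recent kernel, as noted above):

```shell
# The original mount keeps its default atime behaviour for applications;
# the bind mount gives the backup job a noatime view of the same fs.
mount --bind /mnt/gfs2 /mnt/gfs2-backup
mount -o remount,noatime,bind /mnt/gfs2-backup
```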
>
> There is also /proc/sys/vm/vfs_cache_pressure, which may help you
> optimise for your workload.
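For instance (the value 200 is only an example to experiment with; the
default is 100, and higher values make the kernel reclaim the
dentry/inode caches -- and with them the glocks -- more aggressively):

```shell
sysctl vm.vfs_cache_pressure          # show the current value (default 100)
sysctl -w vm.vfs_cache_pressure=200   # needs root; takes effect immediately
```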
>
> ... and if all that fails, then the next thing to do is to use
> blktrace/seekwatcher to find out what's really going on on the disk,
> and to send the results so that we can have a look and see if we can
> improve the disk I/O. Better still, combine that with a trace from the
> gfs2 tracepoints so that we can see the locking at the same time,
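A rough capture recipe might look like this (an untested sketch from
memory: /dev/sdb stands in for the device backing the GFS2 filesystem,
and the tracepoint paths assume debugfs is mounted at /sys/kernel/debug;
run as root):

```shell
# Enable the gfs2 tracepoints so that locking activity is recorded.
echo 1 > /sys/kernel/debug/tracing/events/gfs2/enable

# Capture block-layer I/O while the slow workload runs.
blktrace -d /dev/sdb -o gfs2trace &
# ... reproduce the workload (the backup scan or "rm -fr") here ...
kill %1
wait

# Render the captured trace as a seek graph.
seekwatcher -t gfs2trace -o gfs2trace.png

# Save the tracepoint log collected meanwhile, then switch tracing off.
cat /sys/kernel/debug/tracing/trace > gfs2-tracepoints.txt
echo 0 > /sys/kernel/debug/tracing/events/gfs2/enable
```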
>
> Steve.
>
>
>> Kaerka Phillips wrote:
>>
>>> If gfs2 glocks are purged based upon memory constraints, what
>>> happens when it is run on a box with a large amount of memory, i.e.
>>> RHEL 5.x with 128 GB of RAM? We ended up having to move away from
>>> GFS2 due to serious performance issues with this exact setup, and
>>> our performance issues were largely centered around commands like ls
>>> or rm against gfs2 filesystems with large directory structures and
>>> millions of files in them.
>>>
>>> In our case, something as simple as copying a whole filesystem to
>>> another filesystem would cause a load average of 50 or more and
>>> would take 8+ hours to complete. The same thing on NFS or ext3 would
>>> usually take 1 to 2 hours. A Netbackup run over 10 of those
>>> filesystems took ~40 hours to complete, so we were getting maybe one
>>> good backup per week, and in some cases the backup itself caused a
>>> cluster crash.
>>>
>>> We are still using our GFS1 clusters, since as long as their network
>>> is stable, their performance is very good, but we are phasing out
>>> most of our GFS2 clusters to NFS instead.
>>>
>>> On Fri, Oct 9, 2009 at 1:01 PM, Steven Whitehouse
>>> <swhiteho at redhat.com> wrote:
>>> Hi,
>>>
>>> On Fri, 2009-10-09 at 09:55 -0700, Scooter Morris wrote:
>>> > Hi all,
>>> > On RHEL 5.3/5.4(?) we had changed the value of demote_secs to
>>> > significantly improve the performance of our gfs2 filesystem for
>>> > certain tasks (notably rm -r on large directories). I recently
>>> > noticed that that tuning value is no longer available (part of a
>>> > recent update, or part of 5.4?). Can someone tell me what, if
>>> > anything, replaces this? Is it now a mount option, or is there
>>> > some other way to tune this value?
>>> >
>>> > Thanks in advance.
>>> >
>>> > -- scooter
>>> >
>>>
>>> > --
>>> > Linux-cluster mailing list
>>> > Linux-cluster at redhat.com
>>> > https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>> Nothing replaces it. The glocks are disposed of automatically on an
>>> LRU basis when there is enough memory pressure to require it. You
>>> can alter the amount of memory pressure on the VFS caches (including
>>> the glocks), but not on the glocks specifically.
>>>
>>> The idea is that it should be self-tuning now, adjusting itself to
>>> the conditions prevailing at the time. If there are any remaining
>>> performance issues, though, we'd like to know so that they can be
>>> addressed,
>>>
>>> Steve.
>>>
>>>
>