[Linux-cluster] gfs tuning

Terry td3201 at gmail.com
Wed Jun 18 17:48:07 UTC 2008


On Tue, Jun 17, 2008 at 5:22 PM, Terry <td3201 at gmail.com> wrote:
> On Tue, Jun 17, 2008 at 3:09 PM, Wendy Cheng <s.wendy.cheng at gmail.com> wrote:
>> Hi, Terry,
>>>
>>> I am still seeing some high load averages.  Here is an example of a
>>> gfs configuration.  I left statfs_fast off as it would not apply to
>>> one of my volumes for an unknown reason.  Not sure that would have
>>> helped anyways.  I do, however, feel that reducing scand_secs helped a
>>> little:
>>>
>>
>> Sorry I missed scand_secs (I was distracted; my brain was mostly occupied
>> by daytime work).
>>
>> To simplify the view, glock states include exclusive (write), shared (read),
>> and not-locked (in reality, there are more). An exclusive lock has to be
>> demoted (after demote_secs) to shared, then to not-locked (after another
>> demote_secs), before it is picked up by the scanner (every scand_secs) and
>> added to the reclaim list, where it can be purged. During the
>> exclusive-to-shared transition, the file contents need to be flushed to disk
>> (to keep the file contents cluster-coherent).  All of the above assumes the
>> file (protected by this glock) is not being accessed (idle).
>>
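
(For reference, these tunables are set per mount point with gfs_tool; the
mount point below is a placeholder and the values are only examples, so
treat this as a sketch rather than a recommendation:

    gfs_tool gettune /mnt/gfsvol                  # list current tunables
    gfs_tool settune /mnt/gfsvol demote_secs 100  # down from the default of 300
    gfs_tool settune /mnt/gfsvol scand_secs 5
    gfs_tool settune /mnt/gfsvol statfs_fast 1    # the statfs speedup mentioned above

These settings don't survive an unmount, so they would need to be reapplied
from an init script or similar.)
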
>> You've hit an area where GFS normally doesn't perform well. With GFS1 in
>> maintenance mode and GFS2 seemingly so far away, ext3 could be a better
>> answer. However, before switching, do make sure to test it thoroughly (since
>> ext3 could have the very same issue as well - check out:
>> http://marc.info/?l=linux-nfs&m=121362947909974&w=2 ).
>>
>> Did you look at (and test) GFS's "nolock" protocol (for single-node GFS)? It
>> bypasses some locking overhead and can be switched to DLM in the future
>> (just make sure you reserve enough journal space - the rule of thumb is one
>> journal per node, so know how many nodes you plan to have in the future).
>>
>> -- Wendy
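
(For reference, this is my understanding of how that journal space gets
reserved: GFS1 journals are allocated at mkfs time, one per node that will
ever mount the filesystem, and the lock protocol can be overridden at mount
time. The device, cluster and filesystem names below are placeholders:

    # reserve four journals even if only one node mounts it today
    gfs_mkfs -p lock_dlm -t mycluster:data -j 4 /dev/vg0/data

    # single-node use without DLM, overriding the on-disk protocol
    mount -t gfs -o lockproto=lock_nolock /dev/vg0/data /mnt/data

Journals can also be added later with gfs_jadd, but only after growing the
underlying volume, so sizing them up front is simpler.)
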
>
> Good points.  I could try the nolock feature, I suppose.  I'm not quite
> clear on how to reserve journal space.  I forgot to post the CPU time;
> check this out:
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4822 root      10  -5     0    0    0 S    1  0.0   2159:15 dlm_recv
>  4820 root      10  -5     0    0    0 S    1  0.0 368:09.34 dlm_astd
>  4821 root      10  -5     0    0    0 S    0  0.0 153:06.80 dlm_scand
>  3659 root      10  -5     0    0    0 S    0  0.0 134:40.14 scsi_wq_4
>  4823 root      11  -5     0    0    0 S    1  0.0 109:33.33 dlm_send
>   367 root      10  -5     0    0    0 S    0  0.0 103:33.74 kswapd0
>
> gfs_glockd is further down the list, so I'm not so concerned with that
> right now.  It appears turning on nolock would do the trick.  The times
> aren't entirely accurate because I have failed this cluster over between
> nodes while testing.
>

Here is some more testing information....

I created a new 1 TB volume on my iSCSI SAN and formatted it as ext3.  I
then used dd to create a 100G file.  This yielded roughly 900 Mb/sec.  I
then stopped my application and did the same thing on an existing GFS
volume.  This gave me about 850 Kb/sec.  This isn't an iSCSI issue.
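
The test was essentially a large sequential write along these lines (the
paths are placeholders, and adding conv=fdatasync keeps the page cache from
inflating the numbers):

    dd if=/dev/zero of=/mnt/ext3test/bigfile bs=1M count=102400 conv=fdatasync
    dd if=/dev/zero of=/mnt/gfsvol/bigfile   bs=1M count=102400 conv=fdatasync
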
This appears to be a load issue related to the number of I/Os occurring on
these volumes.  That said, I would have expected the changes I made to
result in a major performance improvement.  Since they didn't, what other
points could I consider?  If it's a GFS issue, ext3 is the way to go; maybe
even switch to active-active on my NFS cluster.  If it's a backend disk
issue, I would expect to see the throughput on my iSCSI link (bond1) fully
utilized.  It's not.  Could I be thrashing the disks?  This is an iSCSI SAN
with 30 SATA disks.  Just bouncing some ideas around to see if anyone has
any more thoughts.
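
In case it helps, link and disk utilization can be watched with something
like this (the interval and device names are just examples):

    sar -n DEV 5     # per-interface throughput; watch the bond1 line
    iostat -x 5      # per-device await and %util on the iSCSI LUNs

High await and %util on the LUNs while bond1 stays mostly idle would point
at the disks rather than the network.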

Thanks!



