[Linux-cluster] gfs tuning

Terry td3201 at gmail.com
Wed Jun 18 17:48:07 UTC 2008


On Tue, Jun 17, 2008 at 5:22 PM, Terry <td3201 at gmail.com> wrote:
> On Tue, Jun 17, 2008 at 3:09 PM, Wendy Cheng <s.wendy.cheng at gmail.com> wrote:
>> Hi, Terry,
>>>
>>> I am still seeing some high load averages.  Here is an example of a
>>> gfs configuration.  I left statfs_fast off as it would not apply to
>>> one of my volumes for an unknown reason.  Not sure that would have
>>> helped anyways.  I do, however, feel that reducing scand_secs helped a
>>> little:
>>>
>>
>> Sorry I missed scand_secs (I was distracted; my brain was mostly occupied
>> by daytime work).
>>
>> To simplify the view, glock states include exclusive (write), shared (read),
>> and not-locked (in reality, there are more). An exclusive lock has to be
>> demoted (after demote_secs) to shared, then to not-locked (after another
>> demote_secs), before it is picked up by the scanner (every scand_secs) and
>> added to the reclaim list, where it can be purged. During the
>> exclusive-to-shared transition, the file contents need to be flushed to disk
>> (to keep the file contents cluster-coherent).  All of the above assumes the
>> file (protected by this glock) is not being accessed (idle).
>>
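
(For reference, these tunables are set per mount point with gfs_tool; the
mount point below is a placeholder and the values are only examples, so
treat this as a sketch rather than a recommendation:

    gfs_tool gettune /mnt/gfsvol                  # list current tunables
    gfs_tool settune /mnt/gfsvol demote_secs 100  # down from the default of 300
    gfs_tool settune /mnt/gfsvol scand_secs 5
    gfs_tool settune /mnt/gfsvol statfs_fast 1    # the statfs speedup mentioned above

These settings don't survive an unmount, so they would need to be reapplied
from an init script or similar.)
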
>> You've hit an area where GFS normally doesn't perform well. With GFS1 in
>> maintenance mode and GFS2 seemingly so far away, ext3 could be a better
>> answer. However, before switching, do make sure to test it thoroughly (since
>> ext3 could have the very same issue as well - check out:
>> http://marc.info/?l=linux-nfs&m=121362947909974&w=2 ).
>>
>> Did you look at (and test) GFS's "nolock" protocol (for single-node GFS)? It
>> bypasses some locking overhead and can be switched to DLM in the future
>> (just make sure you reserve enough journal space - the rule of thumb is one
>> journal per node, so know how many nodes you plan to have in the future).
>>
>> -- Wendy
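
(For reference, this is my understanding of how that journal space gets
reserved: GFS1 journals are allocated at mkfs time, one per node that will
ever mount the filesystem, and the lock protocol can be overridden at mount
time. The device, cluster and filesystem names below are placeholders:

    # reserve four journals even if only one node mounts it today
    gfs_mkfs -p lock_dlm -t mycluster:data -j 4 /dev/vg0/data

    # single-node use without DLM, overriding the on-disk protocol
    mount -t gfs -o lockproto=lock_nolock /dev/vg0/data /mnt/data

Journals can also be added later with gfs_jadd, but only after growing the
underlying volume, so sizing them up front is simpler.)
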
>
> Good points.  I could try the nolock feature, I suppose.  I'm not quite
> clear on how to reserve journal space.  I forgot to post the CPU time;
> check this out:
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4822 root      10  -5     0    0    0 S    1  0.0   2159:15 dlm_recv
>  4820 root      10  -5     0    0    0 S    1  0.0 368:09.34 dlm_astd
>  4821 root      10  -5     0    0    0 S    0  0.0 153:06.80 dlm_scand
>  3659 root      10  -5     0    0    0 S    0  0.0 134:40.14 scsi_wq_4
>  4823 root      11  -5     0    0    0 S    1  0.0 109:33.33 dlm_send
>   367 root      10  -5     0    0    0 S    0  0.0 103:33.74 kswapd0
>
> gfs_glockd is further down the list, so I'm not so concerned with that
> right now.  It appears turning on nolock would do the trick.  The times
> aren't entirely accurate because I have failed this cluster over between
> nodes while testing.
>

Here is some more testing information....

I created a new 1 TB volume on my iSCSI SAN and formatted it as ext3.  I
then used dd to create a 100G file.  This yielded roughly 900 Mb/sec.  I
then stopped my application and did the same thing on an existing GFS
volume.  This gave me about 850 Kb/sec.  This isn't an iSCSI issue.
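
The test was essentially a large sequential write along these lines (the
paths are placeholders, and adding conv=fdatasync keeps the page cache from
inflating the numbers):

    dd if=/dev/zero of=/mnt/ext3test/bigfile bs=1M count=102400 conv=fdatasync
    dd if=/dev/zero of=/mnt/gfsvol/bigfile   bs=1M count=102400 conv=fdatasync
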
This appears to be a load issue related to the number of I/Os occurring on
these volumes.  That said, I would have expected the changes I made to
result in a major performance improvement.  Since they didn't, what other
points could I consider?  If it's a GFS issue, ext3 is the way to go; maybe
even switch to active-active on my NFS cluster.  If it's a backend disk
issue, I would expect to see the throughput on my iSCSI link (bond1) fully
utilized.  It's not.  Could I be thrashing the disks?  This is an iSCSI SAN
with 30 SATA disks.  Just bouncing some ideas around to see if anyone has
any more thoughts.
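
In case it helps, link and disk utilization can be watched with something
like this (the interval and device names are just examples):

    sar -n DEV 5     # per-interface throughput; watch the bond1 line
    iostat -x 5      # per-device await and %util on the iSCSI LUNs

High await and %util on the LUNs while bond1 stays mostly idle would point
at the disks rather than the network.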

Thanks!



