[Linux-cluster] GFS Performance Problems (RHEL5)

Paul Risenhoover prisenhoover at sampledigital.com
Tue Nov 27 23:54:19 UTC 2007


Yes and No.

I've been running a RHEL 4.x server connected to a VTrak M500i with 
750GB disks for the last year, and it's run beautifully.  I have had no 
performance problems with a 5TB volume (the disk array wasn't fully loaded).

In an effort to increase storage, I just purchased a VTrak 610 with 1TB 
disks and prepped it exactly like the other (except with RHEL5).  The 
ultimate goal is to have two servers in an active/passive configuration 
serving Samba.

Would you be willing to share your discoveries?
Paul

James Chamberlain wrote:
> Hi Paul,
>
> I'm guessing from the information you give below that you're using a 
> Promise VTrak M500i with 1 TB disks?  Can you confirm this?  I had 
> uneven experience with that platform, which led me to abandon it; but 
> I did make one or two discoveries along the way which may be useful if 
> they are applicable to your setup.  Can you share a little more about 
> your hardware and setup?
>
> Regards,
>
> James Chamberlain
>
> On Tue, 27 Nov 2007, Paul Risenhoover wrote:
>
>>
>> Sorry about this mis-send.
>>
>> I'm guessing my problem has to do with this:
>>
>> https://www.redhat.com/archives/linux-cluster/2007-October/msg00332.html
>>
>> BTW: My file system is 13TB.
>>
>> I found this article that talks about tuning the glock_purge setting:
>> http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4
>>
>> But it seems to require a special kernel module that I don't have :(. 
>> Anybody know where I can get it?
>>
>> Paul
>>
>> Paul Risenhoover wrote:
>>>  Hi All,
>>>
>>>  I am experiencing some substantial performance problems on my RHEL 5
>>>  server running GFS.  The specific symptom I'm seeing is that the file
>>>  system will hang on occasion, for anywhere from 5 to 45 seconds.  When
>>>  this happens it stalls all processes that are attempting to access the
>>>  file system (i.e., "ls -l"), such that even a ctrl-break can't stop it.
>>>
>>>  It also appears that gfs_scand is working extremely hard.  It runs at
>>>  7-10% CPU almost constantly.  I did some research on this and discovered
>>>  a discussion about cluster locking in relation to directories with large
>>>  numbers of files, and I believe it might be related.  I've got some
>>>  directories with 5000+ files.  However, I get the stalling behavior even
>>>  when nothing is accessing those particular directories.
>>>
>>>  I also tried tuning some of the parameters:
>>>
>>>  gfs_tool settune /mnt/promise demote_secs 10
>>>  gfs_tool settune /mnt/promise scand_secs 2
>>>  gfs_tool settune /mnt/promise reclaim_limit 1000
>>>
>>>  But this doesn't appear to have done much.  Does anybody have any
>>>  thoughts on how I might resolve this?
>>>
>>>  Paul
>>>
>>>  --
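
For anyone finding this thread later: the glock_purge tunable referenced 
above is set through the same gfs_tool settune interface as the other 
parameters.  The following is only a sketch, assuming the /mnt/promise 
mount point from the quoted messages; the tunable is present only if the 
gfs kernel module includes the glock-trimming patch from the readme 
linked above, it takes a percentage of unused glocks to demote on each 
gfs_scand pass, and 0 disables it.

  # check whether this gfs module exposes the tunable at all
  gfs_tool gettune /mnt/promise | grep glock_purge

  # if it does, demote e.g. 50% of the unused glocks per scan pass
  # (50 is an illustrative starting point, not a tested recommendation)
  gfs_tool settune /mnt/promise glock_purge 50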
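
A rough way to test the theory that accumulated glocks cause the stalls, 
again assuming the same mount point: gfs_tool counters prints the current 
lock statistics for a GFS mount, so sampling it while reproducing a hang 
shows whether the lock counts climb steadily and drop once a stall ends.

  # one-shot dump of the current lock statistics
  gfs_tool counters /mnt/promise

  # or sample every 2 seconds while reproducing a stall
  watch -n 2 gfs_tool counters /mnt/promise

If the counts stay high even while the large directories are idle, that 
points at cached-but-unused glocks rather than live contention, which is 
what the trimming patch is meant to address.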



