[Linux-cluster] GFS hangs, nodes die
sebastian.walter at fu-berlin.de
Fri Aug 24 11:44:30 UTC 2007
Thank you for your answer. Fencing works fine, and everything else
works for long periods, except when the I/O rises above a certain level...
The kernel could indeed be a problem, though! 2.6.9-55.0.2 is the
standard Update 5 kernel for CentOS as well, but I had to downgrade because
all the cs/gfs packages were dependent on 2.6.9-55.0
I totally forgot about it... I will compile everything for the new
kernel now; let's see.
PS: Btw, the hanging df processes came from the daily logwatch...
Marc Grimme wrote:
> Hi Sebastian,
> just to double-check: fencing and everything else work as expected, right?
> Second, the latest RHEL4 kernel is 2.6.9-55.0.2 (is that also available for
> CentOS?). If so, you might think about updating. I'm not sure whether
> something was updated within dlm/gfs, but my tests were done with
> 2.6.9-55.0.2 and I didn't encounter those problems, whereas before I had
> huge numbers of locks (~2 times the number of files on the fs).
> On Friday 24 August 2007 12:37:15 Sebastian Walter wrote:
>> Hi list,
>> just an update. There is nothing in my scripts that searches the
>> whole file system, but I see several "df" processes blocking the system
>> with 100% CPU. I will update the firmware now and check for better QLogic
>> drivers. Thanks!
> I fear that a firmware update will not change anything, but it's always a
> good option ;-) . I also doubt the QLogic drivers are at fault, because the
> ones in 2.6.9-55 are quite OK (did you configure multipathing properly?).
> Are that df and everything else running concurrently on different nodes?
> Last but not least, are the "unable to obtain locks" messages the only
> messages that you see when the problems occur?
> Regards Marc.
>> Marc Grimme wrote:
>>> On Tuesday 21 August 2007 09:52:32 Sebastian Walter wrote:
>>>> Marc Grimme wrote:
>>>>> Do you also see any messages on the console of the nodes? The
>>>>> counters from before the problem occurs would help too, so let it run
>>>>> for some time beforehand to see whether the locks increase.
>>>>> What kind of stress tests are you doing? I bet you are searching the
>>>>> whole filesystem. What makes me wonder is that gfs_tool glock_purge
>>>>> does not work, whereas it worked for me with exactly the same
>>>>> problems. Did you set it _AFTER_ the fs was mounted?
>>> Sorry, I meant that after is right and before is not ;-( .
>>> And are you using the latest version of CS/GFS?
>>> Do you have a lot of memory in your machines, 16G or more?
>>>> That makes me optimistic. I set it after the volume was mounted, so I
>>>> will give it another try setting it before mounting it. Then I will also
>>>> mail myself the output of the counters every 10 minutes. Let's see...
>>> I would be interested in the counters.
>>> Also add the process list in order to see how much CPU time gfs_scand uses:
>>> ps axwwww | sort -k4 -n | tail -10
>>> Have fun Marc.
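As a rough sketch of the process-list check suggested above, the one-liner can be wrapped in a small function so the header line is dropped and the number of rows is adjustable. The function name top_cpu_time is made up for illustration; it assumes "ps ax"-style input in which the fourth column is the accumulated TIME in minutes:seconds, so the numeric sort orders by whole minutes only.

```shell
#!/bin/sh
# Hypothetical helper: print the N processes with the most accumulated
# CPU time. Reads "ps ax"-style lines (PID TTY STAT TIME COMMAND) on stdin.
top_cpu_time() {
    n="${1:-10}"
    # Drop the header, sort numerically on the 4th field (TIME, leading
    # minutes only), and keep the last N lines (the largest values).
    grep -v '^ *PID' | sort -k4 -n | tail -n "$n"
}

# Usage on a live node (not executed here):
#   ps axwwww | top_cpu_time 10
```

If gfs_scand shows up at the bottom of that list with a TIME value far above everything else, that would be consistent with it spending a lot of CPU scanning glocks.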
>>>> ...with best thanks
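The periodic counter capture discussed in the thread could look roughly like the sketch below. It appends timestamped "gfs_tool counters" output to a log file instead of mailing it, to avoid assuming any mail setup. The mount point, log path, and the function name snapshot_counters are placeholders, not values taken from this cluster; COUNTERS_CMD exists only so the function can be dry-run on a machine without GFS.

```shell
#!/bin/sh
# Placeholders (assumptions): adjust to the real mount point and log path.
MOUNTPOINT="${MOUNTPOINT:-/mnt/gfs}"
LOGFILE="${LOGFILE:-/var/log/gfs-counters.log}"
# Command that prints the counters; overridable for dry runs.
COUNTERS_CMD="${COUNTERS_CMD:-gfs_tool counters}"

# Append one timestamped snapshot of the counters to LOGFILE.
snapshot_counters() {
    {
        echo "=== $(date -u '+%Y-%m-%d %H:%M:%S') UTC $(uname -n) ==="
        # Left unquoted on purpose so the command and its arguments split.
        $COUNTERS_CMD "$MOUNTPOINT" 2>&1
        echo
    } >> "$LOGFILE"
}

# Example cron entry (every 10 minutes), assuming this file is saved as a
# script that ends with a call to snapshot_counters:
#   */10 * * * * /usr/local/sbin/gfs-counters.sh
```

Comparing snapshots taken before a hang should show whether the lock counts climb steadily as the I/O load goes up.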