[Linux-cluster] GFS hangs, nodes die

Sebastian Walter sebastian.walter at fu-berlin.de
Fri Aug 24 13:27:38 UTC 2007


Hi Marc,

did you install the CSGFS packages from the Red Hat repository? I tried
to compile the packages on the new kernel using the CentOS .src.rpm's,
but they also rely on the older kernel. I would have to use a generic
www.kernel.org kernel to compile the cluster-suite packages from source,
which I want to avoid. If the Red Hat modules are built on 2.6.9-55.0.2,
there would be a problem with the CentOS repository...
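
One way to see which kernel a given module build expects is to compare the running kernel release with the module's vermagic string. This is only a minimal sketch: the helper name vermagic_matches is made up for illustration, and the module name "gfs" in the usage comment is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the kernel release embedded in a module's
# vermagic string equals the given kernel release.
vermagic_matches() {
    release=$1      # e.g. the output of `uname -r`
    vermagic=$2     # e.g. the output of `modinfo -F vermagic gfs`
    # The kernel release is the first space-separated field of vermagic.
    [ "${vermagic%% *}" = "$release" ]
}

# Usage on a cluster node (module name "gfs" is an assumption):
#   vermagic_matches "$(uname -r)" "$(modinfo -F vermagic gfs)" \
#       && echo "module matches the running kernel" \
#       || echo "module was built for another kernel"
```

If the two differ, the module was built against a different kernel than the one currently booted.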

thanks & regards,
Sebastian

Marc Grimme wrote:
> Hi Sebastian,
> just to double-check: fencing and everything works as expected, right?
>
> Second, the latest RHEL4 kernel is 2.6.9-55.0.2 (is that also available for 
> CentOS?). If so, you might think about updating. I'm not sure whether anything 
> was updated within dlm/gfs, but my tests were done with 2.6.9-55.0.2 and I 
> didn't encounter those problems, whereas before I had huge numbers of locks 
> (~2 times the number of files on the fs).
>
> On Friday 24 August 2007 12:37:15 Sebastian Walter wrote:
>   
>> Hi list,
>>
>> just an update. In my scripts, there is nothing that searches the
>> whole file system, but I see several "df" processes blocking the system
>> at 100% CPU. I will update the firmware now and check for better QLogic
>> drivers. Thanks!
>>     
> I fear that a firmware update will not change anything, but it's always a good 
> option ;-) . I also doubt the QLogic drivers are at fault, because the ones in 
> 2.6.9-55 are quite OK (did you configure multipathing properly?).
> Are that df and everything else running concurrently on different nodes?
>
> Last but not least, are the "unable to obtain locks" messages the only messages 
> you see when the problems occur?
>
> Regards Marc.
>   
>> Regards,
>> Sebastian
>>
>> Marc Grimme wrote:
>>     
>>> On Tuesday 21 August 2007 09:52:32 Sebastian Walter wrote:
>>>       
>>>> Hi,
>>>>
>>>> Marc Grimme wrote:
>>>>         
>>>>> Do you also see any messages on the console of the nodes? The
>>>>> gfs_tool counters from before the problem occurs would also help.
>>>>> So let it run for a while beforehand to see if locks increase.
>>>>> What kind of stress tests are you doing? I bet searching the whole
>>>>> filesystem. What puzzles me is that the gfs_tool glock_purge does
>>>>> not work, whereas it worked for me with exactly the same problems. Did
>>>>> you set it _AFTER_ the fs was mounted?
>>>>>           
>>> Sorry, I mean after is right and before is not ;-( .
>>> And are you using the latest version of CS/GFS?
>>> Do you have a lot of memory in your machines, 16G or more?
>>>
>>>       
>>>> That makes me optimistic. I set it after the volume was mounted, so I
>>>> will give it another try setting it before mounting it. Then I will also
>>>> mail myself the output of the counters every 10 minutes. Let's see...
>>>>         
>>> I would be interested in the counters.
>>> Also add the process list to see how much CPU time gfs_scand
>>> consumes,
>>> e.g.
>>> ps axwwww | sort -k4 -n | tail -10
>>>
>>> Have fun Marc.
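
The periodic collection discussed above (counters every 10 minutes plus the process list) could be sketched roughly like this. The helper names snapshot and collect, the mount point /mnt/gfs and the log path are assumptions for illustration only; "gfs_tool counters" and the ps pipeline are the commands named in the thread.

```shell
#!/bin/sh
# Hypothetical helper: print a timestamped header naming the command,
# then run the command and include its stdout and stderr.
snapshot() {
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') $* ==="
    "$@" 2>&1
}

# Hypothetical collector: GFS counters for one mount point, followed by
# the process list pipeline suggested in the thread.
collect() {
    mountpoint=$1   # assumed path, e.g. /mnt/gfs
    snapshot gfs_tool counters "$mountpoint"
    snapshot sh -c 'ps axwwww | sort -k4 -n | tail -10'
}

# Example, run from cron every 10 minutes (paths are assumptions):
#   collect /mnt/gfs >> /var/log/gfs-counters.log
```

Appending to a log rather than mailing each snapshot keeps the history in one place, so an increase in locks before a hang is easier to spot.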
>>>
>>>       
>>>> ...with best thanks
>>>> Sebastian
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>         



