[Linux-cluster] Re: SMP and GFS

Patrick Caulfield pcaulfie at redhat.com
Mon Oct 3 11:02:40 UTC 2005


Axel Thimm wrote:
> On Mon, Oct 03, 2005 at 10:31:02AM +0100, Patrick Caulfield wrote:
> 
>>Axel Thimm wrote:
>>
>>>On Mon, Oct 03, 2005 at 07:59:22AM +0100, Patrick Caulfield wrote:
>>>
>>>
>>>>Axel Thimm wrote:
>>>>
>>>>
>>>>>On Thu, Jul 14, 2005 at 04:57:51PM -0400, Manuel Bujan wrote:
>>>>>
>>>>>
>>>>>
>>>>>>Is there any issue I should be aware of if SMP is enabled in
>>>>>>my kernel? What if I compile my kernel to be pre-emptible? Any problem with that and GFS?
>>>>>>
>>>>
>>>>Pre-emptible kernels will not work with GFS, that's certain.
>>>
>>>
>>>My report was on a RHEL4 kernel.
>>
>>
>>...but you did ask about pre-emptible kernels :)
> 
> 
> No, I didn't, that was Manuel Bujan 6 weeks ago. ;)
> 
> I replied that I saw the same einval messages on a RHEL4 kernel.
> 
> 
>>The important messages here are these :
>>
>>
>>>Sep 30 05:08:33 zs03 kernel: CMAN: removing node zs02 from the cluster : Missed too many heartbeats (P:kernel)
>>
>>>Sep 30 05:08:39 zs03 kernel: CMAN: removing node zs01 from the cluster : No response to messages (P:kernel)
>>
>>
>>showing that a node has been kicked out of the cluster for not responding
>>quickly enough to messages. You could try increasing the value in
>>
>>/proc/cluster/config/cman/max_retries
> 
> 
> I know, but that doesn't explain the einval messages, or does it? Or,
> put differently: the einval messages show that the dual Xeon box had
> some issues with sockets, and its being kicked out could be just a
> symptom of that.

It probably does explain them. If the node is kicked out of the cluster, the DLM
starts returning -EINVAL from lock ops (because the lockspace no longer exists).
This very often causes the GFS lock_dlm module to oops.
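
To check that ordering on a given node (removal from the cluster first, -EINVAL
afterwards), something like the following could pull the CMAN removal lines out
of syslog and show the current retry limit. This is only a rough sketch: the
regex is modelled on the two log lines quoted above, the helper names
find_removals and read_max_retries are made up for illustration, and the code
assumes the proc file contains a single integer.

```python
import re
from pathlib import Path

# Path quoted earlier in this thread; assumed to exist only on nodes
# where the kernel cman module is loaded.
MAX_RETRIES_PATH = Path("/proc/cluster/config/cman/max_retries")

# Modelled on the log lines quoted in this thread, e.g.
# "Sep 30 05:08:33 zs03 kernel: CMAN: removing node zs02 from the cluster : ..."
REMOVAL_RE = re.compile(
    r"^(?P<when>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:\s+"
    r"CMAN: removing node (?P<node>\S+) from the cluster\s*:\s*(?P<reason>.*\S)\s*$"
)


def find_removals(lines):
    """Return (timestamp, reporting host, removed node, reason) tuples."""
    found = []
    for line in lines:
        m = REMOVAL_RE.match(line)
        if m:
            found.append((m["when"], m["host"], m["node"], m["reason"]))
    return found


def read_max_retries(path=MAX_RETRIES_PATH):
    """Return the current max_retries value, or None if the file is absent."""
    try:
        # Assumption: the file holds one integer followed by a newline.
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None
```

Feeding it the lines of /var/log/messages (or wherever kernel messages end up on
the box) gives the removal times, which can then be compared against the
timestamps of the einval messages.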


The Bugzilla entries are confused about this, but it sort of exists as
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=165160
-- 

patrick



