[Linux-cluster] Cannot add nodes

Patrick Caulfield pcaulfie at redhat.com
Wed Nov 14 09:39:23 UTC 2007


isplist at logicore.net wrote:
>>> What should I be looking for to post here?
>>>
>>>       
>> The exact detail of any kernel panic you are seeing .. ALL the text.
>> and then the obvious stuff: cluster.conf file, version numbers of all
>> cluster software, distribution and where you got them from.
>> Copies of things in /proc/cluster are always helpful too, if you can get
>> them from any running node (please say which node).
>>     
>
> That's a lot of info :). Got some of it at least;
>
> ccs 1.0.7-0.XOS.1
> cman 1.0.11-0.XOS.1
> cman-kernel 2.6.9-36.0.XOS.1
> cman-kernel 2.6.9-45.8.XOS.1
> cman-kernel 2.6.9-45.15.XOS.1
> fence 1.32.25-1.XOS.1
> lvm2-cluster 2.02.06-7.0.RHEL4.XOS.1
> magma 1.0.6-0.XOS.1
> magma-plugins 1.0.9-0.XOS.1
> piranha 0.8.2-1.XOS.1
> system-config-cluster 1.0.27-1.0.XOS.1
>
> And yes, I know I'm running old versions but all of the nodes are running the 
> same things and it works fine for me, cept for this new problem :). Now, as I 
> posted this, it does dawn on me that the new node (img62) would have newer 
> versions of all of the above installed. Would this be the cause? Should I 
> upgrade all nodes to the latest versions?
>
>   
Yes, it could be that the versions are out of step. I'm not sure what's
in each of those versions, as I don't recognise the numbers, but there
were some incompatibilities between very old versions of cman and newer
ones. So I strongly recommend upgrading or, at least, using the same
version on all nodes.
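
A rough sketch of one way to check that every node carries the same
package versions. It assumes each node's package list has been saved as
"name version" lines, the same shape as the list quoted above (something
like rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n' should produce that).
The node names and the second node's cman version are made-up
placeholders, not data from this thread, and the function names are
illustrative only.

```python
from collections import defaultdict


def parse_listing(text):
    """Parse 'name version' lines into {name: set of versions}.

    A package may appear more than once (e.g. several cman-kernel
    builds installed side by side), so versions are kept as a set.
    """
    packages = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            name, version = parts
            packages[name].add(version)
    return dict(packages)


def find_mismatches(listings):
    """Return {package: {node: sorted versions}} for packages whose
    installed versions are not identical on every node.

    listings maps a node name to that node's listing text.
    """
    parsed = {node: parse_listing(text) for node, text in listings.items()}
    all_names = set()
    for pkgs in parsed.values():
        all_names.update(pkgs)

    mismatches = {}
    for name in sorted(all_names):
        per_node = {
            node: sorted(pkgs.get(name, set()))
            for node, pkgs in parsed.items()
        }
        # More than one distinct version tuple means the nodes disagree.
        if len({tuple(v) for v in per_node.values()}) > 1:
            mismatches[name] = per_node
    return mismatches


# Hypothetical input: node_a mirrors two lines from the list above,
# node_b's cman version is a placeholder invented for the example.
listings = {
    "node_a": "ccs 1.0.7-0.XOS.1\ncman 1.0.11-0.XOS.1\n",
    "node_b": "ccs 1.0.7-0.XOS.1\ncman 9.9.9-placeholder\n",
}
result = find_mismatches(listings)
for package, per_node in result.items():
    print(package, per_node)
```

Any package reported here is installed at different versions on at least
one node, which is the first thing to rule out before looking further.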
> This is the kernel panic from .58 when .62 (img62) tries to join the cluster. 
> The new node does have an updated cluster.conf and so do all of the other 
> nodes to reflect the new node joining. All nodes had their hosts file updated 
> also so that they know about its IP.
>
> Nov 13 09:59:32 compdev kernel: klogd 1.4.1, log source = /proc/kmsg started.
> Nov 13 10:03:40 compdev kernel: CMAN: node img62.domain.com rejoining
> Nov 13 10:03:42 compdev kernel: Unable to handle kernel paging request at virtual address 008c9689
> Nov 13 10:03:42 compdev kernel:  printing eip:
> Nov 13 10:03:42 compdev kernel: e09e0d19
> Nov 13 10:03:42 compdev kernel: *pde = 00000000
> Nov 13 10:03:42 compdev kernel: Oops: 0000 [#1]
> Nov 13 10:03:42 compdev kernel: Modules linked in: autofs4 dlm(U) cman(U) md5 ipv6 sunrpc dm_mirror uhci_hcd e100 mii floppy ext3 jbd dm_mod qla2200 qla2xxx scsi_transport_fc sd_mod scsi_mod
> Nov 13 10:03:42 compdev kernel: CPU:    0
> Nov 13 10:03:42 compdev kernel: EIP:    0060:[<e09e0d19>]    Not tainted VLI
> Nov 13 10:03:42 compdev kernel: EFLAGS: 00010202   (2.6.9-42.0.10.EL.XOS.1)
> Nov 13 10:03:42 compdev kernel: EIP is at process_join_request+0x65/0x1ba [cman]
> Nov 13 10:03:42 compdev kernel: eax: 00000000   ebx: 008c9689   ecx: e09f20c0   edx: dd439000
> Nov 13 10:03:42 compdev kernel: esi: 00006564   edi: 0000003a   ebp: dd439f98   esp: dd439f58
> Nov 13 10:03:42 compdev kernel: ds: 007b   es: 007b   ss: 0068
> Nov 13 10:03:42 compdev kernel: Process cman_serviced (pid: 2212, threadinfo=dd439000 task=de793340)
> Nov 13 10:03:42 compdev kernel: Stack: 00000000 d6f9c014 0000003e 00000000 00000000 00000000 00000000 00000000
> Nov 13 10:03:42 compdev kernel:        95eb1078 0003641b de750ae0 0000003e d6f9c000 dd439f98 e09de8a3 e09e1125
> Nov 13 10:03:42 compdev kernel:        00000001 00000000 00000000 00070000 61666564 06e57ac4 000000d9 de793340
> Nov 13 10:03:42 compdev kernel: Call Trace:
> Nov 13 10:03:42 compdev kernel:  [<e09de8a3>] serviced+0x0/0x140 [cman]
> Nov 13 10:03:42 compdev kernel:  [<e09e1125>] process_message+0x32/0x93 [cman]
> Nov 13 10:03:42 compdev kernel:  [<e09e12a9>] process_messages+0x123/0x13e
>   
Patrick



