[Linux-cluster] Cannot add nodes
Patrick Caulfield
pcaulfie at redhat.com
Wed Nov 14 09:39:23 UTC 2007
isplist at logicore.net wrote:
>>> What should I be looking for to post here?
>>>
>>>
>> The exact detail of any kernel panic you are seeing .. ALL the text.
>> and then the obvious stuff: cluster.conf file, version numbers of all
>> cluster software, distribution and where you got them from.
>> Copies of things in /proc/cluster are always helpful too, if you can get
>> them from any running node (please say which node).
>>
>
> That's a lot of info :). Got some of it at least;
>
> ccs 1.0.7-0.XOS.1
> cman 1.0.11-0.XOS.1
> cman-kernel 2.6.9-36.0.XOS.1
> cman-kernel 2.6.9-45.8.XOS.1
> cman-kernel 2.6.9-45.15.XOS.1
> fence 1.32.25-1.XOS.1
> lvm2-cluster 2.02.06-7.0.RHEL4.XOS.1
> magma 1.0.6-0.XOS.1
> magma-plugins 1.0.9-0.XOS.1
> piranha 0.8.2-1.XOS.1
> system-config-cluster 1.0.27-1.0.XOS.1
>
> And yes, I know I'm running old versions but all of the nodes are running the
> same things and it works fine for me, cept for this new problem :). Now, as I
> posted this, it does dawn on me that the new node (img62) would have newer
> versions of all of the above installed. Would this be the cause? Should I
> upgrade all nodes to the latest versions?
>
>
Yes, it could be that the versions are out of step. I'm not sure what's in
each of those versions as I don't recognise the numbers, but there were some
incompatibilities between very old versions of cman and newer ones. So I
strongly recommend upgrading, or at least using the same version on all
nodes.
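
One way to check whether the nodes really are on the same versions is to
dump a "name version" package list on each node and compare them. The
following is only a rough sketch: it assumes each list has the same
two-column form as the one quoted above (for example, as produced by
rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n'), and the function names
are made up for illustration.

```python
from collections import defaultdict


def parse_package_list(text):
    """Parse 'name version' lines into {name: set of versions}.

    A package may appear more than once (e.g. several cman-kernel
    builds installed side by side), so versions are kept as a set.
    """
    packages = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, version = parts
        packages[name].add(version)
    return dict(packages)


def find_mismatches(nodes):
    """Return packages whose installed versions differ between nodes.

    nodes maps a node label to the dict returned by parse_package_list.
    The result maps package name -> {node label: sorted version list};
    a package missing on a node shows up with an empty list.
    """
    all_names = set()
    for pkgs in nodes.values():
        all_names.update(pkgs)

    mismatches = {}
    for name in sorted(all_names):
        per_node = {
            node: frozenset(pkgs.get(name, ()))
            for node, pkgs in nodes.items()
        }
        # More than one distinct version set means the nodes disagree.
        if len(set(per_node.values())) > 1:
            mismatches[name] = {
                node: sorted(versions)
                for node, versions in per_node.items()
            }
    return mismatches
```

Feed it one parsed list per node, keyed by whatever label you like; any
package it reports is a candidate for being brought into line before
trying the join again.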
> This is the kernel panic from .58 when .62 (img62) tries to join the cluster.
> The new node does have an updated cluster.conf and so do all of the other
> nodes to reflect the new node joining. All nodes had their hosts file updated
> also so that they know about its IP.
>
> Nov 13 09:59:32 compdev kernel: klogd 1.4.1, log source = /proc/kmsg started.
> Nov 13 10:03:40 compdev kernel: CMAN: node img62.domain.com rejoining
> Nov 13 10:03:42 compdev kernel: Unable to handle kernel paging request at
> virtual address 008c9689
> Nov 13 10:03:42 compdev kernel: printing eip:
> Nov 13 10:03:42 compdev kernel: e09e0d19
> Nov 13 10:03:42 compdev kernel: *pde = 00000000
> Nov 13 10:03:42 compdev kernel: Oops: 0000 [#1]
> Nov 13 10:03:42 compdev kernel: Modules linked in: autofs4 dlm(U) cman(U) md5
> ipv6 sunrpc dm_mirror uhci_hcd e100 mii floppy ext3 jbd dm_mod qla2200 qla2xxx
> scsi_transport_fc sd_mod scsi_mod
> Nov 13 10:03:42 compdev kernel: CPU: 0
> Nov 13 10:03:42 compdev kernel: EIP: 0060:[<e09e0d19>] Not tainted VLI
> Nov 13 10:03:42 compdev kernel: EFLAGS: 00010202 (2.6.9-42.0.10.EL.XOS.1)
> Nov 13 10:03:42 compdev kernel: EIP is at process_join_request+0x65/0x1ba
> [cman]
> Nov 13 10:03:42 compdev kernel: eax: 00000000 ebx: 008c9689 ecx: e09f20c0
> edx: dd439000
> Nov 13 10:03:42 compdev kernel: esi: 00006564 edi: 0000003a ebp: dd439f98
> esp: dd439f58
> Nov 13 10:03:42 compdev kernel: ds: 007b es: 007b ss: 0068
> Nov 13 10:03:42 compdev kernel: Process cman_serviced (pid: 2212,
> threadinfo=dd439000 task=de793340)
> Nov 13 10:03:42 compdev kernel: Stack: 00000000 d6f9c014 0000003e 00000000
> 00000000 00000000 00000000 00000000
> Nov 13 10:03:42 compdev kernel: 95eb1078 0003641b de750ae0 0000003e
> d6f9c000 dd439f98 e09de8a3 e09e1125
> Nov 13 10:03:42 compdev kernel: 00000001 00000000 00000000 00070000
> 61666564 06e57ac4 000000d9 de793340
> Nov 13 10:03:42 compdev kernel: Call Trace:
> Nov 13 10:03:42 compdev kernel: [<e09de8a3>] serviced+0x0/0x140 [cman]
> Nov 13 10:03:42 compdev kernel: [<e09e1125>] process_message+0x32/0x93 [cman]
> Nov 13 10:03:42 compdev kernel: [<e09e12a9>] process_messages+0x123/0x13e
>
Patrick