[Linux-cluster] Cannot add nodes

isplist at logicore.net isplist at logicore.net
Wed Nov 14 17:39:09 UTC 2007


>> What should I be looking for to post here?
>> 
> The exact detail of any kernel panic you are seeing .. ALL the text.
> and then the obvious stuff: cluster.conf file, version numbers of all
> cluster software, distribution and where you got them from.
> Copies of things in /proc/cluster are always helpful too, if you can get
> them from any running node (please say which node).

That's a lot of info :). I've got some of it at least:

ccs 1.0.7-0.XOS.1
cman 1.0.11-0.XOS.1
cman-kernel 2.6.9-36.0.XOS.1
cman-kernel 2.6.9-45.8.XOS.1
cman-kernel 2.6.9-45.15.XOS.1
fence 1.32.25-1.XOS.1
lvm2-cluster 2.02.06-7.0.RHEL4.XOS.1
magma 1.0.6-0.XOS.1
magma-plugins 1.0.9-0.XOS.1
piranha 0.8.2-1.XOS.1
system-config-cluster 1.0.27-1.0.XOS.1

And yes, I know I'm running old versions, but all of the nodes are running the
same things and it works fine for me, except for this new problem :). Now, as I
post this, it dawns on me that the new node (img62) would have newer versions
of all of the above installed. Could this be the cause? Should I upgrade all
nodes to the latest versions?
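
To check the version question, something like the following could be used to compare package lists between nodes. This is only a rough sketch: it assumes each node's list is in the "name version" format shown above (which `rpm -q --qf '%{NAME} %{VERSION}-%{RELEASE}\n' <packages>` should produce), and the function names are made up for illustration. Only the list from the existing nodes is filled in; the list from img62 would have to be pasted in the same way.

```python
from collections import defaultdict

# Package list as posted above for the existing nodes.
EXISTING_NODES = """\
ccs 1.0.7-0.XOS.1
cman 1.0.11-0.XOS.1
cman-kernel 2.6.9-36.0.XOS.1
cman-kernel 2.6.9-45.8.XOS.1
cman-kernel 2.6.9-45.15.XOS.1
fence 1.32.25-1.XOS.1
lvm2-cluster 2.02.06-7.0.RHEL4.XOS.1
magma 1.0.6-0.XOS.1
magma-plugins 1.0.9-0.XOS.1
piranha 0.8.2-1.XOS.1
system-config-cluster 1.0.27-1.0.XOS.1
"""


def parse_packages(text):
    """Parse '<name> <version>' lines into {name: set of versions}."""
    packages = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            name, version = parts
            packages[name].add(version)
    return dict(packages)


def diff_packages(node_a, node_b):
    """Return {name: (versions on a, versions on b)} where the sets differ."""
    diffs = {}
    for name in sorted(set(node_a) | set(node_b)):
        versions_a = node_a.get(name, set())
        versions_b = node_b.get(name, set())
        if versions_a != versions_b:
            diffs[name] = (sorted(versions_a), sorted(versions_b))
    return diffs


existing = parse_packages(EXISTING_NODES)
# Packages installed in more than one version on the existing nodes.
multi = {name: sorted(v) for name, v in existing.items() if len(v) > 1}
print(multi)
```

Calling `diff_packages(existing, parse_packages(img62_list))` with the new node's list would then show any package whose installed versions differ between the two.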

This is the kernel panic from .58 (compdev) when .62 (img62) tries to join the
cluster. The new node has an updated cluster.conf, and so do all of the other
nodes, to reflect the new node joining. All nodes also had their hosts file
updated so that they know about its IP.

Nov 13 09:59:32 compdev kernel: klogd 1.4.1, log source = /proc/kmsg started.
Nov 13 10:03:40 compdev kernel: CMAN: node img62.domain.com rejoining
Nov 13 10:03:42 compdev kernel: Unable to handle kernel paging request at virtual address 008c9689
Nov 13 10:03:42 compdev kernel:  printing eip:
Nov 13 10:03:42 compdev kernel: e09e0d19
Nov 13 10:03:42 compdev kernel: *pde = 00000000
Nov 13 10:03:42 compdev kernel: Oops: 0000 [#1]
Nov 13 10:03:42 compdev kernel: Modules linked in: autofs4 dlm(U) cman(U) md5 ipv6 sunrpc dm_mirror uhci_hcd e100 mii floppy ext3 jbd dm_mod qla2200 qla2xxx scsi_transport_fc sd_mod scsi_mod
Nov 13 10:03:42 compdev kernel: CPU:    0
Nov 13 10:03:42 compdev kernel: EIP:    0060:[<e09e0d19>]    Not tainted VLI
Nov 13 10:03:42 compdev kernel: EFLAGS: 00010202   (2.6.9-42.0.10.EL.XOS.1)
Nov 13 10:03:42 compdev kernel: EIP is at process_join_request+0x65/0x1ba [cman]
Nov 13 10:03:42 compdev kernel: eax: 00000000   ebx: 008c9689   ecx: e09f20c0   edx: dd439000
Nov 13 10:03:42 compdev kernel: esi: 00006564   edi: 0000003a   ebp: dd439f98   esp: dd439f58
Nov 13 10:03:42 compdev kernel: ds: 007b   es: 007b   ss: 0068
Nov 13 10:03:42 compdev kernel: Process cman_serviced (pid: 2212, threadinfo=dd439000 task=de793340)
Nov 13 10:03:42 compdev kernel: Stack: 00000000 d6f9c014 0000003e 00000000 00000000 00000000 00000000 00000000
Nov 13 10:03:42 compdev kernel:        95eb1078 0003641b de750ae0 0000003e d6f9c000 dd439f98 e09de8a3 e09e1125
Nov 13 10:03:42 compdev kernel:        00000001 00000000 00000000 00070000 61666564 06e57ac4 000000d9 de793340
Nov 13 10:03:42 compdev kernel: Call Trace:
Nov 13 10:03:42 compdev kernel:  [<e09de8a3>] serviced+0x0/0x140 [cman]
Nov 13 10:03:42 compdev kernel:  [<e09e1125>] process_message+0x32/0x93 [cman]
Nov 13 10:03:42 compdev kernel:  [<e09e12a9>] process_messages+0x123/0x13e [cman]
Nov 13 10:03:42 compdev kernel:  [<e09de8ce>] serviced+0x2b/0x140 [cman]
Nov 13 10:03:42 compdev kernel:  [<c013cc2d>] kthread+0x69/0x91
Nov 13 10:03:42 compdev kernel:  [<c013cbc4>] kthread+0x0/0x91
Nov 13 10:03:42 compdev kernel:  [<c01041dd>] kernel_thread_helper+0x5/0xb
Nov 13 10:03:42 compdev kernel: Code: 74 df e8 1e 69 93 df b9 c0 20 9f e0 ff 0d c0 20 9f e0 0f 88 e9 08 00 00 8b 3d 6c 1f 9f e0 39 7c 24 08 74 3d 8b 1c f5 a0 20 9f e0 <8b> 03 0f 18 00 90 8d 04 f5 a0 20 9f e0 39 c3 74 25 0f b7 45 12
Nov 13 10:03:42 compdev kernel:  <0>Fatal exception: panic in 5 seconds

cluster.conf:

<cluster config_version="80" name="vgcomp">
<fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="compdev.domain.com" votes="1" nodeid="58">
                <fence>
                    <method name="1">
                        <device name="brocade215" port="2"/>
                    </method>
                </fence>
        </clusternode>
        <clusternode name="cweb92.domain.com" votes="1" nodeid="92">
                <fence>
                    <method name="1">
                        <device name="brocade215" port="3"/>
                    </method>
                </fence>
        </clusternode>
        <clusternode name="cweb93.domain.com" votes="1" nodeid="93">
                <fence>
                    <method name="1">
                        <device name="brocade215" port="4"/>
                    </method>
                </fence>
        </clusternode>
        <clusternode name="cweb94.domain.com" votes="1" nodeid="94">
                <fence>
                    <method name="1">
                        <device name="brocade215" port="5"/>
                    </method>
                </fence>
        </clusternode>
        <clusternode name="img62.domain.com" votes="1" nodeid="62">
                <fence>
                    <method name="1">
                        <device name="brocade215" port="7"/>
                    </method>
                </fence>
        </clusternode>
    </clusternodes>
<fencedevices>
    <fencedevice agent="fence_brocade" ipaddr="192.168.1.215" login="xxx" name="brocade215" passwd="xxxxs"/>
</fencedevices>
</cluster>
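
A quick sanity check of the file can also be scripted. The following is a minimal sketch, not an official tool: it parses a cluster.conf with Python's standard xml.etree.ElementTree and reports duplicate node IDs, duplicate fence device/port pairs, and nodes with no fence device. The function names are hypothetical, and the embedded sample is a trimmed copy of the config above (two of the five nodes).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed copy of the cluster.conf above (only two nodes kept).
SAMPLE = """\
<cluster config_version="80" name="vgcomp">
  <clusternodes>
    <clusternode name="compdev.domain.com" votes="1" nodeid="58">
      <fence><method name="1"><device name="brocade215" port="2"/></method></fence>
    </clusternode>
    <clusternode name="img62.domain.com" votes="1" nodeid="62">
      <fence><method name="1"><device name="brocade215" port="7"/></method></fence>
    </clusternode>
  </clusternodes>
</cluster>
"""


def summarize_nodes(conf_xml):
    """Return a list of (node name, nodeid, [(fence device, port), ...])."""
    root = ET.fromstring(conf_xml)
    rows = []
    for node in root.findall("./clusternodes/clusternode"):
        devices = [
            (dev.get("name"), dev.get("port"))
            for dev in node.findall("./fence/method/device")
        ]
        rows.append((node.get("name"), node.get("nodeid"), devices))
    return rows


def find_duplicates(values):
    """Return the sorted values that occur more than once."""
    return sorted(v for v, count in Counter(values).items() if count > 1)


def check_conf(conf_xml):
    """Collect simple consistency problems in a cluster.conf string."""
    rows = summarize_nodes(conf_xml)
    return {
        "duplicate_nodeids": find_duplicates(nodeid for _, nodeid, _ in rows),
        "duplicate_fence_ports": find_duplicates(
            dev for _, _, devices in rows for dev in devices
        ),
        "nodes_without_fencing": [name for name, _, devs in rows if not devs],
    }


for name, nodeid, devices in summarize_nodes(SAMPLE):
    print(name, nodeid, devices)
print(check_conf(SAMPLE))
```

To run it against a real node, the contents of the cluster.conf file on that node (assumed here to be /etc/cluster/cluster.conf) would be read and passed to `check_conf` in place of `SAMPLE`.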






