[Linux-cluster] Two-node cluster disconnecting

Mikko Partio mpartio at gmail.com
Thu Oct 11 04:39:31 UTC 2007


Hello list

I have a problem with a two-node cluster going split-brain. When I boot the
first node, it correctly starts all the services and reports that the
cluster is quorate. When I then boot the second node, its cluster software
does not find the node that is already running during the boot phase, and it
starts the same services that are already running on node 1! Once the boot
is complete I can see that the nodes find each other for a short period of
time but then immediately disconnect from each other. The cluster was
created with Conga with shared disk support, though no shared disks have
been created yet. This is on CentOS 5.

cluster.conf:

<?xml version="1.0"?>
<cluster alias="testcluster" config_version="11" name="testcluster">
        <fence_daemon clean_start="0" post_fail_delay="5" post_join_delay="1200"/>
        <clusternodes>
                <clusternode name="hume" nodeid="1" votes="1">
                        <fence>
                                <method name="2">
                                        <device name="ilohume"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="kant" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="ilokant"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_ilo" hostname="x.x.x.x" login="*" name="ilohume" passwd="*"/>
                <fencedevice agent="fence_ilo" hostname="x.x.x.x" login="*" name="ilokant" passwd="*"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
                <service autostart="1" exclusive="0" name="test" recovery="relocate">
                        <script file="/etc/init.d/pgtest" name="pg"/>
                </service>
                <service autostart="1" exclusive="0" name="test2">
                        <script file="/etc/init.d/pgtest2" name="pg2"/>
                </service>
        </rm>
</cluster>
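
As a quick sanity check of the two-node settings in the configuration above, here is a minimal Python sketch. It assumes the usual cman convention that two_node="1" goes together with expected_votes="1" and exactly two nodes with one vote each, and it checks that every fence device referenced by a node is defined under fencedevices. The function name check_two_node_conf and the abbreviated SAMPLE string are made up for illustration; this only validates the file's internal consistency and says nothing about the network problem itself.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the cluster.conf above (the <rm> section is omitted).
SAMPLE = """<cluster alias="testcluster" config_version="11" name="testcluster">
  <clusternodes>
    <clusternode name="hume" nodeid="1" votes="1">
      <fence><method name="2"><device name="ilohume"/></method></fence>
    </clusternode>
    <clusternode name="kant" nodeid="2" votes="1">
      <fence><method name="1"><device name="ilokant"/></method></fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ilo" hostname="x.x.x.x" login="*" name="ilohume" passwd="*"/>
    <fencedevice agent="fence_ilo" hostname="x.x.x.x" login="*" name="ilokant" passwd="*"/>
  </fencedevices>
</cluster>"""


def check_two_node_conf(xml_text):
    """Return a list of human-readable problems found in a cluster.conf string."""
    root = ET.fromstring(xml_text)
    problems = []

    nodes = root.findall("./clusternodes/clusternode")
    cman = root.find("./cman")
    two_node = cman is not None and cman.get("two_node") == "1"

    if two_node:
        # Assumed convention: two_node mode means exactly two nodes,
        # one vote each, and expected_votes set to 1.
        if len(nodes) != 2:
            problems.append("two_node=1 but %d nodes are defined" % len(nodes))
        if cman.get("expected_votes") != "1":
            problems.append("two_node=1 but expected_votes is not 1")
        for node in nodes:
            if node.get("votes", "1") != "1":
                problems.append("node %s does not have votes=1" % node.get("name"))

    # Every fence device used by a node must be defined in <fencedevices>.
    defined = {d.get("name") for d in root.findall("./fencedevices/fencedevice")}
    for node in nodes:
        used = [d.get("name") for d in node.findall("./fence/method/device")]
        if not used:
            problems.append("node %s has no fence device" % node.get("name"))
        for name in used:
            if name not in defined:
                problems.append(
                    "node %s uses undefined fence device %s" % (node.get("name"), name)
                )
    return problems


if __name__ == "__main__":
    found = check_two_node_conf(SAMPLE)
    print("\n".join(found) if found else "no problems found")
```

To check the real file, the contents of /etc/cluster/cluster.conf can be read and passed to the function in place of SAMPLE.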

Output of clustat, cman_tool status and cman_tool nodes on the node that is
already running (hume):

$ sudo clustat
Member Status: Quorate

Member Name                        ID   Status
------ ----                        ---- ------
hume                           1 Online, Local, rgmanager
kant                           2 Offline

Service Name         Owner (Last)                   State
------- ----         ----- ------                   -----
service:test         hume                    started
service:test2        hume                    started


$ sudo cman_tool status
Version: 6.0.1
Config Version: 11
Cluster Name: testcluster
Cluster Id: 31540
Cluster Member: Yes
Cluster Generation: 32
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Quorum: 1
Active subsystems: 8
Flags: 2node
Ports Bound: 0 11 177
Node name: hume
Node ID: 1
Multicast addresses: 239.192.123.175
Node addresses: 193.166.192.100


$ sudo cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M      4   2007-10-10 14:58:53  hume
   2   X     28                        kant


Here's what gets logged in /var/log/messages on hume when kant boots:

Oct 11 07:20:15 hume openais[2410]: [TOTEM] entering GATHER state from 9.
Oct 11 07:20:15 hume openais[2410]: [TOTEM] Creating commit token because I am the rep.
Oct 11 07:20:15 hume openais[2410]: [TOTEM] Saving state aru 13bd2 high seq received 13bd2
Oct 11 07:20:15 hume openais[2410]: [TOTEM] entering COMMIT state.
Oct 11 07:20:15 hume openais[2410]: [TOTEM] entering RECOVERY state.
Oct 11 07:20:15 hume openais[2410]: [TOTEM] position [0] member 193.166.192.100:
Oct 11 07:20:15 hume openais[2410]: [TOTEM] previous ring seq 24 rep 193.166.192.100
Oct 11 07:20:15 hume openais[2410]: [TOTEM] aru 13bd2 high delivered 13bd2 received flag 0
Oct 11 07:20:15 hume openais[2410]: [TOTEM] position [1] member 193.166.192.101:
Oct 11 07:20:15 hume openais[2410]: [TOTEM] previous ring seq 4 rep 193.166.192.101
Oct 11 07:20:15 hume openais[2410]: [TOTEM] aru 27 high delivered 27 received flag 0
Oct 11 07:20:15 hume openais[2410]: [TOTEM] Did not need to originate any messages in recovery.
Oct 11 07:20:15 hume openais[2410]: [TOTEM] Storing new sequence id for ring 1c
Oct 11 07:20:15 hume kernel: dlm: connecting to 2
Oct 11 07:20:15 hume openais[2410]: [TOTEM] Sending initial ORF token
Oct 11 07:20:15 hume kernel: dlm: got connection from 2
Oct 11 07:20:15 hume openais[2410]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 11 07:20:15 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [CLM  ] New Configuration:
Oct 11 07:20:15 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [CLM  ]     r(0) ip(193.166.192.100)
Oct 11 07:20:15 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [CLM  ] Members Left:
Oct 11 07:20:15 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [CLM  ] Members Joined:
Oct 11 07:20:15 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [SYNC ] This node is within the primary component and will provide service.
Oct 11 07:20:15 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 11 07:20:15 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [CLM  ] New Configuration:
Oct 11 07:20:15 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [CLM  ]     r(0) ip(193.166.192.100)
Oct 11 07:20:15 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [CLM  ]     r(0) ip(193.166.192.101)
Oct 11 07:20:15 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [CLM  ] Members Left:
Oct 11 07:20:15 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [CLM  ] Members Joined:
Oct 11 07:20:15 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [CLM  ]     r(0) ip(193.166.192.101)
Oct 11 07:20:15 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [SYNC ] This node is within the primary component and will provide service.
Oct 11 07:20:15 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [TOTEM] entering OPERATIONAL state.
Oct 11 07:20:15 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [CLM  ] got nodejoin message 193.166.192.100
Oct 11 07:20:15 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [CLM  ] got nodejoin message 193.166.192.101
Oct 11 07:20:15 hume openais[2410]: [CPG  ] got joinlist message from node 1
Oct 11 07:20:15 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:15 hume openais[2410]: [CPG  ] got joinlist message from node 2
Oct 11 07:20:15 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:15 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:15 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:16 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:16 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:16 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:16 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:16 hume kernel: device eth0 entered promiscuous mode
Oct 11 07:20:16 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:16 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:16 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:16 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:16 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:16 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:17 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:17 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:17 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:17 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:17 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:17 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:18 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:18 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:18 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:18 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:18 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:18 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:19 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:19 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:19 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:19 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:20 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:20 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:20 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:20 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:21 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:21 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:21 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:21 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:22 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:22 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:22 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:22 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:23 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:23 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:23 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:23 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:24 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:24 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:25 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:25 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:25 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:25 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:25 hume kernel: device eth0 left promiscuous mode
Oct 11 07:20:26 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:26 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:27 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:27 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:27 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:27 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:28 hume kernel: device eth0 entered promiscuous mode
Oct 11 07:20:28 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:28 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:29 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:29 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:29 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:29 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:30 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:30 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:31 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:31 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:32 hume openais[2410]: [TOTEM] The token was lost in the OPERATIONAL state.
Oct 11 07:20:32 hume openais[2410]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Oct 11 07:20:32 hume openais[2410]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Oct 11 07:20:32 hume openais[2410]: [TOTEM] entering GATHER state from 2.
Oct 11 07:20:32 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:32 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:32 hume kernel: device eth0 left promiscuous mode
Oct 11 07:20:33 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:33 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:34 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:34 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:34 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:34 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:35 hume kernel: dlm: lockspace 30002 from 2 type 1 not found
Oct 11 07:20:35 hume kernel: dlm: lockspace 20002 from 2 type 1 not found
Oct 11 07:20:36 hume kernel: dlm: connecting to 2
Oct 11 07:20:36 hume openais[2410]: [TOTEM] entering GATHER state from 0.
Oct 11 07:20:36 hume openais[2410]: [TOTEM] Creating commit token because I am the rep.
Oct 11 07:20:36 hume openais[2410]: [TOTEM] Saving state aru 1d high seq received 20
Oct 11 07:20:36 hume openais[2410]: [TOTEM] entering COMMIT state.
Oct 11 07:20:36 hume openais[2410]: [TOTEM] entering RECOVERY state.
Oct 11 07:20:36 hume openais[2410]: [TOTEM] position [0] member 193.166.192.100:
Oct 11 07:20:36 hume openais[2410]: [TOTEM] previous ring seq 28 rep 193.166.192.100
Oct 11 07:20:36 hume openais[2410]: [TOTEM] aru 1d high delivered 1d received flag 0
Oct 11 07:20:36 hume openais[2410]: [TOTEM] copying all old ring messages from 1e-20.
Oct 11 07:20:36 hume openais[2410]: [TOTEM] Originated 0 messages in RECOVERY.
Oct 11 07:20:36 hume openais[2410]: [TOTEM] Originated for recovery:
Oct 11 07:20:36 hume openais[2410]: [TOTEM] Not Originated for recovery: 1e 1f 20
Oct 11 07:20:36 hume openais[2410]: [TOTEM] Storing new sequence id for ring 20
Oct 11 07:20:36 hume kernel: dlm: closing connection to node 2
Oct 11 07:20:36 hume openais[2410]: [TOTEM] Sending initial ORF token
Oct 11 07:20:37 hume openais[2410]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 11 07:20:37 hume openais[2410]: [CLM  ] New Configuration:
Oct 11 07:20:37 hume openais[2410]: [CLM  ]     r(0) ip(193.166.192.100)
Oct 11 07:20:37 hume openais[2410]: [CLM  ] Members Left:
Oct 11 07:20:37 hume openais[2410]: [CLM  ]     r(0) ip(193.166.192.101)
Oct 11 07:20:37 hume openais[2410]: [CLM  ] Members Joined:
Oct 11 07:20:37 hume openais[2410]: [SYNC ] This node is within the primary component and will provide service.
Oct 11 07:20:37 hume openais[2410]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 11 07:20:37 hume openais[2410]: [CLM  ] New Configuration:
Oct 11 07:20:37 hume openais[2410]: [CLM  ]     r(0) ip(193.166.192.100)
Oct 11 07:20:37 hume openais[2410]: [CLM  ] Members Left:
Oct 11 07:20:37 hume openais[2410]: [CLM  ] Members Joined:
Oct 11 07:20:37 hume openais[2410]: [SYNC ] This node is within the primary component and will provide service.
Oct 11 07:20:37 hume openais[2410]: [TOTEM] entering OPERATIONAL state.
Oct 11 07:20:37 hume openais[2410]: [CLM  ] got nodejoin message 193.166.192.100
Oct 11 07:20:37 hume openais[2410]: [CPG  ] got joinlist message from node 1


and on the other node (kant):


Oct 11 07:20:16 kant openais[2411]: [TOTEM] entering GATHER state from 11.
Oct 11 07:20:16 kant openais[2411]: [TOTEM] Saving state aru 27 high seq received 27
Oct 11 07:20:16 kant openais[2411]: [TOTEM] entering COMMIT state.
Oct 11 07:20:16 kant openais[2411]: [TOTEM] entering RECOVERY state.
Oct 11 07:20:16 kant openais[2411]: [TOTEM] position [0] member 193.166.192.100:
Oct 11 07:20:16 kant openais[2411]: [TOTEM] previous ring seq 24 rep 193.166.192.100
Oct 11 07:20:16 kant openais[2411]: [TOTEM] aru 13bd2 high delivered 13bd2 received flag 0
Oct 11 07:20:16 kant openais[2411]: [TOTEM] position [1] member 193.166.192.101:
Oct 11 07:20:16 kant openais[2411]: [TOTEM] previous ring seq 4 rep 193.166.192.101
Oct 11 07:20:16 kant openais[2411]: [TOTEM] aru 27 high delivered 27 received flag 0
Oct 11 07:20:16 kant openais[2411]: [TOTEM] Did not need to originate any messages in recovery.
Oct 11 07:20:16 kant openais[2411]: [TOTEM] Storing new sequence id for ring 1c
Oct 11 07:20:16 kant openais[2411]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 11 07:20:16 kant kernel: dlm: connecting to 1
Oct 11 07:20:16 kant openais[2411]: [CLM  ] New Configuration:
Oct 11 07:20:16 kant kernel: dlm: got connection from 1
Oct 11 07:20:16 kant openais[2411]: [CLM  ]     r(0) ip(193.166.192.101)
Oct 11 07:20:16 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:16 kant openais[2411]: [CLM  ] Members Left:
Oct 11 07:20:16 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:16 kant openais[2411]: [CLM  ] Members Joined:
Oct 11 07:20:16 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:16 kant openais[2411]: [SYNC ] This node is within the primary component and will provide service.
Oct 11 07:20:16 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:16 kant openais[2411]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 11 07:20:16 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:16 kant openais[2411]: [CLM  ] New Configuration:
Oct 11 07:20:16 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:16 kant openais[2411]: [CLM  ]     r(0) ip(193.166.192.100)
Oct 11 07:20:16 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:16 kant openais[2411]: [CLM  ]     r(0) ip(193.166.192.101)
Oct 11 07:20:16 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:16 kant openais[2411]: [CLM  ] Members Left:
Oct 11 07:20:16 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:16 kant openais[2411]: [CLM  ] Members Joined:
Oct 11 07:20:16 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:16 kant openais[2411]: [CLM  ]     r(0) ip(193.166.192.100)
Oct 11 07:20:16 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:16 kant openais[2411]: [SYNC ] This node is within the primary component and will provide service.
Oct 11 07:20:16 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:16 kant openais[2411]: [TOTEM] entering OPERATIONAL state.
Oct 11 07:20:16 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:16 kant openais[2411]: [CLM  ] got nodejoin message 193.166.192.100
Oct 11 07:20:16 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:16 kant openais[2411]: [CLM  ] got nodejoin message 193.166.192.101
Oct 11 07:20:16 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:16 kant openais[2411]: [CPG  ] got joinlist message from node 1
Oct 11 07:20:16 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:16 kant openais[2411]: [CPG  ] got joinlist message from node 2
Oct 11 07:20:16 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:16 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:16 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:16 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:17 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:17 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:17 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:17 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:17 kant openais[2411]: [TOTEM] Retransmit List: 1e
Oct 11 07:20:17 kant openais[2411]: [TOTEM] Retransmit List: 1e
Oct 11 07:20:17 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f
Oct 11 07:20:17 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f
Oct 11 07:20:17 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:17 kant last message repeated 29 times
Oct 11 07:20:17 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:17 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:17 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:17 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:17 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:18 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:18 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:18 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:18 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:18 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:18 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:18 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:18 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:18 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:18 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:18 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:19 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:19 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:19 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:19 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:19 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:19 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:20 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:20 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:20 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:20 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:20 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:20 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:20 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:20 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:20 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:21 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:21 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:21 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:21 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:21 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:21 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:22 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:22 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:22 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:22 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:22 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:22 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:22 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:23 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:23 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:23 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:23 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:23 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:23 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:23 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:23 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:23 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:23 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:23 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:24 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:24 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:24 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:24 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:24 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:24 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:24 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:24 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:24 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:24 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:25 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:25 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:25 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:25 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:25 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:25 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:25 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:25 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:26 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:26 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:26 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:26 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:26 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:26 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:26 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:26 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:26 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:26 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:27 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:27 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:27 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:27 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:27 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:27 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:27 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:27 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:28 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:28 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:28 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:28 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:28 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:28 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:28 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:28 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:28 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:28 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:29 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:29 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:29 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:29 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:29 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:29 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:29 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:29 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:30 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:30 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:30 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:30 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:30 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:30 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:30 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:30 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:31 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:31 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:31 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:31 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:31 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:31 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:31 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:31 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:31 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:31 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:32 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:32 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:32 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:32 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:32 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:32 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:32 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:32 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:33 kant openais[2411]: [TOTEM] Retransmit List: 1e 1f 20
Oct 11 07:20:33 kant openais[2411]: [TOTEM] FAILED TO RECEIVE
Oct 11 07:20:33 kant openais[2411]: [TOTEM] entering GATHER state from 6.
Oct 11 07:20:33 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:33 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:34 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:34 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:35 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:35 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:36 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:36 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:37 kant kernel: dlm: lockspace 20001 from 1 type 1 not found
Oct 11 07:20:37 kant kernel: dlm: lockspace 30001 from 1 type 1 not found
Oct 11 07:20:37 kant openais[2411]: [TOTEM] entering GATHER state from 0.
Oct 11 07:20:37 kant openais[2411]: [TOTEM] Creating commit token because I am the rep.
Oct 11 07:20:37 kant openais[2411]: [TOTEM] Saving state aru 20 high seq received 20
Oct 11 07:20:37 kant openais[2411]: [TOTEM] entering COMMIT state.
Oct 11 07:20:37 kant openais[2411]: [TOTEM] entering RECOVERY state.
Oct 11 07:20:37 kant openais[2411]: [TOTEM] position [0] member 193.166.192.101:
Oct 11 07:20:37 kant openais[2411]: [TOTEM] previous ring seq 28 rep 193.166.192.100
Oct 11 07:20:37 kant openais[2411]: [TOTEM] aru 20 high delivered 20 received flag 0
Oct 11 07:20:37 kant openais[2411]: [TOTEM] Did not need to originate any messages in recovery.
Oct 11 07:20:37 kant openais[2411]: [TOTEM] Storing new sequence id for ring 20
Oct 11 07:20:37 kant openais[2411]: [TOTEM] Sending initial ORF token
Oct 11 07:20:37 kant openais[2411]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 11 07:20:37 kant openais[2411]: [CLM  ] New Configuration:
Oct 11 07:20:37 kant kernel: dlm: closing connection to node 1
Oct 11 07:20:37 kant openais[2411]: [CLM  ]     r(0) ip(193.166.192.101)
Oct 11 07:20:37 kant kernel: dlm: connect from non cluster node
Oct 11 07:20:37 kant openais[2411]: [CLM  ] Members Left:
Oct 11 07:20:37 kant openais[2411]: [CLM  ]     r(0) ip(193.166.192.100)
Oct 11 07:20:38 kant openais[2411]: [CLM  ] Members Joined:
Oct 11 07:20:38 kant openais[2411]: [SYNC ] This node is within the primary component and will provide service.
Oct 11 07:20:38 kant openais[2411]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 11 07:20:38 kant openais[2411]: [CLM  ] New Configuration:
Oct 11 07:20:38 kant openais[2411]: [CLM  ]     r(0) ip(193.166.192.101)
Oct 11 07:20:38 kant openais[2411]: [CLM  ] Members Left:
Oct 11 07:20:38 kant openais[2411]: [CLM  ] Members Joined:
Oct 11 07:20:38 kant openais[2411]: [SYNC ] This node is within the primary component and will provide service.
Oct 11 07:20:38 kant openais[2411]: [TOTEM] entering OPERATIONAL state.
Oct 11 07:20:38 kant openais[2411]: [CLM  ] got nodejoin message 193.166.192.101
Oct 11 07:20:38 kant openais[2411]: [CPG  ] got joinlist message from node 2
Oct 11 07:21:31 kant snmpd[2664]: Connection from UDP: [193.166.218.61]:55646
Oct 11 07:21:31 kant snmpd[2664]: Received SNMP packet(s) from UDP: [193.166.218.61]:55646
Oct 11 07:21:31 kant snmpd[2664]: Connection from UDP: [193.166.218.61]:55646
Oct 11 07:21:31 kant snmpd[2664]: Connection from UDP: [193.166.218.61]:55647
Oct 11 07:21:31 kant snmpd[2664]: Received SNMP packet(s) from UDP: [193.166.218.61]:55647
Oct 11 07:21:31 kant snmpd[2664]: Connection from UDP: [193.166.218.61]:55647
Oct 11 07:21:31 kant last message repeated 2 times
Oct 11 07:21:31 kant snmpd[2664]: Connection from UDP: [193.166.218.61]:55646
Oct 11 07:21:41 kant ntpd[2696]: synchronized to LOCAL(0), stratum 10
Oct 11 07:21:41 kant ntpd[2696]: kernel time sync enabled 0001
Oct 11 07:22:45 kant ntpd[2696]: synchronized to 193.166.211.70, stratum 2
Oct 11 07:26:35 kant snmpd[2664]: Connection from UDP: [193.166.218.61]:56021
Oct 11 07:26:35 kant snmpd[2664]: Received SNMP packet(s) from UDP: [193.166.218.61]:56021
Oct 11 07:26:35 kant snmpd[2664]: Connection from UDP: [193.166.218.61]:56021
Oct 11 07:26:35 kant snmpd[2664]: Connection from UDP: [193.166.218.61]:56022
Oct 11 07:26:35 kant snmpd[2664]: Received SNMP packet(s) from UDP: [193.166.218.61]:56022
Oct 11 07:26:35 kant snmpd[2664]: Connection from UDP: [193.166.218.61]:56022
Oct 11 07:26:35 kant last message repeated 2 times
Oct 11 07:26:35 kant snmpd[2664]: Connection from UDP: [193.166.218.61]:56021
Oct 11 07:30:20 kant openais[2411]: [TOTEM] entering GATHER state from 11.
Oct 11 07:30:20 kant openais[2411]: [TOTEM] Saving state aru 14 high seq received 14
Oct 11 07:30:20 kant openais[2411]: [TOTEM] entering COMMIT state.
Oct 11 07:30:20 kant openais[2411]: [TOTEM] entering RECOVERY state.
Oct 11 07:30:20 kant openais[2411]: [TOTEM] position [0] member 193.166.192.100:
Oct 11 07:30:20 kant openais[2411]: [TOTEM] previous ring seq 32 rep 193.166.192.100
Oct 11 07:30:20 kant openais[2411]: [TOTEM] aru 15 high delivered 15 received flag 0
Oct 11 07:30:20 kant openais[2411]: [TOTEM] position [1] member 193.166.192.101:
Oct 11 07:30:20 kant openais[2411]: [TOTEM] previous ring seq 32 rep 193.166.192.101
Oct 11 07:30:20 kant openais[2411]: [TOTEM] aru 14 high delivered 14 received flag 0
Oct 11 07:30:20 kant openais[2411]: [TOTEM] Did not need to originate any messages in recovery.
Oct 11 07:30:20 kant openais[2411]: [TOTEM] Storing new sequence id for ring 24
Oct 11 07:30:20 kant openais[2411]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 11 07:30:20 kant openais[2411]: [CLM  ] New Configuration:
Oct 11 07:30:20 kant openais[2411]: [CLM  ]     r(0) ip(193.166.192.101)
Oct 11 07:30:20 kant openais[2411]: [CLM  ] Members Left:
Oct 11 07:30:20 kant openais[2411]: [CLM  ] Members Joined:
Oct 11 07:30:20 kant openais[2411]: [SYNC ] This node is within the primary component and will provide service.
Oct 11 07:30:20 kant openais[2411]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 11 07:30:20 kant openais[2411]: [CLM  ] New Configuration:
Oct 11 07:30:20 kant openais[2411]: [CLM  ]     r(0) ip(193.166.192.100)
Oct 11 07:30:20 kant openais[2411]: [CLM  ]     r(0) ip(193.166.192.101)
Oct 11 07:30:20 kant openais[2411]: [CLM  ] Members Left:
Oct 11 07:30:20 kant openais[2411]: [CLM  ] Members Joined:
Oct 11 07:30:20 kant openais[2411]: [CLM  ]     r(0) ip(193.166.192.100)
Oct 11 07:30:20 kant openais[2411]: [SYNC ] This node is within the primary component and will provide service.
Oct 11 07:30:20 kant openais[2411]: [TOTEM] entering OPERATIONAL state.
Oct 11 07:30:20 kant openais[2411]: [MAIN ] Killing node hume because it has rejoined the cluster without cman_tool join
Oct 11 07:30:20 kant openais[2411]: [CMAN ] cman killed by node 1 for reason 3
Oct 11 07:30:20 kant dlm_controld[2433]: cluster is down, exiting
Oct 11 07:30:20 kant kernel: dlm: closing connection to node 2
Oct 11 07:30:20 kant gfs_controld[2439]: groupd_dispatch error -1 errno 11
Oct 11 07:30:20 kant fenced[2427]: cluster is down, exiting
Oct 11 07:30:20 kant gfs_controld[2439]: groupd connection died
Oct 11 07:30:20 kant gfs_controld[2439]: cluster is down, exiting
Oct 11 07:30:47 kant ccsd[2403]: Unable to connect to cluster infrastructure after 30 seconds.