[Linux-cluster] RHEL Cluster node fencing and cluster membership

Rajkumar, Anoop anoop_rajkumar at merck.com
Sun Jun 27 15:44:55 UTC 2010


Hi Kit
 
ntpd is running on both systems. I removed the following GFS and LVM
packages, and my cluster is working perfectly now:
 
gfs2-utils-0.1.53-1.el5
kmod-gfs-0.1.31-3.el5
lvm2-cluster-2.02.40-7.el5
 
So basically, as soon as the GFS process starts after the RPMs are
installed, I run into that problem.
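 
For the record, a quick way to double-check what is installed and enabled
after a change like this (package names as above; the service names are the
stock RHEL 5 init scripts, so adjust if yours differ):
 
rpm -qa | grep -E 'gfs2-utils|kmod-gfs|lvm2-cluster'   # which of the packages are still installed
chkconfig --list | grep -E 'gfs2|clvmd'                # which init scripts are still enabled
service gfs2 status; service clvmd status              # are the daemons/mounts actually active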
 
Below is the ccs_tool output from both servers.
 
[root@usrylxap237 ~]# ccs_tool lsnode
 
Cluster name: cluster1, config_version: 33
 
Nodename                        Votes Nodeid Fencetype
usrylxap237.merck.com              1    1    usrylxap237r
usrylxap238.merck.com              1    2    usrylxap238r
[root@usrylxap237 ~]# ccs_tool lsfence
Name             Agent
usrylxap237r     fence_ilo
usrylxap238r     fence_ilo
 
[root@usrylxap238 ~]# ccs_tool lsnode
 
Cluster name: cluster1, config_version: 33
 
Nodename                        Votes Nodeid Fencetype
usrylxap237.merck.com              1    1    usrylxap237r
usrylxap238.merck.com              1    2    usrylxap238r
[root@usrylxap238 ~]# ccs_tool lsfence
Name             Agent
usrylxap237r     fence_ilo
usrylxap238r     fence_ilo
 
Below is the firewall configuration on both servers.
 
 
[root@usrylxap237 ~]# iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     udp  --  anywhere             anywhere            udp dpt:netsupport
ACCEPT     udp  --  anywhere             anywhere            udp spt:netsupport
ACCEPT     udp  --  anywhere             anywhere            udp dpt:50007
ACCEPT     udp  --  anywhere             anywhere            udp spt:50007
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:21064
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:21064
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:50009
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:50009
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:50008
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:50008
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:50006
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:50006
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:41969
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:41969
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:41968
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:41968
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:41967
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:41967
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:41966
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:41966
 
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
 
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     udp  --  anywhere             anywhere            udp spt:netsupport
ACCEPT     udp  --  anywhere             anywhere            udp dpt:netsupport
ACCEPT     udp  --  anywhere             anywhere            udp spt:50007
ACCEPT     udp  --  anywhere             anywhere            udp dpt:50007
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:21064
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:21064
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:50009
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:50009
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:50008
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:50008
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:50006
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:50006
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:41969
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:41969
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:41968
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:41968
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:41967
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:41967
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:41966
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:41966
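 
One note on the rules above: the openais/cman heartbeat itself travels over
multicast UDP ports 5404 and 5405, which do not appear in the list. With the
chain policies set to ACCEPT that does not matter, but if the policies are
ever tightened, rules along these lines would also be needed (a sketch, not
taken from this configuration):
 
iptables -A INPUT  -p udp -m udp --dport 5404:5405 -j ACCEPT   # incoming cman/openais traffic
iptables -A OUTPUT -p udp -m udp --dport 5404:5405 -j ACCEPT   # outgoing cman/openais traffic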
 
Thanks
Anoop

________________________________

From: Kit Gerrits [mailto:kitgerrits at gmail.com] 
Sent: Sunday, June 27, 2010 6:35 AM
To: 'linux clustering'
Cc: Rajkumar, Anoop
Subject: RE: [Linux-cluster] RHEL Cluster node fencing and cluster
membership


Have you tried comparing the output of the cluster tools between the two
nodes?
 
Maybe the internal cluster services are not 'synchronised'.
I have seen this on clusters with connection issues.
 
I'm not familiar enough with the messages to understand them exactly,
but my gut instinct tells me you temporarily have 2 separate clusters
with 1 vote each.
My guess:
1/ the secondary node fails to join the cluster on the first node
2/ the secondary node starts its own cluster
3/ the primary node sees the secondary node and says hello
4/ the secondary node then fences the primary node
 
Are both nodes running NTP? (timing issues, log timestamps)
Are there any firewalls or network issues? (multicast packets traveling
only one way; see the quick checks below)
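 
For example, a couple of quick checks (assuming eth0 is the cluster
interface, and using the multicast address that cman_tool status reports,
239.192.104.2 in your output):
 
ntpq -p                                # is each node actually synchronised to an NTP peer?
cman_tool status | grep -i multicast   # do both nodes agree on the multicast address?
tcpdump -i eth0 host 239.192.104.2     # run on each node; traffic should be seen in both directions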
 
 
Regards,
 
Kit
 
________________________________

From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Rajkumar, Anoop
Sent: Saturday, 26 June 2010 17:40
To: linux-cluster at redhat.com
Subject: [Linux-cluster] RHEL Cluster node fencing and cluster
membership



Hi 

I have two DL585 servers with shared storage from an MSA 1000 in a two-node
RHEL 5.3 cluster. The priorities in cluster.conf are set as below.

<failoverdomainnode name="usrylxap237.merck.com" priority="1"/>
<failoverdomainnode name="usrylxap238.merck.com" priority="2"/>
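 
For context, these two lines sit inside a failover domain stanza along these
lines (the domain name and the ordered/restricted flags here are illustrative,
not copied verbatim from my cluster.conf):
 
<failoverdomains>
        <failoverdomain name="prefer237" ordered="1" restricted="1">
                <failoverdomainnode name="usrylxap237.merck.com" priority="1"/>
                <failoverdomainnode name="usrylxap238.merck.com" priority="2"/>
        </failoverdomain>
</failoverdomains>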

Whenever the lower-priority node usrylxap238 is rebooted, it kills cman on
usrylxap237 (the higher-priority node) and fences it, causing it to reboot.
The messages I see in /var/log/messages of the higher-priority node are:

Jun 26 11:02:36 usrylxap237 openais[4750]: [CMAN ] cman killed by node 2
because we rejoined the cluster without a full restart

Jun 26 11:03:57 usrylxap237 openais[27373]: [CMAN ] cman killed by node
1 because we were killed by cman_tool or other application

After the reboot, when the higher-priority node usrylxap237 comes up, it
transfers the services from the lower-priority node to itself and everything
works fine for some time. Then I see the following messages in
/var/log/messages of the higher-priority node that is running the services:

Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] The token was lost in
the OPERATIONAL state. 
Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] Receive multicast
socket recv buffer size (288000 bytes). 
Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] Transmit multicast
socket send buffer size (288000 bytes). 
Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] entering GATHER state
from 2. 
Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] Creating commit token
because I am the rep. 
Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] Saving state aru 17
high seq received 17 
Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] Storing new sequence
id for ring 420 
Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] entering COMMIT
state. 
Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] The token was lost in
the COMMIT state. 
Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] entering GATHER state
from 4. 
Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] Creating commit token
because I am the rep. 
Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] Storing new sequence
id for ring 424 
Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] entering COMMIT
state. 
Jun 26 09:24:46 usrylxap237 openais[5792]: [TOTEM] The token was lost in
the COMMIT state. 
Jun 26 09:24:46 usrylxap237 openais[5792]: [TOTEM] entering GATHER state
from 4. 
Jun 26 09:24:46 usrylxap237 openais[5792]: [TOTEM] Creating commit token
because I am the rep. 
Jun 26 09:24:46 usrylxap237 openais[5792]: [TOTEM] Storing new sequence
id for ring 428 
Jun 26 09:24:46 usrylxap237 openais[5792]: [TOTEM] entering COMMIT
state. 
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] entering RECOVERY
state. 
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] position [0] member
54.3.254.237: 
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] previous ring seq
1052 rep 54.3.254.237 
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] aru 17 high delivered
17 received flag 1 
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] position [1] member
54.3.254.238: 
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] previous ring seq
1052 rep 54.3.254.237 
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] aru 17 high delivered
17 received flag 1 
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] Did not need to
originate any messages in recovery. 
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] Sending initial ORF
token 
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] CLM CONFIGURATION
CHANGE 
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] New Configuration: 
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ]      r(0)
ip(54.3.254.237) 
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ]      r(0)
ip(54.3.254.238) 
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] Members Left: 
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] Members Joined: 
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] CLM CONFIGURATION
CHANGE 
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] New Configuration: 
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ]      r(0)
ip(54.3.254.237) 
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ]      r(0)
ip(54.3.254.238) 
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] Members Left: 
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] Members Joined: 
Jun 26 09:24:54 usrylxap237 openais[5792]: [SYNC ] This node is within
the primary component and will provide service. 
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] entering OPERATIONAL
state. 
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] got nodejoin message
54.3.254.237 
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] got nodejoin message
54.3.254.238 
Jun 26 09:24:54 usrylxap237 openais[5792]: [CPG  ] got joinlist message
from node 1 
Jun 26 09:24:54 usrylxap237 openais[5792]: [CPG  ] got joinlist message
from node 2 
Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] The token was lost in
the OPERATIONAL state. 
Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] Receive multicast
socket recv buffer size (288000 bytes). 
Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] Transmit multicast
socket send buffer size (288000 bytes). 
Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] entering GATHER state
from 2. 
Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] Creating commit token
because I am the rep. 
Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] Saving state aru 17
high seq received 17 
Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] Storing new sequence
id for ring 42c 
Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] entering COMMIT
state. 
Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] Creating commit token
because I am the rep. 
Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] Storing new sequence
id for ring 430 
Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] entering COMMIT
state. 
Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] entering GATHER state
from 13. 
Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] Creating commit token
because I am the rep. 
Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] Storing new sequence
id for ring 434 
Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] entering COMMIT
state. 
Jun 26 09:25:43 usrylxap237 openais[5792]: [TOTEM] Creating commit token
because I am the rep. 
Jun 26 09:25:43 usrylxap237 openais[5792]: [TOTEM] Storing new sequence
id for ring 438 
Jun 26 09:25:43 usrylxap237 openais[5792]: [TOTEM] entering COMMIT
state. 
Jun 26 09:25:43 usrylxap237 openais[5792]: [TOTEM] entering GATHER state
from 13. 
Jun 26 09:25:43 usrylxap237 openais[5792]: [TOTEM] Creating commit token
because I am the rep. 


On the second node I can see: 

Jun 26 09:24:26 usrylxap238 openais[5725]: [TOTEM] entering GATHER state
from 12. 
Jun 26 09:24:26 usrylxap238 openais[5725]: [TOTEM] Saving state aru 17
high seq received 17 
Jun 26 09:24:26 usrylxap238 openais[5725]: [TOTEM] Storing new sequence
id for ring 420 
Jun 26 09:24:26 usrylxap238 openais[5725]: [TOTEM] entering COMMIT
state. 
Jun 26 09:24:36 usrylxap238 openais[5725]: [TOTEM] entering GATHER state
from 13. 
Jun 26 09:24:36 usrylxap238 openais[5725]: [TOTEM] Storing new sequence
id for ring 424 
Jun 26 09:24:36 usrylxap238 openais[5725]: [TOTEM] entering COMMIT
state. 
Jun 26 09:24:46 usrylxap238 openais[5725]: [TOTEM] The token was lost in
the COMMIT state. 
Jun 26 09:24:46 usrylxap238 openais[5725]: [TOTEM] entering GATHER state
from 4. 
Jun 26 09:24:46 usrylxap238 openais[5725]: [TOTEM] Storing new sequence
id for ring 428 
Jun 26 09:24:46 usrylxap238 openais[5725]: [TOTEM] entering COMMIT
state. 
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] entering RECOVERY
state. 
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] position [0] member
54.3.254.237: 
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] previous ring seq
1052 rep 54.3.254.237 
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] aru 17 high delivered
17 received flag 1 
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] position [1] member
54.3.254.238: 
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] previous ring seq
1052 rep 54.3.254.237 
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] aru 17 high delivered
17 received flag 1 
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] Did not need to
originate any messages in recovery. 
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] CLM CONFIGURATION
CHANGE 
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] New Configuration: 
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ]      r(0)
ip(54.3.254.237) 
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ]      r(0)
ip(54.3.254.238) 
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] Members Left: 
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] Members Joined: 
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] CLM CONFIGURATION
CHANGE 
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] New Configuration: 
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ]      r(0)
ip(54.3.254.237) 
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ]      r(0)
ip(54.3.254.238) 
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] Members Left: 
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] Members Joined: 
Jun 26 09:24:54 usrylxap238 openais[5725]: [SYNC ] This node is within
the primary component and will provide service. 
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] entering OPERATIONAL
state. 
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] got nodejoin message
54.3.254.237 
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] got nodejoin message
54.3.254.238 
Jun 26 09:24:54 usrylxap238 openais[5725]: [CPG  ] got joinlist message
from node 1 
Jun 26 09:24:54 usrylxap238 openais[5725]: [CPG  ] got joinlist message
from node 2 
Jun 26 09:25:23 usrylxap238 openais[5725]: [TOTEM] entering GATHER state
from 12. 
Jun 26 09:25:23 usrylxap238 openais[5725]: [TOTEM] Saving state aru 17
high seq received 17 
Jun 26 09:25:23 usrylxap238 openais[5725]: [TOTEM] Storing new sequence
id for ring 42c 
Jun 26 09:25:23 usrylxap238 openais[5725]: [TOTEM] entering COMMIT
state. 
Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] The token was lost in
the COMMIT state. 
Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] entering GATHER state
from 4. 
Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] Storing new sequence
id for ring 430 
Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] entering COMMIT
state. 
Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] entering GATHER state
from 13. 
Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] Storing new sequence
id for ring 434 
Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] entering COMMIT
state. 
Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] The token was lost in
the COMMIT state. 
Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] entering GATHER state
from 4. 
Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] Storing new sequence
id for ring 438 
Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] entering COMMIT
state. 
Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] entering GATHER state
from 13. 
Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] Storing new sequence
id for ring 43c 
Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] entering COMMIT
state. 
Jun 26 09:25:53 usrylxap238 openais[5725]: [TOTEM] The token was lost in
the COMMIT state. 
Jun 26 09:25:53 usrylxap238 openais[5725]: [TOTEM] entering GATHER state
from 4. 
Jun 26 09:25:53 usrylxap238 openais[5725]: [TOTEM] Storing new sequence
id for ring 440 
Jun 26 09:25:53 usrylxap238 openais[5725]: [TOTEM] entering COMMIT
state. 
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] entering RECOVERY
state. 
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] position [0] member
54.3.254.237: 
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] previous ring seq
1064 rep 54.3.254.237 
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] aru 17 high delivered
17 received flag 1 
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] position [1] member
54.3.254.238: 
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] previous ring seq
1064 rep 54.3.254.237 
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] aru 17 high delivered
17 received flag 1 
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] Did not need to
originate any messages in recovery. 
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] CLM CONFIGURATION
CHANGE 
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] New Configuration: 
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ]      r(0)
ip(54.3.254.237) 
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ]      r(0)
ip(54.3.254.238) 
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] Members Left: 
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] Members Joined: 
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] CLM CONFIGURATION
CHANGE 
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] New Configuration: 
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ]      r(0)
ip(54.3.254.237) 
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ]      r(0)
ip(54.3.254.238) 
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] Members Left: 
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] Members Joined: 
Jun 26 09:25:54 usrylxap238 openais[5725]: [SYNC ] This node is within
the primary component and will provide service. 
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] entering OPERATIONAL
state. 
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] got nodejoin message
54.3.254.237 
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] got nodejoin message
54.3.254.238 
Jun 26 09:25:54 usrylxap238 openais[5725]: [CPG  ] got joinlist message
from node 1 
Jun 26 09:25:54 usrylxap238 openais[5725]: [CPG  ] got joinlist message
from node 2 
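 
From what I have read, the repeated "token was lost" / GATHER / COMMIT
cycling above means the totem protocol keeps reforming its ring because the
token is not making it between the nodes in time. One commonly suggested
mitigation, if the interruptions are only brief network glitches, is to raise
the token timeout in cluster.conf, for example (value illustrative; the cman
default on RHEL 5 is 10000 ms):
 
<totem token="30000"/>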

Now my cluster is messed up, even though clustat and cman_tool show that
everything is fine: I cannot move services between the nodes (they are
running fine on the present node), and clusvcadm does not even give an
error message when I try to move them.
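 
For reference, the relocation I am attempting looks like this (clusvcadm -r
relocates a service to the member named with -m; service names are as shown
in the clustat output below):
 
clusvcadm -r http-service -m usrylxap238.merck.com
clusvcadm -r mysql -m usrylxap238.merck.com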

[root@usrylxap238 ~]# clustat 
Cluster Status for cluster1 @ Sat Jun 26 11:25:12 2010 
Member Status: Quorate 

 Member Name                             ID   Status 
 ------ ----                             ---- ------ 
 usrylxap237.merck.com                       1 Online, rgmanager 
 usrylxap238.merck.com                       2 Online, Local, rgmanager 

 Service Name                   Owner (Last)                   State 
 ------- ----                   ----- ------                   ----- 
 service:http-service           usrylxap237.merck.com          started 
 service:mysql                  usrylxap237.merck.com          started 
[root@usrylxap238 ~]# cman_tool status 
Version: 6.1.0 
Config Version: 32 
Cluster Name: cluster1 
Cluster Id: 26777 
Cluster Member: Yes 
Cluster Generation: 1276 
Membership state: Cluster-Member 
Nodes: 2 
Expected votes: 1 
Total votes: 2 
Quorum: 1 
Active subsystems: 9 
Flags: 2node Dirty 
Ports Bound: 0 11 177 
Node name: usrylxap238.merck.com 
Node ID: 2 
Multicast addresses: 239.192.104.2 
Node addresses: 54.3.254.238 
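 
The "Expected votes: 1" and the "2node" flag above come from the two-node
special case in cman, which is normally configured in cluster.conf with a
stanza like this (a typical setting, not copied from my config):
 
<cman two_node="1" expected_votes="1"/>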

I have clvmd running with locking_type = 3 and a GFS2 file system mounted
(using DLM), which is now hanging on the higher-priority node but is fine
on the lower-priority node (which, it seems, is no longer part of the
cluster).

[root@usrylxap237 ~]# service gfs2 status 
Active GFS2 mountpoints: 
/oracluster1 

[root@usrylxap238 ~]# service gfs2 status 
Configured GFS2 mountpoints: 
/oracluster1 
Active GFS2 mountpoints: 
/oracluster1 
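 
When the mount hangs like this, the state of the cluster lockspaces can be
checked with the stock RHEL 5 tools, for example (output will of course
differ per cluster):
 
cman_tool services     # fence, dlm and gfs lockspace membership/state on this node
group_tool ls          # list the fence/dlm/gfs groups and their current state
clustat                # rgmanager's view of members and services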

I am not sure why the cluster is losing membership and getting into this
stale state, or why the GFS file system becomes inaccessible. 

Thanks 
Anoop 


