[Linux-cluster] Split Brain
Luis Godoy Gonzalez
lgodoy at atichile.com
Thu Jan 31 18:13:17 UTC 2008
Hi,

I have a problem with my current cluster. We have a two-node cluster
(DL385 G2, no external storage) running Red Hat 4 Update 5 and Cluster
Suite 4 Update 5.

When the nodes lose communication, we end up with two cluster instances
with the service up on both :( .. too bad.

I don't understand why one node doesn't try to fence the other before
forming the cluster.
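For reference, a two-node CMAN cluster is normally configured in
two_node mode, where either node alone keeps quorum; that is why each
side can carry on and start the service on its own, and why a
successful fence is the only thing that prevents split brain. A sketch
of the relevant cluster.conf fragment (illustrative, not taken from my
actual config):

==================================================
<!-- /etc/cluster/cluster.conf (excerpt, illustrative) -->
<!-- two_node="1" lets a single node stay quorate with one vote,
     so only a completed fence stops both halves from running
     the service at the same time -->
<cman two_node="1" expected_votes="1"/>
==================================================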
These are the logs:
==================================================
Jan 20 22:17:42 node1 kernel: bonding: bond0: link status definitely up
for interface eth0.
Jan 20 22:17:48 node1 clurgmgrd: [4081]: <info> Executing
/home/app/myservice.sh status
Jan 20 22:17:48 node1 su(pam_unix)[11307]: session opened for user
app_usr by (uid=0)
Jan 20 22:17:48 node1 su(pam_unix)[11307]: session closed for user app_usr
Jan 20 22:18:18 node1 clurgmgrd: [4081]: <info> Executing
/home/app/myservice.sh status
Jan 20 22:18:18 node1 su(pam_unix)[11533]: session opened for user
app_usr by (uid=0)
Jan 20 22:18:18 node1 su(pam_unix)[11533]: session closed for user app_usr
Jan 20 22:18:33 node1 kernel: e1000: eth2: e1000_watchdog_task: NIC Link
is Down
Jan 20 22:18:33 node1 kernel: bonding: bond0: link status definitely
down for interface eth2, disabling it
Jan 20 22:18:33 node1 kernel: bonding: bond0: making interface eth0 the
new active one.
Jan 20 22:18:37 node1 kernel: e1000: eth2: e1000_watchdog_task: NIC Link
is Up 1000 Mbps Full Duplex
Jan 20 22:18:37 node1 kernel: bonding: bond0: link status definitely up
for interface eth2.
Jan 20 22:18:43 node1 kernel: bnx2: eth0 NIC Link is Down
Jan 20 22:18:43 node1 kernel: bonding: bond0: link status definitely
down for interface eth0, disabling it
Jan 20 22:18:43 node1 kernel: bonding: bond0: making interface eth2 the
new active one.
Jan 20 22:18:46 node1 kernel: bnx2: eth0 NIC Link is Up, 1000 Mbps full
duplex
Jan 20 22:18:46 node1 kernel: bonding: bond0: link status definitely up
for interface eth0.
Jan 20 22:19:03 node1 kernel: CMAN: removing node node2 from the cluster
: Missed too many heartbeats
Jan 20 22:19:05 node1 clurgmgrd[4081]: <info> Magma Event: Membership Change
Jan 20 22:19:05 node1 clurgmgrd[4081]: <info> State change: node2 DOWN
Jan 20 22:19:06 node1 clurgmgrd: [4081]: <info> Executing
/home/app/myservice.sh status
Jan 20 22:19:06 node1 su(pam_unix)[11780]: session opened for user
app_usr by (uid=0)
Jan 20 22:19:06 node1 su(pam_unix)[11780]: session closed for user app_usr
Jan 20 22:19:22 node1 kernel: bnx2: eth0 NIC Link is Down
Jan 20 22:19:22 node1 kernel: bonding: bond0: link status definitely
down for interface eth0, disabling it
Jan 20 22:19:25 node1 kernel: bnx2: eth0 NIC Link is Up, 1000 Mbps full
duplex
Jan 20 22:19:25 node1 kernel: bonding: bond0: link status definitely up
for interface eth0.
Jan 20 22:19:40 node1 clurgmgrd: [4081]: <info> Executing
/home/app/myservice.sh status
Jan 20 22:19:40 node1 su(pam_unix)[12037]: session opened for user
app_usr by (uid=0)
Jan 20 22:19:40 node1 su(pam_unix)[12037]: session closed for user app_usr
Jan 20 22:20:10 node1 clurgmgrd: [4081]: <info> Executing
/home/app/myservice.sh status
Jan 20 22:20:10 node1 su(pam_unix)[12236]: session opened for user
app_usr by (uid=0)
Jan 20 22:20:10 node1 su(pam_unix)[12236]: session closed for user app_usr
Jan 20 22:20:40 node1 clurgmgrd: [4081]: <info> Executing
/home/app/myservice.sh status
Jan 20 22:20:40 node1 su(pam_unix)[12461]: session opened for user
app_usr by (uid=0)
Jan 20 22:20:40 node1 su(pam_unix)[12461]: session closed for user app_usr
=====================================================================
Node 2
=====================================================================
Jan 20 22:10:22 node2 sshd(pam_unix)[22703]: session opened for user
app_usr by (uid=0)
Jan 20 22:10:22 node2 sshd(pam_unix)[22703]: session closed for user app_usr
Jan 20 22:10:24 node2 sshd(pam_unix)[22741]: session opened for user
app_usr by (uid=0)
Jan 20 22:10:24 node2 sshd(pam_unix)[22741]: session closed for user app_usr
Jan 20 22:20:07 node2 sshd(pam_unix)[23541]: session opened for user
app_usr by (uid=0)
Jan 20 22:20:07 node2 sshd(pam_unix)[23541]: session closed for user app_usr
Jan 20 22:20:09 node2 sshd(pam_unix)[23578]: session opened for user
app_usr by (uid=0)
Jan 20 22:20:09 node2 sshd(pam_unix)[23578]: session closed for user app_usr
Jan 20 22:21:38 node2 kernel: CMAN: removing node node1 from the cluster
: Missed too many heartbeats
Jan 20 22:21:40 node2 clurgmgrd[4177]: <info> Magma Event: Membership Change
Jan 20 22:21:40 node2 clurgmgrd[4177]: <info> State change: node1 DOWN
Jan 20 22:21:41 node2 clurgmgrd[4177]: <notice> Taking over service
myservice from down member (null)
Jan 20 22:21:41 node2 clurgmgrd: [4177]: <info> Adding IPv4 address
10.10.65.1 to bond0
Jan 20 22:21:42 node2 clurgmgrd: [4177]: <info> Adding IPv4 address
10.10.65.10 to bond0
Jan 20 22:21:43 node2 clurgmgrd: [4177]: <info> Executing
/home/app/myservice.sh start
Jan 20 22:21:43 node2 su(pam_unix)[23855]: session opened for user
app_usr by (uid=0)
Jan 20 22:21:43 node2 su(pam_unix)[23855]: session closed for user app_usr
Jan 20 22:21:43 node2 clurgmgrd: [4177]: <info> Adding IPv4 address
10.10.70.20 to bond1
Jan 20 22:21:44 node2 clurgmgrd[4177]: <notice> Service myservice started
Jan 20 22:21:50 node2 clurgmgrd: [4177]: <info> Executing
/home/app/myservice.sh status
Jan 20 22:21:50 node2 su(pam_unix)[24022]: session opened for user
app_usr by (uid=0)
Jan 20 22:21:50 node2 su(pam_unix)[24022]: session closed for user app_usr
Jan 20 22:22:20 node2 clurgmgrd: [4177]: <info> Executing
/home/app/myservice.sh status
Jan 20 22:22:20 node2 su(pam_unix)[24244]: session opened for user
app_usr by (uid=0)
Jan 20 22:22:20 node2 su(pam_unix)[24244]: session closed for user app_usr
Jan 20 22:22:50 node2 clurgmgrd: [4177]: <info> Executing
/home/app/myservice.sh status
Jan 20 22:22:50 node2 su(pam_unix)[24469]: session opened for user
app_usr by (uid=0)
=================================================================
I have configured the fence devices, and the power-off works fine ....
when I power up the machines, the first one to start up fences the
other and startup then continues OK.
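In case it helps, this is roughly how the fence section looks. The
device here is iLO (which the DL385 has on board); the device names,
IP addresses, login, and password below are made up for illustration,
not my real values:

==================================================
<!-- /etc/cluster/cluster.conf (excerpt, illustrative values) -->
<clusternodes>
  <clusternode name="node1">
    <fence>
      <method name="1">
        <device name="node1-ilo"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="node2">
    <fence>
      <method name="1">
        <device name="node2-ilo"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <fencedevice agent="fence_ilo" name="node1-ilo"
               hostname="10.10.65.101" login="fence" passwd="secret"/>
  <fencedevice agent="fence_ilo" name="node2-ilo"
               hostname="10.10.65.102" login="fence" passwd="secret"/>
</fencedevices>
==================================================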
Any help will be appreciated ..
Sorry for my bad English.
Luis G.