[Linux-cluster] Problem in cluster with xen kernel

Nuno Fernandes npf at eurotux.com
Tue Apr 3 09:12:20 UTC 2007


Hi,

I'm running a three-node cluster on the default RHEL5 kernel and everything seems OK.

[root@xen1 ~]# clustat
Member Status: Quorate

  Member Name                        ID   Status
  ------ ----                        ---- ------
  xen1.dc.server.pt                      1 Online, Local
  xen2.dc.server.pt                      2 Online
  xen3.dc.server.pt                      3 Online

Later on, I reboot xen3 into a Dom0 kernel and get the following in the xen1 logs:

Dec 19 23:02:47 xen1 openais[2747]: [TOTEM] The token was lost in the 
OPERATIONAL state.
Dec 19 23:02:47 xen1 openais[2747]: [TOTEM] Receive multicast socket recv 
buffer size (262142 bytes).
Dec 19 23:02:47 xen1 openais[2747]: [TOTEM] Transmit multicast socket send 
buffer size (262142 bytes).
Dec 19 23:02:47 xen1 openais[2747]: [TOTEM] entering GATHER state from 2.

Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] entering GATHER state from 0.
Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] Creating commit token because I am 
the rep.
Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] Saving state aru 2f high seq 
received 2f
Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] entering COMMIT state.
Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] entering RECOVERY state.
Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] position [0] member 172.16.40.107:
Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] previous ring seq 84 rep 
172.16.40.107
Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] aru 2f high delivered 2f received 
flag 0
Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] position [1] member 172.16.40.108:
Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] previous ring seq 84 rep 
172.16.40.107
Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] aru 2f high delivered 2f received 
flag 0
Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] Did not need to originate any 
messages in recovery.
Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] Storing new sequence id for ring 
58
Dec 19 23:02:52 xen1 kernel: dlm: closing connection to node 3
Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] Sending initial ORF token
Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] New Configuration:
Dec 19 23:02:52 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
Dec 19 23:02:52 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.108)
Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] Members Left:
Dec 19 23:02:52 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.116)
Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] Members Joined:
Dec 19 23:02:52 xen1 openais[2747]: [SYNC ] This node is within the primary 
component and will provide service.
Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] New Configuration:
Dec 19 23:02:52 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
Dec 19 23:02:52 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.108)
Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] Members Left:
Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] Members Joined:
Dec 19 23:02:52 xen1 openais[2747]: [SYNC ] This node is within the primary 
component and will provide service.
Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] entering OPERATIONAL state.
Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] got nodejoin message 172.16.40.107
Dec 19 23:02:53 xen1 openais[2747]: [CLM  ] got nodejoin message 172.16.40.108
Dec 19 23:02:53 xen1 openais[2747]: [CPG  ] got joinlist message from node 2
Dec 19 23:02:53 xen1 openais[2747]: [CPG  ] got joinlist message from node 1

So far so good, xen3 is offline while it reboots...

[root@xen1 ~]# clustat
Member Status: Quorate

  Member Name                        ID   Status
  ------ ----                        ---- ------
  xen1.dc.server.pt                      1 Online, Local
  xen2.dc.server.pt                      2 Online
  xen3.dc.server.pt                      3 Offline

After it reboots, I see the node join in the xen1 server logs:

Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] entering GATHER state from 11.
Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] Creating commit token because I am 
the rep.
Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] Saving state aru 17 high seq 
received 17
Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] entering COMMIT state.
Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] entering RECOVERY state.
Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] position [0] member 172.16.40.107:
Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] previous ring seq 88 rep 
172.16.40.107
Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] aru 17 high delivered 17 received 
flag 0
Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] position [1] member 172.16.40.108:
Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] previous ring seq 88 rep 
172.16.40.107
Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] aru 17 high delivered 17 received 
flag 0
Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] position [2] member 172.16.40.116:
Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] previous ring seq 4 rep 
172.16.40.116
Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] aru 9 high delivered 9 received 
flag 0
Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] Did not need to originate any 
messages in recovery.
Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] Storing new sequence id for ring 
5c
Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] Sending initial ORF token
Dec 19 23:05:03 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
Dec 19 23:05:03 xen1 openais[2747]: [CLM  ] New Configuration:
Dec 19 23:05:03 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
Dec 19 23:05:03 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.108)
Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] Members Left:
Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] Members Joined:
Dec 19 23:05:04 xen1 openais[2747]: [SYNC ] This node is within the primary 
component and will provide service.
Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] New Configuration:
Dec 19 23:05:04 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
Dec 19 23:05:04 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.108)
Dec 19 23:05:04 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.116)
Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] Members Left:
Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] Members Joined:
Dec 19 23:05:04 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.116)
Dec 19 23:05:04 xen1 openais[2747]: [SYNC ] This node is within the primary 
component and will provide service.
Dec 19 23:05:04 xen1 openais[2747]: [TOTEM] entering OPERATIONAL state.
Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] got nodejoin message 172.16.40.107
Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] got nodejoin message 172.16.40.108
Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] got nodejoin message 172.16.40.116
Dec 19 23:05:04 xen1 openais[2747]: [CPG  ] got joinlist message from node 1
Dec 19 23:05:04 xen1 openais[2747]: [CPG  ] got joinlist message from node 2
Dec 19 23:05:12 xen1 kernel: dlm: connecting to 3
Dec 19 23:05:12 xen1 kernel: dlm: got connection from 3

clustat also reports an OK status:

[root@xen1 ~]# clustat
Member Status: Quorate

  Member Name                        ID   Status
  ------ ----                        ---- ------
  xen1.dc.server.pt                      1 Online, Local
  xen2.dc.server.pt                      2 Online
  xen3.dc.server.pt                      3 Online

Everything ok so far...

Next I reboot xen2. When xen2 leaves, xen1 complains that it can't talk to 
xen3 and fences it.

Dec 19 23:08:48 xen1 openais[2747]: [TOTEM] Retransmit List: 32
Dec 19 23:08:48 xen1 openais[2747]: [TOTEM] Retransmit List: 32
Dec 19 23:08:48 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
Dec 19 23:08:55 xen1 last message repeated 47 times
Dec 19 23:08:55 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
Dec 19 23:08:55 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
Dec 19 23:08:55 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
Dec 19 23:08:55 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
Dec 19 23:08:55 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
Dec 19 23:08:56 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
Dec 19 23:08:56 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
Dec 19 23:08:56 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
Dec 19 23:08:56 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
Dec 19 23:08:56 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
Dec 19 23:08:56 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
Dec 19 23:08:57 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
Dec 19 23:08:57 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
Dec 19 23:08:57 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
Dec 19 23:08:57 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
Dec 19 23:08:57 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
Dec 19 23:08:57 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
Dec 19 23:08:58 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
Dec 19 23:08:58 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
Dec 19 23:08:58 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
Dec 19 23:08:58 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
Dec 19 23:08:58 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
Dec 19 23:08:58 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] entering GATHER state from 11.
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] Creating commit token because I am 
the rep.
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] Saving state aru 34 high seq 
received 34
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] entering COMMIT state.
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] entering RECOVERY state.
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] position [0] member 172.16.40.107:
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] previous ring seq 92 rep 
172.16.40.107
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] aru 34 high delivered 34 received 
flag 0
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] position [1] member 172.16.40.108:
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] previous ring seq 92 rep 
172.16.40.107
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] aru 34 high delivered 34 received 
flag 0
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] Did not need to originate any 
messages in recovery.
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] Storing new sequence id for ring 
60
Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] Sending initial ORF token
Dec 19 23:08:59 xen1 kernel: dlm: closing connection to node 3


Dec 19 23:08:59 xen1 fenced[2763]: xen3.dc.aeiou.pt not a cluster member after 
0 sec post_fail_delay



Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
Dec 19 23:09:00 xen1 fenced[2763]: xen2.dc.aeiou.pt not a cluster member after 
0 sec post_fail_delay
Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] New Configuration:
Dec 19 23:09:00 xen1 fenced[2763]: fencing node "xen3.dc.aeiou.pt"
Dec 19 23:09:00 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
Dec 19 23:09:00 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.108)
Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] Members Left:
Dec 19 23:09:00 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.116)
Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] Members Joined:
Dec 19 23:09:00 xen1 openais[2747]: [SYNC ] This node is within the primary 
component and will provide service.
Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] New Configuration:
Dec 19 23:09:00 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
Dec 19 23:09:00 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.108)
Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] Members Left:
Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] Members Joined:
Dec 19 23:09:00 xen1 openais[2747]: [SYNC ] This node is within the primary 
component and will provide service.
Dec 19 23:09:00 xen1 openais[2747]: [TOTEM] entering OPERATIONAL state.
Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] got nodejoin message 172.16.40.107
Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] got nodejoin message 172.16.40.108
Dec 19 23:09:00 xen1 openais[2747]: [CPG  ] got joinlist message from node 2
Dec 19 23:09:00 xen1 openais[2747]: [CPG  ] got joinlist message from node 1
Dec 19 23:09:05 xen1 openais[2747]: [TOTEM] entering GATHER state from 11.
Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] entering GATHER state from 0.
Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] Creating commit token because I am 
the rep.
Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] Saving state aru 1a high seq 
received 1a
Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] entering COMMIT state.
Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] entering RECOVERY state.
Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] position [0] member 172.16.40.107:
Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] previous ring seq 96 rep 
172.16.40.107
Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] aru 1a high delivered 1a received 
flag 0
Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] position [1] member 172.16.40.116:
Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] previous ring seq 92 rep 
172.16.40.107
Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] aru 31 high delivered 31 received 
flag 0
Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] Did not need to originate any 
messages in recovery.
Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] Storing new sequence id for ring 
64
Dec 19 23:09:09 xen1 kernel: dlm: closing connection to node 2
Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] Sending initial ORF token
Dec 19 23:09:09 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] New Configuration:
Dec 19 23:09:10 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] Members Left:
Dec 19 23:09:10 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.108)
Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] Members Joined:
Dec 19 23:09:10 xen1 openais[2747]: [CMAN ] quorum lost, blocking activity
Dec 19 23:09:10 xen1 openais[2747]: [SYNC ] This node is within the primary 
component and will provide service.
Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] New Configuration:
Dec 19 23:09:10 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
Dec 19 23:09:10 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.116)
Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] Members Left:
Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] Members Joined:
Dec 19 23:09:10 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.116)
Dec 19 23:09:10 xen1 openais[2747]: [SYNC ] This node is within the primary 
component and will provide service.
Dec 19 23:09:10 xen1 openais[2747]: [TOTEM] entering OPERATIONAL state.
Dec 19 23:09:10 xen1 openais[2747]: [MAIN ] Node xen3.dc.aeiou.pt not joined 
to cman because it has rejoined an inquorate cluster
Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] got nodejoin message 172.16.40.107
Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] got nodejoin message 172.16.40.116
Dec 19 23:09:10 xen1 openais[2747]: [CPG  ] got joinlist message from node 3
Dec 19 23:09:10 xen1 openais[2747]: [CPG  ] got joinlist message from node 1
Dec 19 23:09:14 xen1 ccsd[2740]: Cluster is not quorate.  Refusing connection.
Dec 19 23:09:14 xen1 ccsd[2740]: Error while processing connect: Connection 
refused
Dec 19 23:09:19 xen1 ccsd[2740]: Cluster is not quorate.  Refusing connection.
Dec 19 23:09:19 xen1 ccsd[2740]: Error while processing connect: Connection 
refused
Dec 19 23:09:24 xen1 ccsd[2740]: Cluster is not quorate.  Refusing connection.
Dec 19 23:09:24 xen1 ccsd[2740]: Error while processing connect: Connection 
refused
Dec 19 23:09:29 xen1 ccsd[2740]: Cluster is not quorate.  Refusing connection.
Dec 19 23:09:29 xen1 ccsd[2740]: Error while processing connect: Connection 
refused
Dec 19 23:09:34 xen1 ccsd[2740]: Cluster is not quorate.  Refusing connection.
Dec 19 23:09:34 xen1 ccsd[2740]: Error while processing connect: Connection 
refused
Dec 19 23:09:36 xen1 openais[2747]: [TOTEM] The token was lost in the 
OPERATIONAL state.
Dec 19 23:09:36 xen1 openais[2747]: [TOTEM] Receive multicast socket recv 
buffer size (262142 bytes).
Dec 19 23:09:36 xen1 openais[2747]: [TOTEM] Transmit multicast socket send 
buffer size (262142 bytes).
Dec 19 23:09:36 xen1 openais[2747]: [TOTEM] entering GATHER state from 2.
Dec 19 23:09:39 xen1 ccsd[2740]: Cluster is not quorate.  Refusing connection.
Dec 19 23:09:39 xen1 ccsd[2740]: Error while processing connect: Connection 
refused
Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] entering GATHER state from 0.
Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] Creating commit token because I am 
the rep.
Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] Saving state aru 18 high seq 
received 18
Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] entering COMMIT state.
Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] entering RECOVERY state.
Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] position [0] member 172.16.40.107:
Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] previous ring seq 100 rep 
172.16.40.107
Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] aru 18 high delivered 18 received 
flag 0
Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] Did not need to originate any 
messages in recovery.
Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] Storing new sequence id for ring 
68
Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] Sending initial ORF token
Dec 19 23:09:40 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
Dec 19 23:09:40 xen1 openais[2747]: [CLM  ] New Configuration:
Dec 19 23:09:41 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
Dec 19 23:09:41 xen1 openais[2747]: [CLM  ] Members Left:
Dec 19 23:09:41 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.116)
Dec 19 23:09:41 xen1 openais[2747]: [CLM  ] Members Joined:
Dec 19 23:09:41 xen1 openais[2747]: [SYNC ] This node is within the primary 
component and will provide service.
Dec 19 23:09:41 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
Dec 19 23:09:41 xen1 openais[2747]: [CLM  ] New Configuration:
Dec 19 23:09:41 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
Dec 19 23:09:41 xen1 openais[2747]: [CLM  ] Members Left:
Dec 19 23:09:41 xen1 openais[2747]: [CLM  ] Members Joined:
Dec 19 23:09:41 xen1 openais[2747]: [SYNC ] This node is within the primary 
component and will provide service.
Dec 19 23:09:41 xen1 openais[2747]: [TOTEM] entering OPERATIONAL state.
Dec 19 23:09:41 xen1 openais[2747]: [CLM  ] got nodejoin message 172.16.40.107
Dec 19 23:09:41 xen1 openais[2747]: [CPG  ] got joinlist message from node 1
Dec 19 23:09:44 xen1 ccsd[2740]: Cluster is not quorate.  Refusing connection.
Dec 19 23:09:44 xen1 ccsd[2740]: Error while processing connect: Connection 
refused
Dec 19 23:09:49 xen1 ccsd[2740]: Cluster is not quorate.  Refusing connection.
Dec 19 23:09:49 xen1 ccsd[2740]: Error while processing connect: Connection 
refused
Dec 19 23:09:54 xen1 ccsd[2740]: Cluster is not quorate.  Refusing connection.
Dec 19 23:09:54 xen1 ccsd[2740]: Error while processing connect: Connection 
refused
Dec 19 23:09:59 xen1 ccsd[2740]: Cluster is not quorate.  Refusing connection.
Dec 19 23:09:59 xen1 ccsd[2740]: Error while processing connect: Connection 
refused
Dec 19 23:10:04 xen1 ccsd[2740]: Cluster is not quorate.  Refusing connection.
Dec 19 23:10:04 xen1 ccsd[2740]: Error while processing connect: Connection 
refused
Dec 19 23:10:09 xen1 ccsd[2740]: Cluster is not quorate.  Refusing connection.
Dec 19 23:10:09 xen1 ccsd[2740]: Error while processing connect: Connection 
refused
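
To make the membership changes in the log above easier to follow, here is a minimal sketch that pulls the "Members Left" / "Members Joined" entries out of the openais CLM lines. The function name and regular expressions are illustrative assumptions based only on the line format pasted in this message; they are not part of any cluster tool.

```python
import re

# Matches one syslog line from the openais CLM service, e.g.
# "Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] Members Left:"
CLM_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+openais\[\d+\]:\s+"
    r"\[CLM\s*\]\s+(?P<msg>.*)$"
)
# Matches a member entry such as "r(0) ip(172.16.40.116)".
MEMBER = re.compile(r"r\(\d+\)\s+ip\((?P<ip>[\d.]+)\)")

SECTIONS = {"Members Left:": "left", "Members Joined:": "joined"}


def membership_events(lines):
    """Yield (timestamp, 'left' | 'joined', ip) for each CLM membership change."""
    section = None
    for line in lines:
        match = CLM_LINE.match(line.strip())
        if match is None:
            continue  # not a CLM line (TOTEM, SYNC, fenced, kernel, ...)
        msg = match.group("msg").strip()
        if msg.endswith(":"):
            # "New Configuration:" maps to None, so its members are ignored.
            section = SECTIONS.get(msg)
            continue
        member = MEMBER.search(msg)
        if member is None:
            section = None  # e.g. "CLM CONFIGURATION CHANGE"
            continue
        if section is not None:
            yield match.group("ts"), section, member.group("ip")
```

Fed the 23:08:59 to 23:09:41 excerpt, this should report 172.16.40.116 leaving at 23:09:00, then 172.16.40.108 leaving and 172.16.40.116 joining at 23:09:10, and 172.16.40.116 leaving again at 23:09:41.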

The last errors are expected because the cluster isn't quorate anymore. xen2 was 
rebooting and xen3 was fenced, so xen1 was left alone in an inquorate 
cluster...
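
The quorum arithmetic behind that can be sketched as follows. This assumes one vote per node and a simple-majority rule; the cluster.conf is not shown in this message, so both are assumptions, and the function names are made up for illustration.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the votes currently present reach the majority threshold."""
    return votes_present >= quorum_threshold(expected_votes)


# Three nodes with one vote each (assumed): two votes are needed.
print(quorum_threshold(3))   # 2
print(is_quorate(2, 3))      # True:  one node rebooting is tolerated
print(is_quorate(1, 3))      # False: xen1 alone cannot keep quorum
```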

The unusual thing is that this only happens when one of the nodes is running the 
RHEL5 Xen kernel. Could it be related to the bridge-utils bug and multicast? The 
same problem happens if I boot xen1 or xen2 into the Xen kernel instead.
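
One way to test the multicast suspicion independently of the cluster stack is a small send/receive probe between two nodes. The sketch below is only that: the group address and port are placeholders chosen for illustration (not the values the cluster uses), and the helper names are hypothetical.

```python
import socket

# Placeholder values for illustration only; they are NOT taken from the
# cluster configuration. Pick a group and port that are unused on your network.
GROUP = "239.192.0.1"
PORT = 50000


def membership_request(group: str, local_ip: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq: multicast group address followed by local interface."""
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def open_receiver(group=GROUP, port=PORT, local_ip="0.0.0.0", timeout=5.0):
    """Return a UDP socket joined to the multicast group; recvfrom() times out."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, local_ip))
    sock.settimeout(timeout)
    return sock


def send_probe(payload: bytes, group=GROUP, port=PORT, ttl=1):
    """Send a single multicast datagram with the given TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    try:
        sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

For example, call open_receiver() on one node and wait in recvfrom(1024), then call send_probe(b"ping") on another. If the probe arrives between two nodes on the default kernel but times out when one of them runs the Xen kernel, that would point towards multicast traffic not crossing the Xen bridge.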


Any ideas?

Thanks
Nuno Fernandes



-- 
Nuno Pais Fernandes
Cisco Certified Network Associate
Oracle Certified Professional
Eurotux Informatica S.A.
Tel: +351 253257395
Fax: +351 253257396