[Linux-cluster] Problem in cluster with xen kernel

Nuno Fernandes npf at eurotux.com
Tue Apr 3 10:12:30 UTC 2007


Hi,

Just for your information, we've solved it. The problem was in the xen bridge
scripts, which restarted the network interfaces while the cluster was active.

Changing the following line in /etc/xen/xend-config.sxp from

(network-script network-bridge)

to

(network-script /bin/true)

and creating the bridge in the /etc/sysconfig/network-scripts/ifcfg-* files
instead solved it.
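
For anyone hitting the same thing, here is a rough sketch of what such ifcfg
files can look like. The device names (eth0, xenbr0) and the netmask are
assumptions for illustration and are not taken from the original setup; the
address is xen1's as it appears in the logs. Adjust all of them per host.

```shell
# /etc/xen/xend-config.sxp no longer lets xend rebuild the network:
#   (network-script /bin/true)

# --- /etc/sysconfig/network-scripts/ifcfg-xenbr0 ---
# The bridge itself (the name xenbr0 is an assumption).
DEVICE=xenbr0
TYPE=Bridge
BOOTPROTO=static
# xen1's address as seen in the logs; use each host's own address.
IPADDR=172.16.40.107
# Assumption: a /24 network.
NETMASK=255.255.255.0
ONBOOT=yes
# No forwarding delay on the bridge.
DELAY=0

# --- /etc/sysconfig/network-scripts/ifcfg-eth0 ---
# The physical NIC (the name eth0 is an assumption). It carries no IP
# address itself and is only attached to the bridge.
DEVICE=eth0
ONBOOT=yes
BRIDGE=xenbr0
```

With the bridge defined this way it should come up with the regular network
service at boot, so xend has no reason to reconfigure interfaces underneath
openais later on.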

Thanks
Nuno Fernandes

On Tuesday 03 April 2007 10:12:20 Nuno Fernandes wrote:
> Hi,
>
> I'm using the default rhel5 kernel and everything seems ok.
>
> [root@xen1 ~]# clustat
> Member Status: Quorate
>
>   Member Name                        ID   Status
>   ------ ----                        ---- ------
>   xen1.dc.server.pt                      1 Online, Local
>   xen2.dc.server.pt                      2 Online
>   xen3.dc.server.pt                      3 Online
>
> Later on, I reboot xen3 into a Dom0 kernel and get this in the xen1 logs:
>
> Dec 19 23:02:47 xen1 openais[2747]: [TOTEM] The token was lost in the
> OPERATIONAL state.
> Dec 19 23:02:47 xen1 openais[2747]: [TOTEM] Receive multicast socket recv
> buffer size (262142 bytes).
> Dec 19 23:02:47 xen1 openais[2747]: [TOTEM] Transmit multicast socket send
> buffer size (262142 bytes).
> Dec 19 23:02:47 xen1 openais[2747]: [TOTEM] entering GATHER state from 2.
>
> Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] entering GATHER state from 0.
> Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] Creating commit token because I
> am the rep.
> Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] Saving state aru 2f high seq
> received 2f
> Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] entering COMMIT state.
> Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] entering RECOVERY state.
> Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] position [0] member
> 172.16.40.107:
> Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] previous ring seq 84 rep
> 172.16.40.107
> Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] aru 2f high delivered 2f
> received flag 0
> Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] position [1] member
> 172.16.40.108:
> Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] previous ring seq 84 rep
> 172.16.40.107
> Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] aru 2f high delivered 2f
> received flag 0
> Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] Did not need to originate any
> messages in recovery.
> Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] Storing new sequence id for
> ring 58
> Dec 19 23:02:52 xen1 kernel: dlm: closing connection to node 3
> Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] Sending initial ORF token
> Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
> Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] New Configuration:
> Dec 19 23:02:52 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
> Dec 19 23:02:52 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.108)
> Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] Members Left:
> Dec 19 23:02:52 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.116)
> Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] Members Joined:
> Dec 19 23:02:52 xen1 openais[2747]: [SYNC ] This node is within the primary
> component and will provide service.
> Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
> Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] New Configuration:
> Dec 19 23:02:52 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
> Dec 19 23:02:52 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.108)
> Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] Members Left:
> Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] Members Joined:
> Dec 19 23:02:52 xen1 openais[2747]: [SYNC ] This node is within the primary
> component and will provide service.
> Dec 19 23:02:52 xen1 openais[2747]: [TOTEM] entering OPERATIONAL state.
> Dec 19 23:02:52 xen1 openais[2747]: [CLM  ] got nodejoin message
> 172.16.40.107
> Dec 19 23:02:53 xen1 openais[2747]: [CLM  ] got nodejoin message
> 172.16.40.108
> Dec 19 23:02:53 xen1 openais[2747]: [CPG  ] got joinlist message from node 2
> Dec 19 23:02:53 xen1 openais[2747]: [CPG  ] got joinlist message from node 1
>
> So far so good, xen3 is offline while it reboots...
>
> [root@xen1 ~]# clustat
> Member Status: Quorate
>
>   Member Name                        ID   Status
>   ------ ----                        ---- ------
>   xen1.dc.server.pt                      1 Online, Local
>   xen2.dc.server.pt                      2 Online
>   xen3.dc.server.pt                      3 Offline
>
> After it reboots, I see the node join in the xen1 server logs:
>
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] entering GATHER state from 11.
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] Creating commit token because I
> am the rep.
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] Saving state aru 17 high seq
> received 17
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] entering COMMIT state.
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] entering RECOVERY state.
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] position [0] member
> 172.16.40.107:
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] previous ring seq 88 rep
> 172.16.40.107
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] aru 17 high delivered 17
> received flag 0
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] position [1] member
> 172.16.40.108:
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] previous ring seq 88 rep
> 172.16.40.107
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] aru 17 high delivered 17
> received flag 0
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] position [2] member
> 172.16.40.116:
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] previous ring seq 4 rep
> 172.16.40.116
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] aru 9 high delivered 9 received
> flag 0
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] Did not need to originate any
> messages in recovery.
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] Storing new sequence id for
> ring 5c
> Dec 19 23:05:03 xen1 openais[2747]: [TOTEM] Sending initial ORF token
> Dec 19 23:05:03 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
> Dec 19 23:05:03 xen1 openais[2747]: [CLM  ] New Configuration:
> Dec 19 23:05:03 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
> Dec 19 23:05:03 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.108)
> Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] Members Left:
> Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] Members Joined:
> Dec 19 23:05:04 xen1 openais[2747]: [SYNC ] This node is within the primary
> component and will provide service.
> Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
> Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] New Configuration:
> Dec 19 23:05:04 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
> Dec 19 23:05:04 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.108)
> Dec 19 23:05:04 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.116)
> Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] Members Left:
> Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] Members Joined:
> Dec 19 23:05:04 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.116)
> Dec 19 23:05:04 xen1 openais[2747]: [SYNC ] This node is within the primary
> component and will provide service.
> Dec 19 23:05:04 xen1 openais[2747]: [TOTEM] entering OPERATIONAL state.
> Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] got nodejoin message
> 172.16.40.107
> Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] got nodejoin message
> 172.16.40.108
> Dec 19 23:05:04 xen1 openais[2747]: [CLM  ] got nodejoin message
> 172.16.40.116
> Dec 19 23:05:04 xen1 openais[2747]: [CPG  ] got joinlist message from node 1
> Dec 19 23:05:04 xen1 openais[2747]: [CPG  ] got joinlist message from node 2
> Dec 19 23:05:12 xen1 kernel: dlm: connecting to 3
> Dec 19 23:05:12 xen1 kernel: dlm: got connection from 3
>
> clustat also reports an ok status:
>
> [root@xen1 ~]# clustat
> Member Status: Quorate
>
>   Member Name                        ID   Status
>   ------ ----                        ---- ------
>   xen1.dc.server.pt                      1 Online, Local
>   xen2.dc.server.pt                      2 Online
>   xen3.dc.server.pt                      3 Online
>
> Everything ok so far...
>
> Next I reboot xen2. When xen2 leaves, xen1 complains that it cannot speak
> with xen3 and fences it.
>
> Dec 19 23:08:48 xen1 openais[2747]: [TOTEM] Retransmit List: 32
> Dec 19 23:08:48 xen1 openais[2747]: [TOTEM] Retransmit List: 32
> Dec 19 23:08:48 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
> Dec 19 23:08:55 xen1 last message repeated 47 times
> Dec 19 23:08:55 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
> Dec 19 23:08:55 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
> Dec 19 23:08:55 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
> Dec 19 23:08:55 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
> Dec 19 23:08:55 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
> Dec 19 23:08:56 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
> Dec 19 23:08:56 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
> Dec 19 23:08:56 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
> Dec 19 23:08:56 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
> Dec 19 23:08:56 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
> Dec 19 23:08:56 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
> Dec 19 23:08:57 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
> Dec 19 23:08:57 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
> Dec 19 23:08:57 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
> Dec 19 23:08:57 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
> Dec 19 23:08:57 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
> Dec 19 23:08:57 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
> Dec 19 23:08:58 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
> Dec 19 23:08:58 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
> Dec 19 23:08:58 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
> Dec 19 23:08:58 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
> Dec 19 23:08:58 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
> Dec 19 23:08:58 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] Retransmit List: 32 33 34
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] FAILED TO RECEIVE
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] entering GATHER state from 6.
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] entering GATHER state from 11.
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] Creating commit token because I
> am the rep.
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] Saving state aru 34 high seq
> received 34
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] entering COMMIT state.
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] entering RECOVERY state.
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] position [0] member
> 172.16.40.107:
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] previous ring seq 92 rep
> 172.16.40.107
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] aru 34 high delivered 34
> received flag 0
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] position [1] member
> 172.16.40.108:
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] previous ring seq 92 rep
> 172.16.40.107
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] aru 34 high delivered 34
> received flag 0
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] Did not need to originate any
> messages in recovery.
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] Storing new sequence id for
> ring 60
> Dec 19 23:08:59 xen1 openais[2747]: [TOTEM] Sending initial ORF token
> Dec 19 23:08:59 xen1 kernel: dlm: closing connection to node 3
>
>
> Dec 19 23:08:59 xen1 fenced[2763]: xen3.dc.aeiou.pt not a cluster member
> after 0 sec post_fail_delay
>
>
>
> Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
> Dec 19 23:09:00 xen1 fenced[2763]: xen2.dc.aeiou.pt not a cluster member
> after 0 sec post_fail_delay
> Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] New Configuration:
> Dec 19 23:09:00 xen1 fenced[2763]: fencing node "xen3.dc.aeiou.pt"
> Dec 19 23:09:00 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
> Dec 19 23:09:00 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.108)
> Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] Members Left:
> Dec 19 23:09:00 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.116)
> Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] Members Joined:
> Dec 19 23:09:00 xen1 openais[2747]: [SYNC ] This node is within the primary
> component and will provide service.
> Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
> Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] New Configuration:
> Dec 19 23:09:00 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
> Dec 19 23:09:00 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.108)
> Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] Members Left:
> Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] Members Joined:
> Dec 19 23:09:00 xen1 openais[2747]: [SYNC ] This node is within the primary
> component and will provide service.
> Dec 19 23:09:00 xen1 openais[2747]: [TOTEM] entering OPERATIONAL state.
> Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] got nodejoin message
> 172.16.40.107
> Dec 19 23:09:00 xen1 openais[2747]: [CLM  ] got nodejoin message
> 172.16.40.108
> Dec 19 23:09:00 xen1 openais[2747]: [CPG  ] got joinlist message from node 2
> Dec 19 23:09:00 xen1 openais[2747]: [CPG  ] got joinlist message from node 1
> Dec 19 23:09:05 xen1 openais[2747]: [TOTEM] entering GATHER state from 11.
> Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] entering GATHER state from 0.
> Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] Creating commit token because I
> am the rep.
> Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] Saving state aru 1a high seq
> received 1a
> Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] entering COMMIT state.
> Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] entering RECOVERY state.
> Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] position [0] member
> 172.16.40.107:
> Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] previous ring seq 96 rep
> 172.16.40.107
> Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] aru 1a high delivered 1a
> received flag 0
> Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] position [1] member
> 172.16.40.116:
> Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] previous ring seq 92 rep
> 172.16.40.107
> Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] aru 31 high delivered 31
> received flag 0
> Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] Did not need to originate any
> messages in recovery.
> Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] Storing new sequence id for
> ring 64
> Dec 19 23:09:09 xen1 kernel: dlm: closing connection to node 2
> Dec 19 23:09:09 xen1 openais[2747]: [TOTEM] Sending initial ORF token
> Dec 19 23:09:09 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
> Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] New Configuration:
> Dec 19 23:09:10 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
> Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] Members Left:
> Dec 19 23:09:10 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.108)
> Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] Members Joined:
> Dec 19 23:09:10 xen1 openais[2747]: [CMAN ] quorum lost, blocking activity
> Dec 19 23:09:10 xen1 openais[2747]: [SYNC ] This node is within the primary
> component and will provide service.
> Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
> Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] New Configuration:
> Dec 19 23:09:10 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
> Dec 19 23:09:10 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.116)
> Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] Members Left:
> Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] Members Joined:
> Dec 19 23:09:10 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.116)
> Dec 19 23:09:10 xen1 openais[2747]: [SYNC ] This node is within the primary
> component and will provide service.
> Dec 19 23:09:10 xen1 openais[2747]: [TOTEM] entering OPERATIONAL state.
> Dec 19 23:09:10 xen1 openais[2747]: [MAIN ] Node xen3.dc.aeiou.pt not
> joined to cman because it has rejoined an inquorate cluster
> Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] got nodejoin message
> 172.16.40.107
> Dec 19 23:09:10 xen1 openais[2747]: [CLM  ] got nodejoin message
> 172.16.40.116
> Dec 19 23:09:10 xen1 openais[2747]: [CPG  ] got joinlist message from node 3
> Dec 19 23:09:10 xen1 openais[2747]: [CPG  ] got joinlist message from node 1
> Dec 19 23:09:14 xen1 ccsd[2740]: Cluster is not quorate.  Refusing
> connection.
> Dec 19 23:09:14 xen1 ccsd[2740]: Error while processing connect:
> Connection refused
> Dec 19 23:09:19 xen1 ccsd[2740]: Cluster is not quorate.  Refusing
> connection.
> Dec 19 23:09:19 xen1 ccsd[2740]: Error while processing connect:
> Connection refused
> Dec 19 23:09:24 xen1 ccsd[2740]: Cluster is not quorate.  Refusing
> connection.
> Dec 19 23:09:24 xen1 ccsd[2740]: Error while processing connect:
> Connection refused
> Dec 19 23:09:29 xen1 ccsd[2740]: Cluster is not quorate.  Refusing
> connection.
> Dec 19 23:09:29 xen1 ccsd[2740]: Error while processing connect:
> Connection refused
> Dec 19 23:09:34 xen1 ccsd[2740]: Cluster is not quorate.  Refusing
> connection.
> Dec 19 23:09:34 xen1 ccsd[2740]: Error while processing connect:
> Connection refused
> Dec 19 23:09:36 xen1 openais[2747]: [TOTEM] The token was lost in the
> OPERATIONAL state.
> Dec 19 23:09:36 xen1 openais[2747]: [TOTEM] Receive multicast socket recv
> buffer size (262142 bytes).
> Dec 19 23:09:36 xen1 openais[2747]: [TOTEM] Transmit multicast socket send
> buffer size (262142 bytes).
> Dec 19 23:09:36 xen1 openais[2747]: [TOTEM] entering GATHER state from 2.
> Dec 19 23:09:39 xen1 ccsd[2740]: Cluster is not quorate.  Refusing
> connection.
> Dec 19 23:09:39 xen1 ccsd[2740]: Error while processing connect:
> Connection refused
> Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] entering GATHER state from 0.
> Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] Creating commit token because I
> am the rep.
> Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] Saving state aru 18 high seq
> received 18
> Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] entering COMMIT state.
> Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] entering RECOVERY state.
> Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] position [0] member
> 172.16.40.107:
> Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] previous ring seq 100 rep
> 172.16.40.107
> Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] aru 18 high delivered 18
> received flag 0
> Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] Did not need to originate any
> messages in recovery.
> Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] Storing new sequence id for
> ring 68
> Dec 19 23:09:40 xen1 openais[2747]: [TOTEM] Sending initial ORF token
> Dec 19 23:09:40 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
> Dec 19 23:09:40 xen1 openais[2747]: [CLM  ] New Configuration:
> Dec 19 23:09:41 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
> Dec 19 23:09:41 xen1 openais[2747]: [CLM  ] Members Left:
> Dec 19 23:09:41 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.116)
> Dec 19 23:09:41 xen1 openais[2747]: [CLM  ] Members Joined:
> Dec 19 23:09:41 xen1 openais[2747]: [SYNC ] This node is within the primary
> component and will provide service.
> Dec 19 23:09:41 xen1 openais[2747]: [CLM  ] CLM CONFIGURATION CHANGE
> Dec 19 23:09:41 xen1 openais[2747]: [CLM  ] New Configuration:
> Dec 19 23:09:41 xen1 openais[2747]: [CLM  ]     r(0) ip(172.16.40.107)
> Dec 19 23:09:41 xen1 openais[2747]: [CLM  ] Members Left:
> Dec 19 23:09:41 xen1 openais[2747]: [CLM  ] Members Joined:
> Dec 19 23:09:41 xen1 openais[2747]: [SYNC ] This node is within the primary
> component and will provide service.
> Dec 19 23:09:41 xen1 openais[2747]: [TOTEM] entering OPERATIONAL state.
> Dec 19 23:09:41 xen1 openais[2747]: [CLM  ] got nodejoin message
> 172.16.40.107
> Dec 19 23:09:41 xen1 openais[2747]: [CPG  ] got joinlist message from node 1
> Dec 19 23:09:44 xen1 ccsd[2740]: Cluster is not quorate.  Refusing
> connection.
> Dec 19 23:09:44 xen1 ccsd[2740]: Error while processing connect:
> Connection refused
> Dec 19 23:09:49 xen1 ccsd[2740]: Cluster is not quorate.  Refusing
> connection.
> Dec 19 23:09:49 xen1 ccsd[2740]: Error while processing connect:
> Connection refused
> Dec 19 23:09:54 xen1 ccsd[2740]: Cluster is not quorate.  Refusing
> connection.
> Dec 19 23:09:54 xen1 ccsd[2740]: Error while processing connect:
> Connection refused
> Dec 19 23:09:59 xen1 ccsd[2740]: Cluster is not quorate.  Refusing
> connection.
> Dec 19 23:09:59 xen1 ccsd[2740]: Error while processing connect:
> Connection refused
> Dec 19 23:10:04 xen1 ccsd[2740]: Cluster is not quorate.  Refusing
> connection.
> Dec 19 23:10:04 xen1 ccsd[2740]: Error while processing connect:
> Connection refused
> Dec 19 23:10:09 xen1 ccsd[2740]: Cluster is not quorate.  Refusing
> connection.
> Dec 19 23:10:09 xen1 ccsd[2740]: Error while processing connect:
> Connection refused
>
> The last errors are expected because the cluster is no longer quorate: xen2
> was rebooting and xen3 was fenced, which left xen1 alone in an inquorate
> cluster...
>
> The unusual thing is that it only happens when one of the nodes is running
> the rhel5 xen kernel. Maybe a bridge-utils bug involving multicast?
> The problem also happens if I reboot the xen1 or xen2 server with the xen
> kernel.
>
>
> Any intel?
>
> Thanks
> Nuno Fernandes



-- 
Nuno Pais Fernandes
Cisco Certified Network Associate
Oracle Certified Professional
Eurotux Informatica S.A.
Tel: +351 253257395
Fax: +351 253257396