[Linux-cluster] Node is randomly fenced

Digimer lists at alteeve.ca
Thu Jun 12 19:06:57 UTC 2014


Hrm, I'm not really sure that I am able to interpret this without making 
guesses. I'm cc'ing one of the devs (who I hope will poke the right 
person if he's not able to help at the moment). Let's see what he has to say.

I am curious now, too. :)
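
In the meantime, here is a rough Python sketch for summarizing the
"Process pause detected" messages in a corosync log like the one quoted
below. The function name summarize_pauses and the sample lines are only
illustrative, and the regular expression assumes the message text matches
what appears in that log.

```python
import re

# Assumption: pause messages look like the ones in the quoted log, e.g.
# "corosync [TOTEM ] Process pause detected for 32947 ms,"
PAUSE_RE = re.compile(r"Process pause detected for (\d+) ms")


def summarize_pauses(lines):
    """Return (count, min_ms, max_ms) for pause messages, or None if none are found."""
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            pauses.append(int(match.group(1)))
    if not pauses:
        return None
    return len(pauses), min(pauses), max(pauses)


# Illustrative sample taken from the shape of the quoted log lines.
sample = [
    "Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,",
    "Jun 12 14:44:49 corosync [TOTEM ] entering GATHER state from 12.",
    "Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38109 ms,",
]
print(summarize_pauses(sample))
```

To run it against a real log, pass an open file object instead of the
sample list. In the quoted excerpt the reported pauses run from 32947 ms
to 38109 ms.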

On 12/06/14 03:02 PM, Schaefer, Micah wrote:
> Node4 was fenced again. I was able to get some debug logs (below), which
> include a new message:
>
> "Jun 12 14:01:56 corosync [TOTEM ] The token was lost in the OPERATIONAL
> state."
>
>
> Rest of corosync logs
>
> http://pastebin.com/iYFbkbhb
>
>
> Jun 12 14:44:49 corosync [TOTEM ] entering OPERATIONAL state.
> Jun 12 14:44:49 corosync [TOTEM ] A processor joined or left the
> membership and a new membership was formed.
> Jun 12 14:44:49 corosync [TOTEM ] waiting_trans_ack changed to 0
> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,
> flushing membership messages.
> Jun 12 14:44:49 corosync [TOTEM ] entering GATHER state from 12.
> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,
> flushing membership messages.
> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,
> flushing membership messages.
> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
> flushing membership messages.
> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
> flushing membership messages.
> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
> flushing membership messages.
> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
> flushing membership messages.
> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
> flushing membership messages.
> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
> flushing membership messages.
> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
> flushing membership messages.
> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
> flushing membership messages.
> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
> flushing membership messages.
> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
> flushing membership messages.
> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
> flushing membership messages.
> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33224 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33224 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33225 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33225 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33363 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33363 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33363 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33432 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33432 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33432 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33494 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33495 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33495 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33495 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms,
> flushing membership messages.
> Jun 12 14:44:50 corosync [TOTEM ] got commit token
> Jun 12 14:44:50 corosync [TOTEM ] Saving state aru 86 high seq received 86
> Jun 12 14:44:50 corosync [TOTEM ] Storing new sequence id for ring 6324
> Jun 12 14:44:50 corosync [TOTEM ] entering COMMIT state.
> Jun 12 14:44:50 corosync [TOTEM ] got commit token
> Jun 12 14:44:50 corosync [TOTEM ] entering RECOVERY state.
> Jun 12 14:44:50 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
> Jun 12 14:44:50 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
> Jun 12 14:44:50 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
> Jun 12 14:44:50 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
> Jun 12 14:44:50 corosync [TOTEM ] position [0] member 10.70.100.101:
> Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
> Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:50 corosync [TOTEM ] position [1] member 10.70.100.102:
> Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
> Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:50 corosync [TOTEM ] position [2] member 10.70.100.103:
> Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
> Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:50 corosync [TOTEM ] position [3] member 10.70.100.104:
> Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
> Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:50 corosync [TOTEM ] Did not need to originate any messages
> in recovery.
> Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 0, aru ffffffff
> Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 1, aru 0
> Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 2, aru 0
> Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 3, aru 0
> Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:50 corosync [TOTEM ] retrans flag count 4 token aru 0 install
> seq 0 aru 0 0
> Jun 12 14:44:50 corosync [TOTEM ] Resetting old ring state
> Jun 12 14:44:50 corosync [TOTEM ] recovery to regular 1-0
> Jun 12 14:44:50 corosync [TOTEM ] waiting_trans_ack changed to 1
> Jun 12 14:44:50 corosync [TOTEM ] entering OPERATIONAL state.
> Jun 12 14:44:50 corosync [TOTEM ] A processor joined or left the
> membership and a new membership was formed.
> Jun 12 14:44:50 corosync [TOTEM ] waiting_trans_ack changed to 0
> Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms,
> flushing membership messages.
> Jun 12 14:44:51 corosync [TOTEM ] entering GATHER state from 12.
> Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms,
> flushing membership messages.
> Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms,
> flushing membership messages.
> Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms,
> flushing membership messages.
> Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms,
> flushing membership messages.
> Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms,
> flushing membership messages.
> Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms,
> flushing membership messages.
> Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms,
> flushing membership messages.
> Jun 12 14:44:51 corosync [TOTEM ] got commit token
> Jun 12 14:44:51 corosync [TOTEM ] Saving state aru 86 high seq received 86
> Jun 12 14:44:51 corosync [TOTEM ] Storing new sequence id for ring 6328
> Jun 12 14:44:51 corosync [TOTEM ] entering COMMIT state.
> Jun 12 14:44:51 corosync [TOTEM ] got commit token
> Jun 12 14:44:51 corosync [TOTEM ] entering RECOVERY state.
> Jun 12 14:44:51 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
> Jun 12 14:44:51 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
> Jun 12 14:44:51 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
> Jun 12 14:44:51 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
> Jun 12 14:44:51 corosync [TOTEM ] position [0] member 10.70.100.101:
> Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
> Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:51 corosync [TOTEM ] position [1] member 10.70.100.102:
> Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
> Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:51 corosync [TOTEM ] position [2] member 10.70.100.103:
> Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
> Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:51 corosync [TOTEM ] position [3] member 10.70.100.104:
> Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
> Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:51 corosync [TOTEM ] Did not need to originate any messages
> in recovery.
> Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 0, aru ffffffff
> Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 1, aru 0
> Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 2, aru 0
> Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 3, aru 0
> Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:51 corosync [TOTEM ] retrans flag count 4 token aru 0 install
> seq 0 aru 0 0
> Jun 12 14:44:51 corosync [TOTEM ] Resetting old ring state
> Jun 12 14:44:51 corosync [TOTEM ] recovery to regular 1-0
> Jun 12 14:44:51 corosync [TOTEM ] waiting_trans_ack changed to 1
> Jun 12 14:44:51 corosync [TOTEM ] entering OPERATIONAL state.
> Jun 12 14:44:51 corosync [TOTEM ] A processor joined or left the
> membership and a new membership was formed.
> Jun 12 14:44:51 corosync [TOTEM ] waiting_trans_ack changed to 0
> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35177 ms,
> flushing membership messages.
> Jun 12 14:44:52 corosync [TOTEM ] entering GATHER state from 12.
> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35177 ms,
> flushing membership messages.
> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35246 ms,
> flushing membership messages.
> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35246 ms,
> flushing membership messages.
> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35316 ms,
> flushing membership messages.
> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35316 ms,
> flushing membership messages.
> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35385 ms,
> flushing membership messages.
> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35385 ms,
> flushing membership messages.
> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35385 ms,
> flushing membership messages.
> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35454 ms,
> flushing membership messages.
> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35454 ms,
> flushing membership messages.
> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35454 ms,
> flushing membership messages.
> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35455 ms,
> flushing membership messages.
> Jun 12 14:44:52 corosync [TOTEM ] got commit token
> Jun 12 14:44:52 corosync [TOTEM ] Saving state aru 86 high seq received 86
> Jun 12 14:44:52 corosync [TOTEM ] Storing new sequence id for ring 632c
> Jun 12 14:44:52 corosync [TOTEM ] entering COMMIT state.
> Jun 12 14:44:52 corosync [TOTEM ] got commit token
> Jun 12 14:44:52 corosync [TOTEM ] entering RECOVERY state.
> Jun 12 14:44:52 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
> Jun 12 14:44:52 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
> Jun 12 14:44:52 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
> Jun 12 14:44:52 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
> Jun 12 14:44:52 corosync [TOTEM ] position [0] member 10.70.100.101:
> Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
> Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:52 corosync [TOTEM ] position [1] member 10.70.100.102:
> Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
> Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:52 corosync [TOTEM ] position [2] member 10.70.100.103:
> Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
> Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:52 corosync [TOTEM ] position [3] member 10.70.100.104:
> Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
> Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:52 corosync [TOTEM ] Did not need to originate any messages
> in recovery.
> Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 0, aru ffffffff
> Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 1, aru 0
> Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 2, aru 0
> Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 3, aru 0
> Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:52 corosync [TOTEM ] retrans flag count 4 token aru 0 install
> seq 0 aru 0 0
> Jun 12 14:44:52 corosync [TOTEM ] Resetting old ring state
> Jun 12 14:44:52 corosync [TOTEM ] recovery to regular 1-0
> Jun 12 14:44:52 corosync [TOTEM ] waiting_trans_ack changed to 1
> Jun 12 14:44:52 corosync [TOTEM ] entering OPERATIONAL state.
> Jun 12 14:44:52 corosync [TOTEM ] A processor joined or left the
> membership and a new membership was formed.
> Jun 12 14:44:52 corosync [TOTEM ] waiting_trans_ack changed to 0
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36223 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] entering GATHER state from 12.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36224 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36431 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36431 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36432 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36432 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms,
> flushing membership messages.
> Jun 12 14:44:53 corosync [TOTEM ] got commit token
> Jun 12 14:44:53 corosync [TOTEM ] Saving state aru 86 high seq received 86
> Jun 12 14:44:53 corosync [TOTEM ] Storing new sequence id for ring 6330
> Jun 12 14:44:53 corosync [TOTEM ] entering COMMIT state.
> Jun 12 14:44:53 corosync [TOTEM ] got commit token
> Jun 12 14:44:53 corosync [TOTEM ] entering RECOVERY state.
> Jun 12 14:44:53 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
> Jun 12 14:44:53 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
> Jun 12 14:44:53 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
> Jun 12 14:44:53 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
> Jun 12 14:44:53 corosync [TOTEM ] position [0] member 10.70.100.101:
> Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
> Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:53 corosync [TOTEM ] position [1] member 10.70.100.102:
> Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
> Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:53 corosync [TOTEM ] position [2] member 10.70.100.103:
> Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
> Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:53 corosync [TOTEM ] position [3] member 10.70.100.104:
> Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
> Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:53 corosync [TOTEM ] Did not need to originate any messages
> in recovery.
> Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 0, aru ffffffff
> Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 1, aru 0
> Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 2, aru 0
> Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 3, aru 0
> Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:53 corosync [TOTEM ] retrans flag count 4 token aru 0 install
> seq 0 aru 0 0
> Jun 12 14:44:53 corosync [TOTEM ] Resetting old ring state
> Jun 12 14:44:53 corosync [TOTEM ] recovery to regular 1-0
> Jun 12 14:44:53 corosync [TOTEM ] waiting_trans_ack changed to 1
> Jun 12 14:44:53 corosync [TOTEM ] entering OPERATIONAL state.
> Jun 12 14:44:53 corosync [TOTEM ] A processor joined or left the
> membership and a new membership was formed.
> Jun 12 14:44:53 corosync [TOTEM ] waiting_trans_ack changed to 0
> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37267 ms,
> flushing membership messages.
> Jun 12 14:44:54 corosync [TOTEM ] entering GATHER state from 12.
> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37267 ms,
> flushing membership messages.
> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37268 ms,
> flushing membership messages.
> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37268 ms,
> flushing membership messages.
> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37337 ms,
> flushing membership messages.
> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37337 ms,
> flushing membership messages.
> Jun 12 14:44:54 corosync [TOTEM ] got commit token
> Jun 12 14:44:54 corosync [TOTEM ] Saving state aru 86 high seq received 86
> Jun 12 14:44:54 corosync [TOTEM ] Storing new sequence id for ring 6334
> Jun 12 14:44:54 corosync [TOTEM ] entering COMMIT state.
> Jun 12 14:44:54 corosync [TOTEM ] got commit token
> Jun 12 14:44:54 corosync [TOTEM ] entering RECOVERY state.
> Jun 12 14:44:54 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
> Jun 12 14:44:54 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
> Jun 12 14:44:54 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
> Jun 12 14:44:54 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
> Jun 12 14:44:54 corosync [TOTEM ] position [0] member 10.70.100.101:
> Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
> Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:54 corosync [TOTEM ] position [1] member 10.70.100.102:
> Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
> Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:54 corosync [TOTEM ] position [2] member 10.70.100.103:
> Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
> Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:54 corosync [TOTEM ] position [3] member 10.70.100.104:
> Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
> Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
> Jun 12 14:44:54 corosync [TOTEM ] Did not need to originate any messages
> in recovery.
> Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 0, aru ffffffff
> Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 1, aru 0
> Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 2, aru 0
> Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans
> flag0 retrans queue empty 1 count 3, aru 0
> Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
> Jun 12 14:44:54 corosync [TOTEM ] retrans flag count 4 token aru 0 install
> seq 0 aru 0 0
> Jun 12 14:44:54 corosync [TOTEM ] Resetting old ring state
> Jun 12 14:44:54 corosync [TOTEM ] recovery to regular 1-0
> Jun 12 14:44:54 corosync [TOTEM ] waiting_trans_ack changed to 1
> Jun 12 14:44:54 corosync [TOTEM ] entering OPERATIONAL state.
> Jun 12 14:44:54 corosync [TOTEM ] A processor joined or left the
> membership and a new membership was formed.
> Jun 12 14:44:54 corosync [TOTEM ] waiting_trans_ack changed to 0
> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38108 ms,
> flushing membership messages.
> Jun 12 14:44:54 corosync [TOTEM ] entering GATHER state from 12.
> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38108 ms,
> flushing membership messages.
> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38108 ms,
> flushing membership messages.
> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38109 ms,
> flushing membership messages.
>
> On 6/12/14, 1:55 PM, "Schaefer, Micah" <Micah.Schaefer at jhuapl.edu> wrote:
>
>> I just found that the clock on node1 was off by about a minute and a half
>> compared to the rest of the nodes.
>>
>> I am running ntp, so I'm not sure why the time wasn't synced up. I wonder
>> whether node1, being behind, would think it was not receiving updates from
>> the other nodes?
>>
>> On 6/12/14, 1:29 PM, "Digimer" <lists at alteeve.ca> wrote:
>>
>>> Even if the token changes stop the immediate fencing, don't leave it
>>> please. There is something fundamentally wrong that you need to
>>> identify/fix.
>>>
>>> Keep us posted!
>>>
>>> On 12/06/14 01:24 PM, Schaefer, Micah wrote:
>>>> The servers do not run any tasks other than the tasks in the cluster
>>>> service group.
>>>>
>>>> Nodes 3 and 4 are physical servers with a lot of horsepower and nodes 1
>>>> and 2 are virtual machines with far fewer resources available.
>>>>
>>>> I adjusted the token settings and will watch for any change.
>>>>
>>>> On 6/12/14, 1:08 PM, "Digimer" <lists at alteeve.ca> wrote:
>>>>
>>>>> On 12/06/14 12:48 PM, Schaefer, Micah wrote:
>>>>>> As far as the switch goes, both are Cisco Catalyst 6509-E, no spanning
>>>>>> tree changes are happening and all the ports have port-fast enabled for
>>>>>> these servers. My switch logging level is very high and I have no
>>>>>> messages in relation to the time frames or ports.
>>>>>>
>>>>>> TOTEM reports that "A processor joined or left the membership...", but
>>>>>> that isn't enough detail.
>>>>>>
>>>>>> Also note that I did not have these issues until adding new servers:
>>>>>> node3 and node4 to the cluster. Node1 and node2 do not fence each other
>>>>>> (unless a real issue is there), and they are on different switches.
>>>>>
>>>>> Then I can't imagine it being network anymore. Seeing as both node 3 and
>>>>> 4 get fenced, it's likely not hardware either. Are the workloads on 3
>>>>> and 4 much higher (or are the computers much slower) than 1 and 2? I'm
>>>>> wondering if the nodes are simply not keeping up with corosync traffic.
>>>>> You might try adjusting the corosync token timeout and retransmit counts
>>>>> to see if that reduces the node losses.
>>>>>
>>>>> --
>>>>> Digimer
>>>>> Papers and Projects: https://alteeve.ca/w/
>>>>> What if the cure for cancer is trapped in the mind of a person without
>>>>> access to education?
>>>>>
>>>>> --
>>>>> Linux-cluster mailing list
>>>>> Linux-cluster at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>
>>>>
>>>
>>>
>
>


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?
