[Linux-cluster] Node is randomly fenced

Schaefer, Micah Micah.Schaefer at jhuapl.edu
Thu Jun 12 19:02:43 UTC 2014


Node4 was fenced again. I was able to get some debug logs (below), including
a new message:

"Jun 12 14:01:56 corosync [TOTEM ] The token was lost in the OPERATIONAL
state."


The rest of the corosync logs:

http://pastebin.com/iYFbkbhb


Jun 12 14:44:49 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:49 corosync [TOTEM ] A processor joined or left the
membership and a new membership was formed.
Jun 12 14:44:49 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33224 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33224 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33225 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33225 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33363 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33363 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33363 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33432 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33432 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33432 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33494 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33495 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33495 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33495 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] got commit token
Jun 12 14:44:50 corosync [TOTEM ] Saving state aru 86 high seq received 86
Jun 12 14:44:50 corosync [TOTEM ] Storing new sequence id for ring 6324
Jun 12 14:44:50 corosync [TOTEM ] entering COMMIT state.
Jun 12 14:44:50 corosync [TOTEM ] got commit token
Jun 12 14:44:50 corosync [TOTEM ] entering RECOVERY state.
Jun 12 14:44:50 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
Jun 12 14:44:50 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
Jun 12 14:44:50 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
Jun 12 14:44:50 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
Jun 12 14:44:50 corosync [TOTEM ] position [0] member 10.70.100.101:
Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:50 corosync [TOTEM ] position [1] member 10.70.100.102:
Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:50 corosync [TOTEM ] position [2] member 10.70.100.103:
Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:50 corosync [TOTEM ] position [3] member 10.70.100.104:
Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:50 corosync [TOTEM ] Did not need to originate any messages
in recovery.
Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 0, aru ffffffff
Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 1, aru 0
Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 2, aru 0
Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 3, aru 0
Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:50 corosync [TOTEM ] retrans flag count 4 token aru 0 install
seq 0 aru 0 0
Jun 12 14:44:50 corosync [TOTEM ] Resetting old ring state
Jun 12 14:44:50 corosync [TOTEM ] recovery to regular 1-0
Jun 12 14:44:50 corosync [TOTEM ] waiting_trans_ack changed to 1
Jun 12 14:44:50 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:50 corosync [TOTEM ] A processor joined or left the
membership and a new membership was formed.
Jun 12 14:44:50 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms,
flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms,
flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms,
flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms,
flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms,
flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms,
flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms,
flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms,
flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] got commit token
Jun 12 14:44:51 corosync [TOTEM ] Saving state aru 86 high seq received 86
Jun 12 14:44:51 corosync [TOTEM ] Storing new sequence id for ring 6328
Jun 12 14:44:51 corosync [TOTEM ] entering COMMIT state.
Jun 12 14:44:51 corosync [TOTEM ] got commit token
Jun 12 14:44:51 corosync [TOTEM ] entering RECOVERY state.
Jun 12 14:44:51 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
Jun 12 14:44:51 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
Jun 12 14:44:51 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
Jun 12 14:44:51 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
Jun 12 14:44:51 corosync [TOTEM ] position [0] member 10.70.100.101:
Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:51 corosync [TOTEM ] position [1] member 10.70.100.102:
Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:51 corosync [TOTEM ] position [2] member 10.70.100.103:
Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:51 corosync [TOTEM ] position [3] member 10.70.100.104:
Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:51 corosync [TOTEM ] Did not need to originate any messages
in recovery.
Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 0, aru ffffffff
Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 1, aru 0
Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 2, aru 0
Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 3, aru 0
Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:51 corosync [TOTEM ] retrans flag count 4 token aru 0 install
seq 0 aru 0 0
Jun 12 14:44:51 corosync [TOTEM ] Resetting old ring state
Jun 12 14:44:51 corosync [TOTEM ] recovery to regular 1-0
Jun 12 14:44:51 corosync [TOTEM ] waiting_trans_ack changed to 1
Jun 12 14:44:51 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:51 corosync [TOTEM ] A processor joined or left the
membership and a new membership was formed.
Jun 12 14:44:51 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35177 ms,
flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35177 ms,
flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35246 ms,
flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35246 ms,
flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35316 ms,
flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35316 ms,
flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35385 ms,
flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35385 ms,
flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35385 ms,
flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35454 ms,
flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35454 ms,
flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35454 ms,
flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35455 ms,
flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] got commit token
Jun 12 14:44:52 corosync [TOTEM ] Saving state aru 86 high seq received 86
Jun 12 14:44:52 corosync [TOTEM ] Storing new sequence id for ring 632c
Jun 12 14:44:52 corosync [TOTEM ] entering COMMIT state.
Jun 12 14:44:52 corosync [TOTEM ] got commit token
Jun 12 14:44:52 corosync [TOTEM ] entering RECOVERY state.
Jun 12 14:44:52 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
Jun 12 14:44:52 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
Jun 12 14:44:52 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
Jun 12 14:44:52 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
Jun 12 14:44:52 corosync [TOTEM ] position [0] member 10.70.100.101:
Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:52 corosync [TOTEM ] position [1] member 10.70.100.102:
Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:52 corosync [TOTEM ] position [2] member 10.70.100.103:
Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:52 corosync [TOTEM ] position [3] member 10.70.100.104:
Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:52 corosync [TOTEM ] Did not need to originate any messages
in recovery.
Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 0, aru ffffffff
Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 1, aru 0
Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 2, aru 0
Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 3, aru 0
Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:52 corosync [TOTEM ] retrans flag count 4 token aru 0 install
seq 0 aru 0 0
Jun 12 14:44:52 corosync [TOTEM ] Resetting old ring state
Jun 12 14:44:52 corosync [TOTEM ] recovery to regular 1-0
Jun 12 14:44:52 corosync [TOTEM ] waiting_trans_ack changed to 1
Jun 12 14:44:52 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:52 corosync [TOTEM ] A processor joined or left the
membership and a new membership was formed.
Jun 12 14:44:52 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36223 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36224 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36431 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36431 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36432 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36432 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms,
flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] got commit token
Jun 12 14:44:53 corosync [TOTEM ] Saving state aru 86 high seq received 86
Jun 12 14:44:53 corosync [TOTEM ] Storing new sequence id for ring 6330
Jun 12 14:44:53 corosync [TOTEM ] entering COMMIT state.
Jun 12 14:44:53 corosync [TOTEM ] got commit token
Jun 12 14:44:53 corosync [TOTEM ] entering RECOVERY state.
Jun 12 14:44:53 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
Jun 12 14:44:53 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
Jun 12 14:44:53 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
Jun 12 14:44:53 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
Jun 12 14:44:53 corosync [TOTEM ] position [0] member 10.70.100.101:
Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:53 corosync [TOTEM ] position [1] member 10.70.100.102:
Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:53 corosync [TOTEM ] position [2] member 10.70.100.103:
Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:53 corosync [TOTEM ] position [3] member 10.70.100.104:
Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:53 corosync [TOTEM ] Did not need to originate any messages
in recovery.
Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 0, aru ffffffff
Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 1, aru 0
Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 2, aru 0
Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 3, aru 0
Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:53 corosync [TOTEM ] retrans flag count 4 token aru 0 install
seq 0 aru 0 0
Jun 12 14:44:53 corosync [TOTEM ] Resetting old ring state
Jun 12 14:44:53 corosync [TOTEM ] recovery to regular 1-0
Jun 12 14:44:53 corosync [TOTEM ] waiting_trans_ack changed to 1
Jun 12 14:44:53 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:53 corosync [TOTEM ] A processor joined or left the
membership and a new membership was formed.
Jun 12 14:44:53 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37267 ms,
flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37267 ms,
flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37268 ms,
flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37268 ms,
flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37337 ms,
flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37337 ms,
flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] got commit token
Jun 12 14:44:54 corosync [TOTEM ] Saving state aru 86 high seq received 86
Jun 12 14:44:54 corosync [TOTEM ] Storing new sequence id for ring 6334
Jun 12 14:44:54 corosync [TOTEM ] entering COMMIT state.
Jun 12 14:44:54 corosync [TOTEM ] got commit token
Jun 12 14:44:54 corosync [TOTEM ] entering RECOVERY state.
Jun 12 14:44:54 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
Jun 12 14:44:54 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
Jun 12 14:44:54 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
Jun 12 14:44:54 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
Jun 12 14:44:54 corosync [TOTEM ] position [0] member 10.70.100.101:
Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:54 corosync [TOTEM ] position [1] member 10.70.100.102:
Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:54 corosync [TOTEM ] position [2] member 10.70.100.103:
Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:54 corosync [TOTEM ] position [3] member 10.70.100.104:
Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:54 corosync [TOTEM ] Did not need to originate any messages
in recovery.
Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 0, aru ffffffff
Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 1, aru 0
Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 2, aru 0
Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans
flag0 retrans queue empty 1 count 3, aru 0
Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:54 corosync [TOTEM ] retrans flag count 4 token aru 0 install
seq 0 aru 0 0
Jun 12 14:44:54 corosync [TOTEM ] Resetting old ring state
Jun 12 14:44:54 corosync [TOTEM ] recovery to regular 1-0
Jun 12 14:44:54 corosync [TOTEM ] waiting_trans_ack changed to 1
Jun 12 14:44:54 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:54 corosync [TOTEM ] A processor joined or left the
membership and a new membership was formed.
Jun 12 14:44:54 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38108 ms,
flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38108 ms,
flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38108 ms,
flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38109 ms,
flushing membership messages.
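
To see how the reported pause grows from one second to the next, the "Process pause detected" lines above can be summarized per timestamp. The following is a minimal Python sketch under stated assumptions: the function name `summarize_pauses` and the log path passed on the command line are hypothetical, and the pattern only matches the line format shown in this excerpt.

```python
import re
import sys
from collections import defaultdict

# Matches the first physical line of a (wrapped) corosync pause message, e.g.
# "Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,"
PAUSE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) corosync \[TOTEM\s*\] "
    r"Process pause detected for (?P<ms>\d+) ms"
)


def summarize_pauses(lines):
    """Return {timestamp: (count, min_ms, max_ms)} for pause messages."""
    by_stamp = defaultdict(list)
    for line in lines:
        match = PAUSE_RE.match(line)
        if match:
            by_stamp[match.group("stamp")].append(int(match.group("ms")))
    return {
        stamp: (len(values), min(values), max(values))
        for stamp, values in by_stamp.items()
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python pauses.py /path/to/corosync.log
    with open(sys.argv[1]) as handle:
        for stamp, (count, low, high) in summarize_pauses(handle).items():
            print(f"{stamp}: {count} messages, {low}-{high} ms")
```

Continuation lines such as "flushing membership messages." are simply skipped, so the wrapped log can be fed in as-is.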

On 6/12/14, 1:55 PM, "Schaefer, Micah" <Micah.Schaefer at jhuapl.edu> wrote:

>I just found that the clock on node1 was off by about a minute and a half
>compared to the rest of the nodes.
>
>I am running ntp, so I am not sure why the time wasn't synced up. I wonder
>if node1, being behind, would think it was not receiving updates from the
>other nodes?
>
>On 6/12/14, 1:29 PM, "Digimer" <lists at alteeve.ca> wrote:
>
>>Even if the token changes stop the immediate fencing, don't leave it
>>please. There is something fundamentally wrong that you need to
>>identify/fix.
>>
>>Keep us posted!
>>
>>On 12/06/14 01:24 PM, Schaefer, Micah wrote:
>>> The servers do not run any tasks other than the tasks in the cluster
>>> service group.
>>>
>>> Nodes 3 and 4 are physical servers with a lot of horsepower and nodes 1
>>> and 2 are virtual machines with far fewer resources available.
>>>
>>> I adjusted the token settings and will watch for any change.
>>>
>>> On 6/12/14, 1:08 PM, "Digimer" <lists at alteeve.ca> wrote:
>>>
>>>> On 12/06/14 12:48 PM, Schaefer, Micah wrote:
>>>>> As far as the switch goes, both are Cisco Catalyst 6509-E, no
>>>>> spanning tree changes are happening and all the ports have
>>>>> port-fast enabled for these servers. My switch logging level is
>>>>> very high and I have no messages in relation to the time frames
>>>>> or ports.
>>>>>
>>>>> TOTEM reports that "A processor joined or left the membership...",
>>>>> but that isn't enough detail.
>>>>>
>>>>> Also note that I did not have these issues until adding new
>>>>> servers: node3 and node4 to the cluster. Node1 and node2 do not
>>>>> fence each other (unless a real issue is there), and they are on
>>>>> different switches.
>>>>
>>>> Then I can't imagine it being network anymore. Seeing as both node 3
>>>> and 4 get fenced, it's likely not hardware either. Are the workloads
>>>> on 3 and 4 much higher (or are the computers much slower) than 1 and
>>>> 2? I'm wondering if the nodes are simply not keeping up with corosync
>>>> traffic. You might try adjusting the corosync token timeout and
>>>> retransmit counts to see if that reduces the node losses.
>>>>
>>>> --
>>>> Digimer
>>>> Papers and Projects: https://alteeve.ca/w/
>>>> What if the cure for cancer is trapped in the mind of a person without
>>>> access to education?
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>>
>>
>>
>>--
>>Digimer
>>Papers and Projects: https://alteeve.ca/w/
>>What if the cure for cancer is trapped in the mind of a person without
>>access to education?




