[Linux-cluster] can't re-join cluster after upgrade

Ramiro Blanco ramiblanco at gmail.com
Mon Mar 9 05:32:50 UTC 2009


Hi, I've just upgraded one node of my 2-node cluster to RHEL 5.3, and now
that node can't join the cluster. Can I upgrade one node at a time?
Here's the output of /var/log/messages:

...
Mar  9 03:26:34 web1 ccsd[29129]: Starting ccsd 2.0.98:
Mar  9 03:26:34 web1 ccsd[29129]:  Built: Dec  3 2008 16:32:30
Mar  9 03:26:34 web1 ccsd[29129]:  Copyright (C) Red Hat, Inc.  2004
All rights reserved.
Mar  9 03:26:34 web1 ccsd[29129]: cluster.conf (cluster name =
cluster_web, version = 3) found.
Mar  9 03:26:34 web1 ccsd[29129]: Remote copy of cluster.conf is from
quorate node.
Mar  9 03:26:34 web1 ccsd[29129]:  Local version # : 3
Mar  9 03:26:34 web1 ccsd[29129]:  Remote version #: 3
Mar  9 03:26:34 web1 ccsd[29129]: Remote copy of cluster.conf is from
quorate node.
Mar  9 03:26:34 web1 ccsd[29129]:  Local version # : 3
Mar  9 03:26:34 web1 ccsd[29129]:  Remote version #: 3
Mar  9 03:26:34 web1 ccsd[29129]: Remote copy of cluster.conf is from
quorate node.
Mar  9 03:26:34 web1 ccsd[29129]:  Local version # : 3
Mar  9 03:26:34 web1 ccsd[29129]:  Remote version #: 3
Mar  9 03:26:34 web1 ccsd[29129]: Remote copy of cluster.conf is from
quorate node.
Mar  9 03:26:34 web1 ccsd[29129]:  Local version # : 3
Mar  9 03:26:34 web1 ccsd[29129]:  Remote version #: 3
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] AIS Executive Service
RELEASE 'subrev 1358 version 0.80.3'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Copyright (C) 2002-2006
MontaVista Software, Inc and contributors.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Copyright (C) 2006 Red
Hat, Inc.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] AIS Executive Service:
started and ready to provide service.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Using default multicast
address of 239.192.73.137
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
openais_cpg loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
handler 'openais cluster closed process group service v1.01'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
openais_cfg loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
handler 'openais configuration service'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
openais_msg loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
handler 'openais message service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
openais_lck loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
handler 'openais distributed locking service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
openais_evt loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
handler 'openais event service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
openais_ckpt loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
handler 'openais checkpoint service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
openais_amf loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
handler 'openais availability management framework B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
openais_clm loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
handler 'openais cluster membership service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
openais_evs loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
handler 'openais extended virtual synchrony service'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
openais_cman loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
handler 'openais CMAN membership service 2.01'
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Token Timeout (10000 ms)
retransmit timeout (495 ms)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] token hold (386 ms)
retransmits before loss (20 retrans)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] join (60 ms) send_join (0
ms) consensus (4800 ms) merge (200 ms)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] downcheck (1000 ms) fail
to recv const (50 msgs)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] seqno unchanged const (30
rotations) Maximum network MTU 1500
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] window size per rotation
(50 messages) maximum messages per rotation (17 messages)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] send threads (0 threads)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] RRP token expired timeout
(495 ms)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] RRP token problem counter
(2000 ms)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] RRP threshold (10 problem
count)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] RRP mode set to none.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM]
heartbeat_failures_allowed (0)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] max_network_delay (50 ms)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] HeartBeat is Disabled. To
enable set heartbeat_failures_allowed > 0
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Receive multicast socket
recv buffer size (262142 bytes).
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Transmit multicast socket
send buffer size (262142 bytes).
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] The network interface
[192.168.10.3] is now up.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Created or loaded
sequence id 280.192.168.10.3 for this ring.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering GATHER state
from 15.
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
handler 'openais extended virtual synchrony service'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
handler 'openais cluster membership service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
handler 'openais availability management framework B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
handler 'openais checkpoint service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
handler 'openais event service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
handler 'openais distributed locking service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
handler 'openais message service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
handler 'openais configuration service'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
handler 'openais cluster closed process group service v1.01'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
handler 'openais CMAN membership service 2.01'
Mar  9 03:26:34 web1 openais[29135]: [CMAN ] CMAN 2.0.98 (built Dec  3
2008 16:32:34) started
Mar  9 03:26:34 web1 openais[29135]: [SYNC ] Not using a virtual
synchrony filter.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Creating commit token
because I am the rep.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Saving state aru 0 high
seq received 0
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Storing new sequence id
for ring 11c
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering COMMIT state.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering RECOVERY state.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] position [0] member
192.168.10.3:
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] previous ring seq 280 rep
192.168.10.3
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] aru 0 high delivered 0
received flag 1
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Did not need to originate
any messages in recovery.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Sending initial ORF token
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] CLM CONFIGURATION CHANGE
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] New Configuration:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Left:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Joined:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] CLM CONFIGURATION CHANGE
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] New Configuration:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.3)
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Left:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Joined:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.3)
Mar  9 03:26:34 web1 openais[29135]: [SYNC ] This node is within the
primary component and will provide service.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering OPERATIONAL state.
Mar  9 03:26:34 web1 openais[29135]: [CMAN ] quorum regained, resuming activity
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] got nodejoin message 192.168.10.3
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering GATHER state from 11.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Creating commit token
because I am the rep.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Saving state aru a high
seq received a
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Storing new sequence id
for ring 120
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering COMMIT state.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering RECOVERY state.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] position [0] member 192.168.10.3:
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] previous ring seq 284 rep
192.168.10.3
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] aru a high delivered a
received flag 1
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] position [1] member 192.168.10.4:
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] previous ring seq 284 rep
192.168.10.4
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] aru 8e high delivered 8e
received flag 1
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Did not need to originate
any messages in recovery.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Sending initial ORF token
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] CLM CONFIGURATION CHANGE
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] New Configuration:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.3)
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Left:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Joined:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] CLM CONFIGURATION CHANGE
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] New Configuration:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.3)
Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.4)
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Left:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Joined:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.4)
Mar  9 03:26:34 web1 openais[29135]: [SYNC ] This node is within the
primary component and will provide service.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering OPERATIONAL state.
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] got nodejoin message 192.168.10.3
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] got nodejoin message 192.168.10.4
..
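
To make the membership events in the excerpt above easier to spot, here is a
minimal Python sketch that pulls the "got nodejoin message" lines out of a
log. It is not part of cman or openais; the function name joined_nodes and
the regular expression are assumptions based only on the line format shown
above, and the sample text is copied from those lines.

```python
import re

# Assumed pattern: matches "[CLM  ] got nodejoin message <ip>" lines as they
# appear in the excerpt above. Other openais versions may log differently.
NODEJOIN = re.compile(r"\[CLM\s*\] got nodejoin message (\d+\.\d+\.\d+\.\d+)")

def joined_nodes(log_text):
    """Return the IPs seen in nodejoin messages, in first-seen order."""
    seen = []
    for ip in NODEJOIN.findall(log_text):
        if ip not in seen:
            seen.append(ip)
    return seen

# Sample lines copied from the log excerpt above.
sample = """\
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] got nodejoin message 192.168.10.3
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering GATHER state from 11.
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] got nodejoin message 192.168.10.3
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] got nodejoin message 192.168.10.4
"""

print(joined_nodes(sample))  # ['192.168.10.3', '192.168.10.4']
```

The same function could be run on the full file, for example
joined_nodes(open("/var/log/messages").read()), to list which node addresses
openais reported as joined.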

Any help would be appreciated.



-- 
Ramiro Blanco



