[Linux-cluster] can't re-join cluster after upgrade

Gunther Schlegel schlegel at riege.com
Mon Mar 9 11:50:56 UTC 2009


openais from RHEL 5.2 and openais from RHEL 5.3 cannot talk to each 
other. There is a bugzilla ticket on this (but I cannot find the id 
right now) and several requests to RH support.

While RH support is still working on it, preliminary information 
indicates that there won't be a fix for this, and I also doubt that 
there will be a workaround.

Shutting down and restarting the entire cluster solves the problem.

Alternatively, installing the openais package from RHEL 5.2 on the 
updated node will let it join the cluster, if you want it up and can't 
shut down node 2 as well.
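
To see whether a cluster is in this mixed state, it can help to compare 
the openais package string each node reports (for example the output of 
"rpm -q openais" collected by hand from every node). The following is 
only a minimal Python sketch under assumptions: the function names, the 
node name web2 and the package strings are made up for illustration and 
are not real release numbers.

```python
from collections import defaultdict


def group_nodes_by_package(versions):
    """Map each reported package string to the sorted list of nodes reporting it.

    `versions` is a dict of node name -> package string, e.g. collected by
    running `rpm -q openais` on every node (collection is not done here).
    """
    groups = defaultdict(list)
    for node, package in versions.items():
        # Strip whitespace so a trailing newline does not look like a mismatch.
        groups[package.strip()].append(node)
    return {package: sorted(nodes) for package, nodes in groups.items()}


def is_mixed(versions):
    """Return True if the nodes report more than one distinct package string."""
    return len(group_nodes_by_package(versions)) > 1


if __name__ == "__main__":
    # Hypothetical placeholder strings, not actual RHEL release numbers.
    reported = {
        "web1": "openais-NEW.el5",
        "web2": "openais-OLD.el5",
    }
    for package, nodes in group_nodes_by_package(reported).items():
        print(package, "->", ", ".join(nodes))
    if is_mixed(reported):
        print("Nodes run different openais packages; expect join problems.")
```

If the check reports a mix, the two options above apply: restart the 
whole cluster on one version, or put the older openais back on the 
updated node.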

best regards, Gunther


Ramiro Blanco wrote:
> Hi, I've just upgraded one node of my 2-node cluster to RHEL 5.3 and
> now that node can't join the cluster. Can I upgrade one node at a time?
> Here's the output of /var/log/messages:
> 
> ...
> Mar  9 03:26:34 web1 ccsd[29129]: Starting ccsd 2.0.98:
> Mar  9 03:26:34 web1 ccsd[29129]:  Built: Dec  3 2008 16:32:30
> Mar  9 03:26:34 web1 ccsd[29129]:  Copyright (C) Red Hat, Inc.  2004
> All rights reserved.
> Mar  9 03:26:34 web1 ccsd[29129]: cluster.conf (cluster name =
> cluster_web, version = 3) found.
> Mar  9 03:26:34 web1 ccsd[29129]: Remote copy of cluster.conf is from
> quorate node.
> Mar  9 03:26:34 web1 ccsd[29129]:  Local version # : 3
> Mar  9 03:26:34 web1 ccsd[29129]:  Remote version #: 3
> Mar  9 03:26:34 web1 ccsd[29129]: Remote copy of cluster.conf is from
> quorate node.
> Mar  9 03:26:34 web1 ccsd[29129]:  Local version # : 3
> Mar  9 03:26:34 web1 ccsd[29129]:  Remote version #: 3
> Mar  9 03:26:34 web1 ccsd[29129]: Remote copy of cluster.conf is from
> quorate node.
> Mar  9 03:26:34 web1 ccsd[29129]:  Local version # : 3
> Mar  9 03:26:34 web1 ccsd[29129]:  Remote version #: 3
> Mar  9 03:26:34 web1 ccsd[29129]: Remote copy of cluster.conf is from
> quorate node.
> Mar  9 03:26:34 web1 ccsd[29129]:  Local version # : 3
> Mar  9 03:26:34 web1 ccsd[29129]:  Remote version #: 3
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] AIS Executive Service
> RELEASE 'subrev 1358 version 0.80.3'
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Copyright (C) 2002-2006
> MontaVista Software, Inc and contributors.
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Copyright (C) 2006 Red
> Hat, Inc.
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] AIS Executive Service:
> started and ready to provide service.
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Using default multicast
> address of 239.192.73.137
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
> openais_cpg loaded.
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
> handler 'openais cluster closed process group service v1.01'
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
> openais_cfg loaded.
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
> handler 'openais configuration service'
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
> openais_msg loaded.
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
> handler 'openais message service B.01.01'
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
> openais_lck loaded.
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
> handler 'openais distributed locking service B.01.01'
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
> openais_evt loaded.
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
> handler 'openais event service B.01.01'
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
> openais_ckpt loaded.
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
> handler 'openais checkpoint service B.01.01'
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
> openais_amf loaded.
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
> handler 'openais availability management framework B.01.01'
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
> openais_clm loaded.
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
> handler 'openais cluster membership service B.01.01'
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
> openais_evs loaded.
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
> handler 'openais extended virtual synchrony service'
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component
> openais_cman loaded.
> Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service
> handler 'openais CMAN membership service 2.01'
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Token Timeout (10000 ms)
> retransmit timeout (495 ms)
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] token hold (386 ms)
> retransmits before loss (20 retrans)
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] join (60 ms) send_join (0
> ms) consensus (4800 ms) merge (200 ms)
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] downcheck (1000 ms) fail
> to recv const (50 msgs)
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] seqno unchanged const (30
> rotations) Maximum network MTU 1500
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] window size per rotation
> (50 messages) maximum messages per rotation (17 messages)
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] send threads (0 threads)
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] RRP token expired timeout
> (495 ms)
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] RRP token problem counter
> (2000 ms)
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] RRP threshold (10 problem
> count)
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] RRP mode set to none.
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM]
> heartbeat_failures_allowed (0)
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] max_network_delay (50 ms)
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] HeartBeat is Disabled. To
> enable set heartbeat_failures_allowed > 0
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Receive multicast socket
> recv buffer size (262142 bytes).
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Transmit multicast socket
> send buffer size (262142 bytes).
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] The network interface
> [192.168.10.3] is now up.
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Created or loaded
> sequence id 280.192.168.10.3 for this ring.
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering GATHER state
> from 15.
> Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
> handler 'openais extended virtual synchrony service'
> Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
> handler 'openais cluster membership service B.01.01'
> Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
> handler 'openais availability management framework B.01.01'
> Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
> handler 'openais checkpoint service B.01.01'
> Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
> handler 'openais event service B.01.01'
> Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
> handler 'openais distributed locking service B.01.01'
> Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
> handler 'openais message service B.01.01'
> Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
> handler 'openais configuration service'
> Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
> handler 'openais cluster closed process group service v1.01'
> Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service
> handler 'openais CMAN membership service 2.01'
> Mar  9 03:26:34 web1 openais[29135]: [CMAN ] CMAN 2.0.98 (built Dec  3
> 2008 16:32:34) started
> Mar  9 03:26:34 web1 openais[29135]: [SYNC ] Not using a virtual
> synchrony filter.
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Creating commit token
> because I am the rep.
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Saving state aru 0 high
> seq received 0
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Storing new sequence id
> for ring 11c
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering COMMIT state.
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering RECOVERY state.
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] position [0] member
> 192.168.10.3:
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] previous ring seq 280 rep
> 192.168.10.3
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] aru 0 high delivered 0
> received flag 1
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Did not need to originate
> any messages in recovery.
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Sending initial ORF token
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] CLM CONFIGURATION CHANGE
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] New Configuration:
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Left:
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Joined:
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] CLM CONFIGURATION CHANGE
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] New Configuration:
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.3)
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Left:
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Joined:
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.3)
> Mar  9 03:26:34 web1 openais[29135]: [SYNC ] This node is within the
> primary component and will provide service.
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering OPERATIONAL state.
> Mar  9 03:26:34 web1 openais[29135]: [CMAN ] quorum regained, resuming activity
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] got nodejoin message 192.168.10.3
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering GATHER state from 11.
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Creating commit token
> because I am the rep.
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Saving state aru a high
> seq received a
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Storing new sequence id
> for ring 120
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering COMMIT state.
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering RECOVERY state.
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] position [0] member 192.168.10.3:
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] previous ring seq 284 rep
> 192.168.10.3
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] aru a high delivered a
> received flag 1
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] position [1] member 192.168.10.4:
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] previous ring seq 284 rep
> 192.168.10.4
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] aru 8e high delivered 8e
> received flag 1
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Did not need to originate
> any messages in recovery.
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Sending initial ORF token
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] CLM CONFIGURATION CHANGE
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] New Configuration:
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.3)
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Left:
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Joined:
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] CLM CONFIGURATION CHANGE
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] New Configuration:
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.3)
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.4)
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Left:
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Joined:
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.4)
> Mar  9 03:26:34 web1 openais[29135]: [SYNC ] This node is within the
> primary component and will provide service.
> Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering OPERATIONAL state.
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] got nodejoin message 192.168.10.3
> Mar  9 03:26:34 web1 openais[29135]: [CLM  ] got nodejoin message 192.168.10.4
> ..
> 
> Any help would be appreciated.
> 
> 
> 

-- 
Gunther Schlegel
Manager IT Infrastructure




