[Linux-cluster] cman startup after update to 5.3
Gunther Schlegel
schlegel at riege.com
Fri Jan 30 13:56:53 UTC 2009
Rolling back to openais-0.80.3-15.el5 worked for me as well.
However, this is a 5.3 update blocker, as it prevents rolling upgrades
-- and that is why you run a cluster, isn't it?
I also have no clue whether a "native" 5.3 /
openais-0.80.3-22el5 system will work. Can anyone confirm this?
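As a rough illustration of the mixed state described above, the following Python sketch flags a cluster whose nodes report different openais builds. The two package strings are the ones quoted in this thread; the node-to-package mapping and the helper names (parse_build, is_mixed) are hypothetical and only serve the example. It says nothing about which combinations actually interoperate.

```python
import re


def parse_build(package):
    """Split 'name-version-release' and return (name, version, release number).

    Only the leading integer of the release field is kept, so both
    '15.el5' and '22el5' (as written in this thread) can be parsed.
    """
    name, version, release = package.rsplit("-", 2)
    match = re.match(r"\d+", release)
    if match is None:
        raise ValueError("no numeric release in %r" % package)
    return name, version, int(match.group())


def is_mixed(installed):
    """Return True when the nodes do not all report the same build."""
    return len({parse_build(pkg) for pkg in installed.values()}) > 1


# Hypothetical assignment: one node already updated, two not yet.
installed = {
    "concorde": "openais-0.80.3-15.el5",
    "motel6": "openais-0.80.3-22el5",
    "mercure": "openais-0.80.3-15.el5",
}

if is_mixed(installed):
    print("cluster is running mixed openais builds:")
    for node, pkg in sorted(installed.items()):
        print("  %s: %s" % (node, pkg))
```

A check like this could run before each step of a rolling upgrade to make the mixed window visible; the real package strings would have to be collected from each node.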
regards, Gunther
Dave Costakos wrote:
> Confirmed. Same here. It still seems like a bug to me, though. I would
> hope we have the ability to do rolling upgrades on openais in our RHEL
> clusters.
>
> 2009/1/28 Alan A <alan.zg at gmail.com>
>
> Rolling back to previous openais package allowed me to restart cman.
> From openais-0.80.3-22el5 to
> openais-0.80.3-15.el5.
>
>
> 2009/1/28 Dave Costakos <david.costakos at gmail.com>
>
> Like you, I've run into this same issue. I have 2 clusters that
> I'm trying to update in our lab. On one, I only updated the
> cman and rgmanager packages: this update was successful. On
> another I did a full update to 5.3 and ran into what appears to
> be this same problem. I've noticed that manually attempting to
> start cman via 'cman_tool -d join' prints out this message right
> before cman fails.
>
> aisexec: ckpt.c:3961: message_handler_req_exec_ckpt_sync_checkpoint_refcount:Assertion `checkpoint != ((void *)0)' failed
>
> I suspect an openais issue, would someone be able to confirm that?
>
> Also, I'm going to try downgrading openais back to the version from RHEL 5.2 to see if that fixes it (though I won't get to that until the end of today). If that works, I'll report back.
>
> 2009/1/27 Alan A <alan.zg at gmail.com>
>
> I just opened RHEL case number 1890184 regarding the same
> issue. First the kernel would not start due to the HP ILO driver
> conflict, and at the same time CMAN broke and fencing
> fails. I rolled back the cman rpm to the previous version, but
> the problem persists. Something else must have changed that
> keeps CMAN from starting.
>
> 2009/1/27 Gunther Schlegel <schlegel at riege.com>
>
> Hello,
>
> I updated one node from 5.2 to 5.3 using yum update and
> now cman does not start up anymore -- looks like ccsd
> has some problems:
>
> [root at motel6 /]# /sbin/ccsd -4 -n
> Starting ccsd 2.0.98:
> Built: Dec 3 2008 16:32:30
> Copyright (C) Red Hat, Inc. 2004 All rights reserved.
> IP Protocol:: IPv4 only
> No Daemon:: SET
>
> Cluster is not quorate. Refusing connection.
> Error while processing connect: Connection refused
> Cluster is not quorate. Refusing connection.
> Error while processing connect: Connection refused
> Unable to connect to cluster infrastructure after 30
> seconds.
> Unable to connect to cluster infrastructure after 60
> seconds.
>
>
> When starting ccsd using /etc/init.d/cman it reports all
> three nodes to be on cluster.conf version 78, so I guess
> it is not a network connectivity problem.
>
> The other two nodes (still on 5.2z) of the cluster are
> up and running with quorum. Openais is talking to those
> 2 other nodes and it looks fine to me:
>
> Jan 27 21:05:26 motel6 openais[1278]: [CLM ] Members
> Joined:
> Jan 27 21:05:26 motel6 openais[1278]: [CLM ] #011r(0)
> ip(10.11.5.22)
> Jan 27 21:05:26 motel6 openais[1278]: [CLM ] #011r(0)
> ip(10.11.5.23)
> Jan 27 21:05:26 motel6 openais[1278]: [SYNC ] This node
> is within the primary component and will provide service.
> Jan 27 21:05:26 motel6 openais[1278]: [TOTEM] entering
> OPERATIONAL state.
> Jan 27 21:05:26 motel6 openais[1278]: [CMAN ] quorum
> regained, resuming activity
> Jan 27 21:05:26 motel6 openais[1278]: [CLM ] got
> nodejoin message 10.11.5.21
> Jan 27 21:05:26 motel6 openais[1278]: [CLM ] got
> nodejoin message 10.11.5.22
> Jan 27 21:05:26 motel6 openais[1278]: [CLM ] got
> nodejoin message 10.11.5.23
>
>
> I am a bit lost...
>
> cluster.conf:
> [root at motel6 init.d]# cat /etc/cluster/cluster.conf
> <?xml version="1.0"?>
> <cluster alias="RSIXENCluster2" config_version="87"
> name="RSIXENCluster2">
> <fence_daemon clean_start="0" post_fail_delay="0"
> post_join_delay="3"/>
> <clusternodes>
> <clusternode name="concorde.riege.de" nodeid="1" votes="1">
> <fence>
> <method name="1">
> <device
> name="Concorde_IPMI"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="motel6.riege.de" nodeid="2" votes="1">
> <fence>
> <method name="1">
> <device
> name="Motel6_IPMI"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="mercure.riege.de" nodeid="3" votes="1">
> <fence>
> <method name="1">
> <device
> name="Mercure_IPMI"/>
> </method>
> </fence>
> </clusternode>
> </clusternodes>
> <fencedevices>
> <fencedevice agent="fence_ipmilan"
> ipaddr="10.11.5.132" login="root" name="Concorde_IPMI"
> passwd="XXX"/>
> <fencedevice agent="fence_ipmilan"
> ipaddr="10.11.5.131" login="root" name="Motel6_IPMI"
> passwd="xxx"/>
> <fencedevice agent="fence_ipmilan"
> ipaddr="10.11.5.133" login="root" name="Mercure_IPMI"
> passwd="XXX"/>
> </fencedevices>
> <rm>
> <failoverdomains>
> <failoverdomain name="Earth"
> nofailback="1" ordered="1" restricted="1">
> <failoverdomainnode name="concorde.riege.de" priority="1"/>
> <failoverdomainnode name="motel6.riege.de" priority="1"/>
> <failoverdomainnode name="mercure.riege.de" priority="1"/>
> </failoverdomain>
> <failoverdomain name="Europe"
> nofailback="0" ordered="1" restricted="0">
> <failoverdomainnode name="concorde.riege.de" priority="2"/>
> </failoverdomain>
> <failoverdomain name="North America"
> nofailback="0" ordered="1" restricted="0">
> <failoverdomainnode name="motel6.riege.de" priority="2"/>
> </failoverdomain>
> <failoverdomain name="Africa"
> nofailback="0" ordered="1" restricted="0">
> <failoverdomainnode name="mercure.riege.de" priority="1"/>
> </failoverdomain>
> </failoverdomains>
> <resources/>
> <vm autostart="1" domain="Africa"
> exclusive="0" migrate="live"
> name="vm64.test.riege.de_64" path="/etc/xen"
> recovery="restart"/>
> <vm autostart="1" domain="North America"
> exclusive="0" migrate="pause" name="rt.test.riege.de_32"
> path="/etc/xen" recovery="restart"/>
> <vm autostart="1" domain="Africa"
> exclusive="0" migrate="pause"
> name="poincare.riege.de_32" path="/etc/xen"
> recovery="restart"/>
> <vm autostart="1" domain="North America"
> exclusive="0" migrate="live"
> name="jboss.dev.riege.de_64" path="/etc/xen"
> recovery="relocate"/>
> <vm autostart="1" domain="Africa"
> exclusive="0" migrate="live"
> name="master.cc3.dev.riege.de_64" path="/etc/xen"
> recovery="relocate"/>
> <vm autostart="1" domain="Europe"
> exclusive="0" migrate="pause"
> name="test.alphatrans.scope.riege.com_32"
> path="/etc/xen" recovery="relocate"/>
> <vm autostart="1" domain="North America"
> exclusive="0" migrate="live"
> name="slave.cc3.dev.riege.de_64" path="/etc/xen"
> recovery="restart"/>
> <vm autostart="1" domain="North America"
> exclusive="0" migrate="live" name="webmail.riege.com_64"
> path="/etc/xen" recovery="relocate"/>
> <vm autostart="1" domain="Europe"
> exclusive="0" migrate="live"
> name="live.rsi.scope.riege.com_64" path="/etc/xen"
> recovery="relocate"/>
> <vm autostart="1" domain="Europe"
> exclusive="0" migrate="pause"
> name="qa-16.rsi.scope.riege.com_32" path="/etc/xen"
> recovery="relocate"/>
> <vm autostart="1" domain="Africa"
> exclusive="0" migrate="pause"
> name="qa-18.rsi.scope.riege.com_32" path="/etc/xen"
> recovery="relocate"/>
> <vm autostart="1" domain="Africa"
> exclusive="0" migrate="pause"
> name="vm32.test.riege.de_32" path="/etc/xen"
> recovery="restart"/>
> <vm autostart="1" domain="Europe"
> exclusive="0" migrate="pause"
> name="qa-head.rsi.scope.riege.com_32" path="/etc/xen"
> recovery="restart"/>
> <vm autostart="1" domain="North America"
> exclusive="0" migrate="live" name="mq.dev.riege.de_64"
> path="/etc/xen" recovery="relocate"/>
> <vm autostart="1" domain="Europe"
> exclusive="0" migrate="live"
> name="archive.dev.riege.de_64" path="/etc/xen"
> recovery="restart"/>
> </rm>
> <cman quorum_dev_poll="50000"/>
> <totem consensus="4800" join="60" token="60000"
> token_retransmits_before_loss_const="20"/>
> <quorumd device="/dev/mapper/Quorum_Partition"
> interval="3" min_score="1" tko="10" votes="2"/>
> </cluster>
>
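For readers following the "not quorate" messages, here is a small Python sketch of the vote arithmetic implied by the cluster.conf above. The XML is a trimmed copy that keeps only the elements carrying votes. The threshold formula (expected votes, integer-divided by two, plus one) is assumed here to be cman's default and is not confirmed anywhere in this thread; vote_summary is a hypothetical helper, not part of any cluster tool.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf above: only the vote-carrying elements.
CLUSTER_CONF = """
<cluster alias="RSIXENCluster2" config_version="87" name="RSIXENCluster2">
  <clusternodes>
    <clusternode name="concorde.riege.de" nodeid="1" votes="1"/>
    <clusternode name="motel6.riege.de" nodeid="2" votes="1"/>
    <clusternode name="mercure.riege.de" nodeid="3" votes="1"/>
  </clusternodes>
  <quorumd device="/dev/mapper/Quorum_Partition" interval="3"
           min_score="1" tko="10" votes="2"/>
</cluster>
"""


def vote_summary(conf_text):
    """Return (node_votes, qdisk_votes, expected_votes, quorum_threshold)."""
    root = ET.fromstring(conf_text)
    node_votes = {
        node.get("name"): int(node.get("votes", "1"))
        for node in root.iter("clusternode")
    }
    qdisk = root.find("quorumd")
    # Fallback of 0 for a missing attribute is a choice made for this sketch.
    qdisk_votes = int(qdisk.get("votes", "0")) if qdisk is not None else 0
    expected = sum(node_votes.values()) + qdisk_votes
    # ASSUMPTION: quorum threshold = floor(expected / 2) + 1
    quorum = expected // 2 + 1
    return node_votes, qdisk_votes, expected, quorum


node_votes, qdisk_votes, expected, quorum = vote_summary(CLUSTER_CONF)
print("node votes:", node_votes)
print("quorum disk votes:", qdisk_votes)
print("expected votes:", expected)
print("quorum threshold (assumed formula):", quorum)
```

With three node votes and two quorum-disk votes, this gives five expected votes and, under the assumed formula, a threshold of three.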
> best regards, Gunther
>
> --
> .............................................................
> Riege Software International GmbH Fon: +49 (2159) 9148 0
> Mollsfeld 10 Fax: +49 (2159) 9148 11
> 40670 Meerbusch Web: www.riege.com
> Germany E-Mail: schlegel at riege.com
> --- ---
> Handelsregister: Managing Directors:
> Amtsgericht Neuss HRB-NR 4207 Christian Riege
> USt-ID-Nr.: DE120585842 Gabriele Riege
> Johannes Riege
> .............................................................
> YOU CARE FOR FREIGHT, WE CARE FOR YOU
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>
>
> --
> Alan A.
> --
> Dave Costakos
> david.costakos at gmail.com
--
Gunther Schlegel
Manager IT Infrastructure