[Linux-cluster] cman startup after update to 5.3

Gunther Schlegel schlegel at riege.com
Fri Jan 30 13:56:53 UTC 2009


Rolling back to openais-0.80.3-15.el5 worked for me as well.

Still, this is a 5.3 update blocker, as it prevents rolling upgrades
-- and that is why you run a cluster, isn't it?

I also have no clue whether a native 5.3 / openais-0.80.3-22.el5
system will work. Can anyone confirm this?
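For reference, a rough sketch of the rollback sequence (the openais versions are the ones named in this thread; the exact commands are illustrative, not a tested procedure -- review before running on a node). With DRY_RUN=1, the default, it only prints each command:

```shell
# Sketch of the rollback discussed in this thread.  DRY_RUN=1 (default)
# only echoes the commands so the sequence can be reviewed first.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

run service cman stop                        # stop the cluster stack on this node
run yum -y downgrade openais-0.80.3-15.el5   # back to the RHEL 5.2 build
run service cman start                       # rejoin the cluster
run cman_tool status                         # confirm membership and quorum
```

Unset DRY_RUN to actually execute the commands, one node at a time.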


regards, Gunther


Dave Costakos wrote:
> Confirmed.  Same here.  It still seems like a bug to me, though.  I would
> hope we have the ability to do rolling upgrades of openais in our RHEL
> clusters.
> 
> 2009/1/28 Alan A <alan.zg at gmail.com>
> 
>     Rolling back to the previous openais package allowed me to restart
>     cman: from openais-0.80.3-22.el5 to openais-0.80.3-15.el5.
> 
> 
>     2009/1/28 Dave Costakos <david.costakos at gmail.com>
> 
>         Like you, I've run into this same issue.  I have 2 clusters that
>         I'm trying to update in our lab.  On one, I only updated the
>         cman and rgmanager packages: this update was successful.  On
>         another I did a full update to 5.3 and ran into what appears to
>         be this same problem.  I've noticed that manually attempting to
>         start cman via 'cman_tool -d join' prints out this message right
>         before cman fails.
> 
>         aisexec: ckpt.c:3961: message_handler_req_exec_ckpt_sync_checkpoint_refcount:Assertion `checkpoint != ((void *)0)' failed
> 
>         I suspect an openais issue; would someone be able to confirm that?
> 
>         Also, I'm going to try downgrading openais back to the version from RHEL 5.2 to see if that fixes it (though I won't get to that until the end of today).  If that works, I'll report back.
> 
>         2009/1/27 Alan A <alan.zg at gmail.com>
> 
>             I just opened RHEL case number 1890184 regarding the same
>             issue. First, the kernel would not start due to an HP iLO
>             driver conflict, but at the same time cman broke, and
>             fencing fails. I rolled back the cman rpm to the previous
>             version, but the problem persists. Something else must have
>             changed that keeps cman from starting.
> 
>             2009/1/27 Gunther Schlegel <schlegel at riege.com>
> 
>                 Hello,
> 
>                 I updated one node from 5.2 to 5.3 using yum update and
>                 now cman does not start up anymore -- looks like ccsd
>                 has some problems:
> 
>                 [root at motel6 /]# /sbin/ccsd -4 -n
>                 Starting ccsd 2.0.98:
>                  Built: Dec  3 2008 16:32:30
>                  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
>                  IP Protocol:: IPv4 only
>                  No Daemon:: SET
> 
>                 Cluster is not quorate.  Refusing connection.
>                 Error while processing connect: Connection refused
>                 Cluster is not quorate.  Refusing connection.
>                 Error while processing connect: Connection refused
>                 Unable to connect to cluster infrastructure after 30
>                 seconds.
>                 Unable to connect to cluster infrastructure after 60
>                 seconds.
> 
> 
>                 When starting ccsd using /etc/init.d/cman it reports all
>                 three nodes to be on cluster.conf version 78, so I guess
>                 it is not a network connectivity problem.
> 
>                 The other two nodes of the cluster (still on 5.2.z) are
>                 up and running with quorum. openais is talking to those
>                 two other nodes and it looks fine to me:
> 
>                 Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] Members
>                 Joined:
>                 Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] #011r(0)
>                 ip(10.11.5.22)
>                 Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] #011r(0)
>                 ip(10.11.5.23)
>                 Jan 27 21:05:26 motel6 openais[1278]: [SYNC ] This node
>                 is within the primary component and will provide service.
>                 Jan 27 21:05:26 motel6 openais[1278]: [TOTEM] entering
>                 OPERATIONAL state.
>                 Jan 27 21:05:26 motel6 openais[1278]: [CMAN ] quorum
>                 regained, resuming activity
>                 Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got
>                 nodejoin message 10.11.5.21
>                 Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got
>                 nodejoin message 10.11.5.22
>                 Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got
>                 nodejoin message 10.11.5.23
> 
> 
>                 I am a bit lost...
> 
>                 cluster.conf:
>                 [root at motel6 init.d]# cat /etc/cluster/cluster.conf
>                 <?xml version="1.0"?>
>                 <cluster alias="RSIXENCluster2" config_version="87"
>                 name="RSIXENCluster2">
>                        <fence_daemon clean_start="0" post_fail_delay="0"
>                 post_join_delay="3"/>
>                        <clusternodes>
>                                <clusternode name="concorde.riege.de" nodeid="1" votes="1">
>                                        <fence>
>                                                <method name="1">
>                                                        <device
>                 name="Concorde_IPMI"/>
>                                                </method>
>                                        </fence>
>                                </clusternode>
>                                <clusternode name="motel6.riege.de" nodeid="2" votes="1">
>                                        <fence>
>                                                <method name="1">
>                                                        <device
>                 name="Motel6_IPMI"/>
>                                                </method>
>                                        </fence>
>                                </clusternode>
>                                <clusternode name="mercure.riege.de" nodeid="3" votes="1">
>                                        <fence>
>                                                <method name="1">
>                                                        <device
>                 name="Mercure_IPMI"/>
>                                                </method>
>                                        </fence>
>                                </clusternode>
>                        </clusternodes>
>                        <fencedevices>
>                                <fencedevice agent="fence_ipmilan"
>                 ipaddr="10.11.5.132" login="root" name="Concorde_IPMI"
>                 passwd="XXX"/>
>                                <fencedevice agent="fence_ipmilan"
>                 ipaddr="10.11.5.131" login="root" name="Motel6_IPMI"
>                 passwd="xxx"/>
>                                <fencedevice agent="fence_ipmilan"
>                 ipaddr="10.11.5.133" login="root" name="Mercure_IPMI"
>                 passwd="XXX"/>
>                        </fencedevices>
>                        <rm>
>                                <failoverdomains>
>                                        <failoverdomain name="Earth"
>                 nofailback="1" ordered="1" restricted="1">
>                                                <failoverdomainnode name="concorde.riege.de" priority="1"/>
>                                                <failoverdomainnode name="motel6.riege.de" priority="1"/>
>                                                <failoverdomainnode name="mercure.riege.de" priority="1"/>
>                                        </failoverdomain>
>                                        <failoverdomain name="Europe"
>                 nofailback="0" ordered="1" restricted="0">
>                                                <failoverdomainnode name="concorde.riege.de" priority="2"/>
>                                        </failoverdomain>
>                                        <failoverdomain name="North
>                 America" nofailback="0" ordered="1" restricted="0">
>                                                <failoverdomainnode name="motel6.riege.de" priority="2"/>
>                                        </failoverdomain>
>                                        <failoverdomain name="Africa"
>                 nofailback="0" ordered="1" restricted="0">
>                                                <failoverdomainnode name="mercure.riege.de" priority="1"/>
>                                        </failoverdomain>
>                                </failoverdomains>
>                                <resources/>
>                                <vm autostart="1" domain="Africa"
>                 exclusive="0" migrate="live"
>                 name="vm64.test.riege.de_64" path="/etc/xen"
>                 recovery="restart"/>
>                                <vm autostart="1" domain="North America"
>                 exclusive="0" migrate="pause" name="rt.test.riege.de_32"
>                 path="/etc/xen" recovery="restart"/>
>                                <vm autostart="1" domain="Africa"
>                 exclusive="0" migrate="pause"
>                 name="poincare.riege.de_32" path="/etc/xen"
>                 recovery="restart"/>
>                                <vm autostart="1" domain="North America"
>                 exclusive="0" migrate="live"
>                 name="jboss.dev.riege.de_64" path="/etc/xen"
>                 recovery="relocate"/>
>                                <vm autostart="1" domain="Africa"
>                 exclusive="0" migrate="live"
>                 name="master.cc3.dev.riege.de_64" path="/etc/xen"
>                 recovery="relocate"/>
>                                <vm autostart="1" domain="Europe"
>                 exclusive="0" migrate="pause"
>                 name="test.alphatrans.scope.riege.com_32"
>                 path="/etc/xen" recovery="relocate"/>
>                                <vm autostart="1" domain="North America"
>                 exclusive="0" migrate="live"
>                 name="slave.cc3.dev.riege.de_64" path="/etc/xen"
>                 recovery="restart"/>
>                                <vm autostart="1" domain="North America"
>                 exclusive="0" migrate="live" name="webmail.riege.com_64"
>                 path="/etc/xen" recovery="relocate"/>
>                                <vm autostart="1" domain="Europe"
>                 exclusive="0" migrate="live"
>                 name="live.rsi.scope.riege.com_64" path="/etc/xen"
>                 recovery="relocate"/>
>                                <vm autostart="1" domain="Europe"
>                 exclusive="0" migrate="pause"
>                 name="qa-16.rsi.scope.riege.com_32" path="/etc/xen"
>                 recovery="relocate"/>
>                                <vm autostart="1" domain="Africa"
>                 exclusive="0" migrate="pause"
>                 name="qa-18.rsi.scope.riege.com_32" path="/etc/xen"
>                 recovery="relocate"/>
>                                <vm autostart="1" domain="Africa"
>                 exclusive="0" migrate="pause"
>                 name="vm32.test.riege.de_32" path="/etc/xen"
>                 recovery="restart"/>
>                                <vm autostart="1" domain="Europe"
>                 exclusive="0" migrate="pause"
>                 name="qa-head.rsi.scope.riege.com_32" path="/etc/xen"
>                 recovery="restart"/>
>                                <vm autostart="1" domain="North America"
>                 exclusive="0" migrate="live" name="mq.dev.riege.de_64"
>                 path="/etc/xen" recovery="relocate"/>
>                                <vm autostart="1" domain="Europe"
>                 exclusive="0" migrate="live"
>                 name="archive.dev.riege.de_64" path="/etc/xen"
>                 recovery="restart"/>
>                        </rm>
>                        <cman quorum_dev_poll="50000"/>
>                        <totem consensus="4800" join="60" token="60000"
>                 token_retransmits_before_loss_const="20"/>
>                        <quorumd device="/dev/mapper/Quorum_Partition"
>                 interval="3" min_score="1" tko="10" votes="2"/>
>                 </cluster>
> 
>                 best regards, Gunther
> 
> 
> 
> 
>                 --
>                 Linux-cluster mailing list
>                 Linux-cluster at redhat.com
>                 https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> 
> 
>             -- 
>             Alan A.
> 
> 
> 
> 
> 
>         -- 
>         Dave Costakos
>         david.costakos at gmail.com
> 
> 
> 
> 
> 
>     -- 
>     Alan A.
> 
> 
> 
> 
> 
> -- 
> Dave Costakos
> david.costakos at gmail.com
> 
> 

-- 
Gunther Schlegel
Manager IT Infrastructure


.............................................................
Riege Software International GmbH  Fon: +49 (2159) 9148 0
Mollsfeld 10                       Fax: +49 (2159) 9148 11
40670 Meerbusch                    Web: www.riege.com
Germany                            E-Mail: schlegel at riege.com
---                                ---
Handelsregister:                   Managing Directors:
Amtsgericht Neuss HRB-NR 4207      Christian Riege
USt-ID-Nr.: DE120585842            Gabriele  Riege
                                   Johannes  Riege
.............................................................
           YOU CARE FOR FREIGHT, WE CARE FOR YOU          




