[Linux-cluster] cman startup after update to 5.3

Dave Costakos david.costakos at gmail.com
Wed Jan 28 23:26:45 UTC 2009


Confirmed.  Same here.  This still seems like a bug to me, though.  I would hope
we have the ability to do rolling upgrades of openais in our RHEL clusters.

2009/1/28 Alan A <alan.zg at gmail.com>

> Rolling back to the previous openais package allowed me to restart cman:
> from openais-0.80.3-22.el5 to openais-0.80.3-15.el5.
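
For anyone scripting this check across several nodes, here is a rough Python
sketch that labels an openais build string against the two builds named in
this thread. The function names (parse_nvr, release_number, classify) are made
up for illustration, and the input is assumed to look like what
`rpm -q openais` prints. Any build other than -15 and -22 is reported as
untested rather than guessed at.

```python
import re

# The only two builds this thread reports on.
KNOWN_WORKING_BUILD = 15   # openais-0.80.3-15.el5 (RHEL 5.2)
KNOWN_FAILING_BUILD = 22   # openais-0.80.3-22.el5 (RHEL 5.3)
AFFECTED_VERSION = "0.80.3"


def parse_nvr(package):
    """Split an RPM-style 'name-version-release' string into its three parts."""
    name, version, release = package.rsplit("-", 2)
    return name, version, release


def release_number(release):
    """Return the leading integer of a release such as '22.el5' or '22el5'."""
    match = re.match(r"\d+", release)
    if match is None:
        raise ValueError("no leading build number in release %r" % release)
    return int(match.group())


def classify(installed):
    """Label an installed build relative to the two builds reported here."""
    name, version, release = parse_nvr(installed)
    if name != "openais" or version != AFFECTED_VERSION:
        return "unknown"
    build = release_number(release)
    if build == KNOWN_WORKING_BUILD:
        return "reported working"
    if build == KNOWN_FAILING_BUILD:
        return "reported failing"
    return "untested in this thread"


if __name__ == "__main__":
    for pkg in ("openais-0.80.3-15.el5", "openais-0.80.3-22.el5"):
        print(pkg, "->", classify(pkg))
```
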
>
>
> 2009/1/28 Dave Costakos <david.costakos at gmail.com>
>
>> Like you, I've run into this same issue.  I have 2 clusters that I'm trying
>> to update in our lab.  On one, I only updated the cman and rgmanager
>> packages: this update was successful.  On another I did a full update to 5.3
>> and ran into what appears to be this same problem.  I've noticed that
>> manually attempting to start cman via 'cman_tool -d join' prints out this
>> message right before cman fails:
>>
>> aisexec: ckpt.c:3961: message_handler_req_exec_ckpt_sync_checkpoint_refcount:Assertion `checkpoint != ((void *)0)' failed
>>
>>
>>
>> I suspect an openais issue.  Would someone be able to confirm that?
>>
>> Also, I'm going to try downgrading openais back to the version from RHEL 5.2 to see if that fixes it (though I won't get to that until the end of today).  If that works, I'll report back.
>>
>>
>>
>> 2009/1/27 Alan A <alan.zg at gmail.com>
>>
>>> I just opened RHEL case number 1890184 regarding the same issue. First the
>>> kernel would not start due to the HP iLO driver conflict, and at the same
>>> time CMAN broke and fencing now fails. I rolled back the cman rpm to the
>>> previous version but the problem persists. Something else must have changed
>>> that keeps CMAN from starting.
>>>
>>> 2009/1/27 Gunther Schlegel <schlegel at riege.com>
>>>
>>>>  Hello,
>>>>
>>>> I updated one node from 5.2 to 5.3 using yum update and now cman does
>>>> not start up anymore -- looks like ccsd has some problems:
>>>>
>>>> [root at motel6 /]# /sbin/ccsd -4 -n
>>>> Starting ccsd 2.0.98:
>>>>  Built: Dec  3 2008 16:32:30
>>>>  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
>>>>  IP Protocol:: IPv4 only
>>>>  No Daemon:: SET
>>>>
>>>> Cluster is not quorate.  Refusing connection.
>>>> Error while processing connect: Connection refused
>>>> Cluster is not quorate.  Refusing connection.
>>>> Error while processing connect: Connection refused
>>>> Unable to connect to cluster infrastructure after 30 seconds.
>>>> Unable to connect to cluster infrastructure after 60 seconds.
>>>>
>>>>
>>>> When starting ccsd using /etc/init.d/cman it reports all three nodes to
>>>> be on cluster.conf version 78, so I guess it is not a network connectivity
>>>> problem.
>>>>
>>>> The other two nodes (still on 5.2z) of the cluster are up and running
>>>> with quorum. Openais is talking to those 2 other nodes and it looks fine to
>>>> me:
>>>>
>>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] Members Joined:
>>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] #011r(0) ip(10.11.5.22)
>>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] #011r(0) ip(10.11.5.23)
>>>> Jan 27 21:05:26 motel6 openais[1278]: [SYNC ] This node is within the primary component and will provide service.
>>>> Jan 27 21:05:26 motel6 openais[1278]: [TOTEM] entering OPERATIONAL state.
>>>> Jan 27 21:05:26 motel6 openais[1278]: [CMAN ] quorum regained, resuming activity
>>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got nodejoin message 10.11.5.21
>>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got nodejoin message 10.11.5.22
>>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got nodejoin message 10.11.5.23
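
To confirm the same membership picture on other nodes without reading the log
by eye, a small Python sketch like the one below can pull the nodejoin
addresses and the quorum message out of syslog lines. The regular expressions
are written against the log excerpt quoted here only, so treat them as
assumptions about the openais message format, and the function name
summarize is hypothetical.

```python
import re

# Patterns modelled on the openais syslog lines quoted in this thread.
NODEJOIN_RE = re.compile(r"\[CLM\s*\] got nodejoin message (\d+\.\d+\.\d+\.\d+)")
QUORUM_RE = re.compile(r"\[CMAN\s*\] quorum regained")

SAMPLE = [
    "Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] Members Joined:",
    "Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] #011r(0) ip(10.11.5.22)",
    "Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] #011r(0) ip(10.11.5.23)",
    "Jan 27 21:05:26 motel6 openais[1278]: [CMAN ] quorum regained, resuming activity",
    "Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got nodejoin message 10.11.5.21",
    "Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got nodejoin message 10.11.5.22",
    "Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got nodejoin message 10.11.5.23",
]


def summarize(lines):
    """Return (addresses that sent a nodejoin, whether quorum was regained)."""
    joined = []
    quorate = False
    for line in lines:
        match = NODEJOIN_RE.search(line)
        if match:
            joined.append(match.group(1))
        elif QUORUM_RE.search(line):
            quorate = True
    return joined, quorate


if __name__ == "__main__":
    addresses, quorate = summarize(SAMPLE)
    print("nodejoin from:", ", ".join(addresses))
    print("quorum regained:", quorate)
```
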
>>>>
>>>>
>>>> I am a bit lost...
>>>>
>>>> cluster.conf:
>>>> [root at motel6 init.d]# cat /etc/cluster/cluster.conf
>>>> <?xml version="1.0"?>
>>>> <cluster alias="RSIXENCluster2" config_version="87"
>>>> name="RSIXENCluster2">
>>>>        <fence_daemon clean_start="0" post_fail_delay="0"
>>>> post_join_delay="3"/>
>>>>        <clusternodes>
>>>>                <clusternode name="concorde.riege.de" nodeid="1"
>>>> votes="1">
>>>>                        <fence>
>>>>                                <method name="1">
>>>>                                        <device name="Concorde_IPMI"/>
>>>>                                </method>
>>>>                        </fence>
>>>>                </clusternode>
>>>>                <clusternode name="motel6.riege.de" nodeid="2"
>>>> votes="1">
>>>>                        <fence>
>>>>                                <method name="1">
>>>>                                        <device name="Motel6_IPMI"/>
>>>>                                </method>
>>>>                        </fence>
>>>>                </clusternode>
>>>>                <clusternode name="mercure.riege.de" nodeid="3"
>>>> votes="1">
>>>>                        <fence>
>>>>                                <method name="1">
>>>>                                        <device name="Mercure_IPMI"/>
>>>>                                </method>
>>>>                        </fence>
>>>>                </clusternode>
>>>>        </clusternodes>
>>>>        <fencedevices>
>>>>                <fencedevice agent="fence_ipmilan" ipaddr="10.11.5.132"
>>>> login="root" name="Concorde_IPMI" passwd="XXX"/>
>>>>                <fencedevice agent="fence_ipmilan" ipaddr="10.11.5.131"
>>>> login="root" name="Motel6_IPMI" passwd="xxx"/>
>>>>                <fencedevice agent="fence_ipmilan" ipaddr="10.11.5.133"
>>>> login="root" name="Mercure_IPMI" passwd="XXX"/>
>>>>        </fencedevices>
>>>>        <rm>
>>>>                <failoverdomains>
>>>>                        <failoverdomain name="Earth" nofailback="1"
>>>> ordered="1" restricted="1">
>>>>                                <failoverdomainnode name="concorde.riege.de" priority="1"/>
>>>>                                <failoverdomainnode name="motel6.riege.de" priority="1"/>
>>>>                                <failoverdomainnode name="mercure.riege.de" priority="1"/>
>>>>                        </failoverdomain>
>>>>                        <failoverdomain name="Europe" nofailback="0"
>>>> ordered="1" restricted="0">
>>>>                                <failoverdomainnode name="concorde.riege.de" priority="2"/>
>>>>                        </failoverdomain>
>>>>                        <failoverdomain name="North America"
>>>> nofailback="0" ordered="1" restricted="0">
>>>>                                <failoverdomainnode name="motel6.riege.de" priority="2"/>
>>>>                        </failoverdomain>
>>>>                        <failoverdomain name="Africa" nofailback="0"
>>>> ordered="1" restricted="0">
>>>>                                <failoverdomainnode name="mercure.riege.de" priority="1"/>
>>>>                        </failoverdomain>
>>>>                </failoverdomains>
>>>>                <resources/>
>>>>                <vm autostart="1" domain="Africa" exclusive="0"
>>>> migrate="live" name="vm64.test.riege.de_64" path="/etc/xen"
>>>> recovery="restart"/>
>>>>                <vm autostart="1" domain="North America" exclusive="0"
>>>> migrate="pause" name="rt.test.riege.de_32" path="/etc/xen"
>>>> recovery="restart"/>
>>>>                <vm autostart="1" domain="Africa" exclusive="0"
>>>> migrate="pause" name="poincare.riege.de_32" path="/etc/xen"
>>>> recovery="restart"/>
>>>>                <vm autostart="1" domain="North America" exclusive="0"
>>>> migrate="live" name="jboss.dev.riege.de_64" path="/etc/xen"
>>>> recovery="relocate"/>
>>>>                <vm autostart="1" domain="Africa" exclusive="0"
>>>> migrate="live" name="master.cc3.dev.riege.de_64" path="/etc/xen"
>>>> recovery="relocate"/>
>>>>                <vm autostart="1" domain="Europe" exclusive="0"
>>>> migrate="pause" name="test.alphatrans.scope.riege.com_32" path="/etc/xen"
>>>> recovery="relocate"/>
>>>>                <vm autostart="1" domain="North America" exclusive="0"
>>>> migrate="live" name="slave.cc3.dev.riege.de_64" path="/etc/xen"
>>>> recovery="restart"/>
>>>>                <vm autostart="1" domain="North America" exclusive="0"
>>>> migrate="live" name="webmail.riege.com_64" path="/etc/xen"
>>>> recovery="relocate"/>
>>>>                <vm autostart="1" domain="Europe" exclusive="0"
>>>> migrate="live" name="live.rsi.scope.riege.com_64" path="/etc/xen"
>>>> recovery="relocate"/>
>>>>                <vm autostart="1" domain="Europe" exclusive="0"
>>>> migrate="pause" name="qa-16.rsi.scope.riege.com_32" path="/etc/xen"
>>>> recovery="relocate"/>
>>>>                <vm autostart="1" domain="Africa" exclusive="0"
>>>> migrate="pause" name="qa-18.rsi.scope.riege.com_32" path="/etc/xen"
>>>> recovery="relocate"/>
>>>>                <vm autostart="1" domain="Africa" exclusive="0"
>>>> migrate="pause" name="vm32.test.riege.de_32" path="/etc/xen"
>>>> recovery="restart"/>
>>>>                <vm autostart="1" domain="Europe" exclusive="0"
>>>> migrate="pause" name="qa-head.rsi.scope.riege.com_32" path="/etc/xen"
>>>> recovery="restart"/>
>>>>                <vm autostart="1" domain="North America" exclusive="0"
>>>> migrate="live" name="mq.dev.riege.de_64" path="/etc/xen"
>>>> recovery="relocate"/>
>>>>                <vm autostart="1" domain="Europe" exclusive="0"
>>>> migrate="live" name="archive.dev.riege.de_64" path="/etc/xen"
>>>> recovery="restart"/>
>>>>        </rm>
>>>>        <cman quorum_dev_poll="50000"/>
>>>>        <totem consensus="4800" join="60" token="60000"
>>>> token_retransmits_before_loss_const="20"/>
>>>>        <quorumd device="/dev/mapper/Quorum_Partition" interval="3"
>>>> min_score="1" tko="10" votes="2"/>
>>>> </cluster>
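
As a sanity check on the vote arithmetic in this cluster.conf, here is a
minimal Python sketch. It assumes cman's usual majority rule
(quorum = expected_votes // 2 + 1) and that the quorumd votes are added to the
node votes. The XML below is a trimmed copy that keeps only the vote-related
attributes from the config above, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed from the posted cluster.conf: only the elements that carry votes.
CLUSTER_CONF = """
<cluster alias="RSIXENCluster2" config_version="87" name="RSIXENCluster2">
  <clusternodes>
    <clusternode name="concorde.riege.de" nodeid="1" votes="1"/>
    <clusternode name="motel6.riege.de" nodeid="2" votes="1"/>
    <clusternode name="mercure.riege.de" nodeid="3" votes="1"/>
  </clusternodes>
  <quorumd device="/dev/mapper/Quorum_Partition" interval="3"
           min_score="1" tko="10" votes="2"/>
</cluster>
"""


def expected_votes(conf_xml):
    """Sum the node votes and the quorum disk votes found in a cluster.conf."""
    root = ET.fromstring(conf_xml.strip())
    # A clusternode without a votes attribute is assumed to count as 1 vote.
    node_votes = sum(int(n.get("votes", "1")) for n in root.iter("clusternode"))
    qdisk = root.find("quorumd")
    # A missing quorumd element (or votes attribute) is assumed to add nothing.
    qdisk_votes = int(qdisk.get("votes", "0")) if qdisk is not None else 0
    return node_votes + qdisk_votes


def quorum_threshold(votes):
    """Simple majority: more than half of the expected votes."""
    return votes // 2 + 1


if __name__ == "__main__":
    total = expected_votes(CLUSTER_CONF)
    print("expected votes:", total)
    print("votes needed for quorum:", quorum_threshold(total))
```

Under those assumptions, three 1-vote nodes plus a 2-vote quorum disk give 5
expected votes and a quorum of 3, so a single node on its own cannot be
quorate.
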
>>>>
>>>> best regards, Gunther
>>>>
>>>> --
>>>> .............................................................
>>>> Riege Software International GmbH  Fon: +49 (2159) 9148 0
>>>> Mollsfeld 10                       Fax: +49 (2159) 9148 11
>>>> 40670 Meerbusch                    Web: www.riege.com
>>>> Germany                            E-Mail: schlegel at riege.com
>>>> ---                                ---
>>>> Handelsregister:                   Managing Directors:
>>>> Amtsgericht Neuss HRB-NR 4207      Christian Riege
>>>> USt-ID-Nr.: DE120585842            Gabriele  Riege
>>>>                                  Johannes  Riege
>>>> .............................................................
>>>>          YOU CARE FOR FREIGHT, WE CARE FOR YOU
>>>>
>>>>
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>
>>>
>>>
>>>
>>> --
>>> Alan A.
>>>
>>>
>>
>>
>>
>> --
>> Dave Costakos
>> mailto:david.costakos at gmail.com
>>
>>
>
>
>
> --
> Alan A.
>
>



-- 
Dave Costakos
mailto:david.costakos at gmail.com