[Linux-cluster] Starting two-node cluster with only one node

Marc - A. Dahlhaus mad at wol.de
Sat Jul 18 14:02:13 UTC 2009


Hello,

since your cluster worked well on CentOS 5.2, the networking hardware 
components can't be the culprit in this case, but I still think that 
it is a cluster communication related problem.

It could be your iptables ruleset... Try to disable the firewall and 
check again...
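
A minimal sketch of both options, assuming the stock CentOS 5 init scripts. Stopping the firewall is the quick test. The helper below (cluster_fw_rules is a made-up name) only prints rules, so you can read them before applying them. The port numbers are an assumption taken from the RHEL 5 cluster port list (UDP 5404-5405 for cman/openais, TCP 21064 for dlm), and eth0 is a placeholder for your cluster interface.

```shell
# Print iptables rules that admit cluster traffic on one interface.
# Assumed ports (RHEL 5 cluster port list): UDP 5404-5405 for cman/openais,
# TCP 21064 for dlm. Nothing is applied here, the rules are only printed.
cluster_fw_rules() {
    iface="$1"
    echo "iptables -I INPUT -i $iface -p udp --dport 5404:5405 -j ACCEPT"
    echo "iptables -I INPUT -i $iface -p tcp --dport 21064 -j ACCEPT"
}

# Quick test, on both nodes:    service iptables stop
# Narrower alternative:         cluster_fw_rules eth0 | sh
```

If the cluster forms with the firewall stopped but not with it running, the ruleset is the place to look.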

You can also use tshark to check this, with something like:

tshark -i <interface the cluster is using> -f 'host <multicast IP the cluster is using>' -V | less
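
If you are unsure which multicast address to put into the filter, it can usually be read from `cman_tool status` on a node where cman has come up. The helper below is only a sketch: mcast_addr is a made-up name, and it assumes the status output contains a line of the form "Multicast addresses: <ip>".

```shell
# Extract the multicast address from `cman_tool status` output on stdin.
# Assumes a line of the form "Multicast addresses: 239.192.x.y".
mcast_addr() {
    awk '/^Multicast addresses:/ { print $3 }'
}

# Usage on a node where cman is running (eth0 is a placeholder):
#   tshark -i eth0 -f "host $(cman_tool status | mcast_addr)" -V | less
```

Run the capture on one node while starting cman on the other. If nothing from the remote node shows up, the traffic is being dropped somewhere in between.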

Have you checked that openais is still chkconfig off after your upgrade?
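
The reason to check: cman starts the openais daemon itself, so the separate openais init script should stay disabled. A small sketch of the check follows. enabled_anywhere is a made-up helper name, and it assumes the usual `chkconfig --list` line format with entries such as "3:on".

```shell
# Succeed (exit 0) if the chkconfig line on stdin shows the service
# switched on in at least one runlevel, e.g. "openais  0:off ... 3:on ...".
enabled_anywhere() {
    grep -Eq '[0-6]:on'
}

# Usage on each node:
#   chkconfig --list openais | enabled_anywhere && chkconfig openais off
```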

Abed-nego G. Escobal, Jr. wrote:
> Thanks for giving the pointers!
>
> uname -r on both nodes
>
> 2.6.18-128.1.16.el5
>
> on node01
>
> rpm -q cman gfs-utils kmod-gfs modcluster ricci luci cluster-snmp iscsi-initiator-utils lvm2-cluster openais oddjob rgmanager
> cman-2.0.98-2chrissie
> gfs-utils-0.1.18-1.el5
> kmod-gfs-0.1.23-5.el5_2.4
> kmod-gfs-0.1.31-3.el5
> modcluster-0.12.1-2.el5.centos
> ricci-0.12.1-7.3.el5.centos.1
> luci-0.12.1-7.3.el5.centos.1
> cluster-snmp-0.12.1-2.el5.centos
> iscsi-initiator-utils-6.2.0.868-0.18.el5_3.1
> lvm2-cluster-2.02.40-7.el5
> openais-0.80.3-22.el5_3.8
> oddjob-0.27-9.el5
> rgmanager-2.0.46-1.el5.centos.3
>
> on node02
>
> rpm -q cman gfs-utils kmod-gfs modcluster ricci luci cluster-snmp iscsi-initiator-utils lvm2-cluster openais oddjob rgmanager
> cman-2.0.98-2chrissie
> gfs-utils-0.1.18-1.el5
> kmod-gfs-0.1.31-3.el5
> modcluster-0.12.1-2.el5.centos
> ricci-0.12.1-7.3.el5.centos.1
> luci-0.12.1-7.3.el5.centos.1
> cluster-snmp-0.12.1-2.el5.centos
> iscsi-initiator-utils-6.2.0.868-0.18.el5_3.1
> lvm2-cluster-2.02.40-7.el5
> openais-0.80.3-22.el5_3.8
> oddjob-0.27-9.el5
> rgmanager-2.0.46-1.el5.centos.3
>
> I used http://knowledgelayer.softlayer.com/questions/443/GFS+howto to configure my cluster. When it was still on 5.2 the cluster worked, but after the recent update to 5.3, it broke.
>
> In one of the threads that I found in the archive, it states that there is a problem with the most current official version of cman, bug id 485026. I replaced the most current cman package with cman-2.0.98-2chrissie to test whether this was my problem. It seems not, so I will be moving back to the official package.
> I also found in another thread that openais was the culprit, so I changed it back to openais-0.80.3-15.el5, even though the change log indicates that a lot of bugs were fixed in the most current official package. After doing that, it still did not work. I tried clean_start="1" with caution: I unmounted the iSCSI volume and then started cman, but it still did not work. The most recent attempt was post_join_delay="-1". I had not noticed that there was a man page for fenced. This setting is much safer than clean_start="1", but it still did not fix the problem. The man pages that I have read over and over again are cman and cluster.conf. Some pages in the online manual are not suitable for my situation because I do not have X installed on the machines and those pages use system-config-cluster.
>
> As I understand from the online manual and FAQ, qdisk is not required if I have two_nodes="1", so I did not create one. I have removed the fence_daemon tag since I only used it for trying the solutions that were suggested. Each host is present in the other's hosts file with the correct IP.
>
>
> The ping results
>
> ping node02.company.com
>
> --- node01.company.com ping statistics ---
> 10 packets transmitted, 10 received, 0% packet loss, time 8999ms
> rtt min/avg/max/mdev = 0.010/0.016/0.034/0.007 ms
>
> ping node01.company.com
>
> --- node01.company.com ping statistics ---
> 10 packets transmitted, 10 received, 0% packet loss, time 9003ms
> rtt min/avg/max/mdev = 0.341/0.668/1.084/0.273 ms
>
> According to the people in the data center, the switch supports multicast communication on all ports that are used for cluster communication because they are in the same VLAN.
>
> As for the logs, I will be sending fresh logs as soon as possible. Currently I do not have a large enough time window to bring down the machine.
>
> As for Wireshark, I will be reading the man pages to learn how to use it.
>
> Please advise if any other information is needed to solve this. I am very grateful for the very detailed pointers. Thank you very much! 
>
>
> --- On Fri, 7/17/09, Marc - A. Dahlhaus [ Administration | Westermann GmbH ] <mad at wol.de> wrote:
>
>   
>> From: Marc - A. Dahlhaus [ Administration | Westermann GmbH ] <mad at wol.de>
>> Subject: Re: [Linux-cluster] Starting two-node cluster with only one node
>> To: "linux clustering" <linux-cluster at redhat.com>
>> Date: Friday, 17 July, 2009, 5:56 PM
>> Hello,
>>
>>
>> can you give us some hard facts on which versions of the cluster-suite
>> packages you are using in your environment, and also the related logs?
>>
>> Have you read the corresponding parts of the cluster suite's manual, man
>> pages and FAQ, and also searched the list archives for similar problems
>> already? If not -> do it, there are many good hints to find there.
>>
>>
>> The nodes find each other and create a cluster very fast IF they can
>> talk to each other. As no cluster networking is involved in fencing a
>> remote node if the fencing node by itself is quorate, this could be your
>> problem.
>>
>> You should change to fence_manual and switch back to your real fencing
>> devices after you have debugged your problem. Also get rid of the
>> <fence_daemon ... /> tag in your cluster.conf, as fenced does the right
>> thing by default if the remaining configuration is right, and now it is
>> just hiding a part of the problem.
>>
>> Also the 5 minute break on cman start smells like a DNS-lookup problem
>> or other network related problem to me.
>>
>> Here is a short check-list to be sure the nodes can talk to each other:
>>
>> Can the individual nodes ping each other?
>>
>> Can the individual nodes DNS-lookup the other node names (which you used
>> in your cluster.conf)? (Try to add them to your /etc/hosts file, that way
>> you have a working cluster even if your DNS system is going on
>> vacation.)
>>
>> Is your switch allowing multicast communication on all ports that are
>> used for cluster communication? (This is a prerequisite for openais /
>> corosync based cman, which would be anything >= RHEL 5. Search the
>> archives on this if you need more info...)
>>
>> Can you trace (e.g. with Wireshark's tshark) incoming cluster
>> communication from remote nodes? (If you haven't changed your fencing to
>> fence_manual, your listening system will get fenced before you can get
>> any useful information out of it. Try with and without an active
>> firewall.)
>>
>> If all of the above can be answered with "yes", your cluster should form
>> just fine. You could try to add a qdisk device as tiebreaker after that
>> and test it, just to be sure you have a working last-man-standing setup...
>>
>> Hope that helps,
>>
>> Marc
>>     
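
The ping and name-lookup items of the checklist quoted above can be scripted. The sketch below pulls the node names out of cluster.conf so that exactly the names the cluster uses get tested. cluster_node_names is a made-up helper name, and it assumes each <clusternode ... name="..."> tag sits on one line, as it does in typical hand-written or generated files.

```shell
# Print the name attribute of every <clusternode> element read from stdin.
# Assumes each <clusternode ...> tag is written on a single line.
cluster_node_names() {
    sed -n 's/.*<clusternode [^>]*name="\([^"]*\)".*/\1/p'
}

# Usage on each node: resolve every name, then ping it three times.
#   for n in $(cluster_node_names < /etc/cluster/cluster.conf); do
#       getent hosts "$n" && ping -c 3 "$n"
#   done
```

A name that fails to resolve here, or resolves to an address on the wrong interface, would fit the long pause on cman start.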



