[Linux-cluster] Starting two-node cluster with only one node

Abed-nego G. Escobal, Jr. abednegoyulo at yahoo.com
Sat Jul 18 15:47:28 UTC 2009


Hi!

I am very sorry that I did not mention this earlier: whenever I test the different suggestions for solving this, I temporarily disable the firewall and then turn it back on after testing.

Thank you very much for the tip about tshark! I will post the output as soon as I get a maintenance window to restart the cman service.

With regard to openais, it is still off on both servers. Should it be turned on at boot? I am very sorry, but I have not seen anywhere in the manuals that it should be "on".
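In case it helps, this is how I am checking and setting the boot services right now (a sketch; my understanding is that on CentOS 5 the cman init script starts aisexec itself, so the separate openais init script is supposed to stay off):

  chkconfig --list openais cman
  chkconfig openais off
  chkconfig cman on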


--- On Sat, 7/18/09, Marc - A. Dahlhaus <mad at wol.de> wrote:

> From: Marc - A. Dahlhaus <mad at wol.de>
> Subject: Re: [Linux-cluster] Starting two-node cluster with only one node
> To: "linux clustering" <linux-cluster at redhat.com>
> Date: Saturday, 18 July, 2009, 10:02 PM
> Hello,
> 
> As your cluster worked well on CentOS 5.2, the networking hardware
> components can't be the culprit in this case, but I still think that
> it is a cluster-communication-related problem.
> 
> It could be your iptables ruleset... Try disabling the firewall and
> check again...
> 
> You can use tshark to check this as well in this case, by using
> something like this:
> 
> tshark -i <interface the cluster is using> -f 'host <multicast IP the cluster is using>' -V | less
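> 
> As a concrete sketch (eth0 and the multicast address below are just
> placeholders; the group your cluster actually uses is shown in the
> output of "cman_tool status"):
> 
> tshark -i eth0 -f 'host 239.192.75.71' -V | less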
> 
> Have you checked that openais is still chkconfig off after
> your upgrade?
> 
> Abed-nego G. Escobal, Jr. wrote:
> > Thanks for giving the pointers!
> >
> > uname -r on both nodes
> >
> > 2.6.18-128.1.16.el5
> >
> > on node01
> >
> > rpm -q cman gfs-utils kmod-gfs modcluster ricci luci cluster-snmp iscsi-initiator-utils lvm2-cluster openais oddjob rgmanager
> > cman-2.0.98-2chrissie
> > gfs-utils-0.1.18-1.el5
> > kmod-gfs-0.1.23-5.el5_2.4
> > kmod-gfs-0.1.31-3.el5
> > modcluster-0.12.1-2.el5.centos
> > ricci-0.12.1-7.3.el5.centos.1
> > luci-0.12.1-7.3.el5.centos.1
> > cluster-snmp-0.12.1-2.el5.centos
> > iscsi-initiator-utils-6.2.0.868-0.18.el5_3.1
> > lvm2-cluster-2.02.40-7.el5
> > openais-0.80.3-22.el5_3.8
> > oddjob-0.27-9.el5
> > rgmanager-2.0.46-1.el5.centos.3
> >
> > on node02
> >
> > rpm -q cman gfs-utils kmod-gfs modcluster ricci luci cluster-snmp iscsi-initiator-utils lvm2-cluster openais oddjob rgmanager
> > cman-2.0.98-2chrissie
> > gfs-utils-0.1.18-1.el5
> > kmod-gfs-0.1.31-3.el5
> > modcluster-0.12.1-2.el5.centos
> > ricci-0.12.1-7.3.el5.centos.1
> > luci-0.12.1-7.3.el5.centos.1
> > cluster-snmp-0.12.1-2.el5.centos
> > iscsi-initiator-utils-6.2.0.868-0.18.el5_3.1
> > lvm2-cluster-2.02.40-7.el5
> > openais-0.80.3-22.el5_3.8
> > oddjob-0.27-9.el5
> > rgmanager-2.0.46-1.el5.centos.3
> >
> > I used http://knowledgelayer.softlayer.com/questions/443/GFS+howto
> > to configure my cluster. When it was still on 5.2, the cluster
> > worked, but after the recent update to 5.3 it broke.
> >
> > On one of the threads that I found in the archive, it says that
> > there is a problem with the most current official version of cman
> > (bug id 485026). I replaced the most current cman package with
> > cman-2.0.98-2chrissie to test whether this was my problem; it seems
> > it was not, so I will be moving back to the official package.
> >
> > I also found on another thread that openais was the culprit, so I
> > changed it back to openais-0.80.3-15.el5, even though the changelog
> > indicates that a lot of bug fixes went into the most current
> > official package. After doing that, it still did not work. I tried
> > clean_start="1" with caution: I unmounted the iSCSI volume and then
> > started cman, but it still did not work. My most recent attempt was
> > post_join_delay="-1" (I had not noticed before that there is a man
> > page for fenced), which is much safer than clean_start="1", but
> > that did not fix it either. The man pages that I have read over and
> > over again are cman and cluster.conf. Some pages in the online
> > manual are not really suitable for my situation, because I do not
> > have X installed on the machines and some of those pages use
> > system-config-cluster.
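> >
> > For reference, the line I was testing with looked roughly like this
> > (a sketch from memory, not a verbatim copy of my cluster.conf):
> >
> > <fence_daemon clean_start="1"/>
> >
> > and, in the later attempt,
> >
> > <fence_daemon post_join_delay="-1"/>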
> >
> > As I understand from the online manual and FAQ, qdisk is not
> > required if I have two_node="1", so I did not create one. I have
> > removed the fence_daemon tag, since I only used it for trying the
> > suggested solutions. The hosts are present in each other's
> > /etc/hosts files with the correct IPs.
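> >
> > For completeness, the two-node declaration in my cluster.conf is of
> > this form (again a sketch, not a verbatim copy):
> >
> > <cman two_node="1" expected_votes="1"/>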
> >
> >
> > The ping results
> >
> > ping node02.company.com
> >
> > --- node01.company.com ping statistics ---
> > 10 packets transmitted, 10 received, 0% packet loss, time 8999ms
> > rtt min/avg/max/mdev = 0.010/0.016/0.034/0.007 ms
> >
> > ping node01.company.com
> >
> > --- node01.company.com ping statistics ---
> > 10 packets transmitted, 10 received, 0% packet loss, time 9003ms
> > rtt min/avg/max/mdev = 0.341/0.668/1.084/0.273 ms
> >
> > According to the people in the data center, the switch
> supports multicast communication on all ports that are used
> for cluster communication because they are in the same
> VLAN.
> >
> > For the logs, I will be sending fresh ones as soon as possible. At
> > the moment I do not have a big enough window to bring the machines
> > down.
> >
> > For wireshark, I will be reading the man pages on how to use it.
> >
> > Please advise if any other information is needed to
> solve this. I am very grateful for the very detailed
> pointers. Thank you very much! 
> >
> >
> > --- On Fri, 7/17/09, Marc - A. Dahlhaus [ Administration | Westermann GmbH ] <mad at wol.de> wrote:
> >
> >   
> >> From: Marc - A. Dahlhaus [ Administration | Westermann GmbH ] <mad at wol.de>
> >> Subject: Re: [Linux-cluster] Starting two-node cluster with only one node
> >> To: "linux clustering" <linux-cluster at redhat.com>
> >> Date: Friday, 17 July, 2009, 5:56 PM
> >> Hello,
> >>
> >>
> >> Can you give us some hard facts on which versions of the
> >> cluster-suite packages you are using in your environment, and also
> >> the related logs?
> >>
> >> Have you read the corresponding parts of the cluster suite's
> >> manual, man pages and FAQ, and also searched the list archives for
> >> similar problems already? If not -> do it, there are many good
> >> hints to be found there.
> >>
> >>
> >> The nodes find each other and create a cluster very fast IF they
> >> can talk to each other. Since no cluster networking is involved in
> >> fencing a remote node when the fencing node itself is quorate,
> >> this could be your problem.
> >>
> >> You should change to fence_manual and switch back to your real
> >> fencing devices after you have debugged your problem. Also get rid
> >> of the <fence_daemon ... /> tag in your cluster.conf, as fenced
> >> does the right thing by default if the remaining configuration is
> >> right; at the moment it is just hiding a part of the problem.
> >>
> >> Also, the 5 minute hang on cman start smells like a DNS-lookup
> >> problem or some other network-related problem to me.
> >>
> >> Here is a short check-list to be sure the nodes
> can talk to
> >> each other:
> >>
> >> Can the individual nodes ping each other?
> >>
> >> Can the individual nodes DNS-lookup the other node names (the
> >> ones you used in your cluster.conf)? (Try to add them to your
> >> /etc/hosts file; that way you have a working cluster even if your
> >> DNS system goes on vacation.)
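> >>
> >> (For example, entries of this form on both nodes, with the
> >> addresses being placeholders here:
> >>
> >> 192.168.1.1  node01.company.com  node01
> >> 192.168.1.2  node02.company.com  node02
> >>
> >> You can then verify the lookup with
> >> "getent hosts node02.company.com".)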
> >>
> >> Is your switch allowing multicast communication on all ports that
> >> are used for cluster communication? (This is a prerequisite for
> >> the openais / corosync based cman, which means anything >= RHEL 5.
> >> Search the archives on this if you need more info...)
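> >>
> >> (To check this from a node: "cman_tool status" should list the
> >> multicast address the cluster uses, and "ip maddr show dev eth0"
> >> shows which multicast groups the interface has joined, eth0 being
> >> a placeholder for your cluster interface.)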
> >>
> >> Can you trace (e.g. with wireshark's tshark) incoming cluster
> >> communication from remote nodes? (If you haven't changed your
> >> fencing to fence_manual, your listening system will get fenced
> >> before you can get any useful information out of it. Try with and
> >> without an active firewall.)
> >>
> >> If all of the above can be answered with "yes", your cluster
> >> should form just fine. You could try to add a qdisk device as a
> >> tiebreaker after that, and test it just to be sure you have a
> >> working last-man-standing setup...
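> >>
> >> (Roughly, with the device path being a placeholder: initialise
> >> the shared disk with "mkqdisk -c /dev/sdX1 -l myqdisk", then add
> >> something like <quorumd interval="1" tko="10" votes="1"
> >> label="myqdisk"/> to cluster.conf. See the qdisk man page for the
> >> details.)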
> >>
> >> Hope that helps,
> >>
> >> Marc
> >>     
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 


      "Try the new FASTER Yahoo! Mail. Experience it today at http://ph.mail.yahoo.com"




More information about the Linux-cluster mailing list