[Linux-cluster] RE: [Openais] Basic cluster not starting

Steven Dake sdake at redhat.com
Mon Jul 9 20:11:39 UTC 2007


What exactly do you mean by "crashes the whole cluster"?  Could you send
the output of cman_tool nodes after the fence but before the node
restarts?  (i.e., fence it, then unplug its power cord or use the power
GUI :)
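
If it helps, here is a rough sketch of one way to grab that output with a
timestamp so it can be pasted into a reply. It assumes cman_tool is on the
PATH of a surviving node. The snapshot helper is just an illustrative name,
not part of any cluster tool.

```python
import subprocess
import time


def snapshot(cmd=("cman_tool", "nodes")):
    """Run cmd once and return its output under a timestamped header."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        # The binary is missing on this host; report that instead of crashing.
        return "=== {} {} ===\ncommand not found\n".format(stamp, " ".join(cmd))
    return "=== {} {} (exit {}) ===\n{}{}".format(
        stamp, " ".join(cmd), result.returncode, result.stdout, result.stderr
    )
```

Calling print(snapshot()) on a surviving node right after the fence, and
again once the fenced node is back, should give two snapshots to compare.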

Thanks
-steve


On Mon, 2007-07-09 at 12:47 -0400, james anderson wrote:
> Steve/Patrick,
>  
> Thanks for the replies :)
>  
> I found the following FC6 x86_64 updates and applied them to all 3
> nodes:
>   rpm -ivh xen-libs-3.0.3-9.fc6.x86_64.rpm
>   rpm -ivh --nodeps libvirt-0.2.3-1.fc6.x86_64.rpm
>   rpm -ivh bridge-utils-1.1-2.x86_64.rpm
>   rpm -ivh libvirt-python-0.2.3-1.fc6.x86_64.rpm
>   rpm -ivh python-virtinst-0.95.0-1.fc6.noarch.rpm
>   rpm -ivh xen-3.0.3-9.fc6.x86_64.rpm
>   rpm -Uvh cman-2.0.60-1.fc6.x86_64.rpm
> 
> After installing these I triple checked that the cluster.conf files
> are identical.  I then rebooted them all and restarted the cman
> service.  The good news is that the basic cluster now works!  The bad
> news: fencing a node crashes the whole cluster, also conga has some
> serious problems.  I will post those in separate emails.
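
For anyone who wants to repeat the "identical cluster.conf" check without
eyeballing the files, a minimal sketch follows. It assumes each node's
/etc/cluster/cluster.conf has already been copied to one machine. The helper
names and the file names in the usage note are hypothetical.

```python
import hashlib


def file_digest(path):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def all_identical(paths):
    """Return True if every file in paths has the same digest."""
    return len({file_digest(p) for p in paths}) <= 1
```

For example, all_identical(["node1.conf", "node2.conf", "node3.conf"])
would return False if even one byte differs between the copies.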
>  
> Just wanted to tie up this thread for anyone else encountering the
> same problem.  If anyone has had the same experience please post so my
> findings can be confirmed.
>  
> Cheers,
> James
> 
> 
> > Subject: Re: [Openais] Basic cluster not starting
> > From: sdake at redhat.com
> > To: jamesanderson1 at hotmail.com
> > CC: openais at lists.linux-foundation.org; linux-cluster at redhat.com
> > Date: Sat, 7 Jul 2007 18:06:07 -0700
> > 
> > James,
> > 
> > Let me speak with Patrick Caulfield on this topic Monday.
> > 
> > I have not seen this before in any of our testing, but it is possible
> > someone else using RHCS has. I've also copied the linux-cluster list.
> > 
> > The problem appears to be, however, with something relating to ccs or
> > the startup order. The openais code doesn't know anything about the
> > ccsd node ids or parsing of the xml configuration file. That work is
> > done by ccsd and cman.
> > 
> > Did you try the cman init script?
> > 
> > Regards
> > -steve
> > 
> > On Thu, 2007-07-05 at 14:21 -0400, james anderson wrote:
> > > I am attempting to install GFS on FC6 64bit using RPMs.
> > > Below you will find my config and steps taken to get a GFS cluster
> > > running.
> > > I am unclear if the problem is with OpenAIS or RHCS.
> > > 
> > > 
> > > FC6 64bit RPMs
> > > --------------
> > > rpm -ivh openais-0.80.1-3.x86_64.rpm
> > > rpm -ivh perl-Net-Telnet-3.03-5.noarch.rpm
> > > rpm -ivh cman-2.0.18-2.fc6.x86_64.rpm
> > > System config cluster
> > > rpm -ivh system-config-cluster-1.0.29-1.0.noarch.rpm
> > > Luci
> > > rpm -ivh python-imaging-1.1.6-3.fc6.x86_64.rpm
> > > rpm -ivh zope-2.9.7-2.fc6.x86_64.rpm
> > > rpm -ivh plone-2.5.3-1.fc6.x86_64.rpm
> > > rpm -ivh luci-0.9.3-2.fc6.x86_64.rpm
> > > Ricci
> > > rpm -ivh --nodeps oddjob-libs-0.27-8.x86_64.rpm
> > > rpm -ivh oddjob-0.27-8.x86_64.rpm
> > > rpm -ivh modcluster-0.9.3-2.fc6.x86_64.rpm
> > > rpm -ivh ricci-0.9.3-2.fc6.x86_64.rpm
> > > 
> > > 
> > > /etc/cluster/cluster.conf
> > > -------------------------
> > > <?xml version="1.0"?>
> > > <cluster alias="alpha_cluster" config_version="8"
> > > name="alpha_cluster">
> > > <fence_daemon post_fail_delay="0" post_join_delay="3"/>
> > > <clusternodes>
> > > <clusternode name="node1" nodeid="1" votes="1">
> > > <multicast addr="239.192.196.121" interface="eth1"/>
> > > <fence>
> > > <method name="1">
> > > <device name="nps1" port="1" switch="1"/>
> > > </method>
> > > </fence>
> > > </clusternode>
> > > <clusternode name="node2" nodeid="2" votes="1">
> > > <multicast addr="239.192.196.121" interface="eth0"/>
> > > <fence>
> > > <method name="1">
> > > <device name="nps1" port="2" switch="1"/>
> > > </method>
> > > </fence>
> > > </clusternode>
> > > <clusternode name="node3" nodeid="3" votes="1">
> > > <multicast addr="239.192.196.121" interface="eth2"/>
> > > <fence>
> > > <method name="1">
> > > <device name="nps1" port="3" switch="1"/>
> > > </method>
> > > </fence>
> > > </clusternode>
> > > </clusternodes>
> > > <cman>
> > > <multicast addr="239.192.196.121"/>
> > > </cman>
> > > <fencedevices>
> > > <fencedevice agent="fence_apc" ipaddr="10.1.1.123" login="root"
> > > name="***" passwd="***"/>
> > > </fencedevices>
> > > <rm>
> > > <failoverdomains/>
> > > <resources/>
> > > </rm>
> > > </cluster>
> > > 
> > > 
> > > Commands
> > > --------
> > > # modprobe lock_dlm
> > > # modprobe dlm
> > > # mount -t configfs none /sys/kernel/config
> > > # ccsd
> > > # cman_tool join
> > > 
> > > 
> > > /var/log/messages
> > > -----------------
> > > 1 Jul 2 14:50:16 node1 ccsd[22457]: Starting ccsd 2.0.18:
> > > 2 Jul 2 14:50:16 node1 ccsd[22457]: Built: Oct 1 2006 17:18:46
> > > 3 Jul 2 14:50:16 node1 ccsd[22457]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
> > > 4 Jul 2 14:50:45 node1 ccsd[22457]: Unable to connect to cluster infrastructure after 30 seconds.
> > > 5 Jul 2 14:51:15 node1 ccsd[22457]: Unable to connect to cluster infrastructure after 60 seconds.
> > > 6 Jul 2 14:51:39 node1 ccsd[22457]: cluster.conf (cluster name = alpha_cluster, version = 6) found.
> > > 7 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] AIS Executive Service RELEASE 'subrev 1204 version 0.80.1'
> > > 8 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
> > > 9 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
> > > 10 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] No nodeid specified in cluster.conf
> > > 11 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] Error reading CCS info, cannot start
> > > 12 Jul 2 14:51:41 node1 openais[22542]: [MAIN ]
> > > 13 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] AIS Executive exiting (-9).
> > > 14 Jul 2 14:51:45 node1 ccsd[22457]: Unable to connect to cluster infrastructure after 90 seconds.
> > > 15 Jul 2 14:52:15 node1 ccsd[22457]: Unable to connect to cluster infrastructure after 120 seconds.
> > > 16 Jul 2 14:52:44 node1 ccsd[22457]: Stopping ccsd, SIGTERM received.
> > > 
> > > Lines 1-6 are from running the "ccsd" command above.
> > > Lines 7-13 are from running the "cman_tool join" command above.
> > > 
> > > I also received the following error message:
> > > cman not started: CCS does not have a nodeid for this node, run
> > > 'ccs_tool addnodeids' to fix
> > > cman_tool: aisexec daemon didn't start
> > > 
> > > Yes I did try running the ccs_tool addnodeids. It did not help. Notice
> > > in the cluster.conf the nodeids were already in place. Any pointers to
> > > narrowing down my problem are appreciated.
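
One thing that may be worth ruling out, sketched below in Python: whether a
plain XML parser sees a nodeid on every clusternode, and whether the name
this host reports appears among the clusternode names at all. This is only
a sanity check on the file. It does not reproduce how ccsd or cman read the
configuration, and trying both the full and the short hostname is an
assumption. The helper names are made up for illustration.

```python
import socket
import xml.etree.ElementTree as ET


def node_ids(conf_text):
    """Map each clusternode name to its nodeid attribute (None if absent)."""
    root = ET.fromstring(conf_text)
    return {
        node.get("name"): node.get("nodeid")
        for node in root.iter("clusternode")
    }


def check_local_node(conf_text, hostname=None):
    """Return (hostname, nodeid) for this host; nodeid is None if not found."""
    ids = node_ids(conf_text)
    host = hostname or socket.gethostname()
    # Assumption: try the name as reported, then the part before the first dot.
    short = host.split(".")[0]
    return host, ids.get(host) or ids.get(short)
```

Running check_local_node(open("/etc/cluster/cluster.conf").read()) on each
node and getting None back for the nodeid would point at the file or the
node name. A valid nodeid on every node would point back at the daemons or
the startup order.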
> > > 
> > > Thanks,
> > > James
> > > 
> > > 
> > > 
> > >
> ______________________________________________________________________
> > > See what you’re getting into…before you go there. Check it out!
> > > _______________________________________________
> > > Openais mailing list
> > > Openais at lists.linux-foundation.org
> > > https://lists.linux-foundation.org/mailman/listinfo/openais
> > 
> 
> 
> 