[Linux-cluster] GFS on CentOS - cman unable to start

Patricio A. Bruna pbruna at it-linux.cl
Sat Jan 7 00:30:19 UTC 2012


Hi, 
I think CMAN expects the cluster node names to match what the command "uname -n" returns. 
From what you wrote, your nodes' hostnames are test01.gdao.ucsc.edu and test02.gdao.ucsc.edu, but in cluster.conf you have declared only "test01" and "test02". 
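
For example, with those hostnames the clusternode entries would need the fully qualified names (a sketch based on the config you posted, not your exact file):

    <clusternode name="test01.gdao.ucsc.edu" nodeid="1" votes="1">
    ...
    <clusternode name="test02.gdao.ucsc.edu" nodeid="2" votes="1">

Alternatively, keep the short names in cluster.conf and make each node's hostname (what "uname -n" prints) the short form.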

------------------------------------ 
Patricio Bruna V. 
IT Linux Ltda. 
www.it-linux.cl 
Phone: (+56-2) 333 0578 
Mobile: (+56-9) 8899 6618 

----- Original Message -----

> These servers are currently on the same host, but may not be in the
> future. They are in a VM cluster (though honestly, I'm not sure what
> that means yet).

> SELinux is enabled, but permissive.
> Firewalling through iptables is turned off via
> system-config-securitylevel.

> There is currently no line in cluster.conf that deals with
> multicasting.

> Any other suggestions?

> Wes

> On 1/6/2012 12:05 PM, Luiz Gustavo Tonello wrote:
> > Hi,
> >
> > Are these servers on VMware? On the same host?
> > Is SELinux disabled? Does iptables have any rules?
> >
> > In my environment I had a problem starting GFS2 with the servers on
> > different hosts. To get the servers clustering, I had to migrate one
> > server to the same host as the other and restart it.
> >
> > I think one of the problems was the virtual switches. To solve it, I
> > changed the multicast IP in cluster.conf to 225.0.0.13:
> >
> > <multicast addr="225.0.0.13"/>
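> >
> > In cluster.conf that element goes inside the <cman> block, something
> > like this (a sketch from memory, not my exact file):
> >
> > <cman>
> >   <multicast addr="225.0.0.13"/>
> > </cman>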
> >
> > I also added a static route on both nodes pointing that address at
> > the default gateway, for example:
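> >
> > route add -host 225.0.0.13 gw <default-gateway-ip>
> >
> > (the gateway address above is a placeholder; a plain "dev eth0" route
> > also works when the nodes share a subnet)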
> >
> > I don't know if it's correct, but this solved my problem.
> > I hope that helps you.
> >
> > Regards.

> > On Fri, Jan 6, 2012 at 5:01 PM, Wes Modes <wmodes at ucsc.edu> wrote:
> > > Hi, Steven.
> > >
> > > I've tried just about every possible combination of hostname and
> > > cluster.conf.
> > >
> > > ping to test01 resolves to 128.114.31.112
> > > ping to test01.gdao.ucsc.edu resolves to 128.114.31.112
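> > > (getent hosts test01 is another quick check here; it returns the
> > > address in the same nsswitch order the cluster software should see.)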
> > >
> > > It feels like the right thing is being returned. This feels like it
> > > might be a quirk (or possibly a bug) in cman or openais.
> > >
> > > There are some old bug reports around this, for example
> > > https://bugzilla.redhat.com/show_bug.cgi?id=488565 . It sounds like
> > > the way that cman reports this error is anything but straightforward.
> > >
> > > Is there anyone who has encountered this error and found a solution?
> > >
> > > Wes
> > >

> > > On 1/6/2012 2:00 AM, Steven Whitehouse wrote:
> > > > Hi,
> > > >
> > > > On Thu, 2012-01-05 at 13:54 -0800, Wes Modes wrote:
> > > >> Howdy, y'all. I'm trying to set up GFS in a cluster on CentOS
> > > >> systems running on VMware. The GFS filesystem is on a Dell
> > > >> EqualLogic SAN.
> > > >>
> > > >> I keep running into the same problem despite many
> > > >> differently-flavored attempts to set up GFS. The problem comes
> > > >> when I try to start cman, the cluster management software.
> > > >>
> > > >> [root at test01]# service cman start
> > > >> Starting cluster:
> > > >> Loading modules... done
> > > >> Mounting configfs... done
> > > >> Starting ccsd... done
> > > >> Starting cman... failed
> > > >> cman not started: Can't find local node name in cluster.conf
> > > >> /usr/sbin/cman_tool: aisexec daemon didn't start
> > > >> [FAILED]
> > > >>
> > > > This looks like what it says... whatever the node name is in
> > > > cluster.conf, it doesn't exist when the name is looked up, or
> > > > possibly it does exist, but is mapped to the loopback address (it
> > > > needs to map to an address which is valid cluster-wide).
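> > > >
> > > > The classic way to hit the loopback case is an /etc/hosts entry
> > > > along these lines (an illustration, not your file):
> > > >
> > > > 127.0.0.1 test01 localhost.localdomain localhost
> > > >
> > > > where the node name resolves to 127.0.0.1 instead of a
> > > > cluster-wide address.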
> 
> > > >
> > > > Since your config files look correct, the next thing to check is
> > > > what the resolver is actually returning. Try (for example) a ping
> > > > to test01 (you need to specify exactly the same form of the name
> > > > as is used in cluster.conf) from test02 and see whether it uses
> > > > the correct IP address, just in case the wrong thing is being
> > > > returned.
> > > >
> > > > Steve.
> > > >
> > > >> [root at test01]# tail /var/log/messages
> > > >> Jan 5 13:39:40 testbench06 ccsd[13194]: Unable to connect to
> > > >> cluster infrastructure after 1193640 seconds.
> > > >> Jan 5 13:40:10 testbench06 ccsd[13194]: Unable to connect to
> > > >> cluster infrastructure after 1193670 seconds.
> > > >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
> > > >> Service RELEASE 'subrev 1887 version 0.80.6'
> > > >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C)
> > > >> 2002-2006 MontaVista Software, Inc and contributors.
> > > >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C)
> > > >> 2006 Red Hat, Inc.
> > > >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
> > > >> Service: started and ready to provide service.
> > > >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] local node name
> > > >> "test01.gdao.ucsc.edu" not found in cluster.conf
> > > >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading
> > > >> CCS info, cannot start
> > > >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading
> > > >> config from CCS
> > > >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
> > > >> exiting (reason: could not read the main configuration file).
> > > >>
> > > >> Here are details of my configuration:
> > > >>
> > > >> [root at test01]# rpm -qa | grep cman
> > > >> cman-2.0.115-85.el5_7.2
> > > >>
> > > >> [root at test01]# echo $HOSTNAME
> > > >> test01.gdao.ucsc.edu
> > > >>
> > > >> [root at test01]# hostname
> > > >> test01.gdao.ucsc.edu
> > > >>
> > > >> [root at test01]# cat /etc/hosts
> > > >> # Do not remove the following line, or various programs
> > > >> # that require network functionality will fail.
> > > >> 128.114.31.112 test01 test01.gdao test01.gdao.ucsc.edu
> > > >> 128.114.31.113 test02 test02.gdao test02.gdao.ucsc.edu
> > > >> 127.0.0.1 localhost.localdomain localhost
> > > >> ::1 localhost6.localdomain6 localhost6
> > > >>
> > > >> [root at test01]# sestatus
> > > >> SELinux status: enabled
> > > >> SELinuxfs mount: /selinux
> > > >> Current mode: permissive
> > > >> Mode from config file: permissive
> > > >> Policy version: 21
> > > >> Policy from config file: targeted
> > > >>
> > > >> [root at test01]# cat /etc/cluster/cluster.conf
> > > >> <?xml version="1.0"?>
> > > >> <cluster config_version="25" name="gdao_cluster">
> > > >>   <fence_daemon post_fail_delay="0" post_join_delay="120"/>
> > > >>   <clusternodes>
> > > >>     <clusternode name="test01" nodeid="1" votes="1">
> > > >>       <fence>
> > > >>         <method name="single">
> > > >>           <device name="gfs_vmware"/>
> > > >>         </method>
> > > >>       </fence>
> > > >>     </clusternode>
> > > >>     <clusternode name="test02" nodeid="2" votes="1">
> > > >>       <fence>
> > > >>         <method name="single">
> > > >>           <device name="gfs_vmware"/>
> > > >>         </method>
> > > >>       </fence>
> > > >>     </clusternode>
> > > >>   </clusternodes>
> > > >>   <cman/>
> > > >>   <fencedevices>
> > > >>     <fencedevice agent="fence_manual" name="gfs1_ipmi"/>
> > > >>     <fencedevice agent="fence_vmware" name="gfs_vmware"
> > > >>       ipaddr="gdvcenter.ucsc.edu" login="root" passwd="1hateAmazon.com"
> > > >>       vmlogin="root" vmpasswd="esxpass"
> > > >>       port="/vmfs/volumes/49086551-c64fd83c-0401-001e0bcd6848/eagle1/gfs1.vmx"/>
> > > >>   </fencedevices>
> > > >>   <rm>
> > > >>     <failoverdomains/>
> > > >>   </rm>
> > > >> </cluster>
> > > >>
> > > >> I've seen much discussion of this problem, but no definitive
> > > >> solutions. Any help you can provide will be welcome.
> > > >>
> > > >> Wes Modes
> > --
> > Luiz Gustavo P Tonello.
