[Linux-cluster] GFS on CentOS - cman unable to start
Patricio A. Bruna
pbruna at it-linux.cl
Sat Jan 7 00:30:19 UTC 2012
Hi,
I think CMAN expects the cluster node names to match what the command "uname -n" returns.
From what you wrote, your nodes' hostnames are test01.gdao.ucsc.edu and test02.gdao.ucsc.edu, but in cluster.conf you have declared only "test01" and "test02".
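To make the mismatch concrete, here is a minimal illustrative sketch (not from the thread; the helper names are mine) of the check cman effectively performs: the local node name must appear verbatim as a `clusternode name` in cluster.conf.

```python
import xml.etree.ElementTree as ET

# Sample cluster.conf fragment modeled on the one posted in this thread.
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster config_version="25" name="gdao_cluster">
  <clusternodes>
    <clusternode name="test01" nodeid="1" votes="1"/>
    <clusternode name="test02" nodeid="2" votes="1"/>
  </clusternodes>
</cluster>
"""

def node_names(conf_xml):
    """Return the clusternode names declared in cluster.conf."""
    root = ET.fromstring(conf_xml)
    return [n.get("name") for n in root.iter("clusternode")]

def local_name_found(conf_xml, local_name):
    """Mimic cman's check: is the local node name declared verbatim?"""
    return local_name in node_names(conf_xml)

# The FQDN that `uname -n` returns is not declared, so cman would fail:
print(local_name_found(CLUSTER_CONF, "test01.gdao.ucsc.edu"))  # False
print(local_name_found(CLUSTER_CONF, "test01"))                # True
```

So either declare the FQDNs in cluster.conf, or set the machines' hostnames to the short names; the two just have to agree exactly.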
------------------------------------
Patricio Bruna V.
IT Linux Ltda.
www.it-linux.cl
Fono : (+56-2) 333 0578
Móvil: (+56-9) 8899 6618
----- Original Message -----
> These servers are currently on the same host, but may not be in the
> future. They are in a vm cluster (though honestly, I'm not sure what
> this means yet).
> SELinux is enabled, but permissive (not enforcing).
> Firewalling through iptables is turned off via
> system-config-securitylevel
> There is no line currently in the cluster.conf that deals with
> multicasting.
> Any other suggestions?
> Wes
> On 1/6/2012 12:05 PM, Luiz Gustavo Tonello wrote:
> > Hi,
> >
> > Are these servers on VMware? On the same host?
> >
> > Is SELinux disabled? Does iptables have any rules?
> >
> > In my environment I had a problem starting GFS2 with servers on
> > different hosts.
> >
> > To cluster the servers, I had to migrate one server to the same host
> > as the other and restart it.
> >
> > I think part of the problem was the virtual switches.
> >
> > To solve it, I changed the multicast IP in cluster.conf to 225.0.0.13:
> >
> > <multicast addr="225.0.0.13"/>
> >
> > And I added a static route on both nodes, using the default gateway.
> >
> > I don't know if it's correct, but this solved my problem.
> >
> > I hope that helps.
> >
> > Regards.
>
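For what it's worth, the static-route trick works because cluster traffic like this uses IPv4 multicast, which all falls inside 224.0.0.0/4, so a single route (something like `ip route add 224.0.0.0/4 dev eth0`; the interface name is an assumption) pins it to one NIC. A small sketch, purely illustrative, of checking which addresses that range covers:

```python
import ipaddress

# The entire IPv4 multicast block, 224.0.0.0 through 239.255.255.255.
# A static route for this block (interface name is an assumption):
#   ip route add 224.0.0.0/4 dev eth0
MULTICAST_NET = ipaddress.ip_network("224.0.0.0/4")

def is_multicast(addr):
    """Check whether an address falls in the IPv4 multicast range."""
    return ipaddress.ip_address(addr) in MULTICAST_NET

print(is_multicast("225.0.0.13"))     # True  - the group suggested above
print(is_multicast("128.114.31.112")) # False - an ordinary unicast host
```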
> > On Fri, Jan 6, 2012 at 5:01 PM, Wes Modes <wmodes at ucsc.edu> wrote:
> > > Hi, Steven.
> > >
> > > I've tried just about every possible combination of hostname and
> > > cluster.conf.
> > >
> > > ping to test01 resolves to 128.114.31.112
> > > ping to test01.gdao.ucsc.edu resolves to 128.114.31.112
> > >
> > > It feels like the right thing is being returned. This feels like it
> > > might be a quirk (or possibly a bug) of cman or openais.
> > >
> > > There are some old bug reports around this, for example
> > > https://bugzilla.redhat.com/show_bug.cgi?id=488565 . It sounds like
> > > the way that cman reports this error is anything but straightforward.
> > >
> > > Is there anyone who has encountered this error and found a solution?
> > >
> > > Wes
> > >
> > > On 1/6/2012 2:00 AM, Steven Whitehouse wrote:
> > > > Hi,
> > > >
> > > > On Thu, 2012-01-05 at 13:54 -0800, Wes Modes wrote:
> > > >> Howdy, y'all. I'm trying to set up GFS in a cluster on CentOS
> > > >> systems running on VMware. The GFS FS is on a Dell EqualLogic SAN.
> > > >>
> > > >> I keep running into the same problem despite many
> > > >> differently-flavored attempts to set up GFS. The problem comes
> > > >> when I try to start cman, the cluster management software.
> > > >>
> > > >> [root@test01]# service cman start
> > > >> Starting cluster:
> > > >>    Loading modules... done
> > > >>    Mounting configfs... done
> > > >>    Starting ccsd... done
> > > >>    Starting cman... failed
> > > >>    cman not started: Can't find local node name in cluster.conf
> > > >>    /usr/sbin/cman_tool: aisexec daemon didn't start
> > > >>                                                        [FAILED]
> > > >>
> > > > This looks like what it says... whatever the node name is in
> > > > cluster.conf, it doesn't exist when the name is looked up, or
> > > > possibly it does exist, but is mapped to the loopback address (it
> > > > needs to map to an address which is valid cluster-wide).
> > > >
> > > > Since your config files look correct, the next thing to check is
> > > > what the resolver is actually returning. Try (for example) a ping
> > > > to test01 (you need to specify exactly the same form of the name
> > > > as is used in cluster.conf) from test02 and see whether it uses
> > > > the correct IP address, just in case the wrong thing is being
> > > > returned.
> > > >
> > > > Steve.
> > > >
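Steve's loopback caveat can be checked mechanically rather than by eye; a hedged sketch (the helper name is mine, not a cman tool) that resolves a name and rejects loopback results, which would break cman even though the name technically resolves:

```python
import ipaddress
import socket

def resolve_for_cluster(name):
    """Resolve a node name and reject loopback results, since cman
    needs an address that is reachable cluster-wide."""
    addr = socket.gethostbyname(name)
    if ipaddress.ip_address(addr).is_loopback:
        raise ValueError(name + " resolves to loopback " + addr +
                         "; fix /etc/hosts so it maps to a real interface")
    return addr

# A name pinned to 127.0.0.1 (e.g. by a default /etc/hosts entry that
# lists the hostname on the localhost line) fails this check:
try:
    resolve_for_cluster("localhost")
except ValueError as err:
    print(err)
```

Run on each node with the exact node name from cluster.conf; a common culprit is the hostname appearing on the 127.0.0.1 line of /etc/hosts.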
> > > >> [root@test01]# tail /var/log/messages
> > > >> Jan 5 13:39:40 testbench06 ccsd[13194]: Unable to connect to
> > > >> cluster infrastructure after 1193640 seconds.
> > > >> Jan 5 13:40:10 testbench06 ccsd[13194]: Unable to connect to
> > > >> cluster infrastructure after 1193670 seconds.
> > > >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
> > > >> Service RELEASE 'subrev 1887 version 0.80.6'
> > > >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C)
> > > >> 2002-2006 MontaVista Software, Inc and contributors.
> > > >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C)
> > > >> 2006 Red Hat, Inc.
> > > >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
> > > >> Service: started and ready to provide service.
> > > >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] local node name
> > > >> "test01.gdao.ucsc.edu" not found in cluster.conf
> > > >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading
> > > >> CCS info, cannot start
> > > >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading
> > > >> config from CCS
> > > >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
> > > >> exiting (reason: could not read the main configuration file).
> > > >>
> > > >> Here are details of my configuration:
> > > >>
> > > >> [root@test01]# rpm -qa | grep cman
> > > >> cman-2.0.115-85.el5_7.2
> > > >>
> > > >> [root@test01]# echo $HOSTNAME
> > > >> test01.gdao.ucsc.edu
> > > >>
> > > >> [root@test01]# hostname
> > > >> test01.gdao.ucsc.edu
> > > >>
> > > >> [root@test01]# cat /etc/hosts
> > > >> # Do not remove the following line, or various programs
> > > >> # that require network functionality will fail.
> > > >> 128.114.31.112   test01 test01.gdao test01.gdao.ucsc.edu
> > > >> 128.114.31.113   test02 test02.gdao test02.gdao.ucsc.edu
> > > >> 127.0.0.1        localhost.localdomain localhost
> > > >> ::1              localhost6.localdomain6 localhost6
> > > >>
> > > >> [root@test01]# sestatus
> > > >> SELinux status:            enabled
> > > >> SELinuxfs mount:           /selinux
> > > >> Current mode:              permissive
> > > >> Mode from config file:     permissive
> > > >> Policy version:            21
> > > >> Policy from config file:   targeted
> > > >>
> > > >> [root@test01]# cat /etc/cluster/cluster.conf
> > > >> <?xml version="1.0"?>
> > > >> <cluster config_version="25" name="gdao_cluster">
> > > >>   <fence_daemon post_fail_delay="0" post_join_delay="120"/>
> > > >>   <clusternodes>
> > > >>     <clusternode name="test01" nodeid="1" votes="1">
> > > >>       <fence>
> > > >>         <method name="single">
> > > >>           <device name="gfs_vmware"/>
> > > >>         </method>
> > > >>       </fence>
> > > >>     </clusternode>
> > > >>     <clusternode name="test02" nodeid="2" votes="1">
> > > >>       <fence>
> > > >>         <method name="single">
> > > >>           <device name="gfs_vmware"/>
> > > >>         </method>
> > > >>       </fence>
> > > >>     </clusternode>
> > > >>   </clusternodes>
> > > >>   <cman/>
> > > >>   <fencedevices>
> > > >>     <fencedevice agent="fence_manual" name="gfs1_ipmi"/>
> > > >>     <fencedevice agent="fence_vmware" name="gfs_vmware"
> > > >>       ipaddr="gdvcenter.ucsc.edu" login="root" passwd="1hateAmazon.com"
> > > >>       vmlogin="root" vmpasswd="esxpass"
> > > >>       port="/vmfs/volumes/49086551-c64fd83c-0401-001e0bcd6848/eagle1/gfs1.vmx"/>
> > > >>   </fencedevices>
> > > >>   <rm>
> > > >>     <failoverdomains/>
> > > >>   </rm>
> > > >> </cluster>
> > > >>
> > > >> I've seen much discussion of this problem, but no definitive
> > > >> solutions. Any help you can provide will be welcome.
> > > >>
> > > >> Wes Modes
> > > >>
> > > >> --
> > > >> Linux-cluster mailing list
> > > >> Linux-cluster at redhat.com
> > > >> https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> > --
> > Luiz Gustavo P Tonello.