[Linux-cluster] GFS on CentOS - cman unable to start
Kaloyan Kovachev
kkovachev at varna.net
Mon Jan 9 11:08:25 UTC 2012
Hi,
check /etc/sysconfig/cman; maybe a different name is set there as
NODENAME. Remove the file (if present), or try creating one like this:
#CMAN_CLUSTER_TIMEOUT=120
#CMAN_QUORUM_TIMEOUT=0
#CMAN_SHUTDOWN_TIMEOUT=60
FENCED_START_TIMEOUT=120
##FENCE_JOIN=no
#LOCK_FILE="/var/lock/subsys/cman"
CLUSTERNAME=ClusterName
NODENAME=NodeName
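
As a rough cross-check of whichever name you end up using (the NODENAME
value above, or the output of uname -n), here is a minimal Python sketch.
It only lists the clusternode names in cluster.conf and compares a
candidate against them, first exactly and then by short name. It is not
cman's own lookup logic; the function names and the trimmed sample config
are made up for illustration.

```python
import xml.etree.ElementTree as ET


def cluster_node_names(conf_xml):
    """Return the clusternode name attributes found in cluster.conf text."""
    root = ET.fromstring(conf_xml)
    return [node.get("name") for node in root.iter("clusternode")]


def match_node_name(candidate, node_names):
    """Return the entry matching candidate exactly, else by short name, else None."""
    if candidate in node_names:
        return candidate
    short = candidate.split(".", 1)[0]
    for name in node_names:
        if name.split(".", 1)[0] == short:
            return name
    return None


# Trimmed-down stand-in for /etc/cluster/cluster.conf (illustrative only).
SAMPLE_CONF = """<?xml version="1.0"?>
<cluster config_version="25" name="gdao_cluster">
  <clusternodes>
    <clusternode name="test01" nodeid="1" votes="1"/>
    <clusternode name="test02" nodeid="2" votes="1"/>
  </clusternodes>
</cluster>"""

if __name__ == "__main__":
    names = cluster_node_names(SAMPLE_CONF)
    for candidate in ("test01", "test01.gdao.ucsc.edu", "testbench06"):
        print(candidate, "->", match_node_name(candidate, names))
```

On a real node you would read /etc/cluster/cluster.conf instead of the
sample string and pass in the name the node actually reports. A result of
None means that name does not appear in cluster.conf in either form.
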
On Sun, 08 Jan 2012 20:03:18 -0800, Wes Modes <wmodes at ucsc.edu> wrote:
> The behavior of cman's resolving of cluster node names is less than
> clear, as per the RHEL bugzilla report.
>
> The hostname and cluster.conf match, as does /etc/hosts and uname -n.
> The short names and FQDN ping. I believe all the node cluster.conf are
> in sync, and all nodes are accessible to each other using either short
> or long names.
>
> You'll have to trust that I've tried everything obvious, and every
> possible combination of FQDN and short names in cluster.conf and
> hostname. That said, it is totally possible I missed something obvious.
>
> I suspect, there is something else going on and I don't know how to get
> at it.
>
> Wes
>
>
> On 1/6/2012 6:06 PM, Kevin Stanton wrote:
>>
>> > Hi,
>>
>> > I think CMAN expects the names of the cluster nodes to be the same
>> > as those returned by the command "uname -n".
>>
>> > From what you write, your nodes' hostnames are test01.gdao.ucsc.edu
>> > and test02.gdao.ucsc.edu, but in cluster.conf you have declared only
>> > "test01" and "test02".
>>
>>
>>
>> I haven't found this to be the case in the past. I actually use a
>> separate short name to reference each node which is different than the
>> hostname of the server itself. All I've ever had to do is make sure
>> it resolves correctly. You can do this either in DNS and/or in
>> /etc/hosts. I have found that it's a good idea to do both in case
>> your DNS server is a virtual machine and is not running for some
>> reason. In that case with /etc/hosts you can still start cman.
>>
>>
>>
>> I would make sure whatever node names you use in the cluster.conf will
>> resolve when you try to ping it from all nodes in the cluster. Also
>> make sure your cluster.conf is in sync between all nodes.
>>
>>
>>
>> -Kevin
>>
>>
>>
>>
>>
>>
------------------------------------------------------------------------
>>
>> These servers are currently on the same host, but may not be in
>> the future. They are in a vm cluster (though honestly, I'm not
>> sure what this means yet).
>>
>> SElinux is on, but disabled.
>> Firewalling through iptables is turned off via
>> system-config-securitylevel
>>
>> There is no line currently in the cluster.conf that deals with
>> multicasting.
>>
>> Any other suggestions?
>>
>> Wes
>>
>> On 1/6/2012 12:05 PM, Luiz Gustavo Tonello wrote:
>>
>> Hi,
>>
>>
>>
>> Are these servers on VMware? On the same host?
>>
>> Is SELinux disabled? Does iptables have any rules?
>>
>> In my environment I had a problem starting GFS2 with servers on
>> different hosts.
>>
>> To cluster the servers, I had to migrate one server to the same
>> host as the other and restart it.
>>
>> I think one of the problems was caused by the virtual switches.
>>
>> To solve it, I changed the multicast IP to 225.0.0.13 in
>> cluster.conf:
>>
>> <multicast addr="225.0.0.13"/>
>>
>> And I added a static route on both nodes, to use the default gateway.
>>
>> I don't know if it's correct, but this solved my problem.
>>
>> I hope that helps you.
>>
>>
>>
>> Regards.
>>
>>
>>
>> On Fri, Jan 6, 2012 at 5:01 PM, Wes Modes <wmodes at ucsc.edu> wrote:
>>
>> Hi, Steven.
>>
>> I've tried just about every possible combination of hostname and
>> cluster.conf.
>>
>> ping to test01 resolves to 128.114.31.112
>> ping to test01.gdao.ucsc.edu resolves to 128.114.31.112
>>
>> It feels like the right thing is being returned. This feels like it
>> might be a quirk (or bug possibly) of cman or openais.
>>
>> There are some old bug reports around this, for example
>> https://bugzilla.redhat.com/show_bug.cgi?id=488565. It sounds like the
>> way that cman reports this error is anything but straightforward.
>>
>> Is there anyone who has encountered this error and found a solution?
>>
>> Wes
>>
>>
>>
>> On 1/6/2012 2:00 AM, Steven Whitehouse wrote:
>> > Hi,
>> >
>> > On Thu, 2012-01-05 at 13:54 -0800, Wes Modes wrote:
>> >> Howdy, y'all. I'm trying to set up GFS in a cluster on CentOS systems
>> >> running on VMware. The GFS FS is on a Dell EqualLogic SAN.
>> >>
>> >> I keep running into the same problem despite many differently-flavored
>> >> attempts to set up GFS. The problem comes when I try to start cman, the
>> >> cluster management software.
>> >>
>> >> [root@test01]# service cman start
>> >> Starting cluster:
>> >> Loading modules... done
>> >> Mounting configfs... done
>> >> Starting ccsd... done
>> >> Starting cman... failed
>> >> cman not started: Can't find local node name in cluster.conf
>> >> /usr/sbin/cman_tool: aisexec daemon didn't start
>> >> [FAILED]
>> >>
>> > This looks like what it says... whatever the node name is in
>> > cluster.conf, it doesn't exist when the name is looked up, or possibly
>> > it does exist, but is mapped to the loopback address (it needs to map to
>> > an address which is valid cluster-wide)
>> >
>> > Since your config files look correct, the next thing to check is what
>> > the resolver is actually returning. Try (for example) a ping to test01
>> > (you need to specify exactly the same form of the name as is used in
>> > cluster.conf) from test02 and see whether it uses the correct ip
>> > address, just in case the wrong thing is being returned.
>> >
>> > Steve.
>> >
>> >> [root@test01]# tail /var/log/messages
>> >> Jan 5 13:39:40 testbench06 ccsd[13194]: Unable to connect to cluster infrastructure after 1193640 seconds.
>> >> Jan 5 13:40:10 testbench06 ccsd[13194]: Unable to connect to cluster infrastructure after 1193670 seconds.
>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive Service: started and ready to provide service.
>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] local node name "test01.gdao.ucsc.edu" not found in cluster.conf
>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading CCS info, cannot start
>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading config from CCS
>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive exiting (reason: could not read the main configuration file).
>> >>
>> >> Here are details of my configuration:
>> >>
>> >> [root@test01]# rpm -qa | grep cman
>> >> cman-2.0.115-85.el5_7.2
>> >>
>> >> [root@test01]# echo $HOSTNAME
>> >> test01.gdao.ucsc.edu
>> >>
>> >> [root@test01]# hostname
>> >> test01.gdao.ucsc.edu
>> >>
>> >> [root@test01]# cat /etc/hosts
>> >> # Do not remove the following line, or various programs
>> >> # that require network functionality will fail.
>> >> 128.114.31.112 test01 test01.gdao test01.gdao.ucsc.edu
>> >> 128.114.31.113 test02 test02.gdao test02.gdao.ucsc.edu
>> >> 127.0.0.1 localhost.localdomain localhost
>> >> ::1 localhost6.localdomain6 localhost6
>> >>
>> >> [root@test01]# sestatus
>> >> SELinux status: enabled
>> >> SELinuxfs mount: /selinux
>> >> Current mode: permissive
>> >> Mode from config file: permissive
>> >> Policy version: 21
>> >> Policy from config file: targeted
>> >>
>> >> [root@test01]# cat /etc/cluster/cluster.conf
>> >> <?xml version="1.0"?>
>> >> <cluster config_version="25" name="gdao_cluster">
>> >> <fence_daemon post_fail_delay="0" post_join_delay="120"/>
>> >> <clusternodes>
>> >> <clusternode name="test01" nodeid="1" votes="1">
>> >> <fence>
>> >> <method name="single">
>> >> <device name="gfs_vmware"/>
>> >> </method>
>> >> </fence>
>> >> </clusternode>
>> >> <clusternode name="test02" nodeid="2" votes="1">
>> >> <fence>
>> >> <method name="single">
>> >> <device name="gfs_vmware"/>
>> >> </method>
>> >> </fence>
>> >> </clusternode>
>> >> </clusternodes>
>> >> <cman/>
>> >> <fencedevices>
>> >> <fencedevice agent="fence_manual" name="gfs1_ipmi"/>
>> >> <fencedevice agent="fence_vmware" name="gfs_vmware"
>> >> ipaddr="gdvcenter.ucsc.edu" login="root" passwd="1hateAmazon.com"
>> >> vmlogin="root" vmpasswd="esxpass"
>> >> port="/vmfs/volumes/49086551-c64fd83c-0401-001e0bcd6848/eagle1/gfs1.vmx"/>
>> >> </fencedevices>
>> >> <rm>
>> >> <failoverdomains/>
>> >> </rm>
>> >> </cluster>
>> >>
>> >> I've seen much discussion of this problem, but no definitive solutions.
>> >> Any help you can provide will be welcome.
>> >>
>> >> Wes Modes
>> >>
>> >> --
>> >> Linux-cluster mailing list
>> >> Linux-cluster at redhat.com <mailto:Linux-cluster at redhat.com>
>> >> https://www.redhat.com/mailman/listinfo/linux-cluster
>> --
>> Luiz Gustavo P Tonello.