[Linux-cluster] GFS on CentOS - cman unable to start
Wes Modes
wmodes at ucsc.edu
Mon Jan 9 15:57:14 UTC 2012
Thanks, Kaloyan. Now we're talking. This is something I hadn't
tried yet. I will try it as soon as I get in.
Wes
On 1/9/2012 3:08 AM, Kaloyan Kovachev wrote:
> Hi,
> check /etc/sysconfig/cman; maybe there is a different name set there as
> NODENAME ... remove the file (if present) or try creating one like this:
>
> #CMAN_CLUSTER_TIMEOUT=120
> #CMAN_QUORUM_TIMEOUT=0
> #CMAN_SHUTDOWN_TIMEOUT=60
> FENCED_START_TIMEOUT=120
> ##FENCE_JOIN=no
> #LOCK_FILE="/var/lock/subsys/cman"
> CLUSTERNAME=ClusterName
> NODENAME=NodeName
>
>
> On Sun, 08 Jan 2012 20:03:18 -0800, Wes Modes <wmodes at ucsc.edu> wrote:
>> The behavior of cman's resolving of cluster node names is less than
>> clear, as per the RHEL bugzilla report.
>>
>> The hostname and cluster.conf match, as do /etc/hosts and uname -n.
>> The short names and FQDNs ping. I believe the cluster.conf files on all
>> nodes are in sync, and all nodes are accessible to each other using
>> either short or long names.
>>
>> You'll have to trust that I've tried everything obvious, and every
>> possible combination of FQDN and short names in cluster.conf and
>> hostname. That said, it is totally possible I missed something obvious.
>>
>> I suspect there is something else going on, and I don't know how to
>> get at it.
>>
>> Wes
>>
>>
>> On 1/6/2012 6:06 PM, Kevin Stanton wrote:
>>>> Hi,
>>>> I think CMAN expects the cluster node names to be the same as those
>>>> returned by the command "uname -n".
>>>
>>>> From what you write, your nodes' hostnames are test01.gdao.ucsc.edu
>>>> and test02.gdao.ucsc.edu, but in cluster.conf you have declared only
>>>> "test01" and "test02".
>>>
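Spelled out, that suggestion would mean declaring the FQDNs in cluster.conf so they match `uname -n` on each node. A sketch reusing the structure of the cluster.conf quoted later in this thread (untested):

```xml
<clusternodes>
        <clusternode name="test01.gdao.ucsc.edu" nodeid="1" votes="1">
                <fence>
                        <method name="single">
                                <device name="gfs_vmware"/>
                        </method>
                </fence>
        </clusternode>
        <clusternode name="test02.gdao.ucsc.edu" nodeid="2" votes="1">
                <fence>
                        <method name="single">
                                <device name="gfs_vmware"/>
                        </method>
                </fence>
        </clusternode>
</clusternodes>
```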
>>>
>>>
>>> I haven't found this to be the case in the past. I actually use a
>>> separate short name to reference each node, which is different from the
>>> hostname of the server itself. All I've ever had to do is make sure
>>> it resolves correctly. You can do this either in DNS and/or in
>>> /etc/hosts. I have found that it's a good idea to do both, in case
>>> your DNS server is a virtual machine and is not running for some
>>> reason. In that case, with /etc/hosts you can still start cman.
>>>
>>>
>>>
>>> I would make sure whatever node names you use in the cluster.conf will
>>> resolve when you try to ping it from all nodes in the cluster. Also
>>> make sure your cluster.conf is in sync between all nodes.
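Both checks can be scripted. A minimal sketch, assuming the node names from this thread (the `resolve_check` helper is illustrative, not a standard tool):

```shell
# resolve_check NAME prints "NAME -> IP" (first address found) or "NAME -> UNRESOLVED".
resolve_check() {
    ip=$(getent hosts "$1" | awk '{print $1; exit}')
    echo "$1 -> ${ip:-UNRESOLVED}"
}

# Run on every node, for every name that appears in cluster.conf:
resolve_check localhost               # sanity check for the helper itself
resolve_check test01.gdao.ucsc.edu    # node names from this thread
resolve_check test02.gdao.ucsc.edu

# To confirm cluster.conf is in sync, compare hashes across nodes:
# md5sum /etc/cluster/cluster.conf
```

A name that prints UNRESOLVED, or that resolves to 127.0.0.1, is exactly the failure mode cman trips over.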
>>>
>>>
>>>
>>> -Kevin
>>>
>>>
>>>
>>>
>>>
>>>
> ------------------------------------------------------------------------
>>> These servers are currently on the same host, but may not be in
>>> the future. They are in a vm cluster (though honestly, I'm not
>>> sure what this means yet).
>>>
>>> SELinux is enabled, but in permissive mode.
>>> Firewalling through iptables is turned off via
>>> system-config-securitylevel
>>>
>>> There is no line currently in the cluster.conf that deals with
>>> multicasting.
>>>
>>> Any other suggestions?
>>>
>>> Wes
>>>
>>> On 1/6/2012 12:05 PM, Luiz Gustavo Tonello wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> Are these servers on VMware? On the same host?
>>>
>>> Is SELinux disabled? Does iptables have any rules?
>>>
>>>
>>>
>>> In my environment I had a problem starting GFS2 with servers on
>>> different hosts.
>>>
>>> To get the servers clustered, I had to migrate one server to the same
>>> host as the other, and restart it.
>>>
>>>
>>>
>>> I think one of the problems was caused by the virtual switches.
>>>
>>> To solve it, I changed the multicast IP to use 225.0.0.13 in
>>> cluster.conf:
>>>
>>> <multicast addr="225.0.0.13"/>
>>>
>>> And added a static route on both nodes, via the default gateway.
>>>
>>>
>>>
>>> I don't know if it's correct, but this solved my problem.
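For reference, the two changes described above would look roughly like this (a sketch: on EL5 the multicast element normally sits inside the cman element, and the interface name below is an assumption, not from the thread):

```xml
<!-- in /etc/cluster/cluster.conf -->
<cman>
        <multicast addr="225.0.0.13"/>
</cman>
```

plus, on both nodes, a static route for the group, e.g. `route add -host 225.0.0.13 dev eth0` (substitute your cluster-facing interface).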
>>>
>>>
>>>
>>> I hope that helps you.
>>>
>>>
>>>
>>> Regards.
>>>
>>>
>>>
>>> On Fri, Jan 6, 2012 at 5:01 PM, Wes Modes <wmodes at ucsc.edu> wrote:
>>>
>>> Hi, Steven.
>>>
>>> I've tried just about every possible combination of hostname and
>>> cluster.conf.
>>>
>>> ping to test01 resolves to 128.114.31.112
>>> ping to test01.gdao.ucsc.edu resolves to 128.114.31.112
>>>
>>> It feels like the right thing is being returned. This feels like it
>>> might be a quirk (or bug, possibly) of cman or openais.
>>>
>>> There are some old bug reports around this, for example
>>> https://bugzilla.redhat.com/show_bug.cgi?id=488565. It sounds like the
>>> way that cman reports this error is anything but straightforward.
>>>
>>> Is there anyone who has encountered this error and found a solution?
>>>
>>> Wes
>>>
>>>
>>>
>>> On 1/6/2012 2:00 AM, Steven Whitehouse wrote:
>>> > Hi,
>>> >
>>> > On Thu, 2012-01-05 at 13:54 -0800, Wes Modes wrote:
>>> >> Howdy, y'all. I'm trying to set up GFS in a cluster on CentOS
>>> systems
>>> >> running on vmWare. The GFS FS is on a Dell Equilogic SAN.
>>> >>
>>> >> I keep running into the same problem despite many
>>> differently-flavored
>>> >> attempts to set up GFS. The problem comes when I try to start
>>> cman, the
>>> >> cluster management software.
>>> >>
>>> >> [root at test01]# service cman start
>>> >> Starting cluster:
>>> >> Loading modules... done
>>> >> Mounting configfs... done
>>> >> Starting ccsd... done
>>> >> Starting cman... failed
>>> >> cman not started: Can't find local node name in cluster.conf
>>> >> /usr/sbin/cman_tool: aisexec daemon didn't start
>>> >>
>>> [FAILED]
>>> >>
>>> > This looks like what it says... whatever the node name is in
>>> > cluster.conf, it doesn't exist when the name is looked up, or possibly
>>> > it does exist, but is mapped to the loopback address (it needs to map
>>> > to an address which is valid cluster-wide).
>>> >
>>> > Since your config files look correct, the next thing to check is what
>>> > the resolver is actually returning. Try (for example) a ping to test01
>>> > (you need to specify exactly the same form of the name as is used in
>>> > cluster.conf) from test02 and see whether it uses the correct ip
>>> > address, just in case the wrong thing is being returned.
>>> >
>>> > Steve.
>>> >
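The loopback pitfall Steve describes typically looks like this in /etc/hosts (an illustrative fragment using the addresses from this thread):

```
# Wrong: the node name on the loopback line makes cman resolve test01 to 127.0.0.1
127.0.0.1      localhost.localdomain localhost test01

# Right: the node name maps to its cluster-wide address
128.114.31.112 test01 test01.gdao.ucsc.edu
127.0.0.1      localhost.localdomain localhost
```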
>>> >> [root at test01]# tail /var/log/messages
>>> >> Jan 5 13:39:40 testbench06 ccsd[13194]: Unable to connect to
>>> >> cluster infrastructure after 1193640 seconds.
>>> >> Jan 5 13:40:10 testbench06 ccsd[13194]: Unable to connect to
>>> >> cluster infrastructure after 1193670 seconds.
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
>>> >> Service RELEASE 'subrev 1887 version 0.80.6'
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C)
>>> >> 2002-2006 MontaVista Software, Inc and contributors.
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C)
>>> >> 2006 Red Hat, Inc.
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
>>> >> Service: started and ready to provide service.
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] local node name
>>> >> "test01.gdao.ucsc.edu" not found in cluster.conf
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading CCS
>>> >> info, cannot start
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading
>>> >> config from CCS
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
>>> >> exiting (reason: could not read the main configuration file).
>>> >>
>>> >> Here are details of my configuration:
>>> >>
>>> >> [root at test01]# rpm -qa | grep cman
>>> >> cman-2.0.115-85.el5_7.2
>>> >>
>>> >> [root at test01]# echo $HOSTNAME
>>> >> test01.gdao.ucsc.edu
>>> >>
>>> >> [root at test01]# hostname
>>> >> test01.gdao.ucsc.edu
>>> >>
>>> >> [root at test01]# cat /etc/hosts
>>> >> # Do not remove the following line, or various programs
>>> >> # that require network functionality will fail.
>>> >> 128.114.31.112 test01 test01.gdao test01.gdao.ucsc.edu
>>> >> 128.114.31.113 test02 test02.gdao test02.gdao.ucsc.edu
>>> >> 127.0.0.1 localhost.localdomain localhost
>>> >> ::1 localhost6.localdomain6 localhost6
>>> >>
>>> >> [root at test01]# sestatus
>>> >> SELinux status: enabled
>>> >> SELinuxfs mount: /selinux
>>> >> Current mode: permissive
>>> >> Mode from config file: permissive
>>> >> Policy version: 21
>>> >> Policy from config file: targeted
>>> >>
>>> >> [root at test01]# cat /etc/cluster/cluster.conf
>>> >> <?xml version="1.0"?>
>>> >> <cluster config_version="25" name="gdao_cluster">
>>> >>         <fence_daemon post_fail_delay="0" post_join_delay="120"/>
>>> >> <clusternodes>
>>> >> <clusternode name="test01" nodeid="1" votes="1">
>>> >> <fence>
>>> >> <method name="single">
>>> >> <device name="gfs_vmware"/>
>>> >> </method>
>>> >> </fence>
>>> >> </clusternode>
>>> >> <clusternode name="test02" nodeid="2" votes="1">
>>> >> <fence>
>>> >> <method name="single">
>>> >> <device name="gfs_vmware"/>
>>> >> </method>
>>> >> </fence>
>>> >> </clusternode>
>>> >> </clusternodes>
>>> >> <cman/>
>>> >> <fencedevices>
>>> >> <fencedevice agent="fence_manual" name="gfs1_ipmi"/>
>>> >>             <fencedevice agent="fence_vmware" name="gfs_vmware"
>>> >> ipaddr="gdvcenter.ucsc.edu" login="root" passwd="1hateAmazon.com"
>>> >> vmlogin="root" vmpasswd="esxpass"
>>> >> port="/vmfs/volumes/49086551-c64fd83c-0401-001e0bcd6848/eagle1/gfs1.vmx"/>
>>> >> </fencedevices>
>>> >> <rm>
>>> >> <failoverdomains/>
>>> >> </rm>
>>> >> </cluster>
>>> >>
>>> >> I've seen much discussion of this problem, but no definitive
>>> >> solutions. Any help you can provide will be welcome.
>>> >>
>>> >> Wes Modes
>>> >>
>>> >> --
>>> >> Linux-cluster mailing list
>>> >> Linux-cluster at redhat.com
>>> >> https://www.redhat.com/mailman/listinfo/linux-cluster
>>> >
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Luiz Gustavo P Tonello.
>>>
>>>
>>>
>>>