[Linux-cluster] GFS on CentOS - cman unable to start
Wes Modes
wmodes at ucsc.edu
Mon Jan 9 15:57:14 UTC 2012
Thanks, Kaloyan. Now we're talking. This is something I hadn't
tried yet. I will try it as soon as I get in.
Wes
On 1/9/2012 3:08 AM, Kaloyan Kovachev wrote:
> Hi,
> check /etc/sysconfig/cman; maybe there is a different name set there as
> NODENAME ... remove the file (if present) or try creating one like this:
>
> #CMAN_CLUSTER_TIMEOUT=120
> #CMAN_QUORUM_TIMEOUT=0
> #CMAN_SHUTDOWN_TIMEOUT=60
> FENCED_START_TIMEOUT=120
> ##FENCE_JOIN=no
> #LOCK_FILE="/var/lock/subsys/cman"
> CLUSTERNAME=ClusterName
> NODENAME=NodeName
>
>
> On Sun, 08 Jan 2012 20:03:18 -0800, Wes Modes <wmodes at ucsc.edu> wrote:
>> The behavior of cman's resolving of cluster node names is less than
>> clear, as per the RHEL bugzilla report.
>>
>> The hostname and cluster.conf match, as do /etc/hosts and uname -n.
>> The short names and FQDNs ping. I believe the cluster.conf files on all
>> nodes are in sync, and all nodes are accessible to each other using
>> either short or long names.
>>
>> You'll have to trust that I've tried everything obvious, and every
>> possible combination of FQDN and short names in cluster.conf and
>> hostname. That said, it is totally possible I missed something obvious.
>>
>> I suspect there is something else going on, and I don't know how to
>> get at it.
>>
>> Wes
>>
>>
>> On 1/6/2012 6:06 PM, Kevin Stanton wrote:
>>>> Hi,
>>>> I think CMAN expects the cluster node names to be the same as those
>>>> returned by the command "uname -n".
>>>
>>>> From what you write, your nodes' hostnames are test01.gdao.ucsc.edu
>>>> and test02.gdao.ucsc.edu, but in cluster.conf you have declared only
>>>> "test01" and "test02".
>>>
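Spelled out, that suggestion would mean declaring the FQDNs in cluster.conf so they match `uname -n` on each node. A sketch reusing the structure of the cluster.conf quoted later in this thread (untested):

```xml
<clusternodes>
        <clusternode name="test01.gdao.ucsc.edu" nodeid="1" votes="1">
                <fence>
                        <method name="single">
                                <device name="gfs_vmware"/>
                        </method>
                </fence>
        </clusternode>
        <clusternode name="test02.gdao.ucsc.edu" nodeid="2" votes="1">
                <fence>
                        <method name="single">
                                <device name="gfs_vmware"/>
                        </method>
                </fence>
        </clusternode>
</clusternodes>
```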
>>>
>>>
>>> I haven't found this to be the case in the past. I actually use a
>>> separate short name to reference each node, which is different from the
>>> hostname of the server itself. All I've ever had to do is make sure
>>> it resolves correctly. You can do this either in DNS and/or in
>>> /etc/hosts. I have found that it's a good idea to do both, in case
>>> your DNS server is a virtual machine and is not running for some
>>> reason. In that case, with /etc/hosts you can still start cman.
>>>
>>>
>>>
>>> I would make sure whatever node names you use in the cluster.conf will
>>> resolve when you try to ping it from all nodes in the cluster. Also
>>> make sure your cluster.conf is in sync between all nodes.
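Both checks can be scripted. A minimal sketch, assuming the node names from this thread (the `resolve_check` helper is illustrative, not a standard tool):

```shell
# resolve_check NAME prints "NAME -> IP" (first address found) or "NAME -> UNRESOLVED".
resolve_check() {
    ip=$(getent hosts "$1" | awk '{print $1; exit}')
    echo "$1 -> ${ip:-UNRESOLVED}"
}

# Run on every node, for every name that appears in cluster.conf:
resolve_check localhost               # sanity check for the helper itself
resolve_check test01.gdao.ucsc.edu    # node names from this thread
resolve_check test02.gdao.ucsc.edu

# To confirm cluster.conf is in sync, compare hashes across nodes:
# md5sum /etc/cluster/cluster.conf
```

A name that prints UNRESOLVED, or that resolves to 127.0.0.1, is exactly the failure mode cman trips over.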
>>>
>>>
>>>
>>> -Kevin
>>>
>>>
>>>
>>>
>>>
>>>
> ------------------------------------------------------------------------
>>> These servers are currently on the same host, but may not be in
>>> the future. They are in a vm cluster (though honestly, I'm not
>>> sure what this means yet).
>>>
>>> SELinux is enabled, but in permissive mode.
>>> Firewalling through iptables is turned off via
>>> system-config-securitylevel
>>>
>>> There is no line currently in the cluster.conf that deals with
>>> multicasting.
>>>
>>> Any other suggestions?
>>>
>>> Wes
>>>
>>> On 1/6/2012 12:05 PM, Luiz Gustavo Tonello wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> Are these servers on VMware? On the same host?
>>>
>>> Is SELinux disabled? Does iptables have any rules?
>>>
>>>
>>>
>>> In my environment I had a problem starting GFS2 with servers on
>>> different hosts.
>>>
>>> To get the servers clustered, I had to migrate one server to the same
>>> host as the other, and restart it.
>>>
>>>
>>>
>>> I think one of the problems was caused by the virtual switches.
>>>
>>> To solve it, I changed the multicast IP to use 225.0.0.13 in
>>> cluster.conf:
>>>
>>> <multicast addr="225.0.0.13"/>
>>>
>>> And added a static route on both nodes, via the default gateway.
>>>
>>>
>>>
>>> I don't know if it's correct, but this solved my problem.
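For reference, the two changes described above would look roughly like this (a sketch: on EL5 the multicast element normally sits inside the cman element, and the interface name below is an assumption, not from the thread):

```xml
<!-- in /etc/cluster/cluster.conf -->
<cman>
        <multicast addr="225.0.0.13"/>
</cman>
```

plus, on both nodes, a static route for the group, e.g. `route add -host 225.0.0.13 dev eth0` (substitute your cluster-facing interface).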
>>>
>>>
>>>
>>> I hope that helps you.
>>>
>>>
>>>
>>> Regards.
>>>
>>>
>>>
>>> On Fri, Jan 6, 2012 at 5:01 PM, Wes Modes <wmodes at ucsc.edu> wrote:
>>>
>>> Hi, Steven.
>>>
>>> I've tried just about every possible combination of hostname and
>>> cluster.conf.
>>>
>>> ping to test01 resolves to 128.114.31.112
>>> ping to test01.gdao.ucsc.edu resolves to 128.114.31.112
>>>
>>> It feels like the right thing is being returned. This feels like it
>>> might be a quirk (or bug, possibly) of cman or openais.
>>>
>>> There are some old bug reports around this, for example
>>> https://bugzilla.redhat.com/show_bug.cgi?id=488565. It sounds like the
>>> way that cman reports this error is anything but straightforward.
>>>
>>> Is there anyone who has encountered this error and found a solution?
>>>
>>> Wes
>>>
>>>
>>>
>>> On 1/6/2012 2:00 AM, Steven Whitehouse wrote:
>>> > Hi,
>>> >
>>> > On Thu, 2012-01-05 at 13:54 -0800, Wes Modes wrote:
>>> >> Howdy, y'all. I'm trying to set up GFS in a cluster on CentOS
>>> systems
>>> >> running on vmWare. The GFS FS is on a Dell Equilogic SAN.
>>> >>
>>> >> I keep running into the same problem despite many
>>> differently-flavored
>>> >> attempts to set up GFS. The problem comes when I try to start
>>> cman, the
>>> >> cluster management software.
>>> >>
>>> >> [root at test01]# service cman start
>>> >> Starting cluster:
>>> >> Loading modules... done
>>> >> Mounting configfs... done
>>> >> Starting ccsd... done
>>> >> Starting cman... failed
>>> >> cman not started: Can't find local node name in cluster.conf
>>> >> /usr/sbin/cman_tool: aisexec daemon didn't start
>>> >>
>>> [FAILED]
>>> >>
>>> > This looks like what it says... whatever the node name is in
>>> > cluster.conf, it doesn't exist when the name is looked up, or possibly
>>> > it does exist, but is mapped to the loopback address (it needs to map
>>> > to an address which is valid cluster-wide).
>>> >
>>> > Since your config files look correct, the next thing to check is what
>>> > the resolver is actually returning. Try (for example) a ping to test01
>>> > (you need to specify exactly the same form of the name as is used in
>>> > cluster.conf) from test02 and see whether it uses the correct ip
>>> > address, just in case the wrong thing is being returned.
>>> >
>>> > Steve.
>>> >
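The loopback pitfall Steve describes typically looks like this in /etc/hosts (an illustrative fragment using the addresses from this thread):

```
# Wrong: the node name on the loopback line makes cman resolve test01 to 127.0.0.1
127.0.0.1      localhost.localdomain localhost test01

# Right: the node name maps to its cluster-wide address
128.114.31.112 test01 test01.gdao.ucsc.edu
127.0.0.1      localhost.localdomain localhost
```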
>>> >> [root at test01]# tail /var/log/messages
>>> >> Jan 5 13:39:40 testbench06 ccsd[13194]: Unable to connect to
>>> >> cluster infrastructure after 1193640 seconds.
>>> >> Jan 5 13:40:10 testbench06 ccsd[13194]: Unable to connect to
>>> >> cluster infrastructure after 1193670 seconds.
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
>>> >> Service RELEASE 'subrev 1887 version 0.80.6'
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C)
>>> >> 2002-2006 MontaVista Software, Inc and contributors.
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C)
>>> >> 2006 Red Hat, Inc.
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
>>> >> Service: started and ready to provide service.
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] local node name
>>> >> "test01.gdao.ucsc.edu" not found in cluster.conf
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading CCS
>>> >> info, cannot start
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading
>>> >> config from CCS
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
>>> >> exiting (reason: could not read the main configuration file).
>>> >>
>>> >> Here are details of my configuration:
>>> >>
>>> >> [root at test01]# rpm -qa | grep cman
>>> >> cman-2.0.115-85.el5_7.2
>>> >>
>>> >> [root at test01]# echo $HOSTNAME
>>> >> test01.gdao.ucsc.edu
>>> >>
>>> >> [root at test01]# hostname
>>> >> test01.gdao.ucsc.edu
>>> >>
>>> >> [root at test01]# cat /etc/hosts
>>> >> # Do not remove the following line, or various programs
>>> >> # that require network functionality will fail.
>>> >> 128.114.31.112 test01 test01.gdao test01.gdao.ucsc.edu
>>> >> 128.114.31.113 test02 test02.gdao test02.gdao.ucsc.edu
>>> >> 127.0.0.1 localhost.localdomain localhost
>>> >> ::1 localhost6.localdomain6 localhost6
>>> >>
>>> >> [root at test01]# sestatus
>>> >> SELinux status: enabled
>>> >> SELinuxfs mount: /selinux
>>> >> Current mode: permissive
>>> >> Mode from config file: permissive
>>> >> Policy version: 21
>>> >> Policy from config file: targeted
>>> >>
>>> >> [root at test01]# cat /etc/cluster/cluster.conf
>>> >> <?xml version="1.0"?>
>>> >> <cluster config_version="25" name="gdao_cluster">
>>> >>         <fence_daemon post_fail_delay="0" post_join_delay="120"/>
>>> >> <clusternodes>
>>> >> <clusternode name="test01" nodeid="1" votes="1">
>>> >> <fence>
>>> >> <method name="single">
>>> >> <device name="gfs_vmware"/>
>>> >> </method>
>>> >> </fence>
>>> >> </clusternode>
>>> >> <clusternode name="test02" nodeid="2" votes="1">
>>> >> <fence>
>>> >> <method name="single">
>>> >> <device name="gfs_vmware"/>
>>> >> </method>
>>> >> </fence>
>>> >> </clusternode>
>>> >> </clusternodes>
>>> >> <cman/>
>>> >> <fencedevices>
>>> >> <fencedevice agent="fence_manual" name="gfs1_ipmi"/>
>>> >>             <fencedevice agent="fence_vmware" name="gfs_vmware"
>>> >> ipaddr="gdvcenter.ucsc.edu" login="root" passwd="1hateAmazon.com"
>>> >> vmlogin="root" vmpasswd="esxpass"
>>> >> port="/vmfs/volumes/49086551-c64fd83c-0401-001e0bcd6848/eagle1/gfs1.vmx"/>
>>> >> </fencedevices>
>>> >> <rm>
>>> >> <failoverdomains/>
>>> >> </rm>
>>> >> </cluster>
>>> >>
>>> >> I've seen much discussion of this problem, but no definitive
>>> >> solutions. Any help you can provide will be welcome.
>>> >>
>>> >> Wes Modes
>>> >>
>>> >> --
>>> >> Linux-cluster mailing list
>>> >> Linux-cluster at redhat.com
>>> >> https://www.redhat.com/mailman/listinfo/linux-cluster
>>> >
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Luiz Gustavo P Tonello.
>>>
>>>
>>>
>>>