[Linux-cluster] GFS on CentOS - cman unable to start

Mon Jan 9 04:03:18 UTC 2012

How cman resolves cluster node names is less than clear, as the RHEL
Bugzilla report shows.

The hostname matches cluster.conf, as do /etc/hosts and uname -n.
Both the short names and the FQDNs ping.  I believe the cluster.conf
files on all nodes are in sync, and all nodes can reach each other using
either short or long names.
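
The checks described above can be sketched in a few lines of Python.
This is a rough sketch, not cman's own lookup logic: the helper names
(node_names, matching_node, is_loopback, check) are made up for
illustration, and the rule that a short name may stand in for an FQDN
is an assumption rather than documented cman behavior.

```python
import ipaddress
import os
import socket
import xml.etree.ElementTree as ET


def node_names(root):
    """Return the name attribute of every <clusternode> under root."""
    return [node.get("name") for node in root.iter("clusternode")]


def matching_node(local_name, names):
    """Return the declared node name that matches local_name, or None.

    Assumption: a match is the exact name, or the same short name
    (the part before the first dot). cman's real rules may differ.
    """
    if local_name in names:
        return local_name
    short = local_name.split(".")[0]
    for name in names:
        if name.split(".")[0] == short:
            return name
    return None


def is_loopback(address):
    """True if address is a loopback address such as 127.0.0.1."""
    return ipaddress.ip_address(address).is_loopback


def check(conf_path="/etc/cluster/cluster.conf"):
    """Print how the local name and each declared node name resolve."""
    names = node_names(ET.parse(conf_path).getroot())
    local = os.uname().nodename  # same value as `uname -n`
    print(f"local name {local!r} matches entry: {matching_node(local, names)!r}")
    for name in names:
        try:
            addr = socket.gethostbyname(name)
        except OSError as exc:
            print(f"{name}: does not resolve ({exc})")
            continue
        flag = " (loopback, not valid cluster-wide)" if is_loopback(addr) else ""
        print(f"{name} -> {addr}{flag}")
```

Calling check() on each node prints the local name, the cluster.conf
entry it matched (or None), and the address each declared node name
resolves to on that node.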

You'll have to trust that I've tried everything obvious, including every
possible combination of FQDN and short names in cluster.conf and
hostname.  That said, it is entirely possible I missed something obvious.

I suspect there is something else going on, and I don't know how to get
at it.

Wes

On 1/6/2012 6:06 PM, Kevin Stanton wrote:
>
> > Hi,
> >
> > I think CMAN expects the names of the cluster nodes to be the same as
> > those returned by the command "uname -n".
> >
> > From what you write, your nodes' hostnames are test01.gdao.ucsc.edu
> > and test02.gdao.ucsc.edu, but in cluster.conf you have declared only
> > "test01" and "test02".
>
> I haven't found this to be the case in the past.  I actually use a
> separate short name to reference each node, which is different from the
> hostname of the server itself.  All I've ever had to do is make sure
> it resolves correctly.  You can do this in DNS, in /etc/hosts, or
> both.  I have found that it's a good idea to do both in case
> your DNS server is a virtual machine and is not running for some
> reason.  In that case, with /etc/hosts you can still start cman.
>
> I would make sure whatever node names you use in cluster.conf
> resolve when you ping them from every node in the cluster.  Also
> make sure your cluster.conf is in sync between all nodes.
>
> -Kevin
>
> ------------------------------------------------------------------------
>
>     These servers are currently on the same host, but may not be in
>     the future.  They are in a VM cluster (though honestly, I'm not
>     sure what this means yet).
>
>     SELinux is enabled, but in permissive mode.
>     Firewalling through iptables is turned off via
>     system-config-securitylevel.
>
>     There is no line currently in the cluster.conf that deals with
>     multicasting.
>
>     Any other suggestions?
>
>     Wes
>
>     On 1/6/2012 12:05 PM, Luiz Gustavo Tonello wrote:
>
>     Hi,
>
>     Are these servers on VMware? On the same host?
>
>     Is SELinux disabled? Does iptables have any rules?
>
>     In my environment I had a problem starting GFS2 with servers on
>     different hosts.
>
>     To cluster the servers, I had to migrate one server to the same
>     host as the other and restart it.
>
>     I think one of the problems was caused by the virtual switches.
>
>     To solve it, I changed the multicast IP to 225.0.0.13 in cluster.conf:
>
>       <multicast addr="225.0.0.13"/>
>
>     and added a static route on both nodes to use the default gateway.
>
>     I don't know if it's correct, but this solved my problem.
>
>     I hope that helps you.
>
>     Regards.
>
>     On Fri, Jan 6, 2012 at 5:01 PM, Wes Modes <wmodes at ucsc.edu> wrote:
>
>     Hi, Steven.
>
>     I've tried just about every possible combination of hostname and
>     cluster.conf.
>
>     ping to test01 resolves to 128.114.31.112
>     ping to test01.gdao.ucsc.edu resolves to 128.114.31.112
>
>     It feels like the right thing is being returned.  This feels like it
>     might be a quirk (or bug possibly) of cman or openais.
>
>     There are some old bug reports around this, for example
>     https://bugzilla.redhat.com/show_bug.cgi?id=488565.  It sounds like the
>     way that cman reports this error is anything but straightforward.
>
>     Is there anyone who has encountered this error and found a solution?
>
>     Wes
>
>
>
>     On 1/6/2012 2:00 AM, Steven Whitehouse wrote:
>     > Hi,
>     >
>     > On Thu, 2012-01-05 at 13:54 -0800, Wes Modes wrote:
>     >> Howdy, y'all. I'm trying to set up GFS in a cluster on CentOS systems
>     >> running on VMware. The GFS FS is on a Dell EqualLogic SAN.
>     >>
>     >> I keep running into the same problem despite many differently-flavored
>     >> attempts to set up GFS. The problem comes when I try to start cman, the
>     >> cluster management software.
>     >>
>     >>     [root at test01]# service cman start
>     >>     Starting cluster:
>     >>        Loading modules... done
>     >>        Mounting configfs... done
>     >>        Starting ccsd... done
>     >>        Starting cman... failed
>     >>     cman not started: Can't find local node name in cluster.conf
>     >>     /usr/sbin/cman_tool: aisexec daemon didn't start
>     >>                                                    [FAILED]
>     >>
>     > This looks like what it says... whatever the node name is in
>     > cluster.conf, it doesn't exist when the name is looked up, or possibly
>     > it does exist, but is mapped to the loopback address (it needs to map to
>     > an address which is valid cluster-wide).
>     >
>     > Since your config files look correct, the next thing to check is what
>     > the resolver is actually returning. Try (for example) a ping to test01
>     > (you need to specify exactly the same form of the name as is used in
>     > cluster.conf) from test02 and see whether it uses the correct IP
>     > address, just in case the wrong thing is being returned.
>     >
>     > Steve.
>     >
>     >>     [root at test01]# tail /var/log/messages
>     >>     Jan  5 13:39:40 testbench06 ccsd[13194]: Unable to connect to cluster infrastructure after 1193640 seconds.
>     >>     Jan  5 13:40:10 testbench06 ccsd[13194]: Unable to connect to cluster infrastructure after 1193670 seconds.
>     >>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
>     >>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
>     >>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
>     >>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive Service: started and ready to provide service.
>     >>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] local node name "test01.gdao.ucsc.edu" not found in cluster.conf
>     >>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading CCS info, cannot start
>     >>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading config from CCS
>     >>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive exiting (reason: could not read the main configuration file).
>     >>
>     >> Here are details of my configuration:
>     >>
>     >>     [root at test01]# rpm -qa | grep cman
>     >>     cman-2.0.115-85.el5_7.2
>     >>
>     >>     [root at test01]# echo $HOSTNAME
>     >>     test01.gdao.ucsc.edu
>     >>
>     >>     [root at test01]# hostname
>     >>     test01.gdao.ucsc.edu
>     >>
>     >>     [root at test01]# cat /etc/hosts
>     >>     # Do not remove the following line, or various programs
>     >>     # that require network functionality will fail.
>     >>     128.114.31.112      test01 test01.gdao test01.gdao.ucsc.edu
>     >>     128.114.31.113      test02 test02.gdao test02.gdao.ucsc.edu
>     >>     127.0.0.1               localhost.localdomain localhost
>     >>     ::1             localhost6.localdomain6 localhost6
>     >>
>     >>     [root at test01]# sestatus
>     >>     SELinux status:                 enabled
>     >>     SELinuxfs mount:                /selinux
>     >>     Current mode:                   permissive
>     >>     Mode from config file:          permissive
>     >>     Policy version:                 21
>     >>     Policy from config file:        targeted
>     >>
>     >>     [root at test01]# cat /etc/cluster/cluster.conf
>     >>     <?xml version="1.0"?>
>     >>     <cluster config_version="25" name="gdao_cluster">
>     >>         <fence_daemon post_fail_delay="0" post_join_delay="120"/>
>     >>         <clusternodes>
>     >>             <clusternode name="test01" nodeid="1" votes="1">
>     >>                 <fence>
>     >>                     <method name="single">
>     >>                         <device name="gfs_vmware"/>
>     >>                     </method>
>     >>                 </fence>
>     >>             </clusternode>
>     >>             <clusternode name="test02" nodeid="2" votes="1">
>     >>                 <fence>
>     >>                     <method name="single">
>     >>                         <device name="gfs_vmware"/>
>     >>                     </method>
>     >>                 </fence>
>     >>             </clusternode>
>     >>         </clusternodes>
>     >>         <cman/>
>     >>         <fencedevices>
>     >>             <fencedevice agent="fence_manual" name="gfs1_ipmi"/>
>     >>             <fencedevice agent="fence_vmware" name="gfs_vmware"
>     >>                 ipaddr="gdvcenter.ucsc.edu" login="root" passwd="1hateAmazon.com"
>     >>                 vmlogin="root" vmpasswd="esxpass"
>     >>                 port="/vmfs/volumes/49086551-c64fd83c-0401-001e0bcd6848/eagle1/gfs1.vmx"/>
>     >>         </fencedevices>
>     >>         <rm>
>     >>         <failoverdomains/>
>     >>         </rm>
>     >>     </cluster>
>     >>
>     >> I've seen much discussion of this problem, but no definitive solutions.
>     >> Any help you can provide will be welcome.
>     >>
>     >> Wes Modes
>     >>
>     >> --
>     >> Linux-cluster mailing list
>     >> Linux-cluster at redhat.com
>     >> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>     -- 
>     Luiz Gustavo P Tonello.