[Linux-cluster] GFS on CentOS - cman unable to start

Steven Whitehouse swhiteho at redhat.com
Fri Jan 6 10:00:29 UTC 2012


Hi,

On Thu, 2012-01-05 at 13:54 -0800, Wes Modes wrote:
> Howdy, y'all. I'm trying to set up GFS in a cluster on CentOS systems
> running on VMware. The GFS FS is on a Dell EqualLogic SAN.
> 
> I keep running into the same problem despite many differently-flavored
> attempts to set up GFS. The problem comes when I try to start cman, the
> cluster management software.
> 
>     [root@test01]# service cman start
>     Starting cluster:
>        Loading modules... done
>        Mounting configfs... done
>        Starting ccsd... done
>        Starting cman... failed
>     cman not started: Can't find local node name in cluster.conf /usr/sbin/cman_tool: aisexec daemon didn't start
>                                                                [FAILED]
> 
This looks like what it says: whatever the node name is in cluster.conf,
either it doesn't resolve when the name is looked up, or it does
resolve but is mapped to the loopback address (it needs to map to an
address which is valid cluster-wide).
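
To see which names have to resolve, the clusternode names can be pulled
out of cluster.conf. A small sketch only: the function name
list_node_names is made up for illustration, and the sed pattern assumes
each <clusternode> tag sits on a single line with its name= attribute,
as in the config quoted further down.

```shell
# Print the name= attribute of every <clusternode> element, one per line.
# Assumption: each <clusternode ...> opening tag is on a single line.
# The default path is the one used in the quoted session.
list_node_names() {
    conf="${1:-/etc/cluster/cluster.conf}"
    sed -n 's/.*<clusternode[^>]* name="\([^"]*\)".*/\1/p' "$conf"
}
```

Each printed name can then be looked up, for example with
"list_node_names | while read n; do getent hosts "$n"; done".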

Since your config files look correct, the next thing to check is what
the resolver is actually returning. Try (for example) a ping to test01
from test02, using exactly the same form of the name as is used in
cluster.conf, and see whether it goes to the correct IP address, just
in case the wrong thing is being returned.
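
A rough sketch of that resolver check, to be run on each node. It
assumes a glibc system where getent is available; the function names
is_loopback and check_node_name are made up for illustration and are not
part of cman.

```shell
# Succeed (return 0) if the given address string is a loopback address.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# Look a name up through the system resolver (/etc/hosts, DNS, ...) and
# report whether the first address returned is usable cluster-wide.
check_node_name() {
    name="$1"
    addr=$(getent hosts "$name" | awk '{print $1; exit}')
    if [ -z "$addr" ]; then
        echo "$name: does not resolve"
        return 1
    fi
    if is_loopback "$addr"; then
        echo "$name: resolves to loopback address $addr"
        return 1
    fi
    echo "$name: resolves to $addr"
}
```

For example, "check_node_name test01" on both test01 and test02 should
print the same non-loopback address on each.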

Steve.

>     [root@test01]# tail /var/log/messages
>     Jan  5 13:39:40 testbench06 ccsd[13194]: Unable to connect to cluster infrastructure after 1193640 seconds.
>     Jan  5 13:40:10 testbench06 ccsd[13194]: Unable to connect to cluster infrastructure after 1193670 seconds.
>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive Service: started and ready to provide service.
>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] local node name "test01.gdao.ucsc.edu" not found in cluster.conf
>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading CCS info, cannot start
>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading config from CCS
>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive exiting (reason: could not read the main configuration file).
> 
> Here are details of my configuration:
> 
>     [root@test01]# rpm -qa | grep cman
>     cman-2.0.115-85.el5_7.2
> 
>     [root@test01]# echo $HOSTNAME
>     test01.gdao.ucsc.edu
> 
>     [root@test01]# hostname
>     test01.gdao.ucsc.edu
> 
>     [root@test01]# cat /etc/hosts
>     # Do not remove the following line, or various programs
>     # that require network functionality will fail.
>     128.114.31.112      test01 test01.gdao test01.gdao.ucsc.edu
>     128.114.31.113      test02 test02.gdao test02.gdao.ucsc.edu
>     127.0.0.1               localhost.localdomain localhost
>     ::1             localhost6.localdomain6 localhost6
> 
>     [root@test01]# sestatus
>     SELinux status:                 enabled
>     SELinuxfs mount:                /selinux
>     Current mode:                   permissive
>     Mode from config file:          permissive
>     Policy version:                 21
>     Policy from config file:        targeted
> 
>     [root@test01]# cat /etc/cluster/cluster.conf
>     <?xml version="1.0"?>
>     <cluster config_version="25" name="gdao_cluster">
>         <fence_daemon post_fail_delay="0" post_join_delay="120"/>
>         <clusternodes>
>             <clusternode name="test01" nodeid="1" votes="1">
>                 <fence>
>                     <method name="single">
>                         <device name="gfs_vmware"/>
>                     </method>
>                 </fence>
>             </clusternode>
>             <clusternode name="test02" nodeid="2" votes="1">
>                 <fence>
>                     <method name="single">
>                         <device name="gfs_vmware"/>
>                     </method>
>                 </fence>
>             </clusternode>
>         </clusternodes>
>         <cman/>
>         <fencedevices>
>             <fencedevice agent="fence_manual" name="gfs1_ipmi"/>
>             <fencedevice agent="fence_vmware" name="gfs_vmware"
>                 ipaddr="gdvcenter.ucsc.edu" login="root" passwd="1hateAmazon.com"
>                 vmlogin="root" vmpasswd="esxpass"
>                 port="/vmfs/volumes/49086551-c64fd83c-0401-001e0bcd6848/eagle1/gfs1.vmx"/>
>         </fencedevices>
>         <rm>
>             <failoverdomains/>
>         </rm>
>     </cluster>
> 
> I've seen much discussion of this problem, but no definitive solutions. 
> Any help you can provide will be welcome.
> 
> Wes Modes
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster