[Linux-cluster] clusterfs.sh returning generic error (1)

Ryan Thomson thomsonr at ucalgary.ca
Wed Nov 9 17:17:49 UTC 2005


Hi list,

I'm having some issues setting up a GFS mount with an NFS export on RHEL4 using
the latest cluster suite packages from RHN. I'm using GFS from CVS (the RHEL4
branch) and LVM2 (clvmd) built from the source tarball (2.2.01.09), if that
makes any difference.

The problem I am having is this: I set up a service with a GFS resource, an
NFS export resource, and an NFS client resource. The service starts fine and
I can mount the NFS export over the network from clients. One minute after
the service starts, and every minute after that, I see errors in my logs and
the service gets restarted. I looked at clusterfs.sh and saw that it's
supposed to be doing an "isMounted" check every minute... but how can that
check be failing if I can access everything just fine, locally and over NFS?
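
Concretely, this is the sort of check I mean when I say everything looks fine
(the paths are straight from my config below):

# where the LVM device name actually points
ls -l /dev/BIOCOMP/people /dev/mapper/BIOCOMP-people

# what the kernel thinks is mounted on /people
grep people /proc/mounts

# is the NFS export really in place
exportfs -v | grep people

# can I read the filesystem locally
ls /people > /dev/null && echo "local access OK"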

Here is the error as I am seeing it in /var/log/messages:

Nov  9 10:00:59 wolverine clurgmgrd[6901]: <notice> status on clusterfs "people" returned 1 (generic error)
Nov  9 10:00:59 wolverine clurgmgrd[6901]: <notice> Stopping service NFS people
Nov  9 10:00:59 wolverine clurgmgrd: [6901]: <info> Removing IPv4 address 136.159.***.*** from eth0
Nov  9 10:00:59 wolverine clurgmgrd: [6901]: <info> Removing export: 136.159.***.0/24:/people
Nov  9 10:00:59 wolverine clurgmgrd: [6901]: <info> unmounting /dev/mapper/BIOCOMP-people (/people)
Nov  9 10:00:59 wolverine clurgmgrd[6901]: <notice> Service NFS people is recovering
Nov  9 10:00:59 wolverine clurgmgrd[6901]: <notice> Recovering failed service NFS people
Nov  9 10:01:00 wolverine kernel: GFS: Trying to join cluster "lock_nolock", ""
Nov  9 10:01:00 wolverine kernel: GFS: fsid=dm-1.0: Joined cluster. Now mounting FS...
Nov  9 10:01:00 wolverine kernel: GFS: fsid=dm-1.0: jid=0: Trying to acquire journal lock...
Nov  9 10:01:00 wolverine kernel: GFS: fsid=dm-1.0: jid=0: Looking at journal...
Nov  9 10:01:00 wolverine kernel: GFS: fsid=dm-1.0: jid=0: Done
Nov  9 10:01:00 wolverine clurmtabd[27592]: <err> #20: Failed set log level
Nov  9 10:01:00 wolverine clurgmgrd: [6901]: <info> Adding export: 136.159.***.0/24:/people (rw,sync)
Nov  9 10:01:00 wolverine clurgmgrd: [6901]: <info> Adding IPv4 address 136.159.***.*** to eth0
Nov  9 10:01:01 wolverine clurgmgrd[6901]: <notice> Service NFS people started
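
For what it's worth, my (quite possibly wrong) mental model of what the
per-minute status check does is roughly the sketch below: compare the device
from cluster.conf against what the mount table reports for the mountpoint.
This is not the real agent code, just my guess at what "isMounted" boils
down to:

# device= and mountpoint= from cluster.conf
dev="/dev/BIOCOMP/people"
mp="/people"

# device the kernel reports as mounted on that mountpoint
mounted_dev=$(awk -v mp="$mp" '$2 == mp { print $1 }' /proc/mounts)

if [ "$mounted_dev" = "$dev" ]; then
    echo "status OK: $dev is mounted on $mp"
else
    echo "status FAIL: config says $dev, /proc/mounts says ${mounted_dev:-nothing}"
fi

If that's anywhere close to what clusterfs.sh actually does, I wonder whether
the /dev/BIOCOMP/people name in my config versus the /dev/mapper/BIOCOMP-people
name in the log above could matter, but I don't know the script well enough
to say.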


And here is my cluster.conf file:

<?xml version="1.0"?>
<cluster config_version="28" name="biocomp_cluster">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="wolverine" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="apcfence" port="1" switch="0"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="skunk" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="apcfence" port="2" switch="0"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="cottontail" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="apcfence" port="3" switch="0"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="10.1.1.54" login="fence_user" name="apcfence" passwd="*****"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="NFS Failover" ordered="1" restricted="1">
                                <failoverdomainnode name="wolverine" priority="3"/>
                                <failoverdomainnode name="skunk" priority="2"/>
                                <failoverdomainnode name="cottontail" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="Cluster Failover" ordered="0" restricted="1">
                                <failoverdomainnode name="wolverine" priority="1"/>
                                <failoverdomainnode name="skunk" priority="1"/>
                                <failoverdomainnode name="cottontail" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <clusterfs device="/dev/BIOCOMP/people" force_unmount="1" fstype="gfs" mountpoint="/people" name="people" options=""/>
                        <nfsclient name="people-client" options="rw,sync" target="136.159.***.0/24"/>
                        <nfsexport name="people-export"/>
                        <nfsclient name="projects-client" options="rw,sync" target="136.159.***.0/24"/>
                        <nfsexport name="projects-export"/>
                </resources>
                <service autostart="1" domain="Cluster Failover" name="cluster NAT">
                        <ip address="10.1.1.1" monitor_link="1"/>
                        <script file="/cluster/scripts/cluster_nat" name="cluster NAT script"/>
                </service>
                <service autostart="1" domain="Cluster Failover" name="NFS people">
                        <ip address="136.159.***.***" monitor_link="1"/>
                        <clusterfs ref="people">
                                <nfsexport ref="people-export">
                                        <nfsclient ref="people-client"/>
                                </nfsexport>
                        </clusterfs>
                </service>
                <service autostart="1" domain="Cluster Failover" name="NFS projects">
                        <ip address="136.159.***.***" monitor_link="1"/>
                        <clusterfs device="/dev/BIOCOMP/RT_testproject" force_unmount="1" fstype="gfs" mountpoint="/projects/RT_testproject" name="RT_testproject" options="">
                                <nfsexport ref="projects-export">
                                        <nfsclient ref="projects-client"/>
                                </nfsexport>
                        </clusterfs>
                </service>
        </rm>
</cluster>
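
One thing I'm planning to try next is driving the resource tree outside of
clurgmgrd with rg_test, assuming the RHEL4 rgmanager ships it and I have the
syntax right:

# run a single status pass for the "NFS people" service defined above,
# without going through clurgmgrd
rg_test test /etc/cluster/cluster.conf status service "NFS people"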


Am I doing something wrong here? I looked through
/usr/share/cluster/clusterfs.sh to try to see where it's returning 1 from,
but I haven't been able to pin the problem down on my own.
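
If it helps anyone reproduce this, my plan for the next round of debugging is
to run the status action by hand under "sh -x" and read the trace, guessing
that the agent takes the action as its first argument and its parameters as
OCF_RESKEY_* environment variables like the other agents in
/usr/share/cluster appear to:

# trace one status pass of the agent and capture everything it does
OCF_RESKEY_name="people" \
OCF_RESKEY_device="/dev/BIOCOMP/people" \
OCF_RESKEY_mountpoint="/people" \
OCF_RESKEY_fstype="gfs" \
OCF_RESKEY_force_unmount="1" \
sh -x /usr/share/cluster/clusterfs.sh status > /tmp/clusterfs-status-trace.log 2>&1
echo "clusterfs.sh status returned: $?"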

Thoughts, Ideas, Suggestions?

-- 
Ryan



