[Linux-cluster] Re: clusterfs.sh returning generic error (1)

Axel Thimm Axel.Thimm at ATrpms.net
Wed Nov 9 18:46:12 UTC 2005


On Wed, Nov 09, 2005 at 10:17:49AM -0700, Ryan Thomson wrote:
> Hi list,
> 
> I'm having some issues setting up a GFS mount with an NFS export on RHEL4
> using the latest Cluster Suite packages from RHN. I'm using GFS from CVS
> (RHEL4 branch) and LVM2 (clvmd) built from a source tarball (2.2.01.09), if
> that makes any difference.
> 
> The problem I am having is this: I set up a service with a GFS resource, an
> NFS export resource and an NFS client resource. The service starts fine and
> I can mount the NFS export over the network from clients. But after one
> minute, and every minute after that, I see errors in my logs and the service
> is restarted. I looked at clusterfs.sh and saw that it's supposed to do an
> "isMounted" check every minute... but how can that be failing if I can
> access everything just fine, locally and over NFS?

The status check is failing; see
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=172066 for
an explanation and a mini-patch.
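
From your config and log, the trigger looks like a device-name mismatch:
cluster.conf says device="/dev/BIOCOMP/people", while the kernel reports the
mount as /dev/mapper/BIOCOMP-people. Both names point at the same LV, but a
plain string comparison in an isMounted-style status check does not see that.
Roughly (a sketch only, not the actual clusterfs.sh code), the check needs to
canonicalise both paths before comparing:

    # Sketch of an isMounted-style check that resolves symlinks, so that
    # /dev/BIOCOMP/people and /dev/mapper/BIOCOMP-people compare equal.
    is_mounted() {
            dev=$(readlink -f "$1")   # configured device, e.g. /dev/BIOCOMP/people
            mp="$2"                   # configured mount point, e.g. /people

            while read -r mdev mmp rest; do
                    if [ "$mmp" = "$mp" ]; then
                            # A naive [ "$mdev" = "$1" ] fails here, because
                            # /proc/mounts lists the device-mapper name.
                            [ "$(readlink -f "$mdev")" = "$dev" ] && return 0
                    fi
            done < /proc/mounts
            return 1    # not mounted, or mounted from an unexpected device
    }

e.g. "is_mounted /dev/BIOCOMP/people /people && echo mounted".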

> Here is the error as I am seeing it in /var/log/messages:
> 
> Nov  9 10:00:59 wolverine clurgmgrd[6901]: <notice> status on clusterfs "people" returned 1 (generic error)
> Nov  9 10:00:59 wolverine clurgmgrd[6901]: <notice> Stopping service NFS people
> Nov  9 10:00:59 wolverine clurgmgrd: [6901]: <info> Removing IPv4 address 136.159.***.*** from eth0
> Nov  9 10:00:59 wolverine clurgmgrd: [6901]: <info> Removing export: 136.159.***.0/24:/people
> Nov  9 10:00:59 wolverine clurgmgrd: [6901]: <info> unmounting /dev/mapper/BIOCOMP-people (/people)
> Nov  9 10:00:59 wolverine clurgmgrd[6901]: <notice> Service NFS people is recovering
> Nov  9 10:00:59 wolverine clurgmgrd[6901]: <notice> Recovering failed service NFS people
> Nov  9 10:01:00 wolverine kernel: GFS: Trying to join cluster "lock_nolock", ""
> Nov  9 10:01:00 wolverine kernel: GFS: fsid=dm-1.0: Joined cluster. Now mounting FS...
> Nov  9 10:01:00 wolverine kernel: GFS: fsid=dm-1.0: jid=0: Trying to acquire journal lock...
> Nov  9 10:01:00 wolverine kernel: GFS: fsid=dm-1.0: jid=0: Looking at journal...
> Nov  9 10:01:00 wolverine kernel: GFS: fsid=dm-1.0: jid=0: Done
> Nov  9 10:01:00 wolverine clurmtabd[27592]: <err> #20: Failed set log level
> Nov  9 10:01:00 wolverine clurgmgrd: [6901]: <info> Adding export: 136.159.***.0/24:/people (rw,sync)
> Nov  9 10:01:00 wolverine clurgmgrd: [6901]: <info> Adding IPv4 address 136.159.***.*** to eth0
> Nov  9 10:01:01 wolverine clurgmgrd[6901]: <notice> Service NFS people started
> 
> 
> And here is my cluster.conf file:
> 
> <?xml version="1.0"?>
> <cluster config_version="28" name="biocomp_cluster">
>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>         <clusternodes>
>                 <clusternode name="wolverine" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="apcfence" port="1" switch="0"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="skunk" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="apcfence" port="2" switch="0"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="cottontail" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="apcfence" port="3" switch="0"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <cman/>
>         <fencedevices>
>                 <fencedevice agent="fence_apc" ipaddr="10.1.1.54" login="fence_user" name="apcfence" passwd="*****"/>
>         </fencedevices>
>         <rm>
>                 <failoverdomains>
>                         <failoverdomain name="NFS Failover" ordered="1" restricted="1">
>                                 <failoverdomainnode name="wolverine" priority="3"/>
>                                 <failoverdomainnode name="skunk" priority="2"/>
>                                 <failoverdomainnode name="cottontail" priority="1"/>
>                         </failoverdomain>
>                         <failoverdomain name="Cluster Failover" ordered="0" restricted="1">
>                                 <failoverdomainnode name="wolverine" priority="1"/>
>                                 <failoverdomainnode name="skunk" priority="1"/>
>                                 <failoverdomainnode name="cottontail" priority="1"/>
>                         </failoverdomain>
>                 </failoverdomains>
>                 <resources>
>                         <clusterfs device="/dev/BIOCOMP/people" force_unmount="1" fstype="gfs" mountpoint="/people" name="people" options=""/>
>                         <nfsclient name="people-client" options="rw,sync" target="136.159.***.0/24"/>
>                         <nfsexport name="people-export"/>
>                         <nfsclient name="projects-client" options="rw,sync" target="136.159.***.0/24"/>
>                         <nfsexport name="projects-export"/>
>                 </resources>
>                 <service autostart="1" domain="Cluster Failover" name="cluster NAT">
>                         <ip address="10.1.1.1" monitor_link="1"/>
>                         <script file="/cluster/scripts/cluster_nat" name="cluster NAT script"/>
>                 </service>
>                 <service autostart="1" domain="Cluster Failover" name="NFS people">
>                         <ip address="136.159.***.***" monitor_link="1"/>
>                         <clusterfs ref="people">
>                                 <nfsexport ref="people-export">
>                                         <nfsclient ref="people-client"/>
>                                 </nfsexport>
>                         </clusterfs>
>                 </service>
>                 <service autostart="1" domain="Cluster Failover" name="NFS projects">
>                         <ip address="136.159.***.***" monitor_link="1"/>
>                         <clusterfs device="/dev/BIOCOMP/RT_testproject" force_unmount="1" fstype="gfs" mountpoint="/projects/RT_testproject" name="RT_testproject" options="">
>                                 <nfsexport ref="projects-export">
>                                         <nfsclient ref="projects-client"/>
>                                 </nfsexport>
>                         </clusterfs>
>                 </service>
>         </rm>
> </cluster>
> 
> 
> Am I doing something wrong here? I tried looking through
> /usr/share/cluster/clusterfs.sh to see where it is returning 1 from, but I
> haven't been able to debug this issue on my own.
> 
> Thoughts, Ideas, Suggestions?
> 
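
To check on wolverine whether this is what you are hitting, before applying the
patch, comparing the canonical device paths by hand is enough (device and mount
point taken from your cluster.conf and log above):

    # What cluster.conf configures vs. what the kernel reports for /people
    readlink -f /dev/BIOCOMP/people
    awk '$2 == "/people" { print $1 }' /proc/mounts | xargs readlink -f

If both commands print the same device node, the filesystem itself is fine and
the one-minute restart loop is just the status check tripping over the name
comparison.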

-- 
Axel.Thimm at ATrpms.net