[Linux-cluster] Re: clusterfs.sh returning generic error (1)
Axel Thimm
Axel.Thimm at ATrpms.net
Wed Nov 9 18:46:12 UTC 2005
On Wed, Nov 09, 2005 at 10:17:49AM -0700, Ryan Thomson wrote:
> Hi list,
>
> I'm having some issues setting up a GFS mount with an NFS export on RHEL4
> using the latest Cluster Suite packages from RHN. I'm using GFS from CVS
> (RHEL4 branch) and LVM2 (clvmd) built from the source tarball (2.2.01.09),
> if that makes any difference.
>
> The problem I am having is this: I set up a service with a GFS resource,
> an NFS export resource, and an NFS client resource. The service starts
> fine and I can mount the NFS export over the network from clients. But one
> minute later, and every minute after that, I see errors in my logs and the
> service is restarted. I looked at clusterfs.sh and saw that it's supposed
> to be doing an "isMounted" check every minute... but how is that failing
> if I can access everything just fine, both locally and over NFS?
The status check is failing; see
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=172066 for an
explanation and a mini-patch.
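For reference, the failure mode (as I read the bug report) is that the status
check compares the device string from cluster.conf literally against what
appears in /proc/mounts, so an LVM symlink such as /dev/BIOCOMP/people never
matches the /dev/mapper/BIOCOMP-people entry the kernel actually reports, and
the check returns 1 even though the filesystem is mounted. A minimal sketch of
a check that canonicalizes both sides before comparing (function names and the
optional third argument are illustrative, not the actual clusterfs.sh code):

```shell
#!/bin/sh
# Sketch only, not the real clusterfs.sh: assumes the bug is the literal
# device-string comparison described above.

real_device() {
    # Canonicalize a device path, resolving LVM/udev symlinks
    # (e.g. /dev/BIOCOMP/people -> /dev/mapper/BIOCOMP-people).
    readlink -f "$1"
}

is_mounted() {
    # $1 = device from cluster.conf, $2 = mount point,
    # $3 = mount table file (defaults to /proc/mounts; settable for testing)
    dev=$(real_device "$1") || return 1
    while read -r mdev mpoint _rest; do
        if [ "$(real_device "$mdev")" = "$dev" ] && [ "$mpoint" = "$2" ]; then
            return 0
        fi
    done < "${3:-/proc/mounts}"
    return 1
}
```

With this, a symlinked device and its mapper path resolve to the same node, so
the comparison succeeds either way the mount is recorded.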
> Here is the error as I am seeing it in /var/log/messages:
>
> Nov 9 10:00:59 wolverine clurgmgrd[6901]: <notice> status on clusterfs "people" returned 1 (generic error)
> Nov 9 10:00:59 wolverine clurgmgrd[6901]: <notice> Stopping service NFS people
> Nov 9 10:00:59 wolverine clurgmgrd: [6901]: <info> Removing IPv4 address 136.159.***.*** from eth0
> Nov 9 10:00:59 wolverine clurgmgrd: [6901]: <info> Removing export: 136.159.***.0/24:/people
> Nov 9 10:00:59 wolverine clurgmgrd: [6901]: <info> unmounting /dev/mapper/BIOCOMP-people (/people)
> Nov 9 10:00:59 wolverine clurgmgrd[6901]: <notice> Service NFS people is recovering
> Nov 9 10:00:59 wolverine clurgmgrd[6901]: <notice> Recovering failed service NFS people
> Nov 9 10:01:00 wolverine kernel: GFS: Trying to join cluster "lock_nolock", ""
> Nov 9 10:01:00 wolverine kernel: GFS: fsid=dm-1.0: Joined cluster. Now mounting FS...
> Nov 9 10:01:00 wolverine kernel: GFS: fsid=dm-1.0: jid=0: Trying to acquire journal lock...
> Nov 9 10:01:00 wolverine kernel: GFS: fsid=dm-1.0: jid=0: Looking at journal...
> Nov 9 10:01:00 wolverine kernel: GFS: fsid=dm-1.0: jid=0: Done
> Nov 9 10:01:00 wolverine clurmtabd[27592]: <err> #20: Failed set log level
> Nov 9 10:01:00 wolverine clurgmgrd: [6901]: <info> Adding export: 136.159.***.0/24:/people (rw,sync)
> Nov 9 10:01:00 wolverine clurgmgrd: [6901]: <info> Adding IPv4 address 136.159.***.*** to eth0
> Nov 9 10:01:01 wolverine clurgmgrd[6901]: <notice> Service NFS people started
>
>
> And here is my cluster.conf file:
>
> <?xml version="1.0"?>
> <cluster config_version="28" name="biocomp_cluster">
>     <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>     <clusternodes>
>         <clusternode name="wolverine" votes="1">
>             <fence>
>                 <method name="1">
>                     <device name="apcfence" port="1" switch="0"/>
>                 </method>
>             </fence>
>         </clusternode>
>         <clusternode name="skunk" votes="1">
>             <fence>
>                 <method name="1">
>                     <device name="apcfence" port="2" switch="0"/>
>                 </method>
>             </fence>
>         </clusternode>
>         <clusternode name="cottontail" votes="1">
>             <fence>
>                 <method name="1">
>                     <device name="apcfence" port="3" switch="0"/>
>                 </method>
>             </fence>
>         </clusternode>
>     </clusternodes>
>     <cman/>
>     <fencedevices>
>         <fencedevice agent="fence_apc" ipaddr="10.1.1.54" login="fence_user" name="apcfence" passwd="*****"/>
>     </fencedevices>
>     <rm>
>         <failoverdomains>
>             <failoverdomain name="NFS Failover" ordered="1" restricted="1">
>                 <failoverdomainnode name="wolverine" priority="3"/>
>                 <failoverdomainnode name="skunk" priority="2"/>
>                 <failoverdomainnode name="cottontail" priority="1"/>
>             </failoverdomain>
>             <failoverdomain name="Cluster Failover" ordered="0" restricted="1">
>                 <failoverdomainnode name="wolverine" priority="1"/>
>                 <failoverdomainnode name="skunk" priority="1"/>
>                 <failoverdomainnode name="cottontail" priority="1"/>
>             </failoverdomain>
>         </failoverdomains>
>         <resources>
>             <clusterfs device="/dev/BIOCOMP/people" force_unmount="1" fstype="gfs" mountpoint="/people" name="people" options=""/>
>             <nfsclient name="people-client" options="rw,sync" target="136.159.***.0/24"/>
>             <nfsexport name="people-export"/>
>             <nfsclient name="projects-client" options="rw,sync" target="136.159.***.0/24"/>
>             <nfsexport name="projects-export"/>
>         </resources>
>         <service autostart="1" domain="Cluster Failover" name="cluster NAT">
>             <ip address="10.1.1.1" monitor_link="1"/>
>             <script file="/cluster/scripts/cluster_nat" name="cluster NAT script"/>
>         </service>
>         <service autostart="1" domain="Cluster Failover" name="NFS people">
>             <ip address="136.159.***.***" monitor_link="1"/>
>             <clusterfs ref="people">
>                 <nfsexport ref="people-export">
>                     <nfsclient ref="people-client"/>
>                 </nfsexport>
>             </clusterfs>
>         </service>
>         <service autostart="1" domain="Cluster Failover" name="NFS projects">
>             <ip address="136.159.***.***" monitor_link="1"/>
>             <clusterfs device="/dev/BIOCOMP/RT_testproject" force_unmount="1" fstype="gfs" mountpoint="/projects/RT_testproject" name="RT_testproject" options="">
>                 <nfsexport ref="projects-export">
>                     <nfsclient ref="projects-client"/>
>                 </nfsexport>
>             </clusterfs>
>         </service>
>     </rm>
> </cluster>
>
>
> Am I doing something wrong here? I tried looking through
> /usr/share/cluster/clusterfs.sh to see where it returns 1, but I haven't
> been able to debug this issue on my own.
>
> Thoughts, Ideas, Suggestions?
>
--
Axel.Thimm at ATrpms.net