[Linux-cluster] NFS failover problem

Thu Aug 16 15:35:08 UTC 2007

Kieran,

I'm currently experiencing a similar problem with an HA NFS server that I 
just built on RHEL4 with GFS. 

I have two different Linux clients, one running RHEL3 U8, the other 
running RHEL4 U5 (same as the HA NFS servers).

If I use the same standard mount options on both clients (e.g. mount 
SERVER:/exportfs /mountpoint -t nfs -o rw,noatime) then everything 
works fine until I perform a failover. At that point the RHEL 3 client is 
OK, but the RHEL 4 client can no longer stat the filesystem (df hangs). If 
I move the service back, the hung df command completes. I don't see an I/O 
error per se, but any copies to and from that mountpoint stall 
until I relocate the service back.

I tried other versions of Unix and found that all of them could stat the 
file system after failover except RHEL4 U5. The only workaround I've 
found so far is to use UDP instead of TCP with NFS version 3.

So my mount commands now look more like this:

# mount SERVER:/exportfs /mountpoint  -t nfs -o rw,noatime,udp,nfsvers=3

I don't know if you can tolerate UDP in your environment, but it might be 
worth trying.
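
To check for the hang from a script rather than by watching df, something 
like the following might help. It is only a sketch: the check_mount name 
is made up for illustration, and it assumes GNU coreutils timeout and stat 
are installed on the client, which may not be true on older releases. 
SIGKILL is used because a process blocked on an NFS mount may ignore 
gentler signals; on some kernels even SIGKILL may not interrupt it.

```shell
#!/bin/sh
# Hypothetical helper (not part of any NFS or cluster package).
# Succeeds if MOUNTPOINT answers a statfs() call within SECS seconds.
# Assumes GNU coreutils `timeout` and `stat` are available.
check_mount() {
    mountpoint=$1
    secs=${2:-10}
    # `stat -f` queries filesystem status, the same information df reports.
    # -s KILL: a process stuck on an unresponsive NFS server may ignore SIGTERM.
    if timeout -s KILL "$secs" stat -f "$mountpoint" >/dev/null 2>&1; then
        echo "OK: $mountpoint responded"
        return 0
    fi
    echo "FAIL: $mountpoint could not be statted (error or no reply within ${secs}s)"
    return 1
}
```

Running "check_mount /mountpoint 10" on each client before and after 
relocating the service should show which clients lose access.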

Regards,

Paul

From: kieran JOYEUX <kjoyeux at jouy.inra.fr>
Sent by: linux-cluster-bounces at redhat.com
Date: 08/16/2007 03:15 AM
Reply-To: linux clustering <linux-cluster at redhat.com>
To: Linux-cluster at redhat.com
Subject: [Linux-cluster] NFS failover problem

Hi guys,

I am implementing a two-node cluster whose nodes share their local 
storage with one client via NFS.
At the moment, I am simulating a failover during a copy from the NFS 
server to the client's local disk.

The first time, I got an NFS file handle error. I then tried setting a 
filesystem ID (fsid) in the client mount parameters, but now here 
is my issue:

[root@**** ****]# time cp 1Go.t* /usr
cp: reading `1Go.tyt': Input/output error
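
A failover test like this can be made repeatable with a small wrapper 
that copies a file and then compares checksums, so a partial copy is not 
mistaken for a success. This is a sketch only: copy_and_verify is a 
made-up name, and the destination must be a file path rather than a 
directory. The source is read a second time for the checksum, which may 
be served from the client cache.

```shell
#!/bin/sh
# Hypothetical helper for repeatable failover copy tests.
# Copies SRC to DST (a file path, not a directory) and compares checksums.
copy_and_verify() {
    src=$1
    dst=$2
    # cp fails with a non-zero status on a read error such as EIO.
    cp "$src" "$dst" || { echo "copy failed: $src" >&2; return 1; }
    # Reading via stdin makes cksum print only "CRC size", with no file name.
    src_sum=$(cksum < "$src") || return 1
    dst_sum=$(cksum < "$dst") || return 1
    if [ "$src_sum" = "$dst_sum" ]; then
        echo "verified: $dst"
        return 0
    fi
    echo "checksum mismatch: $dst" >&2
    return 1
}
```

Starting the copy, relocating the service while it runs, and checking the 
exit status gives a simple pass or fail for each set of mount options.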

My cluster.conf:
<?xml version="1.0"?>
<cluster alias="mig_nfs" config_version="128" name="mig_nfs">
      <fence_daemon post_fail_delay="0" post_join_delay="3"/>
      <clusternodes>
              <clusternode name="ha1" votes="1">
                      <fence>
                              <method name="1">
                                      <device name="barriere" nodename="ha1"/>
                              </method>
                      </fence>
              </clusternode>
              <clusternode name="ha2" votes="1">
                      <fence>
                              <method name="1">
                                      <device name="barriere" nodename="ha2"/>
                              </method>
                      </fence>
              </clusternode>
      </clusternodes>
      <cman expected_votes="1" two_node="1"/>
      <fencedevices>
              <fencedevice agent="fence_manual" name="barriere"/>
      </fencedevices>
      <rm>
              <failoverdomains>
                      <failoverdomain name="mig_fod" ordered="1" restricted="0">
                              <failoverdomainnode name="ha1" priority="2"/>
                              <failoverdomainnode name="ha2" priority="1"/>
                      </failoverdomain>
              </failoverdomains>
              <resources>
                      <ip address="138.102.22.33" monitor_link="1"/>
                      <nfsexport name="/usr/local/genome"/>
                      <nfsclient name="mig" options="ro,fsid=20" path="/usr/local/genome" target="138.102.22.0/255.255.192.0"/>
                      <nfsclient name="mig213" options="fsid=213,ro" path="/usr/local/genome" target="138.102.22.213"/>
                      <nfsclient name="mig217" options="ro,fsid=217" path="/usr/local/genome" target="138.102.22.217"/>
              </resources>
              <service autostart="1" domain="mig_fod" name="nfs">
                      <ip ref="138.102.22.33"/>
                      <nfsexport ref="/usr/local/genome"/>
                      <nfsclient ref="mig"/>
              </service>
      </rm>
</cluster>

If you have any ideas or remarks, I would love to hear them. Thanks a lot.

Kieran