[Linux-cluster] Asking information about NFS /CS4 Cookbook

Wendy Cheng wcheng at redhat.com
Thu May 24 14:00:43 UTC 2007


Fajar A. Nugraha wrote:

>Hi Wendy,
>
>  
>
>Please help me go through this summary from the bugzilla
>
>Before we complete the work, for NFS v2/v3, RHEL 4.4 has the following
>restrictions:
>
>==> Is this still valid for RHEL 4.5 and RHEL 5?
>  
>

NFS failover will most likely work, except for the documented corner
cases. Our customers normally find the restrictions workable. One thing
that has to be made clear is that these are all inherent Linux kernel
issues. RHCS has been doing a good job working around a large portion
of them. Occasionally you'll still see ESTALE or EPERM, though. The
fixes didn't make it into RHEL 4.5 or RHEL 5.

>B-1: Unless NFS client applications can tolerate ESTALE and/or EPERM errors,
>     IO activities on the failover ip interface must be temporarily quiesced
>     until active-active failover transition completes. This is to avoid
>     non-idempotent NFS operation failure on the new server. (check out
>     "Why NFS Sucks" by Olaf Kirch, placed as "kirch-reprint.pdf" in 2006
>     OLS proceeding).
>
>==> What does this mean, exactly? For example, does this mean that I
>should not use RHCS-NFS-mounted storage for web servers that are busy
>all the time, because I'd likely get ESTALE/EPERM during failover?
>  
>
NFS v2/v3 failover has been a difficult subject regardless of which
platform you're on. Assuming a flawless failover is naive. NFS v4
(where the NFS client is required to play a helping role) was developed
to remedy these issues.
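
To make "tolerate ESTALE and/or EPERM" concrete, here is a rough,
untested sketch of what an application-side retry could look like (the
function name, retry count, and sleep interval are all made up for
illustration):

    /* Sketch only: retry a read when server failover leaves us with a
     * stale file handle. On ESTALE/EPERM, re-open the file by path and
     * try again a few times. */
    #include <errno.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    static ssize_t read_tolerant(const char *path, off_t off,
                                 void *buf, size_t len)
    {
        for (int attempt = 0; attempt < 5; attempt++) {
            int fd = open(path, O_RDONLY);
            if (fd < 0) {
                if (errno == ESTALE || errno == EPERM) {
                    sleep(1);    /* wait out the failover transition */
                    continue;
                }
                return -1;
            }
            ssize_t n = pread(fd, buf, len, off);
            int saved_errno = errno;
            close(fd);
            if (n >= 0)
                return n;
            if (saved_errno == ESTALE || saved_errno == EPERM) {
                sleep(1);        /* stale handle: reopen and retry */
                continue;
            }
            errno = saved_errno;
            return -1;
        }
        errno = ESTALE;
        return -1;
    }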

>B-2: With various possible base kernel bugs outside RHCS' control, there
>     is a possibility that a local filesystem (such as ext3) umount could
>     fail. To ensure data integrity, RHCS will abort the failover. The
>     admin can specify the self-fence (reboot the taken-over server)
>     option to force failover (via the cluster.conf file).
>
>==> In short, it'd be better to use GFS, right?
>  
>
GFS certainly works better in this arena.
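
For reference, the self-fence knob mentioned in B-2 is set on the fs
resource in cluster.conf. An untested sketch (the service name, device,
and mountpoint here are made up; check the fs resource agent
documentation for the exact attributes):

    <service name="nfssvc" autostart="1">
        <!-- self_fence="1": if the umount fails during failover,
             reboot this node instead of aborting the failover -->
        <fs name="nfsdata" device="/dev/sdb1" mountpoint="/export/data"
            fstype="ext3" self_fence="1"/>
    </service>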

>B-3: If an NFS client invokes an NLM locking call, the subject NFS servers
>     (both taken-over and take-over) will enter a global 90-second (tunable)
>     locking grace period for every NFS service on the servers.
>
>==> What does "locking grace" mean? Does it mean read-write access
>allowed but no locks, or no acess at all?
>  
>
If it is a new lock request, the lock call will hang until the grace
period is over. This is to allow existing lock holders to reclaim their
locks, and it is part of the NFS-NLM protocol. Reads and writes can
keep going without restriction.
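
To illustrate (untested sketch, made-up mount path): a blocking POSIX
lock request that arrives during the grace period simply sits in
fcntl() until the period ends, while reads and writes on the same
mount carry on normally.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/mnt/nfs/somefile", O_RDWR);  /* made-up path */
        if (fd < 0) { perror("open"); return 1; }

        struct flock fl = {
            .l_type = F_WRLCK, .l_whence = SEEK_SET,
            .l_start = 0, .l_len = 0,    /* 0 length = whole file */
        };
        /* During the ~90-second grace period the server holds this
         * request off so existing lock holders can reclaim their
         * locks first; it is granted once the period is over. */
        if (fcntl(fd, F_SETLKW, &fl) < 0) { perror("fcntl"); return 1; }
        printf("lock granted\n");
        close(fd);
        return 0;
    }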

>B-4: If NFS-TCP is involved, failover should not be issued on the same pair
>     of machines multiple times within a 30-minute period; for example,
>     failing over from node A to B, then immediately failing from B back to
>     A would hang the connection. This is to avoid the TCP TIME_WAIT issue.
>
>==> So what does this mean in the TCP vs. UDP world? Does it mean
>NFS v3 over UDP is the preferred method?
>  
>
No. TCP is definitely a better protocol. Read the sentence carefully -
"failing over from node A to B, then *immediately* failing from B back
to A would hang the connection". As long as you don't bounce the service
between the same pair of nodes within the 30-minute TIME_WAIT window,
TCP is fine.

-- Wendy