[Linux-cluster] How to set up NFS HA service

birger birger at birger.sh
Tue Apr 19 13:08:18 UTC 2005


Debugging a cluster setup with this software would be easier if the components 
gave better error messages, but I'm getting there...

I thought I'd just mount my GFS file systems outside the resource 
manager's control, so they would be present all the time, and only use the 
resource manager to move the IP address and do the NFS magic. That 
seems impossible: I couldn't get any exports to happen when I defined 
them in cluster.conf without a surrounding <fs>. I could define the 
exports in /etc/exports instead, but then I would have to keep those files 
in sync across the nodes. So in the end I put all my GFS file systems into 
cluster.conf.

It almost works. I get the mounts, and they get exported. But I see some 
error messages in the log file and the exports take a very long time. Also, 
only 2 of the 3 exports I defined seem to show up.
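
In case anyone wants to compare, I check what actually got exported on the 
active node with the standard tools, nothing cluster-specific:

    # list the exports the kernel knows about, with options
    exportfs -v
    # ask mountd what it is willing to export
    showmount -e localhost
    # and watch the resource manager while the service starts
    tail -f /var/log/messages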

I'm also a bit puzzled about why the file systems don't get unmounted 
when I disable all services.

As for file locking:
I copied /etc/init.d/nfslock to /etc/init.d/nfslock-svc and made some 
changes.
First, I added a little code so that nfslock reads a variable 
STATD_STATEDIR from its config file in /etc/sysconfig and passes it to 
statd's -p option. I think this should get propagated back into upcoming 
Fedora releases if someone who knows how would care to do it... I then 
changed nfslock-svc to read a different config file (/etc/sysconfig/nfs-svc), 
to do 'service nfslock stop' at the top of the start section, and to do 
'service nfslock start' at the bottom of the stop section.
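
The changes to the start and stop sections look roughly like this. It is a 
simplified sketch, not the full script; the variable names are just what I 
picked, and the exact statd flag for the state directory may differ between 
nfs-utils versions (see rpc.statd(8)):

    #!/bin/bash
    # /etc/init.d/nfslock-svc (simplified excerpt)
    . /etc/init.d/functions
    # service-specific settings: STATD_STATEDIR, STATD_HOSTNAME, ...
    [ -f /etc/sysconfig/nfs-svc ] && . /etc/sysconfig/nfs-svc

    start() {
            # stop the node's own statd before starting the service instance
            service nfslock stop
            # start statd with its state on the clustered file system and
            # with the NFS service name instead of the node name
            daemon rpc.statd -n "$STATD_HOSTNAME" -P "$STATD_STATEDIR"
    }

    stop() {
            killproc rpc.statd
            # bring the node's own statd back once the service is gone
            service nfslock start
    }
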
This setup lets statd run as e.g. 'server1' on a cluster node until that 
node takes over the NFS service. At takeover, statd gets restarted with its 
state directory on a clustered file system (so it can take over the lock 
info belonging to the service) and with the name of the NFS service IP 
address. Does this sound reasonable? I know I'll lose any locks the cluster 
node may have held (as an NFS client) when it takes over the NFS service, 
but I cannot see any reason why a cluster node should hold NFS locks (or 
have NFS mounts, for that matter) except when doing admin work. I think I 
could fix even that by copying /var/lib/nfs/statd/sm* into the clustered 
file system right after the 'service nfslock stop' I put in.
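
Something like this, added at that point in nfslock-svc, should do it 
(untested, and assuming the usual sm and sm.bak directories under the statd 
state directory):

    # carry the node's own client-side lock state over into the
    # statd state directory on the clustered file system
    mkdir -p "$STATD_STATEDIR/sm" "$STATD_STATEDIR/sm.bak"
    cp -a /var/lib/nfs/statd/sm/.     "$STATD_STATEDIR/sm/"
    cp -a /var/lib/nfs/statd/sm.bak/. "$STATD_STATEDIR/sm.bak/"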

I have attached part of my messages file and my cluster.conf file. Any 
help with my NFS export issues would be appreciated.

-- 
birger

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf
Type: text/xml
Size: 2950 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050419/bd2d3168/attachment.xml>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: messages
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050419/bd2d3168/attachment.ksh>

