[Linux-cluster] Issue with NFS exports and "loopback" NFS mounts as RHCS resources on attempted resource release

Ralph.Grothe at itdz-berlin.de Ralph.Grothe at itdz-berlin.de
Thu Oct 27 12:30:12 UTC 2011


Hi Cluster Wizards,

I have a severe issue with NFS exports and mounts on an RHCS
2-node cluster (active/passive) within a clustered SAP service.

The migration of a clustered SAP system from an HP-UX
MC/ServiceGuard cluster to an RHCS cluster, which I thought would
be an easy job, is turning into a nightmare, because I have to
deliver the clustered service ready today, and after 17:00 there
is no leeway left for further testing or rebooting the nodes.
In the old HP MC/SG environment this works smoothly, and the
cluster doesn't care at all whether there are still claims from
clients on the NFS exports, dangling locks, etc., which is how it
should be.
In RHCS, clurgmgrd or the resource agents seem to be far more
capricious in this respect.


Unfortunately, this kind of reference doc
https://access.redhat.com/sites/default/files/sap_ha_reference_architecture.pdf
doesn't address NFS at all, but according to our SAP admins NFS
shares are absolutely required in our SAP setup.
What makes it even worse is that the NFS shares also have to be
exported out of the cluster to a myriad of external SAP servers
that aren't cluster nodes.
However, to make testing easier I have so far exported the three
required NFS shares only to the cluster node IPs and the floating
VIP, through the nfsexport/nfsclient RAs of standard RHCS.
Later they will have to be exported to the whole LAN.
Also, the node that is running the SAP service needs to mount
those shares itself, which is why I called them "loopback" NFS
mounts in the subject line.


I hope my ordering of the resources is correct, but I cannot
imagine how else they could be ordered for this to work.

For now I have left out SAPDatabase and SAPInstance (which I know
are managed correctly by their respective agents, as I tested
them earlier without the NFS resources).
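
(Once the NFS part behaves, a dry run of the complete service
with the SAP resources included should, if I read the rg_test
syntax correctly, work along the lines of

# rg_test noop /etc/cluster/cluster.conf start service z01

where z01 is the service name that clustat shows further down.)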


This is the starting sequence of the lvm resource and its child
resources, which contain the crappy NFS stuff:

[root@alsterneu:/etc/cluster]
# rg_test noop /etc/cluster/cluster.conf start lvm vg_san0
Running in test mode.
Starting vg_san0...
[start] lvm:vg_san0
[start] fs:oracle_Z01
[start] fs:oracle_Z01_sapreorg
[start] fs:oracle_Z01_oraarch
[start] fs:oracle_Z01_origlogA
[start] fs:oracle_Z01_origlogB
[start] fs:oracle_Z01_mirrlogA
[start] fs:oracle_Z01_mirrlogB
[start] fs:oracle_Z01_sapdata1
[start] ip:10.25.101.244
[start] fs:export_Z01
[start] nfsexport:exports
[start] nfsclient:client_alster
[start] nfsclient:client_warnow
[start] nfsclient:client_lena
[start] netfs:nfs_Z01
[start] fs:export_sapmnt_Z01
[start] nfsexport:exports
[start] nfsclient:client_alster
[start] nfsclient:client_warnow
[start] nfsclient:client_lena
[start] netfs:nfs_sapmnt_Z01
[start] fs:usr_sap_Z01_DVEBMGS01
[start] fs:export_audit
[start] nfsexport:exports
[start] nfsclient:client_alster
[start] nfsclient:client_warnow
[start] nfsclient:client_lena
[start] netfs:nfs_audit
Start of vg_san0 complete


and this is the stopping sequence (in reverse order, what a
surprise):

[root@alsterneu:/etc/cluster]
# rg_test noop /etc/cluster/cluster.conf stop lvm vg_san0
Running in test mode.
Stopping vg_san0...
[stop] netfs:nfs_audit
[stop] nfsclient:client_lena
[stop] nfsclient:client_warnow
[stop] nfsclient:client_alster
[stop] nfsexport:exports
[stop] fs:export_audit
[stop] fs:usr_sap_Z01_DVEBMGS01
[stop] netfs:nfs_sapmnt_Z01
[stop] nfsclient:client_lena
[stop] nfsclient:client_warnow
[stop] nfsclient:client_alster
[stop] nfsexport:exports
[stop] fs:export_sapmnt_Z01
[stop] netfs:nfs_Z01
[stop] nfsclient:client_lena
[stop] nfsclient:client_warnow
[stop] nfsclient:client_alster
[stop] nfsexport:exports
[stop] fs:export_Z01
[stop] ip:10.25.101.244
[stop] fs:oracle_Z01_sapdata1
[stop] fs:oracle_Z01_mirrlogB
[stop] fs:oracle_Z01_mirrlogA
[stop] fs:oracle_Z01_origlogB
[stop] fs:oracle_Z01_origlogA
[stop] fs:oracle_Z01_oraarch
[stop] fs:oracle_Z01_sapreorg
[stop] fs:oracle_Z01
[stop] lvm:vg_san0
Stop of vg_san0 complete


First, make sure the service really is disabled:

[root@alsterneu:/etc/cluster]
# clustat -s z01
 Service Name                                   Owner (Last)     State
 ------- ----                                   ----- ------     -----
 service:z01                                    (none)           disabled


and that there is no HA-LVM tag on the shared VG:

[root@alsterneu:/etc/cluster]
# vgs -o +tags vg_san0
  VG      #PV #LV #SN Attr   VSize VFree  VG Tags
  vg_san0   5  14   0 wz--n- 2.13T 12.00M        
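
For completeness: HA-LVM here is tag based, so the lvm agent
relies on the activation whitelist in lvm.conf, which should
match the volume_list line that rg_test echoes below. A quick way
to double-check it:

# grep volume_list /etc/lvm/lvm.conf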



Now while the start works fine...


[root@alsterneu:/etc/cluster]
# rg_test test /etc/cluster/cluster.conf start lvm vg_san0
Running in test mode.
Starting vg_san0...
  volume_list=["vg_root", "vg_local", "@alstera"]
<info> Starting volume group, vg_san0
<info> I can claim this volume group
  Volume group "vg_san0" successfully changed
<info> New tag "alstera" added to vg_san0
  Internal error: Maps lock 14598144 < unlock 14868480
  14 logical volume(s) in volume group "vg_san0" now active
<info> mounting /dev/mapper/vg_san0-lv_ora_z01 on /oracle/Z01
<debug> mount -t ext3  /dev/mapper/vg_san0-lv_ora_z01 /oracle/Z01
<info> mounting /dev/mapper/vg_san0-lv_ora_z01_sapreorg on
/oracle/Z01/sapreorg
<debug> mount -t ext3  /dev/mapper/vg_san0-lv_ora_z01_sapreorg
/oracle/Z01/sapreorg
<info> mounting /dev/mapper/vg_san0-lv_ora_z01_oraarch on
/oracle/Z01/oraarch
<debug> mount -t ext3  /dev/mapper/vg_san0-lv_ora_z01_oraarch
/oracle/Z01/oraarch
<info> mounting /dev/mapper/vg_san0-lv_ora_z01_origloga on
/oracle/Z01/origlogA
<debug> mount -t ext3  /dev/mapper/vg_san0-lv_ora_z01_origloga
/oracle/Z01/origlogA
<info> mounting /dev/mapper/vg_san0-lv_ora_z01_origlogb on
/oracle/Z01/origlogB
<debug> mount -t ext3  /dev/mapper/vg_san0-lv_ora_z01_origlogb
/oracle/Z01/origlogB
<info> mounting /dev/mapper/vg_san0-lv_ora_z01_mirrloga on
/oracle/Z01/mirrlogA
<debug> mount -t ext3  /dev/mapper/vg_san0-lv_ora_z01_mirrloga
/oracle/Z01/mirrlogA
<info> mounting /dev/mapper/vg_san0-lv_ora_z01_mirrlogb on
/oracle/Z01/mirrlogB
<debug> mount -t ext3  /dev/mapper/vg_san0-lv_ora_z01_mirrlogb
/oracle/Z01/mirrlogB
<info> mounting /dev/mapper/vg_san0-lv_ora_z01_sapdata1 on
/oracle/Z01/sapdata1
<debug> mount -t ext3  /dev/mapper/vg_san0-lv_ora_z01_sapdata1
/oracle/Z01/sapdata1
<debug> Link for bond2: Detected
<info> Adding IPv4 address 10.25.101.244/24 to bond2
<debug> Pinging addr 10.25.101.244 from dev bond2
<debug> Sending gratuitous ARP: 10.25.101.244 00:17:a4:77:d0:c4
brd ff:ff:ff:ff:ff:ff
<info> mounting /dev/mapper/vg_san0-lv_export_z01 on /export/Z01
<debug> mount -t ext3  /dev/mapper/vg_san0-lv_export_z01
/export/Z01
<info> Adding export: 10.25.101.231:/export/Z01
(fsid=110,rw,sync,wdelay,insecure,no_root_squash)
<info> Adding export: 10.25.101.232:/export/Z01
(fsid=110,rw,sync,wdelay,insecure,no_root_squash)
<info> Adding export: 10.25.101.244:/export/Z01
(fsid=110,rw,sync,wdelay,insecure,no_root_squash)
<debug> mount  -o rw,fg,hard,intr 10.25.101.244:/export/Z01 /Z01
<info> mounting /dev/mapper/vg_san0-lv_export_sapmnt_z01 on
/export/sapmnt/Z01
<debug> mount -t ext3  /dev/mapper/vg_san0-lv_export_sapmnt_z01
/export/sapmnt/Z01
<info> Adding export: 10.25.101.231:/export/sapmnt/Z01
(fsid=111,rw,sync,wdelay,insecure,no_root_squash)
<info> Adding export: 10.25.101.232:/export/sapmnt/Z01
(fsid=111,rw,sync,wdelay,insecure,no_root_squash)
<info> Adding export: 10.25.101.244:/export/sapmnt/Z01
(fsid=111,rw,sync,wdelay,insecure,no_root_squash)
<debug> mount  -o rw,fg,hard,intr
10.25.101.244:/export/sapmnt/Z01 /sapmnt/Z01
<info> mounting /dev/mapper/vg_san0-lv_usr_sap_z01_dvebmgs01 on
/usr/sap/Z01/DVEBMGS01
<debug> mount -t ext3
/dev/mapper/vg_san0-lv_usr_sap_z01_dvebmgs01
/usr/sap/Z01/DVEBMGS01
<info> mounting /dev/mapper/vg_san0-lv_export_audit on
/export/audit
<debug> mount -t ext3  /dev/mapper/vg_san0-lv_export_audit
/export/audit
<info> Adding export: 10.25.101.231:/export/audit
(fsid=114,rw,sync,wdelay,insecure,no_root_squash)
<info> Adding export: 10.25.101.232:/export/audit
(fsid=114,rw,sync,wdelay,insecure,no_root_squash)
<info> Adding export: 10.25.101.244:/export/audit
(fsid=114,rw,sync,wdelay,insecure,no_root_squash)
<debug> mount  -o rw,fg,hard,intr 10.25.101.244:/export/audit
/audit
Start of vg_san0 complete


(n.b. verify the mounts are present)


[root@alsterneu:/etc/cluster]
# df -Ph|grep vg_san0
/dev/mapper/vg_san0-lv_ora_z01                 7.9G  4.7G  2.9G  63% /oracle/Z01
/dev/mapper/vg_san0-lv_ora_z01_sapreorg        5.0G  158M  4.6G   4% /oracle/Z01/sapreorg
/dev/mapper/vg_san0-lv_ora_z01_oraarch          20G  1.7G   18G   9% /oracle/Z01/oraarch
/dev/mapper/vg_san0-lv_ora_z01_origloga       1008M  935M   23M  98% /oracle/Z01/origlogA
/dev/mapper/vg_san0-lv_ora_z01_origlogb       1008M  935M   23M  98% /oracle/Z01/origlogB
/dev/mapper/vg_san0-lv_ora_z01_mirrloga       1008M  935M   23M  98% /oracle/Z01/mirrlogA
/dev/mapper/vg_san0-lv_ora_z01_mirrlogb       1008M  935M   23M  98% /oracle/Z01/mirrlogB
/dev/mapper/vg_san0-lv_ora_z01_sapdata1        2.1T  1.3T  696G  65% /oracle/Z01/sapdata1
/dev/mapper/vg_san0-lv_export_z01               37G  177M   35G   1% /export/Z01
/dev/mapper/vg_san0-lv_export_sapmnt_z01       3.0G 1010M  1.9G  36% /export/sapmnt/Z01
/dev/mapper/vg_san0-lv_usr_sap_z01_dvebmgs01   3.0G  130M  2.7G   5% /usr/sap/Z01/DVEBMGS01
/dev/mapper/vg_san0-lv_export_audit           1008M   34M  924M   4% /export/audit


[root@alsterneu:/etc/cluster]
# exportfs 
/export/sapmnt/Z01
                10.25.101.244
/export/sapmnt/Z01
                10.25.101.232
/export/sapmnt/Z01
                10.25.101.231
/export/audit   10.25.101.244
/export/audit   10.25.101.232
/export/audit   10.25.101.231
/oracle/stage   10.25.101.0/24
/export/Z01     10.25.101.244
/export/Z01     10.25.101.232
/export/Z01     10.25.101.231

[root@alsterneu:/etc/cluster]
# df -Ph -t nfs
Filesystem            Size  Used Avail Use% Mounted on
10.25.101.244:/export/Z01           37G  176M   35G   1% /Z01
10.25.101.244:/export/sapmnt/Z01   3.0G 1010M  1.9G  36% /sapmnt/Z01
10.25.101.244:/export/audit       1008M   34M  924M   4% /audit



...the shutdown fails miserably:


[root@alsterneu:/etc/cluster]
# rg_test test /etc/cluster/cluster.conf stop lvm vg_san0
Running in test mode.
Stopping vg_san0...
<info> unmounting /audit
<info> Removing export: 10.25.101.244:/export/audit
<info> Removing export: 10.25.101.232:/export/audit
<info> Removing export: 10.25.101.231:/export/audit
<info> unmounting /export/audit
<info> unmounting /usr/sap/Z01/DVEBMGS01
<info> unmounting /sapmnt/Z01
<info> Removing export: 10.25.101.244:/export/sapmnt/Z01
<info> Removing export: 10.25.101.232:/export/sapmnt/Z01
<info> Removing export: 10.25.101.231:/export/sapmnt/Z01
<info> unmounting /export/sapmnt/Z01
<info> unmounting /Z01
<info> Removing export: 10.25.101.244:/export/Z01
<info> Removing export: 10.25.101.232:/export/Z01
<info> Removing export: 10.25.101.231:/export/Z01
<info> unmounting /export/Z01
umount: /export/Z01: device is busy
umount: /export/Z01: device is busy
<warning> Dropping node-wide NFS locks
<info> unmounting /export/Z01
umount: /export/Z01: device is busy
umount: /export/Z01: device is busy
<info> unmounting /export/Z01
umount: /export/Z01: device is busy
umount: /export/Z01: device is busy
<info> Asking lockd to drop locks (pid 6607)
<debug> No hosts to notify
<debug> No hosts to notify
<debug> No hosts to notify
<err> 'umount /export/Z01' failed, error=0
<info> Removing IPv4 address 10.25.101.244/24 from bond2
<info> unmounting /oracle/Z01/sapdata1
<info> unmounting /oracle/Z01/mirrlogB
<info> unmounting /oracle/Z01/mirrlogA
<info> unmounting /oracle/Z01/origlogB
<info> unmounting /oracle/Z01/origlogA
<info> unmounting /oracle/Z01/oraarch
<info> unmounting /oracle/Z01/sapreorg
<info> unmounting /oracle/Z01
  volume_list=["vg_root", "vg_local", "@alstera"]
  Can't deactivate volume group "vg_san0" with 1 open logical
volume(s)
<err> Logical volume vg_san0/lv_export_audit failed to shutdown


[root@alsterneu:/etc/cluster]
# df -Ph|grep vg_san0
/dev/mapper/vg_san0-lv_export_z01   37G  177M   35G   1% /export/Z01
[root@alsterneu:/etc/cluster]
# df -Ph -t nfs
Filesystem            Size  Used Avail Use% Mounted on
[root@alsterneu:/etc/cluster]
# fuser -m /export/Z01
[root@alsterneu:/etc/cluster]
# lsof +D /export/Z01
[root@alsterneu:/etc/cluster]
# umount /export/Z01
umount: /export/Z01: device is busy
umount: /export/Z01: device is busy
[root@alsterneu:/etc/cluster]
# umount -f /export/Z01
umount2: Device or resource busy
umount: /export/Z01: device is busy
umount2: Device or resource busy
umount: /export/Z01: device is busy


How on earth can I discover what is keeping this stale mount
busy?
Neither fuser nor lsof reports any processes accessing the
filesystem (see above), and showmount shows no NFS client that
could still be holding a lock on it.

[root@alsterneu:/etc/cluster]
# showmount -a|grep -c /export/Z01
0
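
For what it's worth, the only other places I can think of looking
are on the kernel side of the NFS server rather than in userland
(nfsd/lockd could well be the ones still holding a reference),
e.g. something along these lines:

# exportfs -v
# cat /proc/fs/nfs/exports
# cat /var/lib/nfs/rmtab
# cat /proc/locks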

After an hour or so the busy devices aren't busy anymore and can
be unmounted, so that finally the shared VG can be deactivated
and untagged.


[root@alsterneu:/etc/cluster]
# while ! umount -f /export/Z01; do sleep 15;done
umount2: Device or resource busy
umount: /export/Z01: device is busy
umount2: Device or resource busy
umount: /export/Z01: device is busy
umount2: Device or resource busy
umount: /export/Z01: device is busy
umount2: Device or resource busy
umount: /export/Z01: device is busy

...


[root@alsterneu:/etc/cluster]
# umount /export/Z01
[root@alsterneu:/etc/cluster]
# vgs -o +tags vg_san0
  VG      #PV #LV #SN Attr   VSize VFree  VG Tags
  vg_san0   5  14   0 wz--n- 2.13T 12.00M alstera
[root@alsterneu:/etc/cluster]
# vgchange -an vg_san0
  0 logical volume(s) in volume group "vg_san0" now active
[root@alsterneu:/etc/cluster]
# vgchange --deltag alstera vg_san0
  Volume group "vg_san0" successfully changed
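
Only after this manual cleanup could the service be enabled
again, presumably via clusvcadm, i.e. something like

# clusvcadm -e z01

(with z01 being the service name from clustat above).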


I wonder if any of you have encountered similar problems when
providing NFS exports as clustered resources and hopefully know
of a workaround.


Many thanks for your patience and kind attention

Ralph





