[Linux-cluster] self_fence for FS resource in RHEL 6.x operational?
Robert Hayden
rhayden.public at gmail.com
Thu Jan 24 17:28:38 UTC 2013
On Tue, Jan 22, 2013 at 12:38 PM, Fabio M. Di Nitto <fdinitto at redhat.com> wrote:
>
> On 01/22/2013 06:22 PM, Robert Hayden wrote:
> > I am testing RHCS 6.3 and found that the self_fence option for a file
> > system resource will no longer function as expected. Before I log an
> > SR with RH, I was wondering if the design changed between RHEL 5 and
> > RHEL 6.
> >
> > In RHEL 5, I see logic in /usr/share/cluster/fs.sh that runs a
> > "reboot -fn" command when the umount fails and self_fence is set. In
> > RHEL 6, there is little to no logic around self_fence in the fs.sh file.
>
> The logic has just been moved to a common file shared by all *fs
> resources (fs-lib)
>
>
>
> >
> > Example of RHEL 5 logic in fs.sh that appears to be removed from RHEL 6:
> > if [ -n "$umount_failed" ]; then
> >     ocf_log err "'umount $mp' failed, error=$ret_val"
> >
> >     if [ "$self_fence" ]; then
> >         ocf_log alert "umount failed - REBOOTING"
> >         sync
> >         reboot -fn
> >     fi
> >     return $FAIL
> > else
> >     return $SUCCESS
> > fi
>
> same code, just different file.
>
> >
> >
> >
> > To test in RHEL 6, I simply create a file system resource (e.g.
> > /test/data) with self_fence="1" or self_fence="on" (as added by Conga).
> > Then I mount a small ISO image on a directory inside that file system.
> > The nested mount prevents the file system resource from being unmounted
> > and should trigger a self_fence scenario.
> >
> > Testing RHEL 6, I see the following in /var/log/messages:
> >
> > Jan 21 16:40:59 techval16 rgmanager[82637]: [fs] unmounting /test/data
> > Jan 21 16:40:59 techval16 rgmanager[82777]: [fs] Sending SIGTERM to
> > processes on /test/data
> > Jan 21 16:41:04 techval16 rgmanager[82859]: [fs] unmounting /test/data
> > Jan 21 16:41:05 techval16 rgmanager[82900]: [fs] Sending SIGKILL to
> > processes on /test/data
> > Jan 21 16:41:05 techval16 rgmanager[61929]: stop on fs "share16_data"
> > returned 1 (generic error)
>
> Looks like a bug in the force_umount option.
>
> Please file a ticket with RH GSS.
I will log a ticket in a few days when I can build a simple test case
for support.
>
> As workaround try to disable force_umount.
The workaround of setting force_umount=0 and self_fence=1 worked with
the ISO image mount test.
>
> As far as I can tell, but I haven't verified it:
> ocf_log warning "Sending SIGKILL to processes on $mp"
> fuser -kvm "$mp"
>
> case $? in
> 0)
>     ;;
> 1)
>     return $OCF_ERR_GENERIC
>     ;;
> 2)
>     break
>     ;;
> esac
>
> the issue is the way the fuser error is handled in the force_umount
> path; that would match the log you are posting.
>
I have learned that the "fuser" command will not find the sub-mounted ISO
image that causes the umount to fail. So my test case, which uses the ISO
image to test self_fence, may need to be updated.
[root@techval16]# df -k | grep data
/dev/mapper/share16vg-tv16_mq_data
806288 17200 748128 3% /test/data
352 352 0 100% /test/data/mnt
[root@techval16]# fuser -kvm /test/data
[root@techval16]# echo $?
1
[root@techval16]# umount /test/data
umount: /test/data: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
[root@techval16]#
I am unsure whether the logic in fs-lib needs to be updated to handle
sub-mounted file systems. I suppose that is what the support ticket
will determine.
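For illustration only, here is a minimal sketch of how a script could spot
mounts nested under a mount point, since fuser does not report them. The
function name list_submounts is hypothetical and is not part of fs-lib. It
assumes the mounts table has the /proc/mounts layout (device, mount point,
then the remaining fields).

```shell
#!/bin/bash
# Hypothetical helper, NOT part of fs-lib: print every mount point that
# sits strictly beneath the given mount point, deepest first.
#   $1 = parent mount point (e.g. /test/data)
#   $2 = mounts table to read (assumption: /proc/mounts layout; defaults
#        to /proc/mounts)
# Note: /proc/mounts encodes spaces in paths as \040; this sketch prints
# such paths in their encoded form.
list_submounts() {
    local parent="${1%/}"            # drop one trailing slash, if any
    local table="${2:-/proc/mounts}"
    local dev mp rest

    while read -r dev mp rest; do
        # Skip the parent itself (also covers the parent being "/").
        [ "$mp" = "$1" ] && continue
        case "$mp" in
            "$parent"/*) printf '%s\n' "$mp" ;;
        esac
    done < "$table" | LC_ALL=C sort -r   # reverse sort lists children first
}
```

With the layout from my test, "list_submounts /test/data" should list
/test/data/mnt, while an unrelated mount such as /test/database would not
match because the pattern requires a "/" after the parent path.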
> I think the correct way would be to check if self_fence is enabled or
> not and then return/reboot later in the script.
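A rough sketch of that idea, as I understand it: rather than returning
straight from the fuser case, the kill loop would only record that the
umount failed, and a single exit path would then decide between rebooting
and returning an error. This is not the shipped fs-lib code. The names
finish_umount, self_fence_enabled and FENCE_CMD are hypothetical, and the
set of accepted self_fence values beyond "1" and "on" is an assumption.

```shell
#!/bin/bash
# Hypothetical sketch, NOT the shipped fs-lib code.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Return 0 when the self_fence value counts as enabled.
# "1" and "on" are the values mentioned in this thread; "yes" and "true"
# are an assumption added for illustration.
self_fence_enabled() {
    case "$1" in
        1|on|yes|true) return 0 ;;
        *)             return 1 ;;
    esac
}

# Single exit path, called after all umount attempts have been made.
#   $1 = mount point
#   $2 = "yes" if the umount ultimately failed (set by the kill loop
#        instead of an early "return $OCF_ERR_GENERIC")
#   $3 = self_fence value from the resource
# FENCE_CMD is a hypothetical override so the path can be tested without
# rebooting; when unset, the RHEL 5 behaviour ("reboot -fn") is used.
finish_umount() {
    local mp="$1" umount_failed="$2" self_fence="$3"

    if [ "$umount_failed" != "yes" ]; then
        return $OCF_SUCCESS
    fi

    echo "'umount $mp' failed" >&2
    if self_fence_enabled "$self_fence"; then
        echo "umount failed - REBOOTING" >&2
        sync
        ${FENCE_CMD:-reboot -fn}
    fi
    return $OCF_ERR_GENERIC
}
```

For a dry run, something like FENCE_CMD="echo would reboot" followed by
"finish_umount /test/data yes 1" exercises the self_fence branch without
taking the node down.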
>
> Fabio
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster