[Linux-cluster] self_fence for FS resource in RHEL 6.x operational?

Robert Hayden rhayden.public at gmail.com
Wed Feb 6 20:39:00 UTC 2013


On Thu, Jan 24, 2013 at 11:28 AM, Robert Hayden
<rhayden.public at gmail.com> wrote:
> On Tue, Jan 22, 2013 at 12:38 PM, Fabio M. Di Nitto <fdinitto at redhat.com> wrote:
>>
>> On 01/22/2013 06:22 PM, Robert Hayden wrote:
>> > I am testing RHCS 6.3 and found that the self_fence option for a file
>> > system resource will no longer function as expected.  Before I log an
>> > SR with RH, I was wondering if the design changed between RHEL 5 and
>> > RHEL 6.
>> >
>> > In RHEL 5, I see logic in /usr/share/cluster/fs.sh that will issue a
>> > "reboot -fn" command as part of the self_fence handling.  In RHEL 6, there
>> > is little to no self_fence logic in the fs.sh file.
>>
>> The logic has just been moved to a common file shared by all *fs
>> resources (fs-lib).
>>
>>
>>
>> >
>> > Example of RHEL 5 logic in fs.sh that appears to be removed from RHEL 6:
>> >         if [ -n "$umount_failed" ]; then
>> >                 ocf_log err "'umount $mp' failed, error=$ret_val"
>> >
>> >                 if [ "$self_fence" ]; then
>> >                         ocf_log alert "umount failed - REBOOTING"
>> >                         sync
>> >                         reboot -fn
>> >                 fi
>> >                 return $FAIL
>> >         else
>> >                 return $SUCCESS
>> >         fi
>>
>> same code, just different file.
>>
>> >
>> >
>> >
>> > To test in RHEL 6, I simply create a file system (e.g. /test/data)
>> > resource with self_fence="1" or self_fence="on" (as added by Conga).
>> > Then mount a small ISO image on top of the file system.  This mount will
>> > cause the file system resource to be unable to unmount itself and should
>> > trigger a self_fence scenario.
>> >
>> > Testing RHEL 6, I see the following in /var/log/messages:
>> >
>> > Jan 21 16:40:59 techval16 rgmanager[82637]: [fs] unmounting /test/data
>> > Jan 21 16:40:59 techval16 rgmanager[82777]: [fs] Sending SIGTERM to
>> > processes on /test/data
>> > Jan 21 16:41:04 techval16 rgmanager[82859]: [fs] unmounting /test/data
>> > Jan 21 16:41:05 techval16 rgmanager[82900]: [fs] Sending SIGKILL to
>> > processes on /test/data
>> > Jan 21 16:41:05 techval16 rgmanager[61929]: stop on fs "share16_data"
>> > returned 1 (generic error)
>>
>> Looks like a bug in the force_umount option.
>>
>> Please file a ticket with RH GSS.
>
> I will log a ticket in a few days when I can build a simple test case
> for support.
>

I thought I would provide a follow-up for the community.

A (private, sorry) Bugzilla has been created:
https://bugzilla.redhat.com/show_bug.cgi?id=908457

For those with Red Hat Network access, a KB article has also been created:
https://access.redhat.com/knowledge/solutions/306483
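
For anyone who wants to reproduce this, the test case I used looks roughly
like the following (a sketch only -- the mount point is from my test box,
the service name is a placeholder, and any small ISO image will do):

    # the file system resource is defined in cluster.conf with self_fence="1"
    # loop-mount a small ISO inside the clustered file system so that a
    # later "umount /test/data" fails with "device is busy"
    mkdir -p /tmp/isotest /test/data/mnt
    mkisofs -quiet -o /tmp/test.iso /tmp/isotest
    mount -o loop,ro /tmp/test.iso /test/data/mnt

    # then relocate or stop the service; rgmanager fails the umount, and
    # with self_fence enabled the node should issue "reboot -fn"
    clusvcadm -r <service_name>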


>>
>> As a workaround, try disabling force_umount.
>
> The workaround of setting force_umount=0 and self_fence=1 worked with the
> ISO image mount test.
>
>
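For reference, the fs resource in my cluster.conf looks roughly like this with
the workaround applied (the name, device, and mount point are from my test
cluster; fstype is whatever your file system actually is):

    <fs name="share16_data"
        device="/dev/mapper/share16vg-tv16_mq_data"
        mountpoint="/test/data"
        fstype="ext4"
        force_umount="0"
        self_fence="1"/>
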
>>
>> As far as I can tell (though I haven't verified it):
>> ocf_log warning "Sending SIGKILL to processes on $mp"
>>                         fuser -kvm "$mp"
>>
>>                         case $? in
>>                         0)
>>                                 ;;
>>                         1)
>>                                 return $OCF_ERR_GENERIC
>>                                 ;;
>>                         2)
>>                                 break
>>                                 ;;
>>                         esac
>>
>> the issue is the way the fuser error is handled in the force_umount path;
>> that would match the log you are posting.
>>
>
> I have learned that the "fuser" command will not find the sub-mounted ISO
> image that causes the umount to fail.  So, my test case using the ISO
> image to test self_fence may need to be updated.
>
> [root@techval16]# df -k | grep data
> /dev/mapper/share16vg-tv16_mq_data
>                         806288     17200    748128   3% /test/data
>                            352       352                 0 100% /test/data/mnt
> [root@techval16]# fuser -kvm /test/data
> [root@techval16]# echo $?
> 1
> [root@techval16]# umount /test/data
> umount: /test/data: device is busy.
>         (In some cases useful info about processes that use
>          the device is found by lsof(8) or fuser(1))
> [root@techval16]#
>
> I am unsure whether the logic in fs-lib needs to be updated to handle
> sub-mounted file systems.  That is what the support ticket will
> determine, I suppose.
>
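If it does, I imagine the check would be something along these lines (just a
sketch of the idea, not actual fs-lib code):

    # walk /proc/mounts and flag anything mounted below $mp, since fuser
    # does not report sub-mounts as "users" of the file system
    while read -r dev mnt rest; do
            case "$mnt" in
            "$mp"/*)
                    ocf_log warning "Found sub-mount $mnt under $mp"
                    umount "$mnt" || return $OCF_ERR_GENERIC
                    ;;
            esac
    done < /proc/mounts
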
>> I think the correct way would be to check whether self_fence is enabled
>> and then return or reboot later in the script.
>>
>> Fabio
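
If I follow Fabio's suggestion correctly, the fuser error path in fs-lib would
end up looking something like this (again, my rough sketch of the idea, not an
actual patch):

    ocf_log warning "Sending SIGKILL to processes on $mp"
    fuser -kvm "$mp"

    case $? in
    0)
            ;;
    1)
            # instead of returning a generic error right away, honor
            # self_fence here, the way the RHEL 5 fs.sh did
            if [ "$self_fence" ]; then
                    ocf_log alert "umount failed - REBOOTING"
                    sync
                    reboot -fn
            fi
            return $OCF_ERR_GENERIC
            ;;
    2)
            break
            ;;
    esac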



