[dm-devel] Re: Root device multipathed host freeze with the latest upstream multipath-tools package

Martin George marting at netapp.com
Wed Jan 30 15:20:33 UTC 2008


Kiyoshi,

I made the suggested changes to the script (removing 'sleep' & using an 
empty while loop instead) and it worked fine. Preliminary IO runs with 
FCP path faults also look good.
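
For reference, the reworked loop can be sketched roughly as follows. This is
a minimal illustration, not the actual test.sh (which is not reproduced in
this thread). The function names, the spin count, and the sdX/sdY device
names are assumptions. The point it shows is that the delay is an empty loop
built only from shell built-ins, so nothing has to be exec'ed from the root
device while all paths are offline.

```shell
#!/bin/sh
# Sketch only: names and values below are hypothetical, not from test.sh.

# Delay by spinning in the shell itself. Unlike 'sleep', this needs no
# fork/exec of an external binary, so it does not read from the root device.
busy_wait() {
    bw_n=0
    while [ "$bw_n" -lt "$1" ]; do
        bw_n=$((bw_n + 1))
    done
}

# Write a state ("offline" or "running") to each given sysfs state file.
set_paths() {
    sp_state=$1
    shift
    for sp_file in "$@"; do
        echo "$sp_state" > "$sp_file"
    done
}

# Offline and online the given paths for a fixed number of cycles.
# (The script discussed in this thread loops forever instead.)
toggle_paths() {
    tp_cycles=$1
    tp_spin=$2
    shift 2
    tp_i=0
    while [ "$tp_i" -lt "$tp_cycles" ]; do
        set_paths offline "$@"
        busy_wait "$tp_spin"
        set_paths running "$@"
        busy_wait "$tp_spin"
        tp_i=$((tp_i + 1))
    done
}

# Example with placeholder device names (run as root):
# toggle_paths 10 1000000 /sys/block/sdX/device/state /sys/block/sdY/device/state
```

The spin count is only a crude delay and would have to be tuned per machine.
What matters here is that the script never blocks on root I/O between the
offline and the online step.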

Thanks,
-Martin

Kiyoshi Ueda wrote:
> Hi Martin,
> 
> On Wed, 23 Jan 2008 16:28:16 +0530, Martin George wrote:
>  > Kiyoshi Ueda wrote:
>  > > Hi Martin,
>  > >
>  > > Thank you for your testing.
>  > > Please see my comments below.
>  > >
>  > > On Tue, 22 Jan 2008 21:56:13 +0530, "Sarraf, Ritesh" wrote:
>  > >  > Hi Kiyoshi,
>  > >  >
>  > >  > I took the latest upstream multipath-tools package (Jan 15, 2008)
>  > >  > and installed it on my RHEL 5.1 host to verify the libprio fix. To
>  > >  > simulate the FCP path faults, I ran your script (as attached in
>  > >  > the mail), which alternately offlined/onlined the corresponding
>  > >  > SCSI paths of the root dm device in sysfs. Listing my observations
>  > >  > below:
>  > >  >
>  > >  > 1) The freeze was still reproducible. On checking the sysrq dumps
>  > >  > (as attached), I could see it was the script itself, i.e. test.sh,
>  > >  > which seems to have stalled on the exec() system call, perhaps
>  > >  > waiting for inode write-out for the updated access time (the
>  > >  > script resides on my root dm device itself). As suggested by you
>  > >  > in the bugzilla, I remounted the root device using the noatime
>  > >  > option and then reran the script - I have not hit the freeze yet.
>  > >  > Is this the expected behavior?
>  > >
>  > > As for your script, it is the expected behavior.
>  > > I found that you added some sleep commands to my original script
>  > > posted by the following email.
>  > >
>  > >     http://marc.info/?l=dm-devel&m=119465024621783&w=2
>  > >
>  > > sleep is not a shell built-in command, so running it needs access
>  > > to the root device. I guess that is the reason for the freeze.
>  >
>  > So does that mean you should never access the root partition in such a
>  > scenario? What about utilities like syslogd which may access the root to
>  > log messages? There could be many such utilities, for that matter, which
>  > access the root, and all of them would have to be stopped.
> 
> No.
> Generally you can access the root.
> But you can't in your single-threaded test script.
> 
> In your testing scenario, only your script onlines/offlines the paths
> for the root, like this:
>     while true; do
>         <offline all paths>
>         sleep
>         <online all paths>
>         sleep
>     done
> 
> So if your script accesses the root after it offlines all paths,
> it freezes and nobody will online the paths again.
> So you must keep your script itself from freezing.
> Other utilities accessing the root don't matter.
> 
> I guess that your script froze at the sleep after the offline.
> 
> Thanks,
> Kiyoshi Ueda
> 



