EXT3 filesystem on scsi device becoming readonly

Mon Aug 28 16:44:30 UTC 2006

On Mon, 2006-08-28 at 12:24 -0400, Theodore Tso wrote:
> On Mon, Aug 28, 2006 at 08:49:36AM -0500, James Bottomley wrote:
> > On Mon, 2006-08-28 at 09:31 -0400, Theodore Tso wrote:
> > > IMHO the right thing is for the device driver to retry for some amount
> > > of time (maybe measured in seconds or perhaps a single digit number of
> > > minutes), and in the meantime, pass a signal to the rest of the kernel
> > > that any process that attempts to write to the filesystem should be
> > > frozen while we wait for the disk to come back.  
> > 
> > Actually, for this exact case, there's a feature propagating through the
> > transport classes called the dev loss timer.  Its job, for pluggable
> > transports like FC, is to allow the user time to unplug and replug
> > cables before the system declares the device lost and starts erroring
> > requests (which is what causes the fs to go read only).  Since the
> > original reporter seemed to be using fibre, it sounds like this would
> > suit.  Beware:  the dev loss timer shouldn't be much longer than the
> > SCSI command timeout (say ~30s) or nasty things may happen.
> 
> Yes, that sounds ideal.  Does the dev loss timer need to be
> configured, or is it going to be enabled with an
> appropriate-for-most-systems default value (such as the SCSI command
> timeout)?

It's configurable via the fc transport class rports
(in /sys/class/fc_rport_class, value dev_loss_tmo); the default value is
60s.
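
For illustration, here is a minimal sketch of reading and changing that
value from user space.  It is an assumption-laden example, not something
taken from the kernel tree: the helper names read_dev_loss_tmo and
set_dev_loss_tmo are hypothetical, and the directory below is the class
name I believe mainline exposes (fc_remote_ports), which may differ from
the path on your kernel.  Writing the attribute needs root.

```python
import glob
import os

# Assumed sysfs location of the FC remote port class; adjust for your kernel.
RPORT_CLASS_DIR = "/sys/class/fc_remote_ports"


def _tmo_files(class_dir):
    """Return the dev_loss_tmo attribute path of every rport, sorted."""
    return sorted(glob.glob(os.path.join(class_dir, "*", "dev_loss_tmo")))


def read_dev_loss_tmo(class_dir=RPORT_CLASS_DIR):
    """Return {rport name: dev_loss_tmo in seconds} for every rport found."""
    values = {}
    for path in _tmo_files(class_dir):
        rport = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            values[rport] = int(f.read().strip())
    return values


def set_dev_loss_tmo(seconds, class_dir=RPORT_CLASS_DIR):
    """Write the same timeout, in seconds, to every rport found."""
    for path in _tmo_files(class_dir):
        with open(path, "w") as f:
            f.write(str(seconds))


if __name__ == "__main__":
    # Print the current timeout of each remote port.
    for rport, seconds in read_dev_loss_tmo().items():
        print(f"{rport}: {seconds}s")
```

Calling set_dev_loss_tmo(30) would bring every rport in line with the
~30s SCSI command timeout mentioned above.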

> Also, when did this get added to the various transport classes?  I
> assume it's not going to be of much help for the original reporter if he
> needs it to work on a RHEL 3 AS Update 6 kernel, but hopefully it will
> be in SLES 10 / RHEL 5?  Or is this something that is just going into
> the 2.6 mainline now?

Erm, pass.  It predates git, so it has been there since at least 2.6.12-rc2.

James