ext3 file system becoming read only

Wed Sep 26 02:56:09 UTC 2007

Hi,

As I explained in my first posting, the 'read-only' issue is not limited to one
server; it is happening on a few servers, most of which are 'oracle' database
servers. Very recently it happened on an 'oracle' application server. As a
temporary measure, we are re-mounting the file system and also running fsck.
While searching the Red Hat knowledge base, I found the following URL; the
problem described there is similar to ours:

https://bugzilla.redhat.com/show_bug.cgi?id=213921

It says that this is a kernel bug.

We are not sure whether we should move to a newer kernel version or not;
please advise.

Thanks

--- tweeks <tweeks at rackspace.com> wrote:

> The EL4 kernel is wacky when it comes to the I/O scheduler locking up and
> causing ext3 to remount RO.  Various hardware hiccups can cause it to go RO.
>
> And when it does.. you need to tread lightly or you could lose everything.
>
> If your ext3 filesystem had problems and remounted read-only, I would
> strongly advise /against/ simply fscking it.  Oftentimes when your
> filesystem has gone RO, it may have been that way for 30 minutes or more.
> Just rebooting or fscking is a great way to lose everything (i.e. everything
> being dumped into /lost+found/).
> 
> Instead, I would recommend:
> 1) Reboot into a rescue CD environment (not allowing the rescue
>    environment to mount or fsck your filesystems).
> 2) Nuke the ext3 journal:
> 	tune2fs -O ^has_journal /dev/<rootfs>
>  (possibly doing the same for other problem partitions)
> 3) Do a fake fsck to see the extent of damage:
> 	fsck -fn /dev/<rootfs>
>   (after checking things out.. use "-fy" once you're sure that it's safe)
> 4) Rebuild the journal with "tune2fs -j /dev/<rootfs>"
>   (rerun at least once until "clean" result is repeatable)
> 5) Mount and check things out, 
> 	"mkdir /mnt/tmp && mount -t ext3 /dev/<rootfs> /mnt/tmp"
> 6) Gracefully umount & reboot:
> 	"umount /mnt/tmp  && shutdown -rf now && exit"
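
The numbered steps above could be wrapped in a small shell helper. The
following is only a minimal sketch, not something from the original posters:
`rebuild_journal`, `run`, and the `DRY_RUN` variable are hypothetical names, and
by default nothing is executed; the commands are only printed. It assumes the
device is unmounted (for example, inside a rescue environment) and it
deliberately stops before rebuilding the journal if the read-only fsck reports
problems, so that running "fsck -fy" stays a manual decision.

```shell
#!/bin/bash
# Sketch of steps 2-4 above. Hypothetical helper, not part of e2fsprogs.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.

run() {
    # Print the command, and execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

rebuild_journal() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: rebuild_journal /dev/<device>" >&2
        return 2
    fi
    # Refuse to touch a device that is currently mounted.
    if grep -q -- "^$dev " /proc/mounts 2>/dev/null; then
        echo "$dev is mounted; unmount it first" >&2
        return 1
    fi
    # Step 2: remove the journal (the filesystem becomes plain ext2).
    run tune2fs -O '^has_journal' "$dev" || return 1
    # Step 3: forced check that answers "no" to every repair prompt.
    if ! run fsck -fn "$dev"; then
        echo "fsck reported problems on $dev; review them, run fsck -fy by hand, then tune2fs -j" >&2
        return 1
    fi
    # Step 4: recreate the journal (back to ext3).
    run tune2fs -j "$dev" || return 1
}
```

Calling `rebuild_journal /dev/<rootfs>` with the default settings prints the
three commands in order, so the plan can be reviewed before setting DRY_RUN=0.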
> 
> Tweeks
> 
> On Tuesday 25 September 2007 11:47, Swapana Ghosh wrote:
> > Hi Jordi,
> >
> > Thanks for your reply.  I will test the way you suggested.
> >
> > Thanks
> > -swapna
> >
> > --- Jordi Prats <jprats at cesca.es> wrote:
> > > Hi,
> > > > It looks like what happened to me. I did the following to solve the issue:
> > > >
> > > > Mark the filesystem as not having a journal (take it to ext2)
> > >
> > > tune2fs -O ^has_journal /dev/cciss/c0d0p2
> > >
> > > fsck it to delete the journal:
> > >
> > > e2fsck /dev/cciss/c0d0p2
> > >
> > > Create the journal (take it back to ext3)
> > >
> > > tune2fs -j /dev/cciss/c0d0p2
> > >
> > > > and finally, remount it.
> > > >
> > > > In my case it was a local disk, but it should be the same with your
> > > > SAN disk.
> > >
> > > Jordi
> > >
> > > Swapana Ghosh wrote:
> > > > Hi
> > > >
> > > > > In our office environment, a partition is becoming "read only" on a
> > > > > few servers. These are mostly database servers, and yesterday it
> > > > > happened for the first time on an application server.
> > > > >
> > > > > I was checking the archives and found what may be similar issues in
> > > > > the 2007-July archives. It would be really helpful if someone could
> > > > > describe how they were solved.
> > > > >
> > > > > In our case, just as the problem started, we found the following line
> > > > > in the log file:
> > > > >
> > > > >      EXT3-fs error (device dm-12): ext3_find_entry: reading directory #2015496 offset 2
> > > >
> > > > > Then one blank line, followed by these lines:
> > > > >
> > > > >     Aborting journal on device dm-12.
> > > > >     ext3_abort called
> > > > >
> > > > >     EXT3-fs error (device dm-12): ext3_journal_start_sb: Detected aborted journal
> > > > >     Remounting filesystem read-only
> > > > >
> > > > > Then the following line repeats continuously:
> > > > >
> > > > >     EXT3-fs error (device dm-12) in start_transaction: Journal has aborted
> > > >
> > > >
> > > >
> > > > > The above message repeats until we remount the filesystem and the
> > > > > partition becomes 'read-write'.
> > > >
> > > > > We could not figure out the root cause of the problem.
> > > > >
> > > > > We are using individual EMC LUNs, which are configured into LVM
> > > > > volume groups and then mounted as logical volumes.
> > > > >
> > > > > Here I am giving the server description:
> > > >
> > > > ____________________________________________________________
> > > >
> > > > [root at server ~]# lsmod |grep -i qla
> > > > qla2300               130304  0
> > > > qla2xxx_conf          305924  0
> > > > qla2xxx               307448  21 qla2300
> > > > scsi_mod              117709  5 sg,emcp,qla2xxx,cciss,sd_mod
> > > >
> > > > ____________________________________________________________
> > > > [root at server ~]# cat /etc/modprobe.conf
> > > > alias eth0 tg3
> > > > alias eth1 tg3
> > > > alias eth2 e1000
> > > > alias eth3 e1000
> > > > alias eth4 e1000
> > > > alias eth5 e1000
> > > > alias bond0 bonding
> > > > alias scsi_hostadapter cciss
> > > > options bond0 max_bonds=2 miimon=100 mode=1
> > > > alias scsi_hostadapter1 qla2xxx
> > > > alias scsi_hostadapter2 qla2xxx_conf
> > > > #alias scsi_hostadapter3 qla6312
> > > > > options qla2xxx  ql2xmaxqdepth=16 qlport_down_retry=64 ql2xloginretrycount=30 ql2xfailover=0 ql2xlbType=0
> > > > > install qla2xxx /sbin/modprobe qla2xxx_conf; /sbin/modprobe --ignore-install qla2xxx
> > > > > remove qla2xxx /sbin/modprobe -r --first-time --ignore-remove qla2xxx && { /sbin/modprobe -r --ignore-remove qla2xxx_conf; }
> > > > ###BEGINPP
> > > > include /etc/modprobe.conf.pp
> > > > ###ENDPP
> > > > ###BEGINPP
> > > > include /etc/modprobe.conf.pp
> > > > ###ENDPP
> > > > ###BEGINPP
> > > > include /etc/modprobe.conf.pp
> > > > ###ENDPP
> > > >
> > > > ________________________________________________
> > > > [root at server ~]# rpm -qa |grep -i EMC
> > > > EMCpower.LINUX-4.5.1-022
> > > >
> > > > ________________________________________________
> > > > [root at server ~]# rpm -qa|grep -i scli
> > > > scli-1.06.16-57
> > > >
> > > > ________________________________________________
> > > > [root at server ~]# rpm -qa|grep -i nav
> > > > naviagentcli-6.19.1.3.0-1
> > > >
> > > > ________________________________________________
> > > >  product: QLA2312 Fibre Channel Adapter
> > > >
> > > > ________________________________________________
> > > > [root at server ~]# rpm -qa|grep -i lvm
> > > > lvm2-2.02.06-6.0.RHEL4
> > > > system-config-lvm-1.0.19-1.0
> > > >
> > > > ________________________________________________
> > > >
> > > > > If I missed any info, please let me know.
> > > > >
> > > > > It would be really appreciated if I could get some hints to solve the issue.
> > > >
> > > > Thanks in advance
> > > > -swapana
> >
> 
=== message truncated ===
