Filesystem remounted read-only WAS: RE: Recent unexplained quota problems

Wed Sep 12 14:33:33 UTC 2007

-----Original Message-----
> From: redhat-list-bounces at redhat.com 
> [mailto:redhat-list-bounces at redhat.com] On Behalf Of Chris St. Pierre
> Sent: Wednesday 12 September 2007 16:09
> To: General Red Hat Linux discussion list
> Subject: Re: Filesystem remounted read-only WAS: RE: Recent unexplained quota problems
> 
> On Wed, 12 Sep 2007, Mertens, Bram wrote:
> > As a matter of fact I just noticed very similar behavior on one of our
> > servers just now.  /var was remounted read-only while mount indicated
> > nothing special.
> >
> > dmesg however showed the following errors:
> [...snip...]
> > Info fld=0x0, Current sdb: sense key Aborted Command
> [...snip...]
> > EXT3-fs error (device dm-5) in start_transaction: Journal has aborted
> >
> > This is on a virtual server (VMWare ESX) with the datafiles on a SAN, so
> > it's likely to have been an issue with the SAN-connection.
> >
> > I was unable to unmount /var even after going to runlevel 1.  After a
> > reboot everything appeared to be fine, just like in the OP's message. But
> > I rebooted to runlevel 1 and unmounted /var to run a file system check.
> >
> > e2fsck -p /dev/logvg/var indicates the filesystem is clean, as does
> > e2fsck -b 32768 -p /dev/logvg/var.
> >
> > Can we trust this filesystem?
> 
> I'm not sure.  The "sense key Aborted Command" error means that the
> target aborted the write; I don't think that's the error you'd see if
> you simply dropped connectivity.  OTOH, the fact that the fs is clean
> suggests that it was in a consistent state when the write was aborted,
> so it's not a problem that's caused any data corruption.
> 
> It could still be due to an issue on the SAN -- perhaps some dying
> media? -- so I'd see if you can get any logs off your SAN; if you can
> find proof of a dropped connection, then you're good, but you might
> want to look into it further.

Thanks,

Apparently one of the servers involved went down for a backup and this
might have caused a so-called "path failover".  The VMWare article
explicitly mentions this kind of behavior after a path failover or even
a busy I/O retry, but I'll follow this up with the SAP maintainers further.

And I'll keep an eye on this machine!
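
For what it's worth, here is a rough sketch of how I might watch for this
happening again.  It assumes that /proc/mounts shows the kernel's current
mount options, whereas mount on a system like this reports from /etc/mtab,
which may be why it showed nothing special.  The names read_only_mounts
and main are my own, nothing standard, and this is untested on the server
in question.

```python
import os


def read_only_mounts(mounts_text):
    """Return (mount_point, device) pairs whose option list contains 'ro'.

    Expects text in the /proc/mounts layout:
    device mount_point fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Compare whole options so that e.g. 'errors=remount-ro' does not match.
        if "ro" in options.split(","):
            found.append((mount_point, device))
    return found


def main(path="/proc/mounts"):
    # Assumption: this file exists and reflects the kernel's view of mounts.
    if not os.path.exists(path):
        return
    with open(path) as f:
        for mount_point, device in read_only_mounts(f.read()):
            print("read-only: %s (%s)" % (mount_point, device))


if __name__ == "__main__":
    main()
```

Run from cron, this would only tell me that a filesystem is currently
read-only; it also lists anything mounted read-only on purpose, and it
says nothing about why, so dmesg still has to be checked by hand.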

Regards

Bram