ext3 filesystem corruption - more info
Damian Menscher
menscher at uiuc.edu
Thu Apr 13 00:06:11 UTC 2006
I've seen similar errors when attempting to have a >2TB filesystem on a
32-bit RHEL3 machine. We have since implemented a 3.5TB filesystem on a
64-bit RHEL4 machine.
It would help if you could answer the question Andreas Dilger posed:
"Does this imply you have a 6TB ext3 filesystem?"
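For context on the 2TB figure: a common explanation for the ceiling on 32-bit 2.4 kernels is that the block layer counts 512-byte sectors in a 32-bit number. That caps a device at 2^32 × 512 bytes = 2 TiB. The Python sketch below only does that arithmetic. The 512-byte sector size and reading "6TB" as 6×10^12 bytes are assumptions, not something stated in the thread.

```python
# Arithmetic behind the 2 TiB ceiling, under stated assumptions:
# 512-byte sectors addressed by an unsigned 32-bit sector number.
SECTOR_SIZE = 512                        # bytes per sector (assumed)
MAX_SECTORS = 2 ** 32                    # number of distinct 32-bit sector numbers
LIMIT_BYTES = MAX_SECTORS * SECTOR_SIZE  # 2**41 bytes


def fits_32bit_sectors(size_bytes: int) -> bool:
    """True if every sector of a size_bytes device has a 32-bit sector number."""
    return size_bytes <= LIMIT_BYTES


if __name__ == "__main__":
    six_tb = 6 * 10**12  # "6TB" read as decimal terabytes (assumption)
    print(f"limit = {LIMIT_BYTES} bytes = {LIMIT_BYTES // 2**40} TiB")
    print(f"6TB fits under the limit: {fits_32bit_sectors(six_tb)}")
```

On those assumptions the second line prints False. A single 6TB filesystem would not fit under a 2 TiB device limit, which is why the answer to that question matters.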
Damian
On Wed, 12 Apr 2006, Sev Binello wrote:
>
> Hi -
>
> In case this helps,
> we got the following messages from EXT3 before the filesystem went
> Does anyone recognize these?
>
> //seems to mount okay
> Mar 25 17:52:30 acnlin82 kernel: EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,33), internal journal
> Mar 25 17:52:30 acnlin82 kernel: EXT3-fs: recovery complete.
> Mar 26 00:04:01 acnlin82 kernel: EXT3-fs: mounted filesystem with ordered data mode.
>
> //as soon as the NFS clients start, we get a TON of errors like this
> Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: Freeing blocks not in datazone - block = 3443589120, count = 1
> Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: Freeing blocks not in datazone - block = 2113834232, count = 1
> Mar 26 00:07:22 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: bit already cleared for block 49125
>
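The "not in datazone" message means ext3 was asked to free a block outside the range it considers valid for the filesystem. Below is a rough Python model of that range check, based on my reading of ext3_free_blocks() in the 2.4 source: the block must be at or above the first data block, and block + count must not exceed the total block count. Treat it as an approximation, not the kernel's code. The superblock values in the usage example are hypothetical placeholders, since the real ones for this filesystem are not in the thread. `dumpe2fs -h` on the device reports the actual block count and first block.

```python
import re

# Matches the "Freeing blocks not in datazone" messages from the log above.
DATAZONE_RE = re.compile(r"not in datazone - block = (\d+), count = (\d+)")


def in_datazone(block: int, count: int, first_data_block: int, blocks_count: int) -> bool:
    """Approximate model of the range check behind the 'not in datazone' error:
    the run [block, block + count) must start at or after the first data block
    and must not extend past the filesystem's total block count."""
    return block >= first_data_block and block + count <= blocks_count


def bad_frees(log_text: str, first_data_block: int, blocks_count: int) -> list:
    """Return (block, count) pairs from the log that fall outside the data zone."""
    bad = []
    for m in DATAZONE_RE.finditer(log_text):
        block, count = int(m.group(1)), int(m.group(2))
        if not in_datazone(block, count, first_data_block, blocks_count):
            bad.append((block, count))
    return bad


if __name__ == "__main__":
    log = (
        "ext3_free_blocks: Freeing blocks not in datazone - block = 3443589120, count = 1\n"
        "ext3_free_blocks: Freeing blocks not in datazone - block = 2113834232, count = 1\n"
    )
    # Hypothetical superblock values; read the real ones with `dumpe2fs -h`.
    FIRST_DATA_BLOCK = 0
    BLOCKS_COUNT = 1_000_000_000
    for block, count in bad_frees(log, FIRST_DATA_BLOCK, BLOCKS_COUNT):
        print(f"block {block} (count {count}) is outside {FIRST_DATA_BLOCK}..{BLOCKS_COUNT}")
```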
> //interspersed with some of these
> Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
> Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1891463980, limit=1722264358
> Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
> Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1824250576, limit=1722264358
> Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
>
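The "beyond end of device" lines can be summarized with a small script. This Python sketch extracts the device and the want/limit values and reports how far each request overshot. It assumes the device field is major:minor in hexadecimal, as the 2.4 kdevname() helper prints it. The want, limit and overshoot values stay in whatever unit the kernel printed.

```python
import re

# Matches lines like "08:31: rw=0, want=1891463980, limit=1722264358".
BEYOND_RE = re.compile(
    r"(?P<major>[0-9a-f]{2}):(?P<minor>[0-9a-f]{2}): rw=(?P<rw>\d+), "
    r"want=(?P<want>\d+), limit=(?P<limit>\d+)"
)


def beyond_end(log_text: str) -> list:
    """Return (major, minor, want, limit, overshoot) for each matching line.

    major/minor are decoded as hex (assumption based on the 2.4 kdevname()
    format); want, limit and overshoot stay in the units the kernel printed.
    """
    results = []
    for m in BEYOND_RE.finditer(log_text):
        want, limit = int(m["want"]), int(m["limit"])
        results.append((int(m["major"], 16), int(m["minor"], 16), want, limit, want - limit))
    return results


if __name__ == "__main__":
    log = (
        "Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1891463980, limit=1722264358\n"
        "Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1824250576, limit=1722264358\n"
    )
    for major, minor, want, limit, over in beyond_end(log):
        print(f"dev {major},{minor}: want={want} limit={limit} overshoot={over}")
```

With the two quoted lines, the device decodes to 8,49, which is the same sd(8,49) named in the ext3 errors above. The requests overshoot the limit by 169199622 and 101986218 units respectively.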
> Then we had to reboot, and basically the filesystem is shot.
>
> Thanks
> -Sev
>
> Sev Binello wrote:
> Hi -
>
> We have had 3 rather major occurrences of ext3 filesystem corruption
> lately, i.e. so bad we couldn't even mount, and fsck didn't help.
>
> I am looking for pointers that could help us investigate the root cause.
>
> In general...
> We are running Red Hat WS 3 Update 6, kernel 2.4.21-40.2.ELsmp or
> 2.4.21-37.ELsmp.
>
> We have a small SAN system that looks like this:
> 3 NFS servers, each containing 2 QLogic HBAs, connected to 2 QLogic
> switches, which are connected to an nStor (now Xyratex) 6TB RAID system
> containing 2 (active-active) controllers.
>
> On the first 2 occasions one of the controllers was failed over.
> On a 3rd occasion, both SAN switches lost power, and the hosts and RAID
> lost communication.
>
>
> On all occasions, the QLogic failover driver tried to start up on the
> alternate HBA.
>
> On the first 2 instances we sort of tried to blame the controller.
> On the 3rd, that was harder to do, since the RAID system and the hosts
> stayed up but lost communication.
>
> I can provide more detail if anyone has any info on how to proceed.
>
> Thanks
> -Sev
>
>
>
> --
>
> Sev Binello
> Brookhaven National Laboratory
> Upton, New York
> 631-344-5647
> sev at bnl.gov
>
>
Damian Menscher
--
-=#| <menscher at uiuc.edu> www.uiuc.edu/~menscher/ Ofc:(650)253-2757 |#=-
-=#| The above opinions are not necessarily those of my employers. |#=-
More information about the Ext3-users mailing list