Problem with ext3 filesystem

Thu Dec 28 16:49:34 UTC 2006

Jan,

I noticed that you are using a recent kernel, so this may not be relevant:

http://thread.gmane.org/gmane.comp.file-systems.ext3.user/2351/focus=2358

It is a thread from 2005 about block aliasing on large arrays; read in 
particular the last two posts, from Andreas and Stephen. The idea is that you 
would start seeing the corruption only after the filesystem filled past a 
certain capacity. A possible cause is that a block pointer (in the device 
driver or VFS layer) wrapped around and now refers to the wrong block on the 
device, causing corruption.
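To put rough numbers on the wrap-around idea, here is a small sketch. It 
assumes, purely hypothetically, that some layer keeps the sector number in a 
signed 32-bit integer and that sectors are 512 bytes; neither the 2005 thread 
nor your logs say which variable (if any) is involved. `wrap_int32` and 
`first_unsafe_byte` are illustrative names, not kernel functions.

```python
SECTOR_SIZE = 512        # bytes per sector (assumption)
INT32_MAX = 2**31 - 1    # largest value a signed 32-bit counter can hold


def wrap_int32(n: int) -> int:
    """Return n as it would end up if stored in a signed 32-bit C integer."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n > INT32_MAX else n


def first_unsafe_byte() -> int:
    """First byte offset whose sector number no longer fits in a signed 32-bit int."""
    return (INT32_MAX + 1) * SECTOR_SIZE


if __name__ == "__main__":
    volume_bytes = 1_400_000_000_000  # ~1.4 TB RAID 5 volume from the report
    limit = first_unsafe_byte()
    print(f"signed 32-bit sector limit: {limit} bytes ({limit / 1e12:.2f} TB)")
    print(f"volume extends past the limit: {volume_bytes > limit}")

    # A write near the end of the volume, e.g. at 1.3 TB into the device.
    offset = 1_300_000_000_000
    true_sector = offset // SECTOR_SIZE
    print(f"true sector {true_sector} would be stored as {wrap_int32(true_sector)}")
```

Under that assumption the limit sits at about 1.1 TB, inside your 1.4 TB 
volume, so only data written past that point would be affected. What a 
wrapped value then does (an I/O error or a write to the wrong block) depends 
on how the code uses it afterwards.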

Then again, you may simply have found the culprit when memtest turned up a 
bad memory module. Other modules may be bad as well without memtest detecting 
it. Try removing all but one or two modules (RAM will decrease, but should be 
sufficient for testing) and retest. 

At a minimum, I would back up the data as soon as possible so you don't 
lose anything.

Thanks,
Jeremy

On Thursday 28 December 2006 03:05, Jan wrote:
> The machine is used mainly as a file server with Samba and netatalk. These
> should be the only server applications placing data on the drive. For
> testing I have already disabled netatalk. I can run fsck and the
> filesystem is fine afterwards. Then I remount, copy a few GB with cp,
> unmount, and fsck reports errors again in the target directory of the
> copied files. No Samba or netatalk users were connected during this test.
> When I copied files from a client connected via Samba I got the same
> errors.
>
> Jan
>
> > What is this machine being used for, primarily?  What types of local
> > applications/binaries are placing data on the drive?
> >
> > - Kevin
> >
> > Jan wrote:
> >> Hey,
> >>
> >> I have a problem with an ext3 filesystem and don't know how to fix it or
> >> find the cause :(
> >>
> >> The Hardware:
> >>
> >> Tyan mainboard, AMD Athlon CPU, ARECA ARC-1120 RAID controller (RAID 5)
> >> with 400 GB Seagate HDs, 756 MB RAM, separate system hard disks, a network
> >> controller and an AVM ISDN controller.
> >>
> >> Because of the filesystem problems I ran memtest and found one bad memory
> >> module, which I have already replaced.
> >>
> >> The System:
> >>
> >> Kernel 2.6.19.1
> >> Debian GNU/Linux 3.0 with e2fsck 1.37 (21-Mar-2005)
> >>
> >>
> >> I set up one ext3 partition of around 1.4 TB on the RAID 5 volume. For
> >> the first four months we ran the RAID without any problems. About two
> >> months ago I noticed that the filesystem had been remounted read-only. A
> >> filesystem check found a lot of errors. After a filesystem check, a fresh
> >> mount of the partition and copying data onto it, the errors come back. I
> >> also had these problems with kernel 2.6.17.3. A RAID volume check with
> >> the Areca command-line tools doesn't find any errors.
> >>
> >> Errors from dmesg / kernel.log:
> >>
> >>
> >> EXT3-fs: mounted filesystem with ordered data mode.
> >> init_special_inode: bogus i_mode (113301)
> >> init_special_inode: bogus i_mode (170101)
> >> init_special_inode: bogus i_mode (115140)
> >> init_special_inode: bogus i_mode (117302)
> >> init_special_inode: bogus i_mode (111700)
> >> EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #143278260: rec_len % 4 != 0 - offset=0, inode=1857588108, rec_len=8466, name_len=34
> >>
> >> Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (111501)
> >> Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (113301)
> >> Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (170101)
> >> Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (115140)
> >> Dec 22 14:25:19 datahaven kernel: EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #150569204: rec_len %% 4 != 0 - offset=0, inode=3038782558, rec_len=28425, name_len=75
> >>
> >>
> >> Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (111501)
> >> Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (113301)
> >> Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (170101)
> >> Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (115140)
> >> Dec 22 06:31:54 datahaven kernel: EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #20351025: directory entry across blocks - offset=0, inode=20353857, rec_len=13600, name_len=1
> >> Dec 22 06:31:55 datahaven kernel: EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #20417957: rec_len %% 4 != 0 - offset=96, inode=20437734, rec_len=27291, name_len=6
> >> Dec 22 06:31:59 datahaven kernel: EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #21007912: directory entry across blocks - offset=296, inode=21005643, rec_len=32184, name_len=25
> >> Dec 22 06:32:24 datahaven kernel: init_special_inode: bogus i_mode (114764)
> >> Dec 22 06:32:29 datahaven kernel: EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #21839877: rec_len %% 4 != 0 - offset=24, inode=21839878, rec_len=22019, name_len=7
> >> Dec 22 06:32:30 datahaven kernel: init_special_inode: bogus i_mode (55314)
> >> Dec 22 06:32:34 datahaven kernel: init_special_inode: bogus i_mode (117302)
> >> Dec 22 06:32:36 datahaven kernel: EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #22448122: rec_len %% 4 != 0 - offset=24, inode=22417991, rec_len=28145, name_len=8
> >>
> >> Any hints on how to solve this problem or isolate the fault?
> >>
> >> Best regards and thanks in advance for your help,
> >>
> >> Jan
> >>
> >> _______________________________________________
> >> Ext3-users mailing list
> >> Ext3-users at redhat.com
> >> https://www.redhat.com/mailman/listinfo/ext3-users