From cheng.je at gmail.com  Fri Dec  1 02:55:02 2006
From: cheng.je at gmail.com (Joseph Cheng)
Date: Thu, 30 Nov 2006 21:55:02 -0500
Subject: maintain 6TB filesystem + fsck
Message-ID: <c707cae10611301855tb6eea99le045f02355fa1b8f@mail.gmail.com>

i posted on rhel list about proper creating of 6tb ext3 filesystem and
tuning here.......http://www.redhat.com/archives/nahant-list/2006-November/msg00239.html
i am reading lots of ext3 links like......
http://www.redhat.com/support/wpapers/redhat/ext3/
http://lists.centos.org/pipermail/centos/2005-September/052533.html
http://batleth.sapienti-sat.org/projects/FAQs/ext3-faq.html
............but not enough links for large TB arrays and ext3 :( there
is lots of old faq and information so pls show me errors of my ways
lol. after reading mailing list posts i have created filesystems like
this........
mkfs.ext3 -b 4096 -i 65536 -j -m 1 -O dir_index -L /prodspace1 /dev/sda1

i put output of mkfs.ext3 and tune2fs -l below. is there any thing
that i am mistaken about?? my other problem is fsck. i read
here.....http://listman.redhat.com/archives/ext3-users/2006-October/msg00005.html
'The major problem at this point is e2fsck time, which is about 1h/TB
for fast disks, at minimum (i.e. no major corruption found).'
.........is that ext3 or ext4? i don't know how long fsck will take w/
6TB ext3 filesystem. i first choose to disable auto fsck with 'tune2fs
-i0 -c0 /dev/sda1' but seems dangerous if filesystem become corrupt
without my knowledge! what is good balance between using auto fsck
after number of mounts or time pass and keeping fsck time short for
large arrays? info......
os is rhel es 4 update 4 w/ generic server hardware
storage hardware is multiple apple xserve raid w/ 6TB array each
filesystem expected to contain 10 mb files to maybe even 50 mb + 100mb

# tune2fs -l /dev/sda1
tune2fs 1.35 (28-Feb-2004)
Filesystem volume name:   /prodspace1
Last mounted on:          <not available>
Filesystem UUID:          7dccbede-5f4a-4cdf-b81e-129d3dd40106
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal resize_inode dir_index filetype
needs_recovery sparse_super large_file
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              106733568
Block count:              1707722743
Reserved block count:     17077227
Free blocks:              1704243353
Free inodes:              106733557
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         2048
Inode blocks per group:   64
Filesystem created:       Thu Nov 30 18:06:45 2006
Last mount time:          Thu Nov 30 18:26:11 2006
Last write time:          Thu Nov 30 18:26:11 2006
Mount count:              1
Maximum mount count:      38
Last checked:             Thu Nov 30 18:06:45 2006
Check interval:           15552000 (6 months)
Next check after:         Tue May 29 19:06:45 2007
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      9ef5bfcf-74fd-49a1-b2dc-88aa9b881bd9
Journal backup:           inode blocks


# mkfs.ext3 -b 4096 -i 65536 -j -m 1 -O dir_index -L /prodspace1 /dev/sda1
mke2fs 1.35 (28-Feb-2004)
Filesystem label=/prodspace1
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
106733568 inodes, 1707722743 blocks
17077227 blocks (1.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=1711276032
52116 block groups
32768 blocks per group, 32768 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
       32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
       4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
       102400000, 214990848, 512000000, 550731776, 644972544

Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
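
The counts in the mke2fs output above are consistent with each other and can be cross-checked with simple arithmetic. What follows is a minimal Python sketch, assuming a simplified rule (each block group gets blocks_per_group * block_size / bytes_per_inode inodes) and ignoring the extra rounding that real mke2fs applies; the function name ext3_layout is made up for illustration.

```python
def ext3_layout(block_count, block_size=4096, bytes_per_inode=65536,
                blocks_per_group=32768):
    """Return (block_groups, inodes_per_group, total_inodes).

    Simplified model: ignores the extra rounding mke2fs performs.
    """
    # Ceiling division: a partial last group still counts as a group.
    groups = -(-block_count // blocks_per_group)
    # -i 65536 asks for one inode per 64 KiB of space in each group.
    inodes_per_group = blocks_per_group * block_size // bytes_per_inode
    return groups, inodes_per_group, groups * inodes_per_group


# Block count taken from the mke2fs output above.
print(ext3_layout(1707722743))
```

With the block count above this gives 52116 groups, 2048 inodes per group and 106733568 inodes in total, matching what mke2fs reported.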



From adilger at clusterfs.com  Fri Dec  1 12:44:34 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Fri, 1 Dec 2006 04:44:34 -0800
Subject: maintain 6TB filesystem + fsck
In-Reply-To: <c707cae10611301855tb6eea99le045f02355fa1b8f@mail.gmail.com>
References: <c707cae10611301855tb6eea99le045f02355fa1b8f@mail.gmail.com>
Message-ID: <20061201124434.GR6429@schatzie.adilger.int>

On Nov 30, 2006  21:55 -0500, Joseph Cheng wrote:
> http://listman.redhat.com/archives/ext3-users/2006-October/msg00005.html
> 'The major problem at this point is e2fsck time, which is about 1h/TB
> for fast disks, at minimum (i.e. no major corruption found).'
> .........is that ext3 or ext4?

I don't think it really matters.

> i don't know how long fsck will take w/ 6TB ext3 filesystem.

You have such a filesystem, test it...

> i first choose to disable auto fsck with 'tune2fs
> -i0 -c0 /dev/sda1' but seems dangerous if filesystem become corrupt
> without my knowledge! what is good balance between using auto fsck
> after number of mounts or time pass and keeping fsck time short for
> large arrays? info......

You can optionally run e2fsck -fn on a relatively quiet (though mounted)
filesystem, and if it checks (relatively) clean then you could reset the
fsck time in the superblock via tune2fs.

> filesystem expected to contain 10 mb files to maybe even 50 mb + 100mb

One of the major slowdowns for e2fsck is the number of inodes, so if you
expect to have very large files you should create the filesystem with
this in mind (i.e. "mke2fs -T largefile" or "mke2fs -T largefile4").
Expect e2fsck RAM usage to be about .75 * num_inodes + .25 * num_blocks,
so in the neighbourhood of 500MB for your filesystem, so reducing inode
count would also help this a fair amount.
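
As a rough illustration of the rule of thumb above, here is a small Python sketch. It assumes the formula yields bytes (the message does not state the unit, but "500MB" suggests it) and that largefile and largefile4 mean one inode per 1 MiB and per 4 MiB respectively; the helper names are hypothetical and the inode counts for the two alternatives are approximations, not what mke2fs would print.

```python
def e2fsck_ram_bytes(num_inodes, num_blocks):
    """Rule of thumb from the message above, assumed to be in bytes."""
    return 0.75 * num_inodes + 0.25 * num_blocks


def inodes_for_ratio(num_blocks, block_size, bytes_per_inode):
    """Approximate inode count for a given bytes-per-inode ratio."""
    return num_blocks * block_size // bytes_per_inode


MIB = 1024 * 1024
blocks = 1707722743          # "Block count" from tune2fs -l
current = 106733568          # "Inode count" from tune2fs -l (-i 65536)
# Assumption: largefile = 1 MiB per inode, largefile4 = 4 MiB per inode.
for label, inodes in [
    ("current (-i 65536)", current),
    ("largefile (assumed 1 MiB/inode)", inodes_for_ratio(blocks, 4096, MIB)),
    ("largefile4 (assumed 4 MiB/inode)", inodes_for_ratio(blocks, 4096, 4 * MIB)),
]:
    print(f"{label}: {e2fsck_ram_bytes(inodes, blocks) / MIB:.0f} MiB")
```

For the current filesystem this comes to about 483 MiB (roughly 507 MB). Under this formula the block term dominates for this filesystem, so the memory saving from fewer inodes is modest; the larger benefit of a lower inode count would be in e2fsck run time.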

We are working on a patch to e2fsck and the kernel to allow e2fsck to
skip unused inodes/bitmaps in each group so that e2fsck time is improved.
It isn't quite ready for prime time, but has previously been discussed
in linux-ext4 in relation to the UNINIT flags in recent e2fsprogs.  It
would at least reduce e2fsck time to O(used_inodes) from O(total_inodes)
and also potentially avoid a lot of seeky IO.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



From dushyanth at gmail.com  Wed Dec  6 15:38:08 2006
From: dushyanth at gmail.com (dushy)
Date: Wed, 6 Dec 2006 15:38:08 +0000 (UTC)
Subject: File size differences
Message-ID: <loom.20061206T161707-645@post.gmane.org>

Hey,

I have two identical machines, each set up with a RAID 5 array. One of them is
used for failover, and data from the master is synced to it every day using
rsync. The data on these disks is mostly intranet KBs, DBs etc.

The RAID 5 arrays are formatted using the default options, i.e. mkfs.ext3
/dev/Xda. The RAID controller is a 3ware Escalade and each member disk in the
RAID 5 array is a 400 GB IDE drive.

Now the weird part: after syncing the failover with the master and comparing
the size of each directory and file, I find some files where the sizes differ.

[root at storage-master repositories]# du --si
"/store1/SystemAdministration-OldVideos/SysAd Training/Technology
Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
8.2k    /store1/SystemAdministration-OldVideos/SysAd Training/Technology
Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt

[root at storage-slave compare]# du --si
"/store1/SystemAdministration-OldVideos/SysAd Training/Technology
Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
4.1k    /store1/SystemAdministration-OldVideos/SysAd Training/Technology
Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt

stat on the same file shows..

[root at storage-master repositories]# stat
"/store1/SystemAdministration-OldVideos/SysAd Training/Technology
Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
  File: `/store1/SystemAdministration-OldVideos/SysAd Training/Technology
Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt'
  Size: 1126            Blocks: 16         IO Block: 4096   regular file
Device: 801h/2049d      Inode: 10403842    Links: 1
Access: (0775/-rwxrwxr-x)  Uid: (   48/  apache)   Gid: (   48/  apache)
Access: 2006-09-11 12:22:24.000000000 +0530
Modify: 2004-09-23 16:45:31.000000000 +0530
Change: 2006-02-23 18:31:42.000000000 +0530

[root at storage-slave compare]# stat "/store1/SystemAdministration-OldVideos/SysAd
Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
  File: `/store1/SystemAdministration-OldVideos/SysAd Training/Technology
Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt'
  Size: 1126            Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d      Inode: 23019536    Links: 1
Access: (0775/-rwxrwxr-x)  Uid: (   48/  apache)   Gid: (   48/  apache)
Access: 2001-01-28 21:10:14.000000000 +0530
Modify: 2004-09-23 16:45:31.000000000 +0530
Change: 2001-01-28 21:10:14.000000000 +0530

The number of blocks allocated is 16 on the master and 8 on the failover. Is
this the reason du reports different sizes even though the content is the
same?
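
To make the relationship between the stat and du figures concrete, here is a minimal Python sketch. It relies on st_blocks being counted in 512-byte units; the function name size_and_allocation is made up for illustration.

```python
import os


def size_and_allocation(path):
    """Return (apparent size in bytes, allocated bytes) for a file."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units, independent of the
    # filesystem block size shown as "IO Block" by stat(1).
    return st.st_size, st.st_blocks * 512


# Figures from the stat output above: same Size, different Blocks.
for host, blocks in [("master", 16), ("failover", 8)]:
    print(host, "size=1126", "allocated=%d" % (blocks * 512))
```

The allocated figures, 8192 and 4096 bytes, are what du --si rounds to 8.2k and 4.1k. The apparent size (1126 bytes) is identical on both machines, so only the allocation differs.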

I rsynced the same file from the master to a different server and the file size
matched. Is there any reason why the number of blocks allocated differs between
these two machines?

The file I gave above is just an example; there are many more files like this.
Only about 7% of the files have different sizes, i.e. out of 263032
files/folders only 17655 have the above problem.

Below is the ext3 filesystem info on both the master and failover.

[root at storage-master repositories]#  dumpe2fs -h /dev/sda1
dumpe2fs 1.35 (28-Feb-2004)
Filesystem volume name:   /store1
Last mounted on:          <not available>
Filesystem UUID:          2368a03d-f21f-4c5e-b12a-cbd2c726237c
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode filetype
needs_recovery sparse_super large_file
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              97681408
Block count:              195354408
Reserved block count:     9767720
Free blocks:              22200635
Free inodes:              97329015
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Filesystem created:       Tue Jun 28 17:06:41 2005
Last mount time:          Tue Oct 10 20:22:02 2006
Last write time:          Tue Oct 10 20:22:02 2006
Mount count:              93
Maximum mount count:      -1
Last checked:             Thu Oct 20 19:03:56 2005
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
First orphan inode:       52691033
Default directory hash:   tea
Directory Hash Seed:      0449f257-e47d-4faf-92fa-fa497efab3a1
Journal backup:           inode blocks

[root at storage-slave compare]# dumpe2fs -h /dev/sda1
dumpe2fs 1.35 (28-Feb-2004)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          b003440d-d153-4cec-a668-94f5482d54cf
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode filetype
needs_recovery sparse_super large_file
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              97681408
Block count:              195354408
Reserved block count:     9767720
Free blocks:              26532722
Free inodes:              97327187
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Filesystem created:       Tue Jun 28 14:37:12 2005
Last mount time:          Thu Nov 16 01:11:30 2006
Last write time:          Thu Nov 16 01:11:30 2006
Mount count:              65
Maximum mount count:      -1
Last checked:             Thu Oct  6 15:11:41 2005
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      fa6b4317-d51d-4050-b0f3-c72b45148777
Journal backup:           inode blocks

tia
dushy







From jpiszcz at lucidpixels.com  Wed Dec  6 19:22:42 2006
From: jpiszcz at lucidpixels.com (Justin Piszcz)
Date: Wed, 6 Dec 2006 14:22:42 -0500 (EST)
Subject: File size differences
In-Reply-To: <loom.20061206T161707-645@post.gmane.org>
References: <loom.20061206T161707-645@post.gmane.org>
Message-ID: <Pine.LNX.4.64.0612061422160.5087@p34.internal.lan>

Have you MD5SUM'd the file on both sides?  If it is the same, then you 
have no problems.

% md5sum filename

On each side, compare output.
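
When many files need checking, the same comparison can be scripted. This is a minimal Python sketch using the standard hashlib module; the function name file_md5 is hypothetical, and it would have to be run on each machine with the resulting digests compared afterwards.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files need little memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Identical digests on both sides indicate identical content, regardless of how many blocks each filesystem allocated for the file.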

Justin.




From adilger at clusterfs.com  Wed Dec  6 19:54:34 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Wed, 6 Dec 2006 12:54:34 -0700
Subject: File size differences
In-Reply-To: <loom.20061206T161707-645@post.gmane.org>
References: <loom.20061206T161707-645@post.gmane.org>
Message-ID: <20061206195434.GB5937@schatzie.adilger.int>

On Dec 06, 2006  15:38 +0000, dushy wrote:
> [root at storage-master repositories]# stat
> "/store1/SystemAdministration-OldVideos/SysAd Training/Technology
> Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
>   File: `/store1/SystemAdministration-OldVideos/SysAd Training/Technology
> Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt'
>   Size: 1126            Blocks: 16         IO Block: 4096   regular file
> Device: 801h/2049d      Inode: 10403842    Links: 1
> Access: (0775/-rwxrwxr-x)  Uid: (   48/  apache)   Gid: (   48/  apache)
> Access: 2006-09-11 12:22:24.000000000 +0530
> Modify: 2004-09-23 16:45:31.000000000 +0530
> Change: 2006-02-23 18:31:42.000000000 +0530
> 
> [root at storage-slave compare]# stat "/store1/SystemAdministration-OldVideos/SysAd
> Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
>   File: `/store1/SystemAdministration-OldVideos/SysAd Training/Technology
> Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt'
>   Size: 1126            Blocks: 8          IO Block: 4096   regular file
> Device: 801h/2049d      Inode: 23019536    Links: 1
> Access: (0775/-rwxrwxr-x)  Uid: (   48/  apache)   Gid: (   48/  apache)
> Access: 2001-01-28 21:10:14.000000000 +0530
> Modify: 2004-09-23 16:45:31.000000000 +0530
> Change: 2001-01-28 21:10:14.000000000 +0530

I'd suspect you have SELinux enabled on one of the nodes and not the
other?  Could also be ACLs. It is likely adding a 4kB EA block to each file.
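
One way to test this theory is to list the extended attributes on the same file on both machines. The sketch below is a hedged Python example using os.listxattr, which is available on Linux only; the function name list_xattrs is made up for illustration.

```python
import os


def list_xattrs(path):
    """Return the extended attribute names set on path (Linux only)."""
    if not hasattr(os, "listxattr"):
        return []          # platform without os.listxattr
    try:
        return os.listxattr(path, follow_symlinks=False)
    except OSError:
        return []          # e.g. filesystem without xattr support
```

Names such as security.selinux or system.posix_acl_access appearing on the master but not on the failover would point at SELinux labels or ACLs as the source of the extra block.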

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



From mnalis-ml at voyager.hr  Thu Dec  7 20:18:18 2006
From: mnalis-ml at voyager.hr (Matija Nalis)
Date: Thu, 7 Dec 2006 21:18:18 +0100
Subject: File size differences
In-Reply-To: <loom.20061206T161707-645@post.gmane.org>
References: <loom.20061206T161707-645@post.gmane.org>
Message-ID: <20061207201818.GA3445@eagle102.home.lan>

On Wed, Dec 06, 2006 at 03:38:08PM +0000, dushy wrote:
> Now the weird part is, after syncing the failover with the master and comparing
> the size of each dir and file I find some files where the size mismatches..
>   Size: 1126            Blocks: 16         IO Block: 4096   regular file
>   Size: 1126            Blocks: 8          IO Block: 4096   regular file

Maybe those files contain enough zero bytes that rsync has made a
sparse file?
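
A quick way to check the sparse-file idea against the quoted numbers is to compare the allocated bytes with the apparent size. This is a minimal Python sketch, assuming the usual heuristic that a file is sparse when fewer bytes are allocated than its size; the function name looks_sparse is hypothetical.

```python
def looks_sparse(size_bytes, blocks_512):
    """True if fewer bytes are allocated than the apparent size."""
    return blocks_512 * 512 < size_bytes


# Size and Blocks values from the two stat outputs quoted above.
print("master  :", looks_sparse(1126, 16))
print("failover:", looks_sparse(1126, 8))
```

Both calls return False: each copy has at least one full 4096-byte block allocated for 1126 bytes of data, so by this heuristic neither copy of this particular file is sparse.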

-- 
Opinions above are GNU-copylefted.



From bruno at wolff.to  Sat Dec  9 22:04:59 2006
From: bruno at wolff.to (Bruno Wolff III)
Date: Sat, 9 Dec 2006 16:04:59 -0600
Subject: fsync, ext3, raid (md) 1, write barriers and PATA caching
Message-ID: <20061209220459.GA6202@wolff.to>

I have been trying to figure out whether I can enable write caching on my
PATA hard drives (WD3200JB) and have fsync not return until data is
safely on the platters. I am also running software raid.
This is currently on FC5 (though soon to be FC6) with a 2.6.18 kernel.

From snippets I have found on the net, it looks like write barriers are
pushed down through software raid when using raid 1. So that if I mount
the file systems with data=ordered and barrier=1, I think I should be
OK, but I was hoping to get a more definitive answer.

It also looks like barrier=1 is or will be the default for ext3. Is there
a way I can check if this is the case on my system?

/proc/mounts doesn't show the barrier option when I use barrier=1 or don't
specify it at all. mount -lv shows the barrier option (when it was used
for mounting), but not the data option. I am not sure if either of these
are using the same data that the ext3 driver is using.



From lists at nerdbynature.de  Sat Dec  9 22:51:32 2006
From: lists at nerdbynature.de (Christian Kujau)
Date: Sat, 9 Dec 2006 22:51:32 +0000 (GMT)
Subject: fsync, ext3, raid (md) 1, write barriers and PATA caching
In-Reply-To: <20061209220459.GA6202@wolff.to>
References: <20061209220459.GA6202@wolff.to>
Message-ID: <Pine.LNX.4.64.0612092236240.22257@sheep.housecafe.de>

On Sat, 9 Dec 2006, Bruno Wolff III wrote:
> It also looks like barrier=1 is or will be the default for ext3. Is there
> a way I can check if this is the case on my system?

Hm, indeed: if write barriers are not available, mounting an XFS 
filesystem shows:

> Filesystem "md0": Disabling barriers, not supported by the underlying device

Mounting the same device when formatted with ext3 does not show this 
message nor does /proc/mounts reveal anything....could this be tweaked 
somehow?

Christian.
-- 
BOFH excuse #288:

Hard drive sleeping. Let it wake up on it's own...



From ric at emc.com  Mon Dec 11 16:14:52 2006
From: ric at emc.com (Ric Wheeler)
Date: Mon, 11 Dec 2006 11:14:52 -0500
Subject: fsync, ext3, raid (md) 1, write barriers and PATA caching
In-Reply-To: <20061209220459.GA6202@wolff.to>
References: <20061209220459.GA6202@wolff.to>
Message-ID: <457D83FC.1040709@emc.com>



Bruno Wolff III wrote:
> I have been trying to figure out whether I can enable write caching on my
> PATA hard drives (WD3200JB) and have fsync not return until data is
> safely on the platters. I am also running software raid.
> This is currently on FC5 (though soon to be FC6) with a 2.6.18 kernel.
> 
> From snippets I have found on the net, it looks like write barriers are
> pushed down through software raid when using raid 1. So that if I mount
> the file systems with data=ordered and barrier=1, I think I should be
> OK, but I was hoping to get a more definitive answer.
> 
> It also looks like barrier=1 is or will be the default for ext3. Is there
> a way I can check if this is the case on my system?
> 
> /proc/mounts doesn't show the barrier option when I use barrier=1 or don't
> specify it at all. mount -lv shows the barrier option (when it was used
> for mounting), but not the data option. I am not sure if either of these
> are using the same data that the ext3 driver is using.

You can always do a sanity test on the barrier by timing how many synchronous 
files/sec you can create (i.e., create/write/fsync/close).  Speeds vary 
depending on what kind of drive you have, journal mode, etc, but you will always 
see much faster times with the barrier off than on while writing small files 
(say 10K).
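
A test of this kind can be sketched in a few lines of Python. This is not the test code referred to in this thread, only a minimal illustration of the create/write/fsync/close loop; the function name, file naming scheme and default counts are assumptions.

```python
import os
import time


def sync_creates_per_second(directory, count=200, size=10 * 1024):
    """Create `count` files of `size` bytes, fsync each, return files/sec."""
    payload = b"\0" * size
    paths = []
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, "synctest-%06d" % i)
        with open(path, "wb") as f:      # create
            f.write(payload)             # write
            f.flush()                    # push Python's buffer to the kernel
            os.fsync(f.fileno())         # block until the kernel reports it stable
        paths.append(path)               # close happens on leaving `with`
    elapsed = time.perf_counter() - start
    for path in paths:                   # clean up outside the timed region
        os.remove(path)
    return count / elapsed
```

Running it against a directory on the filesystem under test, once mounted with barrier=1 and once with barrier=0, should show a clearly lower rate in the first case if barriers are reaching the drive.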

regards,

ric



From bruno at wolff.to  Mon Dec 11 16:36:56 2006
From: bruno at wolff.to (Bruno Wolff III)
Date: Mon, 11 Dec 2006 10:36:56 -0600
Subject: fsync, ext3, raid (md) 1, write barriers and PATA caching
In-Reply-To: <457D83FC.1040709@emc.com>
References: <20061209220459.GA6202@wolff.to> <457D83FC.1040709@emc.com>
Message-ID: <20061211163656.GB28931@wolff.to>

On Mon, Dec 11, 2006 at 11:14:52 -0500,
  Ric Wheeler <ric at emc.com> wrote:
> 
> You can always do a sanity test on the barrier by timing how many 
> synchronous files/sec you can create (i.e., create/write/fsync/close).  
> Speeds vary depending on what kind of drive you have, journal mode, etc, 
> but you will always see much faster times with the barrier off than on 
> while writing small files (say 10K).

That's probably a good idea in any case. Down the road I will be interested
in whether barriers work through encrypted file systems and this will be a good
test to have available.

I should get at most 120 commits per second if write barriers are working;
so I think that should be easy to detect.
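
The figure of 120 presumably comes from rotational speed. The Python sketch below assumes a 7200 RPM drive and at most one commit per platter revolution; neither assumption is stated in the message, and the function name is made up for illustration.

```python
def max_commits_per_second(rpm, commits_per_revolution=1):
    """Upper bound on commits/sec if each commit waits for a revolution."""
    return rpm / 60 * commits_per_revolution


# Assumed spindle speed of 7200 RPM.
print(max_commits_per_second(7200))   # 120.0
```

A measured rate far above this bound would suggest that fsync is returning before the data has reached the platters.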

Is there already a tool out there that does this? It shouldn't be hard to
write something simple, but maybe someone has written something fancy
already.



From ric at emc.com  Mon Dec 11 17:44:40 2006
From: ric at emc.com (Ric Wheeler)
Date: Mon, 11 Dec 2006 12:44:40 -0500
Subject: fsync, ext3, raid (md) 1, write barriers and PATA caching
In-Reply-To: <20061211163656.GB28931@wolff.to>
References: <20061209220459.GA6202@wolff.to> <457D83FC.1040709@emc.com>
	<20061211163656.GB28931@wolff.to>
Message-ID: <457D9908.5080602@emc.com>



Bruno Wolff III wrote:
> On Mon, Dec 11, 2006 at 11:14:52 -0500,
>   Ric Wheeler <ric at emc.com> wrote:
>> You can always do a sanity test on the barrier by timing how many 
>> synchronous files/sec you can create (i.e., create/write/fsync/close).  
>> Speeds vary depending on what kind of drive you have, journal mode, etc, 
>> but you will always see much faster times with the barrier off than on 
>> while writing small files (say 10K).
> 
> That's probably a good idea in any case. Down the road I will be interested
> in whether barriers work through encrypted file systems and this will be a good
> test to have available.
> 
> I should get at most 120 commits per second if write barriers are working;
> so I think that should be easy to detect.
> 
> Is there already a tool out there that does this? It shouldn't be hard to
> write something simple, but maybe someone has written something fancy
> already.

I will send you the test code that I use & some test runs,

ric



From bruno at wolff.to  Mon Dec 11 18:48:38 2006
From: bruno at wolff.to (Bruno Wolff III)
Date: Mon, 11 Dec 2006 12:48:38 -0600
Subject: fsync, ext3, raid (md) 1, write barriers and PATA caching
In-Reply-To: <457D9908.5080602@emc.com>
References: <20061209220459.GA6202@wolff.to> <457D83FC.1040709@emc.com>
	<20061211163656.GB28931@wolff.to> <457D9908.5080602@emc.com>
Message-ID: <20061211184838.GA10516@wolff.to>

On Mon, Dec 11, 2006 at 12:44:40 -0500,
  Ric Wheeler <ric at emc.com> wrote:
> 
> I will send you the test code that I use & some test runs,

I tried out Ric's test code and it does appear that the barrier is working
under FC5 using software raid. For the quick test I didn't have an idle
system. When I ran with no barrier option and write caching enabled,
I saw very roughly a 10x speedup. The test seemed to show that running with
write caching disabled was about 20% faster than just using barriers.
My theory is that because there was a lot of other disk activity, the cache
flushes forced by using barriers were writing a lot of disk blocks from
outside the test, making it report slower numbers, while in theory my system
throughput was actually better.
Tonight I can try the test on a system without a lot of disk activity and
see if that makes much difference.

Thanks Ric.



From ric at emc.com  Mon Dec 11 19:20:12 2006
From: ric at emc.com (Ric Wheeler)
Date: Mon, 11 Dec 2006 14:20:12 -0500
Subject: fsync, ext3, raid (md) 1, write barriers and PATA caching
In-Reply-To: <20061211184838.GA10516@wolff.to>
References: <20061209220459.GA6202@wolff.to> <457D83FC.1040709@emc.com>
	<20061211163656.GB28931@wolff.to> <457D9908.5080602@emc.com>
	<20061211184838.GA10516@wolff.to>
Message-ID: <457DAF6C.1000607@emc.com>

Bruno Wolff III wrote:
> On Mon, Dec 11, 2006 at 12:44:40 -0500,
>   Ric Wheeler <ric at emc.com> wrote:
> 
>>I will send you the test code that I use & some test runs,
> 
> 
> I tried out Ric's test code and it does appear that the barrier is working
> under FC5 using software raid. For the quick test I didn't have an idle
> system. When I ran with no barrier option and write caching enabled,
> I saw very roughly a 10x speedup. The test seemed to show that running with
> write caching disabled was about 20% faster than just using barriers.
> My theory is that because there was a lot of other disk activity, the cache
> flushes forced by using barriers were writing a lot of disk blocks from
> outside the test, making it report slower numbers, while in theory my system
> throughput was actually better.
> Tonight I can try the test on a system without a lot of disk activity and
> see if that makes much difference.
> 
> Thanks Ric.
> 
Glad to see that it is useful.

Thanks really go out to Jens Axboe and Chris Mason for getting this all 
to work correctly in the first place ;-)

As I mentioned in our private email exchange, you should see better 
performance with the write barrier on vs write cache disabled for some 
scenarios (large file writes and using data journal mode for small files 
are the two cases that I have noticed).

ric




From chris at cjx.com  Wed Dec 13 13:47:06 2006
From: chris at cjx.com (Chris Allen)
Date: Wed, 13 Dec 2006 13:47:06 +0000
Subject: ext3 4TB fs limit on amd64 (FAQ?)
In-Reply-To: <Pine.LNX.4.64.0611280652020.19073@sheep.housecafe.de>
References: <38749.85.220.132.176.1164534599.squirrel@www.stz-softwaretechnik.de>
	<Pine.LNX.4.64.0611280652020.19073@sheep.housecafe.de>
Message-ID: <4580045A.4060806@cjx.com>



Christian Kujau wrote:
> On Sun, 26 Nov 2006, Ralf Gross wrote:
>> I've a question about the max. ext3 FS size. The ext3 FAQ explains that
>> the limit is 4TB.
>
> Hm, strange: I'm pretty sure that mkfs.ext3 understands bigger 
> blocksizes for quite a while now. Then again, the FAQ says "Version: 
> 2004-10-14"...
>
> So, although I'd really love to have this information (and the FAQ!) on
> http://e2fsprogs.sf.net/  this is what I found:
>
> blocksize    file size limit          filesystem size limit
>  1 KiB       16448 MiB (~ 16 GiB)      2048 GiB (=  2 TiB)
>  2 KiB         256 GiB                 8192 GiB (=  8 TiB)
>  4 KiB        2048 GiB (=  2 TiB)     16384 GiB (= 16 TiB)
>  8 KiB       65568 GiB (~ 64 TiB)     32768 GiB (= 32 TiB)
>
> Note that an 8 KiB blocksize is only supported on systems with 8 KiB 
> pagesize (i.e. linux/alpha).
>
>

We use 6TB ext3 filesystems over vanilla Fedora Core 5 on several 
heavily loaded systems.
All perform fine without any obvious problems.
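
For what it's worth, most of the file size column in the quoted table can be
reproduced from the inode block map alone. The following is only a rough
sketch in Python, assuming 12 direct block pointers, one single-, one double-
and one triple-indirect block, and 4-byte block numbers; the function name
mapping_limit_bytes is made up for illustration.

```python
# Sketch: largest file the ext3 block map can address, ignoring other caps.
# Assumes 12 direct pointers plus single/double/triple indirect blocks,
# with each indirect block holding block_size / 4 pointers.

def mapping_limit_bytes(block_size):
    """Bytes addressable through direct and indirect block pointers."""
    ptrs = block_size // 4              # 32-bit block numbers per block
    blocks = 12 + ptrs + ptrs**2 + ptrs**3
    return blocks * block_size

for kib in (1, 2, 4, 8):
    limit = mapping_limit_bytes(kib * 1024)
    print(f"{kib} KiB blocks: {limit // 2**20} MiB, {limit // 2**30} GiB")
```

This matches the 16448 MiB, 256 GiB and 65568 GiB entries. For 4 KiB blocks
it gives roughly 4100 GiB, so the 2 TiB in the table presumably comes from a
separate limit (possibly the per-inode count of 512-byte sectors), which this
sketch does not model.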




From dushyanth at gmail.com  Wed Dec 13 15:04:01 2006
From: dushyanth at gmail.com (dushy)
Date: Wed, 13 Dec 2006 15:04:01 +0000 (UTC)
Subject: File size differences
References: <loom.20061206T161707-645@post.gmane.org>
	<20061206195434.GB5937@schatzie.adilger.int>
Message-ID: <loom.20061213T160240-632@post.gmane.org>

Hey,

> I'd suspect you have SELinux enabled on one of the nodes and not the
> other?  Could also be ACLs. It is likely adding a 4kB EA block to each file.

I don't have SELinux enabled on either side. I am checking on ACLs.

tia
dushy
 




From dushyanth at gmail.com  Wed Dec 13 15:06:35 2006
From: dushyanth at gmail.com (dushy)
Date: Wed, 13 Dec 2006 15:06:35 +0000 (UTC)
Subject: File size differences
References: <loom.20061206T161707-645@post.gmane.org>
	<Pine.LNX.4.64.0612061422160.5087@p34.internal.lan>
Message-ID: <loom.20061213T160436-716@post.gmane.org>

Hey,
 
> Have you MD5SUM'd the file on both sides?  If it is the same, then you 
> have no problems.
> 
> % md5sum filename
> 
> On each side, compare output.

The md5sums of the affected files are the same on both sides. As I said
earlier, the file is exactly the same on both sides except for the file size,
which is different. The size on the master is exactly 4.1k bigger, and only
for some files.

[root at storage-master repositories]# md5sum
"/store1/SystemAdministration-OldVideos/SysAd Training/Technology
Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
649efcb46ad483abcf1edd334e16d76b  /store1/SystemAdministration-OldVideos/SysAd
Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt

[root at storage-slave compare]# md5sum
"/store1/SystemAdministration-OldVideos/SysAd Training/Technology
Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
649efcb46ad483abcf1edd334e16d76b  /store1/SystemAdministration-OldVideos/SysAd
Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt

I am just curious as to why there should be a file size difference in
only certain files.

tia
dushy



From dushyanth at gmail.com  Wed Dec 13 11:18:03 2006
From: dushyanth at gmail.com (dushy)
Date: Wed, 13 Dec 2006 16:48:03 +0530
Subject: File size differences
In-Reply-To: <20061206195434.GB5937@schatzie.adilger.int>
References: <loom.20061206T161707-645@post.gmane.org>
	<20061206195434.GB5937@schatzie.adilger.int>
Message-ID: <497509650612130318m1fca1563q4201d24dd28099ab@mail.gmail.com>

Hey,


On 12/7/06, Andreas Dilger <adilger at clusterfs.com> wrote:
> On Dec 06, 2006  15:38 +0000, dushy wrote:
> > [root at storage-master repositories]# stat
> > "/store1/SystemAdministration-OldVideos/SysAd Training/Technology
> > Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
> >   File: `/store1/SystemAdministration-OldVideos/SysAd Training/Technology
> > Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt'
> >   Size: 1126            Blocks: 16         IO Block: 4096   regular file
> > Device: 801h/2049d      Inode: 10403842    Links: 1
> > Access: (0775/-rwxrwxr-x)  Uid: (   48/  apache)   Gid: (   48/  apache)
> > Access: 2006-09-11 12:22:24.000000000 +0530
> > Modify: 2004-09-23 16:45:31.000000000 +0530
> > Change: 2006-02-23 18:31:42.000000000 +0530
> >
> > [root at storage-slave compare]# stat "/store1/SystemAdministration-OldVideos/SysAd
> > Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
> >   File: `/store1/SystemAdministration-OldVideos/SysAd Training/Technology
> > Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt'
> >   Size: 1126            Blocks: 8          IO Block: 4096   regular file
> > Device: 801h/2049d      Inode: 23019536    Links: 1
> > Access: (0775/-rwxrwxr-x)  Uid: (   48/  apache)   Gid: (   48/  apache)
> > Access: 2001-01-28 21:10:14.000000000 +0530
> > Modify: 2004-09-23 16:45:31.000000000 +0530
> > Change: 2001-01-28 21:10:14.000000000 +0530
>
> I'd suspect you have SELinux enabled on one of the nodes and not the
> other?  Could also be ACLs. It is likely adding a 4kB EA block to each file.

I don't have SELinux enabled on either side. I am checking on ACLs and
will update accordingly.
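
To put a number on the difference in the stat output above: the "Blocks"
field counts 512-byte units, so 16 versus 8 is exactly one extra 4 kB block
on the master, which fits the extra-EA-block explanation. A small Python
sketch of that arithmetic follows; the helper names are only illustrative,
and on_disk_usage is a hypothetical helper for checking other files.

```python
import os

SECTOR = 512  # stat reports "Blocks" in 512-byte units


def allocated_bytes(blocks):
    """Convert a stat 'Blocks' count into bytes allocated on disk."""
    return blocks * SECTOR


def on_disk_usage(path):
    """Return (apparent size, allocated bytes) for a file."""
    st = os.stat(path)
    return st.st_size, allocated_bytes(st.st_blocks)


master = allocated_bytes(16)   # Blocks: 16 on storage-master
slave = allocated_bytes(8)     # Blocks: 8 on storage-slave
print(master - slave)          # prints 4096
```

Both copies have the same apparent size (1126 bytes), so the md5sums match
even though the allocation differs.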

tia
dushy



From ext3 at jks.tupari.net  Tue Dec 19 22:55:41 2006
From: ext3 at jks.tupari.net (ext3 at jks.tupari.net)
Date: Tue, 19 Dec 2006 17:55:41 -0500 (EST)
Subject: Does ext3 prevent partial page writes?
Message-ID: <Pine.LNX.4.63.0612191753300.22217@tupari.net>

Basically I want to know if I can turn off full_page_writes in my postgres 
config.
http://www.postgresql.org/docs/8.2/interactive/wal-reliability.html



From lists at nerdbynature.de  Wed Dec 20 06:42:20 2006
From: lists at nerdbynature.de (Christian Kujau)
Date: Wed, 20 Dec 2006 06:42:20 +0000 (GMT)
Subject: Does ext3 prevent partial page writes?
In-Reply-To: <Pine.LNX.4.63.0612191753300.22217@tupari.net>
References: <Pine.LNX.4.63.0612191753300.22217@tupari.net>
Message-ID: <Pine.LNX.4.64.0612200638500.22257@sheep.housecafe.de>

On Tue, 19 Dec 2006, ext3 at jks.tupari.net wrote:
> Basically I want to know if I can turn off full_page_writes in my postgres 
> config.

if your devices support write barriers, you can turn off this option, 
methinks (mount -o barrier=1). Of course, you'll run a few tests before 
going live, right?

Christian.
-- 
BOFH excuse #449:

greenpeace free'd the mallocs



From bruno at wolff.to  Wed Dec 20 16:58:25 2006
From: bruno at wolff.to (Bruno Wolff III)
Date: Wed, 20 Dec 2006 10:58:25 -0600
Subject: Does ext3 prevent partial page writes?
In-Reply-To: <Pine.LNX.4.64.0612200638500.22257@sheep.housecafe.de>
References: <Pine.LNX.4.63.0612191753300.22217@tupari.net>
	<Pine.LNX.4.64.0612200638500.22257@sheep.housecafe.de>
Message-ID: <20061220165825.GC3732@wolff.to>

On Wed, Dec 20, 2006 at 06:42:20 +0000,
  Christian Kujau <lists at nerdbynature.de> wrote:
> On Tue, 19 Dec 2006, ext3 at jks.tupari.net wrote:
> >Basically I want to know if I can turn off full_page_writes in my postgres 
> >config.
> 
> if your devices support write barriers, you can turn off this option, 
> methinks (mount -o barrier=1). Of course, you'll run a few tests before 
> going live, right?

I have tested write barriers in FC5 using ext3 on top of software raid (raid
1 is the only type of raid that supports write barriers) and it seems to
be working correctly. If you use this with the data=journal option I would
expect you would be OK. It might be a good idea to check on the postgres
list about the exact semantics you need for this.

I also asked about write barriers on the dm-crypt list and they are supported
there, but there is probably a problem with it on 2.6.19 kernels on SMP
machines relating to a change to use per cpu work queues.



From alazarev at itg.uiuc.edu  Wed Dec 20 19:58:12 2006
From: alazarev at itg.uiuc.edu (Alex Lazarevich)
Date: Wed, 20 Dec 2006 13:58:12 -0600
Subject: ext3 4TB fs limit on amd64 (FAQ?)
In-Reply-To: <Pine.LNX.4.64.0611280652020.19073@sheep.housecafe.de>
References: <38749.85.220.132.176.1164534599.squirrel@www.stz-softwaretechnik.de>
	<Pine.LNX.4.64.0611280652020.19073@sheep.housecafe.de>
Message-ID: <458995D4.1060206@itg.uiuc.edu>

Christian Kujau wrote:
> On Sun, 26 Nov 2006, Ralf Gross wrote:
>> I've a question about the max. ext3 FS size. The ext3 FAQ explains that
>> the limit is 4TB.
>
> Hm, strange: I'm pretty sure that mkfs.ext3 understands bigger 
> blocksizes for quite a while now. Then again, the FAQ says "Version: 
> 2004-10-14"...
>
> So, although I'd really love to have this information (and the FAQ!) on
> http://e2fsprogs.sf.net/  this is what I found:
>
> blocksize    file size limit          filesystem size limit
>  1 KiB       16448 MiB (~ 16 GiB)      2048 GiB (=  2 TiB)
>  2 KiB         256 GiB                 8192 GiB (=  8 TiB)
>  4 KiB        2048 GiB (=  2 TiB)     16384 GiB (= 16 TiB)
>  8 KiB       65568 GiB (~ 64 TiB)     32768 GiB (= 32 TiB)
>
> Note that an 8 KiB blocksize is only supported on systems with 8 KiB 
> pagesize (i.e. linux/alpha).
>
Is this still true? Is an 8 KiB page size only available on linux/alpha?
We run RHEL4-AS x86_64 on AMD Opteron, and the OS is going to let me create
the 8192 block size on a 9TB partition, but it's giving a warning:

partition is: Disk geometry for /dev/sda: 0.000-9296872.000 megabytes

[root at dudemiestro ~]# mkfs.ext3 -b 8192 -m 1 /dev/sda1
Warning: blocksize 8192 not usable on most systems.
mke2fs 1.35 (28-Feb-2004)
mkfs.ext3: 8192-byte blocks too big for system (max 4096)
Proceed anyway? (y,n)

Has anyone done this before and had any kind of success?

Thanks,

Alex



From adilger at clusterfs.com  Wed Dec 20 21:21:22 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Wed, 20 Dec 2006 14:21:22 -0700
Subject: ext3 4TB fs limit on amd64 (FAQ?)
In-Reply-To: <458995D4.1060206@itg.uiuc.edu>
References: <38749.85.220.132.176.1164534599.squirrel@www.stz-softwaretechnik.de>
	<Pine.LNX.4.64.0611280652020.19073@sheep.housecafe.de>
	<458995D4.1060206@itg.uiuc.edu>
Message-ID: <20061220212122.GB5937@schatzie.adilger.int>

On Dec 20, 2006  13:58 -0600, Alex Lazarevich wrote:
> Is this still true? Is an 8 KiB page size only available on linux/alpha?
> We run RHEL4-AS x86_64 on AMD Opteron, and the OS is going to let me create
> the 8192 block size on a 9TB partition, but it's giving a warning:
> 
> partition is: Disk geometry for /dev/sda: 0.000-9296872.000 megabytes
> 
> [root at dudemiestro ~]# mkfs.ext3 -b 8192 -m 1 /dev/sda1
> Warning: blocksize 8192 not usable on most systems.
> mke2fs 1.35 (28-Feb-2004)
> mkfs.ext3: 8192-byte blocks too big for system (max 4096)
> Proceed anyway? (y,n)

Sadly, x86_64 also has 4kB PAGE_SIZE like i386 (for compatibility
reasons or whatever).  I'd always hoped for 8kB+ PAGE_SIZE when we
got to 64-bit but it seems this will never happen.

> Has anyone done this before and had any kind of success?

You need Alpha, PPC, ia64, mips, arm?, or possibly other non-*86
arch to have large PAGE_SIZE, or fix the VM to support larger
PAGE_SIZE than the hardware.
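
A quick way to see what a given machine can use is to compare the block size
against the page size. The Python sketch below assumes the rule implied by
the mke2fs warning above (block size must not exceed the page size); the
helper name block_size_usable is made up for illustration.

```python
import mmap


def block_size_usable(block_size, page_size=mmap.PAGESIZE):
    """True if the block size does not exceed the system page size."""
    return block_size <= page_size


print("page size:", mmap.PAGESIZE)
for bs in (1024, 2048, 4096, 8192):
    verdict = "usable" if block_size_usable(bs) else "too big for this system"
    print(bs, verdict)
```

On a machine with 4 kB pages this reports 8192 as too big, in line with the
"max 4096" in the mke2fs output.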

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



From jan at netropol.de  Wed Dec 27 23:06:05 2006
From: jan at netropol.de (Jan)
Date: Wed, 27 Dec 2006 23:06:05 +0000
Subject: Problem with ext3 filesystem
Message-ID: <4592FC5D.70101@netropol.de>

Hey,

I've a problem with an ext3 filesystem and don't know how to fix it or
find the failure :(

The Hardware:

Tyan mainboard, AMD Athlon CPU, ARECA ARC-1120 RaidController Raid5 with
400GB Seagate HD's, 756 MB Ram, other harddisks for system, network and
avm isdn controller.

Because of the filesystem problems I ran memtest and found one bad memory
module, which I have since replaced.

The System:

Kernel 2.6.19.1
Debian Gnu/Linux 3.0  with e2fsck 1.37 (21-Mar-2005)


I've set up one ext3 partition of around 1.4 TB on the raid5 volume.
For the first four months we ran the raid without any problems. About two
months ago I noticed that the filesystem had been remounted ro. A filesystem
check found a lot of errors. After a filesystem check, a fresh mount of
the partition and copying data onto it, the errors come back.
I got the same problems with kernel 2.6.17.3. A raid volume check with
the areca command line tools doesn't find any errors.

Errors from dmesg / kernel.log:


EXT3-fs: mounted filesystem with ordered data mode.
init_special_inode: bogus i_mode (113301)
init_special_inode: bogus i_mode (170101)
init_special_inode: bogus i_mode (115140)
init_special_inode: bogus i_mode (117302)
init_special_inode: bogus i_mode (111700)
EXT3-fs error (device sda1): ext3_readdir: bad entry in directory
#143278260: rec_len % 4 != 0 - offset=0, inode=1857588108, rec_len=8466,
name_len=34

Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (111501)
Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (113301)
Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (170101)
Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (115140)
Dec 22 14:25:19 datahaven kernel: EXT3-fs error (device sda1):
ext3_readdir: bad entry in directory #150569204: rec_len %% 4 != 0 -
offset=0, inode=3038782558,
rec_len=28425, name_len=75


Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (111501)
Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (113301)
Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (170101)
Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (115140)
Dec 22 06:31:54 datahaven kernel: EXT3-fs error (device sda1):
ext3_readdir: bad entry in directory #20351025: directory entry across
blocks - offset=0, inode=20353857, rec_len=13600, name_len=1
Dec 22 06:31:55 datahaven kernel: EXT3-fs error (device sda1):
ext3_readdir: bad entry in directory #20417957: rec_len %% 4 != 0 -
offset=96, inode=20437734, rec_len=27291, name_len=6
Dec 22 06:31:59 datahaven kernel: EXT3-fs error (device sda1):
ext3_readdir: bad entry in directory #21007912: directory entry across
blocks - offset=296, inode=21005643, rec_len=32184, name_len=25
Dec 22 06:32:24 datahaven kernel: init_special_inode: bogus i_mode (114764)
Dec 22 06:32:29 datahaven kernel: EXT3-fs error (device sda1):
ext3_readdir: bad entry in directory #21839877: rec_len %% 4 != 0 -
offset=24, inode=21839878, rec_len=22019, name_len=7
Dec 22 06:32:30 datahaven kernel: init_special_inode: bogus i_mode (55314)
Dec 22 06:32:34 datahaven kernel: init_special_inode: bogus i_mode (117302)
Dec 22 06:32:36 datahaven kernel: EXT3-fs error (device sda1):
ext3_readdir: bad entry in directory #22448122: rec_len %% 4 != 0 -
offset=24, inode=22417991, rec_len=28145, name_len=8
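
As a rough way to read these messages, here is a small Python sketch. It
assumes that the kernel prints i_mode in octal, that the filesystem uses
4 KiB blocks, and that only two of the directory entry sanity checks matter
here. The function names are made up for illustration and this is not the
kernel's code.

```python
import stat

# File-type values that are valid in the top bits of an inode mode.
KNOWN_TYPES = {
    stat.S_IFIFO, stat.S_IFCHR, stat.S_IFDIR, stat.S_IFBLK,
    stat.S_IFREG, stat.S_IFLNK, stat.S_IFSOCK,
}


def type_bits_valid(mode_octal_text):
    """Parse a logged i_mode (assumed octal) and check its file-type bits."""
    mode = int(mode_octal_text, 8)
    return stat.S_IFMT(mode) in KNOWN_TYPES


def dirent_problem(offset, rec_len, block_size=4096):
    """Return which simplified check a directory entry fails, or None."""
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if offset + rec_len > block_size:
        return "directory entry across blocks"
    return None


for logged in ("113301", "170101", "115140", "55314"):
    type_bits = oct(stat.S_IFMT(int(logged, 8)))
    print(logged, type_bits, type_bits_valid(logged))

print(dirent_problem(offset=0, rec_len=8466))
print(dirent_problem(offset=0, rec_len=13600))
```

None of the logged modes has a valid file type, and the rec_len values are
either not a multiple of 4 or larger than a block, so the inode and directory
blocks being read back look like arbitrary data rather than slightly damaged
metadata.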

Any hints how to solve this problem or to isolate the failure ?

Best regards and thanks in advance for your help,

Jan



From jan at netropol.de  Thu Dec 28 08:05:19 2006
From: jan at netropol.de (Jan)
Date: Thu, 28 Dec 2006 08:05:19 +0000
Subject: Problem with ext3 filesystem
In-Reply-To: <4592F2F6.6090708@criminalinfo.net>
References: <4592FC5D.70101@netropol.de> <4592F2F6.6090708@criminalinfo.net>
Message-ID: <45937ABF.8040206@netropol.de>

The machine is used mainly as a fileserver with samba and netatalk. These
should be the only server applications placing data on the
drive. For testing I have disabled netatalk. I can run an fsck and the
filesystem is fine after that. I then remount, copy a few GB with cp,
unmount, and fsck finds errors again in the target
directory of the copied files. In this test there are no samba or
netatalk users connected. When I copied files from a client connected
via samba I got the same errors.

Jan

> What is this machine being used for, primarily?  What types of local
> applications/binaries are placing data on the drive?
> 
> - Kevin
> 



From jrumpf at heavyload.net  Thu Dec 28 16:49:34 2006
From: jrumpf at heavyload.net (Jeremy Rumpf)
Date: Thu, 28 Dec 2006 11:49:34 -0500
Subject: Problem with ext3 filesystem
In-Reply-To: <45937ABF.8040206@netropol.de>
References: <4592FC5D.70101@netropol.de> <4592F2F6.6090708@criminalinfo.net>
	<45937ABF.8040206@netropol.de>
Message-ID: <200612281149.34320.jrumpf@heavyload.net>

Jan,

I did notice that you are using a recent kernel so this may not be relevant:

http://thread.gmane.org/gmane.comp.file-systems.ext3.user/2351/focus=2358

It is a thread from 2005 about block aliasing on large arrays. Specifically,
read the last two posts from Andreas and Stephen. The idea would be that you
are seeing the corruption after the filesystem filled to a certain capacity.
The cause is possibly that a block pointer (in the device driver or VFS layer)
wrapped and is now referring to the wrong block on the device, causing
corruption.

You did find one bad memory module using memtest, though. It is possible
that other modules are bad as well and memtest isn't detecting it. Try
removing all but one or two modules (RAM will decrease, but should be
sufficient for testing) and retest.

At a minimum, I would get a backup of the data as soon as possible so you
don't lose anything.

Thanks,
Jeremy

On Thursday 28 December 2006 03:05, Jan wrote:
> The machine is used mainly as a fileserver with samba and netatalk. These
> should be the only server applications placing data on the
> drive. For testing I have disabled netatalk. I can run an fsck and the
> filesystem is fine after that. I then remount, copy a few GB with cp,
> unmount, and fsck finds errors again in the target
> directory of the copied files. In this test there are no samba or
> netatalk users connected. When I copied files from a client connected
> via samba I got the same errors.
>
> Jan
>
> > What is this machine being used for, primarily?  What types of local
> > applications/binaries are placing data on the drive?
> >
> > - Kevin
> >



From jan at netropol.de  Fri Dec 29 21:10:51 2006
From: jan at netropol.de (Jan)
Date: Fri, 29 Dec 2006 21:10:51 +0000
Subject: Problem with ext3 filesystem
In-Reply-To: <4592FC5D.70101@netropol.de>
References: <4592FC5D.70101@netropol.de>
Message-ID: <4595845B.7010403@netropol.de>

I did some more tests. First I split the raid into two partitions and ran
mkfs.ext3 -c to check for bad blocks.

Then I tried to copy data onto the first partition. After a while I got
errors (around 130 GB were on the partition):

attempt to access beyond end of device
sda1: rw=0, want=3151373440, limit=1464846304
attempt to access beyond end of device
sda1: rw=0, want=2853870672, limit=1464846304
attempt to access beyond end of device
sda1: rw=0, want=1751501064, limit=1464846304
attempt to access beyond end of device
sda1: rw=0, want=2783268072, limit=1464846304
attempt to access beyond end of device
sda1: rw=0, want=2835511344, limit=1464846304
attempt to access beyond end of device
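
To put these numbers in context, here is a small Python sketch that converts
one of the lines above. It assumes that want and limit are counted in
512-byte sectors; the function name overshoot is made up for illustration.

```python
import re

SECTOR = 512  # assumption: want/limit are counted in 512-byte sectors

LINE = re.compile(r"want=(\d+), limit=(\d+)")


def overshoot(line):
    """Return (limit in GiB, want/limit ratio) for a log line, or None."""
    m = LINE.search(line)
    if not m:
        return None
    want, limit = int(m.group(1)), int(m.group(2))
    return limit * SECTOR / 2**30, want / limit


sample = "sda1: rw=0, want=3151373440, limit=1464846304"
size_gib, ratio = overshoot(sample)
print(f"partition ~{size_gib:.1f} GiB, request at {ratio:.2f}x the limit")
```

Under that assumption the limit corresponds to a partition of just under
700 GiB, and the requested sectors lie well past its end, which suggests the
block numbers being read from the filesystem are themselves garbage.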


so I googled and found this thread:

http://lkml.org/lkml/2006/10/5/353

I tried the script from there on my partition (with around 130 GB on it):

dd bs=1M count=200 if=/dev/zero of=test0
 while :; do
        echo "cp 0-1"; cp test0 test1 || break
        echo "cp 1-2"; cp test1 test2 || break
        echo "cp 2-3"; cp test2 test3 || break
        echo "cp 3-4"; cp test3 test4 || break
        echo "od 0" ; od test0 || break
        echo "rm 1"; rm test1 || break
        echo "rm 2"; rm test2 || break
        echo "rm 3"; rm test3 || break
        echo "rm 4"; rm test4 || break
 done


The script had been running for no more than 10 minutes when I got:

EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks in system
zones - Block = 327680, count = 1
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks in system
zones - Block = 327680, count = 1
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks in system
zones - Block = 327680, count = 1

Then I created a new fs on the partition, without any data on it, and
got the same errors while running the script.

On an internal IDE hard disk with software raid-1, but a partition size of
only 4 GB, there was no problem with the test script.

The test script with a 10 MB file has now been running for several hours
without any problems on the sata raid partition.

New partition with a size of 340 GB, test script with 200 MB files. Errors:

ock = 65536, count = 1
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device sda2): ext3_free_blocks: Freeing blocks in system
zones - Block = 65536, count = 1
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device sda2): ext3_free_blocks: Freeing blocks in system
zones - Block = 65536, count = 1
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device sda2): ext3_free_blocks: Freeing blocks in system
zones - Block = 65536, count = 1
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda2) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda2) in ext3_truncate: IO failure
EXT3-fs error (device sda2) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda2) in ext3_orphan_del: Readonly filesystem
EXT3-fs error (device sda2) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda2) in ext3_delete_inode: IO failure

And with 180 GB Partition:


ree_blocks: Freeing blocks in system zones - Block = 262144, count = 1
EXT3-fs error (device sda3) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device sda3): ext3_free_blocks: Freeing blocks in system
zones - Block = 262144, count = 1
EXT3-fs error (device sda3) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device sda3): ext3_free_blocks: Freeing blocks in system
zones - Block = 262144, count = 1
EXT3-fs error (device sda3) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda3) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda3) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda3) in ext3_truncate: IO failure
EXT3-fs error (device sda3) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda3) in ext3_orphan_del: Readonly filesystem
EXT3-fs error (device sda3) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda3) in ext3_delete_inode: IO failure


Strange ? I still don't know how to solve the problem.

Regards, Jan


> Hey,
> 
> I've a problem with an ext3 filesystem and don't know how to fix it or
> find the failure :(
> 
> The Hardware:
> 
> Tyan mainboard, AMD Athlon CPU, ARECA ARC-1120 RaidController Raid5 with
> 400GB Seagate HD's, 756 MB Ram, other harddisks for system, network and
> avm isdn controller.
> 
> Because of the filesystem problems I ran memtest and found one bad memory
> module, which I have since replaced.
> 
> The System:
> 
> Kernel 2.6.19.1
> Debian Gnu/Linux 3.0  with e2fsck 1.37 (21-Mar-2005)
> 
> 
> I've set up one ext3 partition of around 1.4 TB on the raid5 volume.
> For the first four months we ran the raid without any problems. About two
> months ago I noticed that the filesystem had been remounted ro. A filesystem
> check found a lot of errors. After the check, remounting the partition and
> copying data onto it, the errors come back. I get the same problems with
> kernel 2.6.17.3. A raid volume check with the areca command line tools
> doesn't find any errors.
> 
> Errors from dmesg / kernel.log:
> 
> 
> EXT3-fs: mounted filesystem with ordered data mode.
> init_special_inode: bogus i_mode (113301)
> init_special_inode: bogus i_mode (170101)
> init_special_inode: bogus i_mode (115140)
> init_special_inode: bogus i_mode (117302)
> init_special_inode: bogus i_mode (111700)
> EXT3-fs error (device sda1): ext3_readdir: bad entry in directory
> #143278260: rec_len % 4 != 0 - offset=0, inode=1857588108, rec_len=8466,
> name_len=34
> 
> Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (111501)
> Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (113301)
> Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (170101)
> Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (115140)
> Dec 22 14:25:19 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #150569204: rec_len %% 4 != 0 -
> offset=0, inode=3038782558,
> rec_len=28425, name_len=75
> 
> 
> Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (111501)
> Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (113301)
> Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (170101)
> Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (115140)
> Dec 22 06:31:54 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #20351025: directory entry across
> blocks - offset=0, inode=20353857, rec_len=13600, name_len=1
> Dec 22 06:31:55 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #20417957: rec_len %% 4 != 0 -
> offset=96, inode=20437734, rec_len=27291, name_len=6
> Dec 22 06:31:59 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #21007912: directory entry across
> blocks - offset=296, inode=21005643, rec_len=32184, name_len=25
> Dec 22 06:32:24 datahaven kernel: init_special_inode: bogus i_mode (114764)
> Dec 22 06:32:29 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #21839877: rec_len %% 4 != 0 -
> offset=24, inode=21839878, rec_len=22019, name_len=7
> Dec 22 06:32:30 datahaven kernel: init_special_inode: bogus i_mode (55314)
> Dec 22 06:32:34 datahaven kernel: init_special_inode: bogus i_mode (117302)
> Dec 22 06:32:36 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #22448122: rec_len %% 4 != 0 -
> offset=24, inode=22417991, rec_len=28145, name_len=8
> 
> Any hints how to solve this problem or to isolate the failure ?
> 
> Best regards and thanks in advance for your help,
> 
> Jan
> 
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users



From lists at nerdbynature.de  Fri Dec 29 20:45:48 2006
From: lists at nerdbynature.de (Christian Kujau)
Date: Fri, 29 Dec 2006 20:45:48 +0000 (GMT)
Subject: Problem with ext3 filesystem
In-Reply-To: <4595845B.7010403@netropol.de>
References: <4592FC5D.70101@netropol.de> <4595845B.7010403@netropol.de>
Message-ID: <Pine.LNX.4.64.0612292041270.22257@sheep.housecafe.de>

On Fri, 29 Dec 2006, Jan wrote:
> attempt to access beyond end of device
> sda1: rw=0, want=2853870672, limit=1464846304

disk/cabling problems? just try

dd if=/dev/sda of=/dev/null bs=8M

and watch your syslog for errors.
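
A minimal sketch of that read test, assuming GNU dd and a kernel log that
can be read with dmesg. The function name read_test and the grep pattern
are illustrative and do not come from this thread:

```shell
#!/bin/sh
# read_test SOURCE: read SOURCE from start to end and discard the data.
# Returns dd's exit status, so a failed read shows up as non-zero.
read_test() {
    dd if="$1" of=/dev/null bs=8M 2>/dev/null
}

# Example (as root; /dev/sda is the device discussed in this thread):
#   read_test /dev/sda || echo "dd reported a read error"
#   dmesg | grep -i -E 'I/O error|beyond end of device'
```

A clean run only says that every sector could be read; it does not prove
that the data read back was correct.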

good luck,
Christian.
-- 
BOFH excuse #352:

The cables are not the same length.



From lists at nerdbynature.de  Fri Dec 29 21:34:05 2006
From: lists at nerdbynature.de (Christian Kujau)
Date: Fri, 29 Dec 2006 21:34:05 +0000 (GMT)
Subject: Problem with ext3 filesystem
In-Reply-To: <459583F2.9060401@shaw.ca>
References: <4592FC5D.70101@netropol.de> <4595845B.7010403@netropol.de>
	<Pine.LNX.4.64.0612292041270.22257@sheep.housecafe.de>
	<459583F2.9060401@shaw.ca>
Message-ID: <Pine.LNX.4.64.0612292127080.22257@sheep.housecafe.de>



[please reply on-list, so that all can comment]

On Fri, 29 Dec 2006, ..:::BeOS Mr. X:::.. wrote:
> Hey how would I check my syslog on BeOS ? I have dd. Is there a way I can use 
> that to calculate my hard drive MB/s read speed ?

um, I wasn't aware that ext3fs is supported on BeOS? Otherwise it's a 
bit off topic, no?

1) syslog: if BeOS has no syslog, how do you know what's wrong with the
    system?
2) if you have GNU/dd, you can send a USR1 to the dd process to see the
    progress and a MB/s value.

still, I don't see the relation to ext3 or even with the OP's problem.

cheers,
Christian.
-- 
BOFH excuse #328:

Fiber optics caused gas main leak



From mr._x at shaw.ca  Sat Dec 30 00:46:12 2006
From: mr._x at shaw.ca (..:::BeOS Mr. X:::..)
Date: Fri, 29 Dec 2006 16:46:12 -0800
Subject: Problem with ext3 filesystem
In-Reply-To: <Pine.LNX.4.64.0612292127080.22257@sheep.housecafe.de>
References: <4592FC5D.70101@netropol.de> <4595845B.7010403@netropol.de>
	<Pine.LNX.4.64.0612292041270.22257@sheep.housecafe.de>
	<459583F2.9060401@shaw.ca>
	<Pine.LNX.4.64.0612292127080.22257@sheep.housecafe.de>
Message-ID: <4595B6D4.7060408@shaw.ca>

Well I was just asking for interest's sake, I do read this ext3 list out of 
curiosity just to learn a few tips. I didn't want to bother the whole 
user group, but what is the USR1 bit ? How would I do that ?

Christian Kujau wrote:
> 
> 
> [please reply on-list, so that all can comment]
> 
> On Fri, 29 Dec 2006, ..:::BeOS Mr. X:::.. wrote:
>> Hey how would I check my syslog on BeOS ? I have dd. Is there a way I 
>> can use that to calculate my hard drive MB/s read speed ?
> 
> um, I wasn't aware that ext3fs is supported on BeOS? Otherwise it's a 
> bit off topic, no?
> 
> 1) syslog: if BeOS has no syslog, how do you know what's wrong with the
>    system?
> 2) if you have GNU/dd, you can send a USR1 to the dd process to see the
>    progress and a MB/s value.
> 
> still, I don't see the relation to ext3 or even with the OP's problem.
> 
> cheers,
> Christian.



From bryan at kdzbn.homelinux.net  Sat Dec 30 03:39:47 2006
From: bryan at kdzbn.homelinux.net (Bryan Kadzban)
Date: Fri, 29 Dec 2006 22:39:47 -0500
Subject: Problem with ext3 filesystem
In-Reply-To: <4595B6D4.7060408@shaw.ca>
References: <4592FC5D.70101@netropol.de>
	<4595845B.7010403@netropol.de>	<Pine.LNX.4.64.0612292041270.22257@sheep.housecafe.de>	<459583F2.9060401@shaw.ca>	<Pine.LNX.4.64.0612292127080.22257@sheep.housecafe.de>
	<4595B6D4.7060408@shaw.ca>
Message-ID: <4595DF83.2080902@kdzbn.homelinux.net>

..:::BeOS Mr. X:::.. wrote:
> Well I was just asking for interest's sake, I do read this ext3 list out of
> curiosity just to learn a few tips. I didn't want to bother the whole
> user group, but what is the USR1 bit ? How would I do that ?

It's not a bit, it's a signal.  E.g.:

killall -USR1 dd

Or write a C program to make the kill(2) system call on the PID of the
dd process every so often.  Of course you will need privileges either
way; either dd has to be running as you, or you need to be root.
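
The "every so often" part can also be done from the shell instead of C.
A rough sketch, assuming GNU dd (which prints its statistics to stderr on
SIGUSR1); the function name poll_dd and the interval are made up for
illustration:

```shell
#!/bin/sh
# poll_dd PID INTERVAL: send SIGUSR1 to process PID every INTERVAL seconds
# until that process has exited.
poll_dd() {
    pid=$1
    interval=$2
    # kill -0 delivers no signal; it only tests that the process exists.
    while kill -0 "$pid" 2>/dev/null; do
        # Sleep first: a SIGUSR1 that arrives before dd has installed its
        # handler would terminate dd instead of making it report.
        sleep "$interval"
        kill -USR1 "$pid" 2>/dev/null || true
    done
    return 0
}
```

Usage would look like this, with the statistics showing up on the
terminal dd was started from:

    dd if=/dev/sda of=/dev/null bs=8M &
    poll_dd $! 10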

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 252 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/ext3-users/attachments/20061229/1abd115c/attachment.sig>

From tytso at mit.edu  Sat Dec 30 04:08:16 2006
From: tytso at mit.edu (Theodore Tso)
Date: Fri, 29 Dec 2006 23:08:16 -0500
Subject: Problem with ext3 filesystem
In-Reply-To: <4595845B.7010403@netropol.de>
References: <4592FC5D.70101@netropol.de> <4595845B.7010403@netropol.de>
Message-ID: <20061230040816.GA27654@thunk.org>

You might want to try "badblocks -w /dev/XXX" and see if it reports
any bad blocks or I/O errors.  This really seems like a device driver
or hardware problem...

						- Ted



From lists at nerdbynature.de  Sat Dec 30 12:45:48 2006
From: lists at nerdbynature.de (Christian Kujau)
Date: Sat, 30 Dec 2006 12:45:48 +0000 (GMT)
Subject: Problem with ext3 filesystem
In-Reply-To: <20061230040816.GA27654@thunk.org>
References: <4592FC5D.70101@netropol.de> <4595845B.7010403@netropol.de>
	<20061230040816.GA27654@thunk.org>
Message-ID: <Pine.LNX.4.64.0612301244090.22257@sheep.housecafe.de>

On Fri, 29 Dec 2006, Theodore Tso wrote:
> You might want to try "badblocks -w /dev/XXX" and see if it reports

May I add that this option *destroys all data* on your device:

"WARNING
    Never use the -w option on a device containing an existing file
    system.  This option erases data!  If you want to do write-mode testing
    on an existing file system, use the -n option instead.  It is slower,
    but it will preserve your data."
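
A small sketch of running the non-destructive variant with a guard in
front of it, assuming badblocks from e2fsprogs (-n non-destructive
read-write, -s show progress, -v verbose). The helper names is_mounted and
check_device are hypothetical:

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]: succeed if DEVICE is listed as a mount
# source (first column) in MOUNTS_FILE, which defaults to /proc/mounts.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# check_device DEVICE: refuse to test a mounted device, otherwise run the
# non-destructive read-write test on it.
check_device() {
    if is_mounted "$1"; then
        echo "$1 is mounted; unmount it first" >&2
        return 1
    fi
    badblocks -n -s -v "$1"
}
```

Even with -n the device has to be unmounted for the duration of the test,
and on a large array the run will take a long time.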

Christian.
-- 
BOFH excuse #272:

Netscape has crashed



From samjnaa at gmail.com  Sun Dec 31 04:18:35 2006
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Sun, 31 Dec 2006 09:48:35 +0530
Subject: Ext4 improvements
Message-ID: <45973A1B.7010409@gmail.com>

Please be patient with my ignorance if what I am asking is meaningless
in any way. I am not too technically knowledgeable about filesystem
internals but I am willing to learn. (I thought of posting to linux-ext4
but did not want to intrude within the technical threads with my layman
thread.)

 From Wikipedia > ReiserFS article > Design section:

[quote]ext2 and other Berkeley FFS-like filesystems simply use a fixed
formula for computing inode locations, hence limiting the number of
files they may contain. Most such filesystems also store directories as
simple lists of entries, which makes directory lookups and updates
linear time operations and degrades performance on very large
directories. The single B+ tree design in ReiserFS avoids both of these
problems due to better scalability properties.[/quote]

So will ext4 avoid both of these problems just like ReiserFS? Does it
use a B+ tree? Or this "dancing B* tree" that Reiser4 is supposed to have?

Also: I found that a newly created ext3 partition uses 128 MB whereas a
new reiser3 partition uses only 32 MB. I assume that the 128 MB is the
space taken by the pre-allocated inodes or the like. I have since learned
that this problem is much more serious for others on bigger filesystems -
[see comment 2 at
http://linux.wordpress.com/2006/09/27/suse-102-ditching-reiserfs-as-it-default-fs/]. 



If ext4 uses a B+ (or B*?) tree like ReiserFS then this space can be
reduced, right?

Thanks.

Shriramana Sharma.

P.S: Are there any recommended tutorials for learning filesystem basics?

P.P.S: I just put this post here because I want to convert from reiserfs
of uncertain future to ext4.




From lists at nerdbynature.de  Sun Dec 31 06:57:28 2006
From: lists at nerdbynature.de (Christian Kujau)
Date: Sun, 31 Dec 2006 06:57:28 +0000 (GMT)
Subject: Ext4 improvements
In-Reply-To: <45973A1B.7010409@gmail.com>
References: <45973A1B.7010409@gmail.com>
Message-ID: <Pine.LNX.4.64.0612310641460.22257@sheep.housecafe.de>

On Sun, 31 Dec 2006, Shriramana Sharma wrote:
> So will ext4 avoid both of these problems just like ReiserFS? Does it
> use a B+ tree? Or this "dancing B* tree" that Reiser4 is supposed to have?

I cannot comment on stability/performance of ext4, but here are the 
specs: http://www.bullopensource.org/ext4/

> Also: I found that a newly created ext3 partition uses 128 MB whereas a
> new reiser3 partition uses only 32 MB.

are you talking about ext3 or ext4? I haven't tested ext4 yet but for 
ext3 it looks like this:

$ df -h /mnt/test0 /mnt/test1
Filesystem         Size  Used Avail Use% Mounted on
/tmp/reiser3.img    33M   33M  944K  98% /mnt/test0
/tmp/ext3.img       15M  1.6M   13M  11% /mnt/test1

(reiser3 needs at least a 32MB image file/device)

> P.S: Are there any recommended tutorials for learning filesystem basics?

Hm, I'm no filesystem guru, but reading the specs and the 
source should help a lot...


my 2 cents,
Christian.
-- 
BOFH excuse #127:

Sticky bits on disk.



From jan.stobbe at netropol.de  Fri Dec 29 21:05:32 2006
From: jan.stobbe at netropol.de (Jan Stobbe)
Date: Fri, 29 Dec 2006 21:05:32 +0000
Subject: Problem with ext3 filesystem
In-Reply-To: <4592FC5D.70101@netropol.de>
References: <4592FC5D.70101@netropol.de>
Message-ID: <4595831C.6080001@netropol.de>

I did some more tests. First I split the raid into two partitions and ran
mkfs.ext3 -c to check for bad blocks.

Then I tried to copy data onto the first partition. After a while I got
errors (around 130 GB were on the partition):

attempt to access beyond end of device
sda1: rw=0, want=3151373440, limit=1464846304
attempt to access beyond end of device
sda1: rw=0, want=2853870672, limit=1464846304
attempt to access beyond end of device
sda1: rw=0, want=1751501064, limit=1464846304
attempt to access beyond end of device
sda1: rw=0, want=2783268072, limit=1464846304
attempt to access beyond end of device
sda1: rw=0, want=2835511344, limit=1464846304
attempt to access beyond end of device
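
To put those numbers in perspective, they can be converted to sizes. This
is a sketch under the assumption that the kernel prints want= and limit=
in 512-byte sectors; the function name sectors_to_gib is made up here:

```shell
#!/bin/sh
# sectors_to_gib SECTORS: convert a count of 512-byte sectors to whole GiB
# (integer division, so the result is rounded down).
sectors_to_gib() {
    echo $(( $1 * 512 / 1024 / 1024 / 1024 ))
}

sectors_to_gib 1464846304   # limit (size of sda1)        -> 698
sectors_to_gib 3151373440   # want (first rejected read)  -> 1502
```

Under that assumption sda1 is about 698 GiB, while the first rejected read
was aimed at roughly 1.5 TiB, more than twice the size of the partition.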


So I googled and found this thread:

http://lkml.org/lkml/2006/10/5/353

I tried the script from there on my partition (with around 130 GB on it):

# create a 200 MB file of zeros, then copy it around until a step fails
dd bs=1M count=200 if=/dev/zero of=test0
while :; do
        echo "cp 0-1"; cp test0 test1 || break
        echo "cp 1-2"; cp test1 test2 || break
        echo "cp 2-3"; cp test2 test3 || break
        echo "cp 3-4"; cp test3 test4 || break
        echo "od 0" ; od test0 || break
        echo "rm 1"; rm test1 || break
        echo "rm 2"; rm test2 || break
        echo "rm 3"; rm test3 || break
        echo "rm 4"; rm test4 || break
done


The script had been running for no more than 10 minutes when I got:

EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks in system
zones - Block = 327680, count = 1
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks in system
zones - Block = 327680, count = 1
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks in system
zones - Block = 327680, count = 1

Then I created a new fs on the partition, without any data on it, and
got the same errors while running the script.

On an internal IDE hard disk (software raid-1, but with a partition size
of only 4 GB) there was no problem with the test script.

The test script with a 10 MB file has now been running for several hours
without any problems on the sata raid partition.
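
Since the result seems to depend on the file size, here is a sketch of the
same copy loop with the size as a parameter. It is a variant, not the
script from the lkml thread: it uses random data and cmp instead of zeros
and od, and the name copy_stress and its arguments are made up:

```shell
#!/bin/sh
# copy_stress DIR SIZE_MB ROUNDS: in DIR, create a SIZE_MB file of random
# data, then ROUNDS times copy it through a chain of four files, compare
# the last copy with the original, and delete the copies.
# Returns non-zero at the first step that fails.
copy_stress() {
    dir=$1
    size_mb=$2
    rounds=$3
    dd if=/dev/urandom of="$dir/test0" bs=1M count="$size_mb" 2>/dev/null \
        || return 1
    round=1
    while [ "$round" -le "$rounds" ]; do
        cp "$dir/test0" "$dir/test1" || return 1
        cp "$dir/test1" "$dir/test2" || return 1
        cp "$dir/test2" "$dir/test3" || return 1
        cp "$dir/test3" "$dir/test4" || return 1
        # cmp fails if any byte changed somewhere along the chain.
        cmp -s "$dir/test0" "$dir/test4" || return 1
        rm -f "$dir/test1" "$dir/test2" "$dir/test3" "$dir/test4" || return 1
        round=$((round + 1))
    done
    rm -f "$dir/test0"
}
```

For example, "copy_stress /mnt/raid 10 100" and then "copy_stress
/mnt/raid 200 100", where /mnt/raid is a placeholder for the mount point.
With files smaller than RAM the compare may be answered from the page
cache rather than from the disk.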

New partition with a size of 340 GB, test script with 200 MB files. Errors:

ock = 65536, count = 1
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device sda2): ext3_free_blocks: Freeing blocks in system
zones - Block = 65536, count = 1
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device sda2): ext3_free_blocks: Freeing blocks in system
zones - Block = 65536, count = 1
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device sda2): ext3_free_blocks: Freeing blocks in system
zones - Block = 65536, count = 1
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda2) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda2) in ext3_truncate: IO failure
EXT3-fs error (device sda2) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda2) in ext3_orphan_del: Readonly filesystem
EXT3-fs error (device sda2) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda2) in ext3_delete_inode: IO failure

And with a 180 GB partition:


ree_blocks: Freeing blocks in system zones - Block = 262144, count = 1
EXT3-fs error (device sda3) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device sda3): ext3_free_blocks: Freeing blocks in system
zones - Block = 262144, count = 1
EXT3-fs error (device sda3) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device sda3): ext3_free_blocks: Freeing blocks in system
zones - Block = 262144, count = 1
EXT3-fs error (device sda3) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda3) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda3) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda3) in ext3_truncate: IO failure
EXT3-fs error (device sda3) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda3) in ext3_orphan_del: Readonly filesystem
EXT3-fs error (device sda3) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda3) in ext3_delete_inode: IO failure


Strange ? I still don't know how to solve the problem.

Regards, Jan


> Hey,
> 
> I've a problem with an ext3 filesystem and don't know how to fix it or
> find the failure :(
> 
> The Hardware:
> 
> Tyan mainboard, AMD Athlon CPU, ARECA ARC-1120 RaidController Raid5 with
> 400GB Seagate HD's, 756 MB Ram, other harddisks for system, network and
> avm isdn controller.
> 
> Because of the filesystem problems I ran memtest and found one bad memory
> module, which I have since replaced.
> 
> The System:
> 
> Kernel 2.6.19.1
> Debian Gnu/Linux 3.0  with e2fsck 1.37 (21-Mar-2005)
> 
> 
> I've set up one ext3 partition of around 1.4 TB on the raid5 volume.
> For the first four months we ran the raid without any problems. About two
> months ago I noticed that the filesystem had been remounted ro. A filesystem
> check found a lot of errors. After the check, remounting the partition and
> copying data onto it, the errors come back. I get the same problems with
> kernel 2.6.17.3. A raid volume check with the areca command line tools
> doesn't find any errors.
> 
> Errors from dmesg / kernel.log:
> 
> 
> EXT3-fs: mounted filesystem with ordered data mode.
> init_special_inode: bogus i_mode (113301)
> init_special_inode: bogus i_mode (170101)
> init_special_inode: bogus i_mode (115140)
> init_special_inode: bogus i_mode (117302)
> init_special_inode: bogus i_mode (111700)
> EXT3-fs error (device sda1): ext3_readdir: bad entry in directory
> #143278260: rec_len % 4 != 0 - offset=0, inode=1857588108, rec_len=8466,
> name_len=34
> 
> Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (111501)
> Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (113301)
> Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (170101)
> Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (115140)
> Dec 22 14:25:19 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #150569204: rec_len %% 4 != 0 -
> offset=0, inode=3038782558,
> rec_len=28425, name_len=75
> 
> 
> Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (111501)
> Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (113301)
> Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (170101)
> Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (115140)
> Dec 22 06:31:54 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #20351025: directory entry across
> blocks - offset=0, inode=20353857, rec_len=13600, name_len=1
> Dec 22 06:31:55 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #20417957: rec_len %% 4 != 0 -
> offset=96, inode=20437734, rec_len=27291, name_len=6
> Dec 22 06:31:59 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #21007912: directory entry across
> blocks - offset=296, inode=21005643, rec_len=32184, name_len=25
> Dec 22 06:32:24 datahaven kernel: init_special_inode: bogus i_mode (114764)
> Dec 22 06:32:29 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #21839877: rec_len %% 4 != 0 -
> offset=24, inode=21839878, rec_len=22019, name_len=7
> Dec 22 06:32:30 datahaven kernel: init_special_inode: bogus i_mode (55314)
> Dec 22 06:32:34 datahaven kernel: init_special_inode: bogus i_mode (117302)
> Dec 22 06:32:36 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #22448122: rec_len %% 4 != 0 -
> offset=24, inode=22417991, rec_len=28145, name_len=8
> 
> Any hints how to solve this problem or to isolate the failure ?
> 
> Best regards and thanks in advance for your help,
> 
> Jan
> 


-- 
Jan Stobbe                              Netropol Digitale Systeme GmbH
jan.stobbe at netropol.de                  Stresemannstrasse 161
Tel: +49 40 284167-20                   D-22769 Hamburg/Germany
Fax: +49 40 284167-40                   http://www.netropol.de/