From cheng.je at gmail.com  Fri Dec  1 02:55:02 2006
From: cheng.je at gmail.com (Joseph Cheng)
Date: Thu, 30 Nov 2006 21:55:02 -0500
Subject: maintain 6TB filesystem + fsck

i posted on rhel list about proper creating of 6tb ext3 filesystem and
tuning here.......
http://www.redhat.com/archives/nahant-list/2006-November/msg00239.html

i am reading lots of ext3 links like......

http://www.redhat.com/support/wpapers/redhat/ext3/
http://lists.centos.org/pipermail/centos/2005-September/052533.html
http://batleth.sapienti-sat.org/projects/FAQs/ext3-faq.html

............but not enough links for large TB arrays and ext3 :(

there is lots of old faq and information so pls show me errors of my
ways lol. after reading mailing list posts i have created filesystems
like this........

mkfs.ext3 -b 4096 -i 65536 -j -m 1 -O dir_index -L /prodspace1 /dev/sda1

i put output of mkfs.ext3 and tune2fs -l below. is there anything that
i am mistaken about??

my other problem is fsck. i read here.....
http://listman.redhat.com/archives/ext3-users/2006-October/msg00005.html

'The major problem at this point is e2fsck time, which is about 1h/TB
for fast disks, at minimum (i.e. no major corruption found).'

.........is that ext3 or ext4? i don't know how long fsck will take w/
6TB ext3 filesystem. i first chose to disable auto fsck with 'tune2fs
-i0 -c0 /dev/sda1' but that seems dangerous if the filesystem becomes
corrupt without my knowledge! what is a good balance between using auto
fsck after a number of mounts or time passed and keeping fsck time short
for large arrays?

info......
os is rhel es 4 update 4 w/ generic server hardware
storage hardware is multiple apple xserve raid w/ 6TB array each
filesystem expected to contain 10 mb files to maybe even 50 mb + 100mb

# tune2fs -l /dev/sda1
tune2fs 1.35 (28-Feb-2004)
Filesystem volume name:   /prodspace1
Last mounted on:
Filesystem UUID:          7dccbede-5f4a-4cdf-b81e-129d3dd40106
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              106733568
Block count:              1707722743
Reserved block count:     17077227
Free blocks:              1704243353
Free inodes:              106733557
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         2048
Inode blocks per group:   64
Filesystem created:       Thu Nov 30 18:06:45 2006
Last mount time:          Thu Nov 30 18:26:11 2006
Last write time:          Thu Nov 30 18:26:11 2006
Mount count:              1
Maximum mount count:      38
Last checked:             Thu Nov 30 18:06:45 2006
Check interval:           15552000 (6 months)
Next check after:         Tue May 29 19:06:45 2007
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      9ef5bfcf-74fd-49a1-b2dc-88aa9b881bd9
Journal backup:           inode blocks

# mkfs.ext3 -b 4096 -i 65536 -j -m 1 -O dir_index -L /prodspace1 /dev/sda1
mke2fs 1.35 (28-Feb-2004)
Filesystem label=/prodspace1
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
106733568 inodes, 1707722743 blocks
17077227 blocks (1.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=1711276032
52116 block groups
32768 blocks per group, 32768 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
        2654208, 4096000,
        7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848, 512000000, 550731776, 644972544

Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

From adilger at clusterfs.com  Fri Dec  1 12:44:34 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Fri, 1 Dec 2006 04:44:34 -0800
Subject: maintain 6TB filesystem + fsck
Message-ID: <20061201124434.GR6429@schatzie.adilger.int>

On Nov 30, 2006  21:55 -0500, Joseph Cheng wrote:
> http://listman.redhat.com/archives/ext3-users/2006-October/msg00005.html
> 'The major problem at this point is e2fsck time, which is about 1h/TB
> for fast disks, at minimum (i.e. no major corruption found).'
> .........is that ext3 or ext4?

I don't think it really matters.

> i don't know how long fsck will take w/ 6TB ext3 filesystem.

You have such a filesystem, test it...

> i first choose to disable auto fsck with 'tune2fs
> -i0 -c0 /dev/sda1' but seems dangerous if filesystem become corrupt
> without my knowledge! what is good balance betwen using auto fsck
> after number of mounts or time pass and keeping fsck time short for
> large arrays? info......

You can optionally run e2fsck -fn on a relatively quiet (though mounted)
filesystem, and if it checks (relatively) clean then you could reset the
fsck time in the superblock via tune2fs.

> filesystem expected to contain 10 mb files to maybe even 50 mb + 100mb

One of the major slowdowns for e2fsck is the number of inodes, so if you
expect to have very large files you should create the filesystem with
this in mind (i.e. "mke2fs -T largefile" or "mke2fs -T largefile4").
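
To put rough numbers on the inode-count suggestion above, here is a
minimal sketch that estimates how many inodes the block count reported
by tune2fs would get under three bytes-per-inode ratios. The 1 MiB and
4 MiB ratios for largefile and largefile4 are the usual mke2fs defaults
and are an assumption about this e2fsprogs build, not something stated
in the thread. The helper name is made up for illustration, and mke2fs
rounds per block group, so real counts differ slightly.

```python
# Rough inode-count estimate: one inode per `bytes_per_inode` bytes.
# Assumption: largefile = 1 MiB per inode, largefile4 = 4 MiB per inode
# (the usual mke2fs defaults; check the man page or mke2fs.conf locally).

BLOCK_COUNT = 1707722743   # "Block count" from the tune2fs output above
BLOCK_SIZE = 4096          # "Block size" from the same output

RATIOS = {
    "-i 65536 (as created)": 65536,
    "-T largefile": 1024 * 1024,
    "-T largefile4": 4 * 1024 * 1024,
}


def estimate_inodes(block_count, block_size, bytes_per_inode):
    """Hypothetical helper: approximate inode count for a given ratio."""
    return (block_count * block_size) // bytes_per_inode


if __name__ == "__main__":
    for label, ratio in RATIOS.items():
        count = estimate_inodes(BLOCK_COUNT, BLOCK_SIZE, ratio)
        print(f"{label:24s} ~{count:>12,d} inodes")
```

The estimate for the ratio actually used lands within a thousand or so
of the 106733568 inodes mke2fs reported; the two largefile ratios cut
that by factors of 16 and 64 respectively.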
Expect e2fsck RAM usage to be about .75 * num_inodes + .25 * num_blocks
bytes, so in the neighbourhood of 500MB for your filesystem, so reducing
inode count would also help this a fair amount.

We are working on a patch to e2fsck and the kernel to allow e2fsck to
skip unused inodes/bitmaps in each group so that e2fsck time is improved.
It isn't quite ready for prime time, but has previously been discussed in
linux-ext4 in relation to the UNINIT flags in recent e2fsprogs.  It would
at least reduce e2fsck time to O(used_inodes) from O(total_inodes) and
also potentially avoid a lot of seeky IO.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From dushyanth at gmail.com  Wed Dec  6 15:38:08 2006
From: dushyanth at gmail.com (dushy)
Date: Wed, 6 Dec 2006 15:38:08 +0000 (UTC)
Subject: File size differences

Hey,

I have two identical machines set up with a RAID 5 array. One of them is
used for failovers and data from the master is synced every day using
rsync to the failover machine. The data on these disks are usually
intranet KB's, DB's etc..

The RAID 5 arrays are formatted using the default options, i.e.
mkfs.ext3 /dev/Xda. The RAID controller is 3ware escalade and each disk
member in the RAID 5 array is 400Gb IDE.

Now the weird part is, after syncing the failover with the master and
comparing the size of each dir and file I find some files where the size
mismatches..

[root at storage-master repositories]# du --si "/store1/SystemAdministration-OldVideos/SysAd Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
8.2k    /store1/SystemAdministration-OldVideos/SysAd Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt

[root at storage-slave compare]# du --si "/store1/SystemAdministration-OldVideos/SysAd Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
4.1k    /store1/SystemAdministration-OldVideos/SysAd Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt

stat on the same file shows..

[root at storage-master repositories]# stat "/store1/SystemAdministration-OldVideos/SysAd Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
  File: `/store1/SystemAdministration-OldVideos/SysAd Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt'
  Size: 1126        Blocks: 16         IO Block: 4096   regular file
Device: 801h/2049d  Inode: 10403842    Links: 1
Access: (0775/-rwxrwxr-x)  Uid: (   48/  apache)   Gid: (   48/  apache)
Access: 2006-09-11 12:22:24.000000000 +0530
Modify: 2004-09-23 16:45:31.000000000 +0530
Change: 2006-02-23 18:31:42.000000000 +0530

[root at storage-slave compare]# stat "/store1/SystemAdministration-OldVideos/SysAd Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
  File: `/store1/SystemAdministration-OldVideos/SysAd Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt'
  Size: 1126        Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d  Inode: 23019536    Links: 1
Access: (0775/-rwxrwxr-x)  Uid: (   48/  apache)   Gid: (   48/  apache)
Access: 2001-01-28 21:10:14.000000000 +0530
Modify: 2004-09-23 16:45:31.000000000 +0530
Change: 2001-01-28 21:10:14.000000000 +0530

The number of blocks allocated on the master seems to be 16 and the
failover is 8. Is this the reason for the file size difference even
though the content is the same ?

I rsynced the same file from the master to a different server and the
file size matched.
Any reason why the no. of blocks allocated is different across both
these machines ?

The file I gave above is just an example and there are many more files
like this. Also, only a small share of the files have different sizes,
i.e. out of 263032 files/folders only 17655 (under 10%) have the above
problem.

Below is the ext3 filesystem info on both the master and failover.

[root at storage-master repositories]# dumpe2fs -h /dev/sda1
dumpe2fs 1.35 (28-Feb-2004)
Filesystem volume name:   /store1
Last mounted on:
Filesystem UUID:          2368a03d-f21f-4c5e-b12a-cbd2c726237c
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode filetype needs_recovery sparse_super large_file
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              97681408
Block count:              195354408
Reserved block count:     9767720
Free blocks:              22200635
Free inodes:              97329015
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Filesystem created:       Tue Jun 28 17:06:41 2005
Last mount time:          Tue Oct 10 20:22:02 2006
Last write time:          Tue Oct 10 20:22:02 2006
Mount count:              93
Maximum mount count:      -1
Last checked:             Thu Oct 20 19:03:56 2005
Check interval:           0 ()
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
First orphan inode:       52691033
Default directory hash:   tea
Directory Hash Seed:      0449f257-e47d-4faf-92fa-fa497efab3a1
Journal backup:           inode blocks

[root at storage-slave compare]# dumpe2fs -h /dev/sda1
dumpe2fs 1.35 (28-Feb-2004)
Filesystem volume name:
Last mounted on:
Filesystem UUID:          b003440d-d153-4cec-a668-94f5482d54cf
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode filetype needs_recovery sparse_super large_file
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              97681408
Block count:              195354408
Reserved block count:     9767720
Free blocks:              26532722
Free inodes:              97327187
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Filesystem created:       Tue Jun 28 14:37:12 2005
Last mount time:          Thu Nov 16 01:11:30 2006
Last write time:          Thu Nov 16 01:11:30 2006
Mount count:              65
Maximum mount count:      -1
Last checked:             Thu Oct  6 15:11:41 2005
Check interval:           0 ()
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      fa6b4317-d51d-4050-b0f3-c72b45148777
Journal backup:           inode blocks

tia
dushy

From jpiszcz at lucidpixels.com  Wed Dec  6 19:22:42 2006
From: jpiszcz at lucidpixels.com (Justin Piszcz)
Date: Wed, 6 Dec 2006 14:22:42 -0500 (EST)
Subject: File size differences

Have you MD5SUM'd the file on both sides?  If it is the same, then you
have no problems.

% md5sum filename

On each side, compare output.

Justin.

From adilger at clusterfs.com  Wed Dec  6 19:54:34 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Wed, 6 Dec 2006 12:54:34 -0700
Subject: File size differences
Message-ID: <20061206195434.GB5937@schatzie.adilger.int>

On Dec 06, 2006  15:38 +0000, dushy wrote:
> [root at storage-master repositories]# stat
> "/store1/SystemAdministration-OldVideos/SysAd Training/Technology
> Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
>   File: `/store1/SystemAdministration-OldVideos/SysAd Training/Technology
> Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt'
>   Size: 1126        Blocks: 16         IO Block: 4096   regular file
> Device: 801h/2049d  Inode: 10403842    Links: 1
> Access: (0775/-rwxrwxr-x)  Uid: (   48/  apache)   Gid: (   48/  apache)
> Access: 2006-09-11 12:22:24.000000000 +0530
> Modify: 2004-09-23 16:45:31.000000000 +0530
> Change: 2006-02-23 18:31:42.000000000 +0530
>
> [root at storage-slave compare]# stat "/store1/SystemAdministration-OldVideos/SysAd
> Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
>   File: `/store1/SystemAdministration-OldVideos/SysAd Training/Technology
> Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt'
>   Size: 1126        Blocks: 8          IO Block: 4096   regular file
> Device: 801h/2049d  Inode: 23019536    Links: 1
> Access: (0775/-rwxrwxr-x)  Uid: (   48/  apache)   Gid: (   48/  apache)
> Access: 2001-01-28 21:10:14.000000000 +0530
> Modify: 2004-09-23 16:45:31.000000000 +0530
> Change: 2001-01-28 21:10:14.000000000 +0530

I'd suspect you have SELinux enabled on one of the nodes and not the
other?  Could also be ACLs.  It is likely adding a 4kB EA block to each
file.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From mnalis-ml at voyager.hr  Thu Dec  7 20:18:18 2006
From: mnalis-ml at voyager.hr (Matija Nalis)
Date: Thu, 7 Dec 2006 21:18:18 +0100
Subject: File size differences
Message-ID: <20061207201818.GA3445@eagle102.home.lan>

On Wed, Dec 06, 2006 at 03:38:08PM +0000, dushy wrote:
> Now the wierd part is, after syncing the failover with the master and comparing
> the size of each dir and file I find some files where the size mismatches..

>   Size: 1126        Blocks: 16         IO Block: 4096   regular file
>   Size: 1126        Blocks: 8          IO Block: 4096   regular file

maybe those files contain enough zero-bytes, and rsync has made a sparse
file ?

--
Opinions above are GNU-copylefted.
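
The two explanations offered in this thread (an extra extended-attribute
block versus a sparse file) can be told apart from the numbers stat
already prints. A minimal sketch, assuming st_blocks is counted in
512-byte units as on Linux; the helper names and the three labels are
made up for illustration and are not part of any tool mentioned here:

```python
import math
import os


def classify_allocation(size, blocks_512, fs_block_size=4096):
    """Hypothetical helper: compare allocated space with what the data needs.

    size          -- file length in bytes (stat "Size")
    blocks_512    -- allocated 512-byte units (stat "Blocks")
    fs_block_size -- filesystem block size in bytes (dumpe2fs "Block size")
    """
    allocated = blocks_512 * 512
    needed = math.ceil(size / fs_block_size) * fs_block_size
    if allocated > needed:
        return "extra"    # more than the data needs, e.g. an EA/ACL block
    if allocated < needed:
        return "sparse"   # holes: fewer blocks than the length implies
    return "exact"


def classify_path(path, fs_block_size=4096):
    """Apply the same check to a real file via os.stat()."""
    st = os.stat(path)
    return classify_allocation(st.st_size, st.st_blocks, fs_block_size)


if __name__ == "__main__":
    print(classify_allocation(1126, 16))  # master's numbers from the thread
    print(classify_allocation(1126, 8))   # failover's numbers
```

With the numbers posted above, the master's copy comes out as "extra"
and the failover's as "exact", which fits an additional 4kB block on the
master better than a sparse copy on the failover. On ext3, files larger
than twelve blocks also carry indirect blocks, so "extra" is only a
useful hint for small files like this one.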

From bruno at wolff.to  Sat Dec  9 22:04:59 2006
From: bruno at wolff.to (Bruno Wolff III)
Date: Sat, 9 Dec 2006 16:04:59 -0600
Subject: fsync, ext3, raid (md) 1, write barriers and PATA caching
Message-ID: <20061209220459.GA6202@wolff.to>

I have been trying to figure out whether I can enable write caching on my
PATA hard drives (WD3200JB) and have fsync not return until data is
safely on the platters. I am also running software raid.
This is currently on FC5 (though soon to be FC6) with a 2.6.18 kernel.

From snippets I have found on the net, it looks like write barriers are
pushed down through software raid when using raid 1. So that if I mount
the file systems with data=ordered and barrier=1, I think I should be
OK, but I was hoping to get a more definitive answer.

It also looks like barrier=1 is or will be the default for ext3. Is there
a way I can check if this is the case on my system?

/proc/mounts doesn't show the barrier option when I use barrier=1 or don't
specify it at all. mount -lv shows the barrier option (when it was used
for mounting), but not the data option. I am not sure if either of these
are using the same data that the ext3 driver is using.

From lists at nerdbynature.de  Sat Dec  9 22:51:32 2006
From: lists at nerdbynature.de (Christian Kujau)
Date: Sat, 9 Dec 2006 22:51:32 +0000 (GMT)
Subject: fsync, ext3, raid (md) 1, write barriers and PATA caching
In-Reply-To: <20061209220459.GA6202@wolff.to>
References: <20061209220459.GA6202@wolff.to>

On Sat, 9 Dec 2006, Bruno Wolff III wrote:
> It also looks like barrier=1 is or will be the default for ext3. Is there
> a way I can check if this is the case on my system?

Hm, indeed: if write barriers are not available, mounting an XFS
filesystem shows:

> Filesystem "md0": Disabling barriers, not supported by the underlying device

Mounting the same device when formatted with ext3 does not show this
message nor does /proc/mounts reveal anything....could this be tweaked
somehow?

Christian.
--
BOFH excuse #288:

Hard drive sleeping. Let it wake up on it's own...

From ric at emc.com  Mon Dec 11 16:14:52 2006
From: ric at emc.com (Ric Wheeler)
Date: Mon, 11 Dec 2006 11:14:52 -0500
Subject: fsync, ext3, raid (md) 1, write barriers and PATA caching
In-Reply-To: <20061209220459.GA6202@wolff.to>
References: <20061209220459.GA6202@wolff.to>
Message-ID: <457D83FC.1040709@emc.com>

Bruno Wolff III wrote:
> I have been trying to figure out whether I can enable write caching on my
> PATA hard drives (WD3200JB) and have fsync not return until data is
> safely on the platters. I am also running software raid.
> This is currently on FC5 (though soon to be FC6) with a 2.6.18 kernel.
>
> From snippets I have found on the net, it looks like write barriers are
> pushed down through software raid when using raid 1. So that if I mount
> the file systems with data=ordered and barrier=1, I think I should be
> OK, but I was hoping to get a more definitive answer.
>
> It also looks like barrier=1 is or will be the default for ext3. Is there
> a way I can check if this is the case on my system?
>
> /proc/mounts doesn't show the barrier option when I use barrier=1 or don't
> specify it at all. mount -lv shows the barrier option (when it was used
> for mounting), but not the data option. I am not sure if either of these
> are using the same data that the ext3 driver is using.

You can always do a sanity test on the barrier by timing how many
synchronous files/sec you can create (i.e., create/write/fsync/close).
Speeds vary depending on what kind of drive you have, journal mode, etc,
but you will always see much faster times with the barrier off than on
while writing small files (say 10K).
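
The create/write/fsync/close timing described above can be sketched as
follows. This is not the poster's own test program, which does not
appear in the thread; it is a minimal illustration, and the function
name, file names, and default counts are assumptions.

```python
import os
import tempfile
import time


def sync_creates_per_second(directory, count=100, size=10 * 1024):
    """Create `count` files of `size` bytes, fsync each, return files/sec."""
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"fsync-test-{i}.tmp")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            view = memoryview(payload)
            while view:                      # os.write may write only part
                view = view[os.write(fd, view):]
            os.fsync(fd)                     # ask the kernel to flush to disk
        finally:
            os.close(fd)
        os.unlink(path)                      # keep the directory clean
    elapsed = time.perf_counter() - start
    return count / elapsed


if __name__ == "__main__":
    # Run from a directory that lives on the filesystem under test.
    with tempfile.TemporaryDirectory(dir=".") as workdir:
        rate = sync_creates_per_second(workdir)
        print(f"{rate:.1f} synchronous file creates per second")
```

Running it once per mount configuration and comparing the rates gives
the kind of comparison suggested here; the absolute numbers depend on
the drive, the journal mode, and whatever else is using the disk.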

regards,

ric

From bruno at wolff.to  Mon Dec 11 16:36:56 2006
From: bruno at wolff.to (Bruno Wolff III)
Date: Mon, 11 Dec 2006 10:36:56 -0600
Subject: fsync, ext3, raid (md) 1, write barriers and PATA caching
In-Reply-To: <457D83FC.1040709@emc.com>
References: <20061209220459.GA6202@wolff.to> <457D83FC.1040709@emc.com>
Message-ID: <20061211163656.GB28931@wolff.to>

On Mon, Dec 11, 2006 at 11:14:52 -0500,
  Ric Wheeler wrote:
>
> You can always do a sanity test on the barrier by timing how many
> synchronous files/sec you can create (i.e., create/write/fsync/close).
> Speeds vary depending on what kind of drive you have, journal mode, etc,
> but you will always see much faster times with the barrier off than on
> while writing small files (say 10K).

That's probably a good idea in any case. Down the road I will be interested
in whether barriers work through encrypted file systems and this will be a
good test to have available.

I should get at most 120 commits per second if write barriers are working;
so I think that should be easy to detect.

Is there already a tool out there that does this? It shouldn't be hard to
write something simple, but maybe someone has written something fancy
already.

From ric at emc.com  Mon Dec 11 17:44:40 2006
From: ric at emc.com (Ric Wheeler)
Date: Mon, 11 Dec 2006 12:44:40 -0500
Subject: fsync, ext3, raid (md) 1, write barriers and PATA caching
In-Reply-To: <20061211163656.GB28931@wolff.to>
References: <20061209220459.GA6202@wolff.to> <457D83FC.1040709@emc.com> <20061211163656.GB28931@wolff.to>
Message-ID: <457D9908.5080602@emc.com>

Bruno Wolff III wrote:
> On Mon, Dec 11, 2006 at 11:14:52 -0500,
>   Ric Wheeler wrote:
>> You can always do a sanity test on the barrier by timing how many
>> synchronous files/sec you can create (i.e., create/write/fsync/close).
>> Speeds vary depending on what kind of drive you have, journal mode, etc,
>> but you will always see much faster times with the barrier off than on
>> while writing small files (say 10K).
>
> That's probably a good idea in any case. Down the road I will be interested
> in whether barriers work through encrypted file systems and this will be a good
> test to have available.
>
> I should get at most 120 commits per second if write barriers are working;
> so I think that should be easy to detect.
>
> Is there already a tool out there that does this? It shouldn't be hard to
> write something simple, but maybe someone has written something fancy
> already.

I will send you the test code that I use & some test runs,

ric

From bruno at wolff.to  Mon Dec 11 18:48:38 2006
From: bruno at wolff.to (Bruno Wolff III)
Date: Mon, 11 Dec 2006 12:48:38 -0600
Subject: fsync, ext3, raid (md) 1, write barriers and PATA caching
In-Reply-To: <457D9908.5080602@emc.com>
References: <20061209220459.GA6202@wolff.to> <457D83FC.1040709@emc.com> <20061211163656.GB28931@wolff.to> <457D9908.5080602@emc.com>
Message-ID: <20061211184838.GA10516@wolff.to>

On Mon, Dec 11, 2006 at 12:44:40 -0500,
  Ric Wheeler wrote:
>
> I will send you the test code that I use & some test runs,

I tried out Ric's test code and it does appear that the barrier is working
under fc5 using software raid. For the quick test I didn't have an idle
system. If I ran with either no barrier option and write caching enabled
I saw very roughly a 10x speed up. The test seemed to show running with
write caching disabled was about 20% faster than with just using barriers.
My theory is that because there was a lot of other disk activity, the cache
flushes forced by using barriers was writing a lot of disk blocks from
outside the test making it report slower numbers, while in theory my system
throughput was actually better.
Tonight I can try the test on a system without a lot of disk activity and
see if that makes much difference.

Thanks Ric.

From ric at emc.com  Mon Dec 11 19:20:12 2006
From: ric at emc.com (Ric Wheeler)
Date: Mon, 11 Dec 2006 14:20:12 -0500
Subject: fsync, ext3, raid (md) 1, write barriers and PATA caching
In-Reply-To: <20061211184838.GA10516@wolff.to>
References: <20061209220459.GA6202@wolff.to> <457D83FC.1040709@emc.com> <20061211163656.GB28931@wolff.to> <457D9908.5080602@emc.com> <20061211184838.GA10516@wolff.to>
Message-ID: <457DAF6C.1000607@emc.com>

Bruno Wolff III wrote:
> On Mon, Dec 11, 2006 at 12:44:40 -0500,
>   Ric Wheeler wrote:
>
>>I will send you the test code that I use & some test runs,
>
>
> I tried out Ric's test code and it does appear that the barrier is working
> under fc5 using software raid. For the quick test I didn't have an idle
> system. If I ran with either no barrier option and write caching enabled
> I saw very roughly a 10x speed up. The test seemed to show running with
> write caching disabled was about 20% faster than with just using barriers.
> My theory is that because there was a lot of other disk activity, the cache
> flushes forced by using barriers was writing a lot of disk blocks from
> outside the test making it report slower numbers, while in theory my system
> throughput was actually better.
> Tonight I can try the test on a system without a lot of disk activity and
> see if that makes much difference.
>
> Thanks Ric.
>

Glad to see that it is useful.  Thanks really go out to Jens Axboe and
Chris Mason for getting this all to work correctly in the first place ;-)

As I mentioned in our private email exchange, you should see better
performance with the write barrier on vs write cache disabled for some
scenarios (large file writes and using data journal mode for small files
are the two cases that I have noticed).

ric

From chris at cjx.com  Wed Dec 13 13:47:06 2006
From: chris at cjx.com (Chris Allen)
Date: Wed, 13 Dec 2006 13:47:06 +0000
Subject: ext3 4TB fs limit on amd64 (FAQ?)
References: <38749.85.220.132.176.1164534599.squirrel@www.stz-softwaretechnik.de>
Message-ID: <4580045A.4060806@cjx.com>

Christian Kujau wrote:
> On Sun, 26 Nov 2006, Ralf Gross wrote:
>> I've a question about the max. ext3 FS size. The ext3 FAQ explains that
>> the limit is 4TB.
>
> Hm, strange: I'm pretty sure that mkfs.ext3 understands bigger
> blocksizes for quite a while now. Then again, the FAQ says "Version:
> 2004-10-14"...
>
> So, although I'd really love to have this information (and the FAQ!) on
> http://e2fsprogs.sf.net/ this is what I found:
>
>   blocksize   file size limit         filesystem size limit
>     1 KiB     16448 MiB (~ 16 GiB)     2048 GiB (=  2 TiB)
>     2 KiB       256 GiB                8192 GiB (=  8 TiB)
>     4 KiB      2048 GiB (=  2 TiB)    16384 GiB (= 16 TiB)
>     8 KiB     65568 GiB (~ 64 TiB)    32768 GiB (= 32 TiB)
>
> Note that an 8 KiB blocksize is only supported on systems with 8 KiB
> pagesize (i.e. linux/alpha).
>
>

We use 6TB ext3 filesystems over vanilla Fedora Core 5 on several heavily
loaded systems. All perform fine without any obvious problems.

From dushyanth at gmail.com  Wed Dec 13 15:04:01 2006
From: dushyanth at gmail.com (dushy)
Date: Wed, 13 Dec 2006 15:04:01 +0000 (UTC)
Subject: File size differences
References: <20061206195434.GB5937@schatzie.adilger.int>

Hey,

> I'd suspect you have SELinux enabled on one of the nodes and not the
> other?  Could also be ACLs.  It is likely adding a 4kB EA block to each file.

I don't have SELinux enabled on either side. I am checking on ACLs.

tia
dushy

From dushyanth at gmail.com  Wed Dec 13 15:06:35 2006
From: dushyanth at gmail.com (dushy)
Date: Wed, 13 Dec 2006 15:06:35 +0000 (UTC)
Subject: File size differences

Hey,

> Have you MD5SUM'd the file on both sides?  If it is the same, then you
> have no problems.
>
> % md5sum filename
>
> On each side, compare output.

md5sum on the affected files on both sides are the same.
As I said earlier, the file is exactly the same on both sides except for the
file size, which is different. The size on the master is exactly 4.1k bigger,
but only for some files.

[root at storage-master repositories]# md5sum "/store1/SystemAdministration-OldVideos/SysAd Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
649efcb46ad483abcf1edd334e16d76b  /store1/SystemAdministration-OldVideos/SysAd Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt

[root at storage-slave compare]# md5sum "/store1/SystemAdministration-OldVideos/SysAd Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
649efcb46ad483abcf1edd334e16d76b  /store1/SystemAdministration-OldVideos/SysAd Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt

I am just curious as to why there should be a file size difference in only
certain files.

tia
dushy

From dushyanth at gmail.com  Wed Dec 13 11:18:03 2006
From: dushyanth at gmail.com (dushy)
Date: Wed, 13 Dec 2006 16:48:03 +0530
Subject: File size differences
In-Reply-To: <20061206195434.GB5937@schatzie.adilger.int>
References: <20061206195434.GB5937@schatzie.adilger.int>
Message-ID: <497509650612130318m1fca1563q4201d24dd28099ab@mail.gmail.com>

Hey,

On 12/7/06, Andreas Dilger wrote:
> On Dec 06, 2006 15:38 +0000, dushy wrote:
> > [root at storage-master repositories]# stat "/store1/SystemAdministration-OldVideos/SysAd Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
> >   File: `/store1/SystemAdministration-OldVideos/SysAd Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt'
> >   Size: 1126   Blocks: 16   IO Block: 4096   regular file
> > Device: 801h/2049d   Inode: 10403842   Links: 1
> > Access: (0775/-rwxrwxr-x)   Uid: ( 48/ apache)   Gid: ( 48/ apache)
> > Access: 2006-09-11 12:22:24.000000000 +0530
> > Modify: 2004-09-23 16:45:31.000000000 +0530
> > Change: 2006-02-23 18:31:42.000000000 +0530
> >
> > [root at storage-slave compare]# stat "/store1/SystemAdministration-OldVideos/SysAd Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt"
> >   File: `/store1/SystemAdministration-OldVideos/SysAd Training/Technology Basics/000_READ_ME_FIRST_FOR_INDEX_OF_VIDEOS.txt'
> >   Size: 1126   Blocks: 8   IO Block: 4096   regular file
> > Device: 801h/2049d   Inode: 23019536   Links: 1
> > Access: (0775/-rwxrwxr-x)   Uid: ( 48/ apache)   Gid: ( 48/ apache)
> > Access: 2001-01-28 21:10:14.000000000 +0530
> > Modify: 2004-09-23 16:45:31.000000000 +0530
> > Change: 2001-01-28 21:10:14.000000000 +0530
>
> I'd suspect you have SELinux enabled on one of the nodes and not the
> other? Could also be ACLs. It is likely adding a 4kB EA block to each file.

I don't have SELinux enabled on either side. I am checking on ACLs and will
update accordingly.

tia
dushy

From ext3 at jks.tupari.net  Tue Dec 19 22:55:41 2006
From: ext3 at jks.tupari.net (ext3 at jks.tupari.net)
Date: Tue, 19 Dec 2006 17:55:41 -0500 (EST)
Subject: Does ext3 prevent partial page writes?
Message-ID:

Basically I want to know if I can turn off full_page_writes in my postgres
config.

http://www.postgresql.org/docs/8.2/interactive/wal-reliability.html

From lists at nerdbynature.de  Wed Dec 20 06:42:20 2006
From: lists at nerdbynature.de (Christian Kujau)
Date: Wed, 20 Dec 2006 06:42:20 +0000 (GMT)
Subject: Does ext3 prevent partial page writes?
In-Reply-To:
References:
Message-ID:

On Tue, 19 Dec 2006, ext3 at jks.tupari.net wrote:
> Basically I want to know if I can turn off full_page_writes in my postgres
> config.

if your devices support write barriers, you can turn off this option,
methinks (mount -o barrier=1). Of course, you'll run a few tests before
going live, right?

Christian.
--
BOFH excuse #449:

greenpeace free'd the mallocs

From bruno at wolff.to  Wed Dec 20 16:58:25 2006
From: bruno at wolff.to (Bruno Wolff III)
Date: Wed, 20 Dec 2006 10:58:25 -0600
Subject: Does ext3 prevent partial page writes?
In-Reply-To:
References:
Message-ID: <20061220165825.GC3732@wolff.to>

On Wed, Dec 20, 2006 at 06:42:20 +0000, Christian Kujau wrote:
> On Tue, 19 Dec 2006, ext3 at jks.tupari.net wrote:
> >Basically I want to know if I can turn off full_page_writes in my postgres
> >config.
>
> if your devices support write barriers, you can turn off this option,
> methinks (mount -o barrier=1). Of course, you'll run a few tests before
> going live, right?

I have tested write barriers in FC5 using ext3 on top of software raid (raid 1
is the only type of raid that supports write barriers) and it seems to be
working correctly. If you use this with the data=journal option I would
expect you would be OK. It might be a good idea to check on the postgres
list about the exact semantics you need for this.

I also asked about write barriers on the dm-crypt list and they are supported
there, but there is probably a problem with it on 2.6.19 kernels on SMP
machines relating to a change to use per cpu work queues.

From alazarev at itg.uiuc.edu  Wed Dec 20 19:58:12 2006
From: alazarev at itg.uiuc.edu (Alex Lazarevich)
Date: Wed, 20 Dec 2006 13:58:12 -0600
Subject: ext3 4TB fs limit on amd64 (FAQ?)
In-Reply-To:
References: <38749.85.220.132.176.1164534599.squirrel@www.stz-softwaretechnik.de>
Message-ID: <458995D4.1060206@itg.uiuc.edu>

Christian Kujau wrote:
> On Sun, 26 Nov 2006, Ralf Gross wrote:
>> I've a question about the max. ext3 FS size. The ext3 FAQ explains that
>> the limit is 4TB.
>
> Hm, strange: I'm pretty sure that mkfs.ext3 understands bigger
> blocksizes for quite a while now. Then again, the FAQ says "Version:
> 2004-10-14"...
>
> So, although I'd really love to have this information (and the FAQ!) on
> http://e2fsprogs.sf.net/ this is what I found:
>
>   blocksize   file size limit         filesystem size limit
>   1 KiB       16448 MiB (~ 16 GiB)    2048 GiB (= 2 TiB)
>   2 KiB       256 GiB                 8192 GiB (= 8 TiB)
>   4 KiB       2048 GiB (= 2 TiB)      16384 GiB (= 16 TiB)
>   8 KiB       65568 GiB (~ 64 TiB)    32768 GiB (= 32 TiB)
>
> Note that an 8 KiB blocksize is only supported on systems with 8 KiB
> pagesize (i.e. linux/alpha).
>

Is this still true? 8 KiB pagesize only on linux/alpha? We run RHEL4-AS
x86_64 on AMD Opteron, and the OS is going to let me create the 8192
block size on a 9TB partition, but it's giving a warning:

partition is:
Disk geometry for /dev/sda: 0.000-9296872.000 megabytes

[root at dudemiestro ~]# mkfs.ext3 -b 8192 -m 1 /dev/sda1
Warning: blocksize 8192 not usable on most systems.
mke2fs 1.35 (28-Feb-2004)
mkfs.ext3: 8192-byte blocks too big for system (max 4096)
Proceed anyway? (y,n)

Has anyone done this before and had any kind of success?

Thanks,

Alex

From adilger at clusterfs.com  Wed Dec 20 21:21:22 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Wed, 20 Dec 2006 14:21:22 -0700
Subject: ext3 4TB fs limit on amd64 (FAQ?)
In-Reply-To: <458995D4.1060206@itg.uiuc.edu>
References: <38749.85.220.132.176.1164534599.squirrel@www.stz-softwaretechnik.de> <458995D4.1060206@itg.uiuc.edu>
Message-ID: <20061220212122.GB5937@schatzie.adilger.int>

On Dec 20, 2006 13:58 -0600, Alex Lazarevich wrote:
> Is this still true? 8 KiB pagesize only on linux/alpha? We run RHEL4-AS
> x86_64 on AMD Opteron, and the OS is going to let me create the 8192
> block size on a 9TB partition, but it's giving a warning:
>
> partition is: Disk geometry for /dev/sda: 0.000-9296872.000 megabytes
>
> [root at dudemiestro ~]# mkfs.ext3 -b 8192 -m 1 /dev/sda1
> Warning: blocksize 8192 not usable on most systems.
> mke2fs 1.35 (28-Feb-2004)
> mkfs.ext3: 8192-byte blocks too big for system (max 4096)
> Proceed anyway? (y,n)

Sadly, x86_64 also has 4kB PAGE_SIZE like i386 (for compatibility reasons
or whatever).
I'd always hoped for 8kB+ PAGE_SIZE when we got to 64-bit
but it seems this will never happen.

> Anyone do this before and had any kind of success?

You need Alpha, PPC, ia64, mips, arm?, or possibly other non-*86 arch
to have large PAGE_SIZE, or fix the VM to support larger PAGE_SIZE than
the hardware.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From jan at netropol.de  Wed Dec 27 23:06:05 2006
From: jan at netropol.de (Jan)
Date: Wed, 27 Dec 2006 23:06:05 +0000
Subject: Problem with ext3 filesystem
Message-ID: <4592FC5D.70101@netropol.de>

Hey,

I've a problem with an ext3 filesystem and don't know how to fix it or
find the failure :(

The Hardware:

Tyan mainboard, AMD Athlon CPU, ARECA ARC-1120 RaidController Raid5 with
400GB Seagate HD's, 756 MB Ram, other harddisks for system, network and
avm isdn controller.

Because of the filesystem problems I ran memtest and found one bad memory
module, which I have since replaced.

The System:

Kernel 2.6.19.1
Debian Gnu/Linux 3.0 with e2fsck 1.37 (21-Mar-2005)

I've set up one ext3 partition of around 1.4 TB on the raid5 volume.
For the first four months we ran the raid without any problems. About two
months ago I noticed that the filesystem had been remounted ro. A filesystem
check found a lot of errors. After a filesystem check, a fresh mount of
the partition and copying data onto the partition, the errors come back.
I also got these problems with kernel 2.6.17.3. A raid volume check with
the areca command line tools doesn't find any errors.

Errors from dmesg / kernel.log:

EXT3-fs: mounted filesystem with ordered data mode.
init_special_inode: bogus i_mode (113301)
init_special_inode: bogus i_mode (170101)
init_special_inode: bogus i_mode (115140)
init_special_inode: bogus i_mode (117302)
init_special_inode: bogus i_mode (111700)
EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #143278260: rec_len % 4 != 0 - offset=0, inode=1857588108, rec_len=8466, name_len=34

Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (111501)
Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (113301)
Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (170101)
Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (115140)
Dec 22 14:25:19 datahaven kernel: EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #150569204: rec_len %% 4 != 0 - offset=0, inode=3038782558, rec_len=28425, name_len=75

Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (111501)
Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (113301)
Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (170101)
Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (115140)
Dec 22 06:31:54 datahaven kernel: EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #20351025: directory entry across blocks - offset=0, inode=20353857, rec_len=13600, name_len=1
Dec 22 06:31:55 datahaven kernel: EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #20417957: rec_len %% 4 != 0 - offset=96, inode=20437734, rec_len=27291, name_len=6
Dec 22 06:31:59 datahaven kernel: EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #21007912: directory entry across blocks - offset=296, inode=21005643, rec_len=32184, name_len=25
Dec 22 06:32:24 datahaven kernel: init_special_inode: bogus i_mode (114764)
Dec 22 06:32:29 datahaven kernel: EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #21839877: rec_len %% 4 != 0 - offset=24, inode=21839878, rec_len=22019, name_len=7
Dec 22 06:32:30 datahaven kernel: init_special_inode: bogus i_mode (55314)
Dec 22 06:32:34 datahaven kernel: init_special_inode: bogus i_mode (117302)
Dec 22 06:32:36 datahaven kernel: EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #22448122: rec_len %% 4 != 0 - offset=24, inode=22417991, rec_len=28145, name_len=8

Any hints on how to solve this problem or to isolate the failure?

Best regards and thanks in advance for your help,

Jan

From jan at netropol.de  Thu Dec 28 08:05:19 2006
From: jan at netropol.de (Jan)
Date: Thu, 28 Dec 2006 08:05:19 +0000
Subject: Problem with ext3 filesystem
In-Reply-To: <4592F2F6.6090708@criminalinfo.net>
References: <4592FC5D.70101@netropol.de> <4592F2F6.6090708@criminalinfo.net>
Message-ID: <45937ABF.8040206@netropol.de>

The machine is used mainly as a fileserver with samba and netatalk. These
should be the only server applications placing data on the drive. For
testing I have disabled netatalk for now. I can run an fsck and the
filesystem is fine after that. I then remount, copy a few GB with cp,
unmount, and fsck reports errors again in the target directory of the
copied files. In this test there were no samba or netatalk users connected.
When I copied files from a client connected via samba I got the same errors.

Jan

> What is this machine being used for, primarily? What types of local
> applications/binaries are placing data on the drive?
>
> - Kevin

From jrumpf at heavyload.net  Thu Dec 28 16:49:34 2006
From: jrumpf at heavyload.net (Jeremy Rumpf)
Date: Thu, 28 Dec 2006 11:49:34 -0500
Subject: Problem with ext3 filesystem
In-Reply-To: <45937ABF.8040206@netropol.de>
References: <4592FC5D.70101@netropol.de> <4592F2F6.6090708@criminalinfo.net> <45937ABF.8040206@netropol.de>
Message-ID: <200612281149.34320.jrumpf@heavyload.net>

Jan,

I did notice that you are using a recent kernel, so this may not be relevant:

http://thread.gmane.org/gmane.comp.file-systems.ext3.user/2351/focus=2358

It is a thread from 2005 about block aliasing on large arrays. Specifically,
read the last two posts from Andreas and Stephen. The idea would be that you
are seeing the corruption after the filesystem filled to a certain capacity.
The cause is possibly that a block pointer (in the device driver or VFS
layer) wrapped and is now referring to the wrong block on the device,
causing corruption.

Also, you did find a bad memory module using memtest. It is possible that
other modules are bad as well and memtest isn't detecting it. Try removing
all but one or two modules (RAM will decrease, but be sufficient for
testing) and retest.

At a minimum, I would get a backup of the data as soon as possible so you
don't lose anything.

Thanks,
Jeremy

On Thursday 28 December 2006 03:05, Jan wrote:
> The machine is used mainly as fileserver with samba and netatalk. this
> should be the only server applications which are placing data on the
> drive. For testing I disabled netatalk yet. I can do an fsck and the
> filesystem is fine after that. I do a remount and copy with cp a few
> GB, do an unmount and the fsck will have errors again in the target
> directory of the copied files.
> In this test there are no samba or netatalk users connected. When I
> copied files with a client connected with samba I got the same errors.
>
> Jan

From jan at netropol.de  Fri Dec 29 21:10:51 2006
From: jan at netropol.de (Jan)
Date: Fri, 29 Dec 2006 21:10:51 +0000
Subject: Problem with ext3 filesystem
In-Reply-To: <4592FC5D.70101@netropol.de>
References: <4592FC5D.70101@netropol.de>
Message-ID: <4595845B.7010403@netropol.de>

I did some more tests. First I split the raid into two partitions and ran
mkfs.ext3 -c to check for bad blocks. Then I tried to copy data onto the
first partition.
After a while I got errors (around 130 GB were on the partition):

attempt to access beyond end of device
sda1: rw=0, want=3151373440, limit=1464846304
attempt to access beyond end of device
sda1: rw=0, want=2853870672, limit=1464846304
attempt to access beyond end of device
sda1: rw=0, want=1751501064, limit=1464846304
attempt to access beyond end of device
sda1: rw=0, want=2783268072, limit=1464846304
attempt to access beyond end of device
sda1: rw=0, want=2835511344, limit=1464846304
attempt to access beyond end of device

So I googled and found this thread:

http://lkml.org/lkml/2006/10/5/353

I tried the script from there on my partition (with around 130 GB on it):

dd bs=1M count=200 if=/dev/zero of=test0
while :; do
    echo "cp 0-1"; cp test0 test1 || break
    echo "cp 1-2"; cp test1 test2 || break
    echo "cp 2-3"; cp test2 test3 || break
    echo "cp 3-4"; cp test3 test4 || break
    echo "od 0" ; od test0 || break
    echo "rm 1"; rm test1 || break
    echo "rm 2"; rm test2 || break
    echo "rm 3"; rm test3 || break
    echo "rm 4"; rm test4 || break
done

The script had been running for no more than 10 minutes when I got:

EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks in system zones - Block = 327680, count = 1
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks in system zones - Block = 327680, count = 1
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks in system zones - Block = 327680, count = 1

Then I created a new fs on the partition without any data on it and got the
same errors while running this script.

On an internal IDE hard disk (software RAID-1, but a partition size of only
4 GB) there was no problem with the test script.

The test script with 10 MB files has now been running for several hours
without any problems on the SATA raid partition.

New partition with a size of 340 GB, test script with 200 MB files. Errors:

ock = 65536, count = 1
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device sda2): ext3_free_blocks: Freeing blocks in system zones - Block = 65536, count = 1
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device sda2): ext3_free_blocks: Freeing blocks in system zones - Block = 65536, count = 1
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device sda2): ext3_free_blocks: Freeing blocks in system zones - Block = 65536, count = 1
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda2) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda2) in ext3_truncate: IO failure
EXT3-fs error (device sda2) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda2) in ext3_orphan_del: Readonly filesystem
EXT3-fs error (device sda2) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda2) in ext3_delete_inode: IO failure

And with a 180 GB partition:

ree_blocks: Freeing blocks in system zones - Block = 262144, count = 1
EXT3-fs error (device sda3) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device sda3): ext3_free_blocks: Freeing blocks in system zones - Block = 262144, count = 1
EXT3-fs error (device sda3) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device sda3): ext3_free_blocks: Freeing blocks in system zones - Block = 262144, count = 1
EXT3-fs error (device sda3) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda3) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda3) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda3) in ext3_truncate: IO failure
EXT3-fs error (device sda3) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda3) in ext3_orphan_del: Readonly filesystem
EXT3-fs error (device sda3) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda3) in ext3_delete_inode: IO failure

Strange? I still don't know how to solve the problem.

Regards,

Jan

From lists at nerdbynature.de  Fri Dec 29 20:45:48 2006
From: lists at nerdbynature.de (Christian Kujau)
Date: Fri, 29 Dec 2006 20:45:48 +0000 (GMT)
Subject: Problem with ext3 filesystem
In-Reply-To: <4595845B.7010403@netropol.de>
References: <4592FC5D.70101@netropol.de> <4595845B.7010403@netropol.de>
Message-ID:

On Fri, 29 Dec 2006, Jan wrote:
> attempt to access beyond end of device
> sda1: rw=0, want=2853870672, limit=1464846304

disk/cabling problems? just try

  dd if=/dev/sda of=/dev/null bs=8M

and watch your syslog for errors.

good luck,
Christian.
--
BOFH excuse #352:

The cables are not the same length.

From lists at nerdbynature.de  Fri Dec 29 21:34:05 2006
From: lists at nerdbynature.de (Christian Kujau)
Date: Fri, 29 Dec 2006 21:34:05 +0000 (GMT)
Subject: Problem with ext3 filesystem
In-Reply-To: <459583F2.9060401@shaw.ca>
References: <4592FC5D.70101@netropol.de> <4595845B.7010403@netropol.de> <459583F2.9060401@shaw.ca>
Message-ID:

[please reply on-list, so that all can comment]

On Fri, 29 Dec 2006, ..:::BeOS Mr. X:::.. wrote:
> Hey how would I check my syslog on BeOS ? I have dd.
> Is there a way I can use that to calculate my hard drive MB/s read speed ?

um, I wasn't aware that ext3fs is supported on BeOS? Otherwise it's a
bit off topic, no?

1) syslog: if BeOS has no syslog, how do you know what's wrong with the
   system?
2) if you have GNU/dd, you can send a USR1 to the dd process to see the
   progress and a MB/s value.

still, I don't see the relation to ext3 or even with the OP's problem.

cheers,
Christian.
--
BOFH excuse #328:

Fiber optics caused gas main leak

From mr._x at shaw.ca  Sat Dec 30 00:46:12 2006
From: mr._x at shaw.ca (..:::BeOS Mr. X:::..)
Date: Fri, 29 Dec 2006 16:46:12 -0800
Subject: Problem with ext3 filesystem
In-Reply-To:
References: <4592FC5D.70101@netropol.de> <4595845B.7010403@netropol.de> <459583F2.9060401@shaw.ca>
Message-ID: <4595B6D4.7060408@shaw.ca>

Well, I was just asking for interest's sake. I do read this ext3 list out of
curiosity, just to learn a few tips. I didn't want to bother the whole
user group, but what is the USR1 bit? How would I do that?

Christian Kujau wrote:
>
>
> [please reply on-list, so that all can comment]
>
> On Fri, 29 Dec 2006, ..:::BeOS Mr. X:::.. wrote:
>> Hey how would I check my syslog on BeOS ? I have dd. Is there a way I
>> can use that to calculate my hard drive MB/s read speed ?
>
> um, I wasn't aware that ext3fs is supported on BeOS? Otherwise it's a
> bit off topic, no?
>
> 1) syslog: if BeOS has no syslog, how do you know what's wrong with the
> system?
> 2) if you have GNU/dd, you can send a USR1 to the dd process to see the
> progress and a MB/s value.
>
> still, I don't see the relation to ext3 or even with the OP's problem.
>
> cheers,
> Christian.
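
The USR1 hint quoted above can be sketched as a small shell function. This is
only an illustration, not something posted to the list: the function name
dd_with_progress and its two arguments are made up for the example, and it
assumes GNU dd, which prints its I/O statistics (bytes copied, elapsed time,
rate) on stderr when it receives SIGUSR1. Other dd implementations may
simply be terminated by that signal.

```shell
#!/bin/sh
# Sketch: read a file or block device with dd and ask it for statistics
# at regular intervals. dd_with_progress is a hypothetical helper name.
# Assumption: GNU dd, which reports statistics on stderr on SIGUSR1.
dd_with_progress() {
    src=$1            # file or block device to read from
    interval=${2:-5}  # seconds between progress reports

    # Read-only pass: data is discarded, the source is not modified.
    dd if="$src" of=/dev/null bs=8M &
    dd_pid=$!

    # Give dd a moment to install its signal handler; a USR1 that arrives
    # before that would terminate dd instead of printing statistics.
    sleep 1

    # kill fails once dd has exited, which ends the loop.
    while kill -USR1 "$dd_pid" 2>/dev/null; do
        sleep "$interval"
    done

    wait "$dd_pid"    # return dd's own exit status
}
```

Called as, say, "dd_with_progress /dev/sda 10" (as root, or as the owner of a
readable file), this would print a statistics line roughly every ten seconds
until the read finishes. Reading to /dev/null keeps the test non-destructive.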

From bryan at kdzbn.homelinux.net Sat Dec 30 03:39:47 2006
From: bryan at kdzbn.homelinux.net (Bryan Kadzban)
Date: Fri, 29 Dec 2006 22:39:47 -0500
Subject: Problem with ext3 filesystem
In-Reply-To: <4595B6D4.7060408@shaw.ca>
References: <4592FC5D.70101@netropol.de> <4595845B.7010403@netropol.de> <459583F2.9060401@shaw.ca> <4595B6D4.7060408@shaw.ca>
Message-ID: <4595DF83.2080902@kdzbn.homelinux.net>

..:::BeOS Mr. X:::.. wrote:
> Well I was just asking for interest's sake, I do read this ext3 out of
> curiosity just to learn a few tips. I didn't want to bother the whole
> user group, but what is the USR1 bit ? How would I do that ?

It's not a bit, it's a signal. E.g.:

    killall -USR1 dd

Or write a C program to make the kill(2) system call on the PID of the
dd process every so often.

Of course you will need privileges either way; either dd has to be
running as you, or you need to be root.

From tytso at mit.edu Sat Dec 30 04:08:16 2006
From: tytso at mit.edu (Theodore Tso)
Date: Fri, 29 Dec 2006 23:08:16 -0500
Subject: Problem with ext3 filesystem
In-Reply-To: <4595845B.7010403@netropol.de>
References: <4592FC5D.70101@netropol.de> <4595845B.7010403@netropol.de>
Message-ID: <20061230040816.GA27654@thunk.org>

You might want to try "badblocks -w /dev/XXX" and see if it reports
any bad blocks or I/O errors. This really seems like a device driver
or hardware problem...

- Ted

From lists at nerdbynature.de Sat Dec 30 12:45:48 2006
From: lists at nerdbynature.de (Christian Kujau)
Date: Sat, 30 Dec 2006 12:45:48 +0000 (GMT)
Subject: Problem with ext3 filesystem
In-Reply-To: <20061230040816.GA27654@thunk.org>
References: <4592FC5D.70101@netropol.de> <4595845B.7010403@netropol.de> <20061230040816.GA27654@thunk.org>
Message-ID:

On Fri, 29 Dec 2006, Theodore Tso wrote:
> You might want to try "badblocks -w /dev/XXX" and see if it reports

May I add that this option *destroys all data* on your device:

  "WARNING Never use the -w option on a device containing an existing
   file system. This option erases data! If you want to do write-mode
   testing on an existing file system, use the -n option instead. It
   is slower, but it will preserve your data."

Christian.
--
BOFH excuse #272: Netscape has crashed

From samjnaa at gmail.com Sun Dec 31 04:18:35 2006
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Sun, 31 Dec 2006 09:48:35 +0530
Subject: Ext4 improvements
Message-ID: <45973A1B.7010409@gmail.com>

Please be patient with my ignorance if what I am asking is meaningless
in any way. I am not too technically knowledgeable about filesystem
internals but I am willing to learn. (I thought of posting to linux-ext4
but did not want to intrude within the technical threads with my layman
thread.)

From Wikipedia > ReiserFS article > Design section:

[quote]ext2 and other Berkeley FFS-like filesystems simply use a fixed
formula for computing inode locations, hence limiting the number of
files they may contain. Most such filesystems also store directories as
simple lists of entries, which makes directory lookups and updates
linear time operations and degrades performance on very large
directories. The single B+ tree design in ReiserFS avoids both of these
problems due to better scalability properties.[/quote]

So will ext4 avoid both of these problems just like ReiserFS? Does it
use a B+ tree? Or this "dancing B* tree" that Reiser4 is supposed to have?

Also: I found that a newly created ext3 partition uses 128 MB whereas a
new reiser3 partition uses only 32 MB. I assume that the 128 MB is the
space taken for the pre-allocated inodes or such. And I now come to know
that others have this problem, much more seriously, on bigger filesystems -
[see comment 2 at
http://linux.wordpress.com/2006/09/27/suse-102-ditching-reiserfs-as-it-default-fs/].
If ext4 uses a B+ (or B*?) tree like ReiserFS then this space can be
reduced, right?

Thanks.

Shriramana Sharma.

P.S: Are there any recommended tutorials for learning filesystem basics?

P.P.S: I just put this post here because I want to convert from reiserfs
of uncertain future to ext4.

From lists at nerdbynature.de Sun Dec 31 06:57:28 2006
From: lists at nerdbynature.de (Christian Kujau)
Date: Sun, 31 Dec 2006 06:57:28 +0000 (GMT)
Subject: Ext4 improvements
In-Reply-To: <45973A1B.7010409@gmail.com>
References: <45973A1B.7010409@gmail.com>
Message-ID:

On Sun, 31 Dec 2006, Shriramana Sharma wrote:
> So will ext4 avoid both of these problems just like ReiserFS? Does it
> use a B+ tree? Or this "dancing B* tree" that Reiser4 is supposed to have?

I cannot comment on stability/performance of ext4, but here are the
specs: http://www.bullopensource.org/ext4/

> Also: I found that a newly created ext3 partition uses 128 MB whereas a
> new reiser3 partition uses only 32 MB.

are you talking about ext3 or ext4? I haven't tested ext4 yet but for
ext3 it looks like this:

$ df -h /mnt/test0 /mnt/test1
Filesystem            Size  Used Avail Use% Mounted on
/tmp/reiser3.img       33M   33M  944K  98% /mnt/test0
/tmp/ext3.img          15M  1.6M   13M  11% /mnt/test1

(reiser3 needs at least a 32MB image file/device)

> P.S: Are there any recommended tutorials for learning filesystem basics?

Hm, I'm no filesystem guru but I suggest reading the specs and the
source should help a lot...

my 2 cents,
Christian.
--
BOFH excuse #127: Sticky bits on disk.
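
On the guess that the space goes to pre-allocated inodes: ext2/ext3 inode tables are laid out at mkfs time, so their size is simply inode count times inode size, and tune2fs -l reports both. Below is a minimal sketch of that arithmetic, using the figures from the 6 TB tune2fs -l output posted earlier in this archive (106733568 inodes of 128 bytes each). The function name is hypothetical, and other metadata such as the journal and the bitmaps is not counted.

```shell
#!/bin/sh
# inode_table_mib INODE_COUNT INODE_SIZE
# Hypothetical helper: prints the space taken by the inode tables in
# whole MiB (rounded down), computed as inode count * inode size.
inode_table_mib() {
    echo $(( $1 * $2 / 1048576 ))
}

# "Inode count" and "Inode size" from the tune2fs -l output of the 6 TB filesystem
inode_table_mib 106733568 128   # prints 13029
```

That is roughly 12.7 GiB of inode tables on that filesystem even with -i 65536, so the bytes-per-inode ratio chosen at mkfs time matters a lot on large arrays.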

From jan.stobbe at netropol.de Fri Dec 29 21:05:32 2006
From: jan.stobbe at netropol.de (Jan Stobbe)
Date: Fri, 29 Dec 2006 21:05:32 +0000
Subject: Problem with ext3 filesystem
In-Reply-To: <4592FC5D.70101@netropol.de>
References: <4592FC5D.70101@netropol.de>
Message-ID: <4595831C.6080001@netropol.de>

I did some more tests.

First I split the raid in two partitions and ran mkfs.ext3 -c for
badblocks. Then I tried to copy data onto the first partition. After a
while I got errors (around 130 GB were on the partition):

attempt to access beyond end of device
sda1: rw=0, want=3151373440, limit=1464846304
attempt to access beyond end of device
sda1: rw=0, want=2853870672, limit=1464846304
attempt to access beyond end of device
sda1: rw=0, want=1751501064, limit=1464846304
attempt to access beyond end of device
sda1: rw=0, want=2783268072, limit=1464846304
attempt to access beyond end of device
sda1: rw=0, want=2835511344, limit=1464846304
attempt to access beyond end of device

so I googled and found this thread:

http://lkml.org/lkml/2006/10/5/353

I tried the script from there on my partition (with around 130 GB on it):

dd bs=1M count=200 if=/dev/zero of=test0
while :; do
    echo "cp 0-1"; cp test0 test1 || break
    echo "cp 1-2"; cp test1 test2 || break
    echo "cp 2-3"; cp test2 test3 || break
    echo "cp 3-4"; cp test3 test4 || break
    echo "od 0" ; od test0 || break
    echo "rm 1"; rm test1 || break
    echo "rm 2"; rm test2 || break
    echo "rm 3"; rm test3 || break
    echo "rm 4"; rm test4 || break
done

The script ran for no more than 10 minutes and I got:

EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks in system zones - Block = 327680, count = 1
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks in system zones - Block = 327680, count = 1
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks in system zones - Block = 327680, count = 1

Then I created a new fs on the partition without any data on it and got
the same errors while running this script.

On an internal IDE hard disk (software RAID-1, but partition size only
4 GB) there was no problem with the test script.

The test script with a 10 MB file has now been running for several hours
without any problems on the SATA RAID partition.

New partition with a size of 340 GB, test script with 200 MB files. Errors:

ock = 65536, count = 1
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device sda2): ext3_free_blocks: Freeing blocks in system zones - Block = 65536, count = 1
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device sda2): ext3_free_blocks: Freeing blocks in system zones - Block = 65536, count = 1
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device sda2): ext3_free_blocks: Freeing blocks in system zones - Block = 65536, count = 1
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda2) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda2) in ext3_truncate: IO failure
EXT3-fs error (device sda2) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda2) in ext3_orphan_del: Readonly filesystem
EXT3-fs error (device sda2) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda2) in ext3_delete_inode: IO failure

And with a 180 GB partition:

ree_blocks: Freeing blocks in system zones - Block = 262144, count = 1
EXT3-fs error (device sda3) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device sda3): ext3_free_blocks: Freeing blocks in system zones - Block = 262144, count = 1
EXT3-fs error (device sda3) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device sda3): ext3_free_blocks: Freeing blocks in system zones - Block = 262144, count = 1
EXT3-fs error (device sda3) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda3) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs error (device sda3) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda3) in ext3_truncate: IO failure
EXT3-fs error (device sda3) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda3) in ext3_orphan_del: Readonly filesystem
EXT3-fs error (device sda3) in ext3_reserve_inode_write: Readonly filesystem
EXT3-fs error (device sda3) in ext3_delete_inode: IO failure

Strange ? I still don't know how to solve the problem.

Regards,

Jan

> Hey,
>
> I've a problem with an ext3 filesystem and don't know how to fix it or
> find the failure :(
>
> The Hardware:
>
> Tyan mainboard, AMD Athlon CPU, ARECA ARC-1120 RaidController Raid5 with
> 400GB Seagate HD's, 756 MB Ram, other harddisks for system, network and
> avm isdn controller.
>
> Couse of the filesystem problems I run memtest and found one bad memory
> module which I replaced yet.
>
> The System:
>
> Kernel 2.6.19.1
> Debian Gnu/Linux 3.0 with e2fsck 1.37 (21-Mar-2005)
>
>
> I've setup one ext3 partition with around 1.4 TB on the raid5 volume.
> The first four month we run the raid without any problems. About two
> month ago I noticed that the filesystem was remounted ro. A filesystem
> check found a lot of errors. After a filesystem check and a new mount of
> the partition and copy data on the partition you get the errors again.
> Also with Kernel 2.6.17.3 I got this problems. A raid volume check with
> the areca command line tools doesn't find any errors.
>
> Errors from dmesg / kernel.log:
>
>
> EXT3-fs: mounted filesystem with ordered data mode.
> init_special_inode: bogus i_mode (113301)
> init_special_inode: bogus i_mode (170101)
> init_special_inode: bogus i_mode (115140)
> init_special_inode: bogus i_mode (117302)
> init_special_inode: bogus i_mode (111700)
> EXT3-fs error (device sda1): ext3_readdir: bad entry in directory
> #143278260: rec_len % 4 != 0 - offset=0, inode=1857588108, rec_len=8466,
> name_len=34
>
> Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (111501)
> Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (113301)
> Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (170101)
> Dec 22 14:25:03 datahaven kernel: init_special_inode: bogus i_mode (115140)
> Dec 22 14:25:19 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #150569204: rec_len %% 4 != 0 -
> offset=0, inode=3038782558,
> rec_len=28425, name_len=75
>
>
> Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (111501)
> Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (113301)
> Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (170101)
> Dec 22 06:31:43 datahaven kernel: init_special_inode: bogus i_mode (115140)
> Dec 22 06:31:54 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #20351025: directory entry across
> blocks - offset=0, inode=20353857, rec_len=13600, name_len=1
> Dec 22 06:31:55 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #20417957: rec_len %% 4 != 0 -
> offset=96, inode=20437734, rec_len=27291, name_len=6
> Dec 22 06:31:59 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #21007912: directory entry across
> blocks - offset=296, inode=21005643, rec_len=32184, name_len=25
> Dec 22 06:32:24 datahaven kernel: init_special_inode: bogus i_mode (114764)
> Dec 22 06:32:29 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #21839877: rec_len %% 4 != 0 -
> offset=24, inode=21839878, rec_len=22019, name_len=7
> Dec 22 06:32:30 datahaven kernel: init_special_inode: bogus i_mode (55314)
> Dec 22 06:32:34 datahaven kernel: init_special_inode: bogus i_mode (117302)
> Dec 22 06:32:36 datahaven kernel: EXT3-fs error (device sda1):
> ext3_readdir: bad entry in directory #22448122: rec_len %% 4 != 0 -
> offset=24, inode=22417991, rec_len=28145, name_len=8
>
> Any hints how to solve this problem or to isolate the failure ?
>
> Best regards and thanks in advance for your help,
>
> Jan
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users

--
Jan Stobbe

Netropol Digitale Systeme GmbH    jan.stobbe at netropol.de
Stresemannstrasse 161             Tel: +49 40 284167-20
D-22769 Hamburg/Germany           Fax: +49 40 284167-40
http://www.netropol.de/
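
To put the want= and limit= numbers from the "attempt to access beyond end of device" lines into perspective, here is a small sketch that converts them to GiB. It assumes both values are counts of 512-byte sectors, which as far as I know is how the 2.6 block layer reports them. The function name is invented for this example.

```shell
#!/bin/sh
# sectors_to_gib SECTORS
# Hypothetical helper: converts a count of 512-byte sectors (an assumption
# about the units of want= and limit=) to whole GiB, rounded down.
# 1 GiB = 2^30 bytes = 2097152 sectors of 512 bytes.
sectors_to_gib() {
    echo $(( $1 / 2097152 ))
}

sectors_to_gib 1464846304   # limit= of sda1, prints 698
sectors_to_gib 2853870672   # one of the want= values, prints 1360
```

Under that assumption sda1 ends just below 700 GiB, while the failing request pointed at about 1360 GiB, far past the end of the partition. That would fit block numbers that were already garbage when the filesystem code read them, but it does not say where they got corrupted.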