From cajun at cajuninc.com Wed Nov 1 01:15:30 2006 From: cajun at cajuninc.com (M. Lewis) Date: Tue, 31 Oct 2006 20:15:30 -0500 Subject: e2fsck: Bad magic number in super-block Message-ID: <4547F532.3030504@cajuninc.com> I posted this to the Fedora-list, but thought I might get some additional information here as well. I have a HD that refuses to mount with a 'bad magic number in super-block'. I'm running FedoraCore 6 x86_64. [root at moe ~]# fdisk -l /dev/hdc Disk /dev/hdc: 250.0 GB, 250059350016 bytes 255 heads, 63 sectors/track, 30401 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/hdc1 * 1 13 104391 83 Linux /dev/hdc2 14 9729 78043770 8e Linux LVM [root at moe ~]# mount -t ext3 /dev/hdc2 /Big-Drive/ mount: wrong fs type, bad option, bad superblock on /dev/hdc2, missing codepage or other error In some cases useful info is found in syslog - try dmesg | tail or so [root at moe ~]# e2fsck -b 11239425 /dev/hdc2 e2fsck 1.39 (29-May-2006) e2fsck: Invalid argument while trying to open /dev/hdc2 The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 [root at moe ~]# !624 mke2fs -n /dev/hdc2 mke2fs 1.39 (29-May-2006) Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) 9764864 inodes, 19510942 blocks 975547 blocks (5.00%) reserved for the super user First data block=0 Maximum filesystem blocks=4294967296 596 block groups 32768 blocks per group, 32768 fragments per group 16384 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424 I've tried 'e2fsck -b (superblock) /dev/hdc2 on all the superblocks listed above to no avail. 
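The backup list printed by mke2fs -n above is deterministic: with the sparse_super feature, superblock copies live only in block groups 1 and in groups whose number is a power of 3, 5 or 7. A small sketch of that rule (a reconstruction from the on-disk layout, not e2fsprogs code) reproduces the exact list above for a 19510942-block filesystem with 32768 blocks per group:

```python
def backup_superblocks(blocks_per_group, total_blocks, first_data_block=0):
    """Block numbers holding superblock backups under sparse_super.

    Backups sit in group 1 and in every group numbered a power of 3, 5 or 7.
    (Group 0 holds the primary superblock, so it is not a "backup".)
    """
    groups = total_blocks // blocks_per_group + (1 if total_blocks % blocks_per_group else 0)
    wanted = {1}
    for base in (3, 5, 7):
        p = base
        while p < groups:
            wanted.add(p)
            p *= base
    return sorted(first_data_block + g * blocks_per_group for g in wanted if g < groups)

# Matches the "Superblock backups stored on blocks:" list from mke2fs -n above:
print(backup_superblocks(32768, 19510942))
```

If e2fsck -b fails with "invalid argument" at every one of these locations, the numbers are not the problem; the device may simply not contain a bare ext3 filesystem at those offsets at all (note that the fdisk output above types /dev/hdc2 as Linux LVM, not Linux).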
I've read about 'mke2fs -S' as being a possible solution, however I see that it is recommended as a last resort. Therefore I have held off on trying that method. I'm afraid I'm toasted, however I'm still hopeful that I might recover some (or all) of my data. Have I overlooked something?

Thanks,
Mike

--
IBM: Insanely Better Marketing
 18:20:01 up 1 day, 4:08, 0 users, load average: 0.12, 0.27, 0.25
Linux Registered User #241685 http://counter.li.org

From mnalis-ml at voyager.hr Wed Nov 1 11:58:37 2006
From: mnalis-ml at voyager.hr (Matija Nalis)
Date: Wed, 1 Nov 2006 12:58:37 +0100
Subject: e2fsck: Bad magic number in super-block
In-Reply-To: <4547F532.3030504@cajuninc.com>
References: <4547F532.3030504@cajuninc.com>
Message-ID: <20061101115837.GA3046@eagle102.home.lan>

On Tue, Oct 31, 2006 at 08:15:30PM -0500, M. Lewis wrote:
> I posted this to the Fedora-list, but thought I might get some
> additional information here as well.
>
> I have a HD that refuses to mount with a 'bad magic number in
> super-block'. I'm running FedoraCore 6 x86_64.
>
> [root at moe ~]# fdisk -l /dev/hdc
>
> Disk /dev/hdc: 250.0 GB, 250059350016 bytes
> 255 heads, 63 sectors/track, 30401 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
>    Device Boot    Start       End    Blocks   Id  System
> /dev/hdc1   *         1        13    104391   83  Linux
> /dev/hdc2            14      9729  78043770   8e  Linux LVM

Are you ABSOLUTELY SURE that /dev/hdc2 really contains DIRECTLY the ext3
filesystem?

By the fdisk output, it looks like it is not an ext3 partition but a
physical volume controlled by LVM, so you should use the LVM tools to find
the real data (vgscan, vgdisplay, pvscan, lvscan, ...).

The ext3 partition you are looking for is probably on a logical volume on
that LVM...

--
Opinions above are GNU-copylefted.

From cajun at cajuninc.com Wed Nov 1 19:27:16 2006
From: cajun at cajuninc.com (M.
Lewis) Date: Wed, 01 Nov 2006 14:27:16 -0500 Subject: e2fsck: Bad magic number in super-block In-Reply-To: <20061101115837.GA3046@eagle102.home.lan> References: <4547F532.3030504@cajuninc.com> <20061101115837.GA3046@eagle102.home.lan> Message-ID: <4548F514.60702@cajuninc.com> Matija Nalis wrote: > On Tue, Oct 31, 2006 at 08:15:30PM -0500, M. Lewis wrote: >> I posted this to the Fedora-list, but thought I might get some >> additional information here as well. >> >> I have a HD that refuses to mount with a 'bad magic number in >> super-block'. I'm running FedoraCore 6 x86_64. >> >> [root at moe ~]# fdisk -l /dev/hdc >> >> Disk /dev/hdc: 250.0 GB, 250059350016 bytes >> 255 heads, 63 sectors/track, 30401 cylinders >> Units = cylinders of 16065 * 512 = 8225280 bytes >> >> Device Boot Start End Blocks Id System >> /dev/hdc1 * 1 13 104391 83 Linux >> /dev/hdc2 14 9729 78043770 8e Linux LVM > > are you ABSOLUTELY SURE that /dev/hdc2 really containt DIRECTLY the ext3 > filesystem ? > > By FDISK output, it looks like it is not ext3 partition, but a physical > volume controlled by LVM, so you should use LVM tools to find real data > (vgscan, vgdisplay, pvscan, lvscan, ...) > > the ext3 partition you are looking for is probably on logical volume on that > LVM... > Thanks Matija. No, at this point the only thing I am sure of is I can't mount the drive with my data. I'm not familiar with any of the LVM tools. Here's the output of the tools you suggested: [root at moe ~]# vgscan Reading all physical volumes. This may take a while... 
Found volume group "VolGroup00" using metadata type lvm2 [root at moe ~]# vgdisplay --- Volume group --- VG Name VolGroup00 System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 3 VG Access read/write VG Status resizable MAX LV 0 Cur LV 2 Open LV 2 Max PV 0 Cur PV 1 Act PV 1 VG Size 74.41 GB PE Size 32.00 MB Total PE 2381 Alloc PE / Size 2380 / 74.38 GB Free PE / Size 1 / 32.00 MB VG UUID vXWCaM-XkRG-l28x-tyH1-Rrf4-KjXF-AfASLY [root at moe ~]# pvscan PV /dev/hda2 VG VolGroup00 lvm2 [74.41 GB / 32.00 MB free] Total: 1 [74.41 GB] / in use: 1 [74.41 GB] / in no VG: 0 [0 ] [root at moe ~]# lvscan ACTIVE '/dev/VolGroup00/LogVol00' [72.44 GB] inherit ACTIVE '/dev/VolGroup00/LogVol01' [1.94 GB] inherit -- Dreams are free, but you get soaked on the connect time. 14:25:01 up 2 days, 13 min, 0 users, load average: 0.09, 0.28, 0.25 Linux Registered User #241685 http://counter.li.org From bryan at kadzban.is-a-geek.net Wed Nov 1 22:36:34 2006 From: bryan at kadzban.is-a-geek.net (Bryan Kadzban) Date: Wed, 01 Nov 2006 17:36:34 -0500 Subject: e2fsck: Bad magic number in super-block In-Reply-To: <4548F514.60702@cajuninc.com> References: <4547F532.3030504@cajuninc.com> <20061101115837.GA3046@eagle102.home.lan> <4548F514.60702@cajuninc.com> Message-ID: <45492172.3050000@kadzban.is-a-geek.net> I likewise know very little about the LVM tools, but this: M. Lewis wrote: > [root at moe ~]# lvscan > ACTIVE '/dev/VolGroup00/LogVol00' [72.44 GB] inherit > ACTIVE '/dev/VolGroup00/LogVol01' [1.94 GB] inherit implies to me that you might try fdisk'ing /dev/VolGroup00/LogVol00 and /dev/VolGroup00/LogVol01 instead. Perhaps one of those has a partition table that you could fsck. (Or perhaps both of those *are* partitions that you can fsck.) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 258 bytes
Desc: OpenPGP digital signature
URL: 

From haven at thehavennet.org.uk Thu Nov 2 16:13:34 2006
From: haven at thehavennet.org.uk (Simon Alman)
Date: Thu, 2 Nov 2006 16:13:34 -0000 (UTC)
Subject: RHEL conundrum with df and du
Message-ID: <40732.84.12.36.186.1162484014.squirrel@saratoga.thehavennet.org.uk>

Hi All

I am having an issue with disk space, and since it is happening on an
ext3-formatted partition, I felt that this would be the most appropriate
list to post to. The problem is this: I have access to two RHEL systems
(one running RHEL3 and one running RHEL4). Both are in the same company and
both exhibit the same problem, which I have not seen anywhere else before.

Both have full / partitions. df shows this to be the case, so it must be
true ... right? Well, du disagrees: on one system with a full 20GB /
partition that df shows to be full, du can only find 3.6GB of files.

So off to the Red Hat FAQ I went and found this:
http://kbase.redhat.com/faq/FAQ_35_5209.shtm

Great, that explains the problem precisely ... except it didn't help. No
processes were holding large deleted files in a locked state. So I looked
at inodes, thinking that they may have all been used up ... they haven't;
df shows only 5% inode usage. I forced an fsck run on the partition on
reboot, and neither this nor the reboot helped; fsck shows clean, and after
the reboot things were still broken.

So I'm sat here trying to explain to the client why their 5k worth of Dell
server is currently not much use to them (I can't install anything on it
due to the space issue ...).

Has anyone come across anything similar before? I've worked with Linux for
six years and this is a new one on me ... and twice in the same company. I
suspect major kernel b0rkage, since both systems use Dell's RHEL build for
the specific model of server, but proving it is beyond me right now.

Any help/advice would be very gratefully appreciated.
Kind regards

Simon Alman

From neilb at cse.unsw.edu.au Fri Nov 3 01:09:09 2006
From: neilb at cse.unsw.edu.au (Neil Brown)
Date: Fri, 3 Nov 2006 12:09:09 +1100
Subject: question about exact behaviour with data=ordered.
Message-ID: <17738.38581.928915.23988@cse.unsw.edu.au>

Suppose I have a large machine with 24Gig of memory, and I write a 2 gig
file. This is below 10% and so background_dirty_threshold won't cause any
write out. Suppose that a regular journal commit is then triggered.

Am I correct in thinking this will flush out the full 2Gig, causing the
commit to take about 30 seconds if the drive sustains 60Meg/second?

If so, what other operations will be blocked while the commit happens? I
assume sync updates (rm, chmod, mkdir etc) will block? Is it safe to assume
that normally async writes won't block? What about if they extend the file
and so change the file size? What about atime updates? Could they ever
block for the full 30 seconds?

Supposing lots of stuff would block for 30 seconds, is there anything that
could be done to improve this? Would it be possible (easy?) to modify the
commit process to flush out 'ordered' data without locking the journal?

As you might guess, we have a situation where writing large files on a
large-memory machine is causing occasional bad fs delays, and I'm trying
to understand what is going on.

Thanks for any input,
NeilBrown

From keld at dkuug.dk Sun Nov 5 00:25:22 2006
From: keld at dkuug.dk (Keld Jørn Simonsen)
Date: Sun, 5 Nov 2006 01:25:22 +0100
Subject: compressed read-only ext3 file system
Message-ID: <20061105002522.GA6981@rap.rap.dk>

Hi

I am looking for a compressed ext3 file system, for a read-only purpose.
The idea is that I would like to make a live-cd that is fast to install.
The install should be almost just a raw copy of what is on the cd,
uncompressed. In that way I should be able to make an install of a full
Linux system of, say, 3 GB in under 2 minutes.
The fs type I would like to unpack is ext3 - but other fs types should be doable as well. Then I would like to run the cd from the cdrom drive, so some kind of live-cd running code should also be available. Has this been done before? Is the idea feasible? best regards keld From adilger at clusterfs.com Tue Nov 7 00:06:02 2006 From: adilger at clusterfs.com (Andreas Dilger) Date: Mon, 6 Nov 2006 17:06:02 -0700 Subject: e2defrag - Unable to allocate buffer for inode priorities In-Reply-To: <87iri0ma8s.fsf@informatik.uni-tuebingen.de> References: <20061031171050.GG5655@schatzie.adilger.int> <20061031192947.GA12277@thunk.org> <87iri0ma8s.fsf@informatik.uni-tuebingen.de> Message-ID: <20061107000602.GE6012@schatzie.adilger.int> On Oct 31, 2006 22:44 +0100, Goswin von Brederlow wrote: > It should be doing that (checking for ext3 I can confirm) as of > > It doesn't handle ext3 right and does know so: > > # mke2fs -j /dev/ram0 > # e2defrag -r /dev/ram0 > > e2defrag (/dev/ram0): ext3 filesystems not (yet) supported > > It hapily defrags a filesystem with resize_inode. Is it destroying > resize capability or directly destroying data? It is destroying the resize capability. The primary issue here is that tools which manipulate the filesystem directly (e.g. e2fsprogs) have to understand ALL of the *COMPAT flags, and not just the INCOMPAT flags. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. 
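The point about *COMPAT flags can be made concrete. The ext2/ext3 superblock (1024 bytes into the volume) carries three little-endian 32-bit feature bitmaps, and a tool that rewrites the filesystem directly should refuse to touch it on ANY bit it does not recognise, in all three sets, not just the incompat one. A hedged sketch, assuming the classic superblock field offsets; the flag tables below list only the common flags, not a complete set:

```python
import struct

EXT2_MAGIC = 0xEF53  # s_magic, at offset 56 within the superblock

# Feature bits this (hypothetical) tool claims to understand.
KNOWN_COMPAT   = {0x0001: "dir_prealloc", 0x0004: "has_journal",
                  0x0008: "ext_attr", 0x0010: "resize_inode",
                  0x0020: "dir_index"}
KNOWN_INCOMPAT = {0x0001: "compression", 0x0002: "filetype",
                  0x0004: "recover", 0x0008: "journal_dev"}
KNOWN_RO       = {0x0001: "sparse_super", 0x0002: "large_file"}

def check_features(sb):
    """Return the unknown bits per feature set of a raw 1024-byte superblock."""
    magic, = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("bad magic number in super-block")
    # s_feature_compat, s_feature_incompat, s_feature_ro_compat at 92/96/100
    compat, incompat, ro = struct.unpack_from("<III", sb, 92)
    unknown = {}
    for name, bits, known in (("compat", compat, KNOWN_COMPAT),
                              ("incompat", incompat, KNOWN_INCOMPAT),
                              ("ro_compat", ro, KNOWN_RO)):
        extra = bits & ~sum(known)   # bits set that we have no name for
        if extra:
            unknown[name] = extra
    return unknown  # a rewriting tool should bail out unless this is empty
```

For a read-only mount, unknown COMPAT and even RO_COMPAT bits are tolerable; for a tool like e2defrag that rewrites block allocations, any unknown bit in any set means "stop", which is exactly the resize_inode failure described above.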
From brederlo at informatik.uni-tuebingen.de Tue Nov 7 03:36:17 2006 From: brederlo at informatik.uni-tuebingen.de (Goswin von Brederlow) Date: Tue, 07 Nov 2006 04:36:17 +0100 Subject: e2defrag - Unable to allocate buffer for inode priorities In-Reply-To: <20061107000602.GE6012@schatzie.adilger.int> (Andreas Dilger's message of "Mon, 6 Nov 2006 17:06:02 -0700") References: <20061031171050.GG5655@schatzie.adilger.int> <20061031192947.GA12277@thunk.org> <87iri0ma8s.fsf@informatik.uni-tuebingen.de> <20061107000602.GE6012@schatzie.adilger.int> Message-ID: <87bqnknd1q.fsf@informatik.uni-tuebingen.de> Andreas Dilger writes: > On Oct 31, 2006 22:44 +0100, Goswin von Brederlow wrote: >> It should be doing that (checking for ext3 I can confirm) as of >> >> It doesn't handle ext3 right and does know so: >> >> # mke2fs -j /dev/ram0 >> # e2defrag -r /dev/ram0 >> >> e2defrag (/dev/ram0): ext3 filesystems not (yet) supported >> >> It hapily defrags a filesystem with resize_inode. Is it destroying >> resize capability or directly destroying data? > > It is destroying the resize capability. The primary issue here is > that tools which manipulate the filesystem directly (e.g. e2fsprogs) > have to understand ALL of the *COMPAT flags, and not just the INCOMPAT > flags. > > Cheers, Andreas Defrag should leave special inodes well enough alone (and I know it does not, hence the ext3 incompatibiliy) and then it should preserve all compat features. Time for some more fixing. 
MfG Goswin From adilger at clusterfs.com Tue Nov 7 17:30:51 2006 From: adilger at clusterfs.com (Andreas Dilger) Date: Tue, 7 Nov 2006 10:30:51 -0700 Subject: e2defrag - Unable to allocate buffer for inode priorities In-Reply-To: <87bqnknd1q.fsf@informatik.uni-tuebingen.de> References: <20061031171050.GG5655@schatzie.adilger.int> <20061031192947.GA12277@thunk.org> <87iri0ma8s.fsf@informatik.uni-tuebingen.de> <20061107000602.GE6012@schatzie.adilger.int> <87bqnknd1q.fsf@informatik.uni-tuebingen.de> Message-ID: <20061107173051.GH6012@schatzie.adilger.int> On Nov 07, 2006 04:36 +0100, Goswin von Brederlow wrote: > Andreas Dilger writes: > > The primary issue here is > > that tools which manipulate the filesystem directly (e.g. e2fsprogs) > > have to understand ALL of the *COMPAT flags, and not just the INCOMPAT > > flags. > > Defrag should leave special inodes well enough alone (and I know it > does not, hence the ext3 incompatibiliy) and then it should preserve > all compat features. Compat features are not always related to special inodes. For example, the COMPAT_DIR_INDEX feature is for the directory indexing and has nothing to do with special inodes. In this case, defrag shouldn't break the indexing, but it depends on each specific feature. Hence my assertion that defrag needs to understand EVERY feature flag in the filesystem before it touches the filesystem. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From raghuni at cossindia.org Thu Nov 9 06:44:34 2006 From: raghuni at cossindia.org (Raghu Ni) Date: Thu, 9 Nov 2006 12:14:34 +0530 Subject: How to create a huge file system - 3-4TB? Message-ID: <6b16facb0611082244w2d355f3ka9a55e9ea31471c0@mail.gmail.com> We have a server with about 6x750Gb SATA drives setup on a hardware RAID controller. We created hardware RAID 5 on these 6x750GB HDDs. The effective size after RAID 5 implementation is 3.4TB. This server we want to use it as a data backup server. 
Here is the problem we are stuck with: when we use fdisk -l, we can see the
drive specs and its size as 3.4TB. But when we want to create two different
partitions of 1.7TB each, then we get the error "out of range" while
specifying cylinders.

And if we go for one single partition of 3.4TB, mke2fs returns an error
when we format the partition for the ext3 file system, and after some
specific duration it exits with an error "Inodes not found... " and
similar errors.

Any help / suggestions / ideas to get around this problem are highly
appreciated.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From adilger at clusterfs.com Thu Nov 9 08:24:19 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 9 Nov 2006 01:24:19 -0700
Subject: How to create a huge file system - 3-4TB?
In-Reply-To: <6b16facb0611082244w2d355f3ka9a55e9ea31471c0@mail.gmail.com>
References: <6b16facb0611082244w2d355f3ka9a55e9ea31471c0@mail.gmail.com>
Message-ID: <20061109082419.GC6012@schatzie.adilger.int>

On Nov 09, 2006 12:14 +0530, Raghu Ni wrote:
> We have a server with about 6x750Gb SATA drives setup on a hardware RAID
> controller. We created hardware RAID 5 on these 6x750GB HDDs. The
> effective size after RAID 5 implementation is 3.4TB. This server we want
> to use as a data backup server.
>
> Here is the problem we are stuck with: when we use fdisk -l, we can see
> the drive specs and its size as 3.4TB. But when we want to create two
> different partitions of 1.7TB each, then we get the error "out of range"
> while specifying cylinders.
>
> And if we go for one single partition of 3.4TB, mke2fs returns an error
> when we format the partition for the ext3 file system, and after some
> specific duration it exits with an error "Inodes not found... " and
> similar errors.

Don't use a partition at all. Just make the filesystem directly on the
whole device (e.g. mke2fs /dev/sda).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
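The "out of range" error while specifying cylinders has a concrete cause worth spelling out: a DOS/MBR partition entry stores the start sector and the sector count as 32-bit values, so with 512-byte sectors nothing beyond 2 TiB is addressable, and a 3.4 TB disk cannot be fully partitioned that way (the second 1.7 TB partition would have to start past the limit). A quick arithmetic sketch:

```python
SECTOR = 512                   # bytes; the classic MBR assumption
MBR_LIMIT = 2**32 * SECTOR     # largest byte range a 32-bit LBA can address

disk_bytes = 3_400_000_000_000  # the ~3.4 TB array described above
half_bytes = 1_700_000_000_000  # each intended partition

print(MBR_LIMIT)                 # 2199023255552 bytes, i.e. 2 TiB
print(disk_bytes > MBR_LIMIT)    # True: the tail of the disk is unreachable
print(half_bytes < MBR_LIMIT)    # True: each half fits, but the second one
                                 # would have to *start* beyond the limit
```

A GPT disklabel stores LBAs as 64-bit values instead, which is why it does not hit this wall; skipping the partition table entirely, as suggested above, sidesteps it as well.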
From raghuni at cossindia.org Thu Nov 9 09:12:00 2006 From: raghuni at cossindia.org (Raghu Ni) Date: Thu, 9 Nov 2006 14:42:00 +0530 Subject: How to create a huge file system - 3-4TB? In-Reply-To: <20061109082419.GC6012@schatzie.adilger.int> References: <6b16facb0611082244w2d355f3ka9a55e9ea31471c0@mail.gmail.com> <20061109082419.GC6012@schatzie.adilger.int> Message-ID: <6b16facb0611090112n3a2d59edl9177833953def72b@mail.gmail.com> Can also use this technique for md device ? On 11/9/06, Andreas Dilger wrote: > > On Nov 09, 2006 12:14 +0530, Raghu Ni wrote: > > We have a server with about 6x750Gb SATA drives setup on a hardware RAID > > controller. We created hardware RAID 5 on these 6x750GB HDDs. The > effective > > size after RAID 5 implementation is 3.4TB. This server we want to use it > as > > a data backup server. > > > > Here is the problem we are stuck with, when we use fdisk -l, we can see > the > > drive specs and its size as 3.4TB. But when we want to create two > different > > partitions of 1.7TB each, then we get the error "out of range" while > > specifying cylinders. > > > > And if we go for one single partition of 3.4TB, mke2fs returns error > when we > > format the partition for ext3 file system and after some specific > duration > > it exits with a error "Inodes not found... " similar errors. > > Don't use a partition at all. Just make the filesystem directly on the > whole > device (e.g. mke2fs /dev/sda). > > Cheers, Andreas > -- > Andreas Dilger > Principal Software Engineer > Cluster File Systems, Inc. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlb17 at duke.edu Thu Nov 9 13:03:03 2006 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Thu, 9 Nov 2006 08:03:03 -0500 (EST) Subject: How to create a huge file system - 3-4TB? 
In-Reply-To: <6b16facb0611082244w2d355f3ka9a55e9ea31471c0@mail.gmail.com>
References: <6b16facb0611082244w2d355f3ka9a55e9ea31471c0@mail.gmail.com>
Message-ID: 

On Thu, 9 Nov 2006 at 12:14pm, Raghu Ni wrote

> Here is the problem we are stuck with: when we use fdisk -l, we can see
> the drive specs and its size as 3.4TB. But when we want to create two
> different partitions of 1.7TB each, then we get the error "out of range"
> while specifying cylinders.
>
> And if we go for one single partition of 3.4TB, mke2fs returns an error
> when we format the partition for the ext3 file system, and after some
> specific duration it exits with an error "Inodes not found... " and
> similar errors.
>
> Any help / suggestions / ideas to get around this problem are highly
> appreciated.

fdisk can't handle devices larger than 2TiB. If you really want to use
partitions, use parted and create a gpt disklabel (the standard msdos
won't work either). Note that you won't be able to boot from this disk.

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

From raghuni at cossindia.org Fri Nov 10 06:07:28 2006
From: raghuni at cossindia.org (Raghu Ni)
Date: Fri, 10 Nov 2006 11:37:28 +0530
Subject: How to create a huge file system - 3-4TB?
In-Reply-To: 
References: <6b16facb0611082244w2d355f3ka9a55e9ea31471c0@mail.gmail.com>
Message-ID: <6b16facb0611092207p2f090f1ch4d34465ac9172e14@mail.gmail.com>

Thanks for all your inputs. We tried parted and succeeded in creating two
1.7 TB partitions.

RaghuNi

On 11/9/06, Joshua Baker-LePain wrote:
>
> On Thu, 9 Nov 2006 at 12:14pm, Raghu Ni wrote
>
> > Here is the problem we are stuck with: when we use fdisk -l, we can see
> > the drive specs and its size as 3.4TB. But when we want to create two
> > different partitions of 1.7TB each, then we get the error "out of range"
> > while specifying cylinders.
> > And if we go for one single partition of 3.4TB, mke2fs returns an error
> > when we format the partition for the ext3 file system, and after some
> > specific duration it exits with an error "Inodes not found... " and
> > similar errors.
> >
> > Any help / suggestions / ideas to get around this problem are highly
> > appreciated.
>
> fdisk can't handle devices larger than 2TiB. If you really want to use
> partitions, use parted and create a gpt disklabel (the standard msdos
> won't work either). Note that you won't be able to boot from this disk.
>
> --
> Joshua Baker-LePain
> Department of Biomedical Engineering
> Duke University

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From scinteeb at yahoo.com Thu Nov 16 01:52:50 2006
From: scinteeb at yahoo.com (Bogdan Scintee)
Date: Wed, 15 Nov 2006 17:52:50 -0800 (PST)
Subject: ext3 corrupted
Message-ID: <20061116015250.76652.qmail@web56503.mail.re3.yahoo.com>

Hi there,

For years I've been using the ext3 file system without ever thinking it
could get broken this badly. That was until last week, when a box I have
running Linux from a SanDisk CF went down. Since then I have been
struggling with this CF, trying to understand what is happening.

The CF is a SanDisk Ultra II 1GB. On this I have 4 partitions, all of them
with ext3:
boot
/
swap
data

On the "data" partition I am doing a kind of logging. On power failure the
box went down and never came back. The problem is of course the data
partition.

Running e2fsck /dev/sdc8 returns:

e2fsck 1.38 (30-Jun-2005)
/dev/sdc8: Attempt to read block from filesystem resulted in short read while reading block 275
/dev/sdc8: Attempt to read block from filesystem resulted in short read reading journal superblock
e2fsck: Attempt to read block from filesystem resulted in short read while checking ext3 journal for /dev/sdc8

which looks like the result of the write access on power failure.
Then I did mke2fs -n /dev/sdc8:

mke2fs 1.38 (30-Jun-2005)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
108544 inodes, 433940 blocks
21697 blocks (5.00%) reserved for the super user
First data block=1
53 block groups
8192 blocks per group, 8192 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409

I then performed fsck.ext3 using the backup superblocks
(e2fsck -c -b /dev/sdc8), but I constantly got the same result:

e2fsck 1.38 (30-Jun-2005)
/dev/sdc8: Attempt to read block from filesystem resulted in short read while reading block 275
/dev/sdc8: Attempt to read block from filesystem resulted in short read reading journal superblock
e2fsck: Attempt to read block from filesystem resulted in short read while checking ext3 journal for /dev/sdc8

The disappointing thing is that a Windows application (Nucleus Kernel
Linux) shows the file system and recovers the partition completely in a
very short time.

Is there any suggestion about what I should do?

Best regards,
Bogdan.

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lists at nerdbynature.de Thu Nov 16 07:45:27 2006
From: lists at nerdbynature.de (Christian Kujau)
Date: Thu, 16 Nov 2006 07:45:27 +0000 (GMT)
Subject: ext3 corrupted
In-Reply-To: <20061116015250.76652.qmail@web56503.mail.re3.yahoo.com>
References: <20061116015250.76652.qmail@web56503.mail.re3.yahoo.com>
Message-ID: 

On Wed, 15 Nov 2006, Bogdan Scintee wrote:
> e2fsck 1.38 (30-Jun-2005)

If you can, please upgrade to a current version of e2fsprogs:
http://sourceforge.net/project/showfiles.php?group_id=2406

> /dev/sdc8: Attempt to read block from filesystem resulted in short read while reading block 275
> /dev/sdc8: Attempt to read block from filesystem resulted in short read reading journal superblock
> e2fsck: Attempt to read block from file-system resulted in short read while checking ext3 journal for /dev/sdc8

Please check your system logfiles/dmesg for disk/IO errors. If there are
any, your best bet is to save the raw data to a safe place (with "dd", or
better "dd_rescue") and try e2fsck again on the saved image, e.g.:

# dd_rescue if=/dev/your_CF_disk of=~/not-on-your-CF-disk/CF.img
# e2fsck ~/not-on-your-CF-disk/CF.img

...and see what it gives.

Christian.
--
BOFH excuse #413: Cow-tippers tipped a cow onto the server.
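The reason to prefer dd_rescue over plain dd here is its error handling: plain dd aborts on the first unreadable sector, while dd_rescue skips the bad region and keeps copying, so one bad spot doesn't cost you the rest of the image. A toy sketch of that idea (a hypothetical helper for illustration, not dd_rescue's actual algorithm, which also shrinks the block size near errors and can copy backwards):

```python
def rescue_copy(src, dst, blocksize=4096):
    """Copy src to dst; zero-fill blocks that fail to read instead of aborting.

    src/dst are seekable binary file objects. On a read error we write
    blocksize zeros and move on, which may overshoot slightly at end-of-device.
    """
    offset, bad = 0, 0
    while True:
        src.seek(offset)
        try:
            block = src.read(blocksize)
        except OSError:              # e.g. EIO from a failing sector
            block = b"\0" * blocksize
            bad += 1
        if not block:                # clean EOF
            break
        dst.seek(offset)
        dst.write(block)
        offset += len(block)
    return offset, bad               # bytes copied, unreadable blocks skipped
```

The resulting image then gives e2fsck something it can actually read end to end, with the unreadable regions showing up as zeroed blocks rather than I/O errors.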
From lists at nerdbynature.de Thu Nov 16 19:59:46 2006 From: lists at nerdbynature.de (Christian Kujau) Date: Thu, 16 Nov 2006 19:59:46 +0000 (GMT) Subject: ext3 corrupted In-Reply-To: <20061116191824.4948.qmail@web56503.mail.re3.yahoo.com> References: <20061116191824.4948.qmail@web56503.mail.re3.yahoo.com> Message-ID: [ please reply on-list, so that others can help too ] On Thu, 16 Nov 2006, Bogdan Scintee wrote: > Buffer I/O error on device sdc8, logical block 512 > sd 4:0:0:0: SCSI error: return code = 0x8000002 > sdc: Current: sense key: Medium Error > Additional sense: Unrecovered read error > end_request: I/O error, dev sdc, sector 1134514 So, it really is the device generating the errors, the filesystem can't do much here :( > I got finally a img file. Running the e2fsck on this image I got > > e2fsprogs-1.39/e2fsck/e2fsck -v -d sdc8.img Very good, but I forgot one thing: if you have enough space, make a backup of sdc8.img, then try e2fsck. If e2fsck screws up, you still have the backup, even if the device (your CF card) fails completly. > e2fsck 1.39 (29-May-2006) > Superblock has an invalid ext3 journal (inode 8). > Clear? yes > *** ext3 journal has been deleted - filesystem is now ext2 only *** > Superblock doesn't have has_journal flag, but has ext3 journal inode. > Clear? yes > sdc8.img was not cleanly unmounted, check forced. > Pass 1: Checking inodes, blocks, and sizes > Journal inode is not in use, but contains data. Clear? yes > Pass 2: Checking directory structure > Directory inode 2, block 0, offset 0: directory corrupted > Salvage? yes > Missing '.' in directory inode 2. > Fix? yes > Setting filetype for entry '.' in ??? (2) to 2. > Missing '..' in directory inode 2. > Fix? yes > Setting filetype for entry '..' in ??? (2) to 2. > Pass 3: Checking directory connectivity > '..' in / (2) is (0), should be / (2). > Fix? yes > Unconnected directory inode 4097 (/???) > Connect to /lost+found? yes > /lost+found not found. Create? 
yes
> Unconnected directory inode 61441 (/???)
> Connect to /lost+found? yes
> Unconnected directory inode 8193 (/???)
> Connect to /lost+found? yes
> Unconnected directory inode 28673 (/???)
> Connect to /lost+found? yes
> Unconnected directory inode 30721 (/???)
> Connect to /lost+found? yes
> Unconnected directory inode 57345 (/???)
> Connect to /lost+found? yes
> Pass 4: Checking reference counts
> Inode 4097 ref count is 3, should be 2. Fix? yes
> Inode 8193 ref count is 5, should be 4. Fix? yes
> Inode 28673 ref count is 3, should be 2. Fix? yes
> Inode 30721 ref count is 7, should be 6. Fix? yes
> Inode 57345 ref count is 3, should be 2. Fix? yes
> Inode 61441 ref count is 9, should be 8. Fix? yes
> Pass 5: Checking group summary information
> Block bitmap differences: -(276--8192) -(8454--8761)
> Fix? yes
> Free blocks count wrong for group #0 (65535, counted=7917).
> Fix? yes
> Free blocks count wrong for group #1 (4763, counted=5071).
> Fix? yes
> Free blocks count wrong (285521, counted=293747).
> Fix? yes
>
> sdc8.img: ***** FILE SYSTEM WAS MODIFIED *****
> 1051 inodes used (0%)
> 125 non-contiguous inodes (11.9%)
> # of inodes with ind/dind/tind blocks: 405/68/0
> 140196 blocks used (32%)
> 0 bad blocks
> 0 large files
> 967 regular files
> 74 directories
> 0 character device files
> 0 block device files
> 0 fifos
> -6 links
> 0 symbolic links (0 fast symbolic links)
> 0 sockets
> --------
> 1035 files
>
> I tried different options during the recovery of the image, but
> unfortunately the result wasn't as expected. In the end part of the files
> were recovered, but the parent folder names were replaced by the inode
> number (I guess) and attached to the lost+found directory.
Yes, when you had a lot of small files, even 1 GB of data in lost+found is a pain to reconstruct :( > As I said in my previous e-mail the windoze application (Nucleus Kernel Linux) is very fast and seems to recover > more information from CF. I too came across some win32 tools to recover b0rked filesystems: one is called "R-Linux", the other one "Stellar Phoenix Linux", demos available. > I have a question now: The journal is kept only on the primary superblock or it has also copies on every alternative > superblock? I don't know... > My feeling is that the CF got a badblock exactly on the journal and the e2fsck can't correct the information, therefore > can't complete the job. > > Do you have any knowledge about a application which is able to handle such situation? Dunno, maybe somebody else will have a look on this... Christian. -- BOFH excuse #453: Spider infestation in warm case parts From oliver.hookins at anchor.com.au Thu Nov 16 01:42:09 2006 From: oliver.hookins at anchor.com.au (Oliver Hookins) Date: Thu, 16 Nov 2006 12:42:09 +1100 Subject: Online filesystem check Message-ID: <20061116014209.GA10244@captain.bridge.anchor.net.au> Hi all, a while ago one of my work colleagues attended a seminar with Ted Tso, and at the time Ted was talking about a way to perform online ext3 filesystem checks. I can't find any information on it by googling, is it implemented in current versions of the filesystem utilities? Either way, does anyone know where I can find more information on it? 
-- Regards, Oliver Hookins Anchor Systems From oliver.hookins at anchor.com.au Mon Nov 20 06:51:41 2006 From: oliver.hookins at anchor.com.au (Oliver Hookins) Date: Mon, 20 Nov 2006 17:51:41 +1100 Subject: Online filesystem check In-Reply-To: References: <20061116014209.GA10244@captain.bridge.anchor.net.au> Message-ID: <20061120065141.GA5612@captain.bridge.anchor.net.au> On Mon Nov 20, 2006 at 10:35:09 +0530, Saswat Praharaj wrote: >If you are talking about checking filesystem from the web , then it should >be straight forward. >You need to write a simple web server (free C source should be available in >the net). > >On "check file system" event, just invoke e2fsck and return the status to >browser. >You need to write an IPC to communicate between your webserver and e2fsck. > >Best, >-Saswat Sorry maybe you misunderstand. By "online" I am referring to "while the ext3 filesystem is mounted r/w". > >On 11/16/06, Oliver Hookins wrote: >> >>Hi all, >> >>a while ago one of my work colleagues attended a seminar with Ted Tso, and >>at the time Ted was talking about a way to perform online ext3 filesystem >>checks. I can't find any information on it by googling, is it implemented >>in >>current versions of the filesystem utilities? Either way, does anyone know >>where I can find more information on it? 
>> -- >>Regards, >>Oliver Hookins >>Anchor Systems >> >>_______________________________________________ >>Ext3-users mailing list >>Ext3-users at redhat.com >>https://www.redhat.com/mailman/listinfo/ext3-users >> -- Regards, Oliver Hookins Anchor Systems From lists at nerdbynature.de Tue Nov 21 19:09:41 2006 From: lists at nerdbynature.de (Christian Kujau) Date: Tue, 21 Nov 2006 19:09:41 +0000 (GMT) Subject: 2.6.19-rc5-git4 benchmarks Message-ID: Apologies for the wide alias, but as it may interest several fs groups, here it is: In the everlasting search for the best fs for my shiny new disks, I was interested in some numbers, here're the results: http://nerdbynature.de/bench/amd64/2.6.19-rc5-git4/test-3/dm-crypt-3.html details: http://nerdbynature.de/wp/?cat=4 (in short: ext3 pretty fast in all operations. then again, the numbers suggest that sometimes a crypto-fs is faster than without crypto, e.g. 'ext3_no-cipher' vs. 'ext3_aes-cbc-essiv:md5'...that's strange, no?) Thanks, Christian. -- BOFH excuse #11: magnetic interference from money/credit cards From lists at nerdbynature.de Tue Nov 21 21:31:22 2006 From: lists at nerdbynature.de (Christian Kujau) Date: Tue, 21 Nov 2006 21:31:22 +0000 (GMT) Subject: Online filesystem check In-Reply-To: <20061116014209.GA10244@captain.bridge.anchor.net.au> References: <20061116014209.GA10244@captain.bridge.anchor.net.au> Message-ID: On Thu, 16 Nov 2006, Oliver Hookins wrote: > checks. I can't find any information on it by googling, is it implemented in > current versions of the filesystem utilities? No, current versions of e2fsprogs do not support online fsck. The most "online" you can get is trying to remount ro, then fsck the ro device. > Either way, does anyone know where I can find more information on it? I'm not aware that this was a planned feature of ext2/3 fs. I've heard about the *BSD guys trying to get this done for UFS, but it's not implemented there either, AFAIK.
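[Editorial note: the "remount ro, then fsck" workaround mentioned above can be rehearsed without touching a real device, since e2fsprogs operates on ordinary files as well. A minimal, hedged sketch — the image path and sizes are made up, and the e2fsprogs tools (mke2fs, e2fsck) are assumed to be installed:]

```shell
# Create a small file-backed ext3 image, then check it read-only with e2fsck -n.
PATH="$PATH:/sbin:/usr/sbin"                 # e2fsprogs binaries often live in /sbin
dd if=/dev/zero of=/tmp/ext3-demo.img bs=1024 count=16384 2>/dev/null
mke2fs -q -F -j /tmp/ext3-demo.img           # -j adds the ext3 journal; -F: target is a file
e2fsck -fn /tmp/ext3-demo.img                # -n: open read-only, answer "no" to all fixes

# On a live server the same idea, with a real device and mount point, would be:
#   mount -o remount,ro /data
#   e2fsck -n /dev/sdb1
#   mount -o remount,rw /data
```

Note that `e2fsck -n` against a filesystem still mounted read-write can report spurious errors, which is exactly why a true online check is hard.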
Not being an fs guru, online fsck really sounds difficult and I'm not sure if it's worth the battle.... Christian. -- BOFH excuse #72: Satan did it From lists at nerdbynature.de Thu Nov 23 00:58:18 2006 From: lists at nerdbynature.de (Christian Kujau) Date: Thu, 23 Nov 2006 00:58:18 +0000 (GMT) Subject: BUG: warning at kernel/softirq.c:141 Message-ID: Hello ext3-users, we have an oopsy situation here: we have 4 machines: 3 client nodes, 1 master: the master holds a fairly big repository of small files. The repo's current size is ~40GB with ~1.2 M files in ~100 directories. Now, we like to rsync changes from the master to the client nodes, which is working perfectly for 2 nodes, but our 3rd node oopses "sometimes", rendering the machine unusable and we are forced to reboot the box (no serial console, no sysrq possible). Below is the oops and a few details; more details (.config, dmesg, tune2fs -l) are here: http://nerdbynature.de/bits/2.6.18-debian Yes, it's a debian kernel, 2.6.18-2-k7 to be specific; it happened with 2.6.17-2-k7 too. We haven't tried vanilla yet. All boxen are the same hardware (amd64, 32bit kernel+userland (debian/unstable), 1GB ram). The filesystem is residing on a raid0-md, consisting of 2 sata-disks. Any ideas what could cause this? Thanks, Christian.
f8836d37 Modules linked in: ipt_TCPMSS xt_tcpudp xt_state iptable_filter ip_conntrack_ftp ip_conntrack_irc ip_conntrack nfnetlink ip_tables x_tables ipv6 ipip tunnel4 dm_snapshot dm_mirror dm_mod shpchp pci_hotplug i2c_viapro psmouse i2c_core serio_raw pcspkr evdev amd64_agp agpgart parport_pc parport rtc floppy ide_generic r8169 uhci_hcd ehci_hcd usbcore thermal processor fan raid0 raid1 md_mod sata_via sd_mod libata scsi_mod via82cxxx ide_core ext3 jbd mbcache EIP: 0060:[] Not tainted VLI EFLAGS: 00010283 (2.6.18-2-k7 #1) [] journal_try_to_free_buffers+0x59/0x13a [jbd] [] ext3_releasepage+0x0/0x61 [ext3] [] try_to_release_page+0x34/0x46 [] shrink_inactive_list+0x44b/0x71c [] do_IRQ+0x48/0x52 [] common_interrupt+0x1a/0x20 [] dput+0x1a/0x119 [] prune_one_dentry+0x68/0x74 [] mb_cache_shrink_fn+0x1d/0xb5 [mbcache] [] shrink_zone+0xaf/0xd0 [] kswapd+0x295/0x399 [] autoremove_wake_function+0x0/0x2d [] kswapd+0x0/0x399 [] kthread+0xc2/0xef [] kthread+0x0/0xef [] kernel_thread_helper+0x5/0xb and for 2.6.17-2-k7: f0872d73 Modules linked in: ipv6 ipip tunnel4 dm_snapshot dm_mirror dm_mod shpchp pci_hotplug floppy i2c_viapro parport_pc i2c_core psmouse parport 8250_pnp serio_raw evdev amd64_agp agpgart pcspkr rtc raid10 raid6 raid5 xor multipath linear ide_generic r8169 uhci_hcd ehci_hcd usbcore thermal processor fan raid0 raid1 md_mod sata_via sd_mod libata scsi_mod via82cxxx ide_core ext3 jbd mbcache EIP: 0060:[] Not tainted VLI EFLAGS: 00210246 (2.6.17-2-k7 #1) BUG: warning at kernel/softirq.c:141/local_bh_enable() local_bh_enable+0x25/0x64 lock_sock+0x85/0x8d sock_fasync+0x5c/0x111 sock_close+0x1e/0x2a __fput+0x87/0x13c filp_close+0x4e/0x54 put_files_struct+0x64/0xa6 do_exit+0x1b0/0x6be bust_spinlocks+0x3a/0x43 die+0x1d3/0x288 die+0x263/0x288 do_page_fault+0x441/0x526 do_page_fault+0x0/0x526 error_code+0x4f/0x54 ext3_xattr_delete_inode+0x5/0xab [ext3] ext3_free_inode+0x92/0x2c7 [ext3] ext3_mark_inode_dirty+0x20/0x27 [ext3] ext3_delete_inode+0xa3/0xba [ext3] 
ext3_delete_inode+0x0/0xba [ext3] generic_delete_inode+0x9e/0x101 iput+0x5e/0x60 dput+0xfe/0x116 sys_renameat+0x15f/0x1b9 _atomic_dec_and_lock+0x2a/0x44 sys_rename+0x11/0x15 sysenter_past_esp+0x54/0x75 BUG: warning at kernel/softirq.c:141/local_bh_enable() local_bh_enable+0x25/0x64 sock_fasync+0x105/0x111 sock_close+0x1e/0x2a __fput+0x87/0x13c filp_close+0x4e/0x54 put_files_struct+0x64/0xa6 do_exit+0x1b0/0x6be bust_spinlocks+0x3a/0x43 die+0x1d3/0x288 die+0x263/0x288 do_page_fault+0x441/0x526 do_page_fault+0x0/0x526 error_code+0x4f/0x54 ext3_xattr_delete_inode+0x5/0xab [ext3] ext3_free_inode+0x92/0x2c7 [ext3] ext3_mark_inode_dirty+0x20/0x27 [ext3] ext3_delete_inode+0xa3/0xba [ext3] ext3_delete_inode+0x0/0xba [ext3] generic_delete_inode+0x9e/0x101 iput+0x5e/0x60 dput+0xfe/0x116 sys_renameat+0x15f/0x1b9 _atomic_dec_and_lock+0x2a/0x44 sys_rename+0x11/0x15 sysenter_past_esp+0x54/0x75 BUG: warning at kernel/softirq.c:141/local_bh_enable() local_bh_enable+0x25/0x64 unix_release_sock+0x5c/0x1bf sock_release+0x11/0x85 sock_close+0x26/0x2a __fput+0x87/0x13c filp_close+0x4e/0x54 put_files_struct+0x64/0xa6 do_exit+0x1b0/0x6be bust_spinlocks+0x3a/0x43 die+0x1d3/0x288 die+0x263/0x288 do_page_fault+0x441/0x526 do_page_fault+0x0/0x526 error_code+0x4f/0x54 ext3_xattr_delete_inode+0x5/0xab [ext3] ext3_free_inode+0x92/0x2c7 [ext3] ext3_mark_inode_dirty+0x20/0x27 [ext3] ext3_delete_inode+0xa3/0xba [ext3] ext3_delete_inode+0x0/0xba [ext3] generic_delete_inode+0x9e/0x101 iput+0x5e/0x60 dput+0xfe/0x116 sys_renameat+0x15f/0x1b9 _atomic_dec_and_lock+0x2a/0x44 sys_rename+0x11/0x15 sysenter_past_esp+0x54/0x75 -- BOFH excuse #324: Your packets were eaten by the terminator From coywolf at sosdg.org Thu Nov 23 09:06:08 2006 From: coywolf at sosdg.org (Coywolf Qi Hunt) Date: Thu, 23 Nov 2006 04:06:08 -0500 Subject: how does ext3 handle no communication to storage In-Reply-To: <20060829170351.GA30599@thunk.org> References: <44F33E3A.8020805@bnl.gov> <20060828205822.GB4944@thunk.org> 
<44F37285.8000104@bnl.gov> <20060829082003.GM20105@schatzie.adilger.int> <44F458AF.7040506@bnl.gov> <20060829170351.GA30599@thunk.org> Message-ID: <20061123090608.GA17728@everest.sosdg.org> On Tue, Aug 29, 2006 at 01:03:51PM -0400, Theodore Tso wrote: > On Tue, Aug 29, 2006 at 11:09:35AM -0400, Sev Binello wrote: > > From a strictly practical and immediate stand point, > > what is the best way to handle this situation if it should occur again in > > the near future ? > > Without any kernel patches, the best thing to do is, (a) don't restore > the path to the device, (b) unmount the filesystem, (c) Compile the > enclosed flushb program (also found in the e2fsprogs sources, but not > compiled by most or all distributions), and run it: "flushb > /dev/hdXX", and only after completing all of these steps, you can > restore the path and do fsck of the filesystem if you are feeling > sufficiently paranoid, and then remount it. option (d) run blockdev --flushbufs /dev/hdXX Ted, you may drop flushb. blockdev from util-linux can do it. - coywolf > > I wish I could offer you something better, but that's what we have at > the moment. > > - Ted > > /* > * flushb.c --- This routine flushes the disk buffers for a disk > * > * Copyright 1997, 2000, by Theodore Ts'o. > * > * WARNING: use of flushb on some older 2.2 kernels on a heavily loaded > * system will corrupt filesystems. This program is not really useful > * beyond for benchmarking scripts. > * > * %Begin-Header% > * This file may be redistributed under the terms of the GNU Public > * License. 
> * %End-Header% > */ > > #include <stdio.h> > #include <stdlib.h> > #include <unistd.h> > #include <fcntl.h> > #include <errno.h> > #include <sys/ioctl.h> > #include <sys/mount.h> > > /* For Linux, define BLKFLSBUF if necessary */ > #if (!defined(BLKFLSBUF) && defined(__linux__)) > #define BLKFLSBUF _IO(0x12,97) /* flush buffer cache */ > #endif > > const char *progname; > > static void usage(void) > { > fprintf(stderr, "Usage: %s disk\n", progname); > exit(1); > } > > int main(int argc, char **argv) > { > int fd; > > progname = argv[0]; > if (argc != 2) > usage(); > > fd = open(argv[1], O_RDONLY, 0); > if (fd < 0) { > perror("open"); > exit(1); > } > /* > * Note: to reread the partition table, use the ioctl > * BLKRRPART instead of BLKFLSBUF. > */ > if (ioctl(fd, BLKFLSBUF, 0) < 0) { > perror("ioctl BLKFLSBUF"); > exit(1); > } > return 0; > } -- Coywolf Qi Hunt From lists at nerdbynature.de Sun Nov 26 01:58:19 2006 From: lists at nerdbynature.de (Christian Kujau) Date: Sun, 26 Nov 2006 01:58:19 +0000 (GMT) Subject: BUG: warning at kernel/softirq.c:141 [SOLVED] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2006, Christian Kujau wrote: > in ~100 directories. Now, we like to rsync changes from the master to the > client nodes, which is working perfectly for 2 nodes, but our 3rd node oopses > "sometimes", rendering the machine unusable and we are forced to reboot the running memtest86 did not reveal anything but the hosting company was kind enough to replace a few parts of the box and the oopses seem to be gone now. We suspect a faulty DIMM... sorry for the noise, Christian. -- BOFH excuse #169: broadcast packets on wrong frequency From Ralf-Lists at ralfgross.de Sun Nov 26 09:49:59 2006 From: Ralf-Lists at ralfgross.de (Ralf Gross) Date: Sun, 26 Nov 2006 10:49:59 +0100 (CET) Subject: ext3 4TB fs limit on amd64 (FAQ?) Message-ID: <38749.85.220.132.176.1164534599.squirrel@www.stz-softwaretechnik.de> Hi, I've a question about the max. ext3 FS size. The ext3 FAQ explains that the limit is 4TB.
http://batleth.sapienti-sat.org/projects/FAQs/ext3-faq.html | Ext3 can support files up to 1TB. With a 2.4 kernel the filesystem size is | limited by the maximal block device size, which is 2TB. In 2.6 the maximum | (32-bit CPU) limit of block devices is 16TB, but ext3 supports only up | to 4TB. Other sources claim that the limit is 8TB (RedHat ES). I'm using ubuntu dapper drake 6.06 with kernel 2.6.15. At the moment I'm running the amd64 port. I successfully created the ext3 FS on a 4,5TB lvm partition (standard ext3 fs options) and was able to fill the whole fs with data. Afterwards I checked the data with md5sum and did a fsck; everything seems to be fine so far. Was it just luck that I didn't see any data corruption? Can I use ext3 for fs >4TB<8TB on amd64 these days? I also tried xfs, but unlike ext3 it repeatedly froze the system when I ran the tiobench benchmark. Ralf From witscher at kulturbeutel.org Thu Nov 9 12:35:44 2006 From: witscher at kulturbeutel.org (witscher) Date: Thu, 09 Nov 2006 12:35:44 -0000 Subject: Ext3 - which blocksize for small files? Message-ID: <7257363.post@talk.nabble.com> Hi, I want to use an ext3 Partition (~1TB) for Mail Storage, which means tons of small files. Does anyone have recommendations about blocksize, inodes, etc. for mkfs.ext3? Thanks in advance, David -- View this message in context: http://www.nabble.com/Ext3----which-blocksize-for-small-files--tf2601442.html#a7257363 Sent from the Ext3 - User mailing list archive at Nabble.com.
With a 2.4 kernel the filesystem size is | limited by the maximal block device size, which is 2TB. In 2.6 the maximum | (32-bit CPU) limit of block devices is 16TB, but ext3 supports only up | to 4TB. Other sources claim that the limit is 8TB (RedHat ES). I'm using ubuntu dapper drake 6.06 with kernel 2.6.15. At the moment I'm running the i386 port. I successfully created the ext3 FS on a 4,5TB lvm partition (standard options) and was able to fill the whole FS with data. Afterwards I checked the data with md5sum and everything seems to be fine so far. Was it just luck that I didn't see any errors? Can one use ext3 for FS >4TB<8TB these days? Would it make any difference using the amd64/x86-64 port (Xeon, Core 2 Duo CPUs)? Ralf From lists at nerdbynature.de Tue Nov 28 01:34:38 2006 From: lists at nerdbynature.de (Christian Kujau) Date: Tue, 28 Nov 2006 01:34:38 +0000 (GMT) Subject: Ext3 - which blocksize for small files? In-Reply-To: <7257363.post@talk.nabble.com> References: <7257363.post@talk.nabble.com> Message-ID: On Thu, 9 Nov 2006, witscher wrote: > I want to use an ext3 Partition (~1TB) for Mail Storage, this means tons of > small files. > Has anyone recommendations about blocksize, inodes, etc. for mkfs.ext3 ? from a recent mkfs.ext3 manpage: -T fs-type Specify how the filesystem is going to be used, so that mke2fs can choose optimal filesystem parameters for that use. The filesystem types that can be supported are defined in the configuration file /etc/mke2fs.conf(5). The default configuration file contains definitions for the filesystem types: small, floppy, news, largefile, and largefile4.
and a /etc/mke2fs.conf on a debian system reveals:

[defaults]
	base_features = sparse_super,filetype,resize_inode,dir_index
	blocksize = 4096
	inode_ratio = 8192

[fs_types]
	small = {
		blocksize = 1024
		inode_ratio = 4096
	}
	floppy = {
		blocksize = 1024
	}
	news = {
		inode_ratio = 4096
	}
	largefile = {
		inode_ratio = 1048576
	}
	largefile4 = {
		inode_ratio = 4194304
	}

-- BOFH excuse #433: error: one bad user found in front of screen From lists at nerdbynature.de Tue Nov 28 07:16:57 2006 From: lists at nerdbynature.de (Christian Kujau) Date: Tue, 28 Nov 2006 07:16:57 +0000 (GMT) Subject: ext3 4TB fs limit on amd64 (FAQ?) In-Reply-To: <38749.85.220.132.176.1164534599.squirrel@www.stz-softwaretechnik.de> References: <38749.85.220.132.176.1164534599.squirrel@www.stz-softwaretechnik.de> Message-ID: On Sun, 26 Nov 2006, Ralf Gross wrote: > I've a question about the max. ext3 FS size. The ext3 FAQ explains that > the limit is 4TB. Hm, strange: I'm pretty sure that mkfs.ext3 has understood bigger blocksizes for quite a while now. Then again, the FAQ says "Version: 2004-10-14"... So, although I'd really love to have this information (and the FAQ!) on http://e2fsprogs.sf.net/ this is what I found:

blocksize   file size limit         filesystem size limit
1 KiB       16448 MiB (~ 16 GiB)    2048 GiB  (= 2 TiB)
2 KiB       256 GiB                 8192 GiB  (= 8 TiB)
4 KiB       2048 GiB (= 2 TiB)      16384 GiB (= 16 TiB)
8 KiB       65568 GiB (~ 64 TiB)    32768 GiB (= 32 TiB)

Note that an 8 KiB blocksize is only supported on systems with 8 KiB pagesize (i.e. linux/alpha). So, it really looks like 16TiB shouldn't be a problem...but I just stumbled over this: https://www.redhat.com/archives/ext3-users/2006-October/msg00000.html While the ext* gurus are busy on the ext4 list, I too would appreciate a comment on the current limitations of ext2/ext3/ext4, so that we can update the FAQ. These questions really come up way too often... Thanks, Christian. -- BOFH excuse #387: Your computer's union contract is set to expire at midnight.
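[Editorial note: the inode_ratio values quoted from mke2fs.conf above map directly onto inode counts — mke2fs creates roughly one inode per inode_ratio bytes of filesystem. For a ~1 TiB mail partition, the "news" profile (inode_ratio = 4096) doubles the inode count relative to the defaults. A quick sanity check; sizes and the device name are illustrative:]

```shell
# One inode per inode_ratio bytes: estimated inode counts for a 1 TiB filesystem.
FS_BYTES=$(( 1024 * 1024 * 1024 * 1024 ))                 # 1 TiB
echo "defaults (ratio 8192): $(( FS_BYTES / 8192 )) inodes"
echo "news     (ratio 4096): $(( FS_BYTES / 4096 )) inodes"

# An illustrative mkfs invocation for a small-file mail store (device name made up):
#   mkfs.ext3 -T news /dev/sdb1
```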
From Ralf-Lists at ralfgross.de Tue Nov 28 09:05:48 2006 From: Ralf-Lists at ralfgross.de (Ralf Gross) Date: Tue, 28 Nov 2006 10:05:48 +0100 (CET) Subject: ext3 4TB fs limit on amd64 (FAQ?) In-Reply-To: References: <38749.85.220.132.176.1164534599.squirrel@www.stz-softwaretechnik.de> Message-ID: <4281.141.113.101.32.1164704748.squirrel@www.stz-softwaretechnik.com> Christian Kujau said: > On Sun, 26 Nov 2006, Ralf Gross wrote: >> I've a question about the max. ext3 FS size. The ext3 FAQ explains that >> the limit is 4TB. > > Hm, strange: I'm pretty sure that mkfs.ext3 understands bigger > blocksizes for quite a while now. Then again, the FAQ says > "Version: 2004-10-14"... > > So, although I'd really love to have this information (and the FAQ!) on > http://e2fsprogs.sf.net/ this is what I found:
>
> blocksize   file size limit         filesystem size limit
> 1 KiB       16448 MiB (~ 16 GiB)    2048 GiB  (= 2 TiB)
> 2 KiB       256 GiB                 8192 GiB  (= 8 TiB)
> 4 KiB       2048 GiB (= 2 TiB)      16384 GiB (= 16 TiB)
> 8 KiB       65568 GiB (~ 64 TiB)    32768 GiB (= 32 TiB)
>
> Note that an 8 KiB blocksize is only supported on systems with 8 KiB > pagesize (i.e. linux/alpha). > > So, it really looks like 16TiB shouldn't be a problem...but I just > stumbled over this: > https://www.redhat.com/archives/ext3-users/2006-October/msg00000.html Thus with 4 KiB blocksize and a < 8TiB fs I should be on the safe side. > While the ext* gurus are busy on the ext4 list, I too would appreciate > a comment on the current limitation of ext2/ext3/ext4, so that we can > update the FAQ. These questions really come up way too often... Yes, the FAQ is a bit misleading. ralf From itlistuser at rapideye.de Tue Nov 28 14:05:51 2006 From: itlistuser at rapideye.de (Sebastian Reitenbach) Date: Tue, 28 Nov 2006 14:05:51 -0000 Subject: how to prevent filesystem check Message-ID: <20061128140551.A110E4087B2@ogo.rapideye.de> Hi all, I want to set up a RAID storage system, where I have two systems connected to it.
the filesystems are mapped out to both connectors. I want the master host to mount them read-write, and the slave read-only. In my fstab on the slave I have a line like the following: /dev/sdb1 /mount ext3 acl,noauto,user_xattr,nosuid,ro 0 0 In man 5 fstab it is written that when the 6th field is 0, no filesystem check will be done at mount time. And in man mount I read that the nocheck parameter is the default, meaning that no filesystem checks should be performed when the partition is mounted. But when I mount the filesystem on the slave, I see the following messages in /var/log/messages: EXT3-fs: mounted filesystem with ordered data mode. EXT3-fs: INFO: recovery required on readonly filesystem. EXT3-fs: write access will be enabled during recovery. (fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0, recovered transactions 99610 to 100072 (fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 12498 and revoked 589/918 blocks kjournald starting. Commit interval 5 seconds EXT3-fs: recovery complete. Since I started testing this, the master server has had occasional problems with the filesystem, remounting it read-only, and I had to fsck it. I think the filesystem got destroyed by the filesystem checks while mounting it read-only on the second server. I googled around, and found a similar message from someone mounting an XFS file system. So I am not sure whether this is a mount or an ext3 problem. My kernel is a 2.6.12.6-bigsmp, on a SuSE 10.1. Is there any other way to prevent the slave server from doing any filesystem checks? kind regards Sebastian From tweeks at rackspace.com Tue Nov 28 16:58:09 2006 From: tweeks at rackspace.com (Thomas Weeks) Date: Tue, 28 Nov 2006 10:58:09 -0600 Subject: Best Practices for Data Recovery for corrupted EXT2/3? Message-ID: <200611281058.09288.tweeks@rackspace.com> Hey all.. I had a bad IDE controller that hosed my EXT3 filesystems.
A resulting fsck damaged part of the filesystem and the root inode is gone (on my main drive AND the backup drive). I immediately DD'd the main drive over to an identical drive that I have been working on. But every time.. a fsck destroys all the data (moves everything to lost+found) and nothing that I've found is able to restore the dir structure... or allow me to reposition any of the subdirs (such as /home/*). I've tried testdisk, dd_recover, and Autopsy.. mounting and fscking with alternate superblocks, all with no success. I would like to retain file names.. as I see that SOME filename/dir structure is intact when the fsck starts nuking all my files that don't have a parent dir (e.g. ../home/user/file1 --> lost+found). Is there a way that this information can be salvaged? Or a new fake root inode be slid into place and all the links associated? My last ditch effort will be to allow the migration to lost+found and then try to copy off files based on UID/GID/date, but I would really like to retain file names. Any related info would be useful... but my hopes are not high. Tweeks From richard.c.wolber at boeing.com Tue Nov 28 20:33:42 2006 From: richard.c.wolber at boeing.com (Wolber, Richard C) Date: Tue, 28 Nov 2006 12:33:42 -0800 Subject: how to prevent filesystem check In-Reply-To: <20061128140551.A110E4087B2@ogo.rapideye.de> Message-ID: <8C7C41A176AC0B468BEFB2EFD9BDAB9902426708@XCH-NW-5V2.nw.nos.boeing.com> Running the following command on your slave server should do the trick: echo "AUTOFSCK_DEF_CHECK=\"no\"" >> /etc/sysconfig/autofsck ..Chuck.. > -----Original Message----- > From: Sebastian Reitenbach [mailto:itlistuser at rapideye.de] > Sent: Tuesday, November 28, 2006 6:06 AM > To: ext3-users at redhat.com > Subject: how to prevent filesystem check > > Hi all, > > I want to setup a RAID storage system, where i have two > systems connected to it. the filesystems are mapped out to > both connectors.
I want the master host mount them read > write, and the slave read only. > > in my fstab on the slave I have a line like the following: > /dev/sdb1 /mount ext3 acl,noauto,user_xattr,nosuid,ro 0 0 > > so in man 5 fstab, it is written, that when the 6. field is > 0, no filesystem check will be done at mount time. > > and in man mount, I read that, the nocheck parameter is the > default, that means, that no filesystem checks should be > performed when the partition is mounted. > > but when I mount the filesystem on the slave, I see the > following messages in /var/log/messages: > EXT3-fs: mounted filesystem with ordered data mode. > EXT3-fs: INFO: recovery required on readonly filesystem. > EXT3-fs: write access will be enabled during recovery. > (fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, > exit status 0, recovered transactions 99610 to 100072 > (fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed > 12498 and revoked > 589/918 blocks > kjournald starting. Commit interval 5 seconds > EXT3-fs: recovery complete. > > > since I test this, the master server had occassional problems > with the filesystem, so he decided to mount these read-only, > and I had to fsck it. > I think the filesystem got destroyed because of the > filesystem ckecks, while mounting it readonly on the second server. > > I googled around, and found a similar message from someone > mounting a XFS file system. So I am not sure, whether this is > a mount or a ext3 problem. > > my kernel is a 2.6.12.6-bigsmp, on a SuSE 10.1. > > Is there any other way to prevent the slave server from doing > any filesystem checks? 
> > kind regards > Sebastian From adilger at clusterfs.com Wed Nov 29 05:20:26 2006 From: adilger at clusterfs.com (Andreas Dilger) Date: Tue, 28 Nov 2006 21:20:26 -0800 Subject: how to prevent filesystem check In-Reply-To: <8C7C41A176AC0B468BEFB2EFD9BDAB9902426708@XCH-NW-5V2.nw.nos.boeing.com> References: <20061128140551.A110E4087B2@ogo.rapideye.de> <8C7C41A176AC0B468BEFB2EFD9BDAB9902426708@XCH-NW-5V2.nw.nos.boeing.com> Message-ID: <20061129052026.GA6429@schatzie.adilger.int> On Nov 28, 2006 12:33 -0800, Wolber, Richard C wrote: > Running the following command on your slave server should do the trick: > > echo "AUTOFSCK_DEF_CHECK=\"no\"" >> /etc/sysconfig/autofsck This is incorrect. As soon as the ext3 code mounts the filesystem it will do journal recovery and potentially corrupt the filesystem. Then, the read-only copy will become out-of-date in the cache of that client and it will get bogus data back, eventually deciding that the filesystem is corrupt (whether it is or not). You should just mount the filesystem on the client via NFS, that's what it's SUPPOSED to do. This is a good reason for the multi-mount protection feature that I proposed previously. It would mark the filesystem as in-use on one node and the filesystem itself would refuse to mount on the second node. Unfortunately, this idea met resistance from some of the other ext3 developers to merging it upstream. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
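[Editorial note: Andreas's warning — that merely mounting an ext3 filesystem can replay the journal and write to the device — can at least be detected beforehand: a needs_recovery flag in the dumpe2fs -h feature list means the journal has pending transactions, so even mount -o ro will write. A hedged sketch; the sample feature line below is illustrative, in real use it would come from dumpe2fs against a real device:]

```shell
# Decide from the superblock feature flags whether a mount would replay the journal.
# Real use (device name made up):
#   features=$(dumpe2fs -h /dev/sdb1 2>/dev/null | grep '^Filesystem features')
features="Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super"

case "$features" in
    *needs_recovery*) verdict="journal replay pending: even 'mount -o ro' will write" ;;
    *)                verdict="clean: a read-only mount should not write" ;;
esac
echo "$verdict"
```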
From adilger at clusterfs.com Tue Nov 28 21:35:56 2006 From: adilger at clusterfs.com (Andreas Dilger) Date: Tue, 28 Nov 2006 13:35:56 -0800 Subject: how to prevent filesystem check In-Reply-To: <20061128140551.A110E4087B2@ogo.rapideye.de> References: <20061128140551.A110E4087B2@ogo.rapideye.de> Message-ID: <20061128213556.GA5673@schatzie.adilger.int> On Nov 28, 2006 14:05 -0000, Sebastian Reitenbach wrote: > I want to setup a RAID storage system, where i have two systems connected to > it. the filesystems are mapped out to both connectors. I want the master host > mount them read write, and the slave read only. This is NOT possible with ext2 or ext3 and can result in filesystem corruption. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From pengchengzou at gmail.com Wed Nov 29 05:58:58 2006 From: pengchengzou at gmail.com (Pengcheng Zou) Date: Wed, 29 Nov 2006 13:58:58 +0800 Subject: how to prevent filesystem check In-Reply-To: <20061128140551.A110E4087B2@ogo.rapideye.de> References: <20061128140551.A110E4087B2@ogo.rapideye.de> Message-ID: <24a313060611282158i593de443iac831ad668900ba2@mail.gmail.com> Use the 'noload' option to mount the readonly ext3 filesystem on the slave host, so the journal will not be loaded. BTW, this kind of setting could have cache-coherence problems. Why not do it the correct way by using some kind of network filesystem (NFS) or clustering filesystem (GFS, Lustre)? On 11/28/06, Sebastian Reitenbach wrote: > Hi all, > > I want to setup a RAID storage system, where i have two systems connected to > it. the filesystems are mapped out to both connectors. I want the master host > mount them read write, and the slave read only. > > in my fstab on the slave I have a line like the following: > /dev/sdb1 /mount ext3 acl,noauto,user_xattr,nosuid,ro 0 0 > > so in man 5 fstab, it is written, that when the 6. field is 0, no filesystem > check will be done at mount time.
> > and in man mount, I read that, the nocheck parameter is the default, that > means, that no filesystem checks should be performed when the partition is > mounted. > > but when I mount the filesystem on the slave, I see the following messages > in /var/log/messages: > EXT3-fs: mounted filesystem with ordered data mode. > EXT3-fs: INFO: recovery required on readonly filesystem. > EXT3-fs: write access will be enabled during recovery. > (fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0, > recovered transactions 99610 to 100072 > (fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 12498 and revoked > 589/918 blocks > kjournald starting. Commit interval 5 seconds > EXT3-fs: recovery complete. > > > since I test this, the master server had occassional problems with the > filesystem, so he decided to mount these read-only, and I had to fsck it. > I think the filesystem got destroyed because of the filesystem ckecks, while > mounting it readonly on the second server. > > I googled around, and found a similar message from someone mounting a XFS file > system. So I am not sure, whether this is a mount or a ext3 problem. > > my kernel is a 2.6.12.6-bigsmp, on a SuSE 10.1. > > Is there any other way to prevent the slave server from doing any filesystem > checks? > > > kind regards > Sebastian > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users > From itlistuser at rapideye.de Wed Nov 29 08:29:16 2006 From: itlistuser at rapideye.de (Sebastian Reitenbach) Date: Wed, 29 Nov 2006 08:29:16 -0000 Subject: how to prevent filesystem check Message-ID: <20061129082916.E2A2A78C056@ogo.rapideye.de> Hi, "Pengcheng Zou" wrote: > use 'noload' option to mount the readonly ext3 filesystem on the slave > host, so the journal will not be loaded. 
With the noload option, the following output is shown by the mount command: mount: wrong fs type, bad option, bad superblock on /dev/sdb2, missing codepage or other error In some cases useful info is found in syslog - try dmesg | tail or so and in /var/log/messages: Nov 29 09:17:17 srv3 kernel: ext3: No journal on filesystem on sdb2 As this is a valid option and the fs type is the same, it must be a bad superblock? Is there anything I can do about it? > > BTW, this kind of setting could have some cache-coherence problem. why > not do it correct way by using some kind of network filesystem (NFS) > or clustering filesystem (GFS,Lustre)? > yes, I need to take a look at these file systems.
> > Sebastian From tytso at mit.edu Wed Nov 29 14:02:45 2006 From: tytso at mit.edu (Theodore Tso) Date: Wed, 29 Nov 2006 09:02:45 -0500 Subject: how to prevent filesystem check In-Reply-To: <20061129052026.GA6429@schatzie.adilger.int> References: <20061128140551.A110E4087B2@ogo.rapideye.de> <8C7C41A176AC0B468BEFB2EFD9BDAB9902426708@XCH-NW-5V2.nw.nos.boeing.com> <20061129052026.GA6429@schatzie.adilger.int> Message-ID: <20061129140245.GB5771@thunk.org> On Tue, Nov 28, 2006 at 09:20:26PM -0800, Andreas Dilger wrote: > This is a good reason for the multi-mount protection feature that I > proposed previously. It would mark the filesystem as in-use on one > node and the filesystem itself would refuse to mount on the second > node. Unfortunately, this idea met resistance from some of the > other ext3 developers from merging it upstream. The resistance was because it means we have to put in what is effectively a cluster filesystem's distributed lock manager (DLM) just to tell users that "News flash! ext3 isn't a cluster filesystem" and then error-out the mount. Granted, it was a relatively simple cluster DLM, but that's what you effectively need, complete with issues surrounding heartbeats for liveness detection --- and since it was a simple cluster DLM, it didn't handle temporary connectivity failure since there was no STONITH (shoot-the-other-node-in-the-head) functionality. So it didn't even solve the problem completely. Still, if a lot of users are making this fundamental mistake of trying to use ext3 as a cluster filesystem, maybe we need to revisit this question, since hopefully once the user sees the error message they won't keep doing this. It doesn't stop them from wasting a lot of time trying to set up such a system only to discover that they used the wrong tool in the first place, though.
So this feels more like a documentation problem; but maybe it's worth
it just as a backup to some kind of documentation telling users that
they really want to be using OCFS2, GFS, GPFS, or some other cluster
filesystem if they want to do something like this.

- Ted

From tytso at mit.edu  Wed Nov 29 14:12:11 2006
From: tytso at mit.edu (Theodore Tso)
Date: Wed, 29 Nov 2006 09:12:11 -0500
Subject: Best Practices for Data Recovery for corrupted EXT2/3?
In-Reply-To: <200611281058.09288.tweeks@rackspace.com>
References: <200611281058.09288.tweeks@rackspace.com>
Message-ID: <20061129141211.GC5771@thunk.org>

On Tue, Nov 28, 2006 at 10:58:09AM -0600, Thomas Weeks wrote:
>
> I had a bad IDE controller that hosed my EXT3 filesystems. A resulting fsck
> damaged part of the filesystem and the root inode is gone (on my main drive
> AND the backup drive). I immediately DD'd the main drive over to an
> identical drive that I have been working on. But every time.. a fsck
> destroys all the data (moves everything to lost+found) and nothing that I've
> found is able to restore the dir structure... or allow me to reposition
> any of the subdirs (such as /home/*).

Unfortunately, if the root inode is gone, you've lost the names of the
inodes in the root directory. Usually, though, most of the inodes in
the root directory are directories, and so the directory hierarchy is
moved to lost+found. So if you see a directory /lost+found/#5612 that
has files such as /lost+found/#5612/passwd and /lost+found/#5612/motd,
then you could probably guess that #5612 was /etc, and you could then
just move /lost+found/#5612 to /etc.

Of course, if more of the filesystem than just the root directory is
gone, then you may have lost more directory information, and so things
might not be quite that simple to recover from.

> I would like to retain file names.. as I see that SOME filename/dir structure
> is intact when the fsck starts nuking all my files that don't have a parent
> dir (e.g. ../home/user/file1 --> lost+found). Is there a way that this
> information can be salvaged? Or a new fake root inode be slid into place and
> all the links associated?

It's not that fsck is nuking the names --- the names were already gone
due to your hardware corruption. Fsck is seeing that an inode doesn't
have a name, which is why it is moving it to /lost+found so you at
least don't lose the data. Please don't blame fsck; it's doing the
best job that it can!

> My last ditch effort will be to allow the migration to lost+found and then try
> to copy off files based on UID/GID/date, but I would really like to retain
> file names.

Sorry, the file names are gone; if they were there, fsck would have
used them. If you have a valid locatedb database, you might be able to
use that to help reconstruct the filenames, and of course as a
responsible sysadmin you have been doing regular backups (RIGHT? :-),
so you could use that information as well.

Regards and good luck,

- Ted

From lists at nerdbynature.de  Wed Nov 29 22:56:49 2006
From: lists at nerdbynature.de (Christian Kujau)
Date: Wed, 29 Nov 2006 22:56:49 +0000 (GMT)
Subject: how to prevent filesystem check
In-Reply-To: <20061129052026.GA6429@schatzie.adilger.int>
References: <20061128140551.A110E4087B2@ogo.rapideye.de>
	<8C7C41A176AC0B468BEFB2EFD9BDAB9902426708@XCH-NW-5V2.nw.nos.boeing.com>
	<20061129052026.GA6429@schatzie.adilger.int>
Message-ID:

On Tue, 28 Nov 2006, Andreas Dilger wrote:
> You should just mount the filesystem on the client via NFS, that's
> what it's SUPPOSED to do.

Would a "-o bind" mount suffice too? That way one does not need to set
up NFS, not to mention the network overhead this solution might have.
Like:

# mount -t ext3 /dev/sdb1 /home
# mount -o bind,ro /home /mnt/

I just tested this: /dev/sdb1 did not get altered with the bind-mount.

C.
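Ted's guess-the-directory heuristic for lost+found can be mechanized in a crude way. The sketch below is illustrative only: the marker table and function name are invented here, and no real recovery tool works exactly like this.

```python
import os

# Files whose presence strongly suggests what an orphaned
# lost+found directory originally was. Illustrative, not exhaustive.
MARKERS = {
    "/etc": {"passwd", "fstab", "motd"},
    "/var/log": {"messages", "lastlog", "wtmp"},
}

def guess_original_path(entry_path):
    """Guess where a /lost+found/#NNNN directory came from by
    intersecting its contents with the marker table."""
    contents = set(os.listdir(entry_path))
    best, best_hits = None, 0
    for original, markers in MARKERS.items():
        hits = len(contents & markers)
        if hits > best_hits:
            best, best_hits = original, hits
    return best   # None means "no confident guess"
```

For example, a recovered directory containing passwd and motd would be reported as /etc, matching Ted's #5612 example; anything unmatched is left for manual inspection.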
--
BOFH excuse #201: RPC_PMAP_FAILURE

From richard.c.wolber at boeing.com  Thu Nov 30 00:41:38 2006
From: richard.c.wolber at boeing.com (Wolber, Richard C)
Date: Wed, 29 Nov 2006 16:41:38 -0800
Subject: how to prevent filesystem check
In-Reply-To: <20061129052026.GA6429@schatzie.adilger.int>
Message-ID: <8C7C41A176AC0B468BEFB2EFD9BDAB9902426714@XCH-NW-5V2.nw.nos.boeing.com>

On Nov 28, 2006 9:20pm Andreas Dilger wrote:
>
> On Nov 28, 2006 12:33 -0800, Wolber, Richard C wrote:
> > Running the following command on your slave server should
> > do the trick:
> >
> > echo "AUTOFSCK_DEF_CHECK=\"no\"" >> /etc/sysconfig/autofsck
>
> This is incorrect. As soon as the ext3 code mounts the
> filesystem it will do journal recovery and potentially
> corrupt the filesystem.
> Then, the read-only copy will become out-of-date in the cache
> of that client and it will get bogus data back, eventually
> deciding that the filesystem is corrupt (whether it is or not).

Even if you "mount -oro -text2 $DEV $DIR"?

..Chuck..

From adilger at clusterfs.com  Thu Nov 30 05:53:34 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Wed, 29 Nov 2006 21:53:34 -0800
Subject: how to prevent filesystem check
In-Reply-To: <8C7C41A176AC0B468BEFB2EFD9BDAB9902426714@XCH-NW-5V2.nw.nos.boeing.com>
References: <20061129052026.GA6429@schatzie.adilger.int>
	<8C7C41A176AC0B468BEFB2EFD9BDAB9902426714@XCH-NW-5V2.nw.nos.boeing.com>
Message-ID: <20061130055334.GF6429@schatzie.adilger.int>

On Nov 29, 2006 16:41 -0800, Wolber, Richard C wrote:
> On Nov 28, 2006 9:20pm Andreas Dilger wrote:
> > On Nov 28, 2006 12:33 -0800, Wolber, Richard C wrote:
> > > Running the following command on your slave server should
> > > do the trick:
> > >
> > > echo "AUTOFSCK_DEF_CHECK=\"no\"" >> /etc/sysconfig/autofsck
> >
> > This is incorrect. As soon as the ext3 code mounts the
> > filesystem it will do journal recovery and potentially
> > corrupt the filesystem.
> > Then, the read-only copy will become out-of-date in the cache
> > of that client and it will get bogus data back, eventually
> > deciding that the filesystem is corrupt (whether it is or not).
>
> Even if you "mount -oro -text2 $DEV $DIR"?

Even then, yes. It is NOT SAFE to access the same block device on
multiple nodes at one time. Even with "-o ro" the mount will cause
the journal to be recovered.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From adilger at clusterfs.com  Thu Nov 30 06:54:01 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Wed, 29 Nov 2006 22:54:01 -0800
Subject: how to prevent filesystem check
In-Reply-To: <20061129140245.GB5771@thunk.org>
References: <20061128140551.A110E4087B2@ogo.rapideye.de>
	<8C7C41A176AC0B468BEFB2EFD9BDAB9902426708@XCH-NW-5V2.nw.nos.boeing.com>
	<20061129052026.GA6429@schatzie.adilger.int>
	<20061129140245.GB5771@thunk.org>
Message-ID: <20061130065401.GH6429@schatzie.adilger.int>

On Nov 29, 2006 09:02 -0500, Theodore Tso wrote:
> On Tue, Nov 28, 2006 at 09:20:26PM -0800, Andreas Dilger wrote:
> > This is a good reason for the multi-mount protection feature that I
> > proposed previously. It would mark the filesystem as in-use on one
> > node and the filesystem itself would refuse to mount on the second
> > node. Unfortunately, this idea met resistance from some of the
> > other ext3 developers from merging it upstream.
>
> The resistance was because it means we have to put in what is effectively
> a cluster filesystem's distributed lock manager (DLM) just to tell
> users that "News flash! ext3 isn't a cluster filesystem" and then
> error-out the mount.
> Granted, it was a relatively simple cluster DLM,
> but that's what you effectively need, complete with issues surrounding
> heartbeats for liveness detection --- and since it was a simple
> cluster DLM, it didn't handle temporary connectivity failure since
> there was no STONITH (shoot-the-other-node-in-the-head) functionality.
> So it didn't even solve the problem completely.

I agree that the proposed MMP code is by no means a 100% solution, and
is not intended to replace HA + STONITH. Rather, it is intended to
handle the "oops, HA is broken, admin set it up incorrectly, FC
routing broke, SCSI devices were renamed, etc" kind of issues.

> Still, if a lot of users are making this fundamental mistake of trying
> to use ext3 as a cluster filesystem, maybe we need to revisit this
> question, since hopefully once the user sees the error message they
> won't keep doing this.

The only reason I raised this again was because this "mount ext2/3 on
two nodes, one being read-only" is a fairly common thing for users to
try, and it really deserves some kind of attention. The ability to
have multi-host block devices is only increasing, I think, especially
in server-type environments with FC, IB, iSCSI, etc.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From tytso at mit.edu  Thu Nov 30 20:36:45 2006
From: tytso at mit.edu (Theodore Tso)
Date: Thu, 30 Nov 2006 15:36:45 -0500
Subject: how to prevent filesystem check
In-Reply-To: <20061130055334.GF6429@schatzie.adilger.int>
References: <20061129052026.GA6429@schatzie.adilger.int>
	<8C7C41A176AC0B468BEFB2EFD9BDAB9902426714@XCH-NW-5V2.nw.nos.boeing.com>
	<20061130055334.GF6429@schatzie.adilger.int>
Message-ID: <20061130203645.GA24959@thunk.org>

On Wed, Nov 29, 2006 at 09:53:34PM -0800, Andreas Dilger wrote:
> > Even if you "mount -oro -text2 $DEV $DIR"?
>
> Even then, yes. It is NOT SAFE to access the same block device on
> multiple nodes at one time.
> Even with "-o ro" the mount will cause
> the journal to be recovered.

And even with an ext2 filesystem, as the filesystem changes out from
under the kernel (as the system that has the filesystem mounted
read/write makes changes), the system that has the filesystem mounted
read-only will see inconsistencies caused by some blocks being cached
and some blocks not being cached, and this could result in security
violations (when blocks containing one user's data are read by other,
non-privileged users) and possibly kernel panics.

The only safe way to mount a block device in a shared mode, where one
or more of the systems have the shared block device mounted
read/write, is to use a cluster-aware filesystem, such as GFS, OCFS2,
or GPFS.

- Ted
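The failure mode Ted describes, stale cached blocks mixed with freshly read ones, can be shown with a toy model. All names here are invented for the illustration; this is not how the kernel's buffer cache is structured, only the coherence problem it runs into.

```python
class BlockDevice:
    """A shared 'disk': a flat list of block contents."""
    def __init__(self, nblocks):
        self.blocks = ["v1"] * nblocks

class ReadOnlyMounter:
    """A node that mounted the device read-only and caches every
    block it reads, assuming the device cannot change underneath it."""
    def __init__(self, dev):
        self.dev = dev
        self.cache = {}

    def read(self, n):
        if n not in self.cache:
            self.cache[n] = self.dev.blocks[n]
        return self.cache[n]

dev = BlockDevice(4)
ro = ReadOnlyMounter(dev)
ro.read(0)                        # block 0 is now cached as "v1"

# The read/write node rewrites the whole device behind the cache's back.
dev.blocks = ["v2"] * 4

# The read-only node now sees an inconsistent mix: stale cached data
# next to fresh data -- exactly the inconsistency Ted warns about.
print(ro.read(0), ro.read(1))     # v1 v2
```

The same mix-and-match of stale and fresh blocks, applied to directory blocks and inode tables instead of toy strings, is what produces the security violations and panics described above.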