From menscher at uiuc.edu Wed Mar 1 16:26:23 2006
From: menscher at uiuc.edu (Damian Menscher)
Date: Wed, 1 Mar 2006 10:26:23 -0600 (CST)
Subject: Status of fragment support, advantages of having fewer indoes
In-Reply-To:
References:
Message-ID:

On Tue, 28 Feb 2006, Michael Renner wrote:

> My second question is regarding the bytes-per-inode ratio: What
> benefits would I gain from having fewer inodes? I reckon it's only
> diskspace (if so, how much?).

Whenever you have to fsck, it needs to scan all inodes. So you can
reduce fsck times by having fewer inodes. This isn't usually a problem
on small partitions, but when you get into the TB range it can be
really annoying to wait hours for a fsck.

Damian Menscher
--
-=#| www.uiuc.edu/~menscher/ Ofc:(650)253-2757 |#=-
-=#| The above opinions are not necessarily those of my employers. |#=-

From adilger at clusterfs.com Wed Mar 1 17:12:16 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Wed, 1 Mar 2006 10:12:16 -0700
Subject: Status of fragment support, advantages of having fewer indoes
In-Reply-To:
References:
Message-ID: <20060301171216.GW26809@schatzie.adilger.int>

On Feb 28, 2006 23:33 +0000, Michael Renner wrote:
> There wasn't much information regarding fragment support of ext2/3 since 2003
> [1], Andreas stating that there were problems with the xattr implementation.
> Has this changed in the meanwhile?
>
> [1] http://www.kerneltraffic.org/kernel-traffic/kt20030428_214.html#8

Still the same story. I don't think fragment support will be implemented
until there is some common architecture with large pages (or other
fundamental page cache rework), where it makes sense to have large
(say 64kB) pages, and fragments for smaller files.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From niam.tni at googlemail.com Wed Mar 8 18:40:42 2006
From: niam.tni at googlemail.com (Stefan Drexleri)
Date: Wed, 8 Mar 2006 19:40:42 +0100
Subject: Why does file's inode change?
Message-ID: <35ce7f5b0603081040y1000c5e2s@mail.gmail.com>

Hi,

perhaps a dumb question:

I'm using an IDS and doing file monitoring.
If any file changes, I see it.
Recently I saw the inode of /etc/shadow change.
One user had been added.

So what might be the reason for a file's inode to change? Does the inode
size grow over time? I think not. Or is the only reasonable cause that the
file was deleted and recreated?

regards

From adilger at clusterfs.com Wed Mar 8 18:54:26 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Wed, 8 Mar 2006 11:54:26 -0700
Subject: Why does file's inode change?
In-Reply-To: <35ce7f5b0603081040y1000c5e2s@mail.gmail.com>
References: <35ce7f5b0603081040y1000c5e2s@mail.gmail.com>
Message-ID: <20060308185426.GF14934@schatzie.adilger.int>

On Mar 08, 2006 19:40 +0100, Stefan Drexleri wrote:
> perhaps a dumb question:
>
> I'm using an IDS and doing file monitoring.
> If any file changes, I see it.
> Recently I saw the inode of /etc/shadow change.
> One user had been added.
>
> So what might be the reason for a file's inode to change? Does the inode
> size grow over time? I think not. Or is the only reasonable cause that the
> file was deleted and recreated?

Many tools will do updates in a new copy of the file and then rename
the new file over the old one. This ensures that if the program or
system crashes during the update, there isn't a partial update that
leaves the file corrupted.

You can verify this by looking at the code, or running "strace useradd"
(or whatever program is being used).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
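
The update-in-a-new-copy-then-rename pattern described in the reply above can be sketched in a few lines of Python. This is only an illustration of why the inode number changes, not what useradd or any other tool actually does; the helper name replace_atomically and the file name shadow.example are made up for the demonstration, and it assumes a POSIX filesystem where the temporary file and the target live in the same directory.

```python
import os
import tempfile


def replace_atomically(path, new_contents):
    """Write new_contents to a temporary file beside path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file is a brand-new file, so it gets its own inode.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_contents)
            tmp.flush()
            # Push the new data to disk before the rename makes it visible.
            os.fsync(tmp.fileno())
        # rename(2) semantics: readers see either the old file or the new one.
        os.replace(tmp_path, path)
    except BaseException:
        # If anything failed before the rename, do not leave the temp file behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def demo():
    """Return (inode before, inode after, final contents) for a throwaway file."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "shadow.example")  # hypothetical stand-in file
        with open(path, "w") as f:
            f.write("root:x\n")
        inode_before = os.stat(path).st_ino
        replace_atomically(path, "root:x\nnewuser:x\n")
        inode_after = os.stat(path).st_ino
        with open(path) as f:
            contents = f.read()
        return inode_before, inode_after, contents


if __name__ == "__main__":
    before, after, _ = demo()
    print("inode before:", before, "inode after:", after)
```

Because the old file and the temporary file exist at the same time, they cannot share an inode number, so a monitor that records inode numbers will report a change even though the path is the same.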
From vineet.chadha at gmail.com Thu Mar 9 15:34:02 2006
From: vineet.chadha at gmail.com (Vineet Chadha)
Date: Thu, 9 Mar 2006 07:34:02 -0800 (PST)
Subject: in-core inode
Message-ID: <15480775.4131141918442255.JavaMail.nabble@talk.nabble.com>

hi,

My understanding of how the Linux OS allocates and deallocates inodes is
as follows: the OS allocates an inode as needed by a process and, once the
process dies, puts the inode on a free list for performance. My question
is: how can I get a list of the FREE in-core inodes?

thanks
vc
--
View this message in context: http://www.nabble.com/in-core-inode-t1253447.html#a3321977
Sent from the Ext3 - User forum at Nabble.com.

From preining at logic.at Tue Mar 14 15:18:15 2006
From: preining at logic.at (Norbert Preining)
Date: Tue, 14 Mar 2006 16:18:15 +0100
Subject: inode iblocks count changes by -8
Message-ID: <20060314151815.GY7600@gamma.logic.tuwien.ac.at>

Hi all!

Several times now I have seen the following: when fsck-ing the ext3 fs I
get hundreds (if not thousands) of messages:

Inode NNNN, i_blocks is K, should be (K-8). FIXED:

Interestingly the difference is *always* *always* *always* 8.

Can someone explain to me what is going on here? Why does this happen?

Thanks a lot and all the best

Norbert

-------------------------------------------------------------------------------
Dr. Norbert Preining                                       Università di Siena
gpg DSA: 0x09C5B094      fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
-------------------------------------------------------------------------------
PUDSEY (n.)
The curious-shaped flat wads of dough left on a kitchen table after
someone has been cutting scones out of it.
--- Douglas Adams, The Meaning of Liff From adilger at clusterfs.com Fri Mar 17 07:53:12 2006 From: adilger at clusterfs.com (Andreas Dilger) Date: Fri, 17 Mar 2006 00:53:12 -0700 Subject: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default Message-ID: <20060317075312.GG30801@schatzie.adilger.int> I've been thinking recently that we should re-enable DIR_INDEX in mke2fs by default. When it first came out, we had done this and were bitten by a few bugs in the code. However, this code has been in heavy use for several thousand filesystem years in Lustre, if not elsewhere, and I'm inclined to think it is pretty safe these days. Likewise, RHEL/FC have had RESIZE_INODE as a standard feature for a good time now, and this should probably be merged into stock e2fsprogs. Comments? Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From sct at redhat.com Fri Mar 17 22:26:57 2006 From: sct at redhat.com (Stephen C. Tweedie) Date: Fri, 17 Mar 2006 17:26:57 -0500 Subject: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default In-Reply-To: <20060317075312.GG30801@schatzie.adilger.int> References: <20060317075312.GG30801@schatzie.adilger.int> Message-ID: <1142634418.3641.62.camel@orbit.scot.redhat.com> Hi, On Fri, 2006-03-17 at 00:53 -0700, Andreas Dilger wrote: > I've been thinking recently that we should re-enable DIR_INDEX in mke2fs > by default. When it first came out, we had done this and were bitten by > a few bugs in the code. However, this code has been in heavy use for > several thousand filesystem years in Lustre, if not elsewhere, and I'm > inclined to think it is pretty safe these days. I reckon they are safe enough for general use. The only question mark in my mind is over the change in behaviour for people who dual-boot or swap data between newer and older distros. 
One way around that that I've been wondering about would be to wait until we have accumulated enough new features (extent maps/64-bit, increase the default inode size etc.) and give the new feature set its own explicit flag in mke2fs. It might be something we could call ext4 (ie. enable it if mke2fs is called as "mke4fs"); we might just add a separate flag. Whatever way we chose, it would want to be something that would stand out as an obviously non-backwards-compatible formatting option. If we do want to do that, then there's less reason to want to enable small bits of that feature set in a piece-meal fashion. --Stephen From akpm at osdl.org Fri Mar 17 22:36:30 2006 From: akpm at osdl.org (Andrew Morton) Date: Fri, 17 Mar 2006 14:36:30 -0800 Subject: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default In-Reply-To: <1142634418.3641.62.camel@orbit.scot.redhat.com> References: <20060317075312.GG30801@schatzie.adilger.int> <1142634418.3641.62.camel@orbit.scot.redhat.com> Message-ID: <20060317143630.300d82f8.akpm@osdl.org> "Stephen C. Tweedie" wrote: > > On Fri, 2006-03-17 at 00:53 -0700, Andreas Dilger wrote: > > I've been thinking recently that we should re-enable DIR_INDEX in mke2fs > > by default. When it first came out, we had done this and were bitten by > > a few bugs in the code. However, this code has been in heavy use for > > several thousand filesystem years in Lustre, if not elsewhere, and I'm > > inclined to think it is pretty safe these days. > > I reckon they are safe enough for general use. Who maintains the CONFIG_EXT3_INDEX code? 
From tytso at mit.edu Sat Mar 18 00:16:29 2006 From: tytso at mit.edu (Theodore Ts'o) Date: Fri, 17 Mar 2006 19:16:29 -0500 Subject: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default In-Reply-To: <20060317143630.300d82f8.akpm@osdl.org> References: <20060317075312.GG30801@schatzie.adilger.int> <1142634418.3641.62.camel@orbit.scot.redhat.com> <20060317143630.300d82f8.akpm@osdl.org> Message-ID: <20060318001629.GC17074@thunk.org> On Fri, Mar 17, 2006 at 02:36:30PM -0800, Andrew Morton wrote: > "Stephen C. Tweedie" wrote: > > > > On Fri, 2006-03-17 at 00:53 -0700, Andreas Dilger wrote: > > > I've been thinking recently that we should re-enable DIR_INDEX in mke2fs > > > by default. When it first came out, we had done this and were bitten by > > > a few bugs in the code. However, this code has been in heavy use for > > > several thousand filesystem years in Lustre, if not elsewhere, and I'm > > > inclined to think it is pretty safe these days. > > > > I reckon they are safe enough for general use. > > Who maintains the CONFIG_EXT3_INDEX code? Well, I rewrote most of the code before getting them merged in 2.5, and implemented the e2fsprogs support for the same, so that would be probably be me, under the "last person who touched it" theory. That code has been pretty stable with very few bugs reported, so it hasn't needed an active maintainer, really. If there are any problems reported, feel free to send them my way. 
- Ted From akpm at osdl.org Sat Mar 18 00:32:34 2006 From: akpm at osdl.org (Andrew Morton) Date: Fri, 17 Mar 2006 16:32:34 -0800 Subject: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default In-Reply-To: <20060318001629.GC17074@thunk.org> References: <20060317075312.GG30801@schatzie.adilger.int> <1142634418.3641.62.camel@orbit.scot.redhat.com> <20060317143630.300d82f8.akpm@osdl.org> <20060318001629.GC17074@thunk.org> Message-ID: <20060317163234.74791ce9.akpm@osdl.org> "Theodore Ts'o" wrote: > > On Fri, Mar 17, 2006 at 02:36:30PM -0800, Andrew Morton wrote: > > "Stephen C. Tweedie" wrote: > > > > > > On Fri, 2006-03-17 at 00:53 -0700, Andreas Dilger wrote: > > > > I've been thinking recently that we should re-enable DIR_INDEX in mke2fs > > > > by default. When it first came out, we had done this and were bitten by > > > > a few bugs in the code. However, this code has been in heavy use for > > > > several thousand filesystem years in Lustre, if not elsewhere, and I'm > > > > inclined to think it is pretty safe these days. > > > > > > I reckon they are safe enough for general use. > > > > Who maintains the CONFIG_EXT3_INDEX code? > > Well, I rewrote most of the code before getting them merged in 2.5, > and implemented the e2fsprogs support for the same, so that would be > probably be me, under the "last person who touched it" theory. That > code has been pretty stable with very few bugs reported, so it hasn't > needed an active maintainer, really. If there are any problems > reported, feel free to send them my way. > OK, thanks. btw, I have some directory readahead rework queued for 2.6.17 (ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc6/2.6.16-rc6-mm1/broken-out/ext3_readdir-use-generic-readahead.patch). That's non-htree-only. Is there any sane way of doing htree directory readahead? 
From adilger at clusterfs.com Sat Mar 18 08:36:30 2006 From: adilger at clusterfs.com (Andreas Dilger) Date: Sat, 18 Mar 2006 01:36:30 -0700 Subject: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default In-Reply-To: <1142634418.3641.62.camel@orbit.scot.redhat.com> References: <20060317075312.GG30801@schatzie.adilger.int> <1142634418.3641.62.camel@orbit.scot.redhat.com> Message-ID: <20060318083630.GW30801@schatzie.adilger.int> On Mar 17, 2006 17:26 -0500, Stephen C. Tweedie wrote: > I reckon they are safe enough for general use. The only question mark > in my mind is over the change in behaviour for people who dual-boot or > swap data between newer and older distros. Well, both DIR_INDEX and RESIZE_INODE are COMPAT features, so if they are dual booting it shouldn't matter a bit. FC3/FC4/RHEL4 have all been using RESIZE_INODE and it is very unlikely that it would introduce a bug. The only thing I can imagine it causing problems with is if the e2fsprogs are old, but even then it isn't fatal as the filesystem is still writable and an updated e2fsck can be installed if needed. > One way around that that I've been wondering about would be to wait > until we have accumulated enough new features (extent maps/64-bit, > increase the default inode size etc.) and give the new feature set its > own explicit flag in mke2fs. IMHO, "bundling" changes like this is counter productive. We've had DIR_INDEX and RESIZE_INODE around for a long time already, and no point in lumping them in with code that is much less-well tested or supported (e.g. only 2.6.something kernels can even mount large inodes, which is a far cry from being read-write compatible). Also, if there ends up being a problem there is a lot more code to look at for changes. I'm more a fan of smaller incremental changes. > It might be something we could call ext4 (ie. enable it if mke2fs is > called as "mke4fs"); we might just add a separate flag. 
Whatever way > we chose, it would want to be something that would stand out as an > obviously non-backwards-compatible formatting option. Interesting. I thought we were against calling new feature sets ext(n+1), but that would be one way of doing it. I'm also not sure why you would consider these two features as non-backwards-compatible format options? For sure, they would only be enabled in conjunction with "-j", since ext2 doesn't support them. At worst they will be ignored by older kernels. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From adilger at clusterfs.com Sat Mar 18 08:43:02 2006 From: adilger at clusterfs.com (Andreas Dilger) Date: Sat, 18 Mar 2006 01:43:02 -0700 Subject: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default In-Reply-To: <20060317143630.300d82f8.akpm@osdl.org> References: <20060317075312.GG30801@schatzie.adilger.int> <1142634418.3641.62.camel@orbit.scot.redhat.com> <20060317143630.300d82f8.akpm@osdl.org> Message-ID: <20060318084302.GX30801@schatzie.adilger.int> On Mar 17, 2006 14:36 -0800, Andrew Morton wrote: > "Stephen C. Tweedie" wrote: > > On Fri, 2006-03-17 at 00:53 -0700, Andreas Dilger wrote: > > > I've been thinking recently that we should re-enable DIR_INDEX in mke2fs > > > by default. When it first came out, we had done this and were bitten by > > > a few bugs in the code. However, this code has been in heavy use for > > > several thousand filesystem years in Lustre, if not elsewhere, and I'm > > > inclined to think it is pretty safe these days. > > > > I reckon they are safe enough for general use. > > Who maintains the CONFIG_EXT3_INDEX code? CFS made a bunch of fixes to it a few years ago, when it wasn't in either the 2.4 or 2.6 kernels. A few times in the intervening years there have been problems fixed, but nothing since then. 
The directory structure (or lack thereof) of the Lustre storage servers (OSTs) is such that they will always have tens to hundreds of thousands of files in each directory, so the code is definitely well-tested. We don't modify the htree code at all in either 2.6.9-RHEL or 2.6.12, only 2.6.5 SLES9 has a small patch for "." and ".." lookup. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From porton at ex-code.com Sat Mar 18 09:22:11 2006 From: porton at ex-code.com (Victor Porton,,,) Date: Sat, 18 Mar 2006 14:22:11 +0500 Subject: Bug: ./ != $PWD Message-ID: Hi, I can't tell the reason and conditions to repeat this bug... "ls" and "ls $PWD" produce different directory listing (of different times, so a subdir is existing in one listing and missing in an other). ls . $PWD Linux 2.6.15.6 GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu) Linux 2.6.15.6 PREEMPT i686 /dev/hda5 on / type ext3 (rw,errors=remount-ro,data=ordered,commit=60,quota) A spirit junklies: This bug was noted by me during development of a directory listing cache. (It cannot be the cause of the bug because I have not yet even compiled this library :-).) Anyway, maybe it is caused by hibernation... I will reboot. -- Victor Porton - http://porton.ex-code.com From adilger at clusterfs.com Sat Mar 18 10:16:15 2006 From: adilger at clusterfs.com (Andreas Dilger) Date: Sat, 18 Mar 2006 03:16:15 -0700 Subject: Bug: ./ != $PWD In-Reply-To: References: Message-ID: <20060318101615.GA30801@schatzie.adilger.int> On Mar 18, 2006 14:22 +0500, Victor Porton,,, wrote: > I can't tell the reason and conditions to repeat this bug... > > "ls" and "ls $PWD" produce different directory listing (of > different times, so a subdir is existing in one listing and > missing in an other). > > ls . 
$PWD

This can happen if your directory is renamed and recreated:

$ mkdir /tmp/foo
$ pushd /tmp/foo
$ echo $PWD
/tmp/foo
$ mv /tmp/foo /tmp/bar
$ echo $PWD
/tmp/foo
$ mkdir /tmp/foo
$ touch bar
$ touch $PWD/foo
$ ls . $PWD
.:
total 0
0 bar

/tmp/foo:
total 0
0 foo

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From aspanke at gmail.com Sat Mar 18 19:02:51 2006
From: aspanke at gmail.com (Malcom)
Date: Sat, 18 Mar 2006 20:02:51 +0100
Subject: ext3 - max filesystem size
In-Reply-To: <441C5926.20804@t-online.de>
References: <441C5926.20804@t-online.de>
Message-ID: <441C595B.7070505@gmail.com>

Hi all,

I am working with a PC cluster running Red Hat EL 4 on Opteron CPUs.
We have several bigger RAID systems locally attached to the fileservers;
now I would like to create a big striped filesystem of around 15TB.

ext3 unfortunately only supports filesystem sizes up to 8TB. Do you have
an idea if / when this limit will be increased? I have already found some
discussions on LKML about it.

Which FS would be a good alternative? AFAIK xfs is not supported by
Red Hat EL 4 ...

thanks for any hint,

cheers
alex

From tytso at mit.edu Sat Mar 18 22:54:33 2006
From: tytso at mit.edu (Theodore Ts'o)
Date: Sat, 18 Mar 2006 17:54:33 -0500
Subject: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default
In-Reply-To: <20060317163234.74791ce9.akpm@osdl.org>
References: <20060317075312.GG30801@schatzie.adilger.int> <1142634418.3641.62.camel@orbit.scot.redhat.com> <20060317143630.300d82f8.akpm@osdl.org> <20060318001629.GC17074@thunk.org> <20060317163234.74791ce9.akpm@osdl.org>
Message-ID: <20060318225433.GJ21232@thunk.org>

On Fri, Mar 17, 2006 at 04:32:34PM -0800, Andrew Morton wrote:
>
> btw, I have some directory readahead rework queued for 2.6.17
> (ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc6/2.6.16-rc6-mm1/broken-out/ext3_readdir-use-generic-readahead.patch).
>
> That's non-htree-only. Is there any sane way of doing htree directory
> readahead?

Not really, since the htree readdir() accesses directory blocks in hash
tree order. We could do a speculative read if we had kernel
infrastructure to determine whether or not we had spare disk bandwidth,
and only do the speculative readahead if we didn't have more important
blocks to read, but it's not going to buy us much in terms of
sequential readahead.

In general, I doubt directory readahead actually buys you *that* much,
because most workloads follow up the readdir with a stat() or an
open() call for each file returned, or something which requires
reading in the inode. In addition, it's rare that the directory will
be contiguously allocated, which also cuts down on the value of the
readahead.

What we could do that would accelerate readdir() for htree would be to
build an entirely separate tree keyed by inode number, and let readdir
iterate on that structure. That would return files sorted by inode
number, which would speed up the readdir/stat or readdir/open
workload. That could be done as a COMPAT extension, at the cost of
doubling the amount of space required to store a directory, and
doubling the cost of adding or deleting an entry to that directory.
It's not something that you would want to do for all directories, in
all likelihood, but for certain application-specific directory
structures, it would definitely speed things up.

- Ted

From tytso at mit.edu Mon Mar 20 02:24:10 2006
From: tytso at mit.edu (Theodore Ts'o)
Date: Sun, 19 Mar 2006 21:24:10 -0500
Subject: [Ext2-devel] Re: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default
In-Reply-To: <1142762366.3018.11.camel@laptopd505.fenrus.org>
References: <20060317075312.GG30801@schatzie.adilger.int> <1142634418.3641.62.camel@orbit.scot.redhat.com> <20060317143630.300d82f8.akpm@osdl.org> <20060318001629.GC17074@thunk.org> <20060317163234.74791ce9.akpm@osdl.org> <20060318225433.GJ21232@thunk.org> <1142762366.3018.11.camel@laptopd505.fenrus.org>
Message-ID: <20060320022410.GC17337@thunk.org>

On Sun, Mar 19, 2006 at 10:59:26AM +0100, Arjan van de Ven wrote:
>
> > In general, I doubt directory readahead actually buys you *that* much,
> > because most workloads follow up the readdir with a stat() or an
> > open() call for each file returned, or something which requires
> > reading in the inode. In addition, it's rare that the directory will
> > be contiguously allocated, which also cuts down on the value of the
> > readahead.
>
> what it buys you shows up on raid1 I suppose... while disk 0 is doing
> the head seek for the stat/open, disk1 is doing the head seek for the
> next directory entry.... so I can see that being a big gain in that
> specific circumstance.
>

As long as you have the spare disk bandwidth and disk1 isn't needed
reading the directory entry for some other make process as part of the
"make -j16" kernel build...

- Ted

From sct at redhat.com Mon Mar 20 18:19:46 2006
From: sct at redhat.com (Stephen C.
Tweedie) Date: Mon, 20 Mar 2006 13:19:46 -0500 Subject: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default In-Reply-To: <20060318084302.GX30801@schatzie.adilger.int> References: <20060317075312.GG30801@schatzie.adilger.int> <1142634418.3641.62.camel@orbit.scot.redhat.com> <20060317143630.300d82f8.akpm@osdl.org> <20060318084302.GX30801@schatzie.adilger.int> Message-ID: <1142878786.3414.27.camel@orbit.scot.redhat.com> On Sat, 2006-03-18 at 01:43 -0700, Andreas Dilger wrote: > We don't modify the htree code at all in either 2.6.9-RHEL or 2.6.12, only > 2.6.5 SLES9 has a small patch for "." and ".." lookup. Actually, RHEL-4 (based off 2.6.9) has needed one htree-related fix: http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=acfa1823d33859b0db77701726c9ca5ccc6e6f25 which affects NFS lookup-parent only, so Lustre probably wouldn't need it. That was the last one I'm aware of; it has been robust other than that, and the RHEL and Fedora installers have been doing tune2fs -O dir_index to all filesystems by default since RHEL-4 / FC-2 (ie. for over 18 months,) so it's definitely well tested here. --Stephen From sct at redhat.com Mon Mar 20 20:55:10 2006 From: sct at redhat.com (Stephen C. Tweedie) Date: Mon, 20 Mar 2006 15:55:10 -0500 Subject: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default In-Reply-To: <20060318083630.GW30801@schatzie.adilger.int> References: <20060317075312.GG30801@schatzie.adilger.int> <1142634418.3641.62.camel@orbit.scot.redhat.com> <20060318083630.GW30801@schatzie.adilger.int> Message-ID: <1142888111.21593.29.camel@orbit.scot.redhat.com> HI, On Sat, 2006-03-18 at 01:36 -0700, Andreas Dilger wrote: > On Mar 17, 2006 17:26 -0500, Stephen C. Tweedie wrote: > > I reckon they are safe enough for general use. The only question mark > > in my mind is over the change in behaviour for people who dual-boot or > > swap data between newer and older distros. 
> Well, both DIR_INDEX and RESIZE_INODE are COMPAT features, so if they > are dual booting it shouldn't matter a bit. FC3/FC4/RHEL4 have all > been using RESIZE_INODE and it is very unlikely that it would introduce > a bug. The only thing I can imagine it causing problems with is if the > e2fsprogs are old That's exactly the case I'm thinking about, because we already see complaints when newly-created filesystems cause older systems to fail to boot --- e2fsck is run by default at boot time, and the default behaviour (at least on RHEL/Fedora systems) is to drop to a (password-protected) root prompt if fsck fails. Note that the filesystem itself isn't the only reason for such boot incompatibilities, either; the new MLS support in SELinux resulted in file security contexts that older kernels can't decode, for example. But I'd rather not make it worse. > > One way around that that I've been wondering about would be to wait > > until we have accumulated enough new features (extent maps/64-bit, > > increase the default inode size etc.) and give the new feature set its > > own explicit flag in mke2fs. > > IMHO, "bundling" changes like this is counter productive. We've had > DIR_INDEX and RESIZE_INODE around for a long time already, and no point > in lumping them in with code that is much less-well tested or supported > (e.g. only 2.6.something kernels can even mount large inodes, which is > a far cry from being read-write compatible). Bundling them like that only makes sense when we've got to the point where we've been able to merge a significant portion of the outstanding INCOMPAT changes, agreed. But if that's something that is achievable reasonably soon, it would alleviate some of the pressure to change the existing defaults. > > It might be something we could call ext4 (ie. enable it if mke2fs is > > called as "mke4fs"); we might just add a separate flag. 
Whatever way > > we chose, it would want to be something that would stand out as an > > obviously non-backwards-compatible formatting option. > > Interesting. I thought we were against calling new feature sets ext(n+1), > but that would be one way of doing it. I'm against giving the kernel itself any knowledge about it, simply because the existing feature sets are both sufficient and way more flexible than any major revision number. But from the user's perspective, once we've merged lots of incompat changes and we think this new extended format is going to be stable for a bit, a single flag to switch between old (maximal compatibility, no new incompat features) and new (give me _all_ the new features, I know I won't be using an old kernel) makes sense. (Nearly all of the incompat features on the table do make sense to have on by default. Perhaps the most questionable one is larger inodes, because of the tradeoff between functionality and capacity that that implies; but if people really start relying on fast xattrs for samba4, fine-grained timestamps etc., then even that one might make sense to be on by default in the new feature-set.) > I'm also not sure why you would > consider these two features as non-backwards-compatible format options? It's purely because of the fsck boot-time compatibility concern; there are absolutely no kernel compatibility problems I'm aware of. And the fsck-time bits can be avoided by removing the feature with tune2fs and fsck, so it's not something I'm dead set against. 
--Stephen From adilger at clusterfs.com Mon Mar 20 21:14:01 2006 From: adilger at clusterfs.com (Andreas Dilger) Date: Mon, 20 Mar 2006 14:14:01 -0700 Subject: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default In-Reply-To: <1142878786.3414.27.camel@orbit.scot.redhat.com> References: <20060317075312.GG30801@schatzie.adilger.int> <1142634418.3641.62.camel@orbit.scot.redhat.com> <20060317143630.300d82f8.akpm@osdl.org> <20060318084302.GX30801@schatzie.adilger.int> <1142878786.3414.27.camel@orbit.scot.redhat.com> Message-ID: <20060320211401.GG6199@schatzie.adilger.int> On Mar 20, 2006 13:19 -0500, Stephen C. Tweedie wrote: > On Sat, 2006-03-18 at 01:43 -0700, Andreas Dilger wrote: > > We don't modify the htree code at all in either 2.6.9-RHEL or 2.6.12, only > > 2.6.5 SLES9 has a small patch for "." and ".." lookup. > > Actually, RHEL-4 (based off 2.6.9) has needed one htree-related fix: > > http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=acfa1823d33859b0db77701726c9ca5ccc6e6f25 > > which affects NFS lookup-parent only, so Lustre probably wouldn't need > it. That's why we don't need to patch 2.6.9-RHEL4 :-). > That was the last one I'm aware of; it has been robust other than that, > and the RHEL and Fedora installers have been doing tune2fs -O dir_index > to all filesystems by default since RHEL-4 / FC-2 (ie. for over 18 > months,) so it's definitely well tested here. So, would you also be in favour of making these two options the default for the next e2fsprogs? Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From sct at redhat.com Mon Mar 20 21:58:57 2006 From: sct at redhat.com (Stephen C. 
Tweedie) Date: Mon, 20 Mar 2006 16:58:57 -0500 Subject: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default In-Reply-To: <20060320211401.GG6199@schatzie.adilger.int> References: <20060317075312.GG30801@schatzie.adilger.int> <1142634418.3641.62.camel@orbit.scot.redhat.com> <20060317143630.300d82f8.akpm@osdl.org> <20060318084302.GX30801@schatzie.adilger.int> <1142878786.3414.27.camel@orbit.scot.redhat.com> <20060320211401.GG6199@schatzie.adilger.int> Message-ID: <1142891937.21593.47.camel@orbit.scot.redhat.com> Hi, On Mon, 2006-03-20 at 14:14 -0700, Andreas Dilger wrote: > > That was the last one I'm aware of; it has been robust other than that, > > and the RHEL and Fedora installers have been doing tune2fs -O dir_index > > to all filesystems by default since RHEL-4 / FC-2 (ie. for over 18 > > months,) so it's definitely well tested here. > > So, would you also be in favour of making these two options the default for > the next e2fsprogs? I think we're probably at the right point to do so. Most people who are most likely to be affected have a reasonably recent e2fsprogs now. On the Fedora side I'm seeing very few reports of people bitten by e2fsprogs incompatibility, and more and more instances of people bitten the other way by filesystems not performing as well as expected due to missing dir_index flags. --Stephen From worleys at gmail.com Mon Mar 20 23:27:22 2006 From: worleys at gmail.com (Chris Worley) Date: Mon, 20 Mar 2006 16:27:22 -0700 Subject: fixing a corrupt /dev/hdar .. debugfs assistance... Message-ID: I used ddrescue to copy /dev/md1 to a disk of sufficient size, and re-ran e2fsck, and still get the error message that there's no root file system (I've tried most every superblock): # fsck -y -b 7962624 /dev/sdf fsck 1.36 (05-Feb-2005) e2fsck 1.36 (05-Feb-2005) Superblock has a bad ext3 journal (inode 8). Clear? yes *** ext3 journal has been deleted - filesystem is now ext2 only *** /dev/sdf was not cleanly unmounted, check forced. 
Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Missing '..' in directory inode 1785876. Fix? yes Entry '..' in ... (1785876) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 1785969. Fix? yes Entry '..' in ... (1785969) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 11436033. Fix? yes Entry '..' in ... (11436033) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 23953409. Fix? yes Entry '..' in ... (23953409) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 24281345. Fix? yes Entry '..' in ... (24281345) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 24692015. Fix? yes Entry '..' in ... (24692015) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25214978. Fix? yes Entry '..' in ... (25214978) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25214995. Fix? yes Entry '..' in ... (25214995) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25214998. Fix? yes Entry '..' in ... (25214998) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215001. Fix? yes Entry '..' in ... (25215001) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215004. Fix? yes Entry '..' in ... (25215004) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215036. Fix? yes Entry '..' in ... (25215036) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215040. Fix? yes Entry '..' in ... (25215040) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215043. Fix? yes Entry '..' in ... (25215043) points to inode (2) located in a bad block. Clear? yes Missing '..' 
in directory inode 25215101. Fix? yes Entry '..' in ... (25215101) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215104. Fix? yes Entry '..' in ... (25215104) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215154. Fix? yes Entry '..' in ... (25215154) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215157. Fix? yes Entry '..' in ... (25215157) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215160. Fix? yes Entry '..' in ... (25215160) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215178. Fix? yes Entry '..' in ... (25215178) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215197. Fix? yes Entry '..' in ... (25215197) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215200. Fix? yes Entry '..' in ... (25215200) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215222. Fix? yes Entry '..' in ... (25215222) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215472. Fix? yes Entry '..' in ... (25215472) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215475. Fix? yes Entry '..' in ... (25215475) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215514. Fix? yes Entry '..' in ... (25215514) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215524. Fix? yes Entry '..' in ... (25215524) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25215528. Fix? yes Entry '..' in ... (25215528) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25346149. Fix? yes Entry '..' in ... 
(25346149) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25346160. Fix? yes Entry '..' in ... (25346160) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25346163. Fix? yes Entry '..' in ... (25346163) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25346173. Fix? yes Entry '..' in ... (25346173) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25346190. Fix? yes Entry '..' in ... (25346190) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25346197. Fix? yes Entry '..' in ... (25346197) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 25346205. Fix? yes Entry '..' in ... (25346205) points to inode (2) located in a bad block. Clear? yes Missing '..' in directory inode 26460281. Fix? yes Entry '..' in ... (26460281) points to inode (2) located in a bad block. Clear? yes Filesystem contains large files, but lacks LARGE_FILE flag in superblock. Fix? yes Pass 3: Checking directory connectivity Root inode is not a directory; aborting. e2fsck: aborted fsck.ext2 /dev/sdf failed (status 0x8). Run manually! Any ideas how to recover? Thanks, Chris ----------------------------------------------- From: "Theodore Ts'o" To: Mike Moran Cc: ext3-users redhat com Subject: Re: fixing a corrupt /dev/hdar .. debugfs assistance... Date: Mon, 28 Jul 2003 16:53:07 -0400 ________________________________ On Mon, Jul 28, 2003 at 10:42:04AM -0500, Mike Moran wrote: > Had a drive crash which a very critital inode. e2fshck > returns a stream of : > > Entry '..' in ... 
(#######) points to inode (2) located in a bad block > > Pass 3: Checking directory connectivity > Root inode is not a directory: aborting > e2fsck: aborted > > /dev/hda4: ***** FILE SYSTEM WAS MODIFIED ***** > Floating point exception > > I'm wondering if I can use debugfs to relocate (rebuild) inode (2) to a > good block. If so, how would I go about doing it There is a bad block at the beginning of the inode table. Reconstructing the root inode is easy, but the problem is that the root inode must be at a fixed location, and you currently have a bad block located there. It might be possible to force the disk drive to use another block from its spare pool, but very often one bad block is just a prelude to another, and the value of the disk drive (< $200) is often in the noise compared to the value of the data stored on the disk drive (priceless). So the safest approach would be to get another disk, and use dd to copy the filesystem to another partition: dd if=/dev/hda4 of=/dev/hdb4 bs=1k conv=sync,noerror Then run e2fsck on the new disk; it will create a new root and lost+found directory, and move all of the inodes that were in the root directory and home them in the lost+found directory. You should be able to reconstruct the names of each of the directories in the lost+found directory from their contents. Good luck! - Ted From tytso at mit.edu Tue Mar 21 04:33:33 2006 From: tytso at mit.edu (Theodore Ts'o) Date: Mon, 20 Mar 2006 23:33:33 -0500 Subject: fixing a corrupt /dev/hdar .. debugfs assistance... In-Reply-To: References: Message-ID: <20060321043333.GE8257@thunk.org> On Mon, Mar 20, 2006 at 04:27:22PM -0700, Chris Worley wrote: > I used ddrescue to copy /dev/md1 to a disk of sufficient size, and > re-ran e2fsck, and still get the error message that there's no root > file system (I've tried most every superblock): Using debugfs, copy out the contents of "root inode"; since it might contain useful data, e2fsck didn't want to delete it out of hand.
debugfs: dump <2> /tmp/contents-of-inode-2 Then purge the inode away: debugfs: clri <2> Then run e2fsck, and it will create a new root directory for you. Hope this helps! - Ted From worleys at gmail.com Tue Mar 21 20:51:20 2006 From: worleys at gmail.com (Chris Worley) Date: Tue, 21 Mar 2006 13:51:20 -0700 Subject: fixing a corrupt /dev/hdar .. debugfs assistance... In-Reply-To: <20060321043333.GE8257@thunk.org> References: <20060321043333.GE8257@thunk.org> Message-ID: Thanks for the help. Does <2> refer to a superblock? I.e.: Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616 The debugfs man page says the first arg of "clri" refers to a "file"? Thanks, Chris On 3/20/06, Theodore Ts'o wrote: > On Mon, Mar 20, 2006 at 04:27:22PM -0700, Chris Worley wrote: > > I used ddrescue to copy /dev/md1 to a disk of sufficient size, and > > re-ran e2fsck, and still get the error message that there's no root > > file system (I've tried most every superblock): > > Using debugfs, copy out the contents of "root inode"; since it > might contain useful data, e2fsck didn't want to delete it out of > hand. > > debugfs: dump <2> /tmp/contents-of-inode-2 > > Then purge the inode away: > > debugfs: clri <2> > > Then run e2fsck, and it will create a new root directory for you. > > Hope this helps! > > - Ted > From tytso at mit.edu Wed Mar 22 02:40:12 2006 From: tytso at mit.edu (Theodore Ts'o) Date: Tue, 21 Mar 2006 21:40:12 -0500 Subject: fixing a corrupt /dev/hdar .. debugfs assistance... In-Reply-To: References: <20060321043333.GE8257@thunk.org> Message-ID: <20060322024012.GA10812@thunk.org> On Tue, Mar 21, 2006 at 01:51:20PM -0700, Chris Worley wrote: > Thanks for the help. > > Does <2> refer to a superblock? No, <2> refers to inode #2.
See the debugfs man page: SPECIFYING FILES Many debugfs commands take a filespec as an argument to specify an inode (as opposed to a pathname) in the filesystem which is currently opened by debugfs. The filespec argument may be specified in two forms. The first form is an inode number surrounded by angle brackets, e.g., <2>. The second form is a pathname; if the pathname is prefixed by a forward slash ('/'), then it is interpreted relative to the root of the filesystem which is currently opened by debugfs. If not, the pathname is interpreted relative to the current working directory as maintained by debugfs. This may be modified by using the debugfs command cd. - Ted From alexander.spanke at t-online.de Sat Mar 18 19:01:58 2006 From: alexander.spanke at t-online.de (Alexander Spanke) Date: Sat, 18 Mar 2006 20:01:58 +0100 Subject: ext3 - max filesystem size Message-ID: <441C5926.20804@t-online.de> Hi all, I am working with a PC cluster running Red Hat EL 4 on Opteron CPUs. We have several bigger RAID systems locally attached to the fileservers; now I would like to create a big striped filesystem of around 15TB. ext3 unfortunately only supports filesystem sizes up to 8TB. Do you have an idea if / when this limit will be increased? I already found some discussions on LKML about it. Which FS would be a good alternative? AFAIK xfs is not supported by Red Hat EL 4 ...
thanks for any hint, cheers alex From arjan at infradead.org Sun Mar 19 09:59:26 2006 From: arjan at infradead.org (Arjan van de Ven) Date: Sun, 19 Mar 2006 10:59:26 +0100 Subject: [Ext2-devel] Re: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default In-Reply-To: <20060318225433.GJ21232@thunk.org> References: <20060317075312.GG30801@schatzie.adilger.int> <1142634418.3641.62.camel@orbit.scot.redhat.com> <20060317143630.300d82f8.akpm@osdl.org> <20060318001629.GC17074@thunk.org> <20060317163234.74791ce9.akpm@osdl.org> <20060318225433.GJ21232@thunk.org> Message-ID: <1142762366.3018.11.camel@laptopd505.fenrus.org> > In general, I doubt directory readahead actually buys you *that* much, > because most workloads follow up the readdir with a stat() or an > open() call for each file returned, or something which requires > reading in the inode. In addition, it's rare that the directory will > be contiguously allocated, which also cuts down on the value of the > readahead. what it buys you shows up on raid1 I suppose... while disk 0 is doing the head seek for the stat/open, disk1 is doing the head seek for the next directory entry.... so I can see that being a big gain in that specific circumstance. 
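The access pattern discussed in the quoted text above (a readdir pass followed by a stat() or open() on each returned entry) can be sketched in a few lines. This is only an illustration of the workload, not code from any of the posters; the function name list_with_stat is made up for the example, and the comment about seeks describes the on-disk behaviour argued about in the thread rather than anything Python itself guarantees.

```python
import os
import tempfile


def list_with_stat(path):
    """Return (name, size) pairs: one directory read, then one stat per entry.

    Hypothetical helper for illustration only.
    """
    # Step 1: read the directory entries (names only).
    names = sorted(os.listdir(path))
    results = []
    for name in names:
        # Step 2: stat each entry. On ext2/ext3 this needs the inode,
        # which usually lives in a different place than the directory block.
        st = os.lstat(os.path.join(path, name))
        results.append((name, st.st_size))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for i in range(3):
            with open(os.path.join(d, "file%d" % i), "wb") as f:
                f.write(b"x" * i)
        for name, size in list_with_stat(d):
            print(name, size)
```

Each loop iteration alternates between directory data and inode data, which is why reading ahead in the directory alone may not help much unless a second disk can service one of the two streams.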
From arjan at infradead.org Mon Mar 20 09:03:33 2006 From: arjan at infradead.org (Arjan van de Ven) Date: Mon, 20 Mar 2006 10:03:33 +0100 Subject: [Ext2-devel] Re: [RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default In-Reply-To: <20060320022410.GC17337@thunk.org> References: <20060317075312.GG30801@schatzie.adilger.int> <1142634418.3641.62.camel@orbit.scot.redhat.com> <20060317143630.300d82f8.akpm@osdl.org> <20060318001629.GC17074@thunk.org> <20060317163234.74791ce9.akpm@osdl.org> <20060318225433.GJ21232@thunk.org> <1142762366.3018.11.camel@laptopd505.fenrus.org> <20060320022410.GC17337@thunk.org> Message-ID: <1142845414.3114.17.camel@laptopd505.fenrus.org> On Sun, 2006-03-19 at 21:24 -0500, Theodore Ts'o wrote: > On Sun, Mar 19, 2006 at 10:59:26AM +0100, Arjan van de Ven wrote: > > > > > In general, I doubt directory readahead actually buys you *that* much, > > > because most workloads follow up the readdir with a stat() or an > > > open() call for each file returned, or something which requires > > > reading in the inode. In addition, it's rare that the directory will > > > be contiguously allocated, which also cuts down on the value of the > > > readahead. > > > > what it buys you shows up on raid1 I suppose... while disk 0 is doing > > the head seek for the stat/open, disk1 is doing the head seek for the > > next directory entry.... so I can see that being a big gain in that > > specific circumstance. > > > > As long as you have the spare disk bandwifth and disk1 isn't needed > reading the directory entry for some other make process as part of the > "make -j16" kernel build... sure.... on the flipside.. 
if the readahead request is in the elevator it may be merged with that make ;) so I agree, it's not even a guaranteed win on raid1 From evilninja at gmx.net Thu Mar 23 20:23:07 2006 From: evilninja at gmx.net (Christian) Date: Thu, 23 Mar 2006 20:23:07 -0000 (GMT) Subject: ext3 - max filesystem size In-Reply-To: <441C595B.7070505@gmail.com> References: <441C5926.20804@t-online.de> <441C595B.7070505@gmail.com> Message-ID: <54834.192.18.1.4.1143145387.squirrel@housecafe.dyndns.org> Hi, On Sat, March 18, 2006 19:02, Malcom wrote: > ext3 unfortunately only supports filesystem size up to 8TB, do you have > an idea if / when this border will be increased ? I already found some > discussions on LKML about it ? according to [0] the max. filesystem size seems to be "2TiB to 32TiB", which depends on the blocksize with which the fs has been created, iirc. > Which FS would be a good alternative ? AFAIK xfs is not supported by > redhat el 4 ... doesn't Redhat ship with GFS too? might be exactly what you need for a cluster... Christian. [0] http://en.wikipedia.org/wiki/Comparison_of_file_systems -- BOFH excuse #442: Trojan horse ran out of hay From aspanke at gmail.com Fri Mar 24 09:58:32 2006 From: aspanke at gmail.com (Malcom) Date: Fri, 24 Mar 2006 10:58:32 +0100 Subject: ext3 - max filesystem size In-Reply-To: <54834.192.18.1.4.1143145387.squirrel@housecafe.dyndns.org> References: <441C5926.20804@t-online.de> <441C595B.7070505@gmail.com> <54834.192.18.1.4.1143145387.squirrel@housecafe.dyndns.org> Message-ID: Hi, On Sat, March 18, 2006 19:02, Malcom wrote: > > ext3 unfortunately only supports filesystem size up to 8TB, do you have > > > an idea if / when this border will be increased ? I already found some > > discussions on LKML about it ? > > according to [0] the max. filesystem size seems to be "2TiB to 32TiB", > which depends on the blocksize with which the fs has been created, iirc. You're right, but isn't it fixed with RH 4 ?
My information is that RH 4 only supports ext3 up to a maximum of 8 TB. Is it possible to tune this myself ? > > Which FS would be a good alternative ? AFAIK xfs is not supported by > > redhat el 4 ... > > doesn't Redhat ship with GFS too? might be exactly what you need for a > cluster... I will check it ... Alex > Christian. > > [0] http://en.wikipedia.org/wiki/Comparison_of_file_systems > -- > BOFH excuse #442: > Trojan horse ran out of hay > > From keld at dkuug.dk Tue Mar 28 05:28:59 2006 From: keld at dkuug.dk (Keld Jørn Simonsen) Date: Tue, 28 Mar 2006 07:28:59 +0200 Subject: Salvage or undelete files of damaged ext2/ext3 file systems Message-ID: <20060328052859.GA31460@rap.rap.dk> Hi! I have made some extensions to debugfs to undelete or recover files from a damaged ext3 file system. Salvage or undelete files of damaged ext2/ext3 file systems The debugfs salvage command can be used to salvage files from a damaged ext3 or ext2 file system. The code is alpha, so use at your own risk. The usage is: salvage first-block count-blocks Salvage tries to salvage files found in blocks starting from first-block and continuing for count-blocks blocks in total. If count-blocks is 0, or greater than the number of blocks in the file system, it is set to the number of blocks in the fs. Typical use: cd directory-to-hold-salvaged-files path/debugfs /dev/damaged-file-system salvage 1 0 Salvage could be useful if you have accidentally removed a lot of files in an ext2/ext3 file system, for example with an "rm -rf /" command, or if you accidentally reformatted the file system via an mkfs/mke2fs command, or if your hard disk had severe hardware problems. The filenames will be "ix1-" or "ix2-" followed by the block number of the indirect or double indirect block that defined the main part of the file. In this way salvage may be used to repair, rescue, recover or undelete files.
If a file already exists in the current directory, and is not writable, salvage will skip writing it. This can be used to skip writing of big files. less can be used to inspect the files. Newer versions of less can detect various file formats, such as gzip, bzip2, tar etc., and display something meaningful. Method: Salvage goes through all the blocks in the fs, and checks whether each could be an indirect or double indirect block. If so, it tries to recreate the file in the current directory. This involves some guessing about which indirect block corresponds to a double indirect block, and which initial data blocks correspond to an indirect block. The guessing, and the fact that some blocks may have been reused, unfortunately means that there may be errors in the salvaged files. So check them afterwards. The code is alpha and has only been tested on a Linux i386 system. Salvage is an addition to the debugfs program in the e2fsprogs package by Ted Ts'o. Download: http://www.dkuug.dk/keld/e2fsprogs-1.38-ks1.tar.gz. Then follow INSTALL instructions. Author: Keld Simonsen, keld at dkuug.dk From worleys at gmail.com Tue Mar 28 16:21:50 2006 From: worleys at gmail.com (Chris Worley) Date: Tue, 28 Mar 2006 09:21:50 -0700 Subject: fixing a corrupt /dev/hdar .. debugfs assistance... In-Reply-To: <20060322024012.GA10812@thunk.org> References: <20060321043333.GE8257@thunk.org> <20060322024012.GA10812@thunk.org> Message-ID: Thanks for the clarification, I obviously didn't read the man page as thoroughly as I should have. I was able to zero-out the 2nd inode (but, I could not dump it to a file... the resulting file was 0 bytes), and e2fsck got beyond that point, but is now segfaulting in the 4th pass. The disk I'm DD'ing to is firewire/USB, so I can move it to different systems pretty easily.
The USB/Firewire disk has more blocks than the MD1 device I'm trying to resurrect: I couldn't create a partition of the exact same size, so I just use the whole device (so there will be trailing garbage at the end of the device/partition that's not part of the file system). On one system, fsck segfaults after the 4th pass, on another system, it nicely reports the segfault and exits: Warning... fsck.ext2 for device /dev/sdc exited with signal 11. fsck.ext2 /dev/sdc failed (status 0x8). Run manually! The last message before the segfault, on either the faulty md1 or the backup system is: i_fsize for inode 7663554 (...) is 150, should be zero. Clear? yes Why always the same errors? Doesn't e2fsck commit the change when you respond "Y"? The same event occurs if I use a different superblock. Note that in using dd or dd_rescue, there are no errors reading from the bad disk device (md1). Off topic: It's also strange that my system w/ a 2.6.11 kernel won't mount or fsck the USB/Firewire drive, even after making a file system from scratch. On 3/21/06, Theodore Ts'o wrote: > On Tue, Mar 21, 2006 at 01:51:20PM -0700, Chris Worley wrote: > > Thanks for the help. > > > > Does <2> refer to a superblock? > > No, <2> refers to inode #2. See the debugfs man page: > > SPECIFYING FILES > Many debugfs commands take a filespec as an argument to specify an > inode (as opposed to a pathname) in the filesystem which is currently > opened by debugfs. The filespec argument may be specified in two > forms. The first form is an inode number surrounded by angle brackets, > e.g., <2>. The second form is a pathname; if the pathname is prefixed > by a forward slash ('/'), then it is interpreted relative to the root > of the filesystem which is currently opened by debugfs. If not, the > pathname is interpreted relative to the current working directory as > maintained by debugfs. This may be modified by using the debugfs com- > mand cd. 
> > - Ted > From worleys at gmail.com Tue Mar 28 17:15:03 2006 From: worleys at gmail.com (Chris Worley) Date: Tue, 28 Mar 2006 10:15:03 -0700 Subject: fixing a corrupt /dev/hdar .. debugfs assistance... In-Reply-To: References: <20060321043333.GE8257@thunk.org> <20060322024012.GA10812@thunk.org> Message-ID: Note that I was able to mount the file system, and as Ted said, all the directories were under a directory in lost+found. Can I assume that if a file exists in the subdirectories that it's okay? Or, in other words, what's the best way to assure the the files found are good? Thanks, Chris On 3/28/06, Chris Worley wrote: > Thanks for the clarification, I obviously didn't read the man page as > thoroughly as I should have. > > I was able to zero-out the 2nd inode (but, I could not dump it to a > file... the resulting file was 0 bytes), and e2fsck got beyond that > point, but is no segfaulting in the 4th pass. > > The disk I'm DD'ing to is firewire/USB, so I can move it to different > systems pretty easily. The USB/Firewire disk has more blocks than the > MD1 device I'm trying to resurrect: I couldn't create a partition of > the exact same size, so I just use the whole device (so there will be > trailing garbage at the end of the device/partition that's not part of > the file system). > > On one system, fsck segfaults after the 4th pass, on another system, > it nicely reports the segfault and exits: > > Warning... fsck.ext2 for device /dev/sdc exited with signal 11. > fsck.ext2 /dev/sdc failed (status 0x8). Run manually! > > The last message before the segfault, on either the faulty md1 or the > backup system is: > > i_fsize for inode 7663554 (...) is 150, should be zero. > Clear? yes > > Why always the same errors? Doesn't e2fsck commit the change when you > respond "Y"? > > The same event occurs if I use a different superblock. > > Note that in using dd or dd_rescue, there are no errors reading from > the bad disk device (md1). 
> > Off topic: It's also strange that my system w/ a 2.6.11 kernel won't > mount or fsck the USB/Firewire drive, even after making a file system > from scratch. > > > On 3/21/06, Theodore Ts'o wrote: > > On Tue, Mar 21, 2006 at 01:51:20PM -0700, Chris Worley wrote: > > > Thanks for the help. > > > > > > Does <2> refer to a superblock? > > > > No, <2> refers to inode #2. See the debugfs man page: > > > > SPECIFYING FILES > > Many debugfs commands take a filespec as an argument to specify an > > inode (as opposed to a pathname) in the filesystem which is currently > > opened by debugfs. The filespec argument may be specified in two > > forms. The first form is an inode number surrounded by angle brackets, > > e.g., <2>. The second form is a pathname; if the pathname is prefixed > > by a forward slash ('/'), then it is interpreted relative to the root > > of the filesystem which is currently opened by debugfs. If not, the > > pathname is interpreted relative to the current working directory as > > maintained by debugfs. This may be modified by using the debugfs com- > > mand cd. > > > > - Ted > > > From hahaha_30k at yahoo.com Tue Mar 28 21:52:57 2006 From: hahaha_30k at yahoo.com (Robinson Tiemuqinke) Date: Tue, 28 Mar 2006 13:52:57 -0800 (PST) Subject: FC5: "ext_attr" and "large_file" features for ext3 file systems ??? Message-ID: <20060328215257.28237.qmail@web36702.mail.mud.yahoo.com> Hi, Fedora Core ext3 file systems newbie questions: Just interested in the Linux ext3 features but got confused with "large_file" and "ext_attr". First, what's the "large_file" feature REALLY means? For file systems created with same commands and options some file systems have it on while some not. It is said that the feature is automatic -- If there is a large file then it is lit on otherwise it is off. Then, what's the size of "large file" to light this feature on? 2GB, or 2TB? or varies with the kernel version? 
Second, the "ext_attr" feature seems to be another automatic one: it only appears after the first "setfacl" command runs on the file system, and then the feature stays on forever, even if the ACL is removed. What does the "ext_attr" feature indicate, and what are the reasons behind having this feature? Thanks a lot. From digvijoy_chatterjee at infosys.com Wed Mar 29 05:55:08 2006 From: digvijoy_chatterjee at infosys.com (Digvijoy Chatterjee) Date: Wed, 29 Mar 2006 11:25:08 +0530 Subject: FC5: "ext_attr" and "large_file" features for ext3 file systems ??? In-Reply-To: <20060328215257.28237.qmail@web36702.mail.mud.yahoo.com> References: <20060328215257.28237.qmail@web36702.mail.mud.yahoo.com> Message-ID: <1143611708.29731.1.camel@linux.site> ext_attr is for extended attributes. Things like access control lists are not part of an ext3 FS by default; if you want ACLs, the ext_attr feature will tell you. large_file is for files beyond 2 GB, but when it switches on automatically, I have no idea. Digz On Tue, 2006-03-28 at 13:52 -0800, Robinson Tiemuqinke wrote: > Hi, > > Fedora Core ext3 file systems newbie questions: > > Just interested in the Linux ext3 features but got > confused with "large_file" and "ext_attr". > > First, what's the "large_file" feature REALLY means? > For file systems created with same commands and > options some file systems have it on while some not. > It is said that the feature is automatic -- If there > is a large file then it is lit on otherwise it is off. > Then, what's the size of "large file" to light this > feature on? 2GB, or 2TB? or varies with the kernel > version? > > Second, the "ext_attr" feature seems another > automatic one: it only appears after the first > "setfacl" command runs on the file system and then the > feature will keep on there forever even ACL is > removed.
What's the indication of "ext_attr" feature > and what are the reasons behind to have this feature? > > Thanks a lot. > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users
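On the "2GB, or 2TB?" question: as far as I can tell from the e2fsprogs documentation, large_file is about files larger than 2 GB, i.e. sizes that no longer fit in a signed 32-bit field, and modern kernels set it when the first such file is created. The sketch below only illustrates that threshold and how one might pick the feature names out of "tune2fs -l" style text. The function names are invented for the example, and the sample line is illustrative text, not output captured from a real filesystem.

```python
# Assumption: the large_file threshold is 2 GiB, the first size that
# does not fit in a signed 32-bit byte count.
LARGE_FILE_THRESHOLD = 2 ** 31


def needs_large_file(size_bytes):
    """True if a file of this size would require the large_file feature."""
    return size_bytes >= LARGE_FILE_THRESHOLD


def parse_features(listing):
    """Return the feature names from a 'Filesystem features:' line, if any."""
    for line in listing.splitlines():
        if line.startswith("Filesystem features:"):
            return line.split(":", 1)[1].split()
    return []


if __name__ == "__main__":
    # Illustrative sample only, not real tool output.
    sample = "Filesystem features:      has_journal ext_attr filetype large_file"
    features = parse_features(sample)
    print("ext_attr present:", "ext_attr" in features)
    print("large_file present:", "large_file" in features)
    print("2 GiB - 1 needs large_file:", needs_large_file(2 ** 31 - 1))
    print("2 GiB needs large_file:", needs_large_file(2 ** 31))
```

This would also explain why two filesystems created with the same options can differ: only the one that has ever held a file of that size gets the flag.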