From philipp.marek at bmlv.gv.at Wed Mar 2 12:14:01 2005
From: philipp.marek at bmlv.gv.at (Ph. Marek)
Date: Wed, 2 Mar 2005 13:14:01 +0100
Subject: searching for ext3 defrag/file move program
Message-ID: <200503021314.01396.philipp.marek@bmlv.gv.at>

Hello everybody,

reading about the speed improvements possible with (on boot) preloaded files
(which should be contiguous on disk) I searched for an ext3 defrag program.

I found an ext2 defrag program
(http://www.ibiblio.org/pub/Linux/system/filesystems/defrag-0.70.tar.gz,
available in debian as defrag) which would have an optimal feature (moving
files by a list) but refuses to work on ext3.

Is there a version which does ext3? Or has somebody a program which allows me
to move files on the disk?

And BTW, is there an easy way (no kernel patching if possible) to determine
which files are used in which order during boot?

Regards,

Phil

From tytso at mit.edu Wed Mar 2 14:49:21 2005
From: tytso at mit.edu (Theodore Ts'o)
Date: Wed, 2 Mar 2005 09:49:21 -0500
Subject: 1.36 again
In-Reply-To: <20050228173935.GE30585@ti64.telemetry-investments.com>
References: <200502250824.45902.gene@czarc.net>
 <20050225162815.GA6082@thunk.org>
 <20050225182714.GT27352@schnapps.adilger.int>
 <20050225191807.GA21103@ti64.telemetry-investments.com>
 <20050226163812.GA15346@thunk.org>
 <20050228173935.GE30585@ti64.telemetry-investments.com>
Message-ID: <20050302144921.GA12744@thunk.org>

On Mon, Feb 28, 2005 at 12:39:35PM -0500, Bill Rugolsky Jr. wrote:
> On Sat, Feb 26, 2005 at 11:38:12AM -0500, Theodore Ts'o wrote:
> > E2fsprogs.spec *is* supposed to be a distro-neutral spec file, but I
> > don't regularly use an rpm-based distribution these days, so I am
> > depending on others to report bugs and suggest patches.  It would be
> > helpful though if people actually *tried* to use it as opposed to making
> > incorrect assertions on the mailing list.  :-)
>
> Sorry Ted, ENOCAFFEINE.
> It turns out that rpmbuild -ta is incompatible
> with my ~/.rpmmacro file (which includes %{name} in the paths).  Using
> the standard layout, the %find_lang fails on FC3.  As penance, I'll
> figure out what's wrong and offer up a patch. :-)

I think I know what's going on here.  My fault; when I moved the .gmo
files to the source tree, instead of making them purely generated
files (the GNU i18n tools and makefiles having rather loose definitions
of backwards compatibility, and this was recommended by the GNU i18n
maintainer as the best way to be compatible with systems using older
versions of the tools), I forgot to fix the gen-tarball script so it
wouldn't remove the .gmo files from the source tarball.

As a result, the reason why %find_lang doesn't work is that the .gmo
files aren't present, and the po Makefile doesn't generate them normally
anymore (since that would break on, for example, old Slackware
systems that don't have the most recent version of the GNU i18n
tools).  I didn't notice this problem since I normally build out of my
BK tree, and while I do a test build of the generated final tarball, I
didn't test "make install".

The fix is to either back out the change which suppresses the generation
of the .gmo files (which should work for FC3, since it should have a
recent enough version of the i18n tools so that it's compatible with the
po/Makefile.in.in shipped with e2fsprogs), or to spin a new release of
e2fsprogs that actually has the .gmo files.

I suspect I'll probably put out an e2fsprogs 1.37 fairly rapidly to
avoid this problem on other distributions.
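A packaging slip like the one described above can be caught by looking
inside the release tarball before it goes out.  The following is only a
rough sketch in Python, not part of e2fsprogs or its gen-tarball script:
the helper names missing_gmo and demo_tarball, the po/ layout and the
archive member names are assumptions made for illustration.  It lists
every .po catalog that has no matching .gmo file next to it.

```python
import io
import tarfile


def missing_gmo(tar, po_dir="po/"):
    """Return the .po catalogs under po_dir that have no sibling .gmo file."""
    names = set(tar.getnames())
    catalogs = [n for n in names if po_dir in n and n.endswith(".po")]
    return sorted(c for c in catalogs if c[:-len(".po")] + ".gmo" not in names)


def demo_tarball():
    """Build a tiny in-memory tarball; the member names are made up."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("pkg-x.y/po/de.po", "pkg-x.y/po/de.gmo", "pkg-x.y/po/fr.po"):
            tar.addfile(tarfile.TarInfo(name))  # empty member, size 0
    buf.seek(0)
    return buf


with tarfile.open(fileobj=demo_tarball(), mode="r") as tar:
    print(missing_gmo(tar))  # ['pkg-x.y/po/fr.po']
```

Pointed at a real source tarball instead of the in-memory demo, an empty
result would suggest that every catalog shipped with its compiled
counterpart.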
- Ted

From tytso at mit.edu Wed Mar 2 16:45:04 2005
From: tytso at mit.edu (Theodore Ts'o)
Date: Wed, 2 Mar 2005 11:45:04 -0500
Subject: Failures they e2fsck doesn't find
In-Reply-To: <20050213212639.GA24996@alea.gnuu.de>
References: <20050204164527.GA24817@alea.gnuu.de>
 <20050208180547.GH2635@schnapps.adilger.int>
 <20050208234449.GA8405@alea.gnuu.de>
 <20050209182657.GO2635@schnapps.adilger.int>
 <20050210134003.GA14472@alea.gnuu.de>
 <20050210172341.GS2635@schnapps.adilger.int>
 <20050213212639.GA24996@alea.gnuu.de>
Message-ID: <20050302164502.GA15671@thunk.org>

Joerg,

How big is the filesystem?  Would you be willing to send me
the output of e2image -r /dev/hdXX - | bzip2 > hdXX.img.gz?  Many
thanks!!

- Ted

From joerg at alea.gnuu.de Wed Mar 2 18:06:55 2005
From: joerg at alea.gnuu.de (Jörg Sommer)
Date: Wed, 2 Mar 2005 19:06:55 +0100
Subject: Failures they e2fsck doesn't find
In-Reply-To: <20050302164502.GA15671@thunk.org>
References: <20050204164527.GA24817@alea.gnuu.de>
 <20050208180547.GH2635@schnapps.adilger.int>
 <20050208234449.GA8405@alea.gnuu.de>
 <20050209182657.GO2635@schnapps.adilger.int>
 <20050210134003.GA14472@alea.gnuu.de>
 <20050210172341.GS2635@schnapps.adilger.int>
 <20050213212639.GA24996@alea.gnuu.de>
 <20050302164502.GA15671@thunk.org>
Message-ID: <20050302180655.GA6225@alea.gnuu.de>

Theodore Ts'o wrote on Wed 02. Mar, 11:45 (-0500):
> Joerg,
>
> How big is the filesystem?  Would you be willing to send me

nearly 17GB.

> the output of e2image -r /dev/hdXX - | bzip2 > hdXX.img.gz?

Ehmm, no. My PGP-Key is on this filesystem and I don't have space to
create the image. Can I extract only parts?

BTW, one idea: I have an Apple machine. Might there be a problem with
endianness or with signed/unsigned int?

Jörg.

--
If our brain were so simple that it could understand itself, it would
no longer be able to understand itself, because it would be too simple ;)
Just a bit of philosophical food for thought...
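On the endianness question above: ext2/ext3 store their on-disk fields in
little-endian order, so a tool running on a big-endian machine has to
decode them explicitly rather than read them in host byte order.  The
sketch below is an illustration only, not code from e2fsprogs; the
function names are made up, and the "device" is a fake 2 KiB buffer that
contains nothing but the superblock magic number (0xEF53, stored at
byte 56 of the superblock, which itself starts 1024 bytes into the
device).

```python
import struct

SUPERBLOCK_OFFSET = 1024  # the primary superblock starts 1024 bytes in
MAGIC_OFFSET = 56         # offset of the s_magic field inside the superblock
EXT2_MAGIC = 0xEF53


def read_magic(image):
    """Decode s_magic as an explicit little-endian 16-bit value."""
    (magic,) = struct.unpack_from("<H", image, SUPERBLOCK_OFFSET + MAGIC_OFFSET)
    return magic


def looks_like_ext2(image):
    """True if the buffer carries the ext2/ext3 magic at the expected place."""
    return read_magic(image) == EXT2_MAGIC


# Fake device: all zeros except for the magic number.
fake = bytearray(2048)
struct.pack_into("<H", fake, SUPERBLOCK_OFFSET + MAGIC_OFFSET, EXT2_MAGIC)

print(looks_like_ext2(bytes(fake)))  # True

# Reading the same two bytes as big-endian gives a different number,
# which is the kind of mistake a byte-order bug would produce.
(wrong,) = struct.unpack_from(">H", fake, SUPERBLOCK_OFFSET + MAGIC_OFFSET)
print(hex(wrong))  # 0x53ef
```

Because the byte order is spelled out in the format string, the same code
gives the same answer on little-endian and big-endian hosts.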
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL:

From theman at josephdwagner.info Wed Mar 2 21:15:54 2005
From: theman at josephdwagner.info (Joseph D. Wagner)
Date: Wed, 2 Mar 2005 15:15:54 -0600
Subject: searching for ext3 defrag/file move program
In-Reply-To: <200503021314.01396.philipp.marek@bmlv.gv.at>
Message-ID: <200503022114.j22LEmHi011055@josephdwagner.info>

I'm on the e2fsprogs team.  I worked on a defrag program for EXT3 for about
six months.  In the end, we abandoned the program as unfeasible.

I have to go to work now, but I can send you more details later.

Joseph D. Wagner

From evilninja at gmx.net Thu Mar 3 14:38:59 2005
From: evilninja at gmx.net (Christian)
Date: Thu, 03 Mar 2005 15:38:59 +0100
Subject: Failures they e2fsck doesn't find
In-Reply-To: <20050302180655.GA6225@alea.gnuu.de>
References: <20050204164527.GA24817@alea.gnuu.de>
 <20050208180547.GH2635@schnapps.adilger.int>
 <20050208234449.GA8405@alea.gnuu.de>
 <20050209182657.GO2635@schnapps.adilger.int>
 <20050210134003.GA14472@alea.gnuu.de>
 <20050210172341.GS2635@schnapps.adilger.int>
 <20050213212639.GA24996@alea.gnuu.de>
 <20050302164502.GA15671@thunk.org>
 <20050302180655.GA6225@alea.gnuu.de>
Message-ID: <42272183.6000302@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jörg Sommer wrote:
>
>>the output of e2image -r /dev/hdXX - | bzip2 > hdXX.img.gz?
>
> Ehmm, no. My PGP-Key is on this filesystem and I don't have space to
> create the image. Can I extract only parts?

for the sake of correctness, man 8 e2image tells us:

   This [the command above] will only send the metadata information, without
   any data blocks.  However, the filenames in the directory blocks can
   still reveal information about the contents of the filesystem that the bug
   reporter may wish to keep confidential.  To address this concern, the
   - -s option can be specified.
   This will cause e2image to scramble directory
   entries and zero out any unused portions of the directory blocks before
   writing them to the image file.

Being curious about e2image myself, I just had to test it on a loop-aes
mounted 4GB ext3 partition; the result is a 6.2MB loop6.img.gz...

Christian.
- --
BOFH excuse #21:

POSIX compliance problem
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCJyGDC/PVm5+NVoYRAotiAJ48aFK5B3bWyLVCO/MfekvFZtJcaQCgseWb
pdjz1bm6LUXVV6h6OHtR0f8=
=4eYf
-----END PGP SIGNATURE-----

From tytso at mit.edu Thu Mar 3 16:10:41 2005
From: tytso at mit.edu (Theodore Ts'o)
Date: Thu, 3 Mar 2005 11:10:41 -0500
Subject: searching for ext3 defrag/file move program
In-Reply-To: <200503021314.01396.philipp.marek@bmlv.gv.at>
References: <200503021314.01396.philipp.marek@bmlv.gv.at>
Message-ID: <20050303161041.GB10315@thunk.org>

On Wed, Mar 02, 2005 at 01:14:01PM +0100, Ph. Marek wrote:
> Hello everybody,
>
> reading about the speed improvements possible with (on boot) preloaded files
> (which should be contiguous on disk) I searched for an ext3 defrag program.
>
> I found an ext2 defrag program
> (http://www.ibiblio.org/pub/Linux/system/filesystems/defrag-0.70.tar.gz,
> available in debian as defrag) which would have an optimal feature (moving
> files by a list) but refuses to work on ext3.
>
> Is there a version which does ext3? Or has somebody a program which allows me
> to move files on the disk?

The e2defrag program had some problems where it only worked on 1k
blocksizes, if I remember correctly.  It was also extremely dangerous
in that if it crashed or you had a system crash/power failure in the middle
of the operation, your filesystem would be totally scrambled.
Therefore, the only safe way to use it was to do a full backup, and if
the system crashed, restore from the backup.

Given that the filesystem had to be unmounted during the e2defrag
process, and combined with the fact that if you wanted to be safe, you
had to do a full backup of the data *anyway*, the time difference
between doing "backup; e2defrag; mke2fs and restore if your system
crashed" and "backup; mke2fs; restore" was such that it really wasn't
worth it.

Mainly, there hasn't been sufficient interest to write a (safe,
effective) ext2 defragger.  (There was one crazy person who didn't
believe me when I told him that an ext3 defragger couldn't be done
purely in userspace, until he banged his head against the wall enough
times, but that doesn't count. :-)  Instead there has been more
interest in tweaking algorithms that try to avoid the fragmentation
problem in the first place --- for example, the Orlov
allocator that got introduced during Linux 2.5.  Another example is
the delayed allocation code plus the extent mapping extension that is
currently being discussed on ext2-devel.

There has been talk about writing a kernel extension which implements
a few safe, atomic operations, such as relocating a logical block #w
in inode #x from block #y to #z, and "here are all the pathnames that
point at inode #x, relocate that inode to be stored at location #y",
and then implementing the rest in userspace.  But it just hasn't risen
to the top of anyone's todo lists yet.

- Ted

From marc.gerritzen at t-online.de Thu Mar 3 17:35:42 2005
From: marc.gerritzen at t-online.de (Marc Gerritzen)
Date: Thu, 3 Mar 2005 18:35:42 +0100
Subject: Killed my filesystem?! (data moved to lost&found)
Message-ID: <232c6e7fdde2d55469b4e06eb11d663d@t-online.de>

Hello,

I have/had a 300GB ext3 partition, which was mounted for a while without
an fsck. Some time ago I noticed that it had "lost" 10GB (avail 0MB,
used 290GB, size 300GB).
So I tried to unmount the partition and do a fsck.
But the partition was "busy", so I did a umount -l on that partition (yes,
I know now that this wasn't the best idea at all) and ran fsck -y on it.
The fsck linked many inodes to lost+found, but I thought "fsck does know
what it's doing" ;)
After this I remounted the partition and found it COMPLETELY EMPTY :(,
except for thousands of files with funny names like "#31604834" in
lost+found.

So to my question: Does anybody know how to get my data back? :)

Thanks very much and greetings from Germany,
Marc Gerritzen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 817 bytes
Desc: not available
URL:

From theman at josephdwagner.info Thu Mar 3 18:00:28 2005
From: theman at josephdwagner.info (Joseph D. Wagner)
Date: Thu, 3 Mar 2005 12:00:28 -0600
Subject: searching for ext3 defrag/file move program
In-Reply-To: <20050303161041.GB10315@thunk.org>
Message-ID: <200503031759.j23HxLuo016028@josephdwagner.info>

> There was one crazy person who didn't believe me when
> I told him that an ext3 defragger couldn't be done
> purely in userspace, until he banged his head against
> the wall enough times

Gee, thanks, Ted.  Maybe you could see the glass as half full and refer to me
as a determined, optimistic, young programmer instead. ;-)

Joseph D.
Wagner

From adilger at clusterfs.com Thu Mar 3 20:40:31 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 3 Mar 2005 13:40:31 -0700
Subject: Failures they e2fsck doesn't find
In-Reply-To: <42272183.6000302@gmx.net>
References: <20050204164527.GA24817@alea.gnuu.de>
 <20050208180547.GH2635@schnapps.adilger.int>
 <20050208234449.GA8405@alea.gnuu.de>
 <20050209182657.GO2635@schnapps.adilger.int>
 <20050210134003.GA14472@alea.gnuu.de>
 <20050210172341.GS2635@schnapps.adilger.int>
 <20050213212639.GA24996@alea.gnuu.de>
 <20050302164502.GA15671@thunk.org>
 <20050302180655.GA6225@alea.gnuu.de>
 <42272183.6000302@gmx.net>
Message-ID: <20050303204031.GE27352@schnapps.adilger.int>

On Mar 03, 2005 15:38 +0100, Christian wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Jörg Sommer wrote:
> > Ehmm, no. My PGP-Key is on this filesystem and I don't have space to
> > create the image. Can I extract only parts?

In any case, your PGP private key should have a passphrase...

> for the sake of correctness, man 8 e2image tells us:
>
>    This [the command above] will only send the metadata information, without
>    any data blocks.  However, the filenames in the directory blocks can
>    still reveal information about the contents of the filesystem that the bug
>    reporter may wish to keep confidential.  To address this concern, the
>    - -s option can be specified.  This will cause e2image to scramble directory
>    entries and zero out any unused portions of the directory blocks before
>    writing them to the image file.

Note that in this case the -s option would be useless, since the problem
is specifically in the filenames, so if they are scrambled there is nothing
useful to work with.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/             http://members.shaw.ca/golinux/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL:

From puhuri at iki.fi Fri Mar 4 06:37:17 2005
From: puhuri at iki.fi (Markus Peuhkuri)
Date: Fri, 04 Mar 2005 08:37:17 +0200
Subject: mke2fs options for very large filesystems
In-Reply-To: <20050210175528.GA10041@thunk.org>
References: <20050210175528.GA10041@thunk.org>
Message-ID: <4228021D.4060808@iki.fi>

Theodore Ts'o wrote:

>sizes on the filesystem, but on average this tends to happen when the
>utilization rises to 90-95%.
>
>
Are there any studies on how the number of files relates to fragmentation?
I have a situation where I store fairly large files (about 500 MB) on a
200 GiB partition (it seems to hold 700 files now).  There are only one or
two active writers at any time, and once an hour I run a cleanup script
that deletes the oldest files until the disk is at most 90% full -- thus
peak utilisation is something like 91-92%.  Each file has a lifetime of a
few days.

So far there have not been any performance problems, but as the system
will be running for years, is there a possibility that the file system
will become fragmented later?  Can one check
fragmentation without unmounting fs (as it is not possible to bring
system down for any long period of time)?

From philipp.marek at bmlv.gv.at Fri Mar 4 07:33:03 2005
From: philipp.marek at bmlv.gv.at (Ph. Marek)
Date: Fri, 4 Mar 2005 08:33:03 +0100
Subject: searching for ext3 defrag/file move program
In-Reply-To: <20050303161041.GB10315@thunk.org>
References: <200503021314.01396.philipp.marek@bmlv.gv.at>
 <20050303161041.GB10315@thunk.org>
Message-ID: <200503040833.03943.philipp.marek@bmlv.gv.at>

On Thursday 03 March 2005 17:10, Theodore Ts'o wrote:
> The e2defrag program had some problems where it only worked on 1k
> blocksizes, if I remember correctly.  It was also extremely dangerous
> in that if it crashed or you had a system crash/power failure in the middle
> of the operation, your filesystem would be totally scrambled.
> Therefore, the only safe way to use it was to do a full backup, and if
> the system crashed, restore from the backup.
>
> Given that the filesystem had to be unmounted during the e2defrag
> process, and combined with the fact that if you wanted to be safe, you
> had to do a full backup of the data *anyway*, the time difference
> between doing "backup; e2defrag; mke2fs and restore if your system
> crashed" and "backup; mke2fs; restore" was such that it really wasn't
> worth it.
>
> Mainly, there hasn't been sufficient interest to write a (safe,
> effective) ext2 defragger.  (There was one crazy person who didn't
> believe me when I told him that an ext3 defragger couldn't be done
> purely in userspace, until he banged his head against the wall enough
> times, but that doesn't count. :-)  Instead there has been more
> interest in tweaking algorithms that try to avoid the fragmentation
> problem in the first place --- for example, the Orlov
> allocator that got introduced during Linux 2.5.  Another example is
> the delayed allocation code plus the extent mapping extension that is
> currently being discussed on ext2-devel.
>
> There has been talk about writing a kernel extension which implements
> a few safe, atomic operations, such as relocating a logical block #w
> in inode #x from block #y to #z, and "here are all the pathnames that
> point at inode #x, relocate that inode to be stored at location #y",
> and then implementing the rest in userspace.  But it just hasn't risen
> to the top of anyone's todo lists yet.
OK, given that it's not easily possible:

How about a program that moves just the file's data to the start of the disk?
AFAIK it doesn't work just to copy the files - you won't get them copied to a
defined place, they'll end up in the various groups.

Any idea how to make the startup process faster?

Regards,

Phil

From theman at josephdwagner.info Fri Mar 4 10:03:48 2005
From: theman at josephdwagner.info (Joseph D.
Wagner)
Date: Fri, 4 Mar 2005 04:03:48 -0600
Subject: mke2fs options for very large filesystems
In-Reply-To: <4228021D.4060808@iki.fi>
Message-ID: <200503041002.j24A2fGc009198@josephdwagner.info>

> Can one check fragmentation without unmounting fs
> (as it is not possible to bring system down for
> any long period of time)?

Sure, there's a program in e2fsprogs called filefrag.  You specify a file;
it will tell you how fragmented the file is compared to how defragmented
it could be.

Joseph D. Wagner

From theman at josephdwagner.info Fri Mar 4 10:05:24 2005
From: theman at josephdwagner.info (Joseph D. Wagner)
Date: Fri, 4 Mar 2005 04:05:24 -0600
Subject: searching for ext3 defrag/file move program
In-Reply-To: <200503021314.01396.philipp.marek@bmlv.gv.at>
Message-ID: <200503041004.j24A4Hei011069@josephdwagner.info>

I approached the problem from several different angles.

Approach #1 - Partition Must Be Unmounted to Defrag

Advantages:
  * Minimize chance of file system corruption because no other
    processes could be modifying files.
  * Can manipulate the file system in ways that would otherwise
    be unsafe if the file system were mounted.

Disadvantages:
  * Must directly manipulate the journal (i.e. a lot of tedious
    programming) so that the file system can remain consistent
    in the event of a power failure.
  * The partition would be completely unavailable during the
    entire defrag process, resulting in significant down time.

Ultimately rejected because of the last disadvantage.

Approach #2 - On-the-fly Defragmentation using the EXT2FS library for
Block Allocation

Advantages:
  * No down time.
  * Utilizes existing EXT3 journal programming by performing the
    defragmentation using normal read-write operations.

Disadvantages:
  * Without mandatory locking, there would be no way to ensure
    that another process has not modified the file.
  * Because of the way the kernel buffers I/O data, another
    process could be attempting to utilize the same blocks as
    the defrag program.

Attempted, but ultimately rejected because of the last disadvantage.

Approach #3 - On-the-fly Defragmentation without a Block Allocation Policy

Advantages:
  * This is as close as we can get to a "safe" on-the-fly
    defragmentation program.
  * No down time.
  * Utilizes existing EXT3 journal programming by performing the
    defragmentation using normal read-write operations.

Disadvantages:
  * Without a block allocation policy, defrag would be nothing
    more than a sophisticated copy program.
  * While it is likely that the copy would be less fragmented
    than the original, without a block allocation policy there
    would be no guarantee.  The copy would have to be checked
    against the original before overwriting it.
  * Without mandatory locking, there would be no way to ensure
    that another process has not modified the file.

Attempted, but ultimately rejected because of the last disadvantage.

In order to make a defrag for EXT3 safe, you would need to do one of
two things:

1) Develop a Defrag API or some sort of File System Maintenance API
   which included Defrag support and successfully integrate it into
   the kernel.
- OR -
2) Extend Mandatory Locking to every file the system opens and closes
   and integrate such a patch into the kernel.

Either task is uphill sledding.

Joseph D. Wagner

From philipp.marek at bmlv.gv.at Fri Mar 4 10:12:51 2005
From: philipp.marek at bmlv.gv.at (Ph. Marek)
Date: Fri, 4 Mar 2005 11:12:51 +0100
Subject: searching for ext3 defrag/file move program
In-Reply-To: <200503041004.j24A4Hei011069@josephdwagner.info>
References: <200503041004.j24A4Hei011069@josephdwagner.info>
Message-ID: <200503041112.52200.philipp.marek@bmlv.gv.at>

> I approached the problem from several different angles.
>
>...
> In order to make a defrag for EXT3 safe, you would need to do one of
> two things:
>
> 1) Develop a Defrag API or some sort of File System Maintenance API
>    which included Defrag support and successfully integrate it into
>    the kernel.
> - OR -
> 2) Extend Mandatory Locking to every file the system opens and closes
>    and integrate such a patch into the kernel.
>
> Either task is uphill sledding.
And a single defrag run to speed up booting would soon be worthless
if the system configuration gets changed (at least after some
"apt-get dist-upgrade"s :-).
That could work for a "fire and forget" workstation - setup, defrag, move
some files to front, work only ever as user without changing /bin, /etc,
and so on ...

So it won't be possible.

I read that there was a patch for in-kernel-movement of blocks; what
has happened to it? (Do we want reiserfs, xfs, whatever to have this
advantage, and ext2/3 left behind?)

I feel a strong urge to shoot the messenger or at least the message, but
thank you for your answer :-)

Regards,

Phil

From puhuri at iki.fi Fri Mar 4 14:20:50 2005
From: puhuri at iki.fi (Markus Peuhkuri)
Date: Fri, 04 Mar 2005 16:20:50 +0200
Subject: mke2fs options for very large filesystems
In-Reply-To: <4228021D.4060808@iki.fi>
References: <20050210175528.GA10041@thunk.org> <4228021D.4060808@iki.fi>
Message-ID: <42286EC2.7070506@iki.fi>

Markus Peuhkuri wrote:

> fragmentation without unmounting fs (as it is not possible to bring
> system down for any long period of time)?

Thanks Joseph for pointing me to filefrag (the system is running woody,
which has an older e2fsprogs, but I found 1.35 on backports.org).

However, I'm a bit unsure how to interpret the figures (the man page is
quite short).  When I run filefrag on the system, I get on average 1000
extents (for those 500 MB files) and perfection seems to be 2 to 5.  I
think "extent" is the count of "segments", or the count of fragments in
which the file is stored on disk.  Thus the average size of a fragment is
about half a megabyte, while the average fragment size of each file
ranges from 100 kB to 1.5 MB.

Is that figure bad?  Would it help to periodically clean the disk more?
The 90% disk full is to have a safe margin in case there are problems with
data analysis.
If I make sure that all data is analysed properly, then I could
clean the disk down to something below 50% full.

There seems to be a difference between partitions: for the other
partition I got quite different figures: the average is 1700 extents and
the minimum average fragment size is 81kB (5627 extents for a 450 MB
file).  Usage of both partitions is about the same; the latter one has
slightly larger files.

I also tested defragmenting Berkeley DB files using "cp -rp db db.new":
one db improved quite a lot, but for the other it did not help much.
Disk was about 87% full.

db/en: 41254 extents found, perfection would be 5 extents
db/ip: 68873 extents found, perfection would be 6 extents
db.new/en: 32606 extents found, perfection would be 5 extents
db.new/ip: 25 extents found, perfection would be 6 extents

I should test timing; it probably helped DB performance quite a lot.

Many of the disks I have are mainly for archive purposes, thus a file is
stored there once (one at a time) and the file will stay there as long as
the disk works, only read every now and then.  I think in that use the
fragmentation does not matter a lot?  Maybe only the last few files are
fragmented and have lower performance.

From jonathan.purcell at veritas.com Fri Mar 4 17:26:05 2005
From: jonathan.purcell at veritas.com (Jonathan Purcell)
Date: Fri, 04 Mar 2005 17:26:05 +0000
Subject: ext2online difficulty
Message-ID: <42289A2D.3000007@veritas.com>

Hi all

I am having some trouble using the ext2online utility. I have reduced
the problem down to its simplest form, and it goes something like this:

Start with a regular msdos labelled disk (I have tried lvm volumes):

Command (m for help): p

Disk /dev/sdb: 18.3 GB, 18351967232 bytes
64 heads, 32 sectors/track, 17501 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       17501    17921008   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Create an undersized filesystem on top:

[root@blah ~]# mke2fs /dev/sdb1 1048576
mke2fs 1.35 (28-Feb-2004)
max_blocks 268435456, rsv_groups = 8192, rsv_gdb = 63
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
131072 inodes, 262144 blocks
13107 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Writing inode tables: done
inode.i_blocks = 2528, i_size = 4243456
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 25 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
[root@blah ~]# mount /dev/sdb1 /mnt

Try an online resize:

[root@blah ~]# ext2online /dev/sdb1
ext2online v1.1.18 - 2001/03/18 for EXT2FS 0.5b

ext2online: ext2_ioctl: Inappropriate ioctl for device

ext2online: unable to resize /dev/sdb1

I am using RedHat AS4:

[root@blah ~]# uname -rsmpio
Linux 2.6.9-5.ELsmp x86_64 x86_64 x86_64 GNU/Linux
[root@blah ~]# rpm -qa | grep e2fs
e2fsprogs-1.35-11.6.EL4
e2fsprogs-devel-1.35-11.6.EL4

Any help greatly appreciated.

Thanks
Jonathan

From daniel at rimspace.net Fri Mar 4 17:56:40 2005
From: daniel at rimspace.net (Daniel Pittman)
Date: Sat, 05 Mar 2005 04:56:40 +1100
Subject: mke2fs options for very large filesystems
References: <20050210175528.GA10041@thunk.org> <4228021D.4060808@iki.fi>
 <42286EC2.7070506@iki.fi>
Message-ID: <87hdjrthuv.fsf@enki.rimspace.net>

On 5 Mar 2005, Markus Peuhkuri wrote:
> Markus Peuhkuri wrote:
>
>> fragmentation without unmounting fs (as it is not possible to bring
>> system down for any long period of time)?
>
> Thanks Joseph for pointing me to filefrag (the system is running woody,
> which has an older e2fsprogs, but I found 1.35 on backports.org).
>
> However, I'm a bit unsure how to interpret the figures (the man page is
> quite short).  When I run filefrag on the system, I get on average 1000
> extents (for those 500 MB files) and perfection seems to be 2 to 5.  I
> think "extent" is the count of "segments", or the count of fragments in
> which the file is stored on disk.  Thus the average size of a fragment is
> about half a megabyte, while the average fragment size of each file
> ranges from 100 kB to 1.5 MB.

An extent is a single contiguous run of disk sectors.  Since ext3 breaks
up the disk with inode tables, block groups, and other bits of
accounting stuff, a file that size can't simply run as one single
extent.

So, your files have around a thousand locations when, in theory, they
could fit in no more than 2 to 5 on a perfectly clean filesystem.

> Is that figure bad?

...er, maybe?  Is it causing performance problems for you?
Do you need to increase throughput, or anything like that?

If the answer to those questions isn't "yes", then that isn't bad.

> Would it help to periodically clean the disk more?

Sure.  Reformat it occasionally, and you will get less fragmentation. :)

[...]

> I also tested defragmenting Berkeley DB files using "cp -rp db db.new":
> one db improved quite a lot, but for the other it did not help much.

This works as long as suitable extents can be found; they can't always.
The less full the filesystem is, the better this works.

[...]

> Many of the disks I have are mainly for archive purposes, thus a file is
> stored there once (one at a time) and the file will stay there as long as
> the disk works, only read every now and then.  I think in that use the
> fragmentation does not matter a lot?

Correct.

> Maybe only the last few files are fragmented and have lower
> performance.

Files written later, after the disk is already mostly full, will tend to
have more fragmentation, because they have less chance of finding a
large enough contiguous chunk of disk.
Also, files that are written slowly tend toward fragmentation, since the
kernel may not predict the final size well, so may put them in too small
an area initially.

Again, the "problem" part of your question depends on use: unless you
actually want to improve performance, don't bother about fragmentation.

Regards,
Daniel
--
Most American television stations reproduce all night long what only a Roman
could have seen in the Coliseum during the reign of Nero.
        -- George Faludy

From adilger at clusterfs.com Fri Mar 4 18:14:17 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Fri, 4 Mar 2005 11:14:17 -0700
Subject: ext2online difficulty
In-Reply-To: <42289A2D.3000007@veritas.com>
References: <42289A2D.3000007@veritas.com>
Message-ID: <20050304181417.GS27352@schnapps.adilger.int>

On Mar 04, 2005 17:26 +0000, Jonathan Purcell wrote:
> I am having some trouble using the ext2online utility. I have reduced
> the problem down to its simplest form, and it goes something like this:
>
> [root@blah ~]# mke2fs /dev/sdb1 1048576
> [root@blah ~]# mount /dev/sdb1 /mnt
> [root@blah ~]# ext2online /dev/sdb1
> ext2online v1.1.18 - 2001/03/18 for EXT2FS 0.5b
> ext2online: ext2_ioctl: Inappropriate ioctl for device
>
> ext2online: unable to resize /dev/sdb1

There is a bit of a misnomer - the online resize support only exists for
ext3 and not ext2.  If you do "mke2fs -j /dev/sdb1 1048576" it should
work.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/             http://members.shaw.ca/golinux/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL:

From tytso at mit.edu Fri Mar 4 19:08:44 2005
From: tytso at mit.edu (Theodore Ts'o)
Date: Fri, 4 Mar 2005 14:08:44 -0500
Subject: searching for ext3 defrag/file move program
In-Reply-To: <200503040833.03943.philipp.marek@bmlv.gv.at>
References: <200503021314.01396.philipp.marek@bmlv.gv.at>
 <20050303161041.GB10315@thunk.org>
 <200503040833.03943.philipp.marek@bmlv.gv.at>
Message-ID: <20050304190844.GA8880@thunk.org>

On Fri, Mar 04, 2005 at 08:33:03AM +0100, Ph. Marek wrote:
> How about a program that moves just the file's data to the start of the disk?
> AFAIK it doesn't work just to copy the files - you won't get them copied to a
> defined place, they'll end up in the various groups.

Moving the file's data to the beginning of a disk doesn't necessarily
solve the fragmentation problem; in fact FAT filesystems are much more
susceptible to fragmentation *because* they only store file blocks
starting at the beginning of the disk.

>I read that there was a patch for in-kernel-movement of blocks; what
>has happened to it? (Do we want reiserfs, xfs, whatever to have this
>advantage, and ext2/3 left behind?)

I'm not aware of any other Linux filesystem having this capability.
As I said, there is far more interest in trying to prevent
fragmentation in the first place.  The reservation changes and the
delayed allocation patches are more examples of changes that try to
prevent fragmentation.  The goal is for us to be better than the
Microsoft FAT filesystem, not to create a defragger just because
that's what people who are used to MS-DOS want to be able to find....

> Any idea how to make the startup process faster?

If you're trying to solve the "speed up the boot time" problem, that's
actually a different problem than defragmentation.  A number of people
have been looking at it, and a huge part of the problem is simply that
we're doing _too_ _much_ before the login prompt.
Why is it that that we have to start the cups daemon and the ntp daemon, and the apache server, etc., before firing up X and throwing up the login window? Other solutions that attack this problem, especially for laptops, but also for desktops, is to make suspend-to-ram and suspend-to-disk faster and more reliable, so that you aren't rebooting as much in the first place. But even ignoring the suspend solution, there's an awful lot of wasted time and effort in the boot process that needs to be optimized out before trying to play filesystem block placement games would be the most effective way to speed up the time before the user can start doing useful work. - Ted From joerg at alea.gnuu.de Fri Mar 4 10:51:51 2005 From: joerg at alea.gnuu.de (=?iso-8859-1?Q?J=F6rg?= Sommer) Date: Fri, 4 Mar 2005 11:51:51 +0100 Subject: Failures they e2fsck doesn't find In-Reply-To: <20050302164502.GA15671@thunk.org> References: <20050204164527.GA24817@alea.gnuu.de> <20050208180547.GH2635@schnapps.adilger.int> <20050208234449.GA8405@alea.gnuu.de> <20050209182657.GO2635@schnapps.adilger.int> <20050210134003.GA14472@alea.gnuu.de> <20050210172341.GS2635@schnapps.adilger.int> <20050213212639.GA24996@alea.gnuu.de> <20050302164502.GA15671@thunk.org> Message-ID: <20050304105151.GA4454@alea.gnuu.de> Theodore Ts'o schrieb am Wed 02. Mar, 11:45 (-0500) : > Joerg, > > How big is the filesystem? Would you be willing to send me > the output of e2image -r /dev/hdXX - | bzip2 > hdXX.img.gz? Many > thanks!! OK, I've created the file and it is 13MB. I thought it is a full image of my partition. I send you the file per mail. A problem: I've remove the dir_index after I found out it was the problem. But I didn't touch the directory with the problems. I hope nothing got lost. J?rg. -- Unsere Zweifel sind Verr?ter und oft genug verspielen wir den m?glichen Gewinn, weil wir den Versuch nicht wagen. 
From jp at enix.org Thu Mar 10 11:10:44 2005 From: jp at enix.org (Jérôme Petazzoni) Date: Thu, 10 Mar 2005 12:10:44 +0100 Subject: a few questions about ext3 journal Message-ID: <42302B34.5090107@enix.org> A few wild ideas/questions : 1) Is there a way to check the size of the journal of an ext3 filesystem ? I mean - the actually used size ; not the total size of the journal. 2) Would it be difficult to implement "freeze" of ext3 filesystem - that is, blocking all I/O to the filesystem until it's "unfrozen" (XFS can do that), for two purposes : A/ allowing "freezing" in a clean state, to allow clean snapshotting B/ allowing "freezing" while moving a SCSI disk or a network-connected disk without umounting filesystem A/ would require some work at the FS layer I guess, but B/ might be doable at the devicemapper layer or something like that. 3) Is it possible to allow data to stay in the journal for a very long time ? Rationale : for laptops with a lot of memory and some solid-state memory, this would allow shutting down the hard disk (if all read data is in the cache, and all written data goes to the log on the solid-state disk). From evilninja at gmx.net Fri Mar 11 15:04:32 2005 From: evilninja at gmx.net (Christian) Date: Fri, 11 Mar 2005 16:04:32 +0100 Subject: a few questions about ext3 journal In-Reply-To: <42302B34.5090107@enix.org> References: <42302B34.5090107@enix.org> Message-ID: <4231B380.3060209@gmx.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Jérôme Petazzoni wrote: > A few wild ideas/questions : > > 1) Is there a way to check the size of the journal of an ext3 filesystem ? > I mean - the actually used size ; not the total size of the journal. perhaps "logdump -ac" (within debugfs) will help - if you can tell from its output what parts are "used".
> 2) Would it be difficult to implement "freeze" of ext3 filesystem - that > is, blocking all I/O to the filesystem until it's "unfrozen" (XFS can do > that), for two purposes : > A/ allowing "freezing" in a clean state, to allow clean snapshotting would "remount,ro" be sufficient? > B/ allowing "freezing" while moving a SCSI disk or a network-connected > disk without umounting filesystem err, "unplug the cable without unmounting the filesystem"?? you'd have to hold the entire fs in ram for the "move" or i don't understand what you mean. > 3) Is it possible to allow data to stay in the journal for a very long > time ? > Rationale : for laptops with a lot of memory and some solid-state > memory, this would allow to shutdown the hard disk (if all read data is > in the cache, and all written data goes to the log on the solid-state > disk). the only tuneable which comes to my mind right now is the "commit" paramater for mount: commit=nrsec Sync all data and metadata every nrsec seconds. The default value is 5 seconds. Zero means default. and there are the laptop-mode-tools [1] using some kernel hacks to spin down disks and continue working. Christian. [1] http://www.xs4all.nl/~bsamwel/laptop_mode/ - -- BOFH excuse #375: Root name servers corrupted. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.0 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCMbOAC/PVm5+NVoYRAnYcAKCmAxFY2f9D+OepVXHj4PbYbX8amACgmlcn QNAcp1eUHkFxr7qv38RZmvA= =1DzE -----END PGP SIGNATURE----- From adilger at clusterfs.com Fri Mar 11 06:08:23 2005 From: adilger at clusterfs.com (Andreas Dilger) Date: Fri, 11 Mar 2005 01:08:23 -0500 Subject: a few questions about ext3 journal In-Reply-To: <42302B34.5090107@enix.org> References: <42302B34.5090107@enix.org> Message-ID: <20050311060823.GB1638@schnapps.adilger.int> On Mar 10, 2005 12:10 +0100, J?r?me Petazzoni wrote: > 1) Is there a way to check the size of the journal of an ext3 filesystem ? 
> I mean - the actually used size ; not the total size of the journal. There is no current statistics on any journal usage (though it would be nice to have this). Knowing how much space there currently is in the journal, some sort of average of the free journal space (e.g. abs(head-tail) as each new handle started), how often the journal was totally full and had to be flushed, etc. This would go a long way to telling a user and the ext3 developers how large a journal is needed under their workload. Currently Lustre just creates very large (400MB) journals on all of the filesystems because we know that a large journal improves the performance dramatically, but we have never done the trial+error approach of finding the "optimal" size. > 2) Would it be difficult to implement "freeze" of ext3 filesystem - that > is, blocking all I/O to the filesystem until it's "unfrozen" (XFS can do > that), for two purposes : > A/ allowing "freezing" in a clean state, to allow clean snapshotting > B/ allowing "freezing" while moving a SCSI disk or a network-connected > disk without umounting filesystem This is already done, and is used by the LVM/device mapper subsystem to do snapshots of the filesystem. However, I'm not sure if there is a user-space API to access this. > 3) Is it possible to allow data to stay in the journal for a very long > time ? > Rationale : for laptops with a lot of memory and some solid-state > memory, this would allow to shutdown the hard disk (if all read data is > in the cache, and all written data goes to the log on the solid-state disk). Yes, this can be done (I think) by tuning the journal flush time and having a large enough journal to avoid filling it up. However, I don't think this would be practical because the only common way to do this would be e.g. flash memory and the heavy usage of the journal would quickly wear out such devices, and it would also be slow. 
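To make the journal-usage statistic described above a little more concrete (free space derived from the head and tail positions, sampled as each new handle starts, plus a count of how often the journal was full), here is a minimal Python sketch. It is not JBD code and does not reflect how the kernel accounts for journal space: the class JournalUsage, its field names and the low_water threshold are all hypothetical, and it assumes a simple circular log in which head == tail means "empty".

```python
from dataclasses import dataclass


@dataclass
class JournalUsage:
    """Running statistics for a circular journal of `size` blocks (hypothetical)."""
    size: int
    low_water: int = 0   # free-block count at or below which we call the journal "full"
    samples: int = 0
    total_free: int = 0
    times_full: int = 0

    def free_blocks(self, head: int, tail: int) -> int:
        # Blocks in use run from tail up to head, wrapping around at `size`.
        used = (head - tail) % self.size
        return self.size - used

    def record(self, head: int, tail: int) -> int:
        # Call once per sample point, e.g. whenever a new handle is started.
        free = self.free_blocks(head, tail)
        self.samples += 1
        self.total_free += free
        if free <= self.low_water:
            self.times_full += 1
        return free

    def average_free(self) -> float:
        # With no samples yet, report the whole journal as free.
        return self.total_free / self.samples if self.samples else float(self.size)


if __name__ == "__main__":
    stats = JournalUsage(size=8192, low_water=256)
    for head, tail in [(100, 0), (4000, 1000), (50, 8000)]:
        stats.record(head, tail)
    print("average free blocks:", stats.average_free())
    print("times (nearly) full:", stats.times_full)
```

The modulo handles the wrap-around case that a plain abs(head-tail) would get wrong once the head has wrapped past the end of the log. A consistently high average with times_full at zero would suggest the journal is larger than the workload needs.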
Cheers, Andreas -- Andreas Dilger http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/ From benadler at gmx.net Mon Mar 14 16:49:43 2005 From: benadler at gmx.net (Benjamin Adler) Date: Mon, 14 Mar 2005 17:49:43 +0100 Subject: ext3 filesystem corrupt. Which files are affected? Message-ID: <200503141649.j2EGniNT027758@mx1.redhat.com> Hi! I am using a 377GB ext3-filesystem with evms, spanned over two disks. This filesystem has worked for more than two years without any problems, it stores around 30.000 images with sizes between 50kb and 200mb. Recently, I noticed that images started to disappear. A fsck.ext3 (which I've unfortunately never run before) revealed a lot of problems, which I repaired. A log is attached. Ext3 is journaled, so this shouldn't have happened, right? Is it normal that this happens every couple of years? Maybe a hardware problem? After fsck, can I now trust that the filesystem really is in a consistent state? Does fsck find ALL possible errors? I do have backups. But since I've been adding and editing and renaming files all the time and some other files were silently corrupted and vanished, I do not know which files are missing now. I cannot just restore an old backup, since I might overwrite edited files etc. So, I have to know WHICH IMAGES were affected by this corruption. Is it possible to see which images were affected by looking at the fsck-output? If yes, please tell me how to do this! AIX seems to have a command named "ncheck" that can output the filename for a given inode-number. Does this exist for ext? Are the numbers in my fsck-log inode-numbers? Thanks for your help! Ben Adler -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: fsck repair 2005-03-03.txt
URL:

From nitin2ahuja at myrealbox.com Tue Mar 15 09:14:18 2005
From: nitin2ahuja at myrealbox.com (nitin ahuja)
Date: Tue, 15 Mar 2005 02:14:18 -0700
Subject: a few questions about ext3 journal
Message-ID: <1110878058.8c427a5cnitin2ahuja@myrealbox.com>

Hi folks,

> 2) Would it be difficult to implement "freeze" of
> ext3 filesystem - that is, blocking all I/O to the
> filesystem until it's "unfrozen" (XFS can do
> that), for two purposes :
> A/ allowing "freezing" in a clean state, to allow clean snapshotting
> B/ allowing "freezing" while moving a SCSI disk or a network-connected
> disk without umounting filesystem

> This is already done, and is used by the LVM/device
> mapper subsystem to do snapshots of the filesystem.
> However, I'm not sure if there is a user-space API
> to access this.

Yes, there exists a function "freeze_bdev()" in fs/buffer.c which freezes the file system on the specified block device without unmounting. If the file system is "ext3", it calls "journal_lock_updates()" to ensure that no more transactions take place. "thaw_bdev()" is its counterpart to continue operations. You can provide an ioctl call in fs/ext3/ioctl.c which will look like :

    {
            sb = freeze_bdev(bdev);
            /* do your stuff */
            thaw_bdev(bdev, sb);
            return 0;
    }

From user land you can always call this ioctl routine.

> 3) Is it possible to allow data to stay in the
> journal for a very long time ?

> Yes, this can be done (I think) by tuning the journal
> flush time and having a large enough journal to avoid
> filling it up. However, I don't think this would be
> practical because the only common way to do
> this would be e.g. flash memory and the heavy usage
> of the journal would quickly wear out such devices,
> and it would also be slow.

This can be done by changing the commit interval of the journaling thread, viz. "kjournald". By default it is 5 seconds but you can change its value by changing JBD_DEFAULT_MAX_COMMIT_AGE.
But, if a inode is being used as journal log then, there are chances of journal running out of blocks. So it is better to experiment this with an auxiliary device for external journal log. - Nitin From tim at transtech.net.au Tue Mar 15 22:45:06 2005 From: tim at transtech.net.au (Tim Allen) Date: Wed, 16 Mar 2005 09:45:06 +1100 Subject: unattended reboot/fsck Message-ID: <1110926706.3379.12.camel@vic-ash.transtech.net.au> Hi, We've got some units in client's vehicles which are running Fedora core 1. We've can log into them over ssh remotely there is no console attached to them. I suspect one of them has some filesystem corruption, and I'd like to both force a fsck at next reboot (which I think I can do with shutdown -F) but I'd also like to make this fsck not require any human intervention. In particular, I am concerned about the case where fsck decides that it need manual intervention and requests you log in for maintainance. How can I ensure a non-user interaction fsck that will boot normally (and hence put the box back into a state where a gprs connection is re-established and I can log in again.) Thanks in advance, Tim Allen. (please CC me because I'm not on the list.) From tytso at mit.edu Wed Mar 16 13:30:25 2005 From: tytso at mit.edu (Theodore Ts'o) Date: Wed, 16 Mar 2005 08:30:25 -0500 Subject: unattended reboot/fsck In-Reply-To: <1110926706.3379.12.camel@vic-ash.transtech.net.au> References: <1110926706.3379.12.camel@vic-ash.transtech.net.au> Message-ID: <20050316133025.GA28515@thunk.org> On Wed, Mar 16, 2005 at 09:45:06AM +1100, Tim Allen wrote: > We've got some units in client's vehicles which are running Fedora core > 1. We've can log into them over ssh remotely there is no console > attached to them. I suspect one of them has some filesystem corruption, > and I'd like to both force a fsck at next reboot (which I think I can do > with shutdown -F) but I'd also like to make this fsck not require any > human intervention. 
> > In particular, I am concerned about the case where fsck decides that it > needs manual intervention and requests you log in for maintenance. > > How can I ensure a non-user interaction fsck that will boot normally > (and hence put the box back into a state where a GPRS connection is > re-established and I can log in again.) You can force the boot scripts to use the fsck -y option, but I'd also use the logsave program so you can see what fsck had to fix --- so if a system or application program/data file gets deleted, you can find out about it and fix it, for example: logsave -asv /var/log/fsck.log e2fsck -y /dev/hda1 (In fact, distributions should be encouraged to use logsave by default, since it means that any automatic fixes made by e2fsck during the boot process are saved in a log file for later analysis.) - Ted From evilninja at gmx.net Fri Mar 18 21:38:04 2005 From: evilninja at gmx.net (Christian) Date: Fri, 18 Mar 2005 22:38:04 +0100 (CET) Subject: unattended reboot/fsck In-Reply-To: <1110926706.3379.12.camel@vic-ash.transtech.net.au> References: <1110926706.3379.12.camel@vic-ash.transtech.net.au> Message-ID: <56555.195.126.66.126.1111181884.squirrel@housecafe.dyndns.org> On Tue, March 15, 2005 23:45, Tim Allen said: > with shutdown -F) but I'd also like to make this fsck not require any > human intervention. if it's not "/" that is to be checked, then simply unmount the partition, or mount it read-only and fsck it while you're logged in. if it is "/" you can only mount it read-only and try to fsck, but be careful letting fsck "fix" things too quickly. unmounting is really the better choice. perhaps you can do some nasty chroot tricks to unmount the "real /" on a running system, but i've never done that. another solution: set up a new partition (50M) where you populate a working "/" directory, reboot with "root=/dev/new_partition", then you can check the real "/". Christian.
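For an unattended box like the one discussed in this thread, one option is to have whatever wraps the boot-time check act on e2fsck's exit status rather than on its console output. The sketch below decodes that status in Python. The bit values are the ones listed in the e2fsck(8) man page; the function names and the policy of what counts as "needs attention" are assumptions made for illustration, not part of e2fsprogs.

```python
# Exit status bits as documented in e2fsck(8); the status is the sum of these.
E2FSCK_STATUS_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def decode_e2fsck_status(status: int) -> list:
    """Return the descriptions of every bit set in an e2fsck exit status."""
    return [text for bit, text in sorted(E2FSCK_STATUS_BITS.items()) if status & bit]


def needs_reboot(status: int) -> bool:
    """True if e2fsck fixed something and asked for a reboot (bit 2)."""
    return bool(status & 2)


def needs_attention(status: int) -> bool:
    """Hypothetical policy: anything beyond 'errors corrected' needs a human."""
    return bool(status & ~0b11)


if __name__ == "__main__":
    for status in (0, 1, 3, 4, 12):
        print(status, decode_e2fsck_status(status),
              "reboot" if needs_reboot(status) else "",
              "ATTENTION" if needs_attention(status) else "")
```

A wrapper could log the decoded status next to the logsave output and only hold the box in maintenance mode when needs_attention() is true.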
-- make bzImage, not war From nitind at pobox.com Sat Mar 19 10:44:48 2005 From: nitind at pobox.com (Nitin Dahyabhai) Date: Sat, 19 Mar 2005 05:44:48 -0500 Subject: Ext3 Journal corruption on hitachi deskstars In-Reply-To: <420956AC.8010502@gmx.net> References: <420956AC.8010502@gmx.net> Message-ID: <20050319104448.GA10993@servo.mazolama.net> On Wed, Feb 09, 2005 at 01:17:48AM +0100, Christian wrote: > > maybe you can elaborate a bit more on the "corrupted journals": what does > "fsck" say, what's in the kernel log (during mount). if we know the > symptoms, perhaps someone can find the root of the problem... > I'm seeing the same behavior, but after only a few hours under heavy load and also with two new Hitachi SATA drives, showing as sda and sdb. System is Fedora Core 3 running 2.6.10-1.770_FC3. I had to use the "irqpoll" kernel option to not lock hard when the sata driver loads. >From /var/log/dmesg: SCSI subsystem initialized libata version 1.10 loaded. sata_sil version 0.8 ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 11 PCI: setting IRQ 11 as level-triggered ACPI: PCI interrupt 0000:00:11.0[A] -> GSI 11 (level, low) -> IRQ 11 ata1: SATA max UDMA/100 cmd 0xE083A080 ctl 0xE083A08A bmdma 0xE083A000 irq 11 ata2: SATA max UDMA/100 cmd 0xE083A0C0 ctl 0xE083A0CA bmdma 0xE083A008 irq 11 irq 11: nobody cared (try booting with the "irqpoll" option. 
[] __report_bad_irq+0x2b/0x68 [] note_interrupt+0x73/0x96 [] __do_IRQ+0x1bd/0x249 [] do_IRQ+0x5e/0x7a ======================= [] common_interrupt+0x1a/0x20 [] __do_softirq+0x2c/0x79 [] do_softirq+0x38/0x3f ======================= [] do_IRQ+0x70/0x7a [] common_interrupt+0x1a/0x20 [] acpi_processor_idle+0xf1/0x1f6 [] cpu_idle+0x1f/0x34 [] start_kernel+0x16b/0x16d handlers: [] (ata_interrupt+0x0/0x210 [libata]) Disabling IRQ #11 ata1: dev 0 cfg 49:2f00 82:74eb 83:7fea 84:4023 85:74e8 86:3c02 87:4023 88:203f ata1: dev 0 ATA, max UDMA/100, 488397168 sectors: lba48 ata1: dev 0 configured for UDMA/100 scsi0 : sata_sil ata2: dev 0 cfg 49:2f00 82:74eb 83:7fea 84:4023 85:74e8 86:3c02 87:4023 88:203f ata2: dev 0 ATA, max UDMA/100, 488397168 sectors: lba48 ata2: dev 0 configured for UDMA/100 scsi1 : sata_sil Vendor: ATA Model: HDS722525VLSA80 Rev: V36O Type: Direct-Access ANSI SCSI revision: 05 Vendor: ATA Model: HDS722525VLSA80 Rev: V36O Type: Direct-Access ANSI SCSI revision: 05 SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB) SCSI device sda: drive cache: write back SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB) SCSI device sda: drive cache: write back sda: sda1 Attached scsi disk sda at scsi0, channel 0, id 0, lun 0 SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB) SCSI device sdb: drive cache: write back SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB) SCSI device sdb: drive cache: write back sdb: sdb1 Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0 In the past 6 hours, I've recorded the following (grepped from dmesg with -i ext3): EXT3-fs error (device sdb1): ext3_new_block: Allocating block in system zone - block = 2588673 EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted EXT3-fs error (device sdb1) in ext3_prepare_write: Journal has aborted ext3_abort called. 
EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted journal EXT3-fs error (device sdb1) in start_transaction: Journal has aborted EXT3-fs warning (device sdb1): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure EXT3-fs warning (device sdb1): ext3_clear_journal_err: Marking fs in need of filesystem check. EXT3-fs warning: mounting fs with errors, running e2fsck is recommended EXT3 FS on sdb1, internal journal EXT3-fs: recovery complete. EXT3-fs: mounted filesystem with journal data mode. EXT3-fs warning (device sdb1): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure EXT3-fs warning (device sdb1): ext3_clear_journal_err: Marking fs in need of filesystem check. EXT3-fs warning: mounting fs with errors, running e2fsck is recommended EXT3 FS on sdb1, internal journal EXT3-fs: mounted filesystem with journal data mode. EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #7618561: rec_len % 4 != 0 - offset=0, inode=1179011410, rec_len=58182, name_len=139 ext3_abort called. EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #7618561: rec_len % 4 != 0 - offset=0, inode=1179011410, rec_len=58182, name_len=139 EXT3-fs error (device sda1) in start_transaction: Journal has aborted EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #7618561: rec_len % 4 != 0 - offset=0, inode=1179011410, rec_len=58182, name_len=139 EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #7618561: rec_len % 4 != 0 - offset=0, inode=1179011410, rec_len=58182, name_len=139 EXT3 FS on sda1, internal journal EXT3-fs: mounted filesystem with journal data mode. EXT3 FS on hdg1, internal journal EXT3-fs: mounted filesystem with journal data mode. EXT3 FS on hdh1, internal journal EXT3-fs: mounted filesystem with journal data mode. 
EXT3 FS on sdb1, internal journal EXT3-fs: mounted filesystem with journal data mode. EXT3-fs error (device sda1): ext3_free_blocks_sb: bit already cleared for block 15261701 EXT3-fs error (device sda1) in ext3_free_blocks_sb: Journal has aborted EXT3-fs error (device sda1) in ext3_free_blocks_sb: Journal has aborted EXT3-fs error (device sda1) in ext3_free_blocks_sb: Journal has aborted EXT3-fs error (device sda1) in ext3_free_blocks_sb: Journal has aborted EXT3-fs error (device sda1) in ext3_reserve_inode_write: Journal has aborted EXT3-fs error (device sda1) in ext3_truncate: Journal has aborted EXT3-fs error (device sda1) in ext3_reserve_inode_write: Journal has aborted EXT3-fs error (device sda1) in ext3_orphan_del: Journal has aborted EXT3-fs error (device sda1) in ext3_reserve_inode_write: Journal has aborted EXT3-fs error (device sda1) in ext3_delete_inode: Journal has aborted ext3_abort called. EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal EXT3 FS on sda1, internal journal EXT3-fs: mounted filesystem with journal data mode. EXT3 FS on hdg1, internal journal EXT3-fs: mounted filesystem with journal data mode. EXT3 FS on hdh1, internal journal EXT3-fs: mounted filesystem with journal data mode. EXT3 FS on sdb1, internal journal EXT3-fs: mounted filesystem with journal data mode. EXT3-fs error (device sdb1): ext3_add_entry: bad entry in directory #1982465: rec_len % 4 != 0 - offset=0, inode=1179011410, rec_len=46658, name_len=117 ext3_abort called. EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted journal EXT3-fs error (device sdb1) in start_transaction: Journal has aborted EXT3-fs error (device sdb1) in ext3_create: IO failure EXT3 FS on sdb1, internal journal EXT3-fs: mounted filesystem with journal data mode. 
EXT3-fs error (device sdb1): ext3_new_block: Allocating block in system zone - block = 19431424 EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted EXT3-fs error (device sdb1) in ext3_prepare_write: Journal has aborted ext3_abort called. EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted journal EXT3-fs error (device sdb1) in start_transaction: Journal has aborted EXT3-fs warning (device sdb1): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure EXT3-fs warning (device sdb1): ext3_clear_journal_err: Marking fs in need of filesystem check. EXT3-fs warning: mounting fs with errors, running e2fsck is recommended EXT3 FS on sdb1, internal journal EXT3-fs: recovery complete. EXT3-fs: mounted filesystem with journal data mode. EXT3 FS on hdf1, internal journal EXT3-fs: mounted filesystem with ordered data mode. EXT3 FS on sda1, internal journal EXT3-fs: mounted filesystem with ordered data mode. EXT3 FS on hdg1, internal journal EXT3-fs: mounted filesystem with ordered data mode. EXT3 FS on hdh1, internal journal EXT3-fs: mounted filesystem with ordered data mode. EXT3-fs warning (device sdb1): ext3_clear_journal_err: Filesystem error recorded from previous mount: error -87241522 EXT3-fs warning (device sdb1): ext3_clear_journal_err: Marking fs in need of filesystem check. EXT3-fs warning: mounting fs with errors, running e2fsck is recommended EXT3 FS on sdb1, internal journal EXT3-fs: mounted filesystem with ordered data mode. 
fsck was giving me more output and showing more errors earlier, but now it is unable to fully repair the FS and every run just reports block bitmap differences: root at servo:~$ fsck -fy /dev/sdb1 fsck 1.35 (28-Feb-2004) e2fsck 1.35 (28-Feb-2004) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information Block bitmap differences: +(3966976--3966983) +(55412736--55412739) +55412743 +(55449602--55449603) +(55449606--55449607) Fix? yes /dev/sdb1: ***** FILE SYSTEM WAS MODIFIED ***** /dev/sdb1: 343/30539776 files (2.3% non-contiguous), 25691648/61049000 blocks root at servo:~$ fsck -fy /dev/sdb1 fsck 1.35 (28-Feb-2004) e2fsck 1.35 (28-Feb-2004) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information Block bitmap differences: +(3966976--3966983) +(55449600--55449607) Fix? yes /dev/sdb1: ***** FILE SYSTEM WAS MODIFIED ***** /dev/sdb1: 343/30539776 files (2.3% non-contiguous), 25691648/61049000 blocks Any ideas? On an unrelated note, is the irqpoll option the cause of this oft-repeated message? Mar 19 05:38:58 servo kernel: hdc: cdrom_pc_intr: The drive appears confused (ireason = 0x01) --- Nitin Dahyabhai From brian.blow at mymail.champlain.edu Sat Mar 19 15:26:36 2005 From: brian.blow at mymail.champlain.edu (Blow, Brian) Date: Sat, 19 Mar 2005 10:26:36 -0500 Subject: EXT3 information Message-ID: <6539289D6E8F3C489929AF4C39D526C8F1CB4C@MyMailAD.champlain.edu> Good Day I am writing a paper comparing the features of NTFS and the EXT3 file systems. Could anyone point me to any websites where I could get more information on EXT3? I have performed the normal google searches and have not been able to get any solid information on the features of EXT3. Any help would be appreciated. Thank you Brian A. 
Blow From hans at picht.org Tue Mar 22 20:43:25 2005 From: hans at picht.org (Hans-Joachim Picht) Date: Tue, 22 Mar 2005 21:43:25 +0100 Subject: ext2fs_read_bb_inode: Invalid argument && Can't read an block bitmap Message-ID: <20050322204325.GA20896@picht.org> Hello, sorry if the question was asked a couple of times before but I couldn't find any useful hinds to this problem using google and the listman search-engine of the archive of this list. Somehow the ext3 filesystem on one of my machines died an after a reboot grub wouldn't come up again. What I did so far: fsck -y /dev/hda4 fsck 1.35 (28-Feb-2004) e2fsck 1.35 (28-Feb-2004) Group descriptors look bad... trying backup blocks... Block bitmap for group 0 is not in group. (block 2553887680) Relocate? yes Inode bitmap for group 0 is not in group. (block 16777216) Relocate? yes Inode table for group 0 is not in group. (block 2238581760) WARNING: SEVERE DATA LOSS POSSIBLE. Relocate? yes [ more like this ] Block bitmap for group 127 is not in group. (block 0) Relocate? yes Inode bitmap for group 127 is not in group. (block 0) Relocate? yes Inode table for group 127 is not in group. (block 0) WARNING: SEVERE DATA LOSS POSSIBLE. Relocate? 
yes fsck.ext2: e2fsck_read_bitmaps: illegal bitmap block(s) for /dev/hda4 dd if=/dev/hda of=/dev/sda conv=noerror,sync dumpe2fs /dev/sda4 dumpe2fs 1.35 (28-Feb-2004) ext2fs_read_bb_inode: Invalid argument Filesystem volume name: Last mounted on: Filesystem UUID: a300ff1d-5cf0-4d8a-9336-ae39bc64ed95 Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal filetype sparse_super large_file Default mount options: (none) Filesystem state: clean with errors Errors behavior: Continue Filesystem OS type: Linux Inode count: 7274496 Block count: 14542841 Reserved block count: 727142 Free blocks: 1454431 Free inodes: 4906461 First block: 0 Block size: 4096 Fragment size: 4096 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 16384 Inode blocks per group: 512 Last mount time: Thu Mar 17 09:17:22 2005 Last write time: Tue Mar 22 18:11:10 2005 Mount count: 35 Maximum mount count: 100 Last checked: Thu Mar 3 09:12:04 2005 Check interval: 0 () Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 128 Journal inode: 8 First orphan inode: 775060 Journal backup: inode blocks Group 0: (Blocks 0-32767) Primary superblock at 0, Group descriptors at 1-4 Block bitmap at 2553887680, Inode bitmap at 16777216 (+16777216) Inode table at 2238581760-2238582271 5376 free blocks, 4096 free inodes, 0 directories Group 1: (Blocks 32768-65535) Backup superblock at 32768, Group descriptors at 32769-32772 Block bitmap at 0, Inode bitmap at 822088960 (+822056192) Inode table at 33554432-33554943 (+33521664) 20480 free blocks, 6016 free inodes, 0 directories Group 2: (Blocks 65536-98303) Block bitmap at 33554432 (+33488896), Inode bitmap at 2789758720 Inode table at 33554432-33554943 (+33488896) [...] 
Group 443: (Blocks 14516224-14542840) Block bitmap at 14516224 (+0), Inode bitmap at 14516225 (+1) Inode table at 14516231-14516742 (+7) 20485 free blocks, 16165 free inodes, 0 directories dumpe2fs: /dev/sda4: error reading bitmaps: Can't read an block bitmap dumpe2fs -b /dev/sda4 dumpe2fs 1.35 (28-Feb-2004) ext2fs_read_bb_inode: Invalid argument Any hinds what to do? With best regards, Hans-Joachim Picht -- Picht Consulting http://www.picht.org Hans-Joachim Picht hans at picht.org Eichgaertenallee 88 Tel. +49 (0)641 41 600 35394 Giessen, Germany Fax. +49 (0)641 41 300 From kewlemer at gmail.com Wed Mar 23 09:46:17 2005 From: kewlemer at gmail.com (kewlemer at gmail.com) Date: Wed, 23 Mar 2005 01:46:17 -0800 Subject: Changes from 2.4 to 2.6 Message-ID: <79cbe67505032301461376f36a@mail.gmail.com> Hello All, I am trying to port an Ext3 utility that was implemented for the 2.4 kernel to 2.6. Having searched the archives, I was unable to find a thread that discussed the Ext3 changes from 2.4 to 2.6. So can anyone please shed some light on this please ? Suggestions on the best way to go about porting are also welcome since I am a newbie. Thanks! KM From hans at picht.org Wed Mar 23 22:12:25 2005 From: hans at picht.org (Hans-Joachim Picht) Date: Wed, 23 Mar 2005 23:12:25 +0100 Subject: ext2fs_read_bb_inode: Invalid argument && Can't read an block bitmap In-Reply-To: <20050322204325.GA20896@picht.org> References: <20050322204325.GA20896@picht.org> Message-ID: <20050323221224.GA27579@picht.org> On Tue, Mar 22, 2005 at 09:43:25PM +0100, Hans-Joachim Picht wrote: Hello, even if I reply to my own posting, I presuaded fsck to produce some probaly more helpful output which might help to find someone who gives an advice how to fix this > sorry if the question was asked a couple of times before but I couldn't > find any useful hinds to this problem using google and the listman > search-engine of the archive of this list. 
> > Somehow the ext3 filesystem on one of my machines died an after a reboot > grub wouldn't come up again. > > What I did so far: fsck -n /dev/hda4 resulted in fsck 1.35 (28-Feb-2004) e2fsck 1.35 (28-Feb-2004) Group descriptors look bad... trying backup blocks... Block bitmap for group 0 is not in group. (block 2553887680) Relocate? no Inode bitmap for group 0 is not in group. (block 16777216) Relocate? no Inode table for group 0 is not in group. (block 2238581760) WARNING: SEVERE DATA LOSS POSSIBLE. Relocate? no [...] Block bitmap for group 127 is not in group. (block 0) Relocate? no Inode bitmap for group 127 is not in group. (block 0) Relocate? no Inode table for group 127 is not in group. (block 0) WARNING: SEVERE DATA LOSS POSSIBLE. Relocate? no /dev/hda4 was not cleanly unmounted, check forced. Error reading block 2238581760 (Invalid argument). Ignore error? no fsck.ext3: Invalid argument while reading bad blocks inode This doesn't bode well, but we'll try to go on... Pass 1: Checking inodes, blocks, and sizes Illegal block number passed to ext2fs_test_block_bitmap #2238581760 for in-use block map Illegal block number passed to ext2fs_mark_block_bitmap #2238581760 for in-use block map Illegal block number passed to ext2fs_test_block_bitmap #2238581761 for in-use block map Illegal block number passed to ext2fs_mark_block_bitmap #2238581761 for in-use block map Illegal block number passed to ext2fs_test_block_bitmap #2238581762 for in-use block map Illegal block number passed to ext2fs_mark_block_bitmap #2238581762 for in-use block map Illegal block number passed to ext2fs_test_block_bitmap #2238581763 for in-use block map Illegal block number passed to ext2fs_mark_block_bitmap #2238581763 for in-use block map [...] 
Illegal block number passed to ext2fs_test_block_bitmap #167772670 for in-use block map
Illegal block number passed to ext2fs_mark_block_bitmap #167772670 for in-use block map
Illegal block number passed to ext2fs_test_block_bitmap #167772671 for in-use block map
Illegal block number passed to ext2fs_mark_block_bitmap #167772671 for in-use block map
Illegal block number passed to ext2fs_test_block_bitmap #33554432 for in-use block map
Illegal block number passed to ext2fs_mark_block_bitmap #33554432 for in-use block map
Illegal block number passed to ext2fs_test_block_bitmap #2892119552 for in-use block map
Illegal block number passed to ext2fs_mark_block_bitmap #2892119552 for in-use block map
Error reading block 2238581760 (Invalid argument) while doing inode scan. Ignore error? no

Error reading block 2238581760 (Invalid argument) while doing inode scan. Ignore error? no

Error while scanning inodes (0): Can't read next inode
e2fsck: aborted

With best regards,
Hans-Joachim Picht
--
Picht Consulting                http://www.picht.org
Hans-Joachim Picht              hans at picht.org
Eichgaertenallee 88             Tel. +49 (0)641 41 600
35394 Giessen, Germany          Fax. +49 (0)641 41 300

From juhl-lkml at dif.dk Fri Mar 25 22:11:45 2005
From: juhl-lkml at dif.dk (Jesper Juhl)
Date: Fri, 25 Mar 2005 23:11:45 +0100 (CET)
Subject: [PATCH] kfree() NULL pointer cleanups - no need to check - fs/ext3/
Message-ID:

kfree() handles NULL pointers fine - checking is redundant.

Signed-off-by: Jesper Juhl

--- linux-2.6.12-rc1-mm3-orig/fs/ext3/acl.c	2005-03-02 08:37:55.000000000 +0100
+++ linux-2.6.12-rc1-mm3/fs/ext3/acl.c	2005-03-25 22:41:41.000000000 +0100
@@ -197,8 +197,7 @@ ext3_get_acl(struct inode *inode, int ty
 		acl = NULL;
 	else
 		acl = ERR_PTR(retval);
-	if (value)
-		kfree(value);
+	kfree(value);
 
 	if (!IS_ERR(acl)) {
 		switch(type) {
@@ -267,8 +266,7 @@ ext3_set_acl(handle_t *handle, struct in
 	error = ext3_xattr_set_handle(handle, inode, name_index, "",
 				      value, size, 0);
 
-	if (value)
-		kfree(value);
+	kfree(value);
 	if (!error) {
 		switch(type) {
 			case ACL_TYPE_ACCESS:
--- linux-2.6.12-rc1-mm3-orig/fs/ext3/super.c	2005-03-25 15:28:59.000000000 +0100
+++ linux-2.6.12-rc1-mm3/fs/ext3/super.c	2005-03-25 22:42:53.000000000 +0100
@@ -395,10 +395,8 @@ static void ext3_put_super (struct super
 	percpu_counter_destroy(&sbi->s_dirs_counter);
 	brelse(sbi->s_sbh);
 #ifdef CONFIG_QUOTA
-	for (i = 0; i < MAXQUOTAS; i++) {
-		if (sbi->s_qf_names[i])
-			kfree(sbi->s_qf_names[i]);
-	}
+	for (i = 0; i < MAXQUOTAS; i++)
+		kfree(sbi->s_qf_names[i]);
 #endif
 
 	/* Debugging code just in case the in-memory inode orphan list
@@ -883,10 +881,8 @@ clear_qf_name:
 					"quota turned on.\n");
 				return 0;
 			}
-			if (sbi->s_qf_names[qtype]) {
-				kfree(sbi->s_qf_names[qtype]);
-				sbi->s_qf_names[qtype] = NULL;
-			}
+			kfree(sbi->s_qf_names[qtype]);
+			sbi->s_qf_names[qtype] = NULL;
 			break;
 		case Opt_jqfmt_vfsold:
 			sbi->s_jquota_fmt = QFMT_VFS_OLD;

From agruen at suse.de Tue Mar 29 13:54:30 2005
From: agruen at suse.de (Andreas Gruenbacher)
Date: Tue, 29 Mar 2005 15:54:30 +0200
Subject: [PATCH] kfree() NULL pointer cleanups - no need to check - fs/ext3/
In-Reply-To:
References:
Message-ID: <200503291554.30497.agruen@suse.de>

On Friday 25 March 2005 23:11, Jesper Juhl wrote:
> kfree() handles NULL pointers fine - checking is redundant.

Looks good. Can you also fix that in fs/ext2/acl.c?

Thanks,
--
Andreas Gruenbacher
SUSE Labs, SUSE LINUX PRODUCTS GMBH

From juhl-lkml at dif.dk Wed Mar 30 19:02:37 2005
From: juhl-lkml at dif.dk (Jesper Juhl)
Date: Wed, 30 Mar 2005 21:02:37 +0200 (CEST)
Subject: [PATCH] kfree() NULL pointer cleanups - no need to check - fs/ext3/
In-Reply-To: <200503291554.30497.agruen@suse.de>
References: <200503291554.30497.agruen@suse.de>
Message-ID:

On Tue, 29 Mar 2005, Andreas Gruenbacher wrote:

> On Friday 25 March 2005 23:11, Jesper Juhl wrote:
> > kfree() handles NULL pointers fine - checking is redundant.
>
> Looks good. Can you also fix that in fs/ext2/acl.c?
>
No problem. I already sent this patch to lkml earlier, but here it is
again:

Remove redundant NULL checks before kfree() in fs/ext2/acl.c

Signed-off-by: Jesper Juhl

--- linux-2.6.12-rc1-mm3-orig/fs/ext2/acl.c	2005-03-02 08:38:18.000000000 +0100
+++ linux-2.6.12-rc1-mm3/fs/ext2/acl.c	2005-03-25 22:41:07.000000000 +0100
@@ -194,8 +194,7 @@ ext2_get_acl(struct inode *inode, int ty
 		acl = NULL;
 	else
 		acl = ERR_PTR(retval);
-	if (value)
-		kfree(value);
+	kfree(value);
 
 	if (!IS_ERR(acl)) {
 		switch(type) {
@@ -262,8 +261,7 @@ ext2_set_acl(struct inode *inode, int ty
 
 	error = ext2_xattr_set(inode, name_index, "", value, size, 0);
 
-	if (value)
-		kfree(value);
+	kfree(value);
 	if (!error) {
 		switch(type) {
 			case ACL_TYPE_ACCESS:

--
Jesper Juhl
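
For anyone following this thread who wants to try the pattern outside the kernel: the patches above rely on kfree(NULL) being a no-op, and the C standard gives free(NULL) the same guarantee. Below is a minimal user-space sketch of the s_qf_names[] cleanup loop from the super.c hunk. It is only an illustration, not kernel code; struct names, MAX_NAMES, dup_string() and release_names() are hypothetical names made up for this example, and free() stands in for kfree().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAMES 2	/* hypothetical stand-in for MAXQUOTAS */

/* Hypothetical user-space analogue of the sbi->s_qf_names[] array. */
struct names {
	char *slot[MAX_NAMES];
};

/* Copy s into a freshly allocated buffer; returns NULL on failure. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Release every slot without testing for NULL before freeing:
 * free(NULL) is defined to do nothing, so empty slots are harmless.
 * Returns how many slots were in use (counted only for the demo).
 */
static int release_names(struct names *n)
{
	int used = 0;
	int i;

	assert(n != NULL);
	for (i = 0; i < MAX_NAMES; i++) {
		if (n->slot[i])
			used++;
		free(n->slot[i]);	/* safe even when the slot is NULL */
		n->slot[i] = NULL;	/* avoids a double free on a second call */
	}
	return used;
}
```

Resetting each slot to NULL after freeing it mirrors the second super.c hunk, where the assignment moved out of the if block together with the kfree() call; it also makes calling release_names() twice on the same structure harmless.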