From cchan at outblaze.com Sun May 2 15:38:42 2004
From: cchan at outblaze.com (Christopher Chan)
Date: Sun, 02 May 2004 23:38:42 +0800
Subject: 2.6.5 and latest Fedora Core 1 kernels cannot handle files over 2.x GB?
In-Reply-To: <4089B9B6.2000706@outblaze.com>
References: <408876C1.80601@outblaze.com>
 <1082719383.2100.9.camel@sisko.scot.redhat.com>
 <40893D7A.4070706@outblaze.com>
 <1082738929.2100.23.camel@sisko.scot.redhat.com>
 <4089B9B6.2000706@outblaze.com>
Message-ID: <40951602.2070403@outblaze.com>

Christopher Chan wrote:
> Stephen C. Tweedie wrote:
>
>> Hi,
>>
>> On Fri, 2004-04-23 at 16:59, Christopher Chan wrote:
>>
>>>> Your filesystem is corrupt. You need to run e2fsck to fix it up, and
>>>> check the files against a backup.
>>>> There's not enough information here to begin to diagnose _why_ they are
>>>> corrupt, but on 2.4 systems it's bad hardware 99% of the time.
>>>> "memtest86" is usually a good place to start.

Disk replacements solved the problem. Just FYI, in case you run memtest
and whatnot and still have no clue.

From guolin at alexa.com Wed May 5 22:29:34 2004
From: guolin at alexa.com (Guolin Cheng)
Date: Wed, 5 May 2004 15:29:34 -0700
Subject: recover data from failed hard drives on Fedora Core 1
Message-ID: <41089CB27BD8D24E8385C8003EDAF7ABBA493F@karl.alexa.com>

Hi,

I have a problem recovering data from Fedora Core 1 hosts when hard
drives fail. I know that the disks fail because error messages like the
following are logged in /var/log/messages.

.....
arc144: Apr 24 12:52:53 arc144 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
arc144: Apr 24 12:52:53 arc144 kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=90186647, sector=52432952
arc144: Apr 24 12:52:53 arc144 kernel: end_request: I/O error, dev 03:0b (hda), sector 52432952
.....

My question is: how do I figure out which file|directory occupies the
failed sector|LBAsect?
If I can figure it out then I can skip those files|directories, since
reading the failed files will sometimes bring the failed drive to a
completely inaccessible state on Fedora Core 1 hosts, which is quite
different from my former Red Hat 8.0.

Another question is: what is the exact difference between LBAsect and
sector in the above message?

Can I find any helpful and complete information on ext2|ext3 internals?
At least related to disk space allocation.

Any suggestions are greatly appreciated. Thanks.

--Guolin Cheng

From philip at texas.net Mon May 10 18:48:59 2004
From: philip at texas.net (Philip Molter)
Date: Mon, 10 May 2004 13:48:59 -0500
Subject: EIO vs. ENOENT on disk failure
Message-ID: <20040510184859.GI39626@staff.texas.net>

I've got a filesystem-based JBOD setup. During testing, I failed out
one of the drives and tried to access the filesystem on it. Here are
the results I got:

  stat /disk                               succeeds
  open /disk for reading                   succeeds
  readdir (getdents) for /disk             fails EIO
  open /disk/noexist for reading           fails ENOENT
  open /disk/noexist for writing           fails EIO
  open /disk/exists for reading            fails ENOENT
  open /disk/exists for writing            fails EIO
  open /disk/subdir/noexist for reading    fails ENOENT
  open /disk/subdir/noexist for writing    fails ENOENT
  open /disk/subdir/exist for reading      fails ENOENT
  open /disk/subdir/exist for writing      fails ENOENT

I would expect every one of these to fail with an EIO, given that the
underlying disk is gone and that's the cause of the failure. What's the
logic behind returning ENOENT for stats and opens when the disk isn't
there?

Thanks,
Philip

* Philip Molter
* Texas.Net Internet
* http://www.texas.net/
* philip at texas.net

From maheshext3 at yahoo.com Thu May 13 02:09:40 2004
From: maheshext3 at yahoo.com (M K)
Date: Wed, 12 May 2004 19:09:40 -0700 (PDT)
Subject: EXT3 performance on Large (multi-TeraByte) RAID
Message-ID: <20040513020940.67583.qmail@web61006.mail.yahoo.com>

Has anyone experienced a significant degradation in ext3 performance
when using it on a Multi-TeraByte RAID? As part of an experimental
setup, I hooked up three 300GB drives and made an EXT3 RAID5 out of
them, using the entire space on each drive, and started throwing a
large number of files in the size range 3KB to 50KB at it. Then I
deleted the RAID and created a new one, but this time I used only 3
gigs from each drive (a very small RAID compared to the earlier one).
After repeating the same test, a huge improvement in performance was
seen - hence the question: does ext3 performance degrade significantly
as the file system size increases?

Thanks in Advance,
MK

From cpwright at cpwright.com Thu May 13 02:49:35 2004
From: cpwright at cpwright.com (Charles P. Wright)
Date: Wed, 12 May 2004 22:49:35 -0400
Subject: EXT3 performance on Large (multi-TeraByte) RAID
In-Reply-To: <20040513020940.67583.qmail@web61006.mail.yahoo.com>
References: <20040513020940.67583.qmail@web61006.mail.yahoo.com>
Message-ID: <1084416574.3098.5.camel@arcticfox.foo>

This can easily be explained by seek time. If you have a 3GB partition
on a 300GB disk, you are only using 1% of the surface of your disk.
During the test on the smaller partition, the head doesn't have to move
as far as it does with the larger partition.

Charles

On Wed, 2004-05-12 at 22:09, M K wrote:
> Has anyone experienced a significant degradation in ext3 performance
> when using it on a Multi-TeraByte RAID?
> As part of an experimental setup, I hooked up three 300GB drives and
> made an EXT3 RAID5 out of them, using the entire space on each drive,
> and started throwing a large number of files in the size range 3KB to
> 50KB at it. Then I deleted the RAID and created a new one, but this
> time I used only 3 gigs from each drive (a very small RAID compared to
> the earlier one). After repeating the same test, a huge improvement in
> performance was seen - hence the question: does ext3 performance
> degrade significantly as the file system size increases?
>
> Thanks in Advance,
> MK
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users

From adilger at clusterfs.com Thu May 13 06:08:33 2004
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 13 May 2004 00:08:33 -0600
Subject: EXT3 performance on Large (multi-TeraByte) RAID
In-Reply-To: <20040513020940.67583.qmail@web61006.mail.yahoo.com>
References: <20040513020940.67583.qmail@web61006.mail.yahoo.com>
Message-ID: <20040513060833.GX9641@schnapps.adilger.int>

On May 12, 2004 19:09 -0700, M K wrote:
> Has anyone experienced a significant degradation in ext3 performance
> when using it on a Multi-TeraByte RAID? As part of an experimental setup,
> I hooked up three 300GB drives and made an EXT3 RAID5 out of them, using
> the entire space on each drive, and started throwing a large number
> of files in the size range 3KB to 50KB at it. Then I deleted the RAID and
> created a new one, but this time I used only 3 gigs from each drive (a
> very small RAID compared to the earlier one).
> After repeating the same test, a huge improvement in performance was
> seen - hence the question: does ext3 performance degrade significantly
> as the file system size increases?

Are you using 2.4 or 2.6 kernels? In the 2.4 kernel files are allocated
evenly across all of the filesystem space. However, for large
filesystems this is not very effective. In 2.6 kernels the Orlov
allocator will keep files created by the same process (e.g. tar) in a
single group if possible, which localizes block allocation and avoids
seeks.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

From lists at luko.org Thu May 13 06:27:41 2004
From: lists at luko.org (Luke Rosenthal)
Date: Thu, 13 May 2004 16:27:41 +1000 (EST)
Subject: EXT3 performance on Large (multi-TeraByte) RAID
In-Reply-To: <6.0.2.0.0.20040506232706.03bb40e0@192.168.15.1>
Message-ID:

On Thu, 13 May 2004, Andreas Dilger wrote:

> Are you using 2.4 or 2.6 kernels? In the 2.4 kernel files are allocated
> evenly across all of the filesystem space. However, for large
> filesystems this is not very effective. In 2.6 kernels the Orlov
> allocator will keep

Uh oh. I've been lurking on this list for some time, trying to learn as
much about ext3 as possible. This looks bad. Can I ask for some advice?

Say I had a 45GB disk, with a 30GB ext3 partition at the end on which I
had some important stuff stored. I removed all partitions on the disk,
created one large ext3 partition and began filling the disk with data.
I realised my mistake after about 8GB had been written.

Is it too late to restore about 300MB of JPG files from this disk? Do
writes happen sequentially, or are they scattered all over the place?
If they are scattered, would they have hosed the data completely, or
can parts be recovered?

Luke.
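
A rough way to picture the seek-time argument made earlier in this
thread: if accesses land at uniformly random positions, the average
head travel scales with the width of the region in use, so a 3GB slice
of a 300GB disk sees about 1% of the full-disk travel. The Python
sketch below is only a toy model and was not posted to the list. The
name mean_seek_distance and its parameters are made up for
illustration, and the model ignores rotational latency, RAID5 striping,
caching, and the fact that seek time is not linear in seek distance.

```python
import random


def mean_seek_distance(span_fraction, samples=100_000, seed=0):
    """Average head travel between consecutive random accesses.

    span_fraction is the share of the disk surface in use (0.0 to 1.0);
    the result is expressed as a fraction of the full head stroke.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    position = rng.random() * span_fraction
    total = 0.0
    for _ in range(samples):
        target = rng.random() * span_fraction  # next access, uniform in the span
        total += abs(target - position)        # distance the head must travel
        position = target
    return total / samples


# Whole 300GB disk in use versus a 3GB slice of it.
full = mean_seek_distance(1.0)
small = mean_seek_distance(3 / 300)
print("full disk :", round(full, 4))
print("3GB slice :", round(small, 6))
print("ratio     :", round(small / full, 4))
```

Under this model the full-disk average comes out near one third of the
stroke, and the slice scales it down in proportion to its width. How
much of that shows up as wall-clock time on a real array is a separate
question that only a measurement can answer.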

From maheshext3 at yahoo.com Thu May 13 12:10:51 2004
From: maheshext3 at yahoo.com (M K)
Date: Thu, 13 May 2004 05:10:51 -0700 (PDT)
Subject: EXT3 performance on Large (multi-TeraByte) RAID
In-Reply-To: <1084416574.3098.5.camel@arcticfox.foo>
Message-ID: <20040513121051.61183.qmail@web61002.mail.yahoo.com>

Oh, I see. Thanks. So the HDD's RPM and on-drive buffer would matter a
lot on large RAIDs then? Is there a way to tune the file system to
minimise this impact?

Again, thanks in advance!
MK

"Charles P. Wright" wrote:
> This can easily be explained by seek time. If you have a 3GB partition
> on a 300GB disk, you are only using 1% of the surface of your disk.
> During the test on the smaller partition, the head doesn't have to move
> as far as it does with the larger partition.
>
> Charles

From maheshext3 at yahoo.com Thu May 13 13:02:51 2004
From: maheshext3 at yahoo.com (M K)
Date: Thu, 13 May 2004 06:02:51 -0700 (PDT)
Subject: Preferable bdflush values for EXT3 performance on Large (multi-TeraByte) RAID
In-Reply-To: <20040513121051.61183.qmail@web61002.mail.yahoo.com>
Message-ID: <20040513130251.23879.qmail@web61003.mail.yahoo.com>

Sorry, I forgot to ask: are there any values for bdflush that work
better for large / very large ext3 partitions with a very large number
of writes?

M K wrote:
> Oh, I see. Thanks. So the HDD's RPM and on-drive buffer would matter a
> lot on large RAIDs then? Is there a way to tune the file system to
> minimise this impact?
>
> Again, thanks in advance!
> MK

From paolo at php3.it Wed May 19 08:07:45 2004
From: paolo at php3.it (Paolo Dina)
Date: Wed, 19 May 2004 10:07:45 +0200
Subject: cp weird behaviour, some copied files differ from original.
Message-ID: <40AB15D1.9000104@php3.it>

Hi.

I know that many things other than ext3 are involved in the problem
that follows, like a kernel upgrade, a possible coreutils misbehaviour
and others, so I beg your pardon if I'm not in the right place ...

Now the problem. I ran into trouble upgrading a web/DNS server running
Linux. Precisely, I have upgraded the kernel from 2.4.9 to 2.6.6 and
also the hard disk, from 20GB to 80GB. The kernel is fine. Problems
came up with the hard disk upgrade.

I followed the "Hard Disk Upgrade Mini How-To"
(http://www.tldp.org/HOWTO/Hard-Disk-Upgrade/). After copying the files
from the old disk (ext3) to the new one (also ext3, but with larger
partitions) some problems arose. The command to compare the two disks
after the copy,

find / -path /proc -prune -o -path /new-disk -prune -o -xtype f -exec cmp {} /new-disk{} \;

reports that there are differences in some files. Looking with vi I saw
that it's true indeed. Some characters are different! For example, in a
log file there is a line where a '1' becomes a 'y' in the word 'May',
and so on.
After manually replacing the "corrupted" file with the original one and
running the find command again, I find that other files differ. And
running it again, all seems to be OK. The behaviour is quite
unpredictable :\

I have copied some hundreds of thousands of files, and just a small
percentage seems to be "damaged", but I need to know the cause. Can you
help in some way?

Thanks for any reply,
Paolo

P.S. I ran memtest and a hard disk diagnostic; the hardware appears to
be OK.

From stevew at aui.com Thu May 20 20:40:45 2004
From: stevew at aui.com (Steve Watford)
Date: Thu, 20 May 2004 16:40:45 -0400
Subject: HD Partition Lost??
Message-ID: <200405201640.45015.stevew@aui.com>

Help!

I am running FC1 with a Maxtor 160GB HD installed. The drive is only
about a month old. I had just finished moving data from an older HD
that was making some funny sounds and throwing some random errors. All
was well. Last night we lost power for several hours and one machine's
UPS/shutdown system didn't power down the workstation. The other 7 in
our office did fine.

Anyway, I now have an unbootable partition with quite a bit of
un-backed-up data on it that I am trying to recover. In the past,
whenever we had a disk problem I was always able to recover with
tomsrtbt, e2fsck etc. Sometimes it took a little while, but I can't
even get started on this one. I have been looking at lde but don't
think I know enough about what I'm doing to make that work.

During the booting of the machine it throws the following errors:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=90895772, high=5, low=7009692, sector=0
end_request: I/O error, dev 03:03 (hda), sector 0
... repeats 10 times
kjournald starting. Commit interval 5 seconds

I have commented the particular partition (/dev/hda3) out of the fstab
file so it isn't even trying to mount it.
When trying to mount it manually I get the usual:

steve]# mount -t ext3 /dev/hda3 /data
mount: wrong fs type, bad option, bad superblock on /dev/hda3,
       or too many mounted file systems

e2fsck yields the following:

steve]# /sbin/e2fsck /dev/hda3
e2fsck 1.34 (25-Jul-2003)
/sbin/e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/hda3
Could this be a zero-length partition?

[root at mercury steve]# /sbin/fdisk -l /dev/hda

Disk /dev/hda: 163.9 GB, 163928604672 bytes
255 heads, 63 sectors/track, 19929 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End      Blocks   Id  System
/dev/hda1   *         1       522     4192933+   b  Win95 FAT32
/dev/hda2           523       535      104422+  83  Linux
/dev/hda3          5659     19929   114631807+  83  Linux
/dev/hda4           536      5658    41150497+   f  Win95 Ext'd (LBA)
/dev/hda5           536      3722    25599546   83  Linux
/dev/hda6          3723      4359     5116671   83  Linux
/dev/hda7          4360      4996     5116671   83  Linux
/dev/hda8          4997      5123     1020096   83  Linux
/dev/hda9          5124      5250     1020096   83  Linux
/dev/hda10         5251      5377     1020096   83  Linux
/dev/hda11         5378      5604     1823346   82  Linux swap
/dev/hda12         5605      5649      361431   83  Linux
/dev/hda13         5650      5657       64228+  83  Linux

Partition table entries are not in disk order
________________________________________________

We were set for the disk to mirror to another machine at 3 AM. It never
made it. I really need to get to the data if at all possible. Any
suggestions would be much appreciated.

Steve

From evilninja at gmx.net Thu May 20 22:22:52 2004
From: evilninja at gmx.net (evilninja)
Date: Fri, 21 May 2004 00:22:52 +0200
Subject: HD Partition Lost??
In-Reply-To: <40AD2A62.1090102@g-house.de>
References: <200405201640.45015.stevew@aui.com>
 <40AD2A62.1090102@g-house.de>
Message-ID: <40AD2FBC.4030203@gmx.net>

Steve Watford wrote:
| hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
| hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=90895772,
| high=5, low=7009692, sector=0
| end_request: I/O error, dev 03:03 (hda), sector 0
| ... repeats 10 times
| kjournald starting. Commit interval 5 seconds

Hardware errors :-(
At least try to "dd" from the damaged partition, then try to fsck the
image.

Christian.
--
BOFH excuse #43:

boss forgot system password

From evilninja at gmx.net Fri May 21 00:07:49 2004
From: evilninja at gmx.net (evilninja)
Date: Fri, 21 May 2004 02:07:49 +0200
Subject: HD Partition Lost??
In-Reply-To: <200405201812.52935.stevew@aui.com>
References: <200405201640.45015.stevew@aui.com>
 <40AD2A62.1090102@g-house.de>
 <200405201812.52935.stevew@aui.com>
Message-ID: <40AD4855.5090107@gmx.net>

Steve Watford wrote:
| Thanks,
| I'll give dd a try for backup purposes. I have already tried to fsck
| the partition; it just comes back with a short read error. Asking if
| it is a zero

Always be careful when checking an already damaged disk:

| Steve Watford wrote:
| | hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
| | hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=90895772,

Not being a professional but a normal user, I think this really looks
like a hardware issue. So I'd suggest copying (dd) all the data from
the disk while you still have time. Every additional use of the disk
may cause its final death.

I wonder why your other partitions *seem* to still be OK (e.g. "mount"
and using their data then succeeds)...

Christian.

PS: sorry for contacting you directly, Steve, without cc-ing the
list... my fault.
--
BOFH excuse #406:

Bad cafeteria food landed all the sysadmins in the hospital.

From tkb9 at adelphia.net Fri May 21 02:51:49 2004
From: tkb9 at adelphia.net (Toby Bluhm)
Date: Thu, 20 May 2004 22:51:49 -0400
Subject: HD Partition Lost??
In-Reply-To: <40AD4855.5090107@gmx.net>
References: <200405201640.45015.stevew@aui.com>
 <40AD2A62.1090102@g-house.de>
 <200405201812.52935.stevew@aui.com>
 <40AD4855.5090107@gmx.net>
Message-ID: <40AD6EC5.1060408@adelphia.net>

evilninja wrote:
> Steve Watford wrote:
> | Thanks,
> | I'll give dd a try for backup purposes. I have already tried to fsck
> | the partition; it just comes back with a short read error. Asking if
> | it is a zero
>
> Always be careful when checking an already damaged disk:
>
> | Steve Watford wrote:
> | | hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> | | hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=90895772,
>
> Not being a professional but a normal user, I think this really looks
> like a hardware issue. So I'd suggest copying (dd) all the data from
> the disk while you still have time. Every additional use of the disk
> may cause its final death.
>
> I wonder why your other partitions *seem* to still be OK (e.g. "mount"
> and using their data then succeeds)...

Ack! I've had too many failures with Maxtor disks.

Anyway, use conv=sync,noerror with your dd command, preferably to an
identical disk. Do the entire disk /dev/hda.
Yes, it will take a very long time, but you have no other recourse if
you want to keep all your data. Then fsck the new disk. Fsck on the bad
disk may just make matters worse.

Download Maxtor's IDE utility and see if you can fix the bad one - it
may require a total disk rewrite.

Nothing unusual about the other partitions being okay - just a matter
of location of the sectors on the platter(s). Be suspicious of the
entire disk though.

--
Toby Bluhm

From carles at unlimitedmail.org Fri May 21 13:58:19 2004
From: carles at unlimitedmail.org (Carles Xavier Munyoz Baldó)
Date: Fri, 21 May 2004 15:58:19 +0200
Subject: Some bytes removed.
Message-ID: <200405211558.19172.carles@unlimitedmail.org>

Hi,
I have a file 1 gigabyte in size and want to remove some bytes from it.
For this I must follow these steps:
(1) Create a new file.
(2) Copy into it the bytes I want to keep from the original file.
(3) Remove the original file.
(4) Move the new file to be the original file.

The problem with this process is that it uses a lot of disk I/O.
Actually only one disk block of the file is modified.
Is there any other way to do that using an ext3 file system?

Greetings.
---
Carles Xavier Munyoz Baldó
carles at unlimitedmail.org
http://www.unlimitedmail.net/
---

From evilninja at gmx.net Fri May 21 20:50:50 2004
From: evilninja at gmx.net (evilninja)
Date: Fri, 21 May 2004 22:50:50 +0200
Subject: Some bytes removed.
In-Reply-To: <200405211558.19172.carles@unlimitedmail.org>
References: <200405211558.19172.carles@unlimitedmail.org>
Message-ID: <40AE6BAA.7040700@gmx.net>

Carles Xavier Munyoz Baldó wrote:
| Hi,
| I have a file 1 gigabyte in size and want to remove some bytes from it.
| For this I must follow these steps:
| (1) Create a new file.
| (2) Copy into it the bytes I want to keep from the original file.
| (3) Remove the original file.
| (4) Move the new file to be the original file.
|
| The problem with this process is that it uses a lot of disk I/O.
| Actually only one disk block of the file is modified.
| Is there any other way to do that using an ext3 file system?

Um, use an editor for doing things like this? *gg*
You'll need a proper editor and lots of RAM anyway....
Or try "dd". It can seek to a given block number and then copy out the
bits you want to have.

Christian.
--
BOFH excuse #120:

we just switched to FDDI.

From stevew at aui.com Fri May 21 22:29:27 2004
From: stevew at aui.com (Steve Watford)
Date: Fri, 21 May 2004 18:29:27 -0400
Subject: HD Partition Lost??
In-Reply-To: <40AD2FBC.4030203@gmx.net>
References: <200405201640.45015.stevew@aui.com>
 <40AD2A62.1090102@g-house.de>
 <40AD2FBC.4030203@gmx.net>
Message-ID: <200405211829.27656.stevew@aui.com>

Yes, I downloaded the Maxtor diagnostic program and installed it to a
diskette. It reports that the drive is failing and to back up right
away, returning a diagnostic code for an RMA. Oh well, guess it really
is hardware. I will be dd'ing it over to another drive and sending this
one back. Then I'll work on it from there.

Thanks,
Steve

On Thursday 20 May 2004 6:22 pm, evilninja wrote:
> Steve Watford wrote:
> | hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> | hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=90895772,
> | high=5, low=7009692, sector=0
> | end_request: I/O error, dev 03:03 (hda), sector 0
> | ... repeats 10 times
> | kjournald starting. Commit interval 5 seconds
>
> Hardware errors :-(
> At least try to "dd" from the damaged partition, then try to fsck the
> image.
>
> Christian.
> --
> BOFH excuse #43:
>
> boss forgot system password

From stevew at aui.com Fri May 21 22:36:54 2004
From: stevew at aui.com (Steve Watford)
Date: Fri, 21 May 2004 18:36:54 -0400
Subject: HD Partition Lost??
In-Reply-To: <40AD6EC5.1060408@adelphia.net>
References: <200405201640.45015.stevew@aui.com>
 <40AD4855.5090107@gmx.net>
 <40AD6EC5.1060408@adelphia.net>
Message-ID: <200405211836.54553.stevew@aui.com>

Yes, I downloaded the Maxtor diagnostic program and installed it to a
diskette. It reports that the drive is failing and to back up right
away, returning a diagnostic code for an RMA. Oh well, guess it really
is hardware. I will be dd'ing it over to another drive and sending this
one back. Then I'll work on it from there.

What would be the exact syntax of the dd command for the entire disk,
as you suggested? I will be installing a parallel drive in the morning,
also a 160GB drive, as /dev/hdc, with the bad one being /dev/hda. The
bad drive is nowhere near full, but is the max file size still a
problem? Shouldn't I just do partitions instead?
The one with the actual problem is a 100GB partition, although it is
only about 20% full. Around 22 GB altogether on the drive.

Thanks for the help,
Steve

On Thursday 20 May 2004 10:51 pm, Toby Bluhm wrote:
> Ack! I've had too many failures with Maxtor disks.
>
> Anyway, use conv=sync,noerror with your dd command, preferably to an
> identical disk. Do the entire disk /dev/hda. Yes, it will take a very
> long time, but you have no other recourse if you want to keep all your
> data. Then fsck the new disk. Fsck on the bad disk may just make matters
> worse.
>
> Download Maxtor's IDE utility and see if you can fix the bad one - it
> may require a total disk rewrite.
>
> Nothing unusual about the other partitions being okay - just a matter
> of location of the sectors on the platter(s). Be suspicious of the
> entire disk though.

From tkb9 at adelphia.net Sat May 22 03:03:06 2004
From: tkb9 at adelphia.net (Toby Bluhm)
Date: Fri, 21 May 2004 23:03:06 -0400
Subject: HD Partition Lost??
In-Reply-To: <200405211836.54553.stevew@aui.com>
References: <200405201640.45015.stevew@aui.com>
 <40AD4855.5090107@gmx.net>
 <40AD6EC5.1060408@adelphia.net>
 <200405211836.54553.stevew@aui.com>
Message-ID: <40AEC2EA.1030908@adelphia.net>

Steve Watford wrote:

>Yes, I downloaded the Maxtor diagnostic program and installed it to a
>diskette. It reports that the drive is failing and to back up right
>away, returning a diagnostic code for an RMA. Oh well, guess it really
>is hardware. I will be dd'ing it over to another drive and sending this
>one back. Then I'll work on it from there.
>
>What would be the exact syntax of the dd command for the entire disk,
>as you suggested? I will be installing a parallel drive in the morning,
>also a 160GB drive, as /dev/hdc, with the bad one being /dev/hda. The
>bad drive is nowhere near full, but is the max file size still a
>problem? Shouldn't I just do partitions instead? The one with the
>actual problem is a 100GB partition, although it is only about 20%
>full. Around 22 GB altogether on the drive.
>
>Thanks for the help,
>Steve

You may as well do the entire disk:

dd if=/dev/hda of=/dev/hdc bs=1k conv=sync,noerror

Install the new disk as hda & run your fsck on hda3.

--
Toby Bluhm

From adam.cassar at netregistry.com.au Sun May 23 02:36:54 2004
From: adam.cassar at netregistry.com.au (Adam Cassar)
Date: Sun, 23 May 2004 12:36:54 +1000
Subject: ext3 htree issues
Message-ID: <1085279814.523.13.camel@akira>

Hi Guys,

I am running ext3 on kernel v2.6.5.

I have an ext3 filesystem with dir_index and data=journal for
/var/spool/exim.

Today I noticed in the exim logs a bunch of

'failed to unlink /var/spool/exim/input/P/1BRbSP-0006hy-Jp-D'

I also noticed these in the kernel logs:

EXT3-fs warning (device hda12): ext3_unlink: Deleting nonexistent file (612870), 0
EXT3-fs warning (device hda12): ext3_unlink: Deleting nonexistent file (22203), 0

I have never seen this before with exim and ext3. However, this is the
first time I have tried running htree on the exim spool.
From adam.cassar at netregistry.com.au Sun May 23 09:14:39 2004 From: adam.cassar at netregistry.com.au (Adam Cassar) Date: Sun, 23 May 2004 19:14:39 +1000 Subject: ext3 htree issues In-Reply-To: <1085279814.523.13.camel@akira> References: <1085279814.523.13.camel@akira> Message-ID: <1085303678.519.1.camel@akira> In addition I received the following stack trace when unmounting the file system. sb orphan head is 22203 sb_info orphan list: inode hda12:29794 at d85e6b1c: mode 100640, nlink -1, next 0 Assertion failure in ext3_put_super() at fs/ext3/super.c:412: "list_empty(&sbi->s_orphan)" ------------[ cut here ]------------ kernel BUG at fs/ext3/super.c:412! invalid operand: 0000 [#1] SMP CPU: 0 EIP: 0060:[] Not tainted EFLAGS: 00010206 (2.6.5) EIP is at ext3_put_super+0xde/0x144 eax: 0000005e ebx: 00000001 ecx: 00000000 edx: c02fe3c4 esi: f0d8a000 edi: f0d8b108 ebp: f797b200 esp: c6d59f04 ds: 007b es: 007b ss: 0068 Process umount (pid: 1937, threadinfo=c6d58000 task=f787c280) Stack: c02bd800 c02bd7e0 c02bd7d0 0000019c c02bd7b5 f797b250 f797b200 c0304440 c6d58000 c014b341 f797b200 f797b200 f7c1f500 0804ff20 c014bd3e f797b200 f797b200 c03045c0 c014b19e f797b200 f7fe7540 f797b200 c015ea02 f797b200 Call Trace: [] generic_shutdown_super+0x9d/0x160 [] kill_block_super+0x12/0x28 [] deactivate_super+0x46/0x94 [] __mntput+0x1e/0x24 [] path_release+0x28/0x30 [] sys_umount+0x69/0x74 [] sys_munmap+0x38/0x58 [] sys_oldumount+0xc/0x10 [] syscall_call+0x7/0xb Code: 0f 0b 9c 01 d0 d7 2b c0 83 c4 14 6a 00 8b 85 94 00 00 00 50 On Sun, 2004-05-23 at 12:36, Adam Cassar wrote: > Hi Guys, > > I am running ext3 on kernel v2.6.5. 
> > I have an ext3 filesystem with dir_index and data=journal for > /var/spool/exim > > Today I noticed in the exim logs a bunch of 'failed to unlink > /var/spool/exim/input/P/1BRbSP-0006hy-Jp-D' > > I also noticed these in the kernel logs: > > EXT3-fs warning (device hda12): ext3_unlink: Deleting nonexistent file > (612870), 0 > EXT3-fs warning (device hda12): ext3_unlink: Deleting nonexistent file > (22203), 0 > > I have never seen this before with exim and ext3. This is the first time > I have tried running htree on the exim spool however. > > > > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users From maheshext3 at yahoo.com Mon May 24 19:38:15 2004 From: maheshext3 at yahoo.com (M K) Date: Mon, 24 May 2004 12:38:15 -0700 (PDT) Subject: Req. For Info: External Journal Pros/Cons, advisable size Message-ID: <20040524193815.78638.qmail@web61007.mail.yahoo.com> I have a basic question, being new to EXT3. What are the pros and cons of using an external journal? Also, for a Terabyte-sized system, what should I use as the external journal's size? is there a general rule-of-thumb to follow when choosing the external journal's size? I know these are very basic questions, if they have been already answered in some FAQ, please let me know. Any suggestions and help are very much appreciated! Thanks in advance! Cheers, Mahesh __________________________________ Do you Yahoo!? Friends. Fun. Try the all-new Yahoo! Messenger. 
http://messenger.yahoo.com/ From maheshext3 at yahoo.com Mon May 24 19:45:09 2004 From: maheshext3 at yahoo.com (M K) Date: Mon, 24 May 2004 12:45:09 -0700 (PDT) Subject: Separate common journal device In-Reply-To: <20040421092301.GD2938@schnapps.adilger.int> Message-ID: <20040524194509.67338.qmail@web61006.mail.yahoo.com> On a related note, wouldnt it be more efficient to have a single dedicated hard drive, with multiple partitions to store journals - one for each ext3 system? --- Andreas Dilger wrote: > On Apr 20, 2004 23:56 -0500, Vijayan Prabhakaran > wrote: > > Is it possible to use a separate journal device > (one on a separate > > drive or a partition) shared among more than 1 > Ext3 file systems ? > > It is possible now to use an external block device > for a single filesystem. > The on-disk format is designed to allow multiple > filesystems to share the > same device, but that has never been fully > implemented. > > At one point I had implemented a patch to mount a > "jbd" filesystem with the > journal device as the first step of having a shared > journal device. Having > the "jbd" device in /etc/fstab (before filesystems > that use it) allows e2fsck > to do journal replay on all of the filesystems > before the journal starts to > be used, or alternately dumps the journal data to an > external file for later > replay (e.g. if block devices are not available when > e2fsck is run on the > jbd device). It also allows the jbd code to > configure the in-core code to > be ready for external filesystems to connect to it. > Finally, it also marks > the block device as in-use so it is less likely that > it will be overwritten > accidentally. > > See the following email for the (ancient) patch. 
> Most of the comments > and a large fraction of the code in that email are > still relevant, with > the exception that all of the UUID handling already > exists as libblkid > in e2fsprogs, and it doesn't say what kernel version > this is for (I'd > suspect 2.3, but I'm not totally sure. Sadly, > nobody commented on it > at the time and it was lost in the mists of > antiquity. > > > Subject: [PATCH][RFC] mountable journal devices > > To: Ext2 development mailing list > > > Date: Wed, 8 Aug 2001 02:08:23 -0600 (MDT) > http://marc.theaimsgroup.com/?l=ext2-devel&m=99725819513803 > > And the thread starting at discusses shared external > journal devices: > https://listman.redhat.com/archives/ext3-users/2001-November/msg00182.html > > Cheers, Andreas > -- > Andreas Dilger > http://sourceforge.net/projects/ext2resize/ > http://www-mddsp.enel.ucalgary.ca/People/adilger/ > > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users __________________________________ Do you Yahoo!? Friends. Fun. Try the all-new Yahoo! Messenger. http://messenger.yahoo.com/ From maheshext3 at yahoo.com Mon May 24 19:52:49 2004 From: maheshext3 at yahoo.com (M K) Date: Mon, 24 May 2004 12:52:49 -0700 (PDT) Subject: logging disk activity In-Reply-To: <1081882546.17960.1.camel@rkalaskar> Message-ID: <20040524195249.43829.qmail@web61004.mail.yahoo.com> Try using iostat it comes as part of the sysstat package. You can find the RPM for it in rpmfind.net --- Rahul Kalaskar wrote: > Hi all, > > I would like to know how often a writes happen on > ext3 fs. Is there any > way to find this out? > > Thanks > Rahul > > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users __________________________________ Do you Yahoo!? Friends. Fun. Try the all-new Yahoo! Messenger. 
http://messenger.yahoo.com/ From maheshext3 at yahoo.com Mon May 24 19:56:32 2004 From: maheshext3 at yahoo.com (M K) Date: Mon, 24 May 2004 12:56:32 -0700 (PDT) Subject: logging disk activity In-Reply-To: <407C4D7E.8000604@excelcia.org> Message-ID: <20040524195632.19391.qmail@web61003.mail.yahoo.com> If I am not mistaken, that number can be changed in linux/fs/jbd/journal.c also, Stephen Tweedie has a patch which allows you to specify the journal update time as a parameter to the mount command plus, you can tweak bdflush (see man proc for details, and linux/fs/buffer.c) to tune the timing of the buffer flushes. --- Kurt Fitzner wrote: > Rahul Kalaskar wrote: > > I would like to know how often a writes happen on > ext3 fs. Is there any > > way to find this out? > > Data is flushed to the journal every 5 seconds, as > opposed to ext2 where > it is flushed every 30. > > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users __________________________________ Do you Yahoo!? Friends. Fun. Try the all-new Yahoo! Messenger. http://messenger.yahoo.com/ From maheshext3 at yahoo.com Mon May 24 19:51:01 2004 From: maheshext3 at yahoo.com (M K) Date: Mon, 24 May 2004 12:51:01 -0700 (PDT) Subject: EXT3 on raid with external journal... In-Reply-To: <407FEB9D.1000002@excelcia.org> Message-ID: <20040524195101.22885.qmail@web61002.mail.yahoo.com> Kurt, was there any consensus on this thread? I am running tests on the same configuration (ext3 on RAID, with external journal)- are there any particular tests that you had run? I could run it on my systems too and post the results in the mailing list. --- Kurt Fitzner wrote: > Matt Bernstein wrote: > > On Apr 13 Kurt Fitzner wrote: > > > > There could be metadata which is only in the > journal, so failure probably > > means reboot + full fsck, so you may as well use > ext2 if your machine > > doesn't otherwise crash. 
> > > > Far preferable, I think, would be to put your > journal on a RAID 1 pair. > > I would like to think that if the ext3 driver > encountered an error > writing to the journal, that it would then skip the > journal and write > straight to the device - reverting to ext2 behavior. > There should never > be any loss of data (meta or otherwise) upon the > failure of a journal > device. That is, unless the failure of the > journaling device coincides > with a power failure. That is: > > 1) Failure of journaling device > 2) Attempted write of metadata to journal device > 3) Power failure before ext3 gives up on the > journaling device > > In that scenario, the ramification is the array > requiring a full fsck. > The benefit of running the journal on an external > device would far > outweigh the cost of a full fsck in the unlikely > event the above happens. > > I need to know, though, what exactly is the behavior > of ext3 in the > following situations: > - At system startup if there is a failure to > "mount" an external journal > - During operation if the external journal device > fails. > > Does ext3 then revert to non-journaled (ext2) > behavior in those instances? > > - > > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users __________________________________ Do you Yahoo!? Friends. Fun. Try the all-new Yahoo! Messenger. http://messenger.yahoo.com/ From maheshext3 at yahoo.com Mon May 24 20:29:53 2004 From: maheshext3 at yahoo.com (M K) Date: Mon, 24 May 2004 13:29:53 -0700 (PDT) Subject: nature of ext3 journal Message-ID: <20040524202953.29517.qmail@web61003.mail.yahoo.com> Is the ext3 journal a transient journal - meaning that its emptied out as and when the transactions recorded in the journal are completed - thus maintaining almost a constant size? Or does it keep growing as time progresses ? Thanks in advance! 
__________________________________ Do you Yahoo!? Friends. Fun. Try the all-new Yahoo! Messenger. http://messenger.yahoo.com/ From adilger at clusterfs.com Tue May 25 16:30:58 2004 From: adilger at clusterfs.com (Andreas Dilger) Date: Tue, 25 May 2004 10:30:58 -0600 Subject: ext3 htree issues In-Reply-To: <1085279814.523.13.camel@akira> References: <1085279814.523.13.camel@akira> Message-ID: <20040525163058.GB2603@schnapps.adilger.int> On May 23, 2004 12:36 +1000, Adam Cassar wrote: > I am running ext3 on kernel v2.6.5. > > I have an ext3 filesystem with dir_index and data=journal for > /var/spool/exim > > Today I noticed in the exim logs a bunch of 'failed to unlink > /var/spool/exim/input/P/1BRbSP-0006hy-Jp-D' > > I also noticed these in the kernel logs: > > EXT3-fs warning (device hda12): ext3_unlink: Deleting nonexistent file > (612870), 0 > EXT3-fs warning (device hda12): ext3_unlink: Deleting nonexistent file > (22203), 0 > > I have never seen this before with exim and ext3. This is the first time > I have tried running htree on the exim spool however. I will posted an updated patch to ext3-devel for this problem in the thread "problems with ext3 fs, kernels up to 2.6.6-rc2". Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/ From maheshext3 at yahoo.com Wed May 26 19:14:28 2004 From: maheshext3 at yahoo.com (M K) Date: Wed, 26 May 2004 12:14:28 -0700 (PDT) Subject: HD Partition Lost?? 
In-Reply-To: <40AEC2EA.1030908@adelphia.net> Message-ID: <20040526191428.45704.qmail@web61009.mail.yahoo.com> Though this is not a hard-drive info thread, someone's complaint about Maxtor drives caught my attention, since I have been experiencing similar issues - quite a few drives have been just going bad all of a sudden after working fine for a month or two - the time-to-failure duration is random. My experience has been primarily with the Maxtor 250 and 300 GB drives - which the company claims are "near-line" drives. Are there any other reliable drives that anyone has tested? Thanks in Advance! --- Toby Bluhm wrote: > Steve Watford wrote: > > >Yes, I downloaded the Maxtor diagnostic program and > installed to a diskette. > >It reports that the drive is failing and to back up > right away returning a > >diagnostic code for an RMA. Oh well, guess it > really is hardware. I will be > >dd'ing it over to another drive and sending this > one back. Then I'll work on > >it from there. > > > >What would be the exact syntax for the dd command > for the entire disk as you > >suggested? I will be installing a parallel drive > in the morning, also a > >160GB drive as /dev/hdc with the bad one being > /dev/hda. The bad drive is > >nowhere near full, but is the max file size a > problem still? I shouldn't do > >just partitions instead? Although the one with the > actual problem is a 100GB > >partition, although only about 20% full. Around 22 > Gig all together on the > >drive. > > > >Thanks for the help, > >Steve > > > >On Thursday 20 May 2004 10:51 pm, Toby Bluhm wrote: > > > > > >>evilninja wrote: > >> > >> > >>>-----BEGIN PGP SIGNED MESSAGE----- > >>>Hash: SHA1 > >>> > >>>Steve Watford schrieb: > >>>| Thanks, > >>>| I'll give dd a try for backup purposes. I have > already tried to fsck > >>>| the > >>>| partition it just comes back with a short read > error.
Asking if it is > >>>| a zero > >>> > >>>always be careful to check an already damaged > disk: > >>>| Steve Watford schrieb: > >>>| | hda: dma_intr: status=0x51 { DriveReady > SeekComplete Error } > >>>| | hda: dma_intr: error=0x40 { > UncorrectableError }, LBAsect=90895772, > >>> > >>>being not a professional but a normal user i > think this really looks > >>>like some hw issues. so i'd suggest better to > copy (dd) all the data > >>>from the disk, as long as you have time to. every > additional use of the > >>>disk may cause its final death. > >>> > >>>i wonder why your other partitions *seem* to > still be ok (e.g. "mount" > >>>and using its data then succeeds)... > >>> > >>> > >>Ack! I've had too many failures with Maxtor disks. > >> > >>Anyway, use conv=sync,noerror with your dd > command, preferably to an > >>identical disk. Do the entire disk /dev/hda. Yes, > it will take a very > >>long time, but you have no other recourse if you > want to maintain all your > >>data. Then fsck the new disk. Fsck on the bad disk > may just make matters > >>worse. > >> > >>Download Maxtor's ide utility & see if you can fix > the bad one - may > >>require a total disk rewrite. > >> > >>Nothing unusual about the other partitions being > okay - just a matter > >>of location of the sectors on the platter(s). Be > suspicious of the > >>entire disk though. > >> > >> > > > > > > > > > > You may as well do the entire disk: > > dd if=/dev/hda of=/dev/hdb bs=1k conv=sync,noerror > > Install the new disk as hda & run your fsck on hda3. > > -- > > Toby Bluhm > > > > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users __________________________________ Do you Yahoo!? Friends. Fun. Try the all-new Yahoo! Messenger.
http://messenger.yahoo.com/ From maheshext3 at yahoo.com Wed May 26 21:32:04 2004 From: maheshext3 at yahoo.com (M K) Date: Wed, 26 May 2004 14:32:04 -0700 (PDT) Subject: HD Partition Lost?? In-Reply-To: <20040526191428.45704.qmail@web61009.mail.yahoo.com> Message-ID: <20040526213204.87320.qmail@web61004.mail.yahoo.com> Answering my own question: I did some digging through the linux-raid archives and found a huge number of posts where people had problems with virtually all drive manufacturers. Bad. :-( --- M K wrote: > Though this is not a hard-drive info thread, > someone's > complaint about Maxtor drives caught my attention, > since I have been experiencing similar issues - > quite > a few drives have been just going bad all of a > sudden > after working fine for a month or two - the > time-to-failure duration is random; My experience > has > been primarily with the Maxtor 250 and 300 GB > drives > - which the company claims are "near-line" drives. > are there any other reliable drives that anyone has > tested ? > Thanks in Advance! > --- Toby Bluhm wrote: > > Steve Watford wrote: > > > > >Yes, I downloaded the Maxtor diagnostic program > and > > installed to a diskette. > > >It reports that the drive is failing and to back > up > > right away returning a > > >diagnostic code for an RMA. Oh well, guess it > > really is hardware. I will be > > >dd'ing it over to another drive and sending this > > one back. Then I'll work on > > >it from there. > > > > > >What would be the exact syntax for the dd command > > for the entire disk as you > > >suggested? I will be installing a parallel drive > > in the morning, also a > > >160GB drive as /dev/hdc with the bad one being > > /dev/hda. The bad drive is > > >nowhere near full, but is the max file size a > > problem still? I shouldn't do > > >just partitions instead? Although the one with > the > > actual problem is a 100GB > > >partition, although only about 20% full. Around > 22 > > Gig all together on the > > >drive.
> > > > > >Thanks for the help, > > >Steve > > > > > >On Thursday 20 May 2004 10:51 pm, Toby Bluhm > wrote: > > > > > > > > >>evilninja wrote: > > >> > > >> > > >>>-----BEGIN PGP SIGNED MESSAGE----- > > >>>Hash: SHA1 > > >>> > > >>>Steve Watford schrieb: > > >>>| Thanks, > > >>>| I'll give dd a try for backup purposes. I have > > already tried to fsck > > >>>| the > > >>>| partition it just comes back with a short > read > > error. Asking if it is > > >>>| a zero > > >>> > > >>>always be careful to check an already damaged > > disk: > > >>>| Steve Watford schrieb: > > >>>| | hda: dma_intr: status=0x51 { DriveReady > > SeekComplete Error } > > >>>| | hda: dma_intr: error=0x40 { > > UncorrectableError }, LBAsect=90895772, > > >>> > > >>>being not a professional but a normal user i > > think this really looks > > >>>like some hw issues. so i'd suggest better to > > copy (dd) all the data > > >>>from the disk, as long as you have time to. > every > > additional use of the > > >>>disk may cause its final death. > > >>> > > >>>i wonder why your other partitions *seem* to > > still be ok (e.g. "mount" > > >>>and using its data then succeeds)... > > >>> > > >>> > > >>Ack! I've had too many failures with Maxtor > disks. > > >> > > >>Anyway, use conv=sync,noerror with your dd > > command, preferably to an > > >>identical disk. Do the entire disk /dev/hda. > Yes, > > it will take a very > > >>long time, but you have no other recourse if you > > want to maintain all your > > >>data. Then fsck the new disk. Fsck on the bad > disk > > may just make matters > > >>worse. > > >> > > >>Download Maxtor's ide utility & see if you can > fix > > the bad one - may > > >>require a total disk rewrite. > > >> > > >>Nothing unusual about the other partitions > being > > okay - just a matter > > >>of location of the sectors on the platter(s). Be > > suspicious of the > > >>entire disk though.
> > >> > > >> > > > > > > > > > > > > > > > > You may as well do the entire disk: > > > > dd if=/dev/hda of=/dev/hdb bs=1k conv=sync,noerror > > > > Install the new disk as hda & run your fsck on > hda3. > > > > -- > > > > Toby Bluhm > > > > > > > > > > _______________________________________________ > > Ext3-users mailing list > > Ext3-users at redhat.com > > https://www.redhat.com/mailman/listinfo/ext3-users > > > > > > __________________________________ > Do you Yahoo!? > Friends. Fun. Try the all-new Yahoo! Messenger. > http://messenger.yahoo.com/ > > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users __________________________________ Do you Yahoo!? Friends. Fun. Try the all-new Yahoo! Messenger. http://messenger.yahoo.com/ From cwelton at jumpnowusa.com Thu May 27 01:33:02 2004 From: cwelton at jumpnowusa.com (Christopher Welton) Date: 26 May 2004 18:33:02 -0700 Subject: HELP! after power loss, system boots through mount of root fs then stalls Message-ID: <1085621582.3262.38.camel@localhost.localdomain> I run a RH 7.3 installation on a Compaq Proliant 6500 with dual pentium 266Mhz processors, approx. 630MB ram and a hardware SMART-2DH RAID controller and array. All file systems are ext3. The server has been in service for a couple of years now. From time to time we will lose power in our office or have another situation that causes the server to lose power without a proper shutdown. We had such a situation today. Usually the server reboots to runlevel 5 without a problem. However, today the server rebooted to the point in the boot process just after mounting the root filesystem. It then stalls indefinitely and does not continue to boot. I used a recovery CD to boot into a rescue shell. Once there I successfully mounted all the partitions on all drives and examined the files successfully, so the data, filesystem and hardware all look good.
The filesystems were mounted ext2, not ext3. At this point, my suspicions are that some portion of the kernel required for booting was damaged during the power-loss shutdown or that the ext3 journal was damaged in such a way as to block booting. I need suggestions on possible causes of the problem and, better yet, possible solutions. I'm cross-posting this message on the RH 7.3 list. From awilliam at mdah.state.ms.us Thu May 27 01:36:16 2004 From: awilliam at mdah.state.ms.us (Adam Williams) Date: Wed, 26 May 2004 20:36:16 -0500 Subject: HELP! after power loss, system boots through mount of root fs then stalls In-Reply-To: <1085621582.3262.38.camel@localhost.localdomain> References: <1085621582.3262.38.camel@localhost.localdomain> Message-ID: <40B54610.9050607@mdah.state.ms.us> in rescue mode did you do e2fsck -c -y /dev/sdx# Christopher Welton wrote: >I run a RH 7.3 installation on a Compaq Proliant 6500 with dual pentium >266Mhz processors, approx. 630MB ram and a hardware SMART-2DH RAID >controller and array. All file systems are ext3. > >The server has been in service for a couple of years now. From time to >time we will lose power in our office or have another situtation that >causes the server to lose power without a proper shutdown. We had such a >situation today. > >Usually the server reboots to runlevel 5 without a problem. However, >today the server rebooted to the point in the boot process just after >mounting the root filesystem. It then stalls indefinitely and does not >continue to boot. > >I used a recovery CD to boot into a rescue shell. Once there I >successfully mounted all the partitions on all drives and examined the >files successfully, so the data, filesystem and hardware all look good. >The filesystems were mounted ext2, not ext3. > >At this point, my suspicions are that some portion of the kernel >required for booting was damaged during the power-loss shutdown or that >the ext3 journal was damaged in such a way as to block booting. 
> >I need suggestions on possible causes of the problem and, better yet, >possible solutions. > >I'm cross-posting this message on the RH 7.3 list. > > >_______________________________________________ >Ext3-users mailing list >Ext3-users at redhat.com >https://www.redhat.com/mailman/listinfo/ext3-users > > > From cwelton at jumpnowusa.com Thu May 27 02:13:19 2004 From: cwelton at jumpnowusa.com (Christopher Welton) Date: 26 May 2004 19:13:19 -0700 Subject: HELP! after power loss, system boots through mount of root fs then stalls In-Reply-To: <40B54610.9050607@mdah.state.ms.us> References: <1085621582.3262.38.camel@localhost.localdomain> <40B54610.9050607@mdah.state.ms.us> Message-ID: <1085623999.3262.42.camel@localhost.localdomain> No, I did not. On Wed, 2004-05-26 at 18:36, Adam Williams wrote: > in rescue mode did you do e2fsck -c -y /dev/sdx# > > Christopher Welton wrote: > > >I run a RH 7.3 installation on a Compaq Proliant 6500 with dual pentium > >266Mhz processors, approx. 630MB ram and a hardware SMART-2DH RAID > >controller and array. All file systems are ext3. > > > >The server has been in service for a couple of years now. From time to > >time we will lose power in our office or have another situtation that > >causes the server to lose power without a proper shutdown. We had such a > >situation today. > > > >Usually the server reboots to runlevel 5 without a problem. However, > >today the server rebooted to the point in the boot process just after > >mounting the root filesystem. It then stalls indefinitely and does not > >continue to boot. > > > >I used a recovery CD to boot into a rescue shell. Once there I > >successfully mounted all the partitions on all drives and examined the > >files successfully, so the data, filesystem and hardware all look good. > >The filesystems were mounted ext2, not ext3. 
> > > >At this point, my suspicions are that some portion of the kernel > >required for booting was damaged during the power-loss shutdown or that > >the ext3 journal was damaged in such a way as to block booting. > > > >I need suggestions on possible causes of the problem and, better yet, > >possible solutions. > > > >I'm cross-posting this message on the RH 7.3 list. > > > > > >_______________________________________________ > >Ext3-users mailing list > >Ext3-users at redhat.com > >https://www.redhat.com/mailman/listinfo/ext3-users > > > > > > > > > > From awilliam at mdah.state.ms.us Thu May 27 02:15:08 2004 From: awilliam at mdah.state.ms.us (Adam Williams) Date: Wed, 26 May 2004 21:15:08 -0500 Subject: HELP! after power loss, system boots through mount of root fs then stalls In-Reply-To: <1085623999.3262.42.camel@localhost.localdomain> References: <1085621582.3262.38.camel@localhost.localdomain> <40B54610.9050607@mdah.state.ms.us> <1085623999.3262.42.camel@localhost.localdomain> Message-ID: <40B54F2C.1070405@mdah.state.ms.us> well try it and see what happens :) Christopher Welton wrote: >No, I did not. > >On Wed, 2004-05-26 at 18:36, Adam Williams wrote: > > >>in rescue mode did you do e2fsck -c -y /dev/sdx# >> >>Christopher Welton wrote: >> >> >> >>>I run a RH 7.3 installation on a Compaq Proliant 6500 with dual pentium >>>266Mhz processors, approx. 630MB ram and a hardware SMART-2DH RAID >>>controller and array. All file systems are ext3. >>> >>>The server has been in service for a couple of years now. From time to >>>time we will lose power in our office or have another situtation that >>>causes the server to lose power without a proper shutdown. We had such a >>>situation today. >>> >>>Usually the server reboots to runlevel 5 without a problem. However, >>>today the server rebooted to the point in the boot process just after >>>mounting the root filesystem. It then stalls indefinitely and does not >>>continue to boot. 
>>> >>>I used a recovery CD to boot into a rescue shell. Once there I >>>successfully mounted all the partitions on all drives and examined the >>>files successfully, so the data, filesystem and hardware all look good. >>>The filesystems were mounted ext2, not ext3. >>> >>>At this point, my suspicions are that some portion of the kernel >>>required for booting was damaged during the power-loss shutdown or that >>>the ext3 journal was damaged in such a way as to block booting. >>> >>>I need suggestions on possible causes of the problem and, better yet, >>>possible solutions. >>> >>>I'm cross-posting this message on the RH 7.3 list. >>> >>> >>>_______________________________________________ >>>Ext3-users mailing list >>>Ext3-users at redhat.com >>>https://www.redhat.com/mailman/listinfo/ext3-users >>> >>> >>> >>> >>> >> >> >> >> > > > From ext3 at linuxfarms.com Thu May 27 14:11:12 2004 From: ext3 at linuxfarms.com (Arthur Perry) Date: Thu, 27 May 2004 10:11:12 -0400 (EDT) Subject: (regards to ext3 and RAID) Re: Linux consultation needed (fwd) Message-ID: ---------- Forwarded message ---------- Date: Thu, 27 May 2004 08:43:30 -0400 (EDT) From: Arthur Perry To: ext3 at linuxfarms.com Subject: (regards to ext3 and RAID) Re: Linux consultation needed (fwd) ---------- Forwarded message ---------- Date: Thu, 27 May 2004 08:11:17 -0400 (EDT) From: Arthur Perry To: Christopher Welton Cc: ext3-users at redhat.com Subject: (regards to ext3 and RAID) Re: Linux consultation needed (fwd) Hi Chris, I put the whole thread into ext3-users at redhat.com, so others can benefit. e2fsck will repair filesystems damage, but it will do nothing for the RAID container. If your damage exists primarily on the RAID container, then what you want to do is repair that first! Otherwise, you may be making corrections to a filesystem and writing those changes to what could be mapped as bad blocks, and only worsening your situation. 
It all depends on what kind of RAID container you have, the type of damage, and the extent of damage. So in a simple answer, the filesystem check MAY appear to fix it for you in the really short term, or it may not. It's really all about your RAID container first. In my experience, we have had bad luck with RAID5 on certain controllers. I do not know anything about the particular one you have listed below. I would be sure to back up anything you can with a rescue disk boot before making any changes to your disk, simply because at this time, it is unknown exactly what you are dealing with. Best Regards, Art Perry On Wed, 26 May 2004, Christopher Welton wrote: > Thanks Art, your response is useful. I have posted my original question > to the RH 7.3 and RH ext3 lists. Feel free to post your question and/or > answers. > > One thing I would like to know is how to recover from the damage to the > RAID container scenario. Is this just a matter of running e2fsck? Pls > let me know. > > Thanks for your help! > > Chris > > On Wed, 2004-05-26 at 17:20, Arthur Perry wrote: > > Hi Chris, > > > > I just wanted to know if you have tried to fix this problem yet, and if > > any of my info and suggestions were helpful. > > > > Best of Luck, > > Art Perry > > > > > > > > On Tue, 25 May 2004, Arthur Perry wrote: > > > > > > > > > > > Oh, I just wanted to add a very important suggestion: > > > Do not run fsck or anything that may modify the filesystem on that > > > container until you have attempted to back up the important data first! > > > At least, not until you are confident that what you are dealing with here > > > is not any related to any damage to that container. > > > In theory, the fsck may work out fine, but we (at least I not being there) > > > are not sure about what is really going on. 
> > > > > > In practice, when there is any question about a medium's integrity, don't > > > do anything further to it until the important data is extracted from it > > > and backed up the best that you can before proceeding. > > > I would mount read-only. > > > > > > Just a heads up! > > > > > > > > > > > > ---------- Forwarded message ---------- > > > Date: Tue, 25 May 2004 12:14:16 -0400 (EDT) > > > From: Arthur Perry > > > To: Christopher Welton > > > Cc: alp at linuxfarms.com > > > Subject: Re: Linux consultation needed > > > > > > Hi Christopher, > > > > > > By your description, I do believe that I can fix this problem and it may be rather easy. > > > > > > Unfortunately, I live in Massachusetts and it may be rather expensive to get me down there. > > > At the moment, I am working full-time for a large international computer corporation. So to make that trip, it would cut into my > > > vacation time and so I would have to be compensated for that properly. > > > > > > > > > > > > Off the top of my head, I see two possible scenarios: > > > > > > 1) damage to the raid container > > > if you were able to mount the filesystem with a rescue system, (and you > > > mounted it in ext3, not ext2), then I can assume that the filesystem may > > > believe that it is in order. (One would know for sure once you run fsck). > > > However, that does not mean the underlying block layer is ok. > > > The RAID container presents itself to the OS as a uniform block device, > > > which the filesystem sits on top of. > > > How each block gets distributed across the physical disks is entirely up > > > to the RAID hardware, and identifying whether or not a problem exists with > > > the RAID container is also the responsibility of the RAID hardware.
> > > That being said, if the OS has no drivers that would directly interface > > > with the RAID hardware to collect the status of the container, there is no > > > way you could tell, unless this hardware had some sort of beep or buzzer to warn you > > > of this. You could also enter the setup screen of the RAID controller at > > > boot time and check the health of the container there. > > > > > > The reason why I think this is a possibility is because there may be > > > "corrupted" blocks in this RAID container that exist in areas that are > > > read during the boot process (and I use this term generally), and not > > > necessarily in locations where the kernel itself resides. > > > The kernel has such a small footprint, that this is not only > > > unlikely but probably wouldn't happen because the kernel would not be able > > > to completely decompress successfully to continue execution before boot. > > > If this were the case, the failure mode would probably be more severe. > > > > > > > > > > > > 2) damage to the hardware > > > It is possible that the hardware has become somehow damaged or changed, > > > where at boot time when a hardware probe is performed by Kudzu, it locks > > > up the system entirely. > > > An example of this is a bad DIMM or possibly some other peripheral. > > > Maybe there is a hung-up SCSI device on the chain that is on a separate > > > power supply that just needs to be "reset" during the next reboot. > > > > > > > > > > > > If you were able to mount the filesystem from a rescue disk (in ext3 not > > > ext2), then there is no reason why it would not work at boot time, granted > > > the configs (/etc/fstab and kernel parameters for root) have not been > > > changed. > > > Therefore, the journal is probably fine and it is not the root cause. > > > We have already ruled out kernel damage. > > > > > > > > > > > > > > > My suggestion: > > > 1) check out the RAID container status in the setup screen (USE GREAT > > > CAUTION!!
DO NOT CHANGE ANYTHING OR MAKE IT DO ANYTHING THAT YOU DO NOT > > > UNDERSTAND OR YOU CAN LOSE ALL OF YOUR DATA!!!). > > > This can give you a rough idea of what may be going on. > > > > > > You can begin recovery by: > > > 1) Get another disk onto the machine that is large enough to store the > > > data that you think is necessary to salvage. > > > 2) boot into that rescue image again, mount your RAID container, and > > > slowly (little at a time) copy over the necessary files that you need to > > > the new disk. > > > > > > > > > There may be more possibilities here, but I am just going by the > > > information presented and the first things that come to mind. > > > > > > > > > I wish you good luck. > > > If you think you really need my assistance, I can fly out there.. We will > > > just have to go over the costs. > > > If this is enough to help you on your way, then that is great. > > > > > > > > > Also, I would like to post this back onto the newsgroups just so that > > > other people who experience the same problem can benefit, if that is ok > > > with you. > > > I have been working with Linux professionally for over 7 years, and have > > > not contributed back to the community much at all. ;) > > > > > > > > > > > > Let me know if this helps, or if you would like to move forward with > > > consultation. > > > > > > > > > > > > Best Regards, > > > Art Perry > > > alp at perryconsulting.net > > > http://www.perryconsulting.net > > > http://www.linuxfarms.com > > > > > > > > > > > > > > > > > > > > > > > > On Tue, 25 May 2004, Christopher Welton wrote: > > > > > > > Arthur: > > > > > > > > I read your reply to a problem in the redhat ext3 mailing list. I'd like > > > > to request your assistance in solving a serious problem we are having > > > > with one of our servers. Here are the details: > > > > > > > > I run a RH 7.3 installation on a Compaq Proliant 6500 with dual pentium > > > > 266Mhz processors, approx.
630MB ram and a hardware SMART-2DH RAID > > > > controller and array. All file systems are ext3. > > > > > > > > The server has been in service for a couple of years now. From time to > > > > time we will lose power in our office or have another situation that > > > > causes the server to lose power without a proper shutdown. We had such a > > > > situation today. > > > > > > > > Usually the server reboots to runlevel 5 without a problem. However, > > > > today the server rebooted to the point in the boot process just after > > > > mounting the root filesystem. It then stalls indefinitely and does not > > > > continue to boot. > > > > > > > > I used a recovery CD to boot into a rescue shell. Once there I > > > > successfully mounted all the partitions on all drives and examined the > > > > files successfully, so the data, filesystem and hardware all look good. > > > > The filesystems were mounted ext2, not ext3. > > > > > > > > At this point, my suspicions are that some portion of the kernel > > > > required for booting was damaged during the power-loss shutdown or that > > > > the ext3 journal was damaged in such a way as to block booting. > > > > > > > > I need suggestions on possible causes of the problem and, better yet, > > > > possible solutions. > > > > > > > > Pls let me know: > > > > 1. If you think you can solve this issue. > > > > and > > > > 2. What rate you would bill at > > > > > > > > Thank you in advance > > > > > > > > Chris Welton > > > > Owner > > > > JumpNowUSA!
> > > > 562-946-6683 > > > > cwelton at jumpnowusa.com > > > > > > > > > > > > From tytso at mit.edu Thu May 27 19:04:45 2004 From: tytso at mit.edu (Theodore Ts'o) Date: Thu, 27 May 2004 15:04:45 -0400 Subject: (regards to ext3 and RAID) Re: Linux consultation needed (fwd) In-Reply-To: References: Message-ID: <20040527190445.GC29793@thunk.org> The one thing I would add to this is that if you are using any kind of partitioning scheme on top of the hardware RAID device, make sure your partition table is sane first. If the starting block or the size of the partition is wrong, running e2fsck will also do much more damage. In general, the rule is: 1) Make sure the block device is sane. 2) Make sure the partition table is sane 3) Run e2fsck If you're not sure, you can try running e2fsck -n first, or making a full image backup first. - Ted On Thu, May 27, 2004 at 10:11:12AM -0400, Arthur Perry wrote: > > > ---------- Forwarded message ---------- > Date: Thu, 27 May 2004 08:43:30 -0400 (EDT) > From: Arthur Perry > To: ext3 at linuxfarms.com > Subject: (regards to ext3 and RAID) Re: Linux consultation needed (fwd) > > > > ---------- Forwarded message ---------- > Date: Thu, 27 May 2004 08:11:17 -0400 (EDT) > From: Arthur Perry > To: Christopher Welton > Cc: ext3-users at redhat.com > Subject: (regards to ext3 and RAID) Re: Linux consultation needed (fwd) > > Hi Chris, > > I put the whole thread into ext3-users at redhat.com, so others can benefit. > > e2fsck will repair filesystems damage, but it will do nothing for the RAID > container. > If your damage exists primarily on the RAID container, then what you want > to do is repair that first! > Otherwise, you may be making corrections to a filesystem and writing those > changes to what could be mapped as bad blocks, and only worsening your > situation. > > It all depends on what kind of RAID container you have, the type of > damage, and the extent of damage. 
> > > > > 562-946-6683 > > > > > cwelton at jumpnowusa.com > > > > > > > > > > > > > > > > > > > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users From adilger at clusterfs.com Thu May 27 19:23:53 2004 From: adilger at clusterfs.com (Andreas Dilger) Date: Thu, 27 May 2004 13:23:53 -0600 Subject: Separate common journal device In-Reply-To: <20040524194509.67338.qmail@web61006.mail.yahoo.com> References: <20040421092301.GD2938@schnapps.adilger.int> <20040524194509.67338.qmail@web61006.mail.yahoo.com> Message-ID: <20040527192353.GO2603@schnapps.adilger.int> On May 24, 2004 12:45 -0700, M K wrote: > On a related note, wouldnt it be more efficient to > have a single dedicated hard drive, with multiple > partitions to store journals - one for each ext3 > system? No, because then each filesystem would cause seeking to its part of the journal for each transaction (unless the dedicated device was NVRAM). In general, the writes to the journal are pure overhead unless you actually crash so they need to be as efficient as possible at write time and the complexity at recovery time is much less critical. Having all of the journal writes for multiple filesystems stream to a single block device without any seeking is the best. Making a larger journal also helps a lot in the performance area, but you can't always afford to consume so much RAM on a system (especially a larger journal for each filesystem). > --- Andreas Dilger wrote: > > It is possible now to use an external block device > > for a single filesystem. > > The on-disk format is designed to allow multiple > > filesystems to share the > > same device, but that has never been fully > > implemented. 
Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/ From maheshext3 at yahoo.com Thu May 27 19:59:15 2004 From: maheshext3 at yahoo.com (M K) Date: Thu, 27 May 2004 12:59:15 -0700 (PDT) Subject: Separate common journal device In-Reply-To: <20040527192353.GO2603@schnapps.adilger.int> Message-ID: <20040527195915.31681.qmail@web61009.mail.yahoo.com> Oh, good point.. BTW, what should the ideal size of a journal be ? is there a general guideline to follow ? I am sort-of a newbie to ext3, any advice on choosing the journal size would be great! Thanks in advance, Mahesh --- Andreas Dilger wrote: > On May 24, 2004 12:45 -0700, M K wrote: > > On a related note, wouldnt it be more efficient to > > have a single dedicated hard drive, with multiple > > partitions to store journals - one for each ext3 > > system? > > No, because then each filesystem would cause seeking > to its part of the > journal for each transaction (unless the dedicated > device was NVRAM). > In general, the writes to the journal are pure > overhead unless you > actually crash so they need to be as efficient as > possible at write time > and the complexity at recovery time is much less > critical. > > Having all of the journal writes for multiple > filesystems stream to a > single block device without any seeking is the best. > Making a larger > journal also helps a lot in the performance area, > but you can't always > afford to consume so much RAM on a system > (especially a larger journal > for each filesystem). > > > --- Andreas Dilger wrote: > > > It is possible now to use an external block > device > > > for a single filesystem. > > > The on-disk format is designed to allow multiple > > > filesystems to share the > > > same device, but that has never been fully > > > implemented. 
> > Cheers, Andreas > -- > Andreas Dilger > http://sourceforge.net/projects/ext2resize/ > http://www-mddsp.enel.ucalgary.ca/People/adilger/ From adilger at clusterfs.com Thu May 27 20:17:21 2004 From: adilger at clusterfs.com (Andreas Dilger) Date: Thu, 27 May 2004 14:17:21 -0600 Subject: Separate common journal device In-Reply-To: <20040527195915.31681.qmail@web61009.mail.yahoo.com> References: <20040527192353.GO2603@schnapps.adilger.int> <20040527195915.31681.qmail@web61009.mail.yahoo.com> Message-ID: <20040527201721.GQ2603@schnapps.adilger.int> On May 27, 2004 12:59 -0700, M K wrote: > Oh, good point.. BTW, what should the ideal size of a > journal be ? is there a general guideline to follow ? > I am sort-of a newbie to ext3, any advice on choosing > the journal size would be great! It hasn't really been discussed much. I'd benchmark with your apps to see what is best. For Lustre (which often has hundreds of clients doing large IOs to a single ext3 filesystem at one time) we use the largest journal size possible (400MB) to reduce the possibility that the clients get blocked waiting for journal space. Lustre creates very large journal transactions so having a larger journal means we can have more concurrent handles open and definitely improved performance. Most of our server nodes also have gobs of RAM, so journal size isn't much of an issue. For other apps there is probably some upper limit where the journal never really gets too full, and making it larger doesn't help much. It also slows down journal recovery a bit with a very large journal.
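As a rough illustration of the "benchmark with your apps" approach, here is a dry-run sketch that only prints the tune2fs/e2fsck commands one might use to recreate an internal journal at a few candidate sizes. The device name /dev/sdXN, the helper name journal_resize_cmds and the list of sizes are placeholders made up for this example, not recommendations. The filesystem would have to be unmounted before running any of the printed commands for real, and 400MB assumes a block size that allows a journal that large.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands that would recreate
# an ext3 journal at a given size. Device name and sizes are placeholders.
journal_resize_cmds() {
    dev=$1        # block device holding the (unmounted) ext3 filesystem
    size_mb=$2    # desired journal size in megabytes
    echo "tune2fs -O ^has_journal $dev"       # remove the existing journal
    echo "e2fsck -f $dev"                     # check the filesystem before re-adding
    echo "tune2fs -j -J size=$size_mb $dev"   # create a new journal of that size
}

# Candidate sizes to try against the real workload (400 is the figure
# mentioned above for Lustre; the smaller values are arbitrary).
for size in 32 128 400; do
    journal_resize_cmds /dev/sdXN "$size"
    echo "# ...remount, run the application benchmark, record the result"
done
```

Nothing here touches a disk; the printed commands are meant to be reviewed and then run by hand on a test filesystem, with a backup taken first.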
Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/