From cchan at outblaze.com  Sun May  2 15:38:42 2004
From: cchan at outblaze.com (Christopher Chan)
Date: Sun, 02 May 2004 23:38:42 +0800
Subject: 2.6.5 and latest Fedora Core 1 kernels cannot handle files over
 2.x GB?
In-Reply-To: <4089B9B6.2000706@outblaze.com>
References: <408876C1.80601@outblaze.com>		<1082719383.2100.9.camel@sisko.scot.redhat.com>		<40893D7A.4070706@outblaze.com>	<1082738929.2100.23.camel@sisko.scot.redhat.com>
	<4089B9B6.2000706@outblaze.com>
Message-ID: <40951602.2070403@outblaze.com>

Christopher Chan wrote:
> Stephen C. Tweedie wrote:
> 
>> Hi,
>>
>> On Fri, 2004-04-23 at 16:59, Christopher Chan wrote:
>>
>>
>>>> Your filesystem is corrupt.  You need to run e2fsck to fix it up, and
>>>> check the files against a backup. 
>>>> There's not enough information here to begin to diagnose _why_ they are
>>>> corrupt, but on 2.4 systems it's bad hardware 99% of the time. 
>>>> "memtest86" is usually a good place to start.

Disk replacements solved the problem.

Just FYI, in case you run memtest and the like and still have no clue.


From guolin at alexa.com  Wed May  5 22:29:34 2004
From: guolin at alexa.com (Guolin Cheng)
Date: Wed, 5 May 2004 15:29:34 -0700
Subject: recover data from failed hard drives on Fedora Core 1
Message-ID: <41089CB27BD8D24E8385C8003EDAF7ABBA493F@karl.alexa.com>

Hi,

 
I have a problem recovering data from Fedora Core 1 hosts when hard
drives fail. I know that the disks are failing because error messages
like the following are logged in /var/log/messages.

 
.....

arc144: Apr 24 12:52:53 arc144 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
arc144: Apr 24 12:52:53 arc144 kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=90186647, sector=52432952
arc144: Apr 24 12:52:53 arc144 kernel: end_request: I/O error, dev 03:0b (hda), sector 52432952

.....

 
My question is: how can I figure out which file or directory occupies the
failed sector or LBAsect?  If I can figure that out, I can skip those
files or directories, since reading the failed files will sometimes make the
failed drive completely inaccessible on Fedora Core 1 hosts, which is
quite different from my former Red Hat 8.0 setup.

 
Another question: what is the exact difference between LBAsect and
sector in the above messages? Can I find any helpful and complete information
on ext2/ext3 internals, at least related to disk space allocation?

 
Any suggestions are greatly appreciated. Thanks.
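
One reading of the two numbers, offered as an assumption rather than
something the log states: in these 2.4-era IDE messages LBAsect appears to
be the absolute sector on the whole disk, while sector is counted from the
start of the partition (dev 03:0b is major 3, minor 11, i.e. hda11). Under
that reading the difference is the partition's starting sector, and the
partition-relative sector converts to a filesystem block number. The sketch
below does that arithmetic; the 4096-byte block size is an assumed value and
should be checked against the real filesystem (for example with
dumpe2fs -h), and the function names are made up for illustration.

```python
SECTOR_SIZE = 512  # bytes per sector on these IDE drives


def partition_start(lba_sect, part_sector):
    """Starting sector of the partition, assuming LBAsect is disk-absolute
    and part_sector is relative to the partition."""
    return lba_sect - part_sector


def fs_block(part_sector, block_size=4096):
    """Filesystem block that contains the given partition-relative sector.
    block_size is an assumption; read the real value from the superblock."""
    return (part_sector * SECTOR_SIZE) // block_size


lba_sect = 90186647      # from the dma_intr line
part_sector = 52432952   # from the end_request line

print(partition_start(lba_sect, part_sector))  # 37753695
print(fs_block(part_sector))                   # 6554119
```

The block number can then be handed to debugfs: icheck maps a block to an
inode, and ncheck maps an inode number to a path name.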

 
--Guolin Cheng

 

From philip at texas.net  Mon May 10 18:48:59 2004
From: philip at texas.net (Philip Molter)
Date: Mon, 10 May 2004 13:48:59 -0500
Subject: EIO vs. ENOENT on disk failure
Message-ID: <20040510184859.GI39626@staff.texas.net>

I've got a filesystem-based JBOD setup.  During testing, I failed
out one of the drives and tried to access the filesystem on it.
Here are the results I got:

stat /disk                              succeeds
open /disk for reading                  succeeds
readdir (getdents) for /disk            fails EIO
open /disk/noexist for reading          fails ENOENT
open /disk/noexist for writing          fails EIO
open /disk/exists for reading           fails ENOENT
open /disk/exists for writing           fails EIO
open /disk/subdir/noexist for reading   fails ENOENT
open /disk/subdir/noexist for writing   fails ENOENT
open /disk/subdir/exist for reading     fails ENOENT
open /disk/subdir/exist for writing     fails ENOENT

I would expect every one of these to fail with an EIO given that
the underlying disk is gone and that's the cause of the failure.
What's the logic behind returning ENOENT for stats and opens when
the disk isn't there?
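
A table like the one above can be regenerated with a small probe. This is
only a sketch: the helper name probe is made up, and the paths passed to it
would be the ones on the failed mount.

```python
import errno
import os


def probe(path, flags):
    """Try to open path with the given os.O_* flags.
    Returns "succeeds" or the symbolic errno name (e.g. "ENOENT", "EIO")."""
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return "succeeds"


if __name__ == "__main__":
    # Hypothetical paths mirroring the table; adjust to the real mount point.
    for path in ("/disk", "/disk/noexist", "/disk/exists"):
        print(path, "for reading", probe(path, os.O_RDONLY))
```

Reporting the errno name rather than the message text keeps the output
directly comparable with the EIO and ENOENT columns in the table.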

Thanks,
Philip

* Philip Molter
* Texas.Net Internet
* http://www.texas.net/
* philip at texas.net


From maheshext3 at yahoo.com  Thu May 13 02:09:40 2004
From: maheshext3 at yahoo.com (M K)
Date: Wed, 12 May 2004 19:09:40 -0700 (PDT)
Subject: EXT3 performance on Large (multi-TeraByte) RAID
Message-ID: <20040513020940.67583.qmail@web61006.mail.yahoo.com>

Has anyone experienced a significant degradation in ext3 performance when
using it on a multi-terabyte RAID? As part of an experimental setup, I hooked
up three 300GB drives and made an ext3 RAID5 out of them, using the entire
space on each drive, and started throwing a large number of files in the size
range 3KB to 50KB at it. Then I deleted the RAID and created a new one, but
this time I used only 3GB from each drive (a very small RAID compared to the
earlier one). After repeating the same test, a huge improvement in performance
was seen. Hence the question: does ext3 performance degrade significantly as
the file system size increases?
 
Thanks in Advance,
MK

		

From cpwright at cpwright.com  Thu May 13 02:49:35 2004
From: cpwright at cpwright.com (Charles P. Wright)
Date: Wed, 12 May 2004 22:49:35 -0400
Subject: EXT3 performance on Large (multi-TeraByte) RAID
In-Reply-To: <20040513020940.67583.qmail@web61006.mail.yahoo.com>
References: <20040513020940.67583.qmail@web61006.mail.yahoo.com>
Message-ID: <1084416574.3098.5.camel@arcticfox.foo>

This can easily be explained by seek time.

If you have a 3GB partition on a 300GB disk, you are only using 1% of
the surface of your disk.  During the test on the smaller partition, the head
doesn't have to move as far as it does with the larger partition.
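
As a back-of-the-envelope illustration of that argument (a sketch under
simplifying assumptions, not a measurement): suppose each partition maps to
one contiguous band of cylinders and accesses are uniformly random inside
it. The mean distance between two random positions in a band is one third of
the band's width, so the average stroke shrinks in proportion to the
partition size. The function names are made up, and real seek time is not
linear in distance, so the actual speedup will be smaller than the ratio
suggests.

```python
def used_fraction(partition_gb, disk_gb):
    """Fraction of the disk's capacity covered by the partition."""
    return partition_gb / disk_gb


def mean_random_stroke(span_fraction):
    """Mean distance between two uniformly random positions inside a band
    covering span_fraction of the full stroke (mean of |x - y| is span / 3)."""
    return span_fraction / 3.0


small = used_fraction(3, 300)           # the 3GB-per-drive test
print(small)                            # share of each drive in use
print(mean_random_stroke(small))        # average stroke, small partition
print(mean_random_stroke(1.0))          # average stroke, whole drive
```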

Charles

On Wed, 2004-05-12 at 22:09, M K wrote:
> Has anyone experienced a significant degradation in ext3 performance
> when using it on a Multi-TeraByte RAID? As part of an experimental
> setup, I hooked up three 300GB drives and made an EXT3 RAID5 out of
> them, using the entire space one each drive, and started throwing a
> large number of files in the size-range 3KB to 50 KB. Then, I deleted
> the raid, and created a new one, but this time, I used only 3 Gigs
> from each drive (a very small RAID compared to the earlier one). After
> repeating the same test, a huge improvement in performance was see -
> hence, the question : does ext3 performance degrade significantly as
> the file system size increases?
>
> Thanks in Advance,
> MK


From adilger at clusterfs.com  Thu May 13 06:08:33 2004
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 13 May 2004 00:08:33 -0600
Subject: EXT3 performance on Large (multi-TeraByte) RAID
In-Reply-To: <20040513020940.67583.qmail@web61006.mail.yahoo.com>
References: <20040513020940.67583.qmail@web61006.mail.yahoo.com>
Message-ID: <20040513060833.GX9641@schnapps.adilger.int>

On May 12, 2004  19:09 -0700, M K wrote:
> Has anyone experienced a significant degradation in ext3 performance
> when using it on a Multi-TeraByte RAID? As part of an experimental setup,
> I hooked up three 300GB drives and made an EXT3 RAID5 out of them, using
> the entire space one each drive, and started throwing a large number
> of files in the size-range 3KB to 50 KB. Then, I deleted the raid, and
> created a new one, but this time, I used only 3 Gigs from each drive (a
> very small RAID compared to the earlier one). After repeating the same
> test, a huge improvement in performance was see - hence, the question:
> does ext3 performance degrade significantly as the file system size
> increases?

Are you using 2.4 or 2.6 kernels?  In the 2.4 kernel files are allocated
evenly across all of the filesystem space.  However, for large filesystems
this is not very effective.  In 2.6 kernels the Orlov allocator will keep
files created by the same process (e.g. tar) in a single group if possible,
which localizes block allocation and avoids seeks.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


From lists at luko.org  Thu May 13 06:27:41 2004
From: lists at luko.org (Luke Rosenthal)
Date: Thu, 13 May 2004 16:27:41 +1000 (EST)
Subject: EXT3 performance on Large (multi-TeraByte) RAID
In-Reply-To: <6.0.2.0.0.20040506232706.03bb40e0@192.168.15.1>
Message-ID: <Pine.LNX.4.44.0405131619550.12694-100000@gw.luko.org>

On Thu, 13 May 2004, Andreas Dilger wrote:

> Are you using 2.4 or 2.6 kernels?  In the 2.4 kernel files are allocated
> evenly across all of the filesystem space.  However, for large
> filesystems this is not very effective.  In 2.6 kernels the Orlov
> allocator will keep

Uh oh.  I've been lurking on this list for some time, trying to learn as
much about ext3 as possible.  This looks bad.  Can I ask for some advice?

Say I had a 45GB disk, with a 30GB ext3 partition at the end on which I
had some important stuff stored.  I remove all partitions on the disk,
create one large ext3 partition and begin filling the disk with data.  I
realise my mistake after about 8GB has been written.  Is it too late to
restore about 300MB of JPG files from this disk?

Do writes happen sequentially, or are they scattered all over the place?
If they are scattered, would they have hosed the data completely or can
parts be recovered?

Luke.


From maheshext3 at yahoo.com  Thu May 13 12:10:51 2004
From: maheshext3 at yahoo.com (M K)
Date: Thu, 13 May 2004 05:10:51 -0700 (PDT)
Subject: EXT3 performance on Large (multi-TeraByte) RAID
In-Reply-To: <1084416574.3098.5.camel@arcticfox.foo>
Message-ID: <20040513121051.61183.qmail@web61002.mail.yahoo.com>

Oh, I see. Thanks.
So the HDD's RPM and on-drive buffer would matter a lot on large RAIDs, then? Is there a way to tune the file system to minimise this impact?
Again, thanks in advance!
MK

From maheshext3 at yahoo.com  Thu May 13 13:02:51 2004
From: maheshext3 at yahoo.com (M K)
Date: Thu, 13 May 2004 06:02:51 -0700 (PDT)
Subject: Preferable bdflush values for EXT3 performance on Large
	(multi-TeraByte) RAID
In-Reply-To: <20040513121051.61183.qmail@web61002.mail.yahoo.com>
Message-ID: <20040513130251.23879.qmail@web61003.mail.yahoo.com>

Sorry I forgot to ask:
are there any values for bdflush that work better for large / very large ext3 partitions with a very large number of writes?



From paolo at php3.it  Wed May 19 08:07:45 2004
From: paolo at php3.it (Paolo Dina)
Date: Wed, 19 May 2004 10:07:45 +0200
Subject: cp weird behaviour, some copied files differ from original.
Message-ID: <40AB15D1.9000104@php3.it>

Hi.
I know that many things other than ext3 are involved in the problem
described below, such as a kernel upgrade and a possible coreutils
misbehaviour, so I beg your pardon if I'm not in the right place ...

Now the problem. I ran into trouble upgrading a web/DNS server running
Linux.
Specifically, I upgraded the kernel from 2.4.9 to 2.6.6 and also the hard
disk, from 20GB to 80GB.
The kernel upgrade went fine.

Problems came up with the hard disk upgrade. I followed the "Hard Disk
Upgrade Mini How-To" (http://www.tldp.org/HOWTO/Hard-Disk-Upgrade/).
After copying the files from the old disk (ext3) to the new one (also
ext3, but with larger partitions), some problems arose.
The command to compare the two disks after the copy,

find / -path /proc -prune -o -path /new-disk -prune -o -xtype f -exec cmp {} /new-disk{} \;

reports differences in some files.

Looking with vi I saw that this is indeed true: some characters are
different! For example, in a log file there is a line where a '1' became a
'y' in the word 'May', and so on.
Manually replacing the "corrupted" file with the original one and
running the find command again, I find that other files differ. And
running it again, all seems to be OK.
The behaviour is quite unpredictable :\

I have copied some hundreds of thousands of files, and only a small
percentage seems to be "damaged", but I need to know the cause.

Can you help in some way?
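
One way to narrow this down is to repeat the byte-for-byte comparison
several times and see whether the same files are reported each time. The
sketch below is a Python stand-in for the find/cmp command, with made-up
function names; it compares regular files only and does not prune /proc or
the target directory, so it would need to be pointed at a suitable subtree.
If the reported set changes between runs, the reads themselves are not
repeatable, which is hard to explain by a cp bug alone. Note that the page
cache may serve repeated reads from memory, so results for recently read
files can look more stable than the disk really is.

```python
import os


def same_bytes(path_a, path_b, chunk=1 << 20):
    """Return True if both files have identical contents."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk)
            block_b = fb.read(chunk)
            if block_a != block_b:
                return False
            if not block_a:          # both files hit EOF together
                return True


def differing_files(src_root, dst_root):
    """Relative paths of regular files under src_root whose copy under
    dst_root is missing or differs."""
    diffs = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            rel = os.path.normpath(os.path.join(rel_dir, name))
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst) or not same_bytes(src, dst):
                diffs.append(rel)
    return sorted(diffs)
```

Calling differing_files twice on the same pair of trees and comparing the
two lists shows directly whether the mismatches are stable.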

Thanks for any reply,
Paolo

P.S. I ran memtest and a hard disk diagnostic; the hardware appears to be OK.


From stevew at aui.com  Thu May 20 20:40:45 2004
From: stevew at aui.com (Steve Watford)
Date: Thu, 20 May 2004 16:40:45 -0400
Subject: HD Partition Lost??
Message-ID: <200405201640.45015.stevew@aui.com>

Help!

I am running FC1, with a Maxtor 160GB HD installed.  The drive is only about a
month old. I had just finished moving data from an older HD that was making
some funny sounds and throwing some random errors. All was well.  Last night
we lost power for several hours and one machine's UPS/shutdown system didn't
power down the workstation. The other 7 in our office did fine.
Anyway, I now have an unbootable partition with quite a bit of data on it
that is not backed up, which I am trying to recover.  In the past, whenever
we had a disk problem I was always able to recover with tomsrt, e2fsck etc.
Sometimes it took a little while, but I can't even get started on this one.
I have been looking at lde but don't think I know enough about what I'm doing
to make that work. While booting, the machine throws the following errors:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=90895772, high=5, low=7009692, sector=0
end_request: I/O error, dev 03:03 (hda), sector 0
... repeats 10 times
kjournald starting.  Commit interval 5 seconds
kjournald starting.  Commit interval 5 seconds

I have commented the particular partition (/dev/hda3) out of the fstab file so 
it isn't even trying to mount it. When trying to mount it manually I get the 
usual:

steve]# mount -t ext3 /dev/hda3 /data
mount: wrong fs type, bad option, bad superblock on /dev/hda3,
       or too many mounted file systems

e2fsck yields the following:

steve]# /sbin/e2fsck /dev/hda3
e2fsck 1.34 (25-Jul-2003)
/sbin/e2fsck: Attempt to read block from filesystem resulted in short read 
while trying to open /dev/hda3
Could this be a zero-length partition?

[root at mercury steve]# /sbin/fdisk -l  /dev/hda

Disk /dev/hda: 163.9 GB, 163928604672 bytes
255 heads, 63 sectors/track, 19929 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1       522   4192933+   b  Win95 FAT32
/dev/hda2           523       535    104422+  83  Linux
/dev/hda3          5659     19929 114631807+  83  Linux
/dev/hda4           536      5658  41150497+   f  Win95 Ext'd (LBA)
/dev/hda5           536      3722  25599546   83  Linux
/dev/hda6          3723      4359   5116671   83  Linux
/dev/hda7          4360      4996   5116671   83  Linux
/dev/hda8          4997      5123   1020096   83  Linux
/dev/hda9          5124      5250   1020096   83  Linux
/dev/hda10         5251      5377   1020096   83  Linux
/dev/hda11         5378      5604   1823346   82  Linux swap
/dev/hda12         5605      5649    361431   83  Linux
/dev/hda13         5650      5657     64228+  83  Linux

Partition table entries are not in disk order
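
A rough check of where the failing sector from the boot messages falls,
using the geometry printed above. The assumption, labelled as such, is that
hda3 begins exactly on the boundary of cylinder 5659; the block count shown
for hda3 is consistent with that. The function name is made up for
illustration.

```python
SECTOR_SIZE = 512
SECTORS_PER_CYLINDER = 255 * 63   # 16065, from the fdisk "Units" line


def cylinder_start_sector(cylinder):
    """First sector of a 1-based cylinder, assuming the partition
    starts exactly on the cylinder boundary."""
    return (cylinder - 1) * SECTORS_PER_CYLINDER


hda3_start = cylinder_start_sector(5659)   # start of /dev/hda3
failing_lba = 90895772                     # LBAsect from the error message
offset_bytes = (failing_lba - hda3_start) * SECTOR_SIZE

print(hda3_start)     # 90895770
print(offset_bytes)   # 1024
```

Under that assumption the unreadable sector sits 1024 bytes into hda3, which
is where ext2/ext3 keep the primary superblock. That would fit both the
mount error and the short read reported by e2fsck. e2fsck accepts -b to name
a backup superblock, though taking a copy of the disk first is the safer
order of operations.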

________________________________________________
We were set for the disk to mirror to another machine at 3 AM.  It never made 
it.  I really need to get to the data if at all possible.  Any suggestions 
would be much appreciated.

Steve


From evilninja at gmx.net  Thu May 20 22:22:52 2004
From: evilninja at gmx.net (evilninja)
Date: Fri, 21 May 2004 00:22:52 +0200
Subject: HD Partition Lost??
In-Reply-To: <40AD2A62.1090102@g-house.de>
References: <200405201640.45015.stevew@aui.com> <40AD2A62.1090102@g-house.de>
Message-ID: <40AD2FBC.4030203@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Steve Watford wrote:
| hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
| hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=90895772,
| high=5,
| low=7009692, sector=0
| end_request: I/O error, dev 03:03 (hda), sector 0
| ... repeats 10 time
| kjournald starting.  Commit interval 5 seconds

Hardware errors :-(
At least try to "dd" the damaged partition to an image, then try to fsck the
image.

Christian.
- --
BOFH excuse #43:

boss forgot system password
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFArS+8C/PVm5+NVoYRAk35AJ9DdT3hP0YmBDZ1SVszLWw7WprCAwCg6/5p
LT7apS8i4qcagiAl25j6A5w=
=D5et
-----END PGP SIGNATURE-----


From evilninja at gmx.net  Fri May 21 00:07:49 2004
From: evilninja at gmx.net (evilninja)
Date: Fri, 21 May 2004 02:07:49 +0200
Subject: HD Partition Lost??
In-Reply-To: <200405201812.52935.stevew@aui.com>
References: <200405201640.45015.stevew@aui.com> <40AD2A62.1090102@g-house.de>
	<200405201812.52935.stevew@aui.com>
Message-ID: <40AD4855.5090107@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Steve Watford wrote:
| Thanks,
| I'll give dd a try for backup puposes. I have already tried to fsck
| the
| partition it just comes back with a short read error.  Asking if it is
| a zero

Always be careful when checking an already damaged disk:

| Steve Watford wrote:
| | hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
| | hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=90895772,

I'm not a professional, just a normal user, but I think this really looks
like a hardware issue. So I'd suggest copying (dd) all the data
off the disk while you still can. Every additional use of the
disk may cause its final death.

I wonder why your other partitions *seem* to still be OK (e.g. "mount"
succeeds and the data can then be used)...

Christian.

PS: sorry for contacting you directly, Steve, without cc-ing to the
list...my fault.
- --
BOFH excuse #406:

Bad cafeteria food landed all the sysadmins in the hospital.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFArUhVC/PVm5+NVoYRAmrwAKDB0zoNjyodGqgQsTfDR74mdJH8kQCg1kcq
IrXjP2q9ecIEXRigOLB8Y9Y=
=9vbd
-----END PGP SIGNATURE-----


From tkb9 at adelphia.net  Fri May 21 02:51:49 2004
From: tkb9 at adelphia.net (Toby Bluhm)
Date: Thu, 20 May 2004 22:51:49 -0400
Subject: HD Partition Lost??
In-Reply-To: <40AD4855.5090107@gmx.net>
References: <200405201640.45015.stevew@aui.com>
	<40AD2A62.1090102@g-house.de>	<200405201812.52935.stevew@aui.com>
	<40AD4855.5090107@gmx.net>
Message-ID: <40AD6EC5.1060408@adelphia.net>

evilninja wrote:

> Steve Watford wrote:
> | Thanks,
> | I'll give dd a try for backup puposes. I have already tried to fsck
> | the
> | partition it just comes back with a short read error.  Asking if it is
> | a zero
>
> always be careful to check an already damaged disk:
>
> | Steve Watford wrote:
> | | hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> | | hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=90895772,
>
> being not a professional but a normal user i think this really looks
> like some hw issues. so i'd suggest better to copy (dd) all the data
> from the disk, as long as you have time to. every additional use of the
> disk may cause its final death.
>
> i wonder why your other partitions *seem* to still be ok (e.g. "mount"
> and using its data then succeeds)...
>
Ack! I've had too many failures with Maxtor disks.

Anyway, use conv=sync,noerror with your dd command, preferably to an
identical disk. Do the entire disk /dev/hda. Yes, it will take a very
long time, but you have no other recourse if you want to keep all your
data. Then fsck the new disk. Running fsck on the bad disk may just make
matters worse.

Download Maxtor's IDE utility and see if you can fix the bad one; it may
require a total disk rewrite.

Nothing unusual about  the other partitions being okay - just a matter 
of location of the sectors on the platter(s). Be suspicious of the 
entire disk though.
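
For readers unfamiliar with the two dd options: noerror keeps going after a
read error, and sync pads short input blocks with zeros so that later data
stays at the same offset in the copy. The sketch below imitates that idea in
Python purely as an illustration; it is not how dd is implemented, the name
rescue_copy is made up, and unlike dd it pads only up to the source size. A
real device would best be opened unbuffered (buffering=0) so that each read
maps to one request, which is an assumption about the desired behaviour
rather than a requirement.

```python
import io


def rescue_copy(src, dst, block_size=512):
    """Copy src to dst block by block. Blocks that fail to read are
    written as zeros so offsets are preserved. Returns the list of
    block indexes that could not be read."""
    size = src.seek(0, io.SEEK_END)
    bad_blocks = []
    for offset in range(0, size, block_size):
        want = min(block_size, size - offset)
        try:
            src.seek(offset)
            data = src.read(want)
        except OSError:
            bad_blocks.append(offset // block_size)
            data = b""
        # Pad failed or short reads with zeros, in the spirit of conv=sync.
        dst.write(data.ljust(want, b"\0"))
    return bad_blocks
```

A small block size keeps the amount of data lost around each bad sector
low, at the cost of a slower copy.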

-- 
Toby Bluhm


From carles at unlimitedmail.org  Fri May 21 13:58:19 2004
From: carles at unlimitedmail.org (Carles Xavier Munyoz Baldó)
Date: Fri, 21 May 2004 15:58:19 +0200
Subject: Some bytes removed.
Message-ID: <200405211558.19172.carles@unlimitedmail.org>

Hi,
I have a 1 gigabyte file and want to remove some bytes from it.
To do this I must follow these steps:
(1) Create a new file.
(2) Copy into it the bytes I want to keep from the original file.
(3) Remove the original file.
(4) Move the new file to replace the original file.

The problem with this process is that it uses a lot of disk I/O.
Actually only one disk block of the file is modified.
Is there any other way to do that using an ext3 file system ?
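
One alternative to the four steps above, sketched here under stated
assumptions: shift the tail of the file down over the removed bytes in place
and then truncate. It avoids the second 1 GB file and only rewrites the data
that follows the removed range, but it is not crash-safe (an interruption
mid-shift leaves the file inconsistent). As far as I know the filesystem
offers no call that cuts bytes out of the middle of a file without moving
the rest. The function name is made up for illustration.

```python
import os


def remove_bytes_in_place(path, offset, count, chunk=1 << 20):
    """Delete count bytes starting at offset by moving the tail of the
    file down and truncating. Not atomic: keep a backup if it matters."""
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)
        if offset < 0 or count < 0 or offset + count > size:
            raise ValueError("range lies outside the file")
        read_pos = offset + count    # first byte to keep after the gap
        write_pos = offset           # where that byte has to land
        while read_pos < size:
            f.seek(read_pos)
            data = f.read(min(chunk, size - read_pos))
            if not data:
                raise OSError("unexpected end of file while shifting")
            f.seek(write_pos)
            f.write(data)
            read_pos += len(data)
            write_pos += len(data)
        f.truncate(size - count)
```

The closer the removed range is to the end of the file, the less data has
to be moved; removing bytes near the start still rewrites almost everything.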

Greetings.
---
Carles Xavier Munyoz Baldó
carles at unlimitedmail.org
http://www.unlimitedmail.net/
---


From evilninja at gmx.net  Fri May 21 20:50:50 2004
From: evilninja at gmx.net (evilninja)
Date: Fri, 21 May 2004 22:50:50 +0200
Subject: Some bytes removed.
In-Reply-To: <200405211558.19172.carles@unlimitedmail.org>
References: <200405211558.19172.carles@unlimitedmail.org>
Message-ID: <40AE6BAA.7040700@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Carles Xavier Munyoz Baldó wrote:
| Hi,
| I have a 1 GigaByte size file and want to remove some bytes from it.
| For this I must follow the next steps:
| (1) Create a new file.
| (2) Copy into it the bytes I will left from the original file.
| (3) Remove the original file.
| (4) Move the new file to be the original file.
|
| The problem with this process is that it uses lot of disk I/O.
| Actually only one disk block of the file is modified.
| Is there any other way to do that using an ext3 file system ?

Um, use an editor for doing things like this?  *gg*
You'll need a proper editor and lots of RAM anyway....

Or try "dd". It can seek to a given block number and then copy out the
bits you want to keep.

Christian.

- --
BOFH excuse #120:

we just switched to FDDI.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFArmupC/PVm5+NVoYRAiStAJ9eSTbBXu8MyRdlzeWyz6ZETUjnlACcDoR3
2YJ7UdicqEjRtlVGrRbyb14=
=Eg3T
-----END PGP SIGNATURE-----


From stevew at aui.com  Fri May 21 22:29:27 2004
From: stevew at aui.com (Steve Watford)
Date: Fri, 21 May 2004 18:29:27 -0400
Subject: HD Partition Lost??
In-Reply-To: <40AD2FBC.4030203@gmx.net>
References: <200405201640.45015.stevew@aui.com> <40AD2A62.1090102@g-house.de>
	<40AD2FBC.4030203@gmx.net>
Message-ID: <200405211829.27656.stevew@aui.com>

Yes, I downloaded the Maxtor diagnostic program and installed to a diskette.  
It reports that the drive is failing and to back up right away returning a 
diagnostic code for an RMA.  Oh well, guess it really is hardware.  I will be  
dd'ing it over to another drive and sending this one back.  Then I'll work on 
it from there.

Thanks,
Steve




From stevew at aui.com  Fri May 21 22:36:54 2004
From: stevew at aui.com (Steve Watford)
Date: Fri, 21 May 2004 18:36:54 -0400
Subject: HD Partition Lost??
In-Reply-To: <40AD6EC5.1060408@adelphia.net>
References: <200405201640.45015.stevew@aui.com> <40AD4855.5090107@gmx.net>
	<40AD6EC5.1060408@adelphia.net>
Message-ID: <200405211836.54553.stevew@aui.com>

Yes, I downloaded the Maxtor diagnostic program and installed to a diskette.  
It reports that the drive is failing and to back up right away returning a 
diagnostic code for an RMA.  Oh well, guess it really is hardware.  I will be  
dd'ing it over to another drive and sending this one back.  Then I'll work on 
it from there.

What would be the exact syntax for the dd command for the entire disk, as you
suggested?  I will be installing a second drive in the morning, also a
160GB drive, as /dev/hdc, with the bad one being /dev/hda.  The bad drive is
nowhere near full, but is the maximum file size still a problem?  Should I
copy just the partitions instead?  The one with the actual problem is a 100GB
partition, but it is only about 20% full. There is around 22GB altogether on
the drive.

Thanks for the help,
Steve



From tkb9 at adelphia.net  Sat May 22 03:03:06 2004
From: tkb9 at adelphia.net (Toby Bluhm)
Date: Fri, 21 May 2004 23:03:06 -0400
Subject: HD Partition Lost??
In-Reply-To: <200405211836.54553.stevew@aui.com>
References: <200405201640.45015.stevew@aui.com>
	<40AD4855.5090107@gmx.net>	<40AD6EC5.1060408@adelphia.net>
	<200405211836.54553.stevew@aui.com>
Message-ID: <40AEC2EA.1030908@adelphia.net>

Steve Watford wrote:

>Yes, I downloaded the Maxtor diagnostic program and installed to a diskette.  
>It reports that the drive is failing and to back up right away returning a 
>diagnostic code for an RMA.  Oh well, guess it really is hardware.  I will be  
>dd'ing it over to another drive and sending this one back.  Then I'll work on 
>it from there.
>
>What would be the exact syntax for the dd command for the entire disk as you 
>suggested?  I will be installing a parallel drive in the morning, also a 
>160GB drive as /dev/hdc with the bad one being /dev/hda.  The bad drive is 
>nowhere near full, but is the max file size a problem still?  I shouldn't do 
>just partitions instead?  Although the one with the actual problem is a 100GB 
>partition, although only about 20% full. Around 22 Gig all together on the 
>drive.
>
>Thanks for the help,
>Steve

You may as well do the entire disk:

dd if=/dev/hda of=/dev/hdc bs=1k conv=sync,noerror

Install the new disk as hda & run your fsck on hda3.
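
For anyone who wants to see what the conv=sync,noerror options do before
pointing dd at a failing drive, here is a small rehearsal on scratch
files. It is only a sketch: the temporary directory and the file names
src.img and dst.img are made up for the illustration, and no real device
is touched. With sync, dd pads every short input block (and, together
with noerror, every block it could not read) with zeros up to the block
size, which is what keeps the data after a bad spot at the same offset
on the copy.

```sh
#!/bin/sh
# Rehearsal on scratch files only; nothing here touches a real disk.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# 1500 bytes of test data: one full 1k block plus a short 476-byte tail.
dd if=/dev/urandom of="$work/src.img" bs=1500 count=1 2>/dev/null

# Same options as for the disk copy.
#   noerror: keep going after a read error instead of stopping
#   sync:    pad each short or failed input block with zeros up to bs
dd if="$work/src.img" of="$work/dst.img" bs=1k conv=sync,noerror 2>/dev/null

src_size=$(wc -c < "$work/src.img")
dst_size=$(wc -c < "$work/dst.img")
echo "source: $src_size bytes, copy: $dst_size bytes"
```

The 1500-byte source ends up as a 2048-byte copy, because the trailing
476-byte read is padded to a full 1k block. On a whole-disk copy a larger
bs is faster, but it also zero-fills a larger area around each
unreadable sector.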

-- 

Toby Bluhm


From adam.cassar at netregistry.com.au  Sun May 23 02:36:54 2004
From: adam.cassar at netregistry.com.au (Adam Cassar)
Date: Sun, 23 May 2004 12:36:54 +1000
Subject: ext3 htree issues
Message-ID: <1085279814.523.13.camel@akira>

Hi Guys,

I am running ext3 on kernel v2.6.5.

I have an ext3 filesystem with dir_index and data=journal for
/var/spool/exim

Today I noticed in the exim logs a bunch of 'failed to unlink
/var/spool/exim/input/P/1BRbSP-0006hy-Jp-D'

I also noticed these in the kernel logs:

EXT3-fs warning (device hda12): ext3_unlink: Deleting nonexistent file (612870), 0
EXT3-fs warning (device hda12): ext3_unlink: Deleting nonexistent file (22203), 0

I have never seen this before with exim and ext3. This is the first time
I have tried running htree on the exim spool however.


From adam.cassar at netregistry.com.au  Sun May 23 09:14:39 2004
From: adam.cassar at netregistry.com.au (Adam Cassar)
Date: Sun, 23 May 2004 19:14:39 +1000
Subject: ext3 htree issues
In-Reply-To: <1085279814.523.13.camel@akira>
References: <1085279814.523.13.camel@akira>
Message-ID: <1085303678.519.1.camel@akira>

In addition I received the following stack trace when unmounting the
file system.

sb orphan head is 22203
sb_info orphan list:
  inode hda12:29794 at d85e6b1c: mode 100640, nlink -1, next 0
Assertion failure in ext3_put_super() at fs/ext3/super.c:412:
"list_empty(&sbi->s_orphan)"
------------[ cut here ]------------
kernel BUG at fs/ext3/super.c:412!
invalid operand: 0000 [#1]
SMP 
CPU:    0
EIP:    0060:[<c0182a56>]    Not tainted
EFLAGS: 00010206   (2.6.5) 
EIP is at ext3_put_super+0xde/0x144
eax: 0000005e   ebx: 00000001   ecx: 00000000   edx: c02fe3c4
esi: f0d8a000   edi: f0d8b108   ebp: f797b200   esp: c6d59f04
ds: 007b   es: 007b   ss: 0068
Process umount (pid: 1937, threadinfo=c6d58000 task=f787c280)
Stack: c02bd800 c02bd7e0 c02bd7d0 0000019c c02bd7b5 f797b250 f797b200 c0304440
       c6d58000 c014b341 f797b200 f797b200 f7c1f500 0804ff20 c014bd3e f797b200
       f797b200 c03045c0 c014b19e f797b200 f7fe7540 f797b200 c015ea02 f797b200
Call Trace:
 [<c014b341>] generic_shutdown_super+0x9d/0x160
 [<c014bd3e>] kill_block_super+0x12/0x28
 [<c014b19e>] deactivate_super+0x46/0x94
 [<c015ea02>] __mntput+0x1e/0x24
 [<c01517ec>] path_release+0x28/0x30
 [<c015f0c5>] sys_umount+0x69/0x74
 [<c013c9e0>] sys_munmap+0x38/0x58
 [<c015f0dc>] sys_oldumount+0xc/0x10
 [<c0106b83>] syscall_call+0x7/0xb

Code: 0f 0b 9c 01 d0 d7 2b c0 83 c4 14 6a 00 8b 85 94 00 00 00 50 

> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users


From maheshext3 at yahoo.com  Mon May 24 19:38:15 2004
From: maheshext3 at yahoo.com (M K)
Date: Mon, 24 May 2004 12:38:15 -0700 (PDT)
Subject: Req. For Info: External Journal Pros/Cons, advisable size
Message-ID: <20040524193815.78638.qmail@web61007.mail.yahoo.com>

I have a basic question, being new to ext3.
What are the pros and cons of using an external journal?
Also, for a terabyte-sized system, what size should the external
journal be? Is there a general rule of thumb to follow when choosing
the external journal's size?
I know these are very basic questions; if they have already been
answered in some FAQ, please let me know.
Any suggestions and help are very much appreciated!
Thanks in advance!

Cheers,
Mahesh


__________________________________
Do you Yahoo!?
Friends.  Fun.  Try the all-new Yahoo! Messenger.
http://messenger.yahoo.com/ 


From maheshext3 at yahoo.com  Mon May 24 19:45:09 2004
From: maheshext3 at yahoo.com (M K)
Date: Mon, 24 May 2004 12:45:09 -0700 (PDT)
Subject: Separate common journal device
In-Reply-To: <20040421092301.GD2938@schnapps.adilger.int>
Message-ID: <20040524194509.67338.qmail@web61006.mail.yahoo.com>

On a related note, wouldn't it be more efficient to have a single
dedicated hard drive with multiple partitions to store the journals,
one for each ext3 filesystem?
--- Andreas Dilger <adilger at clusterfs.com> wrote:
> On Apr 20, 2004  23:56 -0500, Vijayan Prabhakaran wrote:
> > Is it possible to use a separate journal device (one on a separate
> > drive or a partition) shared among more than 1 Ext3 file systems ?
> 
> It is possible now to use an external block device for a single
> filesystem.  The on-disk format is designed to allow multiple
> filesystems to share the same device, but that has never been fully
> implemented.
> 
> At one point I had implemented a patch to mount a "jbd" filesystem with
> the journal device as the first step of having a shared journal device.
> Having the "jbd" device in /etc/fstab (before filesystems that use it)
> allows e2fsck to do journal replay on all of the filesystems before the
> journal starts to be used, or alternately dumps the journal data to an
> external file for later replay (e.g. if block devices are not available
> when e2fsck is run on the jbd device).  It also allows the jbd code to
> configure the in-core code to be ready for external filesystems to
> connect to it.  Finally, it also marks the block device as in-use so it
> is less likely that it will be overwritten accidentally.
> 
> See the following email for the (ancient) patch.  Most of the comments
> and a large fraction of the code in that email are still relevant, with
> the exception that all of the UUID handling already exists as libblkid
> in e2fsprogs, and it doesn't say what kernel version this is for (I'd
> suspect 2.3, but I'm not totally sure).  Sadly, nobody commented on it
> at the time and it was lost in the mists of antiquity.
> 
> > Subject: [PATCH][RFC] mountable journal devices
> > To: Ext2 development mailing list <ext2-devel at lists.sourceforge.net>
> > Date: Wed, 8 Aug 2001 02:08:23 -0600 (MDT)
> 
> http://marc.theaimsgroup.com/?l=ext2-devel&m=99725819513803
> 
> And the thread starting at the following URL discusses shared external
> journal devices:
> 
> https://listman.redhat.com/archives/ext3-users/2001-November/msg00182.html
> 
> Cheers, Andreas
> --
> Andreas Dilger
> http://sourceforge.net/projects/ext2resize/
> http://www-mddsp.enel.ucalgary.ca/People/adilger/
> 
> 


From maheshext3 at yahoo.com  Mon May 24 19:52:49 2004
From: maheshext3 at yahoo.com (M K)
Date: Mon, 24 May 2004 12:52:49 -0700 (PDT)
Subject: logging disk activity
In-Reply-To: <1081882546.17960.1.camel@rkalaskar>
Message-ID: <20040524195249.43829.qmail@web61004.mail.yahoo.com>

Try using iostat. It comes as part of the sysstat package; you can
find the RPM for it on rpmfind.net.
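
If sysstat is not installed, write counters can also be read straight
from the kernel. A rough sketch, assuming a 2.6-style /proc/diskstats
with the full per-device layout (field 3 the device name, field 8 writes
completed, field 10 sectors written); writes_between and the snapshot
paths are names invented here, and older kernels expose these statistics
differently. Note that this counts writes reaching the block device, not
ext3 activity specifically.

```sh
#!/bin/sh
# writes_between OLD NEW DEVICE
# Prints "<write requests> <sectors written>" for DEVICE between two
# saved copies of /proc/diskstats.
# Assumed layout: $3 = device name, $8 = writes completed,
# $10 = sectors written.
writes_between() {
    awk -v dev="$3" '
        # First file: remember the starting counters for the device.
        $3 == dev && FNR == NR { w = $8; s = $10; next }
        # Second file: print the difference.
        $3 == dev              { print $8 - w, $10 - s }
    ' "$1" "$2"
}

# Usage on a live system (hda is only an example device name):
#   cat /proc/diskstats > /tmp/ds.1; sleep 5; cat /proc/diskstats > /tmp/ds.2
#   writes_between /tmp/ds.1 /tmp/ds.2 hda
```

Taking the snapshots a fixed interval apart gives a write rate; repeating
it in a loop gives a crude log of how often the disk is being written.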
--- Rahul Kalaskar <rkalaskar at aethon.com> wrote:
> Hi all,
> 
> I would like to know how often a writes happen on
> ext3 fs. Is there any
> way to find this out?
> 
> Thanks
> Rahul
> 
> 


From maheshext3 at yahoo.com  Mon May 24 19:56:32 2004
From: maheshext3 at yahoo.com (M K)
Date: Mon, 24 May 2004 12:56:32 -0700 (PDT)
Subject: logging disk activity
In-Reply-To: <407C4D7E.8000604@excelcia.org>
Message-ID: <20040524195632.19391.qmail@web61003.mail.yahoo.com>

If I am not mistaken, that number can be changed in
linux/fs/jbd/journal.c.
Also, Stephen Tweedie has a patch that allows you to specify the
journal update time as a parameter to the mount command.
In addition, you can tweak bdflush (see man proc for details, and
linux/fs/buffer.c) to tune the timing of the buffer flushes.
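
For the mount-time route, a minimal sketch, assuming the kernel in use
accepts ext3's commit=<seconds> mount option as described in mount(8);
whether that option is the same thing as the patch mentioned above is an
assumption on my part. The device /dev/hda3, the mount point /home and
the 15-second value are placeholders only. The snippet just builds and
prints the command so it can be checked before being run as root.

```sh
#!/bin/sh
# Placeholder values; substitute your own device, mount point and interval.
dev=/dev/hda3
mnt=/home
secs=15    # journal commit interval in seconds (the ext3 default is 5)

cmd="mount -t ext3 -o commit=$secs $dev $mnt"
echo "$cmd"      # inspect the command first
# eval "$cmd"    # uncomment to actually mount (needs root)

# The equivalent /etc/fstab entry would look something like:
#   /dev/hda3  /home  ext3  defaults,commit=15  1 2
```

A longer interval means fewer journal commits but more recent work at
risk if the machine loses power between commits.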
--- Kurt Fitzner <kfitzner at excelcia.org> wrote:
> Rahul Kalaskar wrote:
> > I would like to know how often a writes happen on
> ext3 fs. Is there any
> > way to find this out?
> 
> Data is flushed to the journal every 5 seconds, as
> opposed to ext2 where 
> it is flushed every 30.
> 
> 


From maheshext3 at yahoo.com  Mon May 24 19:51:01 2004
From: maheshext3 at yahoo.com (M K)
Date: Mon, 24 May 2004 12:51:01 -0700 (PDT)
Subject: EXT3 on raid with external journal...
In-Reply-To: <407FEB9D.1000002@excelcia.org>
Message-ID: <20040524195101.22885.qmail@web61002.mail.yahoo.com>

Kurt, was there any consensus on this thread?
I am running tests on the same configuration (ext3 on RAID, with an
external journal). Are there any particular tests that you ran? I
could run them on my systems too and post the results to the mailing
list.


--- Kurt Fitzner <kfitzner at excelcia.org> wrote:
> Matt Bernstein wrote:
> > On Apr 13 Kurt Fitzner wrote:
> > 
> > There could be metadata which is only in the journal, so failure
> > probably means reboot + full fsck, so you may as well use ext2 if
> > your machine doesn't otherwise crash.
> > 
> > Far preferable, I think, would be to put your journal on a RAID 1 pair.
> 
> I would like to think that if the ext3 driver encountered an error
> writing to the journal, that it would then skip the journal and write
> straight to the device - reverting to ext2 behavior.  There should never
> be any loss of data (meta or otherwise) upon the failure of a journal
> device.  That is, unless the failure of the journaling device coincides
> with a power failure.  That is:
> 
> 1) Failure of journaling device
> 2) Attempted write of metadata to journal device
> 3) Power failure before ext3 gives up on the journaling device
> 
> In that scenario, the ramification is the array requiring a full fsck.
> The benefit of running the journal on an external device would far
> outweigh the cost of a full fsck in the unlikely event the above happens.
> 
> I need to know, though, what exactly is the behavior of ext3 in the
> following situations:
>   - At system startup if there is a failure to "mount" an external journal
>   - During operation if the external journal device fails.
> 
> Does ext3 then revert to non-journaled (ext2) behavior in those instances?
> 
>   -
> 
> 


From maheshext3 at yahoo.com  Mon May 24 20:29:53 2004
From: maheshext3 at yahoo.com (M K)
Date: Mon, 24 May 2004 13:29:53 -0700 (PDT)
Subject: nature of ext3 journal 
Message-ID: <20040524202953.29517.qmail@web61003.mail.yahoo.com>

Is the ext3 journal a transient journal, meaning that it is emptied
out as and when the transactions recorded in it are completed, so that
it stays at an almost constant size? Or does it keep growing as time
progresses?
Thanks in advance!


From adilger at clusterfs.com  Tue May 25 16:30:58 2004
From: adilger at clusterfs.com (Andreas Dilger)
Date: Tue, 25 May 2004 10:30:58 -0600
Subject: ext3 htree issues
In-Reply-To: <1085279814.523.13.camel@akira>
References: <1085279814.523.13.camel@akira>
Message-ID: <20040525163058.GB2603@schnapps.adilger.int>

On May 23, 2004  12:36 +1000, Adam Cassar wrote:
> I am running ext3 on kernel v2.6.5.
> 
> I have an ext3 filesystem with dir_index and data=journal for
> /var/spool/exim
> 
> Today I noticed in the exim logs a bunch of 'failed to unlink
> /var/spool/exim/input/P/1BRbSP-0006hy-Jp-D'
> 
> I also noticed these in the kernel logs:
> 
> EXT3-fs warning (device hda12): ext3_unlink: Deleting nonexistent file
> (612870), 0
> EXT3-fs warning (device hda12): ext3_unlink: Deleting nonexistent file
> (22203), 0
> 
> I have never seen this before with exim and ext3. This is the first time
> I have tried running htree on the exim spool however.

I will post an updated patch to ext3-devel for this problem in the
thread "problems with ext3 fs, kernels up to 2.6.6-rc2".

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


From maheshext3 at yahoo.com  Wed May 26 19:14:28 2004
From: maheshext3 at yahoo.com (M K)
Date: Wed, 26 May 2004 12:14:28 -0700 (PDT)
Subject: HD Partition Lost??
In-Reply-To: <40AEC2EA.1030908@adelphia.net>
Message-ID: <20040526191428.45704.qmail@web61009.mail.yahoo.com>

Though this is not a hard-drive info thread, someone's complaint about
Maxtor drives caught my attention, since I have been experiencing
similar issues: quite a few drives have gone bad all of a sudden after
working fine for a month or two, and the time to failure is random. My
experience has been primarily with the Maxtor 250 and 300 GB drives,
which the company claims are "near-line" drives.
Are there any other reliable drives that anyone has tested?
Thanks in advance!


From maheshext3 at yahoo.com  Wed May 26 21:32:04 2004
From: maheshext3 at yahoo.com (M K)
Date: Wed, 26 May 2004 14:32:04 -0700 (PDT)
Subject: HD Partition Lost??
In-Reply-To: <20040526191428.45704.qmail@web61009.mail.yahoo.com>
Message-ID: <20040526213204.87320.qmail@web61004.mail.yahoo.com>

Answering my own question: I did some digging through the linux-raid
archives and found a huge number of posts where people had problems
with drives from virtually all manufacturers. Bad. :-(


From cwelton at jumpnowusa.com  Thu May 27 01:33:02 2004
From: cwelton at jumpnowusa.com (Christopher Welton)
Date: 26 May 2004 18:33:02 -0700
Subject: HELP! after power loss, system boots through mount of root fs then
	stalls
Message-ID: <1085621582.3262.38.camel@localhost.localdomain>

I run a RH 7.3 installation on a Compaq Proliant 6500 with dual Pentium
266MHz processors, approx. 630MB RAM and a hardware SMART-2DH RAID
controller and array. All file systems are ext3.

The server has been in service for a couple of years now. From time to
time we will lose power in our office or have another situation that
causes the server to lose power without a proper shutdown. We had such a
situation today.

Usually the server reboots to runlevel 5 without a problem. However,
today the server rebooted to the point in the boot process just after
mounting the root filesystem. It then stalls indefinitely and does not
continue to boot. 

I used a recovery CD to boot into a rescue shell. Once there I
successfully mounted all the partitions on all drives and examined the
files successfully, so the data, filesystem and hardware all look good.
The filesystems were mounted ext2, not ext3.

At this point, my suspicions are that some portion of the kernel
required for booting was damaged during the power-loss shutdown or that
the ext3 journal was damaged in such a way as to block booting. 

I need suggestions on possible causes of the problem and, better yet,
possible solutions.

I'm cross-posting this message on the RH 7.3 list.


From awilliam at mdah.state.ms.us  Thu May 27 01:36:16 2004
From: awilliam at mdah.state.ms.us (Adam Williams)
Date: Wed, 26 May 2004 20:36:16 -0500
Subject: HELP! after power loss, system boots through mount of root fs
 then	stalls
In-Reply-To: <1085621582.3262.38.camel@localhost.localdomain>
References: <1085621582.3262.38.camel@localhost.localdomain>
Message-ID: <40B54610.9050607@mdah.state.ms.us>

In rescue mode, did you run e2fsck -c -y /dev/sdx# ?


From cwelton at jumpnowusa.com  Thu May 27 02:13:19 2004
From: cwelton at jumpnowusa.com (Christopher Welton)
Date: 26 May 2004 19:13:19 -0700
Subject: HELP! after power loss, system boots through mount of root fs
	then	stalls
In-Reply-To: <40B54610.9050607@mdah.state.ms.us>
References: <1085621582.3262.38.camel@localhost.localdomain>
	<40B54610.9050607@mdah.state.ms.us>
Message-ID: <1085623999.3262.42.camel@localhost.localdomain>

No, I did not.

On Wed, 2004-05-26 at 18:36, Adam Williams wrote:
> in rescue mode did you do e2fsck -c -y /dev/sdx#


From awilliam at mdah.state.ms.us  Thu May 27 02:15:08 2004
From: awilliam at mdah.state.ms.us (Adam Williams)
Date: Wed, 26 May 2004 21:15:08 -0500
Subject: HELP! after power loss, system boots through mount of root fs
 then	stalls
In-Reply-To: <1085623999.3262.42.camel@localhost.localdomain>
References: <1085621582.3262.38.camel@localhost.localdomain>	
	<40B54610.9050607@mdah.state.ms.us>
	<1085623999.3262.42.camel@localhost.localdomain>
Message-ID: <40B54F2C.1070405@mdah.state.ms.us>

Well, try it and see what happens :)

Christopher Welton wrote:

>No, I did not.
>
>On Wed, 2004-05-26 at 18:36, Adam Williams wrote:
>  
>
>>in rescue mode did you do e2fsck -c -y /dev/sdx#
>>
>>Christopher Welton wrote:
>>
>>    
>>
>>>I run a RH 7.3 installation on a Compaq Proliant 6500 with dual pentium
>>>266Mhz processors, approx. 630MB ram and a hardware SMART-2DH RAID
>>>controller and array. All file systems are ext3. 
>>>
>>>The server has been in service for a couple of years now. From time to
>>>time we will lose power in our office or have another situtation that
>>>causes the server to lose power without a proper shutdown. We had such a
>>>situation today. 
>>>
>>>Usually the server reboots to runlevel 5 without a problem. However,
>>>today the server rebooted to the point in the boot process just after
>>>mounting the root filesystem. It then stalls indefinitely and does not
>>>continue to boot. 
>>>
>>>I used a recovery CD to boot into a rescue shell. Once there I
>>>successfully mounted all the partitions on all drives and examined the
>>>files successfully, so the data, filesystem and hardware all look good.
>>>The filesystems were mounted ext2, not ext3.
>>>
>>>At this point, my suspicions are that some portion of the kernel
>>>required for booting was damaged during the power-loss shutdown or that
>>>the ext3 journal was damaged in such a way as to block booting. 
>>>
>>>I need suggestions on possible causes of the problem and, better yet,
>>>possible solutions.
>>>
>>>I'm cross-posting this message on the RH 7.3 list.


From ext3 at linuxfarms.com  Thu May 27 14:11:12 2004
From: ext3 at linuxfarms.com (Arthur Perry)
Date: Thu, 27 May 2004 10:11:12 -0400 (EDT)
Subject: (regards to ext3 and RAID) Re: Linux consultation needed (fwd)
Message-ID: <Pine.LNX.4.58.0405271010530.17951@tiamat.perryconsulting.net>


---------- Forwarded message ----------
Date: Thu, 27 May 2004 08:43:30 -0400 (EDT)
From: Arthur Perry <alp at perryconsulting.net>
To: ext3 at linuxfarms.com
Subject: (regards to ext3 and RAID) Re: Linux consultation needed (fwd)


---------- Forwarded message ----------
Date: Thu, 27 May 2004 08:11:17 -0400 (EDT)
From: Arthur Perry <alp at perryconsulting.net>
To: Christopher Welton <cwelton at jumpnowusa.com>
Cc: ext3-users at redhat.com
Subject: (regards to ext3 and RAID) Re: Linux consultation needed (fwd)

Hi Chris,

I put the whole thread into ext3-users at redhat.com, so others can benefit.

e2fsck will repair filesystem damage, but it will do nothing for the RAID
container.
If your damage exists primarily on the RAID container, then what you want
to do is repair that first!
Otherwise, you may be making corrections to a filesystem and writing those
changes to what could be mapped as bad blocks, and only worsening your
situation.

It all depends on what kind of RAID container you have, the type of
damage, and the extent of damage.

So the simple answer is that the filesystem check MAY appear to fix it for
you in the very short term, or it may not. What really matters is the state
of your RAID container, so deal with that first.

In my experience, we have had bad luck with RAID5 on certain controllers.
I do not know anything about the particular one you have listed below.

I would be sure to back up anything you can with a rescue disk boot before
making any changes to your disk, simply because at this time, it is
unknown exactly what you are dealing with.
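
As a rough illustration of that kind of backup pass, here is a minimal
Python sketch. It assumes the damaged filesystem is already mounted
read-only somewhere and that a healthy disk is mounted elsewhere; the
function name salvage_tree and both paths are hypothetical. It copies
whatever it can read and records failures instead of stopping at the first
I/O error.

```python
import os
import shutil


def salvage_tree(src_root, dst_root):
    """Copy every readable file under src_root into dst_root.

    Returns (copied, failed): copied is a list of relative paths,
    failed is a list of (relative path, error message) pairs.
    """
    copied, failed = [], []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            try:
                shutil.copy2(os.path.join(dirpath, name),
                             os.path.join(target_dir, name))
                copied.append(rel_path)
            except OSError as exc:
                # A read error on a damaged container lands here:
                # note the file and carry on with the rest.
                failed.append((rel_path, str(exc)))
    return copied, failed
```

Something like salvage_tree("/mnt/raid/home", "/mnt/newdisk/home") would
then be run one directory at a time, most important data first, checking
the failed list after each pass.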


Best Regards,
Art Perry


On Wed, 26 May 2004, Christopher Welton wrote:

> Thanks Art, your response is useful. I have posted my original question
> to the RH 7.3 and RH ext3 lists. Feel free to post your question and/or
> answers.
>
> One thing I would like to know is how to recover from the damage to the
> RAID container scenario. Is this just a matter of running e2fsck? Pls
> let me know.
>
> Thanks for your help!
>
> Chris
>
> On Wed, 2004-05-26 at 17:20, Arthur Perry wrote:
> > Hi Chris,
> >
> > I just wanted to know if you have tried to fix this problem yet, and if
> > any of my info and suggestions were helpful.
> >
> > Best of Luck,
> > Art Perry
> >
> >
> >
> > On Tue, 25 May 2004, Arthur Perry wrote:
> >
> > >
> > >
> > > Oh, I just wanted to add a very important suggestion:
> > > Do not run fsck or anything that may modify the filesystem on that
> > > container until you have attempted to back up the important data first!
> > > > At least, not until you are confident that what you are dealing with here
> > > > is not related to any damage to that container.
> > > > In theory, the fsck may work out fine, but we (or at least I, not being
> > > > there) are not sure about what is really going on.
> > > >
> > > > In practice, when there is any question about a medium's integrity, don't
> > > > do anything further to it until the important data has been extracted from
> > > > it and backed up as well as you can.
> > > I would mount read-only.
> > >
> > > Just a heads up!
> > >
> > >
> > >
> > > ---------- Forwarded message ----------
> > > Date: Tue, 25 May 2004 12:14:16 -0400 (EDT)
> > > From: Arthur Perry <alp at perryconsulting.net>
> > > To: Christopher Welton <cwelton at jumpnowusa.com>
> > > Cc: alp at linuxfarms.com
> > > Subject: Re: Linux consultation needed
> > >
> > > Hi Christopher,
> > >
> > > By your description, I do believe that I can fix this problem and it may be rather easy.
> > >
> > > Unfortunately, I live in Massachusetts and it may be rather expensive to get me down there.
> > > At the moment, I am working full-time for a large international computer corporation. So to make that trip, it would cut into my
> > > vacation time and so I would have to be compensated for that properly.
> > >
> > >
> > >
> > > Off the top of my head, I see two possible scenarios:
> > >
> > > > 1) damage to the RAID container
> > > >    If you were able to mount the filesystem with a rescue system (and you
> > > > mounted it as ext3, not ext2), then I can assume that the filesystem may
> > > > believe that it is in order. (One would know for sure once you run fsck.)
> > > > However, that does not mean the underlying block layer is OK.
> > > The RAID container presents itself to the OS as a uniform block device,
> > > which the filesystem sits on top of.
> > > How each block gets distributed across the physical disks is entirely up
> > > to the RAID hardware, and identifying whether or not a problem exists with
> > > the RAID container is also the responsibility of the RAID hardware.
> > > That being said, if the OS has no drivers that would directly interface
> > > with the RAID hardware to collect the status of the container, there is no
> > > way you could tell, unless this hardware had some sort of beep or buzzer to warn you
> > > of this. You could also enter the setup screen of the RAID controller at
> > > boot time and check the health of the container there.
> > >
> > > > The reason why I think this is a possibility is that there may be
> > > > "corrupted" blocks in this RAID container in areas that are
> > > > read during the boot process (and I use this term generally), and not
> > > > necessarily in locations where the kernel itself resides.
> > > > The kernel has such a small footprint that damage to it is not only
> > > > unlikely, but would probably keep the kernel from decompressing
> > > > completely and continuing execution at all.
> > > > If this were the case, the failure mode would probably be more severe.
> > >
> > >
> > >
> > > 2) damage to the hardware
> > > >    It is possible that the hardware has somehow become damaged or changed,
> > > > so that when Kudzu performs its hardware probe at boot time, it locks
> > > > up the system entirely.
> > > An example of this is a bad DIMM or possibly some other peripheral.
> > > Maybe there is a hung-up SCSI device on the chain that is on a separate
> > > power supply that just needs to be "reset" during the next reboot.
> > >
> > >
> > >
> > > > If you were able to mount the filesystem from a rescue disk (as ext3, not
> > > > ext2), then there is no reason why it would not work at boot time, provided
> > > > the configs (/etc/fstab and kernel parameters for root) have not been
> > > > changed.
> > > Therefore, the journal is probably fine and it is not the root cause.
> > > We have already ruled out kernel damage.
> > >
> > >
> > >
> > >
> > > My suggestion:
> > > 1) check out the RAID container status in the setup screen (USE GREAT
> > > CAUTION!! DO NOT CHANGE ANYTHING OR MAKE IT DO ANYTHING THAT YOU DO NOT
> > > > UNDERSTAND OR YOU CAN LOSE ALL OF YOUR DATA!!!).
> > > This can give you a rough idea of what may be going on.
> > >
> > > You can begin recovery by:
> > > 1) Get another disk onto the machine that is large enough to store the
> > > data that you think is necessary to salvage.
> > > 2) boot into that rescue image again, mount your RAID container, and
> > > slowly (little at a time) copy over the necessary files that you need to
> > > the new disk.
> > >
> > >
> > > > There may be more possibilities here, but I am just going by the
> > > information presented and the first things that come to mind.
> > >
> > >
> > > I wish you good luck.
> > > > If you think you really need my assistance, I can fly out there. We will
> > > just have to go over the costs.
> > > If this is enough to help you on your way, then that is great.
> > >
> > >
> > > Also, I would like to post this back onto the newsgroups just so that
> > > other people who experience the same problem can benefit, if that is ok
> > > with you.
> > > I have been working with Linux professionally for over 7 years, and have
> > > not contributed back to the community much at all. ;)
> > >
> > >
> > >
> > > Let me know if this helps, or if you would like to move forward with
> > > consultation.
> > >
> > >
> > >
> > > Best Regards,
> > > Art Perry
> > > alp at perryconsulting.net
> > > http://www.perryconsulting.net
> > > http://www.linuxfarms.com
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, 25 May 2004, Christopher Welton wrote:
> > >
> > > > Arthur:
> > > >
> > > > I read your reply to a problem in the redhat ext3 mailing list. I'd like
> > > > to request your assistance in solving a serious problem we are having
> > > > with one of our servers. Here are the details:
> > > >
> > > > > I run a RH 7.3 installation on a Compaq Proliant 6500 with dual Pentium
> > > > > 266MHz processors, approx. 630MB RAM and a hardware SMART-2DH RAID
> > > > > controller and array. All file systems are ext3.
> > > > >
> > > > > The server has been in service for a couple of years now. From time to
> > > > > time we will lose power in our office or have another situation that
> > > > > causes the server to lose power without a proper shutdown. We had such a
> > > > > situation today.
> > > >
> > > > Usually the server reboots to runlevel 5 without a problem. However,
> > > > today the server rebooted to the point in the boot process just after
> > > > mounting the root filesystem. It then stalls indefinitely and does not
> > > > continue to boot.
> > > >
> > > > I used a recovery CD to boot into a rescue shell. Once there I
> > > > successfully mounted all the partitions on all drives and examined the
> > > > files successfully, so the data, filesystem and hardware all look good.
> > > > The filesystems were mounted ext2, not ext3.
> > > >
> > > > At this point, my suspicions are that some portion of the kernel
> > > > required for booting was damaged during the power-loss shutdown or that
> > > > the ext3 journal was damaged in such a way as to block booting.
> > > >
> > > > I need suggestions on possible causes of the problem and, better yet,
> > > > possible solutions.
> > > >
> > > > Pls let me know:
> > > > 1. If you think you can solve this issue.
> > > > and
> > > > 2. What rate you would bill at
> > > >
> > > > Thank you in advance
> > > >
> > > > Chris Welton
> > > > Owner
> > > > JumpNowUSA!
> > > > 562-946-6683
> > > > cwelton at jumpnowusa.com
> > > >
> > >
> >
> >
>


From tytso at mit.edu  Thu May 27 19:04:45 2004
From: tytso at mit.edu (Theodore Ts'o)
Date: Thu, 27 May 2004 15:04:45 -0400
Subject: (regards to ext3 and RAID) Re: Linux consultation needed (fwd)
In-Reply-To: <Pine.LNX.4.58.0405271010530.17951@tiamat.perryconsulting.net>
References: <Pine.LNX.4.58.0405271010530.17951@tiamat.perryconsulting.net>
Message-ID: <20040527190445.GC29793@thunk.org>

The one thing I would add to this is that if you are using any kind of
partitioning scheme on top of the hardware RAID device, make sure your
partition table is sane first.  If the starting block or the size of
the partition is wrong, running e2fsck will also do much more damage.

In general, the rule is:

1)  Make sure the block device is sane.
2)  Make sure the partition table is sane.
3)  Run e2fsck.

If you're not sure, you can try running e2fsck -n first, or making a
full image backup first.
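
For step 2, a rough sanity check can be sketched in a few lines of Python.
This assumes a classic DOS/MBR partition table (an assumption; nothing in
this thread says which partitioning scheme is in use), looks only at the
four primary entries and does not follow extended partition chains. The
function name check_mbr is made up for illustration.

```python
import struct

SECTOR_SIZE = 512


def check_mbr(sector0, device_sectors):
    """Return a list of problems found in a DOS/MBR partition table.

    sector0: the first 512 bytes of the block device.
    device_sectors: total size of the device in 512-byte sectors.
    """
    if len(sector0) < SECTOR_SIZE:
        return ["short read: need the first 512 bytes of the device"]
    problems = []
    if sector0[510:512] != b"\x55\xaa":
        problems.append("missing 0x55AA boot signature")
    extents = []
    for index in range(4):
        entry = sector0[446 + 16 * index: 446 + 16 * (index + 1)]
        part_type = entry[4]
        # Bytes 8..15 hold the starting LBA and sector count, little-endian.
        start, count = struct.unpack_from("<II", entry, 8)
        if part_type == 0:
            continue  # unused slot
        if start == 0 or count == 0:
            problems.append(f"partition {index + 1}: zero start or size")
        if start + count > device_sectors:
            problems.append(f"partition {index + 1}: extends past end of device")
        extents.append((start, start + count, index + 1))
    extents.sort()
    for (_s1, e1, n1), (s2, _e2, n2) in zip(extents, extents[1:]):
        if s2 < e1:
            problems.append(f"partitions {n1} and {n2} overlap")
    return problems
```

The first sector would be read with something like
open(device_path, "rb").read(512), where device_path is whatever node your
RAID driver exposes. An empty result only means nothing obviously wrong was
found, not that the table is correct.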

					- Ted




From adilger at clusterfs.com  Thu May 27 19:23:53 2004
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 27 May 2004 13:23:53 -0600
Subject: Separate common journal device
In-Reply-To: <20040524194509.67338.qmail@web61006.mail.yahoo.com>
References: <20040421092301.GD2938@schnapps.adilger.int>
	<20040524194509.67338.qmail@web61006.mail.yahoo.com>
Message-ID: <20040527192353.GO2603@schnapps.adilger.int>

On May 24, 2004  12:45 -0700, M K wrote:
> On a related note, wouldn't it be more efficient to
> have a single dedicated hard drive, with multiple
> partitions to store journals - one for each ext3
> system? 

No, because then each filesystem would cause seeking to its part of the
journal for each transaction (unless the dedicated device was NVRAM).
In general, the writes to the journal are pure overhead unless you
actually crash, so they need to be as efficient as possible at write time;
the complexity at recovery time is much less critical.

Having all of the journal writes for multiple filesystems stream to a
single block device without any seeking is best.  Making the journal
larger also helps performance a lot, but you can't always
afford to consume so much RAM on a system (especially with a larger journal
for each filesystem).
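
To make the seeking argument concrete, here is a deliberately crude Python
model.  It is only a sketch under stated assumptions: it counts one head
movement whenever consecutive journal writes come from different
filesystems, and it treats a shared journal as a single sequential stream
(which, as noted below, has never been fully implemented).  The function
name count_seeks is made up.

```python
def count_seeks(writes, shared_stream):
    """Count head repositionings for a sequence of journal writes.

    writes: filesystem names, one per journal transaction, in arrival order.
    shared_stream: True if all filesystems append to one sequential journal,
                   False if each has its own partition on the journal disk.
    """
    if shared_stream:
        return 0  # every write lands right after the previous one
    seeks = 0
    previous = None
    for fs in writes:
        if previous is not None and fs != previous:
            seeks += 1  # head has to move to another partition
        previous = fs
    return seeks
```

With two busy filesystems alternating, e.g. ["a", "b", "a", "b"], the
per-partition layout costs a seek on nearly every transaction, while the
streamed layout costs none in this model.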

> --- Andreas Dilger <adilger at clusterfs.com> wrote:
> > It is possible now to use an external block device
> > for a single filesystem.
> > The on-disk format is designed to allow multiple
> > filesystems to share the
> > same device, but that has never been fully
> > implemented.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


From maheshext3 at yahoo.com  Thu May 27 19:59:15 2004
From: maheshext3 at yahoo.com (M K)
Date: Thu, 27 May 2004 12:59:15 -0700 (PDT)
Subject: Separate common journal device
In-Reply-To: <20040527192353.GO2603@schnapps.adilger.int>
Message-ID: <20040527195915.31681.qmail@web61009.mail.yahoo.com>

Oh, good point. BTW, what should the ideal size of a
journal be? Is there a general guideline to follow?
I am sort of a newbie to ext3; any advice on choosing
the journal size would be great!
Thanks in advance,
Mahesh





From adilger at clusterfs.com  Thu May 27 20:17:21 2004
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 27 May 2004 14:17:21 -0600
Subject: Separate common journal device
In-Reply-To: <20040527195915.31681.qmail@web61009.mail.yahoo.com>
References: <20040527192353.GO2603@schnapps.adilger.int>
	<20040527195915.31681.qmail@web61009.mail.yahoo.com>
Message-ID: <20040527201721.GQ2603@schnapps.adilger.int>

On May 27, 2004  12:59 -0700, M K wrote:
> Oh, good point.. BTW,  what should the ideal size of a
> journal be ? is there a general guideline to follow ?
> I am sort-of a newbie to ext3, any advice on choosing
> the journal size would be great!

It hasn't really been discussed much.  I'd benchmark with your apps to
see what is best.  For Lustre (which often has hundreds of clients doing
large IOs to a single ext3 filesystem at one time) we use the largest
journal size possible (400MB) to reduce the possibility that the clients
get blocked waiting for journal space.  Lustre creates very large journal
transactions, so having a larger journal means we can have more concurrent
handles open, which definitely improves performance.  Most of our server nodes
also have gobs of RAM, so journal size isn't much of an issue.

For other apps there is probably some upper limit beyond which the journal
never really gets full, and making it larger doesn't help much.  A very
large journal also slows down journal recovery a bit.
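
For reference, the 400MB figure lines up with the journal size limits as I
understand them from the mke2fs man page of this era (between 1024 and
102,400 filesystem blocks); treat those two constants as an assumption and
check your own man page.  A small Python sketch of the arithmetic, with a
made-up function name:

```python
MIN_JOURNAL_BLOCKS = 1024     # assumed lower bound, from mke2fs(8) of this era
MAX_JOURNAL_BLOCKS = 102400   # assumed upper bound, from mke2fs(8) of this era


def journal_size_range_mb(block_size):
    """Return (min_mb, max_mb) journal size for a block size given in bytes."""
    mb = 1024 * 1024
    return (MIN_JOURNAL_BLOCKS * block_size // mb,
            MAX_JOURNAL_BLOCKS * block_size // mb)
```

With 4096-byte blocks this gives a range of 4MB to 400MB.  The size itself
is given in megabytes to the -J size= option of mke2fs or tune2fs.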

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


From root at hunley.homeip.net  Thu May 27 20:31:38 2004
From: root at hunley.homeip.net (Douglas J Hunley)
Date: Thu, 27 May 2004 16:31:38 -0400
Subject: confirm 192a133e70b9ed319286a068cd0aab526869f14a
In-Reply-To: <mailman.4565.1085688114.30524.ext3-users@redhat.com>
References: <mailman.4565.1085688114.30524.ext3-users@redhat.com>
Message-ID: <200405271631.40227.root@hunley.homeip.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

ext3-users-request at redhat.com wrote:
> Mailing list removal confirmation notice for mailing list Ext3-users
>
> We have received a request  for the removal of your email address,
> "root at hunley.homeip.net" from the ext3-users at redhat.com mailing list.
> To confirm that you want to be removed from this mailing list, simply
> reply to this message, keeping the Subject: header intact.  Or visit
> this web page:
>
>    
> https://www.redhat.com/mailman/confirm/ext3-users/192a133e70b9ed319286a068c
>d0aab526869f14a
>
>
> Or include the following line -- and only the following line -- in a
> message to ext3-users-request at redhat.com:
>
>     confirm 192a133e70b9ed319286a068cd0aab526869f14a
>
> Note that simply sending a `reply' to this message should work from
> most mail readers, since that usually leaves the Subject: line in the
> right form (additional "Re:" text in the Subject: is okay).
>
> If you do not wish to be removed from this list, please simply
> disregard this message.  If you think you are being maliciously
> removed from the list, or have any other questions, send them to
> ext3-users-owner at redhat.com.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAtlAq2MO5UukaubkRAjMwAKCCkx5BweX068GetC1GhyeykXTvsACgojCR
K3/IV3JaxUneCPC2ShHAxuM=
=mzS9
-----END PGP SIGNATURE-----