From clay at exavio.com.cn  Thu Feb 19 03:00:21 2004
From: clay at exavio.com.cn (Isaac Claymore)
Date: Thu, 19 Feb 2004 11:00:21 +0800
Subject: fs block level syncing
In-Reply-To: <Pine.LNX.4.44.0402121238060.7342-100000@gate.nmr.mgh.harvard.edu>
References: <Pine.LNX.4.44.0402121238060.7342-100000@gate.nmr.mgh.harvard.edu>
Message-ID: <20040219030021.GA29856@exavio.com.cn>

Hi, here's an article about asynchronous block level replication:

http://www.linuxjournal.com/article.php?sid=7265
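
For what it's worth, the block-level idea itself can be sketched in a few
lines of Python. This is only an illustration under assumptions of mine: both
filesystems are reachable as local image files or devices, the 4096-byte block
size and the name sync_blocks are made up for the example, and it is not how
the tool in the article works. It reads both sides in full, so it saves writes
rather than network traffic; a networked version would have to compare
per-block checksums instead.

```python
import os

BLOCK_SIZE = 4096  # assumed block size, chosen for illustration only


def sync_blocks(src_path, dst_path, block_size=BLOCK_SIZE):
    """Copy only the blocks of src that differ from dst.

    Returns the number of blocks that had to be written.
    """
    # Make sure the destination exists so it can be opened for update.
    if not os.path.exists(dst_path):
        open(dst_path, "wb").close()

    written = 0
    offset = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        while True:
            src_block = src.read(block_size)
            if not src_block:
                break
            dst.seek(offset)
            dst_block = dst.read(len(src_block))
            if src_block != dst_block:
                # Rewrite just this block in place.
                dst.seek(offset)
                dst.write(src_block)
                written += 1
            offset += len(src_block)
        # Drop any tail left over from a previously larger destination.
        dst.truncate(offset)
    return written
```

With this approach a file that was only appended to costs writes for the new
blocks alone, and a rename costs only the changed directory blocks.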

HTH


On Thu, Feb 12, 2004 at 12:51:36PM -0500, Paul Raines wrote:
> 
> Right now we do a lot of hard disk to hard disk backup by using rsync to
> "mirror" the source filesystem weekly to a backup filesystem. This works
> fairly well for most sources.  However, one issue with rsync is that simple
> things like changing a file name or directory name cause the whole file or
> directory structure to get recopied over a previous sync. Also, large files
> that simply get appended to, such as mail spools, get recopied in full.
> 
> Does anyone know of something that syncs an ext2/3 fs to another
> at the block level, which would result in less data transfer?
> 
> -- 
> ---------------------------------------------------------------
> Paul Raines                   email: raines at nmr.mgh.harvard.edu
> MGH/MIT/HMS Athinoula A. Martinos Center for Biomedical Imaging
> 149 (2301) 13th Street        Charlestown, MA 02129	USA   

-- 

Regards, Isaac
()  ascii ribbon campaign - against html e-mail
/\                        - against microsoft attachments




From christian.braun at ch.abb.com  Thu Feb 19 13:01:10 2004
From: christian.braun at ch.abb.com (christian.braun at ch.abb.com)
Date: Thu, 19 Feb 2004 14:01:10 +0100
Subject: ext3 Overhead
Message-ID: <OF79BBF471.9842B507-ONC1256E3F.0046FDE8-C1256E3F.0047848E@ch.abb.com>

Hello!



"Stephen C. Tweedie" <sct at redhat.com> wrote on 13.02.2004 12:12:

On Wed, 2004-02-11 at 14:55, christian.braun at ch.abb.com wrote:

>> I'm using a CompactFlash as storage device. Since those CF cards only
>> have limited write cycles (CF does wear-levelling by itself, but you
>> don't want to write too many times to the card) I was wondering by
>> what factor the journaling of ext3 increases the write accesses to
>> the CompactFlash compared to ext2. Thanks a lot already for your help!
>
>It increases the number of writes a bit, but in many cases might
>actually reduce the number of overwrites for certain blocks like the
>superblock, which can be an advantage for those CF cards that don't do
>wear-levelling.  But it's really not a filesystem designed for flash.

Well, as I said, my CF card does wear-levelling, so that's nothing to worry
about. Still, as you said, there is a difference in the number of write
accesses between ext2 and ext3... I just need to know roughly how large that
difference is... is it 3 times... or 30 times... or 300... or even more?

Thank you!
Christian Braun




From leandro at dutra.fastmail.fm  Wed Feb 18 14:48:15 2004
From: leandro at dutra.fastmail.fm (Leandro =?utf-8?b?R3VpbWFyw6Nlcw==?= Faria Corcete Dutra)
Date: Wed, 18 Feb 2004 14:48:15 +0000 (UTC)
Subject: ext3 badness in 2.6.0-test2
References: <20030804142245.GA1627@nevyn.them.org> <20030804132219.2e0c53b4.akpm@osdl.org> <16176.41431.279477.273718@gargle.gargle.HOWL> <20030805235735.4c180fa4.akpm@osdl.org> <16178.63046.43567.551323@gargle.gargle.HOWL> <20030807181631.2962dfca.akpm@osdl.org>
Message-ID: <loom.20040218T153545-132@post.gmane.org>

Andrew Morton <akpm <at> osdl.org> writes:

> 
> Neil Brown <neilb <at> cse.unsw.edu.au> wrote:
> >
> > On Tuesday August 5, akpm <at> osdl.org wrote:
> > > Neil Brown <neilb <at> cse.unsw.edu.au> wrote:
> > > > ...
> > > > Aug  6 15:22:05 adams kernel: EXT3-fs error (device md1): ext3_add_entry: bad entry in directory #41
> > > > 009295: rec_len is smaller than minimal - offset=0, inode=3265411686, rec_len=0, name_len=0
> > > 
> > > It looks like we had a block full of zeroes come back from the device
> > > driver.  I find it distinctly fishy how this happens so much with
> > > ext3-on-md, and so little with ext3-on-just-a-disk.
> > 
> > Well, they're not *all* zero.....
> > 
> > I can reproduce this easily with various configurations of ext3 over
> > raid5, and get a similar problem with ext2 over raid5 (corrupt inodes
> > rather than directory entries) but ext3 over raid0 is rock-solid.
> 
> Good news that it is reproducible.

Has anything ever come out of this?

I am setting up a database server with RAID5, LVM2 and ext3fs, and have just
stumbled upon this issue.  Now I am nervous about proceeding.  If there is a
patch in some 2.6.3-rc I'd just upgrade; otherwise perhaps I have to go back to
2.4.24?

I have seen people reporting this with 2.6.0 and 2.6.1 very recently, but
nothing on 2.6.2 yet.  I took a look at the 2.6.2 ChangeLog and found nothing
that seemed relevant.


> Have you tried running fsx-linux?  It is good at picking up data loss.

I will try this as soon as I finish memtest86 and cpuburn.

Meanwhile, any information is welcome.




From sct at redhat.com  Thu Feb 19 21:16:13 2004
From: sct at redhat.com (Stephen C. Tweedie)
Date: 19 Feb 2004 21:16:13 +0000
Subject: ext3 Overhead
In-Reply-To: <OF79BBF471.9842B507-ONC1256E3F.0046FDE8-C1256E3F.0047848E@ch.abb.com>
References: 	 <OF79BBF471.9842B507-ONC1256E3F.0046FDE8-C1256E3F.0047848E@ch.abb.com>
Message-ID: <1077225373.2070.834.camel@sisko.scot.redhat.com>

Hi,

On Thu, 2004-02-19 at 13:01, christian.braun at ch.abb.com wrote:

> Well, as I said, my CF card does wear-levelling, so that's nothing to worry
> about. Still, as you said, there is a difference in the number of write
> accesses between ext2 and ext3... I just need to know roughly how large that
> difference is... is it 3 times... or 30 times... or 300... or even more?

For data, there's no difference --- unless you're in data=journal mode
--- except for the fact that ext3 usually starts flushing stuff to disk
earlier than ext2, which can mean that ext3 writes temporary data more
often than ext2.  For metadata, I'd expect ext3 is at most twice the
writes of ext2 in most circumstances, but it's not something I've ever
measured.
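
If someone wants to measure it, one way is to sample the kernel's per-device
write counters before and after running the same workload once on ext2 and
once on ext3. A rough Python sketch, assuming a 2.6 kernel with sysfs mounted,
where /sys/block/<dev>/stat carries "writes completed" in the fifth field and
"sectors written" in the seventh; the helper names are made up for
illustration and the device name is whatever your CF card shows up as:

```python
def parse_block_stat(line):
    """Pick the write counters out of one /sys/block/<dev>/stat line."""
    fields = [int(f) for f in line.split()]
    return {
        "write_ios": fields[4],      # write requests completed
        "write_sectors": fields[6],  # 512-byte sectors written
    }


def read_block_stat(device):
    """Read the current counters for a device name such as 'hda'."""
    with open(f"/sys/block/{device}/stat") as fh:
        return parse_block_stat(fh.read())


def writes_between(before, after):
    """Difference of two samples taken around a workload."""
    return {key: after[key] - before[key] for key in before}
```

Take one sample, run the workload, sync, take a second sample, and compare the
two differences between the ext2 run and the ext3 run.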

--Stephen





From bothie at gmx.de  Thu Feb 19 21:25:35 2004
From: bothie at gmx.de (Bodo Thiesen)
Date: Thu, 19 Feb 2004 22:25:35 +0100
Subject: ext3 Overhead
In-Reply-To: <1077225373.2070.834.camel@sisko.scot.redhat.com>
References: <OF79BBF471.9842B507-ONC1256E3F.0046FDE8-C1256E3F.0047848E@ch.abb.com>
	<1077225373.2070.834.camel@sisko.scot.redhat.com>
Message-ID: <200402192125.i1JLP1b07955@mx1.redhat.com>

Hello.

"Stephen C. Tweedie" <sct at redhat.com> wrote:

>On Thu, 2004-02-19 at 13:01, christian.braun at ch.abb.com wrote:
>
>> Well, as I said, my CF card does wear-levelling, so that's nothing to worry
>> about. Still, as you said, there is a difference in the number of write
>> accesses between ext2 and ext3... I just need to know roughly how large that
>> difference is... is it 3 times... or 30 times... or 300... or even more?
>
> For data, there's no difference --- unless you're in data=journal mode
> --- except for the fact that ext3 usually starts flushing stuff to disk
> earlier than ext2, which can mean that ext3 writes temporary data more
> often than ext2.  For metadata, I'd expect ext3 is at most twice the
> writes of ext2 in most circumstances, but it's not something I've ever
> measured.

Most probably even more. Imagine deleting a directory recursively. On ext2 
most unlink operations will cause only one write operation to disk. On ext3 
each unlink operation creates at least one extra journal entry (plus the 
write of the directory blocks). Same for modifications to the block bitmaps 
and so on.

Regards, Bodo




From lfabio_ext3users at smiling-web.com  Sun Feb 22 00:53:07 2004
From: lfabio_ext3users at smiling-web.com (Luigi Fabio)
Date: Sat, 21 Feb 2004 19:53:07 -0500
Subject: Crashed filesystem - directory recovery
Message-ID: <4037B723.28990.D89EB49@localhost>

Hello folks,
I had an ext3 filesystem mounted as the root of a Linux MOO server. 
Unfortunately, the filesystem was on one of the infamous DTLA-3070xx 
drives - and the drive decided to fail at the worst moment it 
possibly could, trashing the filesystem fairly well.
The situation is as follows: I used dd_rescue to create an image of 
what is left of the filesystem, but I ended up with some 65MB of 
'holes' in the image. Among the 'holes' is the sector that hosts a 
directory, /home/weyrmount/MOO (indeed, on the original drive, trying 
to CD into that gives IO Error)
That directory contained three files, plus an 'arch' directory. Now, 
while I understand that recovering the MOO dir itself is unrealistic, 
is there any way I could recover the arch dir - and the files 
therein? Any help will be greatly appreciated.

Regards,
Luigi Fabio - lfabio at smiling-web.com




From adilger at clusterfs.com  Sun Feb 22 01:56:50 2004
From: adilger at clusterfs.com (Andreas Dilger)
Date: Sat, 21 Feb 2004 18:56:50 -0700
Subject: Crashed filesystem - directory recovery
In-Reply-To: <4037B723.28990.D89EB49@localhost>
References: <4037B723.28990.D89EB49@localhost>
Message-ID: <20040222015650.GC17735@schnapps.adilger.int>

On Feb 21, 2004  19:53 -0500, Luigi Fabio wrote:
> I had an ext3 filesystem mounted as the root of a Linux MOO server. 
> Unfortunately, the filesystem was on one of the infamous DTLA-3070xx 
> drives - and the drive decided to fail at the worst moment it 
> possibly could, trashing the filesystem fairly well.
> The situation is as follows: I used dd_rescue to create an image of 
> what is left of the filesystem, but I ended up with some 65MB of 
> 'holes' in the image. Among the 'holes' is the sector that hosts a 
> directory, /home/weyrmount/MOO (indeed, on the original drive, trying 
> to CD into that gives IO Error)
> That directory contained three files, plus an 'arch' directory. Now, 
> while I understand that recovering the MOO dir itself is unrealistic, 
> is there any way I could recover the arch dir - and the files 
> therein? Any help will be greatly appreciated.

If you run e2fsck on the copied device, you should find any unattached
items put into lost+found.  Of course, it is also possible that files
in "arch" are corrupted too, depending on the location of the 65MB of holes.
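
Since a 'yes' answer to a "clear inode" prompt is destructive, it is safer to
run e2fsck on a scratch copy of the dd_rescue image rather than on the only
copy. A minimal Python sketch of that workflow; the helper names and paths are
hypothetical, while -f (force a check), -n (read-only, answer no to
everything) and -y (answer yes to everything) are regular e2fsck options:

```python
import shutil
import subprocess


def scratch_copy(image_path, scratch_path):
    """Duplicate the rescued image so the original stays untouched."""
    shutil.copyfile(image_path, scratch_path)
    return scratch_path


def e2fsck_cmd(image_path, answer_yes=False):
    """Build the e2fsck command line for a dry run (-n) or a repair (-y)."""
    return ["e2fsck", "-f", "-y" if answer_yes else "-n", image_path]


def check_copy(image_path, scratch_path, answer_yes=False):
    """Copy the image, then check the copy; returns e2fsck's exit status."""
    copy = scratch_copy(image_path, scratch_path)
    # e2fsck exits non-zero whenever it finds or fixes errors, so the exit
    # status is returned to the caller instead of being treated as a failure.
    return subprocess.run(e2fsck_cmd(copy, answer_yes)).returncode
```

A dry run with answer_yes=False shows what e2fsck would do; a later repair run
on a fresh scratch copy can then be compared against it.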

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/




From lfabio_ext3users at smiling-web.com  Sun Feb 22 02:59:08 2004
From: lfabio_ext3users at smiling-web.com (Luigi Fabio)
Date: Sat, 21 Feb 2004 21:59:08 -0500
Subject: Crashed filesystem - directory recovery
In-Reply-To: <20040222015650.GC17735@schnapps.adilger.int>
References: <4037B723.28990.D89EB49@localhost>
Message-ID: <4037D4AC.19956.DFD48ED@localhost>

On 21 Feb 2004 at 18:56, Andreas Dilger wrote:
> If you run e2fsck on the copied device, you should find any unattached
> items put into lost+found.  Of course, it is also possible that files
> in "arch" are corrupted too, depending on the location of the 65MB of holes.
Interestingly enough, in lost+found there is no trace of arch - or of 
any of the 18xx files it contained. There is, however, a subdir of 
arch - I have to wonder, it seems very curious that only that single 
file out of 1800 managed to survive. I do get asked by fsck to clear 
a few inodes with too many errors at the beginning - would answering 
'no' there help, perhaps?
 

Regards,
Luigi Fabio - lfabio at smiling-web.com




From bothie at gmx.de  Sun Feb 22 11:20:01 2004
From: bothie at gmx.de (Bodo Thiesen)
Date: Sun, 22 Feb 2004 12:20:01 +0100
Subject: Crashed filesystem - directory recovery
In-Reply-To: <4037D4AC.19956.DFD48ED@localhost>
References: <4037B723.28990.D89EB49@localhost>
	<4037D4AC.19956.DFD48ED@localhost>
Message-ID: <200402221119.i1MBJQb12044@mx1.redhat.com>

Hello.

"Luigi Fabio" <lfabio_ext3users at smiling-web.com> wrote:

> Interestingly enough, in lost+found there is no trace of arch - or of 
> any of the 18xx files it contained. There is, however, a subdir of 
> arch - I have to wonder, it seems very curious that only that single 
> file out of 1800 managed to survive. I do get asked by fsck to clear 
> a few inodes with too many errors at the beginning - would answering 
> 'no' there help, perhaps?

If you already answered 'yes', then you have to recopy the image from the
broken hard disk again, because saying 'yes' simply means deleting the
inode. On the other hand, if the inode management areas are in broken
blocks, then they cannot be rescued at all. How important are the files?
(There are companies that specialise in data rescue and can do things
that you cannot - but it's expensive, as everything needs to be done
by hand ;-)

Regards, Bodo




From lfabio_ext3users at smiling-web.com  Sun Feb 22 13:51:20 2004
From: lfabio_ext3users at smiling-web.com (Luigi Fabio)
Date: Sun, 22 Feb 2004 08:51:20 -0500
Subject: Crashed filesystem - directory recovery
In-Reply-To: <200402221119.i1MBJQb12044@mx1.redhat.com>
References: <4037D4AC.19956.DFD48ED@localhost>
Message-ID: <40386D88.18281.105264E2@localhost>

On 22 Feb 2004 at 12:20, Bodo Thiesen wrote:
> If you already answered 'yes', then you have to recopy the image from the
> broken hard disk again, because saying 'yes' simply means deleting the
> inode. On the other hand, if the inode management areas are in broken
> blocks, then they cannot be rescued at all. How important are the files?
> (There are companies that specialise in data rescue and can do things
> that you cannot - but it's expensive, as everything needs to be done
> by hand ;-)
Yes, I realise. I did copy the image again from a backup - several
times - but I didn't get very far. I answered no and told fsck to fix
the filetype instead when it asked me, but I was only able to recover
one file from the arch dir - one much too old to be of interest (as a
matter of fact, fsck did not stop when I answered no, I recalled
correctly). At this point, any ideas are welcome - both for recovery
of this filesystem and as a choice for future filesystems, because
ext3 has shown itself to be far too brittle for critical usage.
To answer your last question - I asked a data recovery place, in the
hope that they can mount the platters on a working head/spindle
system to retrieve the data that I could not, but I am still waiting
for an estimate. In my past experience, unless they can gain data that
way, data recovery places are much *less* effective than in-house
solutions, because they can't even remotely afford to dedicate the
amount of time that we do in house to the problem while keeping their
bills acceptable.

Regards,
Luigi Fabio - lfabio at smiling-web.com




From sct at redhat.com  Thu Feb 26 16:50:21 2004
From: sct at redhat.com (Stephen C. Tweedie)
Date: 26 Feb 2004 16:50:21 +0000
Subject: ext3 Overhead
In-Reply-To: <200402192125.i1JLP1b07955@mx1.redhat.com>
References: 	 <OF79BBF471.9842B507-ONC1256E3F.0046FDE8-C1256E3F.0047848E@ch.abb.com>
	 <1077225373.2070.834.camel@sisko.scot.redhat.com>
	 <200402192125.i1JLP1b07955@mx1.redhat.com>
Message-ID: <1077814220.2040.398.camel@sisko.scot.redhat.com>

Hi,

On Thu, 2004-02-19 at 21:25, Bodo Thiesen wrote:

> Most probably even more. Imagine deleting a directory recursively. On ext2 
> most unlink operations will cause only one write operation to disk. On ext3 
> each unlink operation creates at least one extra journal entry (plus the 
> write of the directory blocks).

That's true to some extent --- but remember that journal writes are
batched into big sequential IOs.  So for a whole 5 seconds' worth of
truncates, all of the separate bitmap updates for a single bitmap block
get coalesced; and then all of the separate pieces of metadata that were
touched are themselves coalesced into one journal IO.  And disks are
*MUCH* faster at sequential IO than random metadata IO.

So for embedded flash IOs, the overhead of the extra metadata IO is
significant, because flash doesn't (much) care about seeks.  But for
normal disks, the actual impact is very much less.
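
To make the coalescing concrete, here is a toy model in Python. It is a sketch
under simplifying assumptions of mine, not a description of the real journal
code: every metadata update is a (time, block number) pair, the commit
interval is the 5 seconds mentioned above, and descriptor blocks, commit
blocks and the later checkpoint writes are all ignored. The function name is
made up for the example.

```python
from collections import defaultdict


def journal_block_writes(updates, commit_interval=5.0):
    """Count metadata blocks written to the journal after coalescing.

    updates: iterable of (time_in_seconds, block_number) pairs.
    Repeated updates to one block within one commit interval count once.
    """
    blocks_per_commit = defaultdict(set)
    for when, block in updates:
        # All updates falling into the same interval share one transaction.
        blocks_per_commit[int(when // commit_interval)].add(block)
    return sum(len(blocks) for blocks in blocks_per_commit.values())
```

For example, five updates that hit bitmap block 7 three times and block 8 once
in the first interval, and block 7 once more in the second, come out as three
journalled blocks rather than five separate writes.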

--Stephen





From mfedyk at matchmail.com  Thu Feb 26 19:05:30 2004
From: mfedyk at matchmail.com (Mike Fedyk)
Date: Thu, 26 Feb 2004 11:05:30 -0800
Subject: ext3 Overhead
In-Reply-To: <1077225373.2070.834.camel@sisko.scot.redhat.com>
References: <OF79BBF471.9842B507-ONC1256E3F.0046FDE8-C1256E3F.0047848E@ch.abb.com> <1077225373.2070.834.camel@sisko.scot.redhat.com>
Message-ID: <403E437A.8070900@matchmail.com>

Stephen C. Tweedie wrote:
> Hi,
> 
> On Thu, 2004-02-19 at 13:01, christian.braun at ch.abb.com wrote:
> 
> 
>>Well, as I said, my CF card does wear-levelling, so that's nothing to worry
>>about. Still, as you said, there is a difference in the number of write
>>accesses between ext2 and ext3... I just need to know roughly how large that
>>difference is... is it 3 times... or 30 times... or 300... or even more?
> 
> 
> For data, there's no difference --- unless you're in data=journal mode
> --- except for the fact that ext3 usually starts flushing stuff to disk
> earlier than ext2, which can mean that ext3 writes temporary data more
> often than ext2.  For metadata, I'd expect ext3 is at most twice the
> writes of ext2 in most circumstances, but it's not something I've ever
> measured.

The commit interval can be changed to 30sec in the source too.  Has the
patch to make that a mount option made it into upstream?




From sct at redhat.com  Thu Feb 26 19:17:56 2004
From: sct at redhat.com (Stephen C. Tweedie)
Date: 26 Feb 2004 19:17:56 +0000
Subject: ext3 Overhead
In-Reply-To: <403E437A.8070900@matchmail.com>
References: 	 <OF79BBF471.9842B507-ONC1256E3F.0046FDE8-C1256E3F.0047848E@ch.abb.com>
	 <1077225373.2070.834.camel@sisko.scot.redhat.com>
	 <403E437A.8070900@matchmail.com>
Message-ID: <1077823075.2040.573.camel@sisko.scot.redhat.com>

Hi,

On Thu, 2004-02-26 at 19:05, Mike Fedyk wrote:

> The commit interval can be changed to 30sec in the source too.  Has the
> patch to make that a mount option made it into upstream?

Yes, and "mount -o remount,commit=<seconds>" should work too even
without an unmount.
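
If you want to script it, the same remount can be driven from Python. A small
sketch, where the mount point /mnt/cf and the helper names are only examples
and the call needs root to succeed:

```python
import subprocess


def remount_commit_cmd(mount_point, seconds):
    """Build the mount command that changes the ext3 commit interval."""
    return ["mount", "-o", f"remount,commit={int(seconds)}", mount_point]


def set_commit_interval(mount_point, seconds):
    """Run the remount; raises CalledProcessError if mount fails."""
    subprocess.run(remount_commit_cmd(mount_point, seconds), check=True)
```

For instance, remount_commit_cmd("/mnt/cf", 30) builds the equivalent of
"mount -o remount,commit=30 /mnt/cf".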

--Stephen