From evil at g-house.de  Sun Sep  3 13:20:03 2006
From: evil at g-house.de (Christian Kujau)
Date: Sun, 3 Sep 2006 14:20:03 +0100 (BST)
Subject: [OT] Re: Partitioning for ext3fs
In-Reply-To: <52103.216.160.118.81.1157059001.squirrel@216.160.118.81>
References: <4301.216.160.118.81.1156937128.squirrel@216.160.118.81>   
	<Pine.LNX.4.64.0608302340150.12865@sheep.housecafe.de>
	<52103.216.160.118.81.1157059001.squirrel@216.160.118.81>
Message-ID: <Pine.LNX.4.64.0609031405320.21585@sheep.housecafe.de>

[please reply on-list, so that other ppl can help too]

On Thu, 31 Aug 2006, david  cooke wrote:
> By hang, I mean the boot process will not go any further if I
> turn on the USB during boot. Whatever boot happens to be

Hm, too bad :(
But I'd suggest discussing this issue on some FC forum, usb-list or even
linux-kernel.

>> I don't know if I understand you correctly: you've upgraded to FC5 and
>> the external (USB? SATA?) drive still "does not work"?

> Typing fdisk /dev/sdc gives us
> Unable to open /dev/sdc

so, your OS (linux, FC) does not seem to be aware of your usb-disk 
(sdc) or the driver crashed. try to check dmesg/messages for related 
information and pass it on to one of the above mentioned lists.

> There is a light on in the USB so I think it's on...Yes, it is on.

That's good ;)

> Typing fdisk /dev/sda results in
> The number of cylinders for this disk is set to 19457.
[...]

OK, so sda (sata disk?) is doing well. this is good ;)

> Now, I keep thinking this is my primary so don't to mess around here.
> Is this the USB?

typing "dmesg | grep sd" after booting should reveal which disk got 
initialized with which "sdX" name.

> A different subject and we can cross that bridge later, but
> See above where the cylinders are set to nineteen thousand plus and not
> (I guess the usual) 1024? Can it be fixed?

This is what the help message above is there for:

"The number of cylinders for this disk is set to 19457.
  There is nothing wrong with that, but this is larger than 1024,
  and could in certain setups cause problems with:
  1) software that runs at boot time (e.g., old versions of LILO)
  2) booting and partitioning software from other OSs
    (e.g., DOS FDISK, OS/2 FDISK)"

So, why would you change anything? do you have DOS or OS/2 fdisk? do you
have an old version of LILO? (FC uses GRUB anyway, IIRC).

greetings,
Christian.
-- 
BOFH excuse #115:

your keyboard's space bar is generating spurious keycodes.


From evilninja at gmx.net  Sun Sep  3 18:25:43 2006
From: evilninja at gmx.net (Christian)
Date: Sun, 3 Sep 2006 19:25:43 +0100 (BST)
Subject: Stress testing for ext3?
In-Reply-To: <F82344282A084F4C98F7235E896FEA5A62AE10@chill.shore.mbari.org>
References: <F82344282A084F4C98F7235E896FEA5A62AE10@chill.shore.mbari.org>
Message-ID: <Pine.LNX.4.64.0609031916500.26360@sheep.housecafe.de>

On Thu, 31 Aug 2006, Kieft, Brian wrote:
> Does anyone know of a good method for exercising
> an ext3 file system?

I'm not aware of such a "torture" tool, but any long run of your 
real-world-application of choice, some benchmarks or heavy operation on 
a big source tree or so should do no harm to any in-kernel 
rw-filesystem.

> Perhaps something that involves power removal in between commits
> or in the midst of a write,

start any of the things mentioned above and pull the plug ;)
maybe "reboot -f" could simulate this:

   -f     Force halt or reboot, don't call shutdown(8).

but I've never tried that and don't know if it will KILL running 
processes before rebooting.

> and then checks for corrupt data. Do any utilities exist for this?

fsck.ext[23] will do that for the fs structure. you could use diff(1) 
against a known-to-be-good filesystem to verify that all data is in
place.
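
For the data check, here is a minimal sketch of the diff(1) idea that uses
checksums instead of a second copy of the tree. The helper names
make_manifest and verify_manifest are made up for illustration, and GNU
find, sort, xargs and sha256sum are assumed to be available:

```shell
#!/bin/sh
# Hypothetical helpers (not part of e2fsprogs): record checksums before the
# test run and verify them after the crash and the subsequent fsck.

# make_manifest DIR MANIFEST: checksum every regular file under DIR.
# Keep MANIFEST outside DIR, ideally on a different filesystem.
make_manifest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$2"
}

# verify_manifest DIR MANIFEST: exit status 0 only if every listed file
# still exists and still has the recorded checksum.
verify_manifest() {
    ( cd "$1" && sha256sum --check --quiet ) < "$2"
}

# Example (paths are placeholders):
#   make_manifest /mnt/test /root/test.manifest
#   ... pull the plug, reboot, let fsck.ext3 run ...
#   verify_manifest /mnt/test /root/test.manifest
```

Files that were being written at the moment of power loss are expected to
differ, so this is mostly useful for data that was supposed to be stable on
disk already.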

Christian.
-- 
BOFH excuse #325:

Your processor does not develop enough heat.


From evilninja at gmx.net  Sun Sep  3 18:29:44 2006
From: evilninja at gmx.net (Christian)
Date: Sun, 3 Sep 2006 19:29:44 +0100 (BST)
Subject: Ext3 emergency recovery
In-Reply-To: <DE4AB11F-EAD7-43FA-B2A9-BEAD718E8ECB@atlas.st>
References: <DE4AB11F-EAD7-43FA-B2A9-BEAD718E8ECB@atlas.st>
Message-ID: <Pine.LNX.4.64.0609031926230.26360@sheep.housecafe.de>

On Tue, 29 Aug 2006, Adam Atlas wrote:
> I have a damaged Ext3 filesystem which fsck has not been able to recover.

maybe the information *how* the fs went corrupt could help. posting a 
fsck log is also nice...

> Up to group 95. Some say "SEVERE DATA LOSS POSSIBLE."

are you using the latest e2fsprogs? latest kernel? i386 or something 
more exotic?

> filesystem and tried answering yes to all of them; it ended up just erasing 
> the whole thing.

is there nothing in lost+found?

-- 
BOFH excuse #325:

Your processor does not develop enough heat.


From mr._x at shaw.ca  Sun Sep  3 20:39:15 2006
From: mr._x at shaw.ca (..:::BeOS Mr. X:::..)
Date: Sun, 03 Sep 2006 13:39:15 -0700
Subject: Stress testing for ext3?
In-Reply-To: <F82344282A084F4C98F7235E896FEA5A62AE10@chill.shore.mbari.org>
References: <F82344282A084F4C98F7235E896FEA5A62AE10@chill.shore.mbari.org>
Message-ID: <44FB3D73.30408@shaw.ca>

I know of a method to continuously execute a command, maybe doing a full 
listing of the drive's contents will heat the drives up, but I am not 
sure about the error checking part. Here is what I would do:
while [ 1 -eq 1 ]; do ls  -shw9 -R; done

Hope this helps!

Mr. X

Kieft, Brian wrote:
> Does anyone know of a good method for exercising an ext3 file system? 
> Perhaps something that involves power removal in between commits or in 
> the midst of a write, and then checks for corrupt data. Do any utilities 
> exist for this?
> 
>  
> 
> Thanks!
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users


From neilb at cse.unsw.edu.au  Mon Sep  4 10:17:55 2006
From: neilb at cse.unsw.edu.au (Neil Brown)
Date: Mon, 4 Sep 2006 20:17:55 +1000
Subject: debian unstable & ext3
In-Reply-To: message from Larry McVoy on Thursday August 31
References: <20060831175939.GF27660@bitmover.com>
Message-ID: <17659.64851.51385.204395@cse.unsw.edu.au>


(posting again from my subscribed address as it is a members' only
list - grumble)
On Thursday August 31, lm at bitmover.com wrote:
> I'm running 
> 
> Linux travis 2.6.15-1-686 #2 Mon Mar 6 15:27:08 UTC 2006 i686 GNU/Linux
> 
> on a laptop with ext3 on /
> 
> Some time ago things started getting weird in the following way: I do a
> fairly normal hack, ^Z, make, test loop when developing and it seems
> that vim is calling fsync or sync and that is then flushing everything
> to disk.  My tests create maybe 10 dozen files in ~30MB and for some
> reason this is taking 4 seconds to flush.
> 
> I'm not sure if ext3, the kernel, or vim is the problem.  I already
> googled and set
> 
> set swapsync=sync
> set nofsync
> 
> in my .exrc but that hasn't helped.
> 
> Has anyone else seen this and do they have a work around?  I'm about to
> switch to reiserfs and that's a lot of fuss for what should be a simple
> problem (I hope).

I've noticed this sort of problem, but it hasn't yet been enough to
make me explore very far....

One thing worth a try is to mount with data=writeback.

Though I suspect that 
  set swapsync=
might be fastest, but might not be what you want.

NeilBrown


From evilninja at gmx.net  Tue Sep  5 09:48:31 2006
From: evilninja at gmx.net (Christian)
Date: Tue, 5 Sep 2006 10:48:31 +0100 (BST)
Subject: debian unstable & ext3
In-Reply-To: <20060831175939.GF27660@bitmover.com>
References: <20060831175939.GF27660@bitmover.com>
Message-ID: <Pine.LNX.4.64.0609051040340.7225@prinz64.housecafe.de>

[resent to ext3-users at redhat.com]

On Thu, 31 Aug 2006, Larry McVoy wrote:
> Some time ago things started getting weird in the following way: I do a
> fairly normal hack, ^Z, make, test loop when developing and it seems
----------------------^ this would STOP your editor (vi), but do you :w 
before you do this?

> that vim is calling fsync or sync

you can start vim via strace(1) to find out which one is called.
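
A rough sketch of that, assuming strace is installed. The file names and
the count_sync_calls helper are hypothetical; only the strace options
(-f, -o, -e trace=) are real:

```shell
#!/bin/sh
# Step 1 (interactive, shown as a comment because it needs a terminal):
#   strace -f -o /tmp/vim.trace -e trace=fsync,fdatasync,sync vim somefile.c
# Edit, :w, quit, then summarise the trace with the helper below.

# count_sync_calls TRACEFILE: print how often each flush-type syscall appears.
count_sync_calls() {
    for call in fsync fdatasync sync; do
        # With -f and -o, strace prefixes each line with the PID;
        # without -f the line starts with the syscall name directly.
        n=$(grep -c -E "^([0-9]+ +)?${call}\(" "$1" || true)
        echo "$call $n"
    done
}

# Example:
#   count_sync_calls /tmp/vim.trace
```

If fsync shows up once per :w, the editor is the one asking for the flush;
if nothing shows up, the delay is more likely coming from somewhere else.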

> and that is then flushing everything to disk. My tests create maybe 10
> dozen files in ~30MB and for some reason this is taking 4 seconds to
> flush.

How full is the fs, maybe fragmentation is bad or the 4 sec are even 
I/O-bound? What mount-options are used?

It'd be interesting to reproduce this behaviour on a fresh filesystem.

> I'm about to switch to reiserfs and that's a lot of fuss for what should

Let us know if this solved the problem ;)

Christian.
-- 
BOFH excuse #277:

Your Flux Capacitor has gone bad.


From evilninja at gmx.net  Tue Sep  5 10:50:51 2006
From: evilninja at gmx.net (Christian)
Date: Tue, 5 Sep 2006 11:50:51 +0100 (BST)
Subject: Stress testing for ext3?
In-Reply-To: <44FB3D73.30408@shaw.ca>
References: <F82344282A084F4C98F7235E896FEA5A62AE10@chill.shore.mbari.org>
	<44FB3D73.30408@shaw.ca>
Message-ID: <Pine.LNX.4.64.0609051148110.7225@prinz64.housecafe.de>

On Sun, 3 Sep 2006, ..:::BeOS Mr. X:::.. wrote:
> I know of a method to continuously execute a command, maybe doing a full 
> listing of the drive's contents will heat the drives up, but I am not sure 
> about the error checking part. Here is what I would do:
> while [ 1 -eq 1 ]; do ls  -shw9 -R; done

The directory listing will be cached, after the first run the disk 
should not be touched any more (try it out...). Also, when you're not 
redirecting the output to somewhere else (e.g. /dev/null), the terminal 
displaying the output will be the bottleneck and not the fs or the 
disk...

-- 
BOFH excuse #34:

(l)user error


From herta.vandeneynde at cc.kuleuven.be  Tue Sep  5 13:09:38 2006
From: herta.vandeneynde at cc.kuleuven.be (Herta Van den Eynde)
Date: Tue, 05 Sep 2006 15:09:38 +0200
Subject: Stress testing for ext3?
In-Reply-To: <Pine.LNX.4.64.0609051148110.7225@prinz64.housecafe.de>
References: <F82344282A084F4C98F7235E896FEA5A62AE10@chill.shore.mbari.org>	<44FB3D73.30408@shaw.ca>
	<Pine.LNX.4.64.0609051148110.7225@prinz64.housecafe.de>
Message-ID: <44FD7712.7040509@cc.kuleuven.be>

Christian wrote:
> On Sun, 3 Sep 2006, ..:::BeOS Mr. X:::.. wrote:
> 
>> I know of a method to continuously execute a command, maybe doing a 
>> full listing of the drive's contents will heat the drives up, but I am 
>> not sure about the error checking part. Here is what I would do:
>> while [ 1 -eq 1 ]; do ls  -shw9 -R; done
> 
> 
> The directory listing will be cached, after the first run the disk should 
> not be touched any more (try it out...). Also, when you're not 
> redirecting the output to somewhere else (e.g. /dev/null), the terminal 
> displaying the output will be the bottleneck and not the fs or the disk...
> 

A colleague of mine reported he got ext3 to bail out while repeatedly 
recompiling the kernel.  He enabled all kernel modules, and then ran:

# while true; do make clean; make -j18; done

The filesystem ended up being mounted ro.  The fsck at reboot moved some 
files to lost+found, after which the filesystem could be used again.

Kind regards,

Herta

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm


From tweeks at rackspace.com  Tue Sep  5 20:53:52 2006
From: tweeks at rackspace.com (tweeks)
Date: Tue, 5 Sep 2006 15:53:52 -0500
Subject: IO lockups and ext3 readonly filecorruption on RHEL4 (pre and post
	U4)
Message-ID: <200609051553.52276.tweeks@rackspace.com>

Has anyone been seeing IO lockup problems on EL4?  

I've tried multiple IO scheduler options (elevator=) in the boot... I'm seeing 
the same behavior regardless.  Independent of hardware.  Whitebox ATA, HA 
enclosure with dedicated SCSI, megaraid RAID hardware, Dell 2850s... same 
behavior:

A semi-busy system will suddenly go into some kind of IO la-la land where 
nothing can be written to disk for >1hour.  Of course when this happens, the 
ext3 kernel module freaks out and remounts all the filesystems as readonly.  
Then when the system is rebooted, if the system is allowed to fsck, the 
journal is hosed and the filesystem eats itself.  Moving them off the RH 
kernel altogether seems to fix the problem, but I have not found a way to 
reproduce the problem yet (burning and stress testing doesn't seem to make it 
appear), so real re-testing is difficult at best.

It's become such a big problem that we're moving some customers that require 
rock solid systems either over to RHEL3, or off RH and over to SLES or other 
distro with a non-RH kernel.  

Just the ext3 problem (minus the IO lockup part) can be seen in other BZ 
tickets:
	https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=175877
(when the filesystem fills up)

Has anyone seen this type of IO lockup + ext3 corruption on RHEL4?  
Can you reproduce it?

Tweeks


From richard.c.wolber at boeing.com  Tue Sep  5 21:19:04 2006
From: richard.c.wolber at boeing.com (Wolber, Richard C)
Date: Tue, 5 Sep 2006 14:19:04 -0700
Subject: IO lockups and ext3 readonly filecorruption on RHEL4 (pre and
	postU4)
In-Reply-To: <200609051553.52276.tweeks@rackspace.com>
Message-ID: <8C7C41A176AC0B468BEFB2EFD9BDAB9901324161@XCH-NW-5V2.nw.nos.boeing.com>

We're using the same systems with the same OS (well okay, actually
CentOS 4) and aren't seeing the same thing. 

2.6.14.3 #1 SMP PREEMPT Thu Dec 8 10:34:08 PST 2005 i686 i686 i386
GNU/Linux

..Chuck..
 

> -----Original Message-----
> From: tweeks [mailto:tweeks at rackspace.com] 
> Sent: Tuesday, September 05, 2006 1:54 PM
> To: ext3-users at redhat.com
> Subject: IO lockups and ext3 readonly filecorruption on RHEL4 
> (pre and postU4)
> 
> Has anyone been seeing IO lockup problems on EL4?  
> 
> I've tried multiple IO scheduler options (elevator=) in the 
> boot... I'm seeing the same behavior regardless.  Independent 
> of hardware.  Whitebox ATA, HA enclosure with dedicated SCSI, 
> megaraid RAID hardware, Dell 2850s... same
> behavior:
> 
> A semi-busy system will suddenly go into some kind of IO 
> la-la land where nothing can be written to disk for >1hour.  
> Of course when this happens, the
> ext3 kernel module freaks out and remounts all the 
> filesystems as readonly.  
> Then when the system is rebooted, if the system is allowed to 
> fsck, the journal is hosed and the filesystem eats itself.  
> Moving them off the RH kernel all together seems to fix the 
> problem, but I have not found a way to reproduce the problem 
> yet (burning and stress testing doesn't seem to make it 
> appear), so real re-testing is difficult at best.
> 
> It's become so big of a problem that we're moving some 
> customers that require rock solid systems either over to 
> RHEL3, or off RH and over to SLES or other distro with a 
> non-RH kernel.  
> 
> Just the ext3 problem (minus the IO lockup part) can be seen 
> in other BZ
> tickets:
> 	https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=175877
> (when the filesystem fills up)
> 
> Has anyone seen these type of IO lockups + ext3 corruption on RHEL4?  
> Can you reproduce it?
> 
> Tweeks


From evilninja at gmx.net  Wed Sep  6 00:34:34 2006
From: evilninja at gmx.net (Christian)
Date: Wed, 6 Sep 2006 01:34:34 +0100 (BST)
Subject: IO lockups and ext3 readonly filecorruption on RHEL4 (pre a U4)
In-Reply-To: <200609051553.52276.tweeks@rackspace.com>
References: <200609051553.52276.tweeks@rackspace.com>
Message-ID: <Pine.LNX.4.64.0609060130090.7225@prinz64.housecafe.de>

On Tue, 5 Sep 2006, tweeks wrote:
> Has anyone been seeing IO lockup problems on EL4?

not using RHEL here, but...

> A semi-busy system will suddenly go into some kind of IO la-la land where
> nothing can be written to disk for >1hour.

ok, so ext3 will remount the fs to RO. this would happen if a panic() 
occurs? is there anything related in the logs? (if /var is RO too, try 
to setup a loghost).

> Then when the system is rebooted, if the system is allowed to fsck, the
> journal is hosed and the filesystem eats itself.

could you be more specific? what does fsck.ext3 say? is there something 
in lost+found? remember to use latest version of e2fsprogs. have you 
tried a vanilla kernel yet?

-- 
BOFH excuse #289:

Interference between the keyboard and the chair.


From tweeks at rackspace.com  Tue Sep  5 23:29:03 2006
From: tweeks at rackspace.com (tweeks)
Date: Tue, 5 Sep 2006 18:29:03 -0500
Subject: IO lockups and ext3 readonly filecorruption on RHEL4 (pre and
	postU4)
In-Reply-To: <8C7C41A176AC0B468BEFB2EFD9BDAB9901324161@XCH-NW-5V2.nw.nos.boeing.com>
References: <8C7C41A176AC0B468BEFB2EFD9BDAB9901324161@XCH-NW-5V2.nw.nos.boeing.com>
Message-ID: <200609051829.04031.tweeks@rackspace.com>

On Tuesday 05 September 2006 04:19 pm, Wolber, Richard C wrote:
> We're using the same systems with the same OS (well okay, actually
> CentOS 4) and aren't seeing the same thing.
>
> 2.6.14.3 #1 SMP PREEMPT Thu Dec 8 10:34:08 PST 2005 i686 i686 i386
> GNU/Linux

On how many servers, though?

We have several thousand.

Tweeks


From tytso at mit.edu  Wed Sep  6 05:45:43 2006
From: tytso at mit.edu (Theodore Tso)
Date: Wed, 6 Sep 2006 01:45:43 -0400
Subject: debian unstable & ext3
In-Reply-To: <17659.64851.51385.204395@cse.unsw.edu.au>
References: <20060831175939.GF27660@bitmover.com>
	<17659.64851.51385.204395@cse.unsw.edu.au>
Message-ID: <20060906054543.GA20892@thunk.org>

[Sorry, in Germany and so my e-mail latency is slow... ]

On Mon, Sep 04, 2006 at 08:17:55PM +1000, Neil Brown wrote:
> On Thursday August 31, lm at bitmover.com wrote:
> > Some time ago things started getting weird in the following way: I do a
> > fairly normal hack, ^Z, make, test loop when developing and it seems
> > that vim is calling fsync or sync and that is then flushing everything
> > to disk.  My tests create maybe 10 dozen files in ~30MB and for some
> > reason this is taking 4 seconds to flush.
> > 
>
> One thing worth a try is to mount with data=writeback.

Or data=ordered.  What does "cat /proc/mounts" say?  

The fsync() operation results in a journal commit operation, and if
you're using "data=ordered" or "data=journal", the data blocks will
be flushed to either their final location on disk or to the journal
before the journal is allowed to commit.
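
As a small illustration of reading the journalling mode out of
/proc/mounts, here is a sketch. The function name ext3_data_mode is made
up, and whether the kernel lists the default mode explicitly depends on
the kernel version, so "(not listed)" only means the option was absent
from the line, not that journalling is off:

```shell
#!/bin/sh
# ext3_data_mode [MOUNTS_FILE]: for every ext3 entry, print the mount point
# and its data= option. Reads /proc/mounts unless another file is given.
ext3_data_mode() {
    awk '$3 == "ext3" {
        mode = "(not listed)"
        n = split($4, opts, ",")          # field 4 holds the mount options
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^data=/) mode = opts[i]
        print $2, mode
    }' "${1:-/proc/mounts}"
}

# Example:
#   ext3_data_mode
# To try another mode, add e.g. data=writeback to the options in /etc/fstab
# and remount (for the root filesystem this usually means a reboot).
```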

						- Ted


From tweeks at rackspace.com  Wed Sep  6 14:23:25 2006
From: tweeks at rackspace.com (tweeks)
Date: Wed, 6 Sep 2006 09:23:25 -0500
Subject: IO lockups and ext3 readonly filecorruption on RHEL4 (pre a U4)
In-Reply-To: <Pine.LNX.4.64.0609060130090.7225@prinz64.housecafe.de>
References: <200609051553.52276.tweeks@rackspace.com>
	<Pine.LNX.4.64.0609060130090.7225@prinz64.housecafe.de>
Message-ID: <200609060923.25597.tweeks@rackspace.com>

On Tuesday 05 September 2006 07:34 pm, Christian wrote:


> ok, so ext3 will remount the fs to RO. this would happen if a panic()
> occurs? 

These boxes are not panicking.  IO (or O actually) seems to come to a complete 
stop, the system can't sync, the journal gets out of sync, ext3 freaks 
and re-mounts RO, and eventually the system becomes mostly unresponsive (as 
no new processes can be properly started).  Graceful rebooting becomes a 
problem, and eventual reboots find the unsync'd disk very hard to fsck 
successfully.

> is there anything related in the logs? 

No.. they're read only.

> (if /var is RO too, try  
> to setup a loghost).

We may try that as we already have a shared NetDump server set up.
Can I send syslog to BOTH the local machine AND a network syslog server?  If the 
local logs are locked, will my writing to a remote host still work?

> could you be more specific? what does fsck.ext3 say? 

It shows thousands of de-linked files being found.  But I have not witnessed 
this first hand, as I am not in front of the console on these machines.  But 
I'll ask.

> is there something 
> in lost+found? 

I'm assuming yes.

> remember to use latest version of e2fsprogs. have you 
> tried a vanilla kernel yet?

Well, yes.  But since it is thus far not able to be reliably reproduced, it's 
hard to tell what works and what doesn't.  If anyone who understands the 
nature of this problem has any suggestions for reliably triggering it, then 
please speak up.

Tim:
You mentioned some type of forced buffer flush patch last month... any ETA on 
this?


Tweeks
-- 
Thomas Weeks, Lead Sys. Engineer          The Managed Hosting Specialist(TM)  
Rackspace Managed Hosting                 http://www.rackspace.com/
Managed Service Innovation Team           Email:<tweeks_at!rackspace.c0m>
"We Fanatically Support Fanatical Support!" (w)210.447.4451 (f)210.447.4041


From mind at bi.lt  Thu Sep  7 07:25:40 2006
From: mind at bi.lt (Mindaugas)
Date: Thu, 7 Sep 2006 10:25:40 +0300
Subject: wiping of unused space on ext3
Message-ID: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>


  Hello,

  I was asked if it is possible to zero unused space in ext3 partition?

  Users write to the server via Samba and are far from computer geeks so
teaching them to use some safedelete utility is quite impossible.

  Is there some way or utility to wipe out all the data from unused space?

  Thanks,

  Mindaugas


From bryan at kadzban.is-a-geek.net  Thu Sep  7 10:58:25 2006
From: bryan at kadzban.is-a-geek.net (Bryan Kadzban)
Date: Thu, 07 Sep 2006 06:58:25 -0400
Subject: wiping of unused space on ext3
In-Reply-To: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
References: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
Message-ID: <44FFFB51.7050809@kadzban.is-a-geek.net>

Mindaugas wrote:
> Hello,
> 
> I was asked if it is possible to zero unused space in ext3 partition?

Easiest way I can think of is:

cat /dev/zero >/fsmountpoint/temp-file

Then, after you get the inevitable error that the disk is full:

rm /fsmountpoint/temp-file

Of course this should probably be done while nobody else is trying to
create or enlarge a file, otherwise they could get errors too...
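
The same idea as a small script, as a sketch only. The function name
zero_free_space and the optional size limit are additions for
illustration, and GNU dd (for bs=1M) is assumed:

```shell
#!/bin/sh
# zero_free_space DIR [MAX_MB]: fill the free space of the filesystem that
# holds DIR with zeros, flush it to disk, then delete the fill file.
# MAX_MB is a hypothetical safety limit for trial runs; leave it out to
# fill the filesystem completely.
zero_free_space() {
    dir=$1
    tmpfile="$dir/zero-fill.$$"
    if [ -n "$2" ]; then
        # Bounded trial run: write only MAX_MB megabytes.
        dd if=/dev/zero of="$tmpfile" bs=1M count="$2" 2>/dev/null
    else
        # Real run: dd ends with "No space left on device", which is expected.
        dd if=/dev/zero of="$tmpfile" bs=1M 2>/dev/null
    fi
    sync                # push the zeros out to the disk before unlinking
    rm -f "$tmpfile"
}

# Example (mount point is a placeholder):
#   zero_free_space /fsmountpoint
```

Two caveats: run as an ordinary user, this cannot touch the blocks ext3
reserves for root, and it does not clear the unused tail of the last block
of files that still exist.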

From richard.c.wolber at boeing.com  Thu Sep  7 14:53:07 2006
From: richard.c.wolber at boeing.com (Wolber, Richard C)
Date: Thu, 7 Sep 2006 07:53:07 -0700
Subject: wiping of unused space on ext3
In-Reply-To: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
Message-ID: <8C7C41A176AC0B468BEFB2EFD9BDAB9901324173@XCH-NW-5V2.nw.nos.boeing.com>

> From: Mindaugas [mailto:mind at bi.lt] 
> Sent: Thursday, September 07, 2006 12:26 AM
> To: ext3-users at redhat.com
> Cc: CentOS mailing list
> Subject: wiping of unused space on ext3
> 
> 
>   Hello,
> 
>   I was asked if it is possible to zero unused space in ext3 
> partition?
> 
>   Users write to the server via Samba and are far from 
> computer geeks so teaching them to use some safedelete 
> utility is quite impossible.
> 
>   Is there some way or utility to wipe out all the data from 
> unused space?


I believe you can use the "chattr +s" command to mark all of the files so
that when they are deleted, their blocks are wiped with zeros. I believe
that you'd need to set up some sort of cron job to make sure all of the
files have this attribute set on a regular basis, unless this works as a
directory level attribute.

..Chuck..


From adilger at clusterfs.com  Thu Sep  7 21:15:06 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 7 Sep 2006 15:15:06 -0600
Subject: wiping of unused space on ext3
In-Reply-To: <8C7C41A176AC0B468BEFB2EFD9BDAB9901324173@XCH-NW-5V2.nw.nos.boeing.com>
References: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
	<8C7C41A176AC0B468BEFB2EFD9BDAB9901324173@XCH-NW-5V2.nw.nos.boeing.com>
Message-ID: <20060907211506.GR6441@schatzie.adilger.int>

On Sep 07, 2006  07:53 -0700, Wolber, Richard C wrote:
> I believe you can use the "chattr +s" command to mark all of the files
> so that when they are deleted, their blocks are wiped with zeros.

In theory yes, but this has never been implemented.

> I believe that you'd need to set up some sort of cron job to make sure
> all of the files have this attribute set on a regular basis, unless
> this works as a directory level attribute.

It should be inherited from the parent.  It is not currently functional
and is unlikely to ever make it into the kernel.  Instead, write a
shared library that hooks "unlink" and have it wipe your files from
userspace.  I think there is already a "libtrashcan" or similar that
will allow undelete in the same manner.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From richard.c.wolber at boeing.com  Thu Sep  7 21:21:33 2006
From: richard.c.wolber at boeing.com (Wolber, Richard C)
Date: Thu, 7 Sep 2006 14:21:33 -0700
Subject: wiping of unused space on ext3
In-Reply-To: <20060907211506.GR6441@schatzie.adilger.int>
Message-ID: <8C7C41A176AC0B468BEFB2EFD9BDAB9901324180@XCH-NW-5V2.nw.nos.boeing.com>

> On Sep 07, 2006  07:53 -0700, Wolber, Richard C wrote:
> > I believe you can use the "chattr +s" command to mark all 
> > of the files so that when they are deleted, their blocks 
> > are wiped with zeros.
> 
> In theory yes, but this has never been implemented.

*BLINK*

So let me get this straight. This feature is documented 
in the man page and works within the chattr command. It is
also noted when you do a "chattr -v". And yet it still has no
effect? I seriously wonder how many people are using this
"feature" without realizing that it has absolutely no
effect?

Is it worth my time to patch the documentation? Or is this the
forgotten stepchild of a development dispute that the parties
would ignore any sane input on?

..Chuck..


From matts at ksu.edu  Thu Sep  7 22:40:26 2006
From: matts at ksu.edu (Matt Stegman)
Date: Thu, 7 Sep 2006 17:40:26 -0500 (CDT)
Subject: wiping of unused space on ext3
In-Reply-To: <8C7C41A176AC0B468BEFB2EFD9BDAB9901324180@XCH-NW-5V2.nw.nos.boeing.com>
Message-ID: <Pine.GSO.4.44L.0609071640050.12664-100000@unix1.cc.ksu.edu>

Well, the manpage does say:

BUGS AND LIMITATIONS
       The `c', 's', and `u' attributes are not honored by the ext2 and
       ext3 filesystems  as implemented  in the current  mainline Linux
       kernels.  These attributes may be implemented in future versions
       ext2 and ext3.

If I remember right, it was dropped from the kernel because it was
incomplete - inode data isn't wiped, the journal isn't wiped, and if a
file is truncated, old data blocks weren't wiped.  I think these were
decided to be too difficult or too slow to implement, and so the feature
was dropped "for the time being."  It's been a while since I read the mail
thread on this though, and my memory may be faulty.

-- 
Matt Stegman

On Thu, 7 Sep 2006, Wolber, Richard C wrote:

> > On Sep 07, 2006  07:53 -0700, Wolber, Richard C wrote:
> > > I believe you can use the "chattr +s" command to mark all
> > > of the files so that when they are deleted, their blocks
> > > are wiped with zeros.
> >
> > In theory yes, but this has never been implemented.
>
> *BLINK*
>
> So let me get this straight. This feature is documented
> in the man page and works within the chattr command. It is
> also noted when you do a "chattr -v". And yet it still has no
> effect? I seriously wonder how many people are using this
> "feature" without realizing that it has absolutely no
> effect?
>
> Is it worth my time to patch the documentation? Or is this the
> forgotten stepchild of a development dispute that the parties
> would ignore any sane input on?
>
> ..Chuck..


From adilger at clusterfs.com  Thu Sep  7 22:53:50 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 7 Sep 2006 16:53:50 -0600
Subject: wiping of unused space on ext3
In-Reply-To: <Pine.GSO.4.44L.0609071640050.12664-100000@unix1.cc.ksu.edu>
References: <8C7C41A176AC0B468BEFB2EFD9BDAB9901324180@XCH-NW-5V2.nw.nos.boeing.com>
	<Pine.GSO.4.44L.0609071640050.12664-100000@unix1.cc.ksu.edu>
Message-ID: <20060907225350.GA6441@schatzie.adilger.int>

On Sep 07, 2006  17:40 -0500, Matt Stegman wrote:
> Well, the manpage does say:

Also, right when 's' is defined, see Note:

	When  a  file  with  the  's' attribute set is deleted, its blocks are
	zeroed and written back to the disk.  Note: please make sure  to  read
	the bugs and limitations section at the end of this document.

> BUGS AND LIMITATIONS
>        The `c', 's', and `u' attributes are not honored by the ext2 and
>        ext3 filesystems  as implemented  in the current  mainline Linux
>        kernels.  These attributes may be implemented in future versions
>        ext2 and ext3.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From evilninja at gmx.net  Fri Sep  8 06:37:32 2006
From: evilninja at gmx.net (Christian)
Date: Fri, 8 Sep 2006 07:37:32 +0100 (BST)
Subject: IO lockups and ext3 readonly filecorruption on RHEL4 (pre a U4)
In-Reply-To: <200609060923.25597.tweeks@rackspace.com>
References: <200609051553.52276.tweeks@rackspace.com>
	<Pine.LNX.4.64.0609060130090.7225@prinz64.housecafe.de>
	<200609060923.25597.tweeks@rackspace.com>
Message-ID: <Pine.LNX.4.64.0609080653070.28204@sheep.housecafe.de>

sorry for my late reply....

On Wed, 6 Sep 2006, tweeks wrote:
> We may try that as we already have a shared NetDump server set up.
> Can i do syslog to BOTH the local machine AND a network syslog server.  If the
> local logs are locked, will my writing to a remote host still work?

yes, if syslogd/syslog-ng is still running, logging to the loghost 
should do. the network and the system have to be working, of course.

> hard to tell what works and what doesn't.  If anyone who understands the
> nature of this problem has any suggestions for reliably triggering it, then
> please speak up.

without more details, it's hard to conclude anything, one can only 
*guess* :(

Christian.
-- 
BOFH excuse #377:

Someone hooked the twisted pair wires into the answering machine.


From rmy at tigress.co.uk  Fri Sep  8 09:01:27 2006
From: rmy at tigress.co.uk (Ron Yorston)
Date: Fri, 08 Sep 2006 10:01:27 +0100
Subject: wiping of unused space on ext3
In-Reply-To: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
References: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
Message-ID: <200609080901.k8891RHs015100@tiffany.internal.tigress.co.uk>

"Mindaugas" <mind at bi.lt> wrote:
>  I was asked if it is possible to zero unused space in ext3 partition?

I have a couple of patches that add a zerofree mount option to ext2 and
ext3 filesystems.  The ext2 version is much better tested and more
complete:  it zeros all file data blocks, directory blocks and extended
attributes (though not inode data).  The ext3 patch only handles file
data, not metadata.

I've been meaning to submit these to LKML, but since you ask let's give
them an airing here first.

Since this is being copied to the CentOS mailing list I should point
out that I also have versions of the patches that apply cleanly to the
RHEL 4 kernel.  I don't have them to hand at the moment but if there's
any interest I can provide them later.

Some background information and other tools are on my website:

   http://intgat.tigress.co.uk/rmy/uml/index.html

Ron


From rmy at tigress.co.uk  Fri Sep  8 09:04:05 2006
From: rmy at tigress.co.uk (Ron Yorston)
Date: Fri, 08 Sep 2006 10:04:05 +0100
Subject: [PATCH] ext2: zero freed blocks
In-Reply-To: <200609080901.k8891RHs015100@tiffany.internal.tigress.co.uk>
References: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
	<200609080901.k8891RHs015100@tiffany.internal.tigress.co.uk>
Message-ID: <200609080904.k88945DE015112@tiffany.internal.tigress.co.uk>

Add a zerofree mount option to the ext2 filesystem.  This causes freed
blocks to be filled with zeros.

ext2_zero_blocks has an additional argument to specify whether or not
zeroing is required:  there's no point in zeroing blocks that have
just come from the free list.

Some rearrangement of code in xattr.c is required to ensure that
ext2_zero_blocks is never called with a locked buffer.

Signed-off-by: Ron Yorston <rmy at tigress.co.uk>

---
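
For anyone who wants to see the idea without reading kernel code, here is
a minimal userspace sketch of the behaviour described above.  It is not
part of the patch:  toy_fs, toy_zero_blocks and toy_free_blocks are
made-up names, the "disk" is just an array in memory, and a plain memset
stands in for the buffer_head handling that ext2_zero_blocks really does.

```c
#include <stddef.h>
#include <string.h>

#define TOY_BLOCK_SIZE 16
#define TOY_NBLOCKS    8

/* Hypothetical in-memory "filesystem"; stands in for the superblock and disk. */
struct toy_fs {
	int zerofree;                       /* models the zerofree mount option */
	unsigned char in_use[TOY_NBLOCKS];  /* models the block bitmap */
	unsigned char data[TOY_NBLOCKS][TOY_BLOCK_SIZE];
};

/* Models ext2_zero_blocks(): fill count blocks starting at block with zeros. */
static void toy_zero_blocks(struct toy_fs *fs, unsigned long block,
			    unsigned long count)
{
	unsigned long i;

	for (i = 0; i < count && block + i < TOY_NBLOCKS; i++)
		memset(fs->data[block + i], 0, TOY_BLOCK_SIZE);
}

/*
 * Models ext2_free_blocks() with the extra "zero" argument: callers pass 0
 * for blocks that have only just come from the free list and never held
 * file data, so zeroing them would be wasted work.
 */
static void toy_free_blocks(struct toy_fs *fs, unsigned long block,
			    unsigned long count, int zero)
{
	unsigned long i;

	if (fs->zerofree && zero)
		toy_zero_blocks(fs, block, count);

	/* Mark the blocks free in the toy bitmap. */
	for (i = 0; i < count && block + i < TOY_NBLOCKS; i++)
		fs->in_use[block + i] = 0;
}
```

Freeing a block with zero=1 on a toy_fs that has zerofree set clears its
contents; freeing with zero=0 only clears the bitmap entry and leaves the
old bytes in place.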

--- linux-2.6.17/Documentation/filesystems/ext2.txt.zerofree2	2006-06-18 02:49:35.000000000 +0100
+++ linux-2.6.17/Documentation/filesystems/ext2.txt	2006-08-25 20:08:06.000000000 +0100
@@ -58,6 +58,8 @@ nobh				Do not attach buffer_heads to fi
 
 xip				Use execute in place (no caching) if possible
 
+zerofree			Zero data blocks when they are freed.
+
 grpquota,noquota,quota,usrquota	Quota options are silently ignored by ext2.
 
 
--- linux-2.6.17/fs/ext2/balloc.c.zerofree2	2006-06-18 02:49:35.000000000 +0100
+++ linux-2.6.17/fs/ext2/balloc.c	2006-08-25 20:08:26.000000000 +0100
@@ -174,9 +174,28 @@ static void group_release_blocks(struct 
 	}
 }
 
+static void ext2_zero_blocks(struct super_block *sb, unsigned long block,
+		unsigned long count)
+{
+	unsigned long i;
+	struct buffer_head * bh;
+
+	for (i = 0; i < count; i++) {
+		bh = sb_getblk(sb, block+i);
+		if (!bh)
+			continue;
+
+		lock_buffer(bh);
+		memset(bh->b_data, 0, bh->b_size);
+		mark_buffer_dirty(bh);
+		unlock_buffer(bh);
+		brelse(bh);
+	}
+}
+
 /* Free given blocks, update quota and i_blocks field */
 void ext2_free_blocks (struct inode * inode, unsigned long block,
-		       unsigned long count)
+		       unsigned long count, int zero)
 {
 	struct buffer_head *bitmap_bh = NULL;
 	struct buffer_head * bh2;
@@ -201,6 +220,9 @@ void ext2_free_blocks (struct inode * in
 
 	ext2_debug ("freeing block(s) %lu-%lu\n", block, block + count - 1);
 
+	if (test_opt(sb, ZEROFREE) && zero)
+		ext2_zero_blocks(sb, block, count);
+
 do_more:
 	overflow = 0;
 	block_group = (block - le32_to_cpu(es->s_first_data_block)) /
--- linux-2.6.17/fs/ext2/super.c.zerofree2	2006-06-18 02:49:35.000000000 +0100
+++ linux-2.6.17/fs/ext2/super.c	2006-08-25 20:08:06.000000000 +0100
@@ -287,7 +287,7 @@ enum {
 	Opt_err_ro, Opt_nouid32, Opt_nocheck, Opt_debug,
 	Opt_oldalloc, Opt_orlov, Opt_nobh, Opt_user_xattr, Opt_nouser_xattr,
 	Opt_acl, Opt_noacl, Opt_xip, Opt_ignore, Opt_err, Opt_quota,
-	Opt_usrquota, Opt_grpquota
+	Opt_usrquota, Opt_grpquota, Opt_zerofree
 };
 
 static match_table_t tokens = {
@@ -310,6 +310,7 @@ static match_table_t tokens = {
 	{Opt_oldalloc, "oldalloc"},
 	{Opt_orlov, "orlov"},
 	{Opt_nobh, "nobh"},
+	{Opt_zerofree, "zerofree"},
 	{Opt_user_xattr, "user_xattr"},
 	{Opt_nouser_xattr, "nouser_xattr"},
 	{Opt_acl, "acl"},
@@ -393,6 +394,9 @@ static int parse_options (char * options
 		case Opt_nobh:
 			set_opt (sbi->s_mount_opt, NOBH);
 			break;
+		case Opt_zerofree:
+			set_opt (sbi->s_mount_opt, ZEROFREE);
+			break;
 #ifdef CONFIG_EXT2_FS_XATTR
 		case Opt_user_xattr:
 			set_opt (sbi->s_mount_opt, XATTR_USER);
--- linux-2.6.17/fs/ext2/xattr.c.zerofree2	2006-06-18 02:49:35.000000000 +0100
+++ linux-2.6.17/fs/ext2/xattr.c	2006-08-25 20:08:06.000000000 +0100
@@ -676,7 +676,7 @@ ext2_xattr_set2(struct inode *inode, str
 
 			new_bh = sb_getblk(sb, block);
 			if (!new_bh) {
-				ext2_free_blocks(inode, block, 1);
+				ext2_free_blocks(inode, block, 1, 0);
 				error = -EIO;
 				goto cleanup;
 			}
@@ -715,25 +715,26 @@ ext2_xattr_set2(struct inode *inode, str
 
 	error = 0;
 	if (old_bh && old_bh != new_bh) {
+		unsigned long block = old_bh->b_blocknr;
 		struct mb_cache_entry *ce;
 
 		/*
 		 * If there was an old block and we are no longer using it,
 		 * release the old block.
 		 */
-		ce = mb_cache_entry_get(ext2_xattr_cache, old_bh->b_bdev,
-					old_bh->b_blocknr);
+		ce = mb_cache_entry_get(ext2_xattr_cache, old_bh->b_bdev, block);
 		lock_buffer(old_bh);
 		if (HDR(old_bh)->h_refcount == cpu_to_le32(1)) {
 			/* Free the old block. */
 			if (ce)
 				mb_cache_entry_free(ce);
 			ea_bdebug(old_bh, "freeing");
-			ext2_free_blocks(inode, old_bh->b_blocknr, 1);
+			unlock_buffer(old_bh);
 			/* We let our caller release old_bh, so we
 			 * need to duplicate the buffer before. */
 			get_bh(old_bh);
 			bforget(old_bh);
+			ext2_free_blocks(inode, block, 1, 1);
 		} else {
 			/* Decrement the refcount only. */
 			HDR(old_bh)->h_refcount = cpu_to_le32(
@@ -744,8 +745,8 @@ ext2_xattr_set2(struct inode *inode, str
 			mark_buffer_dirty(old_bh);
 			ea_bdebug(old_bh, "refcount now=%d",
 				le32_to_cpu(HDR(old_bh)->h_refcount));
+			unlock_buffer(old_bh);
 		}
-		unlock_buffer(old_bh);
 	}
 
 cleanup:
@@ -789,10 +790,10 @@ ext2_xattr_delete_inode(struct inode *in
 	if (HDR(bh)->h_refcount == cpu_to_le32(1)) {
 		if (ce)
 			mb_cache_entry_free(ce);
-		ext2_free_blocks(inode, EXT2_I(inode)->i_file_acl, 1);
+		unlock_buffer(bh);
 		get_bh(bh);
 		bforget(bh);
-		unlock_buffer(bh);
+		ext2_free_blocks(inode, EXT2_I(inode)->i_file_acl, 1, 1);
 	} else {
 		HDR(bh)->h_refcount = cpu_to_le32(
 			le32_to_cpu(HDR(bh)->h_refcount) - 1);
--- linux-2.6.17/fs/ext2/inode.c.zerofree2	2006-06-18 02:49:35.000000000 +0100
+++ linux-2.6.17/fs/ext2/inode.c	2006-08-25 20:08:06.000000000 +0100
@@ -100,7 +100,7 @@ void ext2_discard_prealloc (struct inode
 		ei->i_prealloc_count = 0;
 		ei->i_prealloc_block = 0;
 		write_unlock(&ei->i_meta_lock);
-		ext2_free_blocks (inode, block, total);
+		ext2_free_blocks (inode, block, total, 0);
 		return;
 	} else
 		write_unlock(&ei->i_meta_lock);
@@ -467,7 +467,7 @@ static int ext2_alloc_branch(struct inod
 	for (i = 1; i < n; i++)
 		bforget(branch[i].bh);
 	for (i = 0; i < n; i++)
-		ext2_free_blocks(inode, le32_to_cpu(branch[i].key), 1);
+		ext2_free_blocks(inode, le32_to_cpu(branch[i].key), 1, 0);
 	return err;
 }
 
@@ -527,7 +527,7 @@ changed:
 	for (i = 1; i < num; i++)
 		bforget(where[i].bh);
 	for (i = 0; i < num; i++)
-		ext2_free_blocks(inode, le32_to_cpu(where[i].key), 1);
+		ext2_free_blocks(inode, le32_to_cpu(where[i].key), 1, 1);
 	return -EAGAIN;
 }
 
@@ -837,7 +837,7 @@ static inline void ext2_free_data(struct
 				count++;
 			else {
 				mark_inode_dirty(inode);
-				ext2_free_blocks (inode, block_to_free, count);
+				ext2_free_blocks (inode, block_to_free, count, 1);
 			free_this:
 				block_to_free = nr;
 				count = 1;
@@ -846,7 +846,7 @@ static inline void ext2_free_data(struct
 	}
 	if (count > 0) {
 		mark_inode_dirty(inode);
-		ext2_free_blocks (inode, block_to_free, count);
+		ext2_free_blocks (inode, block_to_free, count, 1);
 	}
 }
 
@@ -889,7 +889,7 @@ static void ext2_free_branches(struct in
 					   (__le32*)bh->b_data + addr_per_block,
 					   depth);
 			bforget(bh);
-			ext2_free_blocks(inode, nr, 1);
+			ext2_free_blocks(inode, nr, 1, 1);
 			mark_inode_dirty(inode);
 		}
 	} else
--- linux-2.6.17/fs/ext2/ext2.h.zerofree2	2006-06-18 02:49:35.000000000 +0100
+++ linux-2.6.17/fs/ext2/ext2.h	2006-08-25 20:08:06.000000000 +0100
@@ -94,7 +94,7 @@ extern unsigned long ext2_bg_num_gdb(str
 extern int ext2_new_block (struct inode *, unsigned long,
 			   __u32 *, __u32 *, int *);
 extern void ext2_free_blocks (struct inode *, unsigned long,
-			      unsigned long);
+			      unsigned long, int);
 extern unsigned long ext2_count_free_blocks (struct super_block *);
 extern unsigned long ext2_count_dirs (struct super_block *);
 extern void ext2_check_blocks_bitmap (struct super_block *);
--- linux-2.6.17/include/linux/ext2_fs.h.zerofree2	2006-06-18 02:49:35.000000000 +0100
+++ linux-2.6.17/include/linux/ext2_fs.h	2006-08-25 20:08:06.000000000 +0100
@@ -310,6 +310,7 @@ struct ext2_inode {
 #define EXT2_MOUNT_MINIX_DF		0x000080  /* Mimics the Minix statfs */
 #define EXT2_MOUNT_NOBH			0x000100  /* No buffer_heads */
 #define EXT2_MOUNT_NO_UID32		0x000200  /* Disable 32-bit UIDs */
+#define EXT2_MOUNT_ZEROFREE		0x000400  /* Zero freed blocks */
 #define EXT2_MOUNT_XATTR_USER		0x004000  /* Extended user attributes */
 #define EXT2_MOUNT_POSIX_ACL		0x008000  /* POSIX Access Control Lists */
 #define EXT2_MOUNT_XIP			0x010000  /* Execute in place */


From rmy at tigress.co.uk  Fri Sep  8 09:04:53 2006
From: rmy at tigress.co.uk (Ron Yorston)
Date: Fri, 08 Sep 2006 10:04:53 +0100
Subject: [PATCH] ext3: zero freed blocks
In-Reply-To: <200609080901.k8891RHs015100@tiffany.internal.tigress.co.uk>
References: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
	<200609080901.k8891RHs015100@tiffany.internal.tigress.co.uk>
Message-ID: <200609080904.k8894raH015117@tiffany.internal.tigress.co.uk>

Add a zerofree mount option to the ext3 filesystem.  This causes freed
blocks to be filled with zeros.

Zeroing is only applied to data blocks, not metadata.  This means that
directory blocks and extended attributes are not zeroed.

Signed-off-by; Ron Yorston <rmy at tigress.co.uk>

---
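
As a reading aid, the test that decides whether a freed block gets zeroed
can be sketched in userspace as below.  This is only an illustration:  the
toy_ names are made up, the macros are simplified stand-ins for the
kernel's set_opt/test_opt helpers, and journal_data is a plain int in
place of the ext3_should_journal_data(inode) call.  The bit value is the
one this patch assigns to EXT3_MOUNT_ZEROFREE.

```c
/* Same bit value the patch assigns to EXT3_MOUNT_ZEROFREE. */
#define TOY_MOUNT_ZEROFREE 0x400000UL

/* Simplified stand-ins for the kernel's set_opt()/test_opt() helpers. */
#define toy_set_opt(flags, opt)  ((flags) |= TOY_MOUNT_##opt)
#define toy_test_opt(flags, opt) (((flags) & TOY_MOUNT_##opt) != 0)

/*
 * Zero only when all three conditions hold:
 *  - the filesystem was mounted with zerofree,
 *  - the caller says the block held file data (zero != 0),
 *  - the inode is not using data journalling.
 */
static int toy_should_zero(unsigned long mount_opt, int zero, int journal_data)
{
	return toy_test_opt(mount_opt, ZEROFREE) && zero && !journal_data;
}
```

The callers in inode.c and xattr.c that pass 0 as the last argument are
the metadata and error-path cases, which is why directory blocks and
extended attributes are left alone.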

--- linux-2.6.17/Documentation/filesystems/ext3.txt.zerofree3	2006-06-18 02:49:35.000000000 +0100
+++ linux-2.6.17/Documentation/filesystems/ext3.txt	2006-08-26 19:07:34.000000000 +0100
@@ -113,6 +113,8 @@ noquota
 grpquota
 usrquota
 
+zerofree		Zero data blocks when they are freed.
+
 
 Specification
 =============
--- linux-2.6.17/fs/ext3/super.c.zerofree3	2006-08-26 19:06:57.000000000 +0100
+++ linux-2.6.17/fs/ext3/super.c	2006-08-26 19:07:34.000000000 +0100
@@ -675,7 +675,7 @@ enum {
 	Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
 	Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
 	Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota,
-	Opt_grpquota
+	Opt_grpquota, Opt_zerofree
 };
 
 static match_table_t tokens = {
@@ -720,6 +720,7 @@ static match_table_t tokens = {
 	{Opt_jqfmt_vfsold, "jqfmt=vfsold"},
 	{Opt_jqfmt_vfsv0, "jqfmt=vfsv0"},
 	{Opt_grpquota, "grpquota"},
+	{Opt_zerofree, "zerofree"},
 	{Opt_noquota, "noquota"},
 	{Opt_quota, "quota"},
 	{Opt_usrquota, "usrquota"},
@@ -1052,6 +1053,9 @@ clear_qf_name:
 		case Opt_nobh:
 			set_opt(sbi->s_mount_opt, NOBH);
 			break;
+		case Opt_zerofree:
+			set_opt(sbi->s_mount_opt, ZEROFREE);
+			break;
 		default:
 			printk (KERN_ERR
 				"EXT3-fs: Unrecognized mount option \"%s\" "
--- linux-2.6.17/fs/ext3/balloc.c.zerofree3	2006-06-18 02:49:35.000000000 +0100
+++ linux-2.6.17/fs/ext3/balloc.c	2006-08-26 19:08:14.000000000 +0100
@@ -491,9 +491,28 @@ error_return:
 	return;
 }
 
+static void ext3_zero_blocks(struct super_block *sb, unsigned long block,
+		unsigned long count)
+{
+	unsigned long i;
+	struct buffer_head *bh;
+
+	for (i = 0; i < count; i++) {
+		bh = sb_getblk(sb, block+i);
+		if (!bh)
+			continue;
+
+		lock_buffer(bh) ;
+		memset(bh->b_data, 0, bh->b_size);
+		mark_buffer_dirty(bh);
+		unlock_buffer(bh) ;
+		brelse(bh);
+	}
+}
+
 /* Free given blocks, update quota and i_blocks field */
 void ext3_free_blocks(handle_t *handle, struct inode *inode,
-			unsigned long block, unsigned long count)
+			unsigned long block, unsigned long count, int zero)
 {
 	struct super_block * sb;
 	int dquot_freed_blocks;
@@ -503,6 +522,8 @@ void ext3_free_blocks(handle_t *handle, 
 		printk ("ext3_free_blocks: nonexistent device");
 		return;
 	}
+	if (test_opt(sb, ZEROFREE) && zero && !ext3_should_journal_data(inode))
+		ext3_zero_blocks(sb, block, count);
 	ext3_free_blocks_sb(handle, sb, block, count, &dquot_freed_blocks);
 	if (dquot_freed_blocks)
 		DQUOT_FREE_BLOCK(inode, dquot_freed_blocks);
--- linux-2.6.17/fs/ext3/inode.c.zerofree3	2006-08-26 19:06:57.000000000 +0100
+++ linux-2.6.17/fs/ext3/inode.c	2006-08-26 19:07:34.000000000 +0100
@@ -562,7 +562,7 @@ static int ext3_alloc_blocks(handle_t *h
 	return ret;
 failed_out:
 	for (i = 0; i <index; i++)
-		ext3_free_blocks(handle, inode, new_blocks[i], 1);
+		ext3_free_blocks(handle, inode, new_blocks[i], 1, 0);
 	return ret;
 }
 
@@ -661,9 +661,9 @@ failed:
 		ext3_journal_forget(handle, branch[i].bh);
 	}
 	for (i = 0; i <indirect_blks; i++)
-		ext3_free_blocks(handle, inode, new_blocks[i], 1);
+		ext3_free_blocks(handle, inode, new_blocks[i], 1, 0);
 
-	ext3_free_blocks(handle, inode, new_blocks[i], num);
+	ext3_free_blocks(handle, inode, new_blocks[i], num, 0);
 
 	return err;
 }
@@ -760,9 +760,9 @@ err_out:
 	for (i = 1; i <= num; i++) {
 		BUFFER_TRACE(where[i].bh, "call journal_forget");
 		ext3_journal_forget(handle, where[i].bh);
-		ext3_free_blocks(handle,inode,le32_to_cpu(where[i-1].key),1);
+		ext3_free_blocks(handle,inode,le32_to_cpu(where[i-1].key),1,0);
 	}
-	ext3_free_blocks(handle, inode, le32_to_cpu(where[num].key), blks);
+	ext3_free_blocks(handle, inode, le32_to_cpu(where[num].key), blks, 0);
 
 	return err;
 }
@@ -1996,7 +1996,7 @@ static void ext3_clear_blocks(handle_t *
 		}
 	}
 
-	ext3_free_blocks(handle, inode, block_to_free, count);
+	ext3_free_blocks(handle, inode, block_to_free, count, 1);
 }
 
 /**
@@ -2169,7 +2169,7 @@ static void ext3_free_branches(handle_t 
 				ext3_journal_test_restart(handle, inode);
 			}
 
-			ext3_free_blocks(handle, inode, nr, 1);
+			ext3_free_blocks(handle, inode, nr, 1, 0);
 
 			if (parent_bh) {
 				/*
--- linux-2.6.17/fs/ext3/xattr.c.zerofree3	2006-06-18 02:49:35.000000000 +0100
+++ linux-2.6.17/fs/ext3/xattr.c	2006-08-26 19:07:34.000000000 +0100
@@ -484,7 +484,7 @@ ext3_xattr_release_block(handle_t *handl
 		ea_bdebug(bh, "refcount now=0; freeing");
 		if (ce)
 			mb_cache_entry_free(ce);
-		ext3_free_blocks(handle, inode, bh->b_blocknr, 1);
+		ext3_free_blocks(handle, inode, bh->b_blocknr, 1, 0);
 		get_bh(bh);
 		ext3_forget(handle, 1, inode, bh, bh->b_blocknr);
 	} else {
@@ -804,7 +804,7 @@ inserted:
 			new_bh = sb_getblk(sb, block);
 			if (!new_bh) {
 getblk_failed:
-				ext3_free_blocks(handle, inode, block, 1);
+				ext3_free_blocks(handle, inode, block, 1, 0);
 				error = -EIO;
 				goto cleanup;
 			}
--- linux-2.6.17/include/linux/ext3_fs.h.zerofree3	2006-08-26 19:06:57.000000000 +0100
+++ linux-2.6.17/include/linux/ext3_fs.h	2006-08-26 19:07:34.000000000 +0100
@@ -376,6 +376,7 @@ struct ext3_inode {
 #define EXT3_MOUNT_QUOTA		0x80000 /* Some quota option set */
 #define EXT3_MOUNT_USRQUOTA		0x100000 /* "old" user quota */
 #define EXT3_MOUNT_GRPQUOTA		0x200000 /* "old" group quota */
+#define EXT3_MOUNT_ZEROFREE		0x400000 /* Zero freed blocks */
 
 /* Compatibility, for having both ext2_fs.h and ext3_fs.h included at once */
 #ifndef _LINUX_EXT2_FS_H
@@ -745,7 +746,7 @@ extern int ext3_new_block (handle_t *, s
 extern int ext3_new_blocks (handle_t *, struct inode *, unsigned long,
 			unsigned long *, int *);
 extern void ext3_free_blocks (handle_t *, struct inode *, unsigned long,
-			      unsigned long);
+			      unsigned long, int);
 extern void ext3_free_blocks_sb (handle_t *, struct super_block *,
 				 unsigned long, unsigned long, int *);
 extern unsigned long ext3_count_free_blocks (struct super_block *);


From mind at bi.lt  Fri Sep  8 10:51:08 2006
From: mind at bi.lt (Mindaugas)
Date: Fri, 8 Sep 2006 13:51:08 +0300
Subject: wiping of unused space on ext3
References: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
	<200609080901.k8891RHs015100@tiffany.internal.tigress.co.uk>
Message-ID: <044401c6d334$b1108d90$f20214ac@bite.lt>


> "Mindaugas" <mind at bi.lt> wrote:
>>  I was asked if it is possible to zero unused space in ext3 partition?
> 
> I've been meaning to submit these to LKML, but since you ask let's give
> them an airing here first.
> 
> Since this is being copied to the CentOS mailing list I should point
> out that I also have versions of the patches that apply cleanly to the
> RHEL 4 kernel.  I don't have them to hand at the moment but if there's
> any interest I can provide them later.

  Thank you for the answer. This request has been put on hold for now,
so I don't know whether I will still need those patches.
  
  If I do need them, I will ask you for the RHEL4 version. :)

  Mindaugas


From richard.c.wolber at boeing.com  Fri Sep  8 13:53:26 2006
From: richard.c.wolber at boeing.com (Wolber, Richard C)
Date: Fri, 8 Sep 2006 06:53:26 -0700
Subject: wiping of unused space on ext3
In-Reply-To: <Pine.GSO.4.44L.0609071640050.12664-100000@unix1.cc.ksu.edu>
Message-ID: <8C7C41A176AC0B468BEFB2EFD9BDAB9901324182@XCH-NW-5V2.nw.nos.boeing.com>

> -----Original Message-----
> From: Matt Stegman [mailto:matts at ksu.edu] 
> Sent: Thursday, September 07, 2006 3:40 PM
> To: Wolber, Richard C
> Cc: Andreas Dilger; Mindaugas; ext3-users at redhat.com; CentOS 
> mailing list
> Subject: RE: wiping of unused space on ext3
> 
> Well, the manpage does say:
> 
> BUGS AND LIMITATIONS
>        The `c', 's', and `u' attributes are not honored by 
> the ext2 and ext3 filesystems  as implemented  in the current  
> mainline Linux kernels.  These attributes may be implemented in 
> future versions ext2 and ext3.


Doh! Thanks for the cluestick!

..Chuck..


From rmy at tigress.co.uk  Fri Sep  8 18:13:33 2006
From: rmy at tigress.co.uk (Ron Yorston)
Date: Fri, 08 Sep 2006 19:13:33 +0100
Subject: wiping of unused space on ext3
In-Reply-To: <044401c6d334$b1108d90$f20214ac@bite.lt>
References: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
	<200609080901.k8891RHs015100@tiffany.internal.tigress.co.uk>
	<044401c6d334$b1108d90$f20214ac@bite.lt>
Message-ID: <200609081813.k88IDX3W015821@tiffany.internal.tigress.co.uk>

Here we are:  RHEL 4 versions of my zerofree patches.  I add these to
the kernel spec file at about Patch6000.

Ron

-------------- next part --------------
--- linux-2.6.9/include/linux/ext2_fs.h.zerofree2	2004-10-18 22:53:21.000000000 +0100
+++ linux-2.6.9/include/linux/ext2_fs.h	2006-08-29 19:49:10.000000000 +0100
@@ -310,6 +310,7 @@ struct ext2_inode {
 #define EXT2_MOUNT_MINIX_DF		0x0080	/* Mimics the Minix statfs */
 #define EXT2_MOUNT_NOBH			0x0100	/* No buffer_heads */
 #define EXT2_MOUNT_NO_UID32		0x0200  /* Disable 32-bit UIDs */
+#define EXT2_MOUNT_ZEROFREE		0x0400	/* Zero freed blocks */
 #define EXT2_MOUNT_XATTR_USER		0x4000	/* Extended user attributes */
 #define EXT2_MOUNT_POSIX_ACL		0x8000	/* POSIX Access Control Lists */
 
--- linux-2.6.9/fs/ext2/balloc.c.zerofree2	2004-10-18 22:53:51.000000000 +0100
+++ linux-2.6.9/fs/ext2/balloc.c	2006-08-29 19:46:35.000000000 +0100
@@ -173,9 +173,28 @@ static void group_release_blocks(struct 
 	}
 }
 
+static void ext2_zero_blocks(struct super_block *sb, unsigned long block,
+		unsigned long count)
+{
+	unsigned long i;
+	struct buffer_head * bh;
+
+	for (i = 0; i < count; i++) {
+		bh = sb_getblk(sb, block+i);
+		if (!bh)
+			continue;
+
+		lock_buffer(bh);
+		memset(bh->b_data, 0, bh->b_size);
+		mark_buffer_dirty(bh);
+		unlock_buffer(bh);
+		brelse(bh);
+	}
+}
+
 /* Free given blocks, update quota and i_blocks field */
 void ext2_free_blocks (struct inode * inode, unsigned long block,
-		       unsigned long count)
+		       unsigned long count, int zero)
 {
 	struct buffer_head *bitmap_bh = NULL;
 	struct buffer_head * bh2;
@@ -200,6 +219,9 @@ void ext2_free_blocks (struct inode * in
 
 	ext2_debug ("freeing block(s) %lu-%lu\n", block, block + count - 1);
 
+	if (test_opt(sb, ZEROFREE) && zero)
+		ext2_zero_blocks(sb, block, count);
+
 do_more:
 	overflow = 0;
 	block_group = (block - le32_to_cpu(es->s_first_data_block)) /
--- linux-2.6.9/fs/ext2/super.c.zerofree2	2006-08-29 19:44:53.000000000 +0100
+++ linux-2.6.9/fs/ext2/super.c	2006-08-29 19:54:05.000000000 +0100
@@ -293,7 +293,7 @@ enum {
 	Opt_bsd_df, Opt_minix_df, Opt_grpid, Opt_nogrpid,
 	Opt_resgid, Opt_resuid, Opt_sb, Opt_err_cont, Opt_err_panic, Opt_err_ro,
 	Opt_nouid32, Opt_check, Opt_nocheck, Opt_debug, Opt_oldalloc, Opt_orlov, Opt_nobh,
-	Opt_user_xattr, Opt_nouser_xattr, Opt_acl, Opt_noacl,
+	Opt_zerofree, Opt_user_xattr, Opt_nouser_xattr, Opt_acl, Opt_noacl,
 	Opt_ignore, Opt_err,
 };
 
@@ -318,6 +318,7 @@ static match_table_t tokens = {
 	{Opt_oldalloc, "oldalloc"},
 	{Opt_orlov, "orlov"},
 	{Opt_nobh, "nobh"},
+	{Opt_zerofree, "zerofree"},
 	{Opt_user_xattr, "user_xattr"},
 	{Opt_nouser_xattr, "nouser_xattr"},
 	{Opt_acl, "acl"},
@@ -407,6 +408,9 @@ static int parse_options (char * options
 		case Opt_nobh:
 			set_opt (sbi->s_mount_opt, NOBH);
 			break;
+		case Opt_zerofree:
+			set_opt (sbi->s_mount_opt, ZEROFREE);
+			break;
 #ifdef CONFIG_EXT2_FS_XATTR
 		case Opt_user_xattr:
 			set_opt (sbi->s_mount_opt, XATTR_USER);
--- linux-2.6.9/fs/ext2/xattr.c.zerofree2	2006-08-29 19:40:46.000000000 +0100
+++ linux-2.6.9/fs/ext2/xattr.c	2006-08-29 19:55:25.000000000 +0100
@@ -679,7 +679,7 @@ ext2_xattr_set2(struct inode *inode, str
 
 			new_bh = sb_getblk(sb, block);
 			if (!new_bh) {
-				ext2_free_blocks(inode, block, 1);
+				ext2_free_blocks(inode, block, 1, 0);
 				error = -EIO;
 				goto cleanup;
 			}
@@ -712,24 +712,25 @@ ext2_xattr_set2(struct inode *inode, str
 
 	error = 0;
 	if (old_bh && old_bh != new_bh) {
+		unsigned long block = old_bh->b_blocknr;
 		struct mb_cache_entry *ce;
 		/*
 		 * If there was an old block and we are no longer using it,
 		 * release the old block.
 		 */
-		ce = mb_cache_entry_get(ext2_xattr_cache, old_bh->b_bdev,
-					old_bh->b_blocknr);
+		ce = mb_cache_entry_get(ext2_xattr_cache, old_bh->b_bdev, block);
 		lock_buffer(old_bh);
 		if (HDR(old_bh)->h_refcount == cpu_to_le32(1)) {
 			/* Free the old block. */
 			if (ce)
 				mb_cache_entry_free(ce);
 			ea_bdebug(old_bh, "freeing");
-			ext2_free_blocks(inode, old_bh->b_blocknr, 1);
+			unlock_buffer(old_bh);
 			/* We let our caller release old_bh, so we
 			 * need to duplicate the buffer before. */
 			get_bh(old_bh);
 			bforget(old_bh);
+			ext2_free_blocks(inode, block, 1, 1);
 		} else {
 			/* Decrement the refcount only. */
 			if (ce)
@@ -740,8 +741,8 @@ ext2_xattr_set2(struct inode *inode, str
 			mark_buffer_dirty(old_bh);
 			ea_bdebug(old_bh, "refcount now=%d",
 				le32_to_cpu(HDR(old_bh)->h_refcount));
+			unlock_buffer(old_bh);
 		}
-		unlock_buffer(old_bh);
 	}
 
 cleanup:
@@ -786,10 +787,10 @@ ext2_xattr_delete_inode(struct inode *in
 	if (HDR(bh)->h_refcount == cpu_to_le32(1)) {
 		if (ce)
 			mb_cache_entry_free(ce);
-		ext2_free_blocks(inode, EXT2_I(inode)->i_file_acl, 1);
+		unlock_buffer(bh);
 		get_bh(bh);
 		bforget(bh);
-		unlock_buffer(bh);
+		ext2_free_blocks(inode, EXT2_I(inode)->i_file_acl, 1, 1);
 	} else {
 		if (ce)
 			mb_cache_entry_release(ce);
--- linux-2.6.9/fs/ext2/inode.c.zerofree2	2006-08-29 19:44:53.000000000 +0100
+++ linux-2.6.9/fs/ext2/inode.c	2006-08-29 19:46:35.000000000 +0100
@@ -99,7 +99,7 @@ void ext2_discard_prealloc (struct inode
 		ei->i_prealloc_count = 0;
 		ei->i_prealloc_block = 0;
 		write_unlock(&ei->i_meta_lock);
-		ext2_free_blocks (inode, block, total);
+		ext2_free_blocks (inode, block, total, 0);
 		return;
 	} else
 		write_unlock(&ei->i_meta_lock);
@@ -462,7 +462,7 @@ static int ext2_alloc_branch(struct inod
 	for (i = 1; i < n; i++)
 		bforget(branch[i].bh);
 	for (i = 0; i < n; i++)
-		ext2_free_blocks(inode, le32_to_cpu(branch[i].key), 1);
+		ext2_free_blocks(inode, le32_to_cpu(branch[i].key), 1, 0);
 	return err;
 }
 
@@ -522,7 +522,7 @@ changed:
 	for (i = 1; i < num; i++)
 		bforget(where[i].bh);
 	for (i = 0; i < num; i++)
-		ext2_free_blocks(inode, le32_to_cpu(where[i].key), 1);
+		ext2_free_blocks(inode, le32_to_cpu(where[i].key), 1, 1);
 	return -EAGAIN;
 }
 
@@ -821,7 +821,7 @@ static inline void ext2_free_data(struct
 				count++;
 			else {
 				mark_inode_dirty(inode);
-				ext2_free_blocks (inode, block_to_free, count);
+				ext2_free_blocks (inode, block_to_free, count, 1);
 			free_this:
 				block_to_free = nr;
 				count = 1;
@@ -830,7 +830,7 @@ static inline void ext2_free_data(struct
 	}
 	if (count > 0) {
 		mark_inode_dirty(inode);
-		ext2_free_blocks (inode, block_to_free, count);
+		ext2_free_blocks (inode, block_to_free, count, 1);
 	}
 }
 
@@ -873,7 +873,7 @@ static void ext2_free_branches(struct in
 					   (__le32*)bh->b_data + addr_per_block,
 					   depth);
 			bforget(bh);
-			ext2_free_blocks(inode, nr, 1);
+			ext2_free_blocks(inode, nr, 1, 1);
 			mark_inode_dirty(inode);
 		}
 	} else
--- linux-2.6.9/fs/ext2/ext2.h.zerofree2	2006-08-29 19:44:53.000000000 +0100
+++ linux-2.6.9/fs/ext2/ext2.h	2006-08-29 19:46:35.000000000 +0100
@@ -85,7 +85,7 @@ extern unsigned long ext2_bg_num_gdb(str
 extern int ext2_new_block (struct inode *, unsigned long,
 			   __u32 *, __u32 *, int *);
 extern void ext2_free_blocks (struct inode *, unsigned long,
-			      unsigned long);
+			      unsigned long, int);
 extern unsigned long ext2_count_free_blocks (struct super_block *);
 extern unsigned long ext2_count_dirs (struct super_block *);
 extern void ext2_check_blocks_bitmap (struct super_block *);
--- linux-2.6.9/Documentation/filesystems/ext2.txt.zerofree2	2004-10-18 22:53:43.000000000 +0100
+++ linux-2.6.9/Documentation/filesystems/ext2.txt	2006-08-29 19:46:35.000000000 +0100
@@ -62,6 +62,8 @@ resgid=n			The group ID which may use th
 
 sb=n				Use alternate superblock at this location.
 
+zerofree			Zero data blocks when they are freed.
+
 grpquota,noquota,quota,usrquota	Quota options are silently ignored by ext2.
 
 
-------------- next part --------------
--- linux-2.6.9/include/linux/ext3_fs.h.zerofree3	2006-08-30 20:44:40.000000000 +0100
+++ linux-2.6.9/include/linux/ext3_fs.h	2006-08-30 20:47:19.000000000 +0100
@@ -355,6 +355,7 @@ struct ext3_inode {
 #define EXT3_MOUNT_POSIX_ACL		0x08000	/* POSIX Access Control Lists */
 #define EXT3_MOUNT_BARRIER		0x10000 /* Use block barriers */
 #define EXT3_MOUNT_RESERVATION		0x20000	/* Preallocation */
+#define EXT3_MOUNT_ZEROFREE		0x40000 /* Zero freed blocks */
 
 /* Compatibility, for having both ext2_fs.h and ext3_fs.h included at once */
 #ifndef _LINUX_EXT2_FS_H
@@ -713,7 +714,7 @@ extern int ext3_bg_has_super(struct supe
 extern unsigned long ext3_bg_num_gdb(struct super_block *sb, int group);
 extern int ext3_new_block (handle_t *, struct inode *, unsigned long, int *);
 extern void ext3_free_blocks (handle_t *, struct inode *, unsigned long,
-			      unsigned long);
+			      unsigned long, int);
 extern void ext3_free_blocks_sb (handle_t *, struct super_block *,
 				 unsigned long, unsigned long, int *);
 extern unsigned long ext3_count_free_blocks (struct super_block *);
--- linux-2.6.9/fs/ext3/super.c.zerofree3	2006-08-30 20:45:30.000000000 +0100
+++ linux-2.6.9/fs/ext3/super.c	2006-08-30 20:47:19.000000000 +0100
@@ -631,7 +631,7 @@ enum {
 	Opt_abort, Opt_data_journal, Opt_data_ordered, Opt_data_writeback,
 	Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
 	Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0,
-	Opt_ignore, Opt_barrier, Opt_err, Opt_resize,
+	Opt_zerofree, Opt_ignore, Opt_barrier, Opt_err, Opt_resize,
 };
 
 static match_table_t tokens = {
@@ -674,6 +674,7 @@ static match_table_t tokens = {
 	{Opt_grpjquota, "grpjquota=%s"},
 	{Opt_jqfmt_vfsold, "jqfmt=vfsold"},
 	{Opt_jqfmt_vfsv0, "jqfmt=vfsv0"},
+	{Opt_zerofree, "zerofree"},
 	{Opt_ignore, "grpquota"},
 	{Opt_ignore, "noquota"},
 	{Opt_ignore, "quota"},
@@ -970,6 +971,9 @@ clear_qf_name:
 			match_int(&args[0], &option);
 			*n_blocks_count = option;
 			break;
+		case Opt_zerofree:
+			set_opt(sbi->s_mount_opt, ZEROFREE);
+			break;
 		default:
 			printk (KERN_ERR
 				"EXT3-fs: Unrecognized mount option \"%s\" "
--- linux-2.6.9/fs/ext3/balloc.c.zerofree3	2006-08-30 20:44:40.000000000 +0100
+++ linux-2.6.9/fs/ext3/balloc.c	2006-08-30 20:47:19.000000000 +0100
@@ -451,9 +451,28 @@ error_return:
 	return;
 }
 
+static void ext3_zero_blocks(struct super_block *sb, unsigned long block,
+		unsigned long count)
+{
+	unsigned long i;
+	struct buffer_head *bh;
+
+	for (i = 0; i < count; i++) {
+		bh = sb_getblk(sb, block+i);
+		if (!bh)
+			continue;
+
+		lock_buffer(bh) ;
+		memset(bh->b_data, 0, bh->b_size);
+		mark_buffer_dirty(bh);
+		unlock_buffer(bh) ;
+		brelse(bh);
+	}
+}
+
 /* Free given blocks, update quota and i_blocks field */
 void ext3_free_blocks(handle_t *handle, struct inode *inode,
-			unsigned long block, unsigned long count)
+			unsigned long block, unsigned long count, int zero)
 {
 	struct super_block * sb;
 	int dquot_freed_blocks;
@@ -463,6 +482,8 @@ void ext3_free_blocks(handle_t *handle, 
 		printk ("ext3_free_blocks: nonexistent device");
 		return;
 	}
+	if (test_opt(sb, ZEROFREE) && zero && !ext3_should_journal_data(inode))
+		ext3_zero_blocks(sb, block, count);
 	ext3_free_blocks_sb(handle, sb, block, count, &dquot_freed_blocks);
 	if (dquot_freed_blocks)
 		DQUOT_FREE_BLOCK(inode, dquot_freed_blocks);
--- linux-2.6.9/fs/ext3/inode.c.zerofree3	2006-08-30 20:44:40.000000000 +0100
+++ linux-2.6.9/fs/ext3/inode.c	2006-08-30 20:47:19.000000000 +0100
@@ -571,7 +571,7 @@ static int ext3_alloc_branch(handle_t *h
 		ext3_journal_forget(handle, branch[i].bh);
 	}
 	for (i = 0; i < keys; i++)
-		ext3_free_blocks(handle, inode, le32_to_cpu(branch[i].key), 1);
+		ext3_free_blocks(handle, inode, le32_to_cpu(branch[i].key), 1, 0);
 	return err;
 }
 
@@ -672,7 +672,7 @@ err_out:
 	if (err == -EAGAIN)
 		for (i = 0; i < num; i++)
 			ext3_free_blocks(handle, inode, 
-					 le32_to_cpu(where[i].key), 1);
+					 le32_to_cpu(where[i].key), 1, 0);
 	return err;
 }
 
@@ -1819,7 +1819,7 @@ ext3_clear_blocks(handle_t *handle, stru
 		}
 	}
 
-	ext3_free_blocks(handle, inode, block_to_free, count);
+	ext3_free_blocks(handle, inode, block_to_free, count, 1);
 }
 
 /**
@@ -1992,7 +1992,7 @@ static void ext3_free_branches(handle_t 
 				ext3_journal_test_restart(handle, inode);
 			}
 
-			ext3_free_blocks(handle, inode, nr, 1);
+			ext3_free_blocks(handle, inode, nr, 1, 0);
 
 			if (parent_bh) {
 				/*
--- linux-2.6.9/fs/ext3/xattr.c.zerofree3	2006-08-30 20:45:00.000000000 +0100
+++ linux-2.6.9/fs/ext3/xattr.c	2006-08-30 20:48:06.000000000 +0100
@@ -699,7 +699,7 @@ ext3_xattr_set_handle2(handle_t *handle,
 			new_bh = sb_getblk(sb, block);
 			if (!new_bh) {
 getblk_failed:
-				ext3_free_blocks(handle, inode, block, 1);
+				ext3_free_blocks(handle, inode, block, 1, 0);
 				error = -EIO;
 				goto cleanup;
 			}
@@ -746,7 +746,7 @@ getblk_failed:
 			if (ce)
 				mb_cache_entry_free(ce);
 			ea_bdebug(old_bh, "freeing");
-			ext3_free_blocks(handle, inode, old_bh->b_blocknr, 1);
+			ext3_free_blocks(handle, inode, old_bh->b_blocknr, 1, 0);
 
 			/* ext3_forget() calls bforget() for us, but we
 			   let our caller release old_bh, so we need to
@@ -845,7 +845,7 @@ ext3_xattr_delete_inode(handle_t *handle
 	if (HDR(bh)->h_refcount == cpu_to_le32(1)) {
 		if (ce)
 			mb_cache_entry_free(ce);
-		ext3_free_blocks(handle, inode, EXT3_I(inode)->i_file_acl, 1);
+		ext3_free_blocks(handle, inode, EXT3_I(inode)->i_file_acl, 1, 0);
 		get_bh(bh);
 		ext3_forget(handle, 1, inode, bh, EXT3_I(inode)->i_file_acl);
 	} else {
--- linux-2.6.9/Documentation/filesystems/ext3.txt.zerofree3	2004-10-18 22:53:51.000000000 +0100
+++ linux-2.6.9/Documentation/filesystems/ext3.txt	2006-08-30 20:47:19.000000000 +0100
@@ -108,6 +108,8 @@ noquota			(see fs/ext3/super.c, line 594
 grpquota
 usrquota
 
+zerofree		Zero data blocks when they are freed.
+
 
 Specification
 =============

From tytso at mit.edu  Fri Sep  8 20:10:34 2006
From: tytso at mit.edu (Theodore Tso)
Date: Fri, 8 Sep 2006 16:10:34 -0400
Subject: [PATCH] ext3: zero freed blocks
In-Reply-To: <200609080904.k8894raH015117@tiffany.internal.tigress.co.uk>
References: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
	<200609080901.k8891RHs015100@tiffany.internal.tigress.co.uk>
	<200609080904.k8894raH015117@tiffany.internal.tigress.co.uk>
Message-ID: <20060908201034.GA7542@thunk.org>

On Fri, Sep 08, 2006 at 10:04:53AM +0100, Ron Yorston wrote:
> Add a zerofree mount option to the ext3 filesystem.  This causes freed
> blocks to be filled with zeros.
> 
> Zeroing is only applied to data blocks, not metadata.  This means that
> directory blocks and extended attributes are not zeroed.
> 
> Signed-off-by; Ron Yorston <rmy at tigress.co.uk>
               ^ Should be a ':' character.   :-)

Ideally, this wouldn't be done as a mount-time option, but rather only
if the secure_delete flag is set on the file.  That way you don't do
it for all files, but just those that need to be zeroed.

The patch also has the potential danger that the data blocks are
getting zeroed before the transaction which contains the unlink has
committed.  There is therefore the risk that the system might crash
after the blocks have been zero'ed, but before the transaction has
committed.  In that case, the file will still be there, but some or
all of its contents will be zero'ed.  
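
A toy model of that window, purely for illustration: it assumes the
zeroing goes straight to disk while the unlink only takes effect once its
transaction commits. The function and variable names are made up for this
sketch and are not kernel interfaces.

```python
# Toy model: data blocks are zeroed in place immediately, while the
# unlink itself only takes effect when its journal transaction commits.
# All names here are illustrative, not kernel interfaces.

def unlink_with_zerofree(disk, directory, name, crash_before_commit):
    blocks = directory[name]
    # Step 1: zero the data blocks directly on disk (not journalled).
    for b in blocks:
        disk[b] = b"\0" * len(disk[b])
    # Step 2: the transaction removing the directory entry commits,
    # unless the machine crashes first.
    if crash_before_commit:
        return  # after reboot there is no committed unlink to replay
    del directory[name]


disk = {10: b"secret", 11: b"report"}
directory = {"file.txt": [10, 11]}
unlink_with_zerofree(disk, directory, "file.txt", crash_before_commit=True)

# After the crash the file is still present, but its contents are gone.
contents = b"".join(disk[b] for b in directory["file.txt"])
print("file.txt" in directory, contents)
```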

The other thing which worries me about this patch is that if the
blocks which you have zero'ed out get reallocated and used for some
other file, and then data is written into the page cache and the page
gets written to disk before the zero'ized buffers hit the disk, the
new contents of the data blocks could get written.  The reason for
this is that there is no cache coherency enforced between the page
cache and buffer cache, and so it is necessary to be very careful when
a particular block transitions from being modified via the buffer
cache to being modified via the page cache.

Anyway, there's a reason why secure delete is more than a little bit
tricky, and why it's never been implemented up until now.  Not that
it's impossible to do, just that it's a lot more subtle than it looks.  :-)

Regards,

						- Ted


From adilger at clusterfs.com  Sat Sep  9 00:26:28 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Fri, 8 Sep 2006 18:26:28 -0600
Subject: [PATCH] ext3: zero freed blocks
In-Reply-To: <20060908201034.GA7542@thunk.org>
References: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
	<200609080901.k8891RHs015100@tiffany.internal.tigress.co.uk>
	<200609080904.k8894raH015117@tiffany.internal.tigress.co.uk>
	<20060908201034.GA7542@thunk.org>
Message-ID: <20060909002628.GO6441@schatzie.adilger.int>

On Sep 08, 2006  16:10 -0400, Theodore Tso wrote:
> Ideally, this wouldn't be done as a mount-time option, but rather only
> if the secure_delete flag is set on the file.  That way you don't do
> it for all files, but just those that need to be zeroed.

Agreed.

> The patch also has the potential danger that the data blocks are
> getting zeroed before the transaction which contains the unlink has
> committed.  There is therefore the risk that the system might crash
> after the blocks have been zero'ed, but before transaction has
> committed.  In that case, the file will still be there, but some or
> all of its contents will be zero'ed.  

That might be considered a feature.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From tytso at mit.edu  Sat Sep  9 05:09:20 2006
From: tytso at mit.edu (Theodore Tso)
Date: Sat, 9 Sep 2006 01:09:20 -0400
Subject: [PATCH] ext3: zero freed blocks
In-Reply-To: <20060909002628.GO6441@schatzie.adilger.int>
References: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
	<200609080901.k8891RHs015100@tiffany.internal.tigress.co.uk>
	<200609080904.k8894raH015117@tiffany.internal.tigress.co.uk>
	<20060908201034.GA7542@thunk.org>
	<20060909002628.GO6441@schatzie.adilger.int>
Message-ID: <20060909050920.GA10849@thunk.org>

On Fri, Sep 08, 2006 at 06:26:28PM -0600, Andreas Dilger wrote:
> > The patch also has the potential danger that the data blocks are
> > getting zeroed before the transaction which contains the unlink has
> > committed.  There is therefore the risk that the system might crash
> > after the blocks have been zero'ed, but before transaction has
> > committed.  In that case, the file will still be there, but some or
> > all of its contents will be zero'ed.  
> 
> That might be considered a feature.

I don't think so.  Deletes should be atomic.  I could certainly see
programs where a file should either be deleted, or not deleted.  For a
file to be partially corrupted but not deleted could ruin an
application's consistency assumptions.

						- Ted


From rmy at tigress.co.uk  Sat Sep  9 10:36:27 2006
From: rmy at tigress.co.uk (Ron Yorston)
Date: Sat, 09 Sep 2006 11:36:27 +0100
Subject: [PATCH] ext3: zero freed blocks
In-Reply-To: <20060908201034.GA7542@thunk.org>
References: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
	<200609080901.k8891RHs015100@tiffany.internal.tigress.co.uk>
	<200609080904.k8894raH015117@tiffany.internal.tigress.co.uk>
	<20060908201034.GA7542@thunk.org>
Message-ID: <200609091036.k89AaRXA016086@tiffany.internal.tigress.co.uk>

I've removed Cc: centos at centos.org, as I think we've probably outstayed
our welcome on that list.

Theodore Tso <tytso at mit.edu> wrote:
>Ideally, this wouldn't be done as a mount-time option, but rather only
>if the secure_delete flag is set on the file.  That way you don't do
>it for all files, but just those that need to be zeroed.

We can use both a mount option and the secure delete flag.  As Ted wrote
on a previous occasion when this came up on LKML:

>The obvious thing to do would be to make it a mount option, so that
>(a) recompilation is not necessary in order to use the feature, and
>(b) the feature can be turned on or off on a per-filesystem basis.
>In 2.6, it's possible to specify certain mount options to be specified
>by default on a per-filesystem basis (via a new field in the
>superblock).
>
>So if you do things that way, then secure deletion would take place
>either if the secure deletion flag is set (so it can be enabled on a
>per-file basis), or if the filesystem is mounted with the
>secure-deletion mount option. 

Personally I find a mount option much more useful than a per-file flag.

>The patch also has the potential danger that the data blocks are
>getting zeroed before the transaction which contains the unlink has
>committed.  There is therefore the risk that the system might crash
>after the blocks have been zero'ed, but before transaction has
>committed.  In that case, the file will still be there, but some or
>all of its contents will be zero'ed.  

Indeed, I was aware of this possibility, and breaking guarantees about
the atomicity of delete is a bad thing.  (Does ext2 provide any such
guarantee?)

The original patch (http://lwn.net/Articles/171924/) by Nikolai Joukov
had code to call ext3_journal_dirty_data on the data blocks, which may
have been intended to address this issue.  But I ripped it out because
it failed horribly when I tried to delete a file bigger than physical
RAM.

>The other thing which worries me about this patch is that if the
>blocks which you have zero'ed out get reallocated and used for some
>other file, and then data is written into the page cache and the page
>gets written to disk before the zero'ized buffers hit the disk, the
>new contents of the data blocks could get written.  The reason for
>this is that there is no cache coherency enforced between the page
>cache and buffer cache, and so it is necessary to be very careful when
>a particular block transitions between from being modified via buffer
>cache versus the page cache.

What are the consequences of this?  Is there any danger of the other
file being corrupted?  If not, and if our purpose is just to ensure that
the original contents of the freed blocks are destroyed, does it matter if
they're overwritten with something other than the zeroes we intended?

Ron


From tytso at mit.edu  Sat Sep  9 13:21:54 2006
From: tytso at mit.edu (Theodore Tso)
Date: Sat, 9 Sep 2006 09:21:54 -0400
Subject: [PATCH] ext3: zero freed blocks
In-Reply-To: <200609091036.k89AaRXA016086@tiffany.internal.tigress.co.uk>
References: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
	<200609080901.k8891RHs015100@tiffany.internal.tigress.co.uk>
	<200609080904.k8894raH015117@tiffany.internal.tigress.co.uk>
	<20060908201034.GA7542@thunk.org>
	<200609091036.k89AaRXA016086@tiffany.internal.tigress.co.uk>
Message-ID: <20060909132154.GB24906@thunk.org>

On Sat, Sep 09, 2006 at 11:36:27AM +0100, Ron Yorston wrote:
> >The other thing which worries me about this patch is that if the
> >blocks which you have zero'ed out get reallocated and used for some
> >other file, and then data is written into the page cache and the page
> >gets written to disk before the zero'ized buffers hit the disk, the
> >new contents of the data blocks could get written.  The reason for
> >this is that there is no cache coherency enforced between the page
> >cache and buffer cache, and so it is necessary to be very careful when
> >a particular block transitions between from being modified via buffer
> >cache versus the page cache.
> 
> What are the consequences of this?  Is there any danger of the other
> file being corrupted?  If not, and if our purpose is just to ensure that
> the original contents of the freed blocks are destroyed, does it matter if
> they're overwritten with something other than the zeroes we intended?
> 

Yes, that's precisely what I'm worried about.  Specifically, if you
have this sequence of events:

1) File gets deleted; the file contents get zero'ed out via the
buffer cache.  Since the process of zeroing the files happens in the
background, for a large file, this could continue for a long time...

2) In the meantime, one or more of the disk blocks that were used by
the old file are reallocated for a new file.  The application writes
data to the new file, which is stored in the page cache.

3) The application calls fsync() and the contents of the new file are
flushed from the page cache and written to disk.

4) The dirty buffers containing the zero'ed out contents of the block
are written to disk, overwriting the contents of the new file.

5) Data is lost.
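
The sequence above can be walked through with a toy model. This is only a
sketch: the two dictionaries stand in for the buffer cache and the page
cache, block number 100 is arbitrary, and nothing here reflects how the
kernel actually structures either cache.

```python
# Toy model of the ordering problem. Two independent write-back caches
# target the same disk block, and nothing keeps them coherent.
# Names are illustrative only; this is not how the kernel is structured.

disk = {100: b"old file data"}
buffer_cache = {}   # dirty buffers queued by the zeroing code
page_cache = {}     # dirty pages belonging to regular file data

# 1) The old file is deleted; block 100 is zeroed via the buffer cache,
#    but the dirty buffer has not reached the disk yet.
buffer_cache[100] = b"\0" * 13

# 2) Block 100 is reallocated to a new file, written via the page cache.
page_cache[100] = b"new file data"

# 3) fsync() on the new file flushes the page to disk.
disk[100] = page_cache.pop(100)
after_fsync = disk[100]

# 4) The stale zeroed buffer is written back later, clobbering it.
disk[100] = buffer_cache.pop(100)

# 5) The new file's data is lost even though fsync() had completed.
print(after_fsync, disk[100])
```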

One way of solving this problem is to zero the blocks in the
foreground, and not allow the unlink to proceed until the data blocks
are overwritten.  Another way of solving the problem would be to not
allow those data blocks to be allocated until the zeroization buffers
have been written out.  Yet another way would be to try to determine if
there is an outstanding buffer cache write from an attempt to zero the
free blocks, and abort the buffer cache write before doing the page
writeout.  That last would not be trivial, and would require violating
a number of abstraction boundaries...
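
As a rough sketch of the second approach (holding freed blocks back from
the allocator until their zeroed buffers are on disk), under the
assumption that the allocator can be told when a zero write has completed.
ToyAllocator and its methods are hypothetical names for illustration, not
ext3 functions.

```python
class ToyAllocator:
    """Illustrative allocator; not an ext3 data structure."""

    def __init__(self, free_blocks):
        self.free = set(free_blocks)
        self.pending_zero = set()  # freed, zero buffer not yet on disk

    def free_block(self, block):
        # The block is released but held back until zeroing completes.
        self.pending_zero.add(block)

    def zero_written(self, block):
        # Called once the zeroed buffer has reached the disk.
        self.pending_zero.discard(block)
        self.free.add(block)

    def allocate(self):
        # Only blocks whose zeroing is finished can be handed out.
        if not self.free:
            raise RuntimeError("no allocatable blocks")
        block = min(self.free)
        self.free.remove(block)
        return block


alloc = ToyAllocator(free_blocks=[7])
alloc.free_block(3)        # block 3 freed, zeroing still in flight
first = alloc.allocate()   # block 3 is not eligible yet
alloc.zero_written(3)
second = alloc.allocate()  # now block 3 can be reused safely
print(first, second)
```

The cost of this choice is that free space appears later than the unlink,
which matters most on a nearly full filesystem.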

Another question to ask is whether or not you care that the freed
blocks might not be zero'ed if the system crashes before the buffer
cache is written out.  Currently, there is a chance that after a
system crash some deleted file blocks won't be zero'ed.  Depending on
your requirements, that might or might not be fatal, though.  

						- Ted


From tytso at mit.edu  Mon Sep 11 04:48:29 2006
From: tytso at mit.edu (Theodore Tso)
Date: Mon, 11 Sep 2006 00:48:29 -0400
Subject: how does ext3 handle no communication  to storage
In-Reply-To: <44F5ACF8.2000705@bnl.gov>
References: <44F33E3A.8020805@bnl.gov> <20060828205822.GB4944@thunk.org>
	<44F37285.8000104@bnl.gov>
	<20060829082003.GM20105@schatzie.adilger.int>
	<44F458AF.7040506@bnl.gov> <20060829170351.GA30599@thunk.org>
	<44F5ACF8.2000705@bnl.gov>
Message-ID: <20060911044829.GC24653@thunk.org>

[ Apologies for the delayed response, I've been travelling in Germany
  and Japan over the past week and a half... ]

On Wed, Aug 30, 2006 at 11:21:28AM -0400, Sev Binello wrote:
> What's the best way to keep informed as to when the patch
> to the kernel is made and released ?

Probably the best way is to subscribe to the
linux-ext4 at vger.kernel.org mailing list...

					- Ted


From mail-lists at karan.org  Fri Sep  8 13:52:23 2006
From: mail-lists at karan.org (Karanbir Singh)
Date: Fri, 08 Sep 2006 14:52:23 +0100
Subject: [CentOS] Re: wiping of unused space on ext3
In-Reply-To: <200609080901.k8891RHs015100@tiffany.internal.tigress.co.uk>
References: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
	<200609080901.k8891RHs015100@tiffany.internal.tigress.co.uk>
Message-ID: <45017597.7060608@karan.org>

Ron Yorston wrote:
> "Mindaugas" <mind at bi.lt> wrote:
>>  I was asked if it is possible to zero unused space in ext3 partition?
> 
> I have a couple of patches that add a zerofree mount option to ext2 and
> ext3 filesystems.  The ext2 version is much better tested and more
> complete:  it zeros all file data blocks, directory blocks and extended
> attributes (though not inode data).  The ext3 patch only handles file
> data, not metadata.
> 
> I've been meaning to submit these to LKML, but since you ask let's give
> them an airing here first.

Ron,

thanks for these patches - I don't think we can have them included in any 
official centos-repository hosted kernel, but it's good to know that 
people, should they need this, can get to them here.

- K
-- 
Karanbir Singh : http://www.karan.org/ : 2522219 at icq


From rmy at tigress.co.uk  Tue Sep 12 20:10:56 2006
From: rmy at tigress.co.uk (Ron Yorston)
Date: Tue, 12 Sep 2006 21:10:56 +0100
Subject: [PATCH] ext3: zero freed blocks
In-Reply-To: <20060909132154.GB24906@thunk.org>
References: <00dd01c6d24e$d29d15f0$f20214ac@bite.lt>
	<200609080901.k8891RHs015100@tiffany.internal.tigress.co.uk>
	<200609080904.k8894raH015117@tiffany.internal.tigress.co.uk>
	<20060908201034.GA7542@thunk.org>
	<200609091036.k89AaRXA016086@tiffany.internal.tigress.co.uk>
	<20060909132154.GB24906@thunk.org>
Message-ID: <200609122010.k8CKAvdN018778@tiffany.internal.tigress.co.uk>

Theodore Tso <tytso at mit.edu> wrote:
>1) File gets deleted; the file contents get zero'ed out via the
>buffer cache.  Since the process of zeroing the files happens in the
>background, for a large file, this could continue for a long time...
>
>2) In the meantime, one or more of the disk blocks that were used by
>the old file are reallocated for a new file.  The application writes
>data to the new file, which is stored in the page cache.
>
>3) The application calls fsync() and the contents of the new file are
>flushed from the page cache and written to disk.
>
>4) The dirty buffers containing the zero'ed out contents of the block
>are written to disk, overwriting the contents of the new file.
>
>5) Data is lost.

I'm having no luck in generating any data loss with this sequence of
events.  Any suggestions as to how it might be possible to force it
to happen?

Ron


From jayjitkumar.lobhe at patni.com  Fri Sep 15 02:34:06 2006
From: jayjitkumar.lobhe at patni.com (Jayjitkumar Lobhe)
Date: Thu, 14 Sep 2006 22:34:06 -0400 (EDT)
Subject: Root filesystem on ext2
Message-ID: <47164.208.250.32.6.1158287646.squirrel@192.168.175.202>

Dear All,

I have the following query:

- My initrd image is created using ext2 filesystem.
- The filesystem type of / is specified as ext3 in /etc/fstab file.
- I don't mount the real root during execution of linuxrc because I
referred to some documents saying that if you don't mount the real root
from linuxrc, the kernel will mount it after linuxrc is finished.
- The system boots up successfully; the mount command shows the / partition
mounted as ext3 but /proc/mounts shows it as ext2.

Is this because in the kernel ext3 is built as a module? Or
Is this because my image is created using ext2 filesystem?
When is the exact point when kernel mounts real root?(This seems not to be
fitting in this mailing list.)

Thanks in advance. It will be a great help if my queries get answered.

Regards,
Jayjit


From samuel at bcgreen.com  Sun Sep 24 17:55:10 2006
From: samuel at bcgreen.com (Stephen Samuel)
Date: Sun, 24 Sep 2006 10:55:10 -0700
Subject: Retaining undelete data on ext3
In-Reply-To: <S1751239AbWIXR36/20060924172958Z+453@vger.kernel.org>
References: <S1751239AbWIXR36/20060924172958Z+453@vger.kernel.org>
Message-ID: <4516C67E.10609@bcgreen.com>

Having just spent a day trying to recover a deleted ext3 file
for a friend, I'm wondering about this way of maintaining
undelete information in ext3, like is done for ext2:

The last step in the deletion process would be to put back
the (previously zeroed) block pointers.  Since it gets logged
to the journal, I _think_ that this should be safe.  The worst
that would happen is that, if the plug gets pulled in the
middle of a file delete, the old block pointers would be
unavailable --  I don't see this as a killer issue, since
editing the filesystem to do an undelete should be considered an
emergency operation anyways.


From keld at dkuug.dk  Sun Sep 24 19:00:00 2006
From: keld at dkuug.dk (Keld =?iso-8859-1?Q?J=F8rn?= Simonsen)
Date: Sun, 24 Sep 2006 21:00:00 +0200
Subject: Retaining undelete data on ext3
In-Reply-To: <4516C67E.10609@bcgreen.com>
References: <S1751239AbWIXR36/20060924172958Z+453@vger.kernel.org>
	<4516C67E.10609@bcgreen.com>
Message-ID: <20060924190000.GB4263@rap.rap.dk>

On Sun, Sep 24, 2006 at 10:55:10AM -0700, Stephen Samuel wrote:
> Having just spent a day trying to recover a deleted ext3 file
> for a friend, I'm wondering about this way of maintaining
> undelete information in ext3, like is done for ext2:
> 
> The last step in the deletion process would be to put back
> the (previously zeroed) block pointers.  Since it gets logged
> to the journal, I _think_ that this should be safe.  The worst
> that would happen is that, if the plug gets pulled in the
> middle of a file delete, the old block pointers would be
> unavailable --  I don't see this as a killer issue, since
> editing the filesystem to do an undelete should be considered an
> emergency operation anyways.

I have a design to improve ext3 so that one could salvage all files,
even if you accidentally reformatted the partition, available at 
http://std.dkuug.dk/keld/lazy3.txt
This design has been reviewed by Ted.

I also have some patches for debugfs to undelete files in ext3,
available at http://std.dkuug.dk/keld/readme-salvage.html

best regards
keld


From tytso at mit.edu  Sun Sep 24 19:53:19 2006
From: tytso at mit.edu (Theodore Tso)
Date: Sun, 24 Sep 2006 15:53:19 -0400
Subject: Retaining undelete data on ext3
In-Reply-To: <4516C67E.10609@bcgreen.com>
References: <S1751239AbWIXR36/20060924172958Z+453@vger.kernel.org>
	<4516C67E.10609@bcgreen.com>
Message-ID: <20060924195319.GC11083@thunk.org>

On Sun, Sep 24, 2006 at 10:55:10AM -0700, Stephen Samuel wrote:
> Having just spent a day trying to recover a deleted ext3 file
> for a friend, I'm wondering about this way of maintaining
> undelete information in ext3, like is done for ext2:
> 
> The last step in the deletion process would be to put back
> the (previously zeroed) block pointers.  Since it gets logged
> to the journal, I _think_ that this should be safe.  The worst
> that would happen is that, if the plug gets pulled in the
> middle of a file delete, the old block pointers would be
> unavailable --  I don't see this as a killer issue, since
> editing the filesystem to do an undelete should be considered an
> emergency operation anyways.

Yep, that's what would have to be done.  The other caveat is that
storing all of the previously zeroed block pointers temporarily in
memory could take quite a bit of memory, especially if what is being
deleted is really big.  Consider that if a DVD iso image file is being
deleted, between 4 and 5 megabytes of non-swappable (and on x86, it
would have to be lowmem/ZONE_NORMAL) kernel memory would be required!
Of course, storing the information as a series of extents would be an
obvious optimization, which would work on all but a very badly
fragmented file (for example, if said DVD .iso image was created when
the filesystem was close to 100% full).  
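
For a back-of-the-envelope check of that figure, here is a small sketch.
It assumes 4 KiB blocks, 4-byte block pointers and a 4.7 GB image; none of
those numbers are stated above, and indirect blocks themselves are
ignored. The second function shows the extent idea on a made-up block
list.

```python
def pointer_bytes(file_size, block_size=4096, pointer_size=4):
    """Bytes needed to hold one block pointer per data block."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * pointer_size


def to_extents(block_numbers):
    """Collapse an ordered block list into (start, length) runs."""
    extents = []
    for b in block_numbers:
        if extents and b == extents[-1][0] + extents[-1][1]:
            # Block continues the current run: extend it by one.
            extents[-1] = (extents[-1][0], extents[-1][1] + 1)
        else:
            extents.append((b, 1))
    return extents


dvd = 4_700_000_000  # assumed single-layer DVD image size in bytes
print(pointer_bytes(dvd) / 2**20)  # size of the pointer list in MiB
print(to_extents([100, 101, 102, 500, 501]))
```

A mostly contiguous file collapses to a handful of extents, while a badly
fragmented one degenerates to one extent per block, which is the bad case
mentioned above.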

There are some other ways it could be done that would be more optimized,
but the bottom line is that the main reason why it hasn't been done is
because the people who could do it haven't had the time to implement
it.  We've been working on other features that are higher priority,
either for ourselves or for our employers.

Regards,

						- Ted


From tytso at mit.edu  Sun Sep 24 20:45:13 2006
From: tytso at mit.edu (Theodore Tso)
Date: Sun, 24 Sep 2006 16:45:13 -0400
Subject: Retaining undelete data on ext3
In-Reply-To: <20060924190000.GB4263@rap.rap.dk>
References: <S1751239AbWIXR36/20060924172958Z+453@vger.kernel.org>
	<4516C67E.10609@bcgreen.com> <20060924190000.GB4263@rap.rap.dk>
Message-ID: <20060924204512.GA25658@thunk.org>

On Sun, Sep 24, 2006 at 09:00:00PM +0200, Keld J?rn Simonsen wrote:
> I have a design to improve ext3 so that one could salvage all files,
> even if you accidentally reformatted the partition, available at 
> http://std.dkuug.dk/keld/lazy3.txt
> This design has been reviewed by Ted.

To be fair, reviewed != to "approve of all aspects of the design".  We
exchanged e-mails for a while on the subject, yes.  Note that the
design has a number of holes in it --- for example, simply saying,
"don't blank the inode when deleting it" is not so trivial if you also
want to maintain ext3's consistency guarantees.  So when the design
says things like "My idea is to not clear the inodes, when they are
marked as free", that's roughly equivalent to saying, "My idea is to
purify Uranium by using some really big centrifuges".  It is both
simultaneously true and not useful.  The hard part is all in the
engineering.  :-)

> I also have some patches for debugfs to undelete files in ext3,
> available at http://std.dkuug.dk/keld/readme-salvage.html

This should probably be turned into its own standalone program, since
it's far more than the scope of debugfs is intended to be.  So I don't
intend to merge them into debugfs.

Regards,

						- Ted


From adilger at clusterfs.com  Mon Sep 25 15:48:18 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Mon, 25 Sep 2006 09:48:18 -0600
Subject: Retaining undelete data on ext3
In-Reply-To: <4516C67E.10609@bcgreen.com>
References: <S1751239AbWIXR36/20060924172958Z+453@vger.kernel.org>
	<4516C67E.10609@bcgreen.com>
Message-ID: <20060925154818.GC22010@schatzie.adilger.int>

On Sep 24, 2006  10:55 -0700, Stephen Samuel wrote:
> Having just spent a day trying to recover a deleted ext3 file
> for a friend, I'm wondering about this way of maintaining
> undelete information in ext3, like is done for ext2:
> 
> The last step in the deletion process would be to put back
> the (previously zeroed) block pointers.  Since it gets logged
> to the journal, I _think_ that this should be safe.  The worst
> that would happen is that, if the plug gets pulled in the
> middle of a file delete, the old block pointers would be
> unavailable --  I don't see this as a killer issue, since
> editing the filesystem to do an undelete should be considered an
> emergency operation anyways.

I've written a couple of times about the best way to do this, while improving
unlink/truncate performance at the same time (see last sentence):

        "It would be possible to walk the inode and precompute the number
        of bitmaps and group descriptors that would be modified by the
        operation and try to start a single transaction of that size.  If
        this transaction can be started (true in most cases), then we are no
        longer required to zero out all of the [dt]indirect blocks (as we
        do not have to worry about restarting the operation) and we only
        have to update the block bitmaps and their group summaries, reducing
        the amount of IO considerably for block-mapped files.  Also, the
        walking of the file metadata blocks can be done in forward order
        and also asynchronous readahead can be started for indirect blocks
        to make more efficient use of the disk.  As an added benefit we
        would regain the ability to undelete files in ext3 because we no
        longer have to zero out all of the metadata blocks."

The only issue is that nobody has worked on implementing this yet, and I
don't have time.
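
A minimal sketch of the "precompute" step, to show the kind of counting
involved. It assumes 4 KiB blocks, 32768 blocks per group and 32-byte
group descriptors, and it takes the file's block list as given rather than
walking real indirect blocks. The function names are illustrative only,
and a real transaction would also need credits for the inode and other
metadata.

```python
def groups_touched(block_numbers, blocks_per_group=32768,
                   first_data_block=0):
    """Return the set of block groups that own the given blocks."""
    return {(b - first_data_block) // blocks_per_group
            for b in block_numbers}


def metadata_blocks_to_journal(block_numbers, block_size=4096,
                               desc_size=32, blocks_per_group=32768):
    """Count bitmap and descriptor blocks a free of these blocks dirties."""
    groups = groups_touched(block_numbers, blocks_per_group)
    descs_per_block = block_size // desc_size
    # One block bitmap per group, plus each descriptor block that
    # holds at least one affected group descriptor.
    desc_blocks = {g // descs_per_block for g in groups}
    return len(groups) + len(desc_blocks)


blocks = list(range(40000, 40010)) + [200000]
print(sorted(groups_touched(blocks)), metadata_blocks_to_journal(blocks))
```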

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From samuel at bcgreen.com  Mon Sep 25 22:23:32 2006
From: samuel at bcgreen.com (Stephen Samuel)
Date: Mon, 25 Sep 2006 15:23:32 -0700
Subject: Retaining undelete data on ext3
In-Reply-To: <20060924195319.GC11083@thunk.org>
References: <S1751239AbWIXR36/20060924172958Z+453@vger.kernel.org>
	<4516C67E.10609@bcgreen.com> <20060924195319.GC11083@thunk.org>
Message-ID: <451856E4.8060507@bcgreen.com>

As far as I can tell, the only thing that gets zeroed
is the block pointers in the inode (i.e. 12 direct pointers
and one each of single, double and triple indirects).
So, I'm presuming that all that should need to be
regenerated (and saved), above and beyond what is
already done, is the pointers in the inode itself, which
should take slightly less core than the whole inode entry.

I just did a restore of a 1.5GB tar file from ext3, and the
only information that I had to recover was the pointers
that were in the inode.

Identifying the triple indirect block (real easy) meant that I was only
missing 1MB+ of the file, and finding the double indirect (only slightly
harder) meant that I was only missing another 48K.
Hunting that last 48K (12 blocks) out of the universe of unallocated blocks
was the real bitch of the recovery process.
If I had those 12 direct block pointers, I could have probably recovered
the entire tar file in under an hour, and with the extra two pointers
(single and double indirect) my time would have been down to 15 minutes
(mostly loading software and reading directions).
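
For reference, a rough way to see how much file data hangs off each
pointer in the inode. This is a sketch that assumes 4-byte block pointers;
the block size is left as a parameter because the amounts depend on it
(48K for 12 direct blocks corresponds to 4 KiB blocks).

```python
def coverage(block_size=4096, pointer_size=4, direct=12):
    """Bytes of file data reachable through each pointer in the inode."""
    per_block = block_size // pointer_size  # pointers per indirect block
    return {
        "direct": direct * block_size,
        "single": per_block * block_size,
        "double": per_block ** 2 * block_size,
        "triple": per_block ** 3 * block_size,
    }


for level, size in coverage().items():
    print(level, size)
```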


Theodore Tso wrote:
> On Sun, Sep 24, 2006 at 10:55:10AM -0700, Stephen Samuel wrote:
>   
>> .....
>> The last step in the deletion process would be to put back
>> the (previously zeroed) block pointers.  Since it gets logged
>> to the journal, I _think_ that this should be safe.  The worst
>>     
>
> Yep, that's what would have to be done.  The other caveat is that
> storing all of the previously zeroed block pointers temporarily in
> memory could take quite a bit of memory, especially if what is being
> deleted is really big.  Consider that if a DVD iso image file is being
>   


From guolin at alexa.com  Tue Sep 26 01:27:08 2006
From: guolin at alexa.com (Guolin Cheng)
Date: Tue, 26 Sep 2006 01:27:08 -0000
Subject: Strange Fedora Booting problem: can not mount "LABEL=*"
	partitions
Message-ID: <41089CB27BD8D24E8385C8003EDAF7ABBA487B@karl.alexa.com>

Hi,

 Sorry, NPTL instead of NTPL, typo. too embarrassed. :(

 --Guolin 

-----Original Message-----
From: Guolin Cheng 
Sent: Thursday, April 01, 2004 10:37 PM
To: Fedora (E-mail); Redhat Ext3 (E-mail); jgarzik at redhat.com
Subject: Strange Fedora Booting problem: can not mount "LABEL=*"
partitions


Hi, 

   I just ran into booting problems with Fedora FC1 running a vanilla
2.4.25 kernel plus the libata8 patch. FC1 complains that it cannot
automatically find partitions specified with "LABEL=" in /etc/fstab, and
then drops me into repair mode. In repair mode I can mount them manually
without any problems. More interesting: 1) I have several partitions
specified with "LABEL=*" in /etc/fstab, but the partition FC1 fails to
identify is not always the same across machines; 2) the default and
upgraded ntpl kernels boot up without problems.  My fstab is attached below:

LABEL=/                 /                       ext3    defaults        1 1
LABEL=/0                /0                      ext3    defaults        1 2
/dev/hdc1               /1                      ext3    defaults        1 2
LABEL=/alexa            /alexa                  ext3    defaults        1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
none                    /proc                   proc    defaults        0 0
none                    /dev/shm                tmpfs   defaults        0 0
LABEL=/usr              /usr                    ext3    defaults        1 2
LABEL=/var              /var                    ext3    defaults        1 2
/dev/hda7               swap                    swap    defaults        0 0
/dev/hda6               swap                    swap    defaults        0 0
/dev/hda8               swap                    swap    defaults        0 0
/dev/fd0                /mnt/floppy             auto    noauto,owner,kudzu 0 0
ops-test1.alexa.com guolin 134%

 FC1 stops on partition "LABEL=/var" on two machines, and on partition
"LABEL=/" on the 3rd machine. While the default or upgraded NTPL kernel
(with SMP problem) boots without a glitch, my vanilla 2.4.25 kernel plus
libata patch 2.4.25-libata8 fails with the symptoms described above.

 The fix is to manually run "e2fsck -y -f /dev/hd?; tune2fs -j /dev/hd?;
e2label /dev/hd? <LABEL>" again, even though there is no problem with the
file system, journal node or ext2 label, and then reboot.
  
  Since we have several hundred RH8 machines to upgrade to Fedora, we
cannot afford to fix the booting problem one machine at a time. So where
is the problem? File system utilities? The 2.4.25 kernel? Or the libata
patch?

  The machines have Fedora Core 1 with all packages upgraded:
util-linux-2.11y-29, e2fsprogs-1.34-1, 2.4.25+2.4.25-libata8.

  The system disk's partitions were originally created under Redhat 8.0.
The upgrade to FC1 is as simple as: boot the machines into an FC1 diskless
mode, create file systems on the existing /, /usr and /var partitions on
the system disk, label the 3 partitions and dump system tarballs onto
them, install the lilo bootloader onto the system disk and reboot. This
simple and efficient approach has worked great for us for years, except
this time. :(

  Any suggestions? And what's the difference between the 2.4.25-libata8
and 2.4.25-libata16 (bleeding-edge) patches?


  Thanks a lot.

  --Guolin Cheng
  

_______________________________________________
Ext3-users mailing list
Ext3-users at redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users


-- 
fedora-list mailing list
fedora-list at redhat.com
To unsubscribe: http://www.redhat.com/mailman/listinfo/fedora-list


From sdang at MIT.EDU  Tue Sep 26 18:09:33 2006
From: sdang at MIT.EDU (Sabin Dang)
Date: Tue, 26 Sep 2006 14:09:33 -0400
Subject: EXT3-fs: invalid journal inode.
Message-ID: <114B4AC1-3BB2-4E76-9628-99F945AAE231@mit.edu>

Hi Everyone,

I have a server which has a raid array on /dev/sdb. After a crash I  
tried to mount the array and it failed with:

mount: wrong fs type, bad option, bad superblock on /dev/sdb,
        missing codepage or other error

dmesg reports:
EXT3-fs: invalid journal inode.

I tried to run e2fsck and it failed (The output can be seen below).

If anyone has any suggestions on how I can restore the filesystem I  
would greatly appreciate the help.

Thanks in advance,
Sabin


*** BEGIN e2fsck output *****

root at 0[sabin]# e2fsck -b 32768 /dev/sdb
e2fsck 1.38 (30-Jun-2005)
Superblock has an invalid ext3 journal (inode 8).
Clear<y>? yes

*** ext3 journal has been deleted - filesystem is now ext2 only ***

Superblock doesn't have has_journal flag, but has ext3 journal inode.
Clear<y>? yes

/dev/sdb was not cleanly unmounted, check forced.
e2fsck: Illegal doubly indirect block found while reading bad blocks  
inode
This doesn't bode well, but we'll try to go on...
Pass 1: Checking inodes, blocks, and sizes
Bad block inode has illegal block(s).  Clear<y>? yes

Illegal block #9 (2933653514) in bad block inode.  CLEARED.
Group 32's block bitmap (1048576) is bad.  Relocate<y>? yes

Block 8 in the primary group descriptors is on the bad block list

If the block is really bad, the filesystem can not be fixed.
You can remove this block from the bad block list and hope
that the block is really OK.  But there are no guarantees.

Clear<y>? yes

Bad block inode has an indirect block (1048577) that conflicts with
filesystem metadata.  CLEARED.
Bad block inode has an indirect block (1048576) that conflicts with
filesystem metadata.  CLEARED.

The bad block inode has probably been corrupted.  You probably
should stop now and run e2fsck -c to scan for bad blocks
in the filesystem.
Continue<y>? no

e2fsck: aborted
root at 0[sabin]# e2fsck -c  /dev/sdb
e2fsck 1.38 (30-Jun-2005)
Superblock has an invalid ext3 journal (inode 8).
Clear<y>? yes

*** ext3 journal has been deleted - filesystem is now ext2 only ***

Superblock doesn't have has_journal flag, but has ext3 journal inode.
Clear<y>? yes

Inode count in superblock is 151584768, should be 152633344.
Fix<y>? yes

ext2fs_block_iterate: Ext2 file too big while sanity checking the bad  
blocks inode


From tweeks at rackspace.com  Tue Sep 26 18:39:28 2006
From: tweeks at rackspace.com (tweeks)
Date: Tue, 26 Sep 2006 13:39:28 -0500
Subject: EXT3-fs: invalid journal inode.
In-Reply-To: <114B4AC1-3BB2-4E76-9628-99F945AAE231@mit.edu>
References: <114B4AC1-3BB2-4E76-9628-99F945AAE231@mit.edu>
Message-ID: <200609261339.28244.tweeks@rackspace.com>

On Tuesday 26 September 2006 01:09 pm, Sabin Dang wrote:
> Hi Everyone,
>
> I have a server which has a raid array on /dev/sdb. After a crash I
> tried to mount the array and it failed with:
>
> mount: wrong fs type, bad option, bad superblock on /dev/sdb,
>         missing codepage or other error


Umm.. there's no superblock (or filesystem) on /dev/sdb.  Filesystems are on 
partitions.. such as /dev/sdb1 or sdb2.  Not the raw device.

> dmesg reports:
> EXT3-fs: invalid journal inode.
>
> I tried to run e2fsck and it failed (The output can be seen below).
> If anyone has any suggestions on how I can restore the filesystem I
> would greatly appreciate the help.

Ouch.. hope you didn't just hose your own filesystem.

Just to be safe... boot from CD into rescue mode.. and try mounting the 
PARTITIONS as ext2.  If problems persist, nuke the remaining journal (if any) 
by mounting it as ext2 and/or using fsck.ext2.  Once you get it clean at the 
ext2 level and can cleanly mount it.. recreate the journal with tune2fs -j:
	# tune2fs -j /dev/sdb2 (for example)

Then try rebooting and see how things go.

Tweeks


From evilninja at gmx.net  Tue Sep 26 18:45:35 2006
From: evilninja at gmx.net (Christian)
Date: Tue, 26 Sep 2006 19:45:35 +0100 (BST)
Subject: EXT3-fs: invalid journal inode.
In-Reply-To: <114B4AC1-3BB2-4E76-9628-99F945AAE231@mit.edu>
References: <114B4AC1-3BB2-4E76-9628-99F945AAE231@mit.edu>
Message-ID: <Pine.LNX.4.64.0609261939400.20590@sheep.housecafe.de>

On Tue, 26 Sep 2006, Sabin Dang wrote:
> I have a server which has a raid array on /dev/sdb.

So, is it a hardware raid or something? Are there any device-related 
messages in the syslog?

> root at 0[sabin]# e2fsck -b 32768 /dev/sdb
> e2fsck 1.38 (30-Jun-2005)

A couple of questions here:

  - Did you really mean to check sdb, not sdb1 or something?
  - Why did you have to specify an alternative superblock here?
    Did it not run without -b?
  - Any chance you could use a more current version of
    e2fsprogs (like 1.39 or so)?
  - Did you backup sdb *before* trying to fsck?
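
For context on the -b question: the value 32768 matches where the first
backup superblock is expected on a filesystem with 4 KiB blocks. A minimal
Python sketch, assuming the mke2fs default of 8 * block_size blocks per
group and the sparse_super layout (backups in group 1 and in groups that
are powers of 3, 5 and 7). The function name is hypothetical, and for a
real filesystem dumpe2fs or "mke2fs -n" remain the authoritative source:

```python
def backup_superblock_blocks(block_size, total_blocks, sparse_super=True):
    """Return the block numbers where backup superblocks are expected.

    Assumes the default group size (8 * block_size blocks per group);
    a filesystem created with a custom group size will differ.
    """
    blocks_per_group = 8 * block_size
    # With 1 KiB blocks the primary superblock occupies block 1, so every
    # group starts one block later than with larger block sizes.
    first_data_block = 1 if block_size == 1024 else 0
    usable = total_blocks - first_data_block
    group_count = -(-usable // blocks_per_group)  # ceiling division

    def has_backup(group):
        if not sparse_super or group == 1:
            return True
        for base in (3, 5, 7):
            power = base
            while power < group:
                power *= base
            if power == group:
                return True
        return False

    return [
        first_data_block + group * blocks_per_group
        for group in range(1, group_count)
        if has_backup(group)
    ]


if __name__ == "__main__":
    # Hypothetical 4 KiB-block filesystem with one million blocks.
    print(backup_superblock_blocks(4096, 1_000_000))
```

If the first backup is damaged too, the later entries in the list are the
next candidates to pass to -b.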

Christian.
-- 
BOFH excuse #110:

The rolling stones concert down the road caused a brown out


From adilger at clusterfs.com  Tue Sep 26 18:51:12 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Tue, 26 Sep 2006 12:51:12 -0600
Subject: EXT3-fs: invalid journal inode.
In-Reply-To: <Pine.LNX.4.64.0609261939400.20590@sheep.housecafe.de>
References: <114B4AC1-3BB2-4E76-9628-99F945AAE231@mit.edu>
	<Pine.LNX.4.64.0609261939400.20590@sheep.housecafe.de>
Message-ID: <20060926185112.GM22010@schatzie.adilger.int>

On Sep 26, 2006  19:45 +0100, Christian wrote:
> On Tue, 26 Sep 2006, Sabin Dang wrote:
> >I have a server which has a raid array on /dev/sdb.

How large is this array?  If > 2TB and your kernel does not have
CONFIG_LBD enabled, then you may have massive filesystem corruption.

I posted a patch to ext2-devel to detect this, and it is in the
latest (2.6.18?) kernel.
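
The 2TB figure follows from 32-bit sector numbers and 512-byte sectors.
A small Python sketch of that arithmetic; the wrap-around function is only
an illustration of how a truncated sector number could alias onto the
start of the device, not a claim about what a particular kernel or driver
does, and both function names are hypothetical:

```python
SECTOR_SIZE = 512  # bytes per sector, the traditional block-layer unit


def max_device_bytes(sector_bits):
    """Largest device size addressable with sector numbers of this width."""
    return (1 << sector_bits) * SECTOR_SIZE


def wrapped_sector(byte_offset, sector_bits=32):
    """Sector number after truncation to sector_bits bits (illustration only)."""
    sector = byte_offset // SECTOR_SIZE
    return sector % (1 << sector_bits)


if __name__ == "__main__":
    limit = max_device_bytes(32)
    print(limit, "bytes =", limit // 1024**4, "TiB")
    # An offset just past the limit lands on the same sector as offset 4096.
    print(wrapped_sector(limit + 4096), wrapped_sector(4096))
```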


Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


From evilninja at gmx.net  Tue Sep 26 20:38:22 2006
From: evilninja at gmx.net (Christian)
Date: Tue, 26 Sep 2006 21:38:22 +0100 (BST)
Subject: EXT3-fs: invalid journal inode.
In-Reply-To: <7805A353-7CE5-4335-9944-CFBD300825D7@MIT.EDU>
References: <114B4AC1-3BB2-4E76-9628-99F945AAE231@mit.edu>
	<Pine.LNX.4.64.0609261939400.20590@sheep.housecafe.de>
	<7805A353-7CE5-4335-9944-CFBD300825D7@MIT.EDU>
Message-ID: <Pine.LNX.4.64.0609262131470.20590@sheep.housecafe.de>


(please post on-list, so that all ppl can read/reply to your mail)

On Tue, 26 Sep 2006, Sabin Dang wrote:
> It is a hardware raid (3ware, RAID 5 + hot spare) and the raid reports all 
> drives are fine

...and no errors in syslog then, I suppose.
How big is the array?

> I have a backup, but unfortunately the data on the raid is needed very 
> quickly to meet a deadline. Restoring from backups is possible but is time 
> consuming (a process I've started on another system already, just hoping to 
> get things up and running quickly).

Sure. What I meant was: if you can (i.e. if you have another disk or 
array of at least equal size), dd your "bad" sdb to this other device as 
a backup before attempting to play with fsck, especially when "fsck -n" 
already tells you that the fs is severely damaged. That way you could 
restore the (still corrupt) original sdb if fsck "fixes" more than 
needed.
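
The same idea as a minimal Python sketch, for cases where dd is not at
hand. It simply copies a source device or image to a destination in
chunks. The function name, the paths and the chunk size are assumptions
for illustration; the destination must be at least as large as the
source, and unlike some rescue tools this sketch stops at the first read
error:

```python
import os


def image_device(src_path, dst_path, chunk_size=4 * 1024 * 1024):
    """Copy src_path to dst_path byte for byte; return the bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
        # Make sure the copy has reached stable storage before fsck
        # is allowed anywhere near the original.
        dst.flush()
        os.fsync(dst.fileno())
    return copied


if __name__ == "__main__":
    # Hypothetical paths: the damaged array and a spare device or file.
    print(image_device("/dev/sdb", "/mnt/spare/sdb.img"))
```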

Christian.
-- 
BOFH excuse #43:

boss forgot system password


From samnospam at bcgreen.com  Sun Sep 24 17:48:23 2006
From: samnospam at bcgreen.com (Stephen Samuel)
Date: Sun, 24 Sep 2006 10:48:23 -0700
Subject: Retaining undelete data on ext3
In-Reply-To: <S1751239AbWIXR36/20060924172958Z+453@vger.kernel.org>
References: <S1751239AbWIXR36/20060924172958Z+453@vger.kernel.org>
Message-ID: <4516C4E7.5000307@bcgreen.com>

Having just spent a day trying to recover a deleted ext3 file
for a friend, I'm wondering about this way of maintaining
undelete information in ext3, as is done for ext2:

The last step in the deletion process would be to put back
the (previously zeroed) block pointers.  Since it gets logged
to the journal, I _think_ that this should be safe.  The worst
that would happen is that, if the plug gets pulled in the
middle of a file delete, the old block pointers would be
unavailable --  I don't see this as a killer issue, since
editing the filesystem to do an undelete should be considered an
emergency operation anyways.

-- 
Stephen Samuel +1(778)861-7641             samnospam at bcgreen.com
		   http://www.bcgreen.com/
   Powerful committed communication. Transformation touching
     the jewel within each person and bringing it to light.


From samnospam at bcgreen.com  Mon Sep 25 22:22:42 2006
From: samnospam at bcgreen.com (Stephen Samuel)
Date: Mon, 25 Sep 2006 15:22:42 -0700
Subject: Retaining undelete data on ext3
In-Reply-To: <20060924195319.GC11083@thunk.org>
References: <S1751239AbWIXR36/20060924172958Z+453@vger.kernel.org>
	<4516C67E.10609@bcgreen.com> <20060924195319.GC11083@thunk.org>
Message-ID: <451856B2.3090601@bcgreen.com>

As far as I can tell, the only thing that gets zeroed
is the block pointers in the inode (i.e. 12 direct pointers
and one each of single, double and triple indirects).
So, I'm presuming that all that should need to be
regenerated (and saved), above and beyond what is
already done, is the pointers in the inode itself, which
should take slightly less core than the whole inode entry.

I just did a restore of a 1.5GB tar file from ext3, and the
only information that I had to recover was the pointers
that were in the inode.

Identifying the triple indirect block (real easy) meant that I was only
missing 1MB+ of the file, and finding the double indirect (only slightly
harder) meant that I was only missing another 48K.
Hunting that last 48K (12 blocks) out of the universe of unallocated blocks
was the real bitch of the recovery process.
If I had those 12 direct block pointers, I could have probably recovered
the entire tar file in under an hour, and with the extra two pointers
(single and double indirect) my time would have been down to 15 minutes
(mostly loading software and reading directions).
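
To put rough numbers on the pointer levels, here is a small Python sketch
of how much file data each level of an ext2/ext3 inode can map. It
assumes 4 KiB blocks (consistent with "48K (12 blocks)" above) and 4-byte
block pointers; the function names are hypothetical:

```python
def mapping_ranges(block_size=4096, pointer_size=4, direct_pointers=12):
    """Bytes of file data reachable through each pointer level."""
    per_block = block_size // pointer_size  # pointers held by one indirect block
    return {
        "direct": direct_pointers * block_size,
        "single": per_block * block_size,
        "double": per_block ** 2 * block_size,
        "triple": per_block ** 3 * block_size,
    }


def deepest_level_needed(file_size, **layout):
    """Name of the deepest pointer level a file of file_size bytes uses."""
    remaining = file_size
    for level, span in mapping_ranges(**layout).items():
        if remaining <= span:
            return level
        remaining -= span
    raise ValueError("file too large for this block layout")


if __name__ == "__main__":
    for level, span in mapping_ranges().items():
        print(f"{level:>6}: {span} bytes")
    print(deepest_level_needed(1536 * 1024**2))
```

Under these assumptions the direct pointers cover 48 KiB, the single
indirect block 4 MiB and the double indirect block 4 GiB, so a 1.5 GB
file reaches into the double indirect range but not the triple one.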


Theodore Tso wrote:
> On Sun, Sep 24, 2006 at 10:55:10AM -0700, Stephen Samuel wrote:
>   
>> .....
>> The last step in the deletion process would be to put back
>> the (previously zeroed) block pointers.  Since it gets logged
>> to the journal, I _think_ that this should be safe.  The worst
>>     
>
> Yep, that's what would have to be done.  The other caveat is that
> storing all of the previously zeroed block pointers temporarily in
> memory could take quite a bit of memory, especially if what is being
> deleted is really big.  Consider that if a DVD iso image file is being
>   

-- 
Stephen Samuel +1(778)861-7641             samnospam at bcgreen.com
		   http://www.bcgreen.com/
   Powerful committed communication. Transformation touching
     the jewel within each person and bringing it to light.


From keld at dkuug.dk  Wed Sep 27 08:57:48 2006
From: keld at dkuug.dk (Keld Jørn Simonsen)
Date: Wed, 27 Sep 2006 10:57:48 +0200
Subject: Retaining undelete data on ext3
In-Reply-To: <20060924204512.GA25658@thunk.org>
References: <S1751239AbWIXR36/20060924172958Z+453@vger.kernel.org>
	<4516C67E.10609@bcgreen.com> <20060924190000.GB4263@rap.rap.dk>
	<20060924204512.GA25658@thunk.org>
Message-ID: <20060927085748.GA13358@rap.rap.dk>

On Sun, Sep 24, 2006 at 04:45:13PM -0400, Theodore Tso wrote:
> On Sun, Sep 24, 2006 at 09:00:00PM +0200, Keld Jørn Simonsen wrote:
> > I have a design to improve ext3 so that one could salvage all files,
> > even if you accidentally reformatted the partition, available at 
> > http://std.dkuug.dk/keld/lazy3.txt
> > This design has been reviewed by Ted.
> 
> To be fair, reviewed != to "approve of all aspects of the design".  We
> exchanged e-mails for a while on the subject, yes. 

Yes, you did not approve the design, but you looked at it and found some
things that were not implementable, and I then corrected the design.

> Note that the
> design has a number of holes in it --- for example, simply saying,
> "don't blank the inode when deleting it" is not so trivial if you also
> want to maintain ext3's consistency guarantees.  So when the design
> says things like "My idea is to not clear the inodes, when they are
> marked as free", that's roughly equivalent to saying, "My idea is to
> purify Uranium by using some really big centrifuges".  It is both
> simultaneously true and not useful.  The hard part is all in the
> engineering.  :-)

Yeah, the remark "My idea is to not clear the inodes, when they are
marked as free" is meant to be a general outline of the idea, and then
the more practical aspects are outlined further in the paper.

Which guarantees are being breached with the design?

> > I also have some patches for debugfs to undelete files in ext3,
> > available at http://std.dkuug.dk/keld/readme-salvage.html
> 
> This should probably be turned into its own standalone program, since
> it's far more than the scope of debugfs is intended to be.  So I don't
> intend to merge them into debugfs.

Yes, it is probably a standalone program. I also have some ideas for 
repairing a system with I/O errors, where the inodes are intact, but my
programming is driven by problems I need to solve myself, and I don't
have a damaged fs that I need to repair at the moment.

Anyway, I find that I need a number of the capabilities of debugfs when
trying to salvage files in a damaged fs. It would be cumbersome
to switch between debugfs and a salvage program, and a waste to
implement and maintain the debugfs capabilities in a new salvage
program, so maybe it is best to have the rescue capabilities built into
debugfs anyway.

best regards
keld


From tytso at mit.edu  Wed Sep 27 14:16:21 2006
From: tytso at mit.edu (Theodore Tso)
Date: Wed, 27 Sep 2006 10:16:21 -0400
Subject: Retaining undelete data on ext3
In-Reply-To: <451856B2.3090601@bcgreen.com>
References: <S1751239AbWIXR36/20060924172958Z+453@vger.kernel.org>
	<4516C67E.10609@bcgreen.com> <20060924195319.GC11083@thunk.org>
	<451856B2.3090601@bcgreen.com>
Message-ID: <20060927141621.GB9483@thunk.org>

On Mon, Sep 25, 2006 at 03:22:42PM -0700, Stephen Samuel wrote:
> As far as I can tell, the only thing that gets zeroed
> is the block pointers in the inode (i.e. 12 direct pointers
> and one each of single, double and triple indirects).
> so, I'm presuming that all that should need to be
> regenerated (and saved), above and beyond what is
> already done, is the pointers in the inode itself, which
> should take slightly less core than the whole inode entry.

That surprises me, and I'm not sure that's always true, in particular
if the transaction touches so many block allocation bitmaps that the
unlink gets broken up into multiple transactions.  In that case I
would think the indirect blocks might have to be partially cleared so
that the on-disk image is consistent if we crash in the middle of the
unlink.  Still, I haven't crawled through the code in detail in a
while, and it's possible that we do the block_forget on indirect block
boundaries to avoid this.  

But if it's just a matter of saving and restoring the inode fields,
yes, that would be a much simpler patch.

						- Ted


From samnospam at bcgreen.com  Wed Sep 27 15:16:13 2006
From: samnospam at bcgreen.com (Stephen Samuel)
Date: Wed, 27 Sep 2006 08:16:13 -0700
Subject: Retaining undelete data on ext3
In-Reply-To: <20060927141621.GB9483@thunk.org>
References: <S1751239AbWIXR36/20060924172958Z+453@vger.kernel.org>
	<4516C67E.10609@bcgreen.com> <20060924195319.GC11083@thunk.org>
	<451856B2.3090601@bcgreen.com> <20060927141621.GB9483@thunk.org>
Message-ID: <451A95BD.3090502@bcgreen.com>

Well, all I can say is that I just reconstructed most of
a 1.5GB file, by doing nothing more than hunting down
the double indirect block, and the trick seemed to work fine.

I can try and reconstruct a larger file, and see what
happens to it, but it empirically seems like zeroing
indirect blocks must, at most, be limited to severely
fragmented large files.
Even if that's the case, being able to reconstruct
_most_ files easily seems like a nice improvement over
the current situation, and would be easy to recognize
for non-sparse files.

I made some changes to dls, and wrote a perl program
to do the trick. I'll be releasing it in a while, once I get
the time to tidy it up a bit and document the work.
(probably next week).


Theodore Tso wrote:
> On Mon, Sep 25, 2006 at 03:22:42PM -0700, Stephen Samuel wrote:
>   
> That surprises me, and I'm not sure that's always true, in particular
> if the transaction touches so many block allocation bitmaps that the
> unlink gets broken up into multiple transactions.  In that case I
> would think the indirect blocks might have to be partially cleared so
> that the on-disk image is consistent if we crash in the middle of the
> unlink.  Still, I haven't crawled through the code in detail in a
> while, and it's possible that we do the block_forget on indirect block
> boundaries to avoid this.  
>
> But if it's just a matter of saving and restoring the inode fields,
> yes, that would be a much simpler patch.
>   


-- 
Stephen Samuel +1(778)861-7641             samnospam at bcgreen.com
		   http://www.bcgreen.com/
   Powerful committed communication. Transformation touching
     the jewel within each person and bringing it to light.


From samuel at bcgreen.com  Wed Sep 27 15:43:45 2006
From: samuel at bcgreen.com (Stephen Samuel)
Date: Wed, 27 Sep 2006 08:43:45 -0700
Subject: Retaining undelete data on ext3
In-Reply-To: <20060927141621.GB9483@thunk.org>
References: <S1751239AbWIXR36/20060924172958Z+453@vger.kernel.org>
	<4516C67E.10609@bcgreen.com> <20060924195319.GC11083@thunk.org>
	<451856B2.3090601@bcgreen.com> <20060927141621.GB9483@thunk.org>
Message-ID: <451A9C31.6000703@bcgreen.com>

Well, all I can say is that I just reconstructed most of
a 1.5GB file, by doing nothing more than hunting down
the double indirect block, and the trick seemed to work fine.

I can try and reconstruct a larger file, and see what
happens to it, but it empirically seems like zeroing
indirect blocks must, at most, be limited to severely
fragmented large files.
Even if that's the case, being able to reconstruct
_most_ files easily seems like a nice improvement over
the current situation, and would be easy to recognize
for non-sparse files.

I made some changes to dls, and wrote a perl program
to do the trick. I'll be releasing it in a while, once I get
the time to tidy it up a bit and document the work.
(probably next week).


Theodore Tso wrote:
> On Mon, Sep 25, 2006 at 03:22:42PM -0700, Stephen Samuel wrote:
>   
> That surprises me, and I'm not sure that's always true, in particular
> if the transaction touches so many block allocation bitmaps that the
> unlink gets broken up into multiple transactions.  In that case I
> would think the indirect blocks might have to be partially cleared so
> that the on-disk image is consistent if we crash in the middle of the
> unlink.  Still, I haven't crawled through the code in detail in a
> while, and it's possible that we do the block_forget on indirect block
> boundaries to avoid this.  
>
> But if it's just a matter of saving and restoring the inode fields,
> yes, that would be a much simpler patch.
>