File system checking on ext3 after a system crash

Mon Apr 9 20:14:51 UTC 2007

We have backups, but restoring from them is very time consuming in an
operational environment. So I thought we could run e2image every night,
and if the file system is totally shot (i.e. sometimes after e2fsck we
don't have much of a file system left), to the point where we would have
to restore from backup, we could give the e2image file a shot first and
lose only a limited amount of data.
Is that too naive?
I got the impression from your reply below that creating an image may be
too time consuming. Is that right? I'm talking about file systems of
about 500 GB that don't change very much.
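
For concreteness, the nightly job I have in mind is roughly the following
untested sketch. It just runs plain "e2image DEVICE FILE" for each file
system and then deletes old images. The image directory, the 7-day
retention and the device names are my own placeholders, not anything
from e2fsprogs.

```sh
#!/bin/sh
# Untested sketch of a nightly e2image job, meant to run from cron at an
# off-peak time. IMAGE_DIR, KEEP_DAYS and the device names passed on the
# command line are placeholders, not e2fsprogs settings.
IMAGE_DIR=${IMAGE_DIR:-/var/backups/e2image}
KEEP_DAYS=${KEEP_DAYS:-7}

# Print the image path for DEVICE on DATE, e.g. .../sda2-20070409.e2i
image_name() {
    printf '%s/%s-%s.e2i\n' "$IMAGE_DIR" "$(basename "$1")" "$2"
}

# Save the metadata of DEVICE as a normal e2image file (no -r, no -I).
snapshot() {
    e2image "$1" "$(image_name "$1" "$(date +%Y%m%d)")"
}

# Remove images older than about KEEP_DAYS days
# (find -mtime rounds ages down to whole days).
prune() {
    find "$IMAGE_DIR" -name '*.e2i' -mtime +"$KEEP_DAYS" -exec rm -f {} +
}

main() {
    mkdir -p "$IMAGE_DIR" || return 1
    status=0
    for dev in "$@"; do
        snapshot "$dev" || { echo "e2image failed for $dev" >&2; status=1; }
    done
    # Keep the old images if anything failed tonight.
    if [ "$status" -eq 0 ]; then
        prune
    fi
    return "$status"
}

# Only act when devices are given, e.g.: e2image-nightly /dev/sda2 /dev/sdb1
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```

The script name "e2image-nightly" is hypothetical. Old images are only
pruned when every snapshot succeeded, so a bad night doesn't also throw
away the last good image.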

Thanks.
-Sev

Theodore Tso wrote:
> On Mon, Apr 09, 2007 at 02:29:57PM -0400, Sev Binello wrote:
>   
>>> 3) Periodically, and at a non-peak time, use the e2image program to
>>> save a backup copy of the filesystem metadata.  Do this *especially*
>>> if you don't have space to do a real backup.  This will give you at
>>> least some measure of a saving throw against a single bad disk write
>>> (caused by malfunctioning storage hardware, or the aforementioned
>>> buggy binary-only graphical driver written in C++ with the pointer
>>> error) from destroying a huge number of files.
>>>       
>>    I noted this response with interest.
>>    I was unaware of this tool.
>>     
>
> It's been around since e2fsprogs 1.20 (May 20, 2001), but it hasn't
> gotten a lot of play outside of my "Recovering From Hard Drive
> Disasters" Usenix tutorial.  Anyone feel like writing a HOWTO
> document?  :-)
>
>   
>>    I did a quick test and it looks simple to use; are there any caveats
>> or hidden gotchas?
>>    I understand it will only restore to the state it was in when the
>> image was taken, but in a pinch that may be an alternative we could use.
>>     
>
> In general I'd recommend against using the e2image -I option.  As I've
> stated in the man page, it is rarely the right answer.  It's there
> primarily so I can do a demonstration of recovering from a mke2fs (and
> it is quite the impressive demo), but unless the e2image is very
> fresh, it is very likely that it will do more harm than good.
>
> The main use of the e2image file is that you can use it with debugfs:
>
> 	debugfs -d /dev/sda2 -i sda2.e2i
>
> Now you can use the dump and rdump commands to copy out critical files
> from the damaged filesystem.
>
>   
>>    Any idea how long it takes to create/restore?
>>     
>
> The main cost is the time to read the entire inode table from the
> filesystem and write it back out to the e2image file, so it really
> depends on the size of the filesystem.  On my
> when-I-have-time-for-a-quick-hack list, I have adding a new option to
> e2image which assumes that the filesystem bitmap blocks are
> trustworthy and will only back up the portion of the inode table which
> is actually in use.  That will almost certainly be in the next version
> of e2fsprogs, since that's a pretty simple change.
>
>   
>>    Would it make sense to run it on a daily basis?
>>     
>
> If you have sufficient amounts of off-peak time, yes!  
>
>   
>>    Also, I was wondering if you could point me to documentation
>> explaining how to respond to e2fsck's questions when it finds problems
>> in the file system.
>>     
>
> Hmm, there really isn't any.  In general the right answer is almost
> always 'yes', but sometimes I'll take a quick look at the filesystem
> using debugfs before answering yes just in case manual intervention
> could do a better job.   
>
> The big thing is that if e2fsck wants to relocate an inode table, you
> almost always want to stop and backup metadata blocks using e2image
> first.  In fact I'm thinking about revamping that logic since right
> now the potential for doing great harm to the filesystem is far too
> high.  So the fact that you might want to say 'n' there is really more
> a sign of an e2fsck bug, or at least a misdesign, than anything else.
>
> Regards,
>
> 						- Ted
>   
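
If I read the above correctly, the recovery side would then be copying
files out with debugfs rather than writing the old metadata back with
e2image -I. Something like this untested sketch, where the function
names, paths and directory list are my own placeholders and the paths
are assumed to contain no spaces:

```sh
#!/bin/sh
# Untested sketch: copy directories out of a damaged ext3 file system with
# debugfs, taking the metadata from a saved e2image file and the data
# blocks from the real device. Paths are assumed to contain no spaces.

# Print the debugfs request that copies directory DIR into DEST.
rdump_cmd() {
    printf 'rdump %s %s\n' "$1" "$2"
}

# recover DEVICE IMAGE DEST DIR...
recover() {
    dev=$1 image=$2 dest=$3
    shift 3
    mkdir -p "$dest" || return 1
    cmds=$(mktemp) || return 1
    for dir in "$@"; do
        rdump_cmd "$dir" "$dest" >> "$cmds"
    done
    # -i: the device argument is an e2image file; -d: read data blocks not
    # in the image from DEVICE; -f: run the requests listed in $cmds.
    # No -w is passed, so debugfs is not asked to open anything read-write.
    debugfs -d "$dev" -f "$cmds" -i "$image"
    status=$?
    rm -f "$cmds"
    return "$status"
}

# e.g.: recover-files /dev/sda2 /var/backups/e2image/sda2-20070409.e2i /mnt/recovered /home /data
if [ "$#" -ge 4 ]; then
    recover "$@"
fi
```

Given your warning that a stale image can do more harm than good, this
only reads through the newest image and copies files to a separate disk;
nothing is written back to the damaged file system.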

-- 

Sev Binello
Brookhaven National Laboratory
Upton, New York
631-344-5647
sev at bnl.gov