File system checking on ext3 after a system crash

Mon Apr 9 12:56:25 UTC 2007

On Mon, Apr 09, 2007 at 07:23:57AM -0400, Balu manyam wrote:
> Ted -- Thanks for your response - It was indeed very helpful.
> I realized a full fsck was forced because the FS had gone unchecked for
> more than 180 days (the default interval when the FS was created with
> mke2fs -j <blockdevice>)
> 
> So my question for you and folks in the group -- Can I safely disable this
> routine fsck behavior with tune2fs -i 0 <blockdevice>? (In doing this, I
> am assuming that e2fsck replays the journal whenever there is a system
> crash, and that manual intervention is needed only when that fails.)

Your assumption is correct.

> Are there negative implications to doing the above?

If you are 100% sure that your storage subsystem will never flip a bit
or corrupt a block, and the kernel is 100% bug-free, and you aren't
using some proprietary binary graphics driver with potential pointer
bugs that might corrupt your buffer cache --- then it is 100% safe to
disable the routine fsck.   :-)

It's basically there as a paranoia check so that even if everything is
going great, every once in a long while a full check gets done.  It's
not actually the best way to do things, of course.  For someone who
wants to be 100% sure that they don't lose any data, I recommend three steps:

1)  Backups, backups, backups!

2) If you are using LVM/DM and can create copy-on-write snapshots,
then once a month or so, at 3am on a Sunday night or some other
low-utilization time, have a cron script fire off a COW snapshot and
run e2fsck -n on that snapshot.  If e2fsck reports any errors, have
the script mail the results to the administrator, so the administrator
can schedule downtime to run an fsck to fix the error.  If no errors
are reported, the script can use "tune2fs -C 0 -T now /dev/sdXX" to
reset the number of mounts since the last filesystem check and the
last-checked time on the original filesystem (it is safe to do this
on a mounted filesystem).

3) Periodically, and at a non-peak time, use the e2image program to
save a backup copy of the filesystem metadata.  Do this *especially*
if you don't have space to do a real backup.  This will give you at
least some measure of a saving throw against a single bad disk write
(caused by malfunctioning storage hardware, or the aforementioned
buggy binary-only graphical driver written in C++ with the pointer
error) destroying a huge number of files.
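
Steps 2 and 3 could be scripted roughly as follows.  This is only a
sketch under stated assumptions, not a tested tool: the volume group
and volume names (vg0, home), the snapshot size, the mail recipient,
and the image directory are all hypothetical placeholders, and the -f
flag to e2fsck is an addition here to force a full check.

```shell
#!/bin/sh
# Sketch only.  VG, LV, SNAP, SNAP_SIZE, ADMIN and IMAGE_DIR are
# hypothetical placeholders, not names taken from the message above.
VG=${VG:-vg0}
LV=${LV:-home}
SNAP=${SNAP:-$LV-fsck}
SNAP_SIZE=${SNAP_SIZE:-1G}
ADMIN=${ADMIN:-root}
IMAGE_DIR=${IMAGE_DIR:-/var/backups}

# Step 2: check a COW snapshot instead of the live filesystem.
check_snapshot() {
    log=$(mktemp) || return 1

    if ! lvcreate -s -L "$SNAP_SIZE" -n "$SNAP" "/dev/$VG/$LV" >/dev/null; then
        rm -f "$log"
        return 1
    fi

    # -n: open read-only and answer "no" to every question.
    # -f: force a full check (an assumption added in this sketch).
    if e2fsck -fn "/dev/$VG/$SNAP" >"$log" 2>&1; then
        # Clean: reset the mount count and last-checked time on the original.
        tune2fs -C 0 -T now "/dev/$VG/$LV"
    else
        # Errors (or e2fsck could not run): mail the log to the administrator.
        mail -s "e2fsck reported problems on /dev/$VG/$LV" "$ADMIN" <"$log"
    fi

    lvremove -f "/dev/$VG/$SNAP" >/dev/null
    rm -f "$log"
}

# Step 3: save a copy of the filesystem metadata with e2image.
save_metadata() {
    e2image "/dev/$VG/$LV" "$IMAGE_DIR/$VG-$LV.e2i"
}

# Do nothing unless asked, e.g. from cron: sh fsck-snapshot.sh run
if [ "${1:-}" = "run" ]; then
    check_snapshot
    save_metadata
fi
```

The snapshot has to be large enough to absorb the writes that hit the
original volume while the check runs, so SNAP_SIZE should be chosen
with the filesystem's write load in mind.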

Regards,

						- Ted