Ext3 fsck times

Marcelino Mata mmata at multimatic.com
Tue Sep 19 18:47:11 UTC 2006


 
(Pentium 3 1.2GHz server with Red Hat Enterprise Linux ES 3 and 1.2GB RAM)

Several months ago the PCI-X slot holding our RAID controller died.  To
be safe, I let the system perform a full fsck on all ext3 filesystems.
I was shocked at how slow the process was: it took over 12 hours (I do
not remember the exact number) on a 500GB filesystem.  If this fsck had
happened at another time (for example, if the system rebooted during
the day), I would have serious user issues.  If I understand things
correctly, a full fsck is forced every 180 days unless I change it with
tune2fs.  I am reluctant to change it since those defaults are probably
there for good reason.  In a desktop environment, I have noticed an fsck
run after reboot even when neither the tune2fs check interval nor the
mount count had been reached.  This tells me an ext3 filesystem can
become inconsistent and that the kernel developers added the periodic
fsck for good reason.
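
To make my understanding of the triggers concrete, here is a rough
Python sketch of when I believe a full check gets forced at boot.  The
function name and arguments are my own invention; the values are meant
to be the ones reported by "tune2fs -l" (filesystem state, mount count,
maximum mount count, last checked, check interval).  It is only an
approximation of what e2fsck decides, not a copy of its logic.

```python
from datetime import datetime, timedelta


def fsck_forced(clean, mount_count, max_mount_count,
                last_checked, check_interval, now):
    """Return the reasons a full check would be forced (empty list if none).

    Hypothetical helper: the arguments mirror fields shown by `tune2fs -l`.
    """
    reasons = []
    # A filesystem that was not unmounted cleanly (or is flagged with
    # errors) is checked regardless of the counters below.
    if not clean:
        reasons.append("filesystem not marked clean")
    # A maximum mount count of 0 or -1 disables the mount-count trigger.
    if max_mount_count > 0 and mount_count >= max_mount_count:
        reasons.append("maximum mount count reached")
    # A zero interval disables the time-based trigger.
    if check_interval > timedelta(0) and now - last_checked >= check_interval:
        reasons.append("check interval elapsed")
    return reasons
```

If this is right, the first branch would explain the desktop case: an
unclean shutdown forces a check even when neither counter has been
reached.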

Another thing I noticed is that the fsck time was greatly reduced when I
manually triggered it to complete a grow operation on the 500GB
filesystem.  In that case, I believe the fsck finished in under 15
minutes.  Was my longer fsck time due to the fact that the volume had
suffered a serious hardware failure?  I did not notice any additional
error messages during the fsck that followed the hardware failure.
Running fsck manually on the 500GB filesystem when it is in a "clean"
state takes only seconds, so maybe things are not that bad?

Assuming that I should plan for fsck downtime, I believe my only options
are the following:

1) Switch to a filesystem not supported by Red Hat (XFS, ReiserFS).
Would this really help?
2) Pay big dollars for Veritas FS.
3) Use tune2fs to set the check interval to 6 months and create a
weekend cron job which:
      a) shuts down NFS and Samba
      b) unmounts the filesystem
      c) performs fsck on the filesystem
      d) mounts the filesystem and starts NFS and Samba
(as I have multiple filesystems ranging from 40GB to 500GB, I could
stagger the fsck runs across different weekends)
4) Just reboot the server every 6 months and let the automatic fsck take
care of everything.
5) Mixture of (3) and (4).
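
For option (3), the cron job could look roughly like the Python sketch
below.  Everything specific in it is an assumption on my part: the
device and mount point in the usage note are made up, the init script
names "nfs" and "smb" are what I would expect on a Red Hat system, and
"mount <mount point>" relies on the filesystem being listed in
/etc/fstab.  It only prints the commands unless dry_run is turned off.

```python
import subprocess


def build_fsck_plan(device, mount_point, services=("nfs", "smb")):
    """Return the ordered commands for one filesystem (steps a to d)."""
    plan = [["service", name, "stop"] for name in services]   # a) stop NFS/Samba
    plan.append(["umount", mount_point])                      # b) unmount
    plan.append(["e2fsck", "-f", "-p", device])               # c) forced check, safe auto-repair
    plan.append(["mount", mount_point])                       # d) remount via /etc/fstab ...
    plan += [["service", name, "start"] for name in services] # ... and restart services
    return plan


def run_plan(plan, dry_run=True):
    """Print the plan, or run it and stop at the first failing step."""
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
            continue
        status = subprocess.call(cmd)
        # e2fsck exit status 1 means errors were found and corrected,
        # so it is the only non-zero status treated as success here.
        ok = status == 0 or (cmd[0] == "e2fsck" and status == 1)
        if not ok:
            raise RuntimeError("%s exited with status %d" % (" ".join(cmd), status))
```

A dry run for one hypothetical filesystem would be
run_plan(build_fsck_plan("/dev/vg0/data", "/export/data")).  Stopping at
the first failure deliberately leaves the filesystem unmounted and the
services down, which is exactly the case I would have to monitor.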

The problem with (3) is that I would have to monitor it to make sure the
fsck operation completes successfully.  I have noticed that fsck
sometimes requires a second system reboot before everything is cleared.
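
One way to do that monitoring would be to look at the exit status of
e2fsck, which according to its man page is a sum of condition flags.
The sketch below decodes that status in Python; the function names are
hypothetical, and treating anything from 2 upward as "needs a human" is
my own choice, not something the tool prescribes.

```python
# Exit status flags as documented in the e2fsck man page.
E2FSCK_FLAGS = [
    (1, "errors corrected"),
    (2, "errors corrected, system should be rebooted"),
    (4, "errors left uncorrected"),
    (8, "operational error"),
    (16, "usage or syntax error"),
    (32, "canceled by user request"),
    (128, "shared library error"),
]


def decode_fsck_status(status):
    """Translate an e2fsck exit status into a list of conditions."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in E2FSCK_FLAGS if status & bit]


def needs_attention(status):
    """True if the run needs a reboot or an administrator to look at it."""
    # 0 (clean) and 1 (errors corrected) are treated as success.
    return status >= 2
```

The "should be rebooted" flag (2) looks like the second-reboot case I
ran into, so a cron job could mail me whenever needs_attention() is
true.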

My question is how do people manage this in a production environment?  

Marcelino





