rebooting more often to stop fsck problems and total disk loss

Andreas Dilger adilger at
Mon Mar 19 21:27:19 UTC 2007

On Mar 19, 2007  17:15 -0400, ahlist wrote:
> Quite often we'll have a server that either needs a really long fsck
> (10 hours - 200 gig drive) or an fsck that eventually results in
> everything going to lost+found (pretty much a total loss).

Strange.  We get 1TB/hr fscks these days unless the filesystem is
completely corrupted and has a lot of duplicate blocks.

> Would rebooting these servers monthly (or some other frequency) stop this?

Also important: when you run an fsck, pass "-f" to actually check the
whole filesystem instead of just the superblock.  By default e2fsck
only does a full check if the kernel detected disk corruption, OR if
the "last checked" time is more than 6 months old, OR if somewhere
between 20 and 40 mounts (the exact count is randomized per
filesystem) have happened since the last check.  See tune2fs(8) for
details.
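As a sketch of the above (the image path and the specific -c/-i values
here are just illustrative choices, not anything from the original
mail), you can inspect and tighten the check schedule with tune2fs,
and force a full check with "e2fsck -f".  This uses a small throwaway
image file so it can be tried without root; on a real server you would
point the same commands at the block device:

```shell
# Create a small ext3 filesystem in a regular file (no root needed).
# -F forces mke2fs to accept a non-block-device target.
dd if=/dev/zero of=/tmp/ext3-demo.img bs=1M count=16 2>/dev/null
mke2fs -q -F -j /tmp/ext3-demo.img

# Show the check schedule: mount counts, last-checked time, interval.
tune2fs -l /tmp/ext3-demo.img | grep -Ei 'mount count|checked|interval'

# Force a full filesystem check regardless of the schedule.
e2fsck -f /tmp/ext3-demo.img

# Tighten the schedule: full check every 20 mounts or 1 month,
# whichever comes first (values chosen as an example).
tune2fs -c 20 -i 1m /tmp/ext3-demo.img
```

The same tune2fs invocation against the real device would make the
boot-time fsck run a full check on roughly the monthly schedule being
asked about here.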

> Is it correct to visualize this as small errors compounding over time
> thus more frequent reboots would allow quick fsck's to fix the errors
> before they become huge?

That is definitely true.  If the bitmaps get corrupted, then this will
spread corruption throughout the filesystem.

> (OS is redhat 7.3 and el3)

I would instead suggest updating to a newer kernel (e.g. RHEL4 2.6.9) as
this has fixed a LOT of bugs in ext3.  Also, make sure you are using the
newest e2fsck available, as some bugs have been fixed there also.
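To see which e2fsck you currently have before deciding whether to
upgrade, the version banner is enough (e2fsck prints it with -V):

```shell
# Report the installed e2fsck / e2fsprogs version.
e2fsck -V 2>&1
```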

Cheers, Andreas
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
