forced fsck (again?)

Theodore Tso tytso at MIT.EDU
Tue Jan 22 22:52:48 UTC 2008


On Tue, Jan 22, 2008 at 02:34:35PM -0800, Valerie Henson wrote:
> This will be ironic coming from me, but I think the ext3 defaults for
> forcing a file system check are a little too conservative for many
> modern use cases.  The two cases I have in mind in particular are:

Yeah.  To the extent that people are using devicemapper/LVM
everywhere, there is a much better solution.  To wit:

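#!/bin/sh
#
# e2croncheck: fsck an LVM snapshot of a mounted filesystem in the
# background, then record the result in the origin volume's superblock.

VG=closure
VOLUME=root
SNAPSIZE=100m
EMAIL=tytso@mit.edu

TMPFILE=`mktemp -t e2fsck.log.XXXXXXXXXX`

set -e
START="$(date +'%Y%m%d%H%M%S')"
# Take a copy-on-write snapshot so the live filesystem stays mounted.
lvcreate -s -L ${SNAPSIZE} -n "${VOLUME}-snap" "${VG}/${VOLUME}"
# First pass (-p) preens the snapshot, which includes replaying its
# journal; the second (-f) forces a full check even if it looks clean.
# logsave -a appends both runs' output to $TMPFILE.
if nice logsave -as "$TMPFILE" e2fsck -p -C 0 "/dev/${VG}/${VOLUME}-snap" && \
   nice logsave -as "$TMPFILE" e2fsck -fy -C 0 "/dev/${VG}/${VOLUME}-snap" ; then
  echo 'Background scrubbing succeeded!'
  # Reset the mount count and set the last-check time to our start time.
  tune2fs -C 0 -T "${START}" "/dev/${VG}/${VOLUME}"
else
  echo 'Background scrubbing failed! Reboot to fsck soon!'
  # Push the mount count and last-check time far past the usual limits
  # so the next boot-time fsck runs a full check.
  tune2fs -C 16000 -T "19000101" "/dev/${VG}/${VOLUME}"
  if test -n "$EMAIL"; then
    mail -s "E2fsck of /dev/${VG}/${VOLUME} failed!" "$EMAIL" < "$TMPFILE"
  fi
fi
lvremove -f "${VG}/${VOLUME}-snap"
rm "$TMPFILE"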

> * Servers with long uptimes that need very low data unavailability
> times.  Imagine you have a machine room full of servers that have all
> been up and running happily for more than 180 days - the preferred
> case.

And the server should be checking the filesystem every month or so.
But with such long uptimes, it never happens.  Using LVM and the
above script solves that problem.

> * Laptops.  If suspend and resume doesn't work on your laptop, you'll
> be rebooting (and remounting) a lot, perhaps several times a day.  The
> preferred solution is to get Matthew Garrett to fix your laptop, but
> if you can't, fscking every 10-30 days seems a little excessive.

It's sad that it takes a named kernel developer to get suspend/resume
working.  But yeah, you need either Matthew, someone like Nigel from
the TuxOnIce lists, or maybe a few other people to help you.

Checking from cron is, I believe, the right answer here too, as long
as the script makes sure you're running on AC power before starting
the check.
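One way to do that AC test on Linux is to look at the sysfs power_supply class.  This is a sketch under assumptions: the helper name on_ac is hypothetical, it assumes AC adapters show up under /sys/class/power_supply with a "type" file reading "Mains" and an "online" file reading 0 or 1, and it treats a machine with no mains adapter listed at all as a desktop or server on AC.  Debian-derived systems also ship an on_ac_power command (powermgmt-base) for the same job.

```shell
#!/bin/sh
# on_ac [DIR]: succeed if the machine appears to be on mains power.
# Hypothetical helper; assumes the Linux sysfs power_supply layout,
# where each AC adapter directory has "type" (Mains) and "online" (0/1).
# DIR defaults to /sys/class/power_supply (overridable for testing).
on_ac() {
    psdir="${1:-/sys/class/power_supply}"
    found_mains=0
    for dev in "$psdir"/*; do
        # Skip anything that is not a readable power supply entry.
        [ -r "$dev/type" ] || continue
        [ "$(cat "$dev/type")" = "Mains" ] || continue
        found_mains=1
        if [ "$(cat "$dev/online" 2>/dev/null)" = "1" ]; then
            return 0
        fi
    done
    # Mains adapter(s) present but none online: on battery, fail.
    # No mains adapter listed at all: assume AC power, succeed.
    [ "$found_mains" -eq 0 ]
}
```

In e2croncheck, a line such as `on_ac || exit 0` placed before the lvcreate call would quietly skip the nightly check while the laptop is on battery.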

So, for someone who has time, I offer the following challenge.
Take the above script, and enhance it in the following ways:

	* Read a configuration file to see which filesystem(s) to
          check and to which e-mail address error reports should be sent.

	* Have the script abort the check if the system appears to be
          running off of a battery.

	* Have the config file define a time period (say, 30 days),
          and have the script test whether more than that period
          has passed since the filesystem was last checked.  If so,
          it does the check; otherwise it skips it.
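The config-file and time-period items above might be sketched as follows, comparing against the superblock's last-check time (which e2croncheck updates with tune2fs -T).  Everything here is an assumption for illustration: the config format, its variable names (VOLUMES, EMAIL, INTERVAL_DAYS), and the helper names load_config, check_due and last_check_epoch are hypothetical, and last_check_epoch relies on GNU date's -d option to parse the "Last checked:" line printed by tune2fs -l.

```shell
#!/bin/sh
# Hypothetical helpers for a config-driven e2croncheck.
#
# Example config file (plain sh, sourced by load_config):
#   VOLUMES="closure/root closure/home"   # VG/LV pairs to check
#   EMAIL=root@localhost                  # where failure reports go
#   INTERVAL_DAYS=30                      # minimum days between checks

# load_config FILE: source FILE; default INTERVAL_DAYS to 30 if unset.
load_config() {
    if [ ! -r "$1" ]; then
        echo "cannot read config file $1" >&2
        return 1
    fi
    . "$1"
    INTERVAL_DAYS=${INTERVAL_DAYS:-30}
}

# check_due LAST NOW DAYS: succeed if more than DAYS days separate the
# epoch timestamps LAST and NOW.
check_due() {
    [ $(( $2 - $1 )) -gt $(( $3 * 86400 )) ]
}

# last_check_epoch DEVICE: print the "Last checked" time reported by
# tune2fs -l as seconds since the epoch (assumes GNU date -d).
last_check_epoch() {
    date -d "$(tune2fs -l "$1" | sed -n 's/^Last checked: *//p')" +%s
}
```

A nightly wrapper could then loop over `$VOLUMES` and run the snapshot check only on volumes where `check_due "$(last_check_epoch "/dev/$vglv")" "$(date +%s)" "$INTERVAL_DAYS"` succeeds.  Since a clean background run resets the last-check time, each success restarts the interval, while a failed run (last check set to 1900) makes the volume due again the next night.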

With these enhancements, in the laptop case the script could be fired
off by cron every night at 3am, and if a month has gone by without a
check, AND the laptop is running off the AC mains, the check happens
automatically, in the background.

> I'm not sure what the best solution is - print warnings for several
> days/mounts before the force fsck? print warnings but don't force
> fsck? increase the default days/mounts before force fsck? base force
> fsck intervals on write activity? - but in practice I find myself
> telling people about "tune2fs -c 0 -i 0" a lot.  I use it on all my
> file systems and run fsck by hand every few months (or more often when
> I'm working on fsck :) ).
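To see what `tune2fs -c 0 -i 0` actually changes on a given filesystem, the relevant superblock fields can be read back from `tune2fs -l`.  This is a small sketch: the helper name fsck_schedule and the device /dev/sdXN are placeholders, and while the field labels are the ones e2fsprogs prints, the exact set of lines can vary by version (for example, "Next check after" only appears when a check interval is set).

```shell
#!/bin/sh
# fsck_schedule: read `tune2fs -l` output on stdin and keep only the
# lines that control or record forced checks. Hypothetical helper.
fsck_schedule() {
    grep -E '^(Mount count|Maximum mount count|Last checked|Check interval|Next check after):'
}

# Example (device name is a placeholder):
#   tune2fs -l /dev/sdXN | fsck_schedule
```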

Well, this isn't a complete solution, because a lot of people don't
use LVM, often because they don't trust initrds to do the right thing
--- and quite frankly, I can't blame them.  But doing this kind of
thing is so much better that maybe it would actually help convert more
kernel developers to use LVM on their boot filesystem.  (Well,
probably not.  That's probably being too optimistic.  :-)

       		     	     	      -	Ted




More information about the Ext3-users mailing list