forced fsck (again?)

Fri Jan 25 00:36:05 UTC 2008

On Jan 24, 2008  07:19 -0500, Bryan Kadzban wrote:
> Andreas Dilger wrote:
> > The problem with this is that ext2/3/4, along with most other
> > filesystems will fail to mount if passed an unknown mount option.
> 
> Uh oh.  Yeah, that's a problem.
> 
> I was under the impression that all the tools would ignore unknown
> options -- if that's not the case, then we probably need to come up with
> something else.  Automatically determining the snapshot size sounds like
> a good idea, but I'm not sure how to do it.  (I'm not sure what decides
> the snapshot size that you need -- it looks like the number of changes
> that you're going to make to the snapshot, or maybe the number of
> changes that you're going to make to both the snapshot and the real LV?

Since we aren't making any changes to the snapshot LV, it is only the
changes made to the original volume that consume space in the snapshot.

>  In either case, I'm not sure how to find that out.  Maybe just using
> "all available space in the VG" is a better idea anyway.)

I made a wild guess of 1/500 of the total volume size.  Making the snapshot
size a linear function of the volume size makes sense, because the fsck
time is generally linear with the volume size, and the amount of change
in the original volume (and hence the space needed in the snapshot) is
also a linear function of how long the fsck runs.

Having a minimum size for things like the journal, and a maximum size of
the free space in the VG definitely makes sense.
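
As a rough sketch of that sizing rule (1/500 of the origin, with a floor
and a ceiling), something like the following could work.  The function name
and the choice of KiB as the unit are assumptions for illustration only,
and the minimum is whatever the caller decides is enough for the journal.

```shell
#!/bin/sh
# snap_size_kb ORIGIN_KB MIN_KB VG_FREE_KB   (hypothetical helper)
# Prints a suggested snapshot size in KiB: 1/500 of the origin size,
# raised to MIN_KB if smaller, then capped at the free space in the VG.
snap_size_kb() {
    origin_kb=$1
    min_kb=$2
    free_kb=$3

    size_kb=$((origin_kb / 500))

    # Floor: never go below the caller-supplied minimum.
    if [ "$size_kb" -lt "$min_kb" ]; then
        size_kb=$min_kb
    fi

    # Ceiling: the VG cannot hand out more than it has free.
    if [ "$size_kb" -gt "$free_kb" ]; then
        size_kb=$free_kb
    fi

    echo "$size_kb"
}
```

The cap is applied last on purpose: if the VG has less free space than the
minimum, the result is the free space, and the caller can decide whether
that is still worth attempting.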

Another thing worth checking in the script is if there is an existing
snapshot volume (maybe left over if the script was interrupted by a crash)
and delete it before recreating the volume.  It also makes sense to have
a very clear name like "{lvname}.fsck.temporary.20080124" that can be
easily seen by the user as not very useful, and can also be deleted by
the script safely.
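
A possible shape for the naming and cleanup, as a sketch only.  The helper
names and the SNAP_TAG variable are made up here.  Matching on the name
prefix rather than the full name is my assumption, so that a leftover from
an earlier date is also caught.

```shell
#!/bin/sh
# Hypothetical helpers; nothing touches LVM until a function is called.
SNAP_TAG=fsck.temporary

# snapshot_name LVNAME [YYYYMMDD]
# Prints e.g. "home.fsck.temporary.20080124"; defaults to today's date.
snapshot_name() {
    echo "$1.$SNAP_TAG.${2:-$(date +%Y%m%d)}"
}

# is_fsck_snapshot NAME LVNAME
# Succeeds if NAME looks like a temporary fsck snapshot of LVNAME,
# regardless of the date suffix.
is_fsck_snapshot() {
    case "$1" in
        "$2.$SNAP_TAG."*) return 0 ;;
        *) return 1 ;;
    esac
}

# remove_stale_snapshots VG LVNAME
# Removes any leftover temporary snapshots of VG/LVNAME.
remove_stale_snapshots() {
    vg=$1
    lv=$2
    lvs --noheadings -o lv_name "$vg" 2>/dev/null |
    while read -r name; do
        if is_fsck_snapshot "$name" "$lv"; then
            lvremove -f "$vg/$name"
        fi
    done
}
```

The script would call remove_stale_snapshots before creating the new
snapshot under the name printed by snapshot_name.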

> True, but what about determining whether it has to run at all (based on
> the last-check time)?  Although, I suppose it would work to leave the
> check interval set in the superblock, and avoid using fsck.* -f; that
> way each fsck would be able to determine if it should do a full check or
> not.

I would just run the script from cron.weekly instead of every night.  If
we miss the check for a few days this isn't harmful, and better than
annoying users.

> Or maybe rewriting in C would work; then I could just use getmntent.
> Although I'm not exactly a fan of writing something like this in C,
> either; shell is more powerful, except for this "reading fstab" thing.

No, I'd rather have a shell script...  Less long-term maintenance.

> > But I've come to think that /etc/fstab is the wrong thing to use for 
> > input.  This script is only useful for LVM volumes, so getting a list
> > of LVs is more appropriate I think.
> 
> True, except the no-LVs behavior of lvscan, lvs, and any of the other
> tools that I was looking at yesterday is decidedly non-optimal.

What is the problem there?  My simple test showed "lvs" on a system
w/o LVM reports "No volume groups found" to stderr, and that can
easily be ignored.
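
For example, a sketch along these lines just produces an empty list when
there are no volume groups (or no LVM tools at all), since everything on
stderr is discarded.  The function names are hypothetical, and the /dev/VG/LV
path layout is an assumption about how the device nodes are set up.

```shell
#!/bin/sh
# format_lv_paths: reads "VG LV" pairs on stdin, prints /dev/VG/LV.
# Lines that do not carry both fields are skipped.
format_lv_paths() {
    while read -r vg lv; do
        if [ -n "$vg" ] && [ -n "$lv" ]; then
            echo "/dev/$vg/$lv"
        fi
    done
    return 0
}

# list_lv_devices: one device path per LV, or nothing without LVM.
# "No volume groups found" (or "command not found") goes to stderr
# and is thrown away.
list_lv_devices() {
    lvs --noheadings -o vg_name,lv_name 2>/dev/null | format_lv_paths
}
```

Note that this lists every LV, including any snapshots, so the caller would
still need to skip the temporary fsck snapshots themselves.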

> We'd still need to find the FS type, although I believe udev provides
> some programs that may be helpful (if we want to rely on them being
> installed).  volume_id, in particular, should provide that info.

If the script is going to be part of e2fsprogs, then using "blkid" is much
better, since blkid is also part of e2fsprogs and is sure to be installed.

	export TYPE="$(blkid -s TYPE -o value "$FS")"

will set an environment variable TYPE={fstype}.  Using "-o value" prints
only the value, so no stray quote characters end up in the variable.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.