Forced FSCK on Bad Reboot

James Wilkinson fedora at westexe.demon.co.uk
Fri Jul 15 22:17:14 UTC 2005


Mike McCarty wrote lots of stuff about reliability.

Try this. Reliability is like safety (aircraft, space, rail, whatever).
You're never going to get anything that is really 100% reliable. If you
really want 99.9999% reliability, you want something like an IBM
mainframe or what I think they're now calling HP NonStop (used to be
Tandem, IIRC). PC hardware, as you observe, just isn't good enough to be
totally reliable.

I've written (and deleted) several paragraphs about utility functions,
and the trade-off transportation engineers have to make. But, basically,
Bad Things Happen. There are ways to minimise risk. Some of them are
expensive. Depending on the nature of the risk, and how likely and how
serious it is, not all ways to minimise risk are worth the expense,
especially if Bad Things Happening is rare.

This is different for different people. Some people find it worthwhile
to pay a lot extra for a car with slightly better crash protection.
Others don't. But someone has to make the choices for how Fedora comes
"out of the box".

There are always ways to make computing more reliable. There is a whole
range of sanity checks in the Linux kernel and gcc that can be turned
on. They are turned off because in normal usage they don't catch
anything, and they make things much slower.

> Ok. But I still haven't heard why after we *know* we didn't shut down, we
> don't do a full check by default, and let the user optionally do a limited
> quick check. Sounds backwards to me.

Without journalling, I regularly got into a real mess on unexpected
power downs. I seem to remember there being a greater than 50% chance
that I'd lose some files. So a full fsck was certainly worthwhile.

With journalling, corruption on crash is very rare. The exact figures
vary (and I doubt anyone collects them).
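
To put rough numbers on that trade-off, here is a minimal Python sketch
of the expected-loss arithmetic. Every figure in it is invented for
illustration (nobody has the real ones), and the function names
expected_loss and check_worthwhile are hypothetical helpers, not part of
any tool.

```python
def expected_loss(p_corruption, cost_of_loss):
    """Expected cost of one crash: probability of damage times its cost."""
    return p_corruption * cost_of_loss


def check_worthwhile(p_without_check, p_with_check, cost_of_loss, cost_of_check):
    """True if the expected loss avoided by checking exceeds the check's cost.

    All costs share one unit (here: minutes of the user's time).
    """
    saved = (expected_loss(p_without_check, cost_of_loss)
             - expected_loss(p_with_check, cost_of_loss))
    return saved > cost_of_check


# ASSUMED figures, for illustration only.
# No journal: roughly even odds of losing files, so a 10-minute check pays.
no_journal = check_worthwhile(0.5, 0.05, cost_of_loss=60, cost_of_check=10)

# Journalled: damage is rare, so the same 10-minute check does not pay.
journalled = check_worthwhile(0.001, 0.0001, cost_of_loss=60, cost_of_check=10)
```

Someone who values their data more highly would plug in a much larger
cost_of_loss, and the answer for the journalled case can flip.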

Corruption can happen anyway. So why not fsck every single boot? (That
actually sounds right for you, given your values for time and risk. [1])
I just don't think corruption is *particularly* more likely after a
system crash.

I *do* think that fscking every few boots is worthwhile anyway.
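
For what "every few boots" means in practice: ext2/ext3 keep a mount
count and a maximum mount count in the superblock (adjustable with
tune2fs -c), and a full check is forced once the count reaches the
maximum. The Python below is a simplified model of that rule, assuming
the behaviour described in the tune2fs man page; it is not e2fsck's
actual code, and needs_periodic_check is a hypothetical name.

```python
def needs_periodic_check(mount_count, max_mount_count):
    """Simplified model of the ext2/ext3 mount-count rule.

    A maximum of 0 or -1 means the mount count is disregarded;
    otherwise a check is due once the count reaches the maximum.
    """
    if max_mount_count <= 0:
        return False
    return mount_count >= max_mount_count
```

This is independent of whether the last shutdown was clean, which is the
point: the periodic check catches damage that did not come from a crash.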

James.

[1] That's not intended to be negative. Everyone has their own attitude
to the value of data and the value of time.

-- 
E-mail address: james | They say the heat and the flies here can drive a man
@westexe.demon.co.uk  | insane. But you don't have to believe that, and nor
                      | does that bright mauve elephant that just cycled
                      | past.  -- Terry Pratchett, The Last Continent



