Forced FSCK on Bad Reboot

Fri Jul 15 01:20:22 UTC 2005

James Wilkinson wrote:

>I wrote:
>  
>
>>Ext3 is a "journalling" filesystem (as are Reiser, xfs, jfs, and NTFS).
>>Linux keeps a journal on disk of what it's doing, and makes sure that
>>the journal reaches disk before it makes any changes to the filesystem
>>structure on disk.
>>    
>>
>
>Mike McCarty wrote:
>  
>
>>Hmm. So if a power failure occurs during the update of the journal, the disc
>>is corrupted anyway.
>>    
>>
>
>Erm... no.
>
>Imagine a journal as being a bit like a paper tape. You can read
>anywhere, but you can only meaningfully add to the end of the journal.
>So if a power failure occurs during update, and you get corruption
>there, then you know that's the end of the journal, and you can treat it
>as though the update never happened.
>  
>
This reminds me of the loooooong debates with newbies over
metastable states. They always think that adding another layer
of flip-flops to synchronize will make the problem go away.

>Look at it this way (in a fixed-width font):
>
>time --->
>        A                   B                   C
>update journal,     |  update disks   | update journal,
>open transaction    |                 | close transaction.
>
>If the power dies at all during A, and the journal is corrupt, then
>Linux knows the update can never have happened, and the filesystem
>structure itself is good.
>  
>
No, incorrect. If the power dies during A, Linux *may be able to detect
the corruption*.

Let us, for the moment, set aside the fact that corruption can mimic correct
journal entries. This is a very low-probability event, but it can
occur.
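To make the disagreement concrete, here is a minimal sketch of replaying an append-only journal whose records each carry a checksum. The record layout, the FNV-1a checksum and the names `jrecord`, `make_record` and `replay` are all hypothetical; this is not ext3's on-disk format. A record torn by a power failure fails its checksum, so replay stops there and treats it as never written, which is James's point. Mike's caveat shows up too: a 32-bit checksum only makes it unlikely (about 1 in 2^32 for random garbage) that a corrupt record passes, not impossible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical journal record; NOT ext3's real on-disk layout. */
struct jrecord {
    uint32_t block;     /* target block number */
    uint32_t value;     /* new contents (one word, to keep it short) */
    uint32_t checksum;  /* covers block and value */
};

/* 32-bit FNV-1a over the bytes of block and value. Like any fixed-width
   checksum, random garbage matches it with probability about 2^-32. */
static uint32_t rec_checksum(uint32_t block, uint32_t value)
{
    const uint32_t words[2] = { block, value };
    uint32_t h = 2166136261u;
    for (int i = 0; i < 2; i++) {
        for (int byte = 0; byte < 4; byte++) {
            h ^= (words[i] >> (8 * byte)) & 0xffu;
            h *= 16777619u;
        }
    }
    return h;
}

/* Build a record whose checksum is correct. */
struct jrecord make_record(uint32_t block, uint32_t value)
{
    struct jrecord r = { block, value, rec_checksum(block, value) };
    return r;
}

/* Apply records in order, stopping at the first one that fails its
   checksum or points outside the disk: that record and everything after
   it are treated as never written. Returns the number of records applied. */
size_t replay(const struct jrecord *jlog, size_t n,
              uint32_t *disk, size_t nblocks)
{
    assert(jlog != NULL || n == 0);
    size_t applied = 0;
    for (size_t i = 0; i < n; i++) {
        const struct jrecord *r = &jlog[i];
        if (r->checksum != rec_checksum(r->block, r->value))
            break;                  /* torn or corrupt tail */
        if (r->block >= nblocks)
            break;                  /* garbage that happened to pass */
        disk[r->block] = r->value;
        applied++;
    }
    return applied;
}
```

Tearing the last record of a three-record log (changing its value after the checksum was computed) makes `replay` apply only the first two. Note what the sketch cannot do: a scribble on a block that no journal record mentions is invisible to it.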

Let's just consider another eventuality.

No disc write is currently taking place. All system buffers have been
written to disc. The system is quiescent. All cache is coherent.

A power brown-out takes place, and the processor on the disc goes crazy
and scribbles on a few tracks; then the low-voltage monitor resets the
disc microcontroller and the processor.
Power comes back, and we start to come up.
Guess what?
The journal indicates that the disc is in good shape, no outstanding or
partially completed writes.

So, using the "indestructible" ext3 file system, with its "incorruptible"
journal, we proceed to boot.

BTW, what I just described is a known failure mode on a commonly used
disc microcontroller. If the low-voltage monitor (LVM) or Power Good line
bounces a few times within a hundred milliseconds or so, several discs
are known to scribble on the platters. Usually the scribbling takes place
near the landing zone, but it can be anywhere.

[snip]

>>It's like a COBOL programmer back in the bad old
>>days, who claimed that, since he always used databases which had
>>journals and a separate commit call, his databases could never get
>>corrupted. I argued and argued with this guy. Sadly, one day he found out
>>I was correct, and had no recovery plan.
>>    
>>
>
>Like I say: all hardware sucks, all software sucks. COBOL guy ought to
>have had at least two backup plans.
>  
>
But your argument sounds exactly like his.
He had a backup. He did weekly backups.
He didn't need them more often, because the database "couldn't" get
corrupted. And the disc wasn't going to die anytime soon.
Disc early mortality was something he was willing to lose
some data over, because these were high-quality discs
which did self-diagnostics constantly.

[snip]

>And if the power goes, Linux just replays the journal.
>  
>
Which indicates that the disc is in fine shape, even though it might
actually be a real mess.

[snip]

>  
>
>>Reading between the lines,
>>I'll guess that what you are saying is ext3 uses a lot of disc cache with
>>write-back rather than write-through policy,
>>    
>>
>
>That's normal on practically all OSes these days: it seriously helps
>performance.
>  
>
MMM? Define "performance". I don't count unnoticed data loss as performance.

>  
>
>>and journals what it has done
>>to the memory copy. Thus unwritten system buffers at power down don't
>>corrupt the disc.
>>
>>Frankly, I'd rather use write-through.
>>    
>>
>
>Possibly. It can seriously slow down disk operations. Note that ext2, at
>least by default, does *not* use write-through.
>  
>
Ok. If I had an ext2 system, I'd also use write-through.
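For what it's worth, write-through can be had at more than one level on Linux: the `sync` mount option makes all I/O on a filesystem synchronous, and a program can ask for it per file by opening with `O_SYNC` (or by calling `fsync()` after writing). A minimal sketch of the per-file route follows; `append_sync` is a hypothetical helper, not a library call. It only ensures the kernel has pushed the data to the device before returning; depending on the kernel and filesystem, a drive with its own write cache enabled may still be holding the data when the power goes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Append len bytes to path, opened with O_SYNC so that each write()
   returns only after the kernel has pushed the data to the device.
   Returns 0 on success, -1 on error with errno set. */
int append_sync(const char *path, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;                     /* handle short writes */
        len -= (size_t)n;
    }
    return close(fd);
}
```

The price is the one James mentions: every call waits on the disc, so many small writes become much slower than with write-back caching.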

>  
>
>>In any case, I don't see any argument for not using an extended fsck on
>>a reboot after improper shutdown, which was my original question.
>>    
>>
>
>It sounds to me as though you've got lots of experience with computers,
>and have an accordingly low opinion of their reliability. As a result,
>you want data-critical stuff to be relatively simple and obviously safe,
>and all possible checks to take place.
>
>It's just ... modern computers *aren't* simple. Eventually, you have to
>treat some of this stuff as black boxes, and rely on its own internal
>error checking.
>
>  
>
No. What I don't want is cross-linked files that nobody notices.
Data corruption I can handle. UNNOTICED data corruption is a whole
'nother ball of wax.
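The gap between replaying a journal and running a full check shows up even in a toy model. Journal replay only re-applies logged transactions, so a block scribbled outside any transaction, as in the brown-out scenario above, looks fine to it. A full check walks every inode and notices when two files claim the same block. The structures and names below (`toy_inode`, `full_check`, the 64-block size) are hypothetical and far simpler than anything e2fsck actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBLOCKS 64   /* toy filesystem size (hypothetical) */
#define MAXPTRS 8    /* max data blocks per toy inode */

struct toy_inode {
    size_t   nptrs;
    uint32_t ptr[MAXPTRS];   /* data block numbers owned by this file */
};

struct check_result {
    size_t cross_linked;   /* blocks claimed by more than one inode */
    size_t free_but_used;  /* blocks marked free but claimed by an inode */
    size_t bad_pointers;   /* pointers outside the filesystem */
};

/* used[] is the allocation bitmap, one byte per block (1 = in use).
   Counts every reference to every block, then compares against it. */
struct check_result full_check(const struct toy_inode *inodes, size_t ninodes,
                               const uint8_t used[NBLOCKS])
{
    unsigned refs[NBLOCKS] = {0};
    struct check_result r = {0, 0, 0};

    for (size_t i = 0; i < ninodes; i++) {
        assert(inodes[i].nptrs <= MAXPTRS);
        for (size_t j = 0; j < inodes[i].nptrs; j++) {
            uint32_t b = inodes[i].ptr[j];
            if (b >= NBLOCKS) {
                r.bad_pointers++;
                continue;
            }
            refs[b]++;
        }
    }
    for (size_t b = 0; b < NBLOCKS; b++) {
        if (refs[b] > 1)
            r.cross_linked++;
        if (refs[b] > 0 && !used[b])
            r.free_but_used++;
    }
    return r;
}
```

If a scribble changes one file's block pointer so that it points at a block another file already owns, nothing in the journal changes, but `full_check` reports one cross-linked block.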

>Yes, filesystem corruption still happens. But it's not noticeably more
>likely to happen as a result of improper shutdowns. Periodically running
>a full fsck is sensible anyway: read the tune2fs man page and look at the
>-c and -i options.
>
>A quick look at /etc/rc.d/rc.sysinit suggests that doing
>touch /forcefsck
>as root will force a full fsck at the next reboot.
>  
>
Ok. But I still haven't heard why, when we *know* we didn't shut down
cleanly, we don't do a full check by default and let the user opt for a
limited quick check instead. Sounds backwards to me.
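The policy being argued for here is easy to state as code. This is a minimal sketch, assuming a hypothetical `fs_state` struct whose fields are loosely modeled on the ext2/ext3 superblock (`s_state`, `s_mnt_count`, `s_max_mnt_count`, `s_lastcheck`, `s_checkinterval`) and a flag standing for the presence of /forcefsck. An unclean shutdown alone forces a full check, on top of the periodic limits set with tune2fs -c and -i. It illustrates the proposal; it is not what the stock boot scripts or e2fsck actually do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical; field meanings loosely follow the ext2 superblock. */
struct fs_state {
    bool     cleanly_unmounted; /* like the "valid" bit in s_state */
    uint16_t mnt_count;         /* mounts since last full check */
    int16_t  max_mnt_count;     /* tune2fs -c; <= 0 disables it here */
    time_t   lastcheck;         /* time of last full check */
    time_t   checkinterval;     /* tune2fs -i, in seconds; 0 disables */
};

enum boot_check { CHECK_NONE, CHECK_FULL };

/* Proposed policy: any unclean shutdown (or a /forcefsck marker)
   means a full check; otherwise fall back to the periodic limits. */
enum boot_check decide(const struct fs_state *s, bool forcefsck_marker,
                       time_t now)
{
    assert(s != NULL);
    if (forcefsck_marker || !s->cleanly_unmounted)
        return CHECK_FULL;
    if (s->max_mnt_count > 0 && s->mnt_count >= (uint16_t)s->max_mnt_count)
        return CHECK_FULL;
    if (s->checkinterval > 0 && now - s->lastcheck >= s->checkinterval)
        return CHECK_FULL;
    return CHECK_NONE;
}
```

The quick, journal-only recovery would then become the exception the user asks for, rather than the default.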

>You could always edit that script and /etc/init.d/halt so that
>/forcefsck is created at boot time and removed at normal shutdown.
>
>Note that edits to rc.sysinit or halt won't survive an RPM update of
>initscripts.
>
>Hope this helps,
>
>James.
>  
>

Glad to meet you.

Mike

-- 
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!