
Re: another seriously corrupt ext3 -- pesky journal

On Aug 21, 2003  15:28 -0400, Erez Zadok wrote:
> In message <20030821190811 GC1040 matchmail com>, Mike Fedyk writes:
> > There's no need to support it in the kernel.  The inode number is kept in
> > the superblock, and that's updated at mkfs and tune2fs time, not from the
> > kernel.
> > 
> > Also, there isn't a second inode, it's just that the inode number is being
> > kept in the superblocks too.
> How does the kernel know to write the journal data first to some data block
> belonging to inode X, and then to another data block of inode Y?  Both X and
> Y are journal inodes, right?  Will there be a reserved inum other than 8,
> for the backup journal?
> Is there some magic in which the kernel can identify any number of special
> journal inodes?
> And while we're at it, why only one backup journal inode?  Why not several?
> If it's good enough to have several copies of superblocks etc., then why not
> the journal (for those willing to pay the performance penalty)?

AFAICS, two copies of the journal are not being kept.  Doing that would
require kernel changes and cause an even larger performance hit for ext3.

Instead, the journal inode number is now kept in all of the backup
superblocks (I don't think it was in the past).  In addition, there is a
new "backup journal inode" (also kept in the superblock + backups),
which I infer holds a duplicate of the list of blocks allocated to the
journal, not a second copy of the journal data.
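
To make the superblock point concrete, here is a minimal Python sketch that
pulls the journal-related fields out of a raw 1024-byte superblock copy
(primary or backup).  The offsets and flag values follow the ext2/ext3
on-disk layout as I understand it (s_magic at 0x38, s_feature_compat at
0x5C, s_feature_incompat at 0x60, s_journal_inum at 0xE0).  The function
name read_journal_fields is made up for illustration, and this is not how
e2fsprogs is structured internally.

```python
import struct

EXT2_MAGIC = 0xEF53
COMPAT_HAS_JOURNAL = 0x0004   # "has_journal" bit in s_feature_compat
INCOMPAT_RECOVER = 0x0004     # "needs_recovery" bit in s_feature_incompat


def read_journal_fields(sb: bytes) -> dict:
    """Extract journal-related fields from one superblock copy.

    `sb` is the raw 1024-byte superblock; all fields are little-endian.
    """
    if len(sb) < 1024:
        raise ValueError("superblock buffer too short")
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT2_MAGIC:
        raise ValueError("bad superblock magic")
    # s_feature_compat and s_feature_incompat are adjacent 32-bit fields.
    compat, incompat = struct.unpack_from("<II", sb, 0x5C)
    (journal_inum,) = struct.unpack_from("<I", sb, 0xE0)
    return {
        "has_journal": bool(compat & COMPAT_HAS_JOURNAL),
        "needs_recovery": bool(incompat & INCOMPAT_RECOVER),
        "journal_inum": journal_inum,
    }
```

Since the same field is present in every superblock copy, a tool can run
this over a backup superblock when the primary is unreadable, without the
kernel having to know about more than one journal inode.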

Having only the inode's i_block[] array (the block pointers) duplicated
in a backup inode means that there is no (new) overhead writing to the
journal, yet if the journal inode itself gets corrupted (very possible
because it shares the same disk block with the root inode and is right at
the beginning of the disk), we have a chance to recover the journal data.
As a result, the journal itself will very likely have backups of
recently-written blocks and can "self heal" from all sorts of nasty
corruptions.
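
A rough Python model of that fallback, purely as a sketch: given the block
pointers from the primary journal inode and from the backup copy, use the
first one that looks sane.  The names plausible_block_map and
pick_journal_block_map are hypothetical, and the sanity test (non-zero
first block, every pointer inside the filesystem) is my own assumption,
not a description of what e2fsck actually checks.

```python
from typing import List, Optional, Sequence, Tuple

N_BLOCK_SLOTS = 15  # number of slots in an ext2/ext3 inode's i_block[]


def plausible_block_map(i_block: Sequence[int],
                        first_data_block: int,
                        blocks_count: int) -> bool:
    """Hypothetical sanity check for a journal inode's block pointers."""
    if len(i_block) != N_BLOCK_SLOTS:
        return False
    if i_block[0] == 0:
        return False  # a journal must have at least a first block
    # Unused slots are 0; used slots must point inside the filesystem.
    return all(b == 0 or first_data_block <= b < blocks_count
               for b in i_block)


def pick_journal_block_map(primary: Optional[Sequence[int]],
                           backup: Optional[Sequence[int]],
                           first_data_block: int,
                           blocks_count: int
                           ) -> Optional[Tuple[str, List[int]]]:
    """Prefer the primary journal inode's map; fall back to the backup."""
    for name, candidate in (("primary", primary), ("backup", backup)):
        if candidate is not None and plausible_block_map(
                candidate, first_data_block, blocks_count):
            return name, list(candidate)
    return None  # neither copy is usable
```

Nothing here touches the write path: the backup map is only consulted at
recovery time, which is why there is no extra cost per journal write.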

What would also be needed (not sure if this is implemented or not) is that,
in the case of a corrupt superblock, e2fsck assumes "needs_recovery" is set
if "has_journal" is set and the (backup) journal inode can be read, so that
the journal replay is actually done.  That will almost always result in the
primary superblock being restored from somewhere in the journal, along with
other useful things like bitmaps and such.
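
Spelled out as a small Python sketch, the proposed rule would look roughly
like this.  The function and parameter names are hypothetical, and this is
the behaviour being suggested here, not a claim about what e2fsck does
today.

```python
def should_replay_journal(superblock_trusted: bool,
                          has_journal: bool,
                          needs_recovery: bool,
                          journal_inode_readable: bool) -> bool:
    """Decide whether to replay the journal before checking further.

    superblock_trusted     -- False if the primary superblock was corrupt
                              and a backup copy is being used instead
    has_journal            -- "has_journal" feature flag
    needs_recovery         -- "needs_recovery" feature flag as read
    journal_inode_readable -- primary or backup journal inode is usable
    """
    if not has_journal or not journal_inode_readable:
        return False  # nothing to replay, or no way to find the journal
    if superblock_trusted:
        return needs_recovery  # normal case: believe the flag
    # Corrupt primary superblock: the flag as read cannot be trusted,
    # so assume recovery is needed and replay.
    return True
```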

Cheers, Andreas
Andreas Dilger
