Retaining undelete data on ext3
tytso at mit.edu
Mon Oct 9 03:12:09 UTC 2006
On Sun, Oct 08, 2006 at 11:38:22PM +0200, Bodo Thiesen wrote:
> BTW: When I talked about a transaction I
> obviously meant something different than you, on the other hand that was my
> fault. What I meant with transaction is something like an atom. Moving a
> file from directory A to directory B needs (at least) four updates, the
> inodes of the directories and the directory data blocks. I would say, that
> this update is one transaction. But you would say, that is only a part of a
> transaction, as you would put deletion of another file, writing some data
> to an iso image and whatever else in the same transaction. So, just replace
> my "transactions" by "transaction atoms", and then read again, what I
> wrote, maybe that makes my idea more clearer.
Ah, but that brings up the other problem: for a really big
file, your "transaction atom" might not fit in a single "transaction".
Remember, it's not just about keeping the inode, indirect block,
double indirect, and triple indirect blocks up to date; it's also
about all of those block allocation bitmaps; and for a big file, the
number of block bitmaps you might have to touch can grow very large
indeed. If the number of blocks that have to be touched during the
unlink is larger than the space left for the journal, then we have to
write a consistent snapshot of the inode, indirect, double indirect,
and triple indirect blocks, plus all of the block bitmaps. And if you
try to "restore" the blocks afterwards, that potentially means extra
blocks that need to be journaled in the new transaction, and getting
that all right is more than a little bit tricky.
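
To get a feel for the numbers, here is a rough back-of-the-envelope
sketch, not anything taken from the ext3 code. It assumes 4 KiB
blocks, 32-bit block pointers, the classic 12 direct pointers in the
inode, and a perfectly contiguous file so that one bitmap block covers
8 * block_size blocks. The function name and the returned fields are
made up for illustration, and group descriptors and the superblock are
ignored, so treat the result as a lower bound.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def unlink_metadata_blocks(file_size, block_size=4096):
    """Lower-bound estimate of the metadata blocks an unlink must modify."""
    ptrs = block_size // 4              # 32-bit pointers per indirect block
    blocks_per_bitmap = 8 * block_size  # one bit per block in a bitmap block

    data = ceil_div(file_size, block_size)
    rest = max(0, data - 12)            # first 12 blocks hang off the inode

    # Single indirect block maps the next `ptrs` data blocks.
    single = 1 if rest > 0 else 0
    rest = max(0, rest - ptrs)

    # Double indirect: one top block plus one leaf per `ptrs` data blocks.
    dbl_data = min(rest, ptrs * ptrs)
    double = (1 + ceil_div(dbl_data, ptrs)) if dbl_data else 0
    rest -= dbl_data

    # Triple indirect: top block, middle level, and leaves.
    if rest > ptrs ** 3:
        raise ValueError("file too large for this block-mapping scheme")
    if rest:
        leaves = ceil_div(rest, ptrs)
        triple = 1 + ceil_div(leaves, ptrs) + leaves
    else:
        triple = 0

    indirect = single + double + triple
    # Best case: the file and its indirect blocks are packed contiguously.
    bitmaps = ceil_div(data + indirect, blocks_per_bitmap)
    return {
        "data": data,
        "indirect": indirect,
        "bitmaps": bitmaps,
        "total": 1 + indirect + bitmaps,  # +1 for the inode's own block
    }


if __name__ == "__main__":
    print(unlink_metadata_blocks(1 << 30))   # a 1 GiB file
```

Under these assumptions a 1 GiB file already needs 257 indirect blocks
and at least 9 block bitmaps; a fragmented file would touch more
bitmaps than that.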
Now, the good news is that we are using bforget in journal_forget now,
and that at least some of the time, restoring the i_block pointers
will allow the inode to be recovered, although if the unlink
operation takes multiple transactions, you won't get the entire inode
recovered that way.
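
As a toy model of the multi-transaction case (this is not how ext3's
truncate code is written; the budget value and the function name are
hypothetical), the updates belonging to one unlink can be thought of
as being cut into chunks that each fit in the journal space available:

```python
def split_into_transactions(blocks_touched, budget):
    """Cut one logical update into chunks of at most `budget` blocks."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    chunks = []
    remaining = blocks_touched
    while remaining > 0:
        take = min(remaining, budget)
        chunks.append(take)
        remaining -= take
    return chunks


if __name__ == "__main__":
    # Hypothetical figures: 267 blocks to touch, room for 100 per transaction.
    print(split_into_transactions(267, 100))
```

Once there is more than one chunk, restoring the pointers saved by any
single transaction can only bring back part of the file.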
The bottom line is that the interaction of truncate and journalling
gets tricky if you want it to be 100% reliable. If you're willing to
settle for "mostly working", it's probably not that hard.