[long] major problems on fs; e2fsck running out of memory

Mon Jun 2 21:04:52 UTC 2014

* "Theodore Ts'o" <tytso at mit.edu> wrote:

Hi Theodore.

> That being said, it's pretty clear that portions of the inode table
> and block group descriptor was badly corrupted. [...]

Keith is not the first one to run into problems of this class, and he will
probably not be the last. He later told us that mounting the file system
still worked at first. Unless low-level software errors or hardware errors
were involved, that means e2fsck itself created the current situation. In
my opinion, e2fsck has one major flaw that causes this sort of trouble:

e2fsck tries to fix errors as soon as it finds them. That's bad, because at
that point it is still unclear whether the problem can be fixed safely. So
what e2fsck SHOULD do is:

1. Scan the file system for all errors, remember the errors BUT DON'T
   TOUCH ANYTHING.
2. Once all errors (including differences in allocation bitmaps) have been
   collected, it should first summarize them (e.g. 100 checksum errors,
   56000 illegal pointers, etc.) and then ask what to do.
3. Actually fix the errors one by one, using the calculated allocation
   bitmaps (instead of the ones stored in the file system). Some errors
   have to be fixed before others; clusters claimed by more than one owner
   are the first kind of error to resolve.
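To make the proposal concrete, here is a minimal Python sketch of the three
steps on a toy model, assuming a file system is nothing more than a map from
inode number to the cluster numbers it claims, plus the allocation bitmap
stored on disk. All names (`scan`, `summarize`, `fix_order`, `Problem`, the
error kinds) are hypothetical and say nothing about how e2fsck is actually
structured; the point is only that the scan is read-only and the repair
order is decided after everything has been collected.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical toy model, not e2fsck's data structures.
@dataclass(frozen=True)
class Problem:
    kind: str     # e.g. "multiply-claimed cluster", "illegal pointer"
    inode: int    # -1 when the problem is not tied to one inode
    cluster: int


# Repair order: lower numbers are fixed first (multiply-claimed clusters first).
PRIORITY = {"multiply-claimed cluster": 0, "illegal pointer": 1, "bitmap mismatch": 2}


def scan(inodes, stored_bitmap, num_clusters):
    """Step 1: collect every problem without modifying anything."""
    problems = []
    owners = {}  # cluster -> first inode seen claiming it
    for ino, clusters in sorted(inodes.items()):
        for c in clusters:
            if not 0 <= c < num_clusters:
                problems.append(Problem("illegal pointer", ino, c))
            elif c in owners:
                problems.append(Problem("multiply-claimed cluster", ino, c))
            else:
                owners[c] = ino
    # Compare the calculated allocation bitmap with the one stored on disk.
    for c in range(num_clusters):
        if (c in owners) != stored_bitmap[c]:
            problems.append(Problem("bitmap mismatch", -1, c))
    return problems


def summarize(problems):
    """Step 2: count problems by kind so the user can decide what to do."""
    return Counter(p.kind for p in problems)


def fix_order(problems):
    """Step 3 (ordering only): multiply-claimed clusters come first."""
    return sorted(problems, key=lambda p: PRIORITY[p.kind])
```

For example, with inodes `{12: [1, 99], 13: [1]}` and a stored bitmap of
`[False, False, True, False]`, the scan reports one illegal pointer, one
multiply-claimed cluster and two bitmap mismatches, and `fix_order` puts the
multiply-claimed cluster first, all without touching the input.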

This would not only allow the user to cancel at this point, before any
changes have been made to the file system; it would also allow e2fsck to
make sure that new allocations always go to clusters that are actually not
in use.
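The allocation point can be sketched the same way, again as a hypothetical
Python toy rather than e2fsck code: the set of used clusters is calculated
from what the inodes actually reference (ignoring the stored bitmap), and
replacements for multiply-claimed clusters are only taken from clusters
outside that set. Copying the data itself is left out of the sketch.

```python
def calculated_in_use(inodes, num_clusters):
    """Build the allocation bitmap from what inodes actually reference,
    ignoring the (possibly corrupted) bitmap stored on disk."""
    in_use = [False] * num_clusters
    for clusters in inodes.values():
        for c in clusters:
            if 0 <= c < num_clusters:
                in_use[c] = True
    return in_use


def resolve_multiply_claimed(inodes, num_clusters):
    """Give every inode after the first its own cluster for a shared one.

    Replacement clusters come only from the calculated bitmap, so they can
    never land on a cluster that some inode still references. Returns the
    new inode map and a list of (inode, old_cluster, new_cluster) moves.
    """
    in_use = calculated_in_use(inodes, num_clusters)
    # Each free cluster is yielded at most once, so no double allocation.
    free = (c for c in range(num_clusters) if not in_use[c])
    seen = set()
    result, moves = {}, []
    for ino, clusters in sorted(inodes.items()):
        new_list = []
        for c in clusters:
            if not 0 <= c < num_clusters:
                new_list.append(c)  # illegal pointers are left for another pass
            elif c in seen:
                new_c = next(free, None)
                if new_c is None:
                    raise RuntimeError("no free clusters left")
                moves.append((ino, c, new_c))
                new_list.append(new_c)
            else:
                seen.add(c)
                new_list.append(c)
        result[ino] = new_list
    return result, moves
```

With `{12: [0, 1], 13: [1, 2]}` and five clusters, inode 13 gets cluster 3
instead of the shared cluster 1. If the stored bitmap had wrongly marked
cluster 2 as free, an allocator trusting it could have handed out cluster 2
and created a new conflict; the calculated bitmap rules that out.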

What do you think about this?

Regards, Bodo