Re: FSCK of corrupted ext3 filesystem

On May 23 Darryl Bond wrote:

I have a 1.3TB ext3 filesystem that has been in service for about 3 months.
About 6 days ago the Emulex Fibre Channel controller logged a SCSI error and the filesystem went read-only (RO).
It appears that the filesystem switched to RO instantly, which prevented the journal from working and therefore invalidated the filesystem.
The filesystem was unmounted and a remount was attempted. The mount failed due to errors and an fsck came up with errors.

Top output looks like this:

4562 root 25 0 780m 214m 236 R 99.9 42.6 6211:44 fsck.ext3
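
For scale, the TIME column in that line can be converted to days of CPU time. This is only a rough sketch: it assumes the value is in top's usual minutes:seconds layout (optionally with hundredths), and the function name top_time_to_seconds is made up for illustration.

```python
def top_time_to_seconds(field: str) -> float:
    """Convert a top TIME value such as '6211:44' or '6211:44.07' to seconds.

    Assumes the minutes:seconds layout; any other layout top might use for
    very large values is not handled here.
    """
    minutes, sep, seconds = field.partition(":")
    if not sep or not seconds:
        raise ValueError(f"expected MINUTES:SECONDS, got {field!r}")
    return int(minutes) * 60 + float(seconds)


if __name__ == "__main__":
    secs = top_time_to_seconds("6211:44")
    # 86400 seconds per day
    print(f"{secs:.0f} s of CPU time, about {secs / 86400:.1f} days")
```

Under that assumption, 6211:44 works out to roughly 4.3 days of CPU time for the fsck.ext3 process.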

I'm seeing something rather similar, and not for the first time :-\

The MD layer failed a drive (on a 3ware Escalade card), but somehow the fs got wind of this and aborted the journal.

My fsck is running on an Opteron. It's entirely CPU-bound, occupying about 1.4G of my 2G of RAM, and still stuck in pass 2 six days in. strace isn't picking up any system calls.
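
One way to cross-check the "CPU-bound with no system calls" picture is to compare user and system CPU time in /proc/<pid>/stat. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper names cpu_split and read_cpu_split are hypothetical. If nearly all the time is user time, the process is spinning in its own code rather than waiting on the kernel or the disks.

```python
import os
from typing import Tuple


def cpu_split(stat_text: str, ticks_per_second: int) -> Tuple[float, float]:
    """Return (user_seconds, system_seconds) from the text of /proc/<pid>/stat."""
    # The command name is in parentheses and may itself contain spaces or
    # parentheses, so only split what follows the last ')'.
    fields = stat_text[stat_text.rindex(")") + 1:].split()
    # fields[0] is field 3 (state), so utime (field 14) and stime (field 15)
    # are at indexes 11 and 12. Both are counted in clock ticks.
    utime, stime = int(fields[11]), int(fields[12])
    return utime / ticks_per_second, stime / ticks_per_second


def read_cpu_split(pid: int) -> Tuple[float, float]:
    """Read the live values for a running process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return cpu_split(f.read(), os.sysconf("SC_CLK_TCK"))
```

Calling read_cpu_split with the PID of the running fsck a few minutes apart would show whether only the user-time figure is growing.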

My question is basically the same as Darryl's. How long do I give it?

(I did SIGKILL an earlier invocation as I hadn't passed the "-y" option.)

As my volume is all backup data, I'm willing to poke at it with debugfs if people on this list think it's worth a try. Maybe I can mark it as not having errors, and try to mount it? Or maybe there's a way of making fsck less thorough?

I don't like the idea of not having backups for more than a week. What I did last time this happened was to run mke2fs and start again from scratch. Can I do better this time?


