FSCK of corrupted ext3 filesystem

Fri Aug 19 12:32:29 UTC 2005

On May 23 Darryl Bond wrote:

> I have a 1.3TB ext3 filesystem that has been in service for about 3 months.
> About 6 days ago the Emulex fibrechannel controller logged a SCSI error and 
> the filesystem changed to RO.
> It appears that the filesystem instantly changes to RO and prevents the 
> journal from working, therefore invalidating the filesystem.
> The filesystem was unmounted and a remount was attempted. The mount failed due 
> to errors and an fsck came up with errors.
>
> Top output looks like this:
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
>  4562 root      25   0  780m 214m  236 R 99.9 42.6   6211:44 fsck.ext3

I'm seeing something rather similar, and not for the first time :-\

The MD layer failed a drive (on a 3ware Escalade card), but somehow the fs 
got wind of this and aborted the journal.

My fsck is running on an Opteron. It's entirely CPU-bound, occupying about
1.4G of my 2G of RAM, and has been stuck in pass 2 for six days. strace
isn't picking up any system calls.

My question is basically the same as Darryl's. How long do I give it?

(I did SIGKILL an earlier invocation as I hadn't passed the "-y" option.)

As my volume is all backup data, I'm willing to poke at it with debugfs if 
people on this list think it's worth a try. Maybe I can mark it as not 
having errors and try to mount it? Or maybe there's a way to make fsck 
less thorough?
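For anyone wondering what "marking it as not having errors" would touch: on
ext2/ext3 the error state is a bit in the superblock's s_state field, stored
next to s_errors (the on-error behaviour, e.g. remount read-only) and the
journal's needs-recovery flag in s_feature_incompat. The Python sketch below
only reads those fields and writes nothing. It assumes the standard ext2/ext3
superblock layout with the primary superblock at byte offset 1024; the
function name read_superblock_state is hypothetical. `dumpe2fs -h` reports
the same information.

```python
import struct

# Assumed standard ext2/ext3 on-disk layout (primary superblock only).
SUPERBLOCK_OFFSET = 1024                 # primary superblock starts 1024 bytes in
EXT2_SUPER_MAGIC = 0xEF53
EXT2_VALID_FS = 0x0001                   # s_state bit: cleanly unmounted
EXT2_ERROR_FS = 0x0002                   # s_state bit: errors detected
EXT3_FEATURE_INCOMPAT_RECOVER = 0x0004   # journal needs recovery


def read_superblock_state(path):
    """Hypothetical helper: read-only look at the superblock state fields."""
    with open(path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        sb = f.read(1024)
    if len(sb) < 1024:
        raise ValueError("short read: no ext2/ext3 superblock here")
    # s_magic, s_state, s_errors are consecutive 16-bit little-endian fields at offset 56.
    magic, state, errors = struct.unpack_from("<HHH", sb, 56)
    if magic != EXT2_SUPER_MAGIC:
        raise ValueError(f"bad magic 0x{magic:04x}")
    # s_feature_incompat is a 32-bit little-endian field at offset 96.
    (incompat,) = struct.unpack_from("<I", sb, 96)
    return {
        "clean": bool(state & EXT2_VALID_FS),
        "has_errors": bool(state & EXT2_ERROR_FS),
        # 1 = continue, 2 = remount read-only, 3 = panic
        "errors_behaviour": errors,
        "needs_recovery": bool(incompat & EXT3_FEATURE_INCOMPAT_RECOVER),
    }
```

Run it only against an unmounted device or an image file. Note that clearing
the error bit by hand would not repair whatever metadata damage fsck is
working through; it only changes what the superblock claims.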

I don't like the idea of not having backups for more than a week. What I 
did last time this happened was to run mke2fs and start again from 
scratch. Can I do better this time?

Matt