FSCK of corrupted ext3 filesystem
Matt Bernstein
mb/ext3 at dcs.qmul.ac.uk
Fri Aug 19 12:32:29 UTC 2005
On May 23 Darryl Bond wrote:
> I have a 1.3TB ext3 filesystem that has been in service for about 3 months.
> About 6 days ago the Emulex fibrechannel controller logged a SCSI error and
> the filesystem changed to RO.
> It appears that the filesystem instantly changes to RO and prevents the
> journal from working, therefore invalidating the filesystem.
> The filesystem was unmounted and a remount was attempted. The mount failed due
> to errors and an fsck came up with errors.
>
> Top output looks like this:
>
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>  4562 root  25   0  780m 214m  236 R 99.9 42.6  6211:44 fsck.ext3
I'm seeing something rather similar, and not for the first time :-\
The MD layer failed a drive (on a 3ware Escalade card), but somehow the fs
got wind of this and aborted the journal.
My fsck is running on an Opteron. It is entirely CPU-bound, occupying about
1.4G of my 2G RAM, and has been stuck in pass 2 for six days. strace on the
process isn't picking up any system calls.
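
For anyone wanting to cross-check the "CPU-bound, no syscalls" observation
without strace, here is a rough sketch that samples /proc/<pid>/stat twice
and compares user and system CPU ticks. It assumes a Linux /proc with the
field layout described in proc(5) (utime is field 14, stime is field 15).
The function names and the 10-second interval are made up for illustration.
A process spinning in user space should show utime climbing while stime
stays flat.

```python
import time


def parse_proc_stat(text):
    """Extract state, utime and stime from the contents of /proc/<pid>/stat."""
    # The command name (field 2) is in parentheses and may itself contain
    # spaces or parentheses, so split after the LAST ')'.
    rest = text[text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field n lives at index n - 3.
    return {
        "state": rest[0],
        "utime": int(rest[11]),  # field 14: user-mode CPU time, clock ticks
        "stime": int(rest[12]),  # field 15: kernel-mode CPU time, clock ticks
    }


def read_proc_stat(pid):
    """Read and parse /proc/<pid>/stat for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_stat(f.read())


def cpu_split(before, after):
    """Return (user ticks, system ticks, user share) between two samples."""
    du = after["utime"] - before["utime"]
    ds = after["stime"] - before["stime"]
    total = du + ds
    user_share = du / total if total else 0.0
    return du, ds, user_share


def sample(pid, interval=10.0):
    """Take two samples `interval` seconds apart and compare them."""
    before = read_proc_stat(pid)
    time.sleep(interval)
    after = read_proc_stat(pid)
    return cpu_split(before, after)
```

Calling sample() with the PID of the running fsck returns the user ticks,
system ticks, and user share over the interval. A user share close to 1.0
with almost no system ticks is consistent with what strace shows here, though
it says nothing about whether fsck is making progress or looping.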
My question is basically the same as Darryl's. How long do I give it?
(I did SIGKILL an earlier invocation as I hadn't passed the "-y" option.)
As my volume is all backup data, I'm willing to poke at it with debugfs if
people on this list think it's worth a try. Maybe I can mark it as not
having errors, and try to mount it? Or maybe there's a way of making fsck
less thorough?
I don't like the idea of not having backups for more than a week. What I
did last time this happened was to run mke2fs and start again from
scratch. Can I do better this time?
Matt