Very slow ext3 fsck
Matt Bernstein
mb/ext3 at dcs.qmul.ac.uk
Fri Feb 23 07:21:03 UTC 2007
On Feb 22 Theodore Tso wrote:
> On Thu, Feb 22, 2007 at 10:34:26AM +0000, Jeremy Sanders wrote:
>> We have an ext3 file system which is 3.5TB in size (on top of lvm). Free are
>> 172049011 out of 854473728 4K (4096-byte) blocks, and 396540654 out of 427245568
>> inodes. This is using Scientific Linux 4.4 (a RHEL clone). The filesystem
>> consists of multiple backups created with rsync using --link-dest, which
>> hard links files which haven't been modified to the previous copy. There
>> are several hundred days worth of these backups.
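A quick sanity check of the quoted figures, assuming each block is 4096 bytes (4 KiB), which is the only block size consistent with a 3.5TB volume. The numbers are copied from the message; the variable names are just for illustration. The point of interest is that the volume is about 80% full by blocks but only about 7% full by inodes: roughly 30.7 million distinct inodes, each reachable through many hard-linked names.

```python
# Figures quoted above; the block size is an assumption (4096 bytes = 4 KiB).
BLOCK_SIZE = 4096
blocks_total = 854_473_728
blocks_free = 172_049_011
inodes_total = 427_245_568
inodes_free = 396_540_654

size_bytes = blocks_total * BLOCK_SIZE      # total capacity in bytes
blocks_used = blocks_total - blocks_free
inodes_used = inodes_total - inodes_free    # distinct files/dirs actually allocated

print(f"size:   {size_bytes / 1e12:.2f} TB")
print(f"blocks: {blocks_used:,} used ({blocks_used / blocks_total:.1%})")
print(f"inodes: {inodes_used:,} used ({inodes_used / inodes_total:.1%})")
```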
I have had this exact same problem with this exact same set-up (though it
was FC5/x86_64 on a 1.5T volume) just under a year ago.
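For anyone unfamiliar with this set-up: with `rsync --link-dest`, each day's tree is a complete copy in which every unchanged file is a hard link to the previous day's copy, so a file that never changes gains one more link per backup. The sketch below is a simplified, hypothetical emulation of that behaviour (`make_snapshot` is an illustrative helper, not part of rsync); it decides "unchanged" by comparing file contents, whereas rsync's own check also looks at attributes such as size, modification time and permissions.

```python
import filecmp
import os
import shutil


def make_snapshot(src, dest, prev=None):
    """Copy the tree at `src` into `dest`, hard-linking unchanged files to `prev`.

    Rough emulation of `rsync -a --link-dest=prev src/ dest/`. Symlinks,
    ownership and deleted files are ignored to keep the sketch short.
    """
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        out_dir = os.path.join(dest, rel)
        os.makedirs(out_dir, exist_ok=True)
        for name in filenames:
            s = os.path.join(dirpath, name)
            d = os.path.join(out_dir, name)
            p = os.path.join(prev, rel, name) if prev else None
            if p and os.path.isfile(p) and filecmp.cmp(s, p, shallow=False):
                os.link(p, d)       # unchanged: add one more link to the old inode
            else:
                shutil.copy2(s, d)  # new or modified: store a fresh copy
```

Calling this once per day with `prev` set to the previous day's directory means a file that never changes ends up with a link count equal to the number of snapshots, which is how several hundred days of backups produce inodes with several hundred links each.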
>> I decided to fsck the file system, but unfortunately fsck is extremely slow.
>> It has been going now for 67 hours and appears to be completely cpu bound
>> (no obvious disk access) and stuck at the "Pass 2: Checking directory
>> structure" stage. It doesn't respond to a normal kill or ctrl+c.
I sent Ted an e2image of the fs (which admittedly was huge), but suspect
he didn't have the time or resources to see what was going on.
> Did you run fsck out of a command-line? It should respond to a normal
> kill or ctrl-c. If it isn't I have to wonder whether the device
> driver is locked up for some reason.
It's definitely fsck being confused; I also observed it wasn't making any
syscalls. We both have large numbers of files with high link counts. I
found I _could_ fsck the volume in a couple of hours if I had less than
(IIRC) about 50 days' backups, but at least at some point after that fsck
would stick in pass 2 for more than a week, at which point I gave up,
trashed the fs (since my fsck was necessitated by hardware failure) and
started again. If you have the terabytes available, you could mount the fs,
copy last night's backup to a fresh fs, and fsck that instead.
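To check whether a volume is in the same high-link-count territory, one rough approach is to walk a backup tree and tally regular files by their `st_nlink`. This is a sketch assuming Python 3 and a mounted, readable fs; `link_count_histogram` is a hypothetical helper. Note that `st_nlink` counts links anywhere on the filesystem, not only under the directory walked, and the `seen` set grows with the number of distinct inodes, so on a volume with tens of millions of inodes the script itself needs a fair amount of memory.

```python
import os
import stat
from collections import Counter


def link_count_histogram(root):
    """Tally regular files under `root` by hard-link count (st_nlink).

    Each inode is counted once, keyed by (st_dev, st_ino), so a file that is
    hard-linked into many backup trees is not counted repeatedly.
    """
    seen = set()
    hist = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue             # vanished or unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue             # only count regular files
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue             # another name for an inode already counted
            seen.add(key)
            hist[st.st_nlink] += 1
    return hist
```

For example, `sorted(link_count_histogram("/backups").items())` (path hypothetical) lists pairs of (link count, number of inodes), which makes it easy to see how many inodes carry hundreds of links.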
> Also, how much memory do you have? 3.5TB is pretty big, and if you
> don't have enough memory, it could just simply be a matter of the
> system paging its brains out.
In my case the e2fsck process used 1.6G of memory on a 2G machine. No paging;
e2fsck was definitely CPU-bound.
HTH
Matt
More information about the Ext3-users mailing list