Very slow ext3 fsck

Matt Bernstein mb/ext3 at dcs.qmul.ac.uk
Fri Feb 23 07:21:03 UTC 2007


On Feb 22 Theodore Tso wrote:

> On Thu, Feb 22, 2007 at 10:34:26AM +0000, Jeremy Sanders wrote:
>> We have an ext3 file system which is 3.5TB in size (on top of lvm). Free are
>> 172049011 out of 854473728 4K (4096-byte) blocks, and 396540654 out of 427245568
>> inodes. This is using Scientific Linux 4.4 (a RHEL clone). The filesystem
>> consists of multiple backups created with rsync using --link-dest, which
>> hard links files which haven't been modified to the previous copy. There
>> are several hundred days worth of these backups.

I have had this exact same problem with this exact same set-up (though it 
was FC5/x86_64 on a 1.5T volume) just under a year ago.
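
For scale, here is a rough sketch of what the figures quoted above work out 
to. It assumes the block size is 4096 bytes, which is the only reading under 
which the block count agrees with the stated 3.5TB; the variable names are 
mine, not anything from e2fsprogs.

```python
# Figures copied from the quoted report; BLOCK_SIZE is an assumption
# (4096 bytes), chosen because it reproduces the stated 3.5TB size.
BLOCK_SIZE = 4096

total_blocks = 854473728
free_blocks = 172049011
total_inodes = 427245568
free_inodes = 396540654

total_bytes = total_blocks * BLOCK_SIZE
used_blocks = total_blocks - free_blocks
used_inodes = total_inodes - free_inodes

print(f"filesystem size: {total_bytes / 1e12:.2f} TB")
print(f"blocks in use:   {used_blocks} ({used_blocks / total_blocks:.1%})")
print(f"inodes in use:   {used_inodes} ({used_inodes / total_inodes:.1%})")
```

So roughly 30 million inodes are in use. With several hundred days of 
--link-dest backups, each unchanged file gets one more directory entry per 
day, so the number of directory entries will be far larger than the inode 
count.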

>> I decided to fsck the file system, but unfortunately fsck is extremely slow.
>> It has been going now for 67 hours and appears to be completely cpu bound
>> (no obvious disk access) and stuck at the "Pass 2: Checking directory
>> structure" stage. It doesn't respond to a normal kill or ctrl+c.

I sent Ted an e2image of the fs (which admittedly was huge), but I suspect 
he didn't have the time or resources to see what was going on.

> Did you run fsck out of a command-line?  It should respond to a normal
> kill or ctrl-c.  If it isn't I have to wonder whether the device
> driver is locked up for some reason.

It's definitely fsck being confused; I also observed that it wasn't making 
any syscalls. We both have large numbers of files with high link counts. I 
found I _could_ fsck the volume in a couple of hours if I had fewer than 
(IIRC) about 50 days' backups, but at some point beyond that fsck would 
stick in pass 2 for more than a week--at which point I gave up, trashed the 
fs (since my fsck was necessitated by hardware failure) and started again. 
If you have the terabytes available, you could mount the fs, punt last 
night's backup to a pristine fs and fsck that.
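
If you want to see how high the link counts actually get on a mounted copy, 
something along these lines would do it. This is only a sketch: the function 
name link_count_histogram is made up for illustration, and it keeps one set 
entry per inode in memory, so on a tree with tens of millions of inodes you 
would want to point it at a single backup directory rather than the whole 
volume.

```python
import os
import stat
from collections import Counter


def link_count_histogram(root):
    """Map link count -> number of distinct regular files under root.

    Hypothetical helper, not part of any existing tool. Each inode is
    counted once, however many directory entries point at it.
    """
    seen = set()      # (device, inode) pairs already counted
    hist = Counter()  # st_nlink -> number of distinct inodes
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, devices, sockets, etc.
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            hist[st.st_nlink] += 1
    return hist
```

Calling link_count_histogram() on one day's backup directory and printing 
the sorted items shows how many files are shared across how many days.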

> Also, how much memory do you have?  3.5TB is pretty big, and if you
> don't have enough memory, it could just simply be a matter of the
> system paging its brains out.

In my case the process was 1.6G on a 2G machine. No paging. e2fsck was 
definitely CPU-bound.

HTH

Matt



