
Re: Very slow ext3 fsck

On Feb 22 Theodore Tso wrote:

On Thu, Feb 22, 2007 at 10:34:26AM +0000, Jeremy Sanders wrote:
We have an ext3 file system which is 3.5TB in size (on top of LVM). Of its
854473728 4096-byte blocks, 172049011 are free, and 396540654 of its 427245568
inodes are free. This is using Scientific Linux 4.4 (a RHEL clone). The filesystem
consists of multiple backups created with rsync using --link-dest, which
hard links files which haven't been modified to the previous copy. There
are several hundred days' worth of these backups.
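
As a sanity check on those figures, here is a minimal Python sketch of the arithmetic. It assumes the block size is 4096 bytes (the quoted counts only come out near 3.5TB on that reading); the function name `summarize` and the dictionary keys are made up for illustration.

```python
BLOCK_SIZE = 4096  # bytes per block; an assumption, see above


def summarize(total_blocks, free_blocks, total_inodes, free_inodes,
              block_size=BLOCK_SIZE):
    """Derive size and usage figures from the raw block and inode counts."""
    used_blocks = total_blocks - free_blocks
    used_inodes = total_inodes - free_inodes
    return {
        "total_bytes": total_blocks * block_size,
        "used_bytes": used_blocks * block_size,
        "used_block_fraction": used_blocks / total_blocks,
        "used_inodes": used_inodes,
        # Average data per allocated inode. Hard links add directory
        # entries, not inodes, so this says nothing about link counts.
        "bytes_per_used_inode": used_blocks * block_size / used_inodes,
    }


if __name__ == "__main__":
    # Counts as quoted in the message above.
    stats = summarize(854473728, 172049011, 427245568, 396540654)
    print("size: %.2f TB" % (stats["total_bytes"] / 1e12))
    print("blocks in use: %.0f%%" % (100 * stats["used_block_fraction"]))
    print("inodes in use: %d" % stats["used_inodes"])
```

So roughly 30 million inodes are allocated, but with several hundred --link-dest snapshots the number of directory entries pointing at them will be far larger, and those are what pass 2 has to walk.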

I had this exact same problem with this exact same setup (though it was FC5/x86_64 on a 1.5TB volume) just under a year ago.

I decided to fsck the file system, but unfortunately fsck is extremely slow.
It has been going now for 67 hours and appears to be completely cpu bound
(no obvious disk access) and stuck at the "Pass 2: Checking directory
structure" stage. It doesn't respond to a normal kill or ctrl+c.

I sent Ted an e2image of the fs (which admittedly was huge), but I suspect he didn't have the time or resources to see what was going on.

Did you run fsck from a command line?  It should respond to a normal
kill or ctrl-c.  If it doesn't, I have to wonder whether the device
driver is locked up for some reason.

It's definitely fsck getting confused; I also observed that it wasn't making any syscalls. We both have large numbers of files with high link counts. I found I _could_ fsck the volume in a couple of hours if it held fewer than (IIRC) about 50 days' backups, but at some point beyond that fsck would stick in pass 2 for more than a week. At that point I gave up, trashed the fs (since my fsck was necessitated by hardware failure) and started again. If you have the terabytes available, you could mount the fs, copy last night's backup to a pristine fs and fsck that instead.
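
To get a feel for how many files carry high link counts on a mounted backup tree, something like the following Python sketch could be used. It is only an illustration: the function name `link_count_histogram` is made up, and the set of seen inodes is held in memory, so on a tree with tens of millions of inodes it will itself need a lot of RAM.

```python
import os
import stat
from collections import Counter


def link_count_histogram(root):
    """Map hard-link count -> number of regular-file inodes under root.

    Each inode is counted once, however many names under root refer to it.
    """
    seen = set()
    hist = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, sockets, ...
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another name for an inode already counted
            seen.add(key)
            hist[st.st_nlink] += 1
    return hist


if __name__ == "__main__":
    import sys
    # Usage: python linkhist.py /path/to/backup/root
    for nlink, count in sorted(link_count_histogram(sys.argv[1]).items()):
        print("%6d links: %d files" % (nlink, count))
```

With one --link-dest snapshot per day, an unchanged file picks up one extra link per day, so a long tail at several hundred links would match the situation described above.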

Also, how much memory do you have?  3.5TB is pretty big, and if you
don't have enough memory, it could just simply be a matter of the
system paging its brains out.

In my case the process was 1.6GB on a machine with 2GB of RAM. No paging. e2fsck was definitely CPU-bound.
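
For anyone wanting to make the same check on a running e2fsck, here is a rough Python sketch that reads the process state and resident size from /proc/<pid>/status on Linux. The helper names are made up, and which fields appear depends on the kernel version; only `State` and `VmRSS` are used here.

```python
def parse_proc_status(text):
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def rss_bytes(fields):
    """Resident set size in bytes; the kernel reports VmRSS in units of 1024 bytes."""
    value, _unit = fields["VmRSS"].split()
    return int(value) * 1024


def read_status(pid):
    """Read and parse the status file of a live process (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_proc_status(f.read())


if __name__ == "__main__":
    import sys
    # Usage: python procstat.py <pid of e2fsck>
    fields = read_status(int(sys.argv[1]))
    print("state:", fields["State"])
    print("resident: %.2f GB" % (rss_bytes(fields) / 1e9))
```

A state that stays at "R (running)" with a resident size below physical RAM fits a CPU-bound process; a state of "D (disk sleep)" most of the time would instead point at I/O or paging.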


