e2fsck hanging

Brian Davidson bdavids1 at gmu.edu
Mon Mar 12 21:40:48 UTC 2007


I'm trying to run e2fsck on a ~6TB filesystem which is about 90%
full.  We use this filesystem for disk-to-disk backups, so it holds a
number of hard links (link counts up to 90).

strace shows:

write(1, "Pass 2: Checking ", 17)       = 17
write(1, "directory", 9)                = 9
write(1, " structure\n", 11)            = 11
mmap(NULL, 91574272, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b4299dbd000
mmap(NULL, 91574272, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b429f512000
mmap(NULL, 506724352, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b42a4c67000
mmap(NULL, 596029440, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0x23e56000)                         = 0x5eb000
mmap(NULL, 596164608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x2b430a09e000
munmap(0x2b430a09e000, 401408)          = 0
munmap(0x2b430a200000, 647168)          = 0
mprotect(0x2b430a100000, 135168, PROT_READ|PROT_WRITE) = 0
mmap(NULL, 596029440, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
lseek(3, 6303744, SEEK_SET)             = 6303744
read(3, "\2\0\0\0\f\0\1\2.\0\0\0\2\0\0\0\f\0\2\2..\0\0\v\0\0\0\24"..., 4096) = 4096
lseek(3, 6307840, SEEK_SET)             = 6307840
read(3, "\v\0\0\0\f\0\1\2.\0\0\0\2\0\0\0\364\17\2\2..\0\0\0\0\0"..., 4096) = 4096
lseek(3, 6311936, SEEK_SET)             = 6311936
read(3, "\0\0\0\0\0\20\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(3, 6316032, SEEK_SET)             = 6316032
read(3, "\0\0\0\0\0\20\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(3, 6320128, SEEK_SET)             = 6320128
read(3, "\0\0\0\0\0\20\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(3, 41709568, SEEK_SET)            = 41709568
read(3, "\323\0\0\0\f\0\1\2.\0\0\0\226\2\252+\f\0\2\2..\0\0\324"..., 4096) = 4096
lseek(3, 41713664, SEEK_SET)            = 41713664
read(3, "\324\0\0\0\f\0\1\2.\0\0\0\323\0\0\0\f\0\2\2..\0\0\214\300"..., 4096) = 4096
lseek(3, 41717760, SEEK_SET)            = 41717760
read(3, "\325\0\0\0\f\0\1\2.\0\0\0\226\2\252+\f\0\2\2..\0\0\326"..., 4096) = 4096

And, that's it.  No more output.
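
For anyone reading the trace: the read() buffers look like raw 4096-byte directory blocks, and their leading bytes decode as ext2/ext3 directory entries (inode, record length, name length, file type, then the name). The following is only a minimal sketch of that decoding, assuming the usual little-endian on-disk layout with the filetype feature enabled. The names dirent_hdr, decode_dirent and first_block are made up for illustration and are not taken from e2fsprogs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header of an on-disk ext2/ext3 directory entry (filetype feature on).
 * Hypothetical struct name; multi-byte fields are little-endian on disk. */
struct dirent_hdr {
    uint32_t inode;     /* inode number; 0 marks an unused entry */
    uint16_t rec_len;   /* distance in bytes to the next entry */
    uint8_t  name_len;  /* length of the name following the header */
    uint8_t  file_type; /* 2 means directory */
};

/* Decode the 8-byte entry header at buf[off].
 * Returns 0 on success, -1 if the header would run past the buffer. */
static int decode_dirent(const unsigned char *buf, size_t len, size_t off,
                         struct dirent_hdr *out)
{
    assert(buf != NULL && out != NULL);
    if (off > len || len - off < 8)
        return -1;
    const unsigned char *p = buf + off;
    out->inode = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    out->rec_len = (uint16_t)(p[4] | (p[5] << 8));
    out->name_len = p[6];
    out->file_type = p[7];
    return 0;
}

/* First 24 bytes of the block read at offset 6303744 in the trace:
 * "\2\0\0\0\f\0\1\2.\0\0\0" followed by "\2\0\0\0\f\0\2\2..\0\0". */
static const unsigned char first_block[] = {
    2, 0, 0, 0, 12, 0, 1, 2, '.', 0, 0, 0,
    2, 0, 0, 0, 12, 0, 2, 2, '.', '.', 0, 0,
};
```

Decoding first_block at offsets 0 and 12 gives "." and "..", both pointing at inode 2 with a record length of 12 and file type 2 (directory), which is what the root directory would look like. So e2fsck was still reading ordinary directory blocks just before the output stopped.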

A backtrace from gdb shows:

(gdb) bt
#0  0x0000000000418aa5 in get_icount_el (icount=0x5cf170, ino=732562070, create=1) at icount.c:251
#1  0x0000000000418dd7 in ext2fs_icount_increment (icount=0x5cf170, ino=732562070, ret=0x7fffffa79a96) at icount.c:339
#2  0x000000000040a3cf in check_dir_block (fs=0x5af560, db=0x2b7070cc6064, priv_data=0x7fffffa79c90) at pass2.c:1021
#3  0x0000000000416c69 in ext2fs_dblist_iterate (dblist=0x5c3f20, func=0x409980 <check_dir_block>, priv_data=0x7fffffa79c90) at dblist.c:234
#4  0x0000000000408d9d in e2fsck_pass2 (ctx=0x5ae700) at pass2.c:149
#5  0x0000000000403102 in e2fsck_run (ctx=0x5ae700) at e2fsck.c:193
#6  0x0000000000401e50 in main (argc=Variable "argc" is not available.
) at unix.c:1075


It's stuck inside the while loop in get_icount_el() (icount.c, line 251),
reached from check_dir_block() in pass 2.
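
To make clear what kind of loop this is: the function appears to look up an inode number in a sorted list of per-inode counts. The sketch below is not the e2fsprogs code. It is a plain binary search over a hypothetical count_el array, with made-up names, showing the property such a loop needs in order to terminate: every iteration must shrink the search range. A lookup loop whose midpoint can land outside the current range, or that can leave the range unchanged, would spin forever in the way described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type: one (inode, link count) pair. */
struct count_el {
    uint32_t ino;
    uint16_t count;
};

/* Look up ino in a list sorted by ascending inode number.
 * Returns the index of the match, or -1 if ino is not present. */
static long find_el(const struct count_el *list, size_t n, uint32_t ino)
{
    assert(list != NULL || n == 0);
    size_t low = 0, high = n;           /* half-open range [low, high) */
    while (low < high) {
        size_t mid = low + (high - low) / 2;   /* always inside the range */
        if (list[mid].ino == ino)
            return (long)mid;
        if (list[mid].ino < ino)
            low = mid + 1;              /* range shrinks by at least one */
        else
            high = mid;                 /* range shrinks by at least one */
    }
    return -1;
}
```

Since both branches strictly narrow [low, high), the loop runs at most about log2(n) + 1 times, even for large inode numbers such as the 732562070 seen in the backtrace.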

I've added more memory to the server (up to 6 GB now), and am
re-running e2fsck.  Additionally, I raised /proc/sys/vm/max_map_count
to 20,000,000 (just pulled that number out of the air).  It takes 6 or
7 hours to get to the part where it locks up, so I'm not sure yet
whether this is going to help.  I figured while it's running I would
post here to see if anyone has any additional insights.
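
One way to check whether the extra memory changes anything, without waiting hours, is to retry the failing allocation on its own. The sketch below makes one anonymous private mmap() with the same flags as in the strace output and reports the errno on failure. The function name probe_anon_map is made up, and a probe in a fresh process does not reproduce e2fsck's address space at the point of failure, so treat the result as a rough hint only.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict C modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Try one anonymous private mapping of `size` bytes, using the same
 * protection and flags as the mmap() calls in the trace.
 * Returns 0 on success (the mapping is released again), otherwise the
 * errno value set by mmap(), for example ENOMEM. */
static int probe_anon_map(size_t size)
{
    assert(size > 0);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return errno;
    munmap(p, size);
    return 0;
}
```

Calling probe_anon_map(596029440) asks for the size that failed in the trace, roughly 568 MiB. Note that max_map_count limits the number of mappings a process may hold rather than their size, so raising it may not be what lets a request like this succeed.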

Thanks!

Brian Davidson
George Mason University



