[Linux-cluster] GFS filesystem hangs

Fair, Brian xbfair at citistreetonline.com
Thu Nov 15 08:40:35 UTC 2007


We have a GFS filesystem (one of 100 on this particular server) that consistently hangs. I haven't identified the circumstances around it; there is some speculation that it occurs during heavy usage, though that isn't certain. When it happens, the load average on the system skyrockets.
 
The mountpoint is /omni_mnt/clients/j2
 
By "hang" I mean that cd sometimes hangs, ls hangs, and programs and file operations certainly hang. Sometimes it happens just cd'ing into the mountpoint, other times into a large subdirectory.
 
For example:

# cd /omni_mnt/clients/j2
root@hlpom500:[/omni_mnt/clients/j2]
# ls
<normal output>
root@hlpom500:[/omni_mnt/clients/j2]
# cd stmt
root@hlpom500:[/omni_mnt/clients/j2/stmt]
# ls

<hangs here, shell must be killed>
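
While it's hung, something like the following should show which processes are stuck in uninterruptible sleep and where in the kernel they're blocked (the sysrq 't' line assumes magic-sysrq support and dumps every task's kernel stack to the log, so it can be large):

# ps -eo pid,stat,wchan:30,args | awk '$2 ~ /D/'
(lists D-state processes and the kernel function they're waiting in)
# echo t > /proc/sysrq-trigger
# dmesg | tail -200
(the task stack traces end up in the kernel log)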

In the past, shutting down and rebooting the two systems that mount this GFS filesystem has cleared the issue.
 
Info:
 
RHEL ES 4 u5
kernel 2.6.9-55.0.2.ELsmp
GFS 2.6.9-72.2.0.2
 
I'm not sure what is helpful, but here are some outputs from the system while the filesystem was hung. I also have a lockdump, but it is 4,650 lines; I can send it along if needed. Any suggestions on data to gather in the future are welcome.
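
For reference, the lockdump was captured with gfs_tool's lockdump subcommand against the mountpoint (the /tmp path is just an example destination); if the output ever looks truncated, the lockdump_size tunable shown below may need to be raised:

# gfs_tool lockdump /omni_mnt/clients/j2 > /tmp/j2.lockdump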
 
 
Thanks!
 
Brian Fair
 
 
gfs_tool gettune
************************************************************************
 
ilimit1 = 100
ilimit1_tries = 3
ilimit1_min = 1
ilimit2 = 500
ilimit2_tries = 10
ilimit2_min = 3
demote_secs = 300
incore_log_blocks = 1024
jindex_refresh_secs = 60
depend_secs = 60
scand_secs = 5
recoverd_secs = 60
logd_secs = 1
quotad_secs = 5
inoded_secs = 15
glock_purge = 0
quota_simul_sync = 64
quota_warn_period = 10
atime_quantum = 3600
quota_quantum = 60
quota_scale = 1.0000   (1, 1)
quota_enforce = 1
quota_account = 1
new_files_jdata = 0
new_files_directio = 0
max_atomic_write = 4194304
max_readahead = 262144
lockdump_size = 131072
stall_secs = 600
complain_secs = 10
reclaim_limit = 5000
entries_per_readdir = 32
prefetch_secs = 10
statfs_slots = 64
max_mhc = 10000
greedy_default = 100
greedy_quantum = 25
greedy_max = 250
rgrp_try_threshold = 100
statfs_fast = 0
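
Any of these can be changed per mount with gfs_tool settune; for instance, glock_purge (0 above, i.e. disabled) and demote_secs control how aggressively unused glocks are trimmed, which I gather is sometimes relevant to high lock counts. The values below are purely illustrative, not something we've tried:

# gfs_tool settune /omni_mnt/clients/j2 glock_purge 50
# gfs_tool settune /omni_mnt/clients/j2 demote_secs 200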

gfs_tool counters
************************************************************************

                                  locks 246
                             locks held 127
                           freeze count 0
                          incore inodes 101
                       metadata buffers 4
                        unlinked inodes 2
                              quota IDs 3
                     incore log buffers 0
                         log space used 0.05%
              meta header cache entries 0
                     glock dependencies 0
                 glocks on reclaim list 0
                              log wraps 85
                   outstanding LM calls 2
                  outstanding BIO calls 0
                       fh2dentry misses 0
                       glocks reclaimed 1316856
                         glock nq calls 194073094
                         glock dq calls 193851427
                   glock prefetch calls 102749
                          lm_lock calls 903612
                        lm_unlock calls 833348
                           lm callbacks 1769983
                     address operations 71707236
                      dentry operations 23750382
                      export operations 0
                        file operations 139487453
                       inode operations 38356847
                       super operations 110620113
                          vm operations 1052447
                        block I/O reads 241669
                       block I/O writes 3295626
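
If it would help, I can sample these counters while the filesystem is hung to see which ones move (outstanding LM calls, glock nq/dq, etc.), e.g.:

# while true; do date; gfs_tool counters /omni_mnt/clients/j2; sleep 5; done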




