[Linux-cluster] GFS directory freezing unexpectedly under pressure ...

tam_annie at aliceposta.it
Sun Nov 4 14:51:43 UTC 2007


Hi everybody,

    when my GFS (v1) filesystems come under heavy load (e.g. a VMware 
virtual machine OS installation, or an Oracle RMAN backup using a GFS 
filesystem as the flash recovery area), they "freeze" unexpectedly.
    More precisely, it is not the whole GFS filesystem that freezes, but 
only the directory affected by the load: I can't even ls the contents of 
that directory, and everything that touches it seems to hang hopelessly. 
I can't find any related errors in my logs, and the cluster utilities' 
output (clustat, group_tool -v, cman_tool nodes) looks absolutely normal 
(no fencing is occurring). I can even keep working in the other 
directories of the same GFS!
The only way out I've found is to restart the cluster. I can reproduce 
the problem deterministically, but I don't know how to debug it.
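
   For reference, here is roughly what I was planning to capture the 
next time it hangs; I am piecing these commands together from the man 
pages, so please correct me if these are not the right tools (/share is 
my mount point):

# list processes stuck in uninterruptible sleep (state D)
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'

# dump the glock state of the hung filesystem (bounded by lockdump_size)
gfs_tool lockdump /share > /tmp/gfs-lockdump.txt

# dump kernel stacks of all tasks to the kernel log (SysRq must be enabled)
echo 1 > /proc/sys/kernel/sysrq
echo t > /proc/sysrq-trigger
dmesg > /tmp/sysrq-t.txt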

   I noticed that the problem arises on both my 2-node and my 1-node 
cluster, whether or not I mount the GFS with 'noquota,noatime'.
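
   For completeness, these are the two mount variants I tried (the 
device path below is just an example, not my real one):

mount -t gfs /dev/vg00/lv_share /share
mount -t gfs -o noquota,noatime /dev/vg00/lv_share /share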

   Any help would be greatly appreciated:
   thank you in advance!
   Tyzan

___________________________________________________________________________________________________ 
 
Linux xxxxxxxxxxxxxxxx 2.6.18-8.1.8.el5 #1 SMP Tue Jul 10 06:39:17 EDT 
2007 x86_64 x86_64 x86_64 GNU/Linux

lvm2-cluster-2.02.16-3.el5
kmod-gfs-0.1.16-5.2.6.18_8.1.8.el5
gfs2-utils-0.1.25-1.el5
gfs-utils-0.1.11-3.el5
cman-2.0.64-1.0.1.el5
rgmanager-2.0.24-1.el5


   
[root@orarac1 ~]# gfs_tool gettune /share
ilimit1 = 100
ilimit1_tries = 3
ilimit1_min = 1
ilimit2 = 500
ilimit2_tries = 10
ilimit2_min = 3
demote_secs = 300
incore_log_blocks = 1024
jindex_refresh_secs = 60
depend_secs = 60
scand_secs = 5
recoverd_secs = 60
logd_secs = 1
quotad_secs = 5
inoded_secs = 15
quota_simul_sync = 64
quota_warn_period = 10
atime_quantum = 3600
quota_quantum = 60
quota_scale = 1.0000   (1, 1)
quota_enforce = 1
quota_account = 1
new_files_jdata = 0
new_files_directio = 0
max_atomic_write = 4194304
max_readahead = 262144
lockdump_size = 131072
stall_secs = 600
complain_secs = 10
reclaim_limit = 5000
entries_per_readdir = 32
prefetch_secs = 10
statfs_slots = 64
max_mhc = 10000
greedy_default = 100
greedy_quantum = 25
greedy_max = 250
rgrp_try_threshold = 100
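
   In case the advice is to change one of these tunables: from the man 
page I understand the syntax would be something like the following (I 
have not actually tried it; demote_secs is currently at its default of 
300, and 100 is only an illustrative value):

gfs_tool settune /share demote_secs 100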




