[Linux-cluster] rhel4u7 gfs locking up - unable to obtain cluster lock
Simmons, Dan A
jds at techma.com
Thu Mar 26 22:00:19 UTC 2009
Hi All,
I have a production Red Hat 4u7 GFS cluster that has locked up 5 times in the
last week. The cluster consists of 12 nodes: 3 of the nodes run Oracle RAC
and the rest run home-grown applications. The system does heavy reads and
writes to the shared GFS disks. The symptoms seem similar to those described
in Bugzilla 247766: my cluster locks up and I am unable to do anything except
reboot the entire cluster. Prior to the system locking up I get an error in
/var/log/messages, "unable to obtain cluster lock: connection timed out", on
one of the nodes, but nothing else appears in the logs. There are 4 GFS
volumes. The current stats from the busiest volume are:
locks 68763
locks held 33981
incore inodes 33778
metadata buffers 210
unlinked inodes 0
quota IDs 5
incore log buffers 0
log space used 0.34%
meta header cache entries 0
glock dependencies 0
glocks on reclaim list 0
log wraps 17
outstanding LM calls 0
outstanding BIO calls 0
fh2dentry misses 0
glocks reclaimed 41083300
glock nq calls 39290298
glock dq calls 26025821
glock prefetch calls 34071947
lm_lock calls 54069538
lm_unlock calls 40805646
lm callbacks 94932089
address operations 2335588
dentry operations 4683578
export operations 0
file operations 3179652
inode operations 9595976
super operations 39907494
vm operations 0
block I/O reads 34785108
block I/O writes 344510
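
For anyone who wants to compare these counters between nodes or over time, here is a minimal Python sketch, assuming the stats are saved as plain text in the same "name value" layout as above. The names parse_counters and summarize are hypothetical helpers, not part of any GFS tool, and SAMPLE is just a subset of the numbers posted here. The derived figures are plain arithmetic on the posted values, not a diagnosis.

```python
import re

# Subset of the counters posted above (assumed "name value" layout).
SAMPLE = """\
locks 68763
locks held 33981
log space used 0.34%
glock nq calls 39290298
glock dq calls 26025821
lm_lock calls 54069538
lm_unlock calls 40805646
block I/O reads 34785108
block I/O writes 344510
"""


def parse_counters(text):
    """Map each counter name to its integer value.

    Lines that do not end in a plain integer (for example the
    percentage on "log space used") are skipped.
    """
    counters = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?)\s+(\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def summarize(c):
    """Derive a few simple figures from the parsed counters."""
    return {
        # Fraction of known locks that are currently held.
        "held_fraction": c["locks held"] / c["locks"],
        # Cumulative enqueue calls minus dequeue calls.
        "nq_minus_dq": c["glock nq calls"] - c["glock dq calls"],
        # Cumulative lock-manager lock calls minus unlock calls.
        "lock_minus_unlock": c["lm_lock calls"] - c["lm_unlock calls"],
        # Block reads per block write.
        "read_write_ratio": c["block I/O reads"] / c["block I/O writes"],
    }


if __name__ == "__main__":
    for name, value in summarize(parse_counters(SAMPLE)).items():
        print(name, value)
```

Running the same script against snapshots taken a few minutes apart on each node would show which counters are growing fastest before a lockup.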
I would be grateful for any advice, especially regarding locks and tuning. I
am tempted to set glock_purge to 50, as described as a fix for the RHEL4u4
locking problem, but worry that this might make things worse.
The specifics for the system are as follows:
RHEL4u7 smp kernel 2.6.9-78.0.1
gfs 6.1.18-1
gfs-kernel-smp 2.6.9-80.9
rgmanager 1.9.80-1
cman 1.0.24-1
ccs 1.0.12-1
magma 1.0.8-1
magma-plugin 1.0.14-1
lvm2-cluster 2.02.37-3
fence 1.32.63-1
J. Dan