[Linux-cluster] Node fenced when mounting gfs

Fajar A. Nugraha fajar at telkom.net.id
Wed Jun 8 05:27:34 UTC 2005


Hi,

I have a two-node cluster running GFS from RHEL4 cvs tag (pulled on June 
1st).
I have several GFS LVMs; one of them now uses 16 GB of storage and 371659 
inodes (from df -k and df -i).
The other GFS LVMs use fewer inodes.
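For reference, the numbers above come from plain df. A small sketch of how to gather them (the GFS mount point is hypothetical; I default to / here so it runs anywhere):

```shell
#!/bin/sh
# Print block and inode usage for a mounted filesystem,
# the same way the 16 GB / 371659-inode figures were obtained.
gfs_usage() {
    mnt=${1:-/}        # on the cluster this would be the GFS mount point
    df -k "$mnt"       # 1K-block totals: size, used, available
    df -i "$mnt"       # inode totals: inodes used vs. free
}

gfs_usage /
```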

When only one node is running, all is OK: I can mount and access all GFS LVMs.
The problem is that when both nodes are running and I try to mount that 
particular LVM, the node that tries to mount it (lincluster2) logs these 
messages to syslog:

Jun  1 02:46:41 lincluster2 GFS: Trying to join cluster "lock_dlm", 
"lincluster:newapp"
Jun  1 02:46:43 lincluster2 GFS: fsid=lincluster:newapp.1: Joined 
cluster. Now mounting FS...
Jun  1 02:46:43 lincluster2 GFS: fsid=lincluster:newapp.1: jid=1: Trying 
to acquire journal lock...
Jun  1 02:46:43 lincluster2 GFS: fsid=lincluster:newapp.1: jid=1: 
Looking at journal...
Jun  1 02:46:43 lincluster2 GFS: fsid=lincluster:newapp.1: jid=1: Done
Jun  1 02:46:43 lincluster2 GFS: fsid=lincluster:newapp.1: Scanning for 
log elements...

and then it seems to hang. I assume it is using all its CPU power scanning 
log elements. That would have been OK if the other node knew it was still 
alive. The problem is that it doesn't:

Jun  1 02:51:33 lincluster1 kernel: CMAN: removing node lincluster2 from 
the cluster : Missed too many heartbeats
Jun  1 02:51:33 lincluster1 fenced[4365]: lincluster2 not a cluster 
member after 0 sec post_fail_delay
Jun  1 02:51:33 lincluster1 fenced[4365]: fencing node "lincluster2"
Jun  1 02:51:38 lincluster1 fenced[4365]: fence "lincluster2" success
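The "0 sec post_fail_delay" in that log comes from the fence daemon's configuration in /etc/cluster/cluster.conf. Raising it gives a slow node a grace period between being declared failed and being fenced; a minimal sketch (the 30-second value is an assumption, not something from this setup):

```xml
<!-- cluster.conf excerpt: wait 30 s after a node is declared failed
     before fencing it, instead of fencing immediately (the 0 s default). -->
<fence_daemon post_fail_delay="30" post_join_delay="3"/>
```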

It seems that lincluster2 is so busy scanning log elements that it cannot 
even send CMAN heartbeats. That makes lincluster1 think lincluster2 is 
dead, so it fences it, which reboots it.
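One knob that might paper over this, sketched below as a guess rather than a known fix: CMAN's heartbeat tunables live under /proc/cluster/config/cman on this era's kernels, so the dead-node timeout could be lengthened to tolerate a node that stalls while replaying a GFS journal. The 60-second value is an assumption to adjust:

```shell
#!/bin/sh
# Hedged workaround sketch: raise CMAN's deadnode_timeout so a node
# that goes silent while scanning log elements is not declared dead
# and fenced. Paths are the RHEL4-era cman /proc tunables.
CMAN_CFG=/proc/cluster/config/cman

raise_deadnode_timeout() {
    new_timeout=${1:-60}   # seconds of heartbeat silence to tolerate (assumption)
    if [ -d "$CMAN_CFG" ]; then
        echo "current: hello_timer=$(cat "$CMAN_CFG/hello_timer")s" \
             "deadnode_timeout=$(cat "$CMAN_CFG/deadnode_timeout")s"
        echo "$new_timeout" > "$CMAN_CFG/deadnode_timeout"
    else
        echo "cman not loaded; nothing to tune" >&2
        return 1
    fi
}

raise_deadnode_timeout 60 || true
```

This must be done on every node, and it only hides the symptom; the underlying question of why the log-element scan starves the heartbeat remains.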

Any ideas how to fix this?

Regards,

Fajar
