[Linux-cluster] GFS crash

David Teigland teigland at redhat.com
Tue Oct 4 17:06:05 UTC 2005


On Tue, Oct 04, 2005 at 09:27:58AM +1000, Chmouel Boudjnah wrote:
> Hello,
> 
> I had a crash on a server running GFS-6.1 with kernel 2.6.9-11.ELsmp. I am
> using GFS with an AOE SAN drive.
> 
> I am not sure whether the problem is with the AOE SAN or with GFS. It would
> be great if you could tell me, so I can redirect the bug report to the
> CORAID people.
> 
> First, the logs show some odd messages about sataide (I am not sure
> whether the SAN uses that):
> 
> Sep 30 17:43:20 srv kernel: e send einval to 2
> Sep 30 17:43:20 srv kernel: sataide send einval to 2
> Sep 30 17:43:20 srv last message repeated 38 times
> Sep 30 17:43:20 srv kernel: sataide unlock ff050383 no id

The dlm is returning errors for both remote and local lock requests,
indicating that it doesn't know about any of the locks being requested.
That's often because the dlm was "shut down" by cman when cman lost its
connection to the cluster.  There are usually log messages from cman, too,
saying what has happened.
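
To look for those cman messages, a rough sketch along the following lines
could pull out every cman-related line logged shortly before the first dlm
error. It assumes syslog lines in the format quoted above and that cman's
messages contain the string "cman" (matched case-insensitively); the function
names and the 120-second window are illustrative choices, not part of any
cluster tool.

```python
import re
from datetime import datetime, timedelta

# Syslog timestamp prefix as in the quoted log, e.g. "Sep 30 17:43:20".
# Syslog omits the year, so a fixed year is assumed for comparisons only.
STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")


def parse_stamp(line, year=2005):
    """Return the line's timestamp as a datetime, or None if it has none."""
    m = STAMP.match(line)
    if m is None:
        return None
    return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def cman_lines_before_first_dlm_error(lines, window_seconds=120):
    """Return lines mentioning cman from the window before the first 'send einval'."""
    first_error = None
    for line in lines:
        if "send einval" in line:
            first_error = parse_stamp(line)
            if first_error is not None:
                break
    if first_error is None:
        return []

    start = first_error - timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        when = parse_stamp(line)
        if when is None:
            continue
        # Assumption: any line containing "cman" is of interest.
        if start <= when <= first_error and "cman" in line.lower():
            hits.append(line)
    return hits
```

Called with the lines of /var/log/messages, this returns the candidates to
read first. The excerpt quoted in this thread contains no cman lines, so it
would yield an empty list.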

Is AOE using the same network as cman?  If so, you might try putting them
on two different networks.

> Sep 30 17:43:22 srv kernel: lock_dlm:  Assertion failed on line 353 of
> file /usr/src/build/574067-i686/BUILD/smp/src/dlm/lock.c
> Sep 30 17:43:22 srv kernel: lock_dlm:  assertion:  "!error"
> Sep 30 17:43:22 srv kernel: lock_dlm:  time = 2509316164
> Sep 30 17:43:22 srv kernel: sataide: error=-22 num=5,5bf2f1 lkf=801
> flags=84

This is the typical assertion failure you get when gfs can't acquire any
locks.
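
For reference, the error=-22 in the last quoted line is -EINVAL, the same
error as the "send einval" messages earlier. A small sketch for decoding such
a line follows. The regular expression is inferred only from the single line
quoted above (with the mail-wrapped "flags=84" rejoined), and the helper name
is made up for illustration.

```python
import errno
import re

# Pattern inferred from the one line quoted above; other versions of
# lock_dlm may print different fields.
FIELDS = re.compile(r"(\w+): error=(-?\d+) num=(\S+) lkf=(\w+) flags=(\w+)")


def decode_lock_error(text):
    """Split the error line into its fields and name the errno value."""
    m = FIELDS.search(text)
    if m is None:
        return None
    code = int(m.group(2))
    return {
        "prefix": m.group(1),
        "error": code,
        # The kernel reports errors as negative errno values.
        "errno_name": errno.errorcode.get(abs(code), "unknown"),
        "num": m.group(3),
        "lkf": m.group(4),
        "flags": m.group(5),
    }


line = "sataide: error=-22 num=5,5bf2f1 lkf=801 flags=84"
print(decode_lock_error(line)["errno_name"])  # EINVAL
```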

Dave



