[Linux-cluster] Node hang

Manuel Bujan bujan at isqsolutions.com
Thu Feb 17 19:43:59 UTC 2005


Hello guys,

After 3 days of a heavy read/write test load, one of our nodes crashed with 
the following error:

Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0: fatal: invalid metadata block
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0:   bh = 13156295 (magic)
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0:   function = gfs_get_data_buffer
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0:   file = /usr/src/cluster/gfs-kernel/src/gfs/dio.c, line = 1328
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0:   time = 1108659988
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0: about to withdraw from the cluster
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0: waiting for outstanding I/O
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0: telling LM to withdraw
Feb 17 12:06:35 atmail-1 kernel: lock_dlm: withdraw abandoned memory
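
For anyone who wants to pull the details out of these messages, here is a 
rough Python sketch that collects the "key = value" fields and converts the 
"time" value to UTC. The function names (parse_withdraw, withdraw_time_utc) 
are made up for this example and are not part of GFS. The sample lines are 
copied from the log above with the mail wrapping undone. It only reads what 
the log already prints; these messages carry a block number (bh) but no file 
name, so this alone does not tell us which file was involved.

```python
import re
from datetime import datetime, timezone

# Sample lines copied from the log in this message, with mail wrapping undone.
LOG = """\
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0: fatal: invalid metadata block
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0:   bh = 13156295 (magic)
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0:   function = gfs_get_data_buffer
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0:   file = /usr/src/cluster/gfs-kernel/src/gfs/dio.c, line = 1328
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0:   time = 1108659988
"""

# "key = value" pairs; a value runs until a comma or the end of the line.
PAIR = re.compile(r"(\w+) = ([^,\n]+)")


def parse_withdraw(log_text):
    """Hypothetical helper: return the key/value fields found in the log."""
    fields = {}
    for line in log_text.splitlines():
        # Drop the syslog prefix up to and including "fsid=".
        _, found, rest = line.partition("fsid=")
        if not found:
            continue
        # Drop the "<cluster>:<fs>.<journal>" id that precedes the message.
        _, found, message = rest.partition(": ")
        if not found:
            continue
        for key, value in PAIR.findall(message):
            fields[key] = value.strip()
    return fields


def withdraw_time_utc(fields):
    """Hypothetical helper: interpret "time" as seconds since the Unix epoch."""
    return datetime.fromtimestamp(int(fields["time"]), tz=timezone.utc)


if __name__ == "__main__":
    info = parse_withdraw(LOG)
    print("block number:", int(info["bh"].split()[0]))
    print("reported at :", info["file"], "line", info["line"])
    print("time (UTC)  :", withdraw_time_utc(info).isoformat())
```

The time value comes out as 17:06:28 UTC on Feb 17 2005, which lines up with 
the 12:06:28 syslog stamp if the node's clock is set to UTC-5.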

We are mounting our GFS partition using the noatime option, and quotas have 
been disabled in order to improve performance. The applications currently 
running are "postfix, apache, and Courier/Imap".

We are using the CVS version available on Feb 14 around 5:00 PM.

Can anyone shed some light on this matter?
Is there any way to know, based on the log, exactly which file the server 
was trying to read or write when it crashed?

Regards
Bujan 



