[Linux-cluster] GFS cluster freezing after a few hours..

DRand at amnesty.org DRand at amnesty.org
Mon Oct 29 19:26:44 UTC 2007


Hi,

We've just set up a three-node GFS cluster on Debian Etch, using QLogic HBAs 
against a SAN.

gfs_tool 1.03.00 (built Mar  8 2007 23:38:09)
Copyright (C) Red Hat, Inc.  2004-2005  All rights reserved.

Linux cms2 2.6.18-5-amd64 #1 SMP Tue Oct 2 20:37:02 UTC 2007 x86_64 GNU/Linux

We start the cluster as follows, and it works fine for a while:

/sbin/lock_gulmd -n aicluster -s cms1,cms2,cmsqa
sleep 1
/bin/mount -t gfs -o acl /dev/sda /san
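
The fixed "sleep 1" above only assumes that the lock server is ready before the mount. A minimal sketch of a more explicit wait follows, polling "gulm_tool nodelist" until the node reports "Logged in". The helper names node_state and wait_logged_in and the GULM_TOOL override are made up for illustration, and this is not offered as an explanation of the hang:

```shell
#!/bin/sh
# Sketch only: wait for a node to report "Logged in" before mounting GFS,
# instead of relying on a fixed "sleep 1".

# node_state NAME: read `gulm_tool nodelist` output on stdin and print the
# "state" value of the named node (prints nothing if the node is not listed).
node_state() {
    awk -v want="$1" '
        $1 == "Name:" { name = $2 }
        $1 == "state" && name == want {
            sub(/^[^=]*= */, "")   # drop everything up to and including "= "
            print
            exit
        }'
}

# wait_logged_in NAME SERVER TRIES: poll about once per second, up to TRIES
# times. GULM_TOOL is a hypothetical override so the function can be
# exercised without a running lock server.
wait_logged_in() {
    tries=$3
    while [ "$tries" -gt 0 ]; do
        state=$("${GULM_TOOL:-gulm_tool}" nodelist "$2" 2>/dev/null | node_state "$1")
        [ "$state" = "Logged in" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Possible use in the start-up sequence (assumes the gulm node name matches
# the output of `uname -n`):
#   /sbin/lock_gulmd -n aicluster -s cms1,cms2,cmsqa
#   wait_logged_in "$(uname -n)" cms1 30 && /bin/mount -t gfs -o acl /dev/sda /san
```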

But eventually, after a few hours or a day, something freezes or hangs and we 
can no longer run commands such as df, ls, or du.

There is no evidence that anything is wrong, though. This command seems to 
show a working cluster, right?

cmsqa:/home/alfresco# gulm_tool nodelist cms1
 Name: cms2
  ip    = ::ffff:192.168.1.139
  state = Logged in
  last state = Logged out
  mode = Slave
  missed beats = 0
  last beat = 1193685839882270
  delay avg = 10003803
  max delay = 755383848
 
 Name: cmsqa
  ip    = ::ffff:128.1.32.134
  state = Logged in
  last state = Logged out
  mode = Slave
  missed beats = 0
  last beat = 1193685841974801
  delay avg = 10003928
  max delay = 138560844
 
 Name: cms1
  ip    = ::ffff:192.168.1.137
  state = Logged in
  last state = Was Logged in
  mode = Master
  missed beats = 0
  last beat = 1193685842490217
  delay avg = 10003231
  max delay = 10007256
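
To make the listing above easier to compare across nodes, here is a rough sketch that condenses it to one line per node. It assumes the delay figures are in microseconds, like the "last beat" values, which read as microseconds since the Unix epoch (1193685839882270 corresponds to 19:23:59 UTC on Oct 29 2007). Under that assumption, the max delay would be about 755 seconds for cms2 and about 139 seconds for cmsqa, against roughly 10 seconds for cms1. The function name summarize_nodes is made up for illustration:

```shell
#!/bin/sh
# Sketch only: condense `gulm_tool nodelist` output (read on stdin) to one
# line per node: name, mode, state, missed beats, max delay in seconds.
# ASSUMPTION: "max delay" is reported in microseconds.
summarize_nodes() {
    awk '
        function emit() {
            if (name != "")
                printf "%s %s %s missed=%d max_delay=%.1fs\n",
                       name, mode, state, missed, maxd / 1000000
        }
        $1 == "Name:"  { emit(); name = $2; mode = ""; state = ""; missed = 0; maxd = 0 }
        $1 == "state"  { s = $0; sub(/^[^=]*= */, "", s); gsub(/ /, "_", s); state = s }
        $1 == "mode"   { mode = $3 }
        $1 == "missed" { missed = $4 }   # "missed beats = N"
        $1 == "max"    { maxd = $4 }     # "max delay = N"
        END { emit() }'
}

# Example: gulm_tool nodelist cms1 | summarize_nodes
```

For the cms2 entry above, this prints "cms2 Slave Logged_in missed=0 max_delay=755.4s".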


Any ideas? We need to reboot the boxes to get the cluster back.

Damon.