[Linux-cluster] CMAN panicking...

HAWKER, Dan dan.hawker at astrium.eads.net
Mon Dec 11 14:41:46 UTC 2006



Hi All,

Have been testing a RH cluster for a month or so and it was working fine.
Predictably enough, now that it's in production, it's playing up a bit.

Have 3 nodes set up as a 3-node cluster. Am only using it for GFS. We don't
run any *real* cluster apps. The nodes are Gonzo, Kermit & Voyager. The
shared storage is an EMC iSCSI unit; each node has a QLogic HBA inside.

They all seem to be logging similar error messages, almost as if they are
simultaneously fencing each other off and hence causing problems.

I have attached a jpg screen grab of Kermit (Voyager auto-rebooted when
this happened) and the /var/log/messages from the node that survived (Gonzo).

Any ideas what is causing this and, more importantly, a direction I can aim
at to fix it?

TIA

Dan

#############
/var/log/messages on Gonzo
Dec 11 13:52:57 gonzo kernel: CMAN: removing node voyager.poc from the cluster : Missed too many heartbeats
Dec 11 13:53:03 gonzo kernel: CMAN: removing node kermit.poc from the cluster : No response to messages
Dec 11 13:53:09 gonzo kernel: CMAN: quorum lost, blocking activity
Dec 11 13:57:27 gonzo kernel: CMAN: node voyager.poc rejoining
Dec 11 13:57:27 gonzo kernel: CMAN: quorum regained, resuming activity
#############
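
To make the sequence in that log easier to follow, here is a rough sketch that replays the membership changes and checks quorum after each one. It assumes one vote per node and a simple-majority quorum (expected votes / 2 + 1, rounded down); I have not confirmed the actual vote settings on this cluster. The node name gonzo.poc is also an assumption, since the log only shows the short hostname, and the helper names are made up for illustration.

```python
import re

# Log lines copied from the excerpt above (wrapped lines joined).
LOG = """\
Dec 11 13:52:57 gonzo kernel: CMAN: removing node voyager.poc from the cluster : Missed too many heartbeats
Dec 11 13:53:03 gonzo kernel: CMAN: removing node kermit.poc from the cluster : No response to messages
Dec 11 13:53:09 gonzo kernel: CMAN: quorum lost, blocking activity
Dec 11 13:57:27 gonzo kernel: CMAN: node voyager.poc rejoining
Dec 11 13:57:27 gonzo kernel: CMAN: quorum regained, resuming activity
"""

REMOVED = re.compile(r"CMAN: removing node (\S+) from the cluster")
REJOIN = re.compile(r"CMAN: node (\S+) rejoining")


def majority_quorum(expected_votes):
    # Assumption: simple majority with one vote per node.
    return expected_votes // 2 + 1


def replay(log, members):
    """Return (timestamp, member_count, quorate) after each membership change."""
    members = set(members)
    needed = majority_quorum(len(members))
    history = []
    for line in log.splitlines():
        stamp = line[:15]  # syslog timestamp, e.g. "Dec 11 13:52:57"
        removed = REMOVED.search(line)
        rejoined = REJOIN.search(line)
        if removed:
            members.discard(removed.group(1))
        elif rejoined:
            members.add(rejoined.group(1))
        else:
            continue  # not a membership change (e.g. the quorum messages)
        history.append((stamp, len(members), len(members) >= needed))
    return history


if __name__ == "__main__":
    # "gonzo.poc" is assumed; the log only shows the short hostname "gonzo".
    nodes = {"gonzo.poc", "kermit.poc", "voyager.poc"}
    for stamp, count, quorate in replay(LOG, nodes):
        print(stamp, count, "quorate" if quorate else "NOT quorate")
```

Under those assumptions, a 3-node cluster needs 2 votes, so losing one node is survivable but losing the second leaves Gonzo alone and inquorate until Voyager rejoins, which lines up with the "quorum lost" and "quorum regained" messages.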



-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.jpg
Type: image/jpeg
Size: 48318 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20061211/5b74ce8d/attachment.jpg>