[Linux-cluster] fenced spinning?

Jeff Sturm jeff.sturm at eprize.com
Fri Nov 27 16:46:32 UTC 2009


Found the bug report for this:

https://bugzilla.redhat.com/show_bug.cgi?id=444529

It has been fixed, but not in the version I'm running.  I need to
determine whether I can simply fence the affected nodes without
compromising the cluster (since the fence daemon itself is affected).
Since our production cluster is currently stable, I'll probably try
this on a test cluster first.

Later we'll attempt a rolling upgrade of the cluster to get the bug fix.

From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeff Sturm
Sent: Wednesday, November 25, 2009 7:44 PM
To: linux-cluster at redhat.com
Subject: [Linux-cluster] fenced spinning?

CentOS 5.2, 26-node cluster.

Today I restarted one node.  It left the cluster, rebooted and joined
the cluster without incident.  Everything is fine but... fenced has the
CPU pegged.

No useful log messages.  strace says it is spinning on poll/recvfrom:

poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN, revents=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN, revents=POLLNVAL}], 4, -1) = 2

recvfrom(5, 0x7fffb074ab40, 20, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)

poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN, revents=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN, revents=POLLNVAL}], 4, -1) = 2

recvfrom(5, 0x7fffb074ab40, 20, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
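
For anyone reading this in the archive, here is a minimal Python sketch of
how a trace like the one above can arise.  It is not fenced's code, and
whether the POLLNVAL on fd 8 is the actual root cause is an assumption on
my part; the helper names poll_once and drop_invalid are made up for
illustration.  The point is only that a descriptor which was closed while
still in the poll set makes poll() return immediately every time, even
with an infinite timeout, so a loop that never drops it will spin.

```python
import os
import select


def poll_once(poller, timeout_ms=1000):
    """Run a single poll() call and return the result as {fd: revents}."""
    return dict(poller.poll(timeout_ms))


def drop_invalid(poller, events):
    """Unregister every descriptor flagged POLLNVAL; return the dropped fds."""
    dropped = []
    for fd, revents in events.items():
        if revents & select.POLLNVAL:
            poller.unregister(fd)
            dropped.append(fd)
    return dropped


def demo():
    r, w = os.pipe()
    poller = select.poll()
    poller.register(r, select.POLLIN)

    # Close the descriptor but leave it registered: this is the
    # hypothetical situation that would yield revents=POLLNVAL.
    os.close(r)

    # Returns at once with POLLNVAL instead of waiting for the timeout.
    first = poll_once(poller)

    # Removing the stale descriptor stops the immediate wake-ups.
    dropped = drop_invalid(poller, first)

    # Nothing is registered now, so this call simply times out.
    second = poll_once(poller, timeout_ms=50)

    os.close(w)
    return r, first, dropped, second


if __name__ == "__main__":
    r, first, dropped, second = demo()
    print("closed fd:", r)
    print("first poll:", first)
    print("dropped:", dropped)
    print("second poll:", second)
```

Without the drop_invalid step, every later poll() call would come back
right away with the same POLLNVAL entry, which matches the repeating
pattern in the strace output.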

 

Anything else useful I can do to diagnose?  What are the chances I can
recover this node nicely without making things worse?

Any help/ideas appreciated,

Jeff
