[Linux-cluster] CS5 / about loop "Node is undead"
lhh at redhat.com
Mon Jun 9 20:25:40 UTC 2008
On Wed, 2008-06-04 at 14:47 +0200, Alain Moulle wrote:
> About my problem of node entering a loop :
> Jun 3 15:54:49 s_sys at xn2 qdiskd: <notice> Writing eviction notice for node 1
> Jun 3 15:54:50 s_sys at xn2 qdiskd: <notice> Node 1 evicted
> Jun 3 15:54:51 s_sys at xn2 qdiskd: <crit> Node 1 is undead.
> I notice that just before entering this loop, I have a message :
> Jun 3 15:54:47 s_sys at xn2 fenced: fencing node "xn1"
> Jun 3 15:54:48 s_sys at xn2 qdiskd: <info> Assuming master role
> but never the message :
> Jun 3 15:54:47 s_sys at xn2 fenced: fence "xn1" success
> Nevertheless, the service of xn1 fails over correctly to xn2, but
> then after the reboot of xn1, we can't start CS5 again because of
> the endless "Node is undead" loop on xn2,
> whereas when it works correctly, both messages :
> fencing node "xn1"
> fence "xn1" success
> are successive (after about 30s)
> So my question is: could this endless "Node is undead" loop
> always be caused by a failed fencing of xn1 by xn2?
> PS: note that I have applied patch :
Yes. If qdiskd thinks the node is dead and the node started writing to
the disk again (which is what fencing should prevent), it will display
the "Node is undead" message.
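
The symptom discussed in this thread (a `fencing node "xn1"` line that is never followed by a `fence "xn1" success` line) can be checked for with a small log scan. The following is only a sketch: the function name `unconfirmed_fences` is made up for illustration, and the two patterns assume the fenced messages are worded exactly as quoted above, which may differ between versions.

```python
import re

# Patterns taken from the fenced messages quoted in this thread (assumed wording).
FENCE_START = re.compile(r'fenced: fencing node "([^"]+)"')
FENCE_OK = re.compile(r'fenced: fence "([^"]+)" success')


def unconfirmed_fences(lines):
    """Return nodes whose fencing was started but never reported as successful."""
    pending = []
    for line in lines:
        started = FENCE_START.search(line)
        if started:
            if started.group(1) not in pending:
                pending.append(started.group(1))
            continue
        succeeded = FENCE_OK.search(line)
        if succeeded and succeeded.group(1) in pending:
            pending.remove(succeeded.group(1))
    return pending


# Log excerpt from the original report: fencing starts, success never appears.
sample = [
    'Jun 3 15:54:47 s_sys at xn2 fenced: fencing node "xn1"',
    'Jun 3 15:54:48 s_sys at xn2 qdiskd: <info> Assuming master role',
    'Jun 3 15:54:49 s_sys at xn2 qdiskd: <notice> Writing eviction notice for node 1',
    'Jun 3 15:54:50 s_sys at xn2 qdiskd: <notice> Node 1 evicted',
    'Jun 3 15:54:51 s_sys at xn2 qdiskd: <crit> Node 1 is undead.',
]

print(unconfirmed_fences(sample))
```

Any node this returns had its fencing started without a logged success, which is the situation in which qdiskd can end up reporting the node as undead.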