[Linux-cluster] qdiskd not properly failing nodes??

Lon Hohberger lhh at redhat.com
Wed Sep 13 21:46:30 UTC 2006


On Wed, 2006-09-13 at 15:40 -0400, Andrea Westervelt wrote:
> Lon,
>  
> fenced is running and, based on the man page, it seems like dropping
> below a score of ½ should cause a reboot?

It currently expects the quorate partition (remember, this node is no
longer quorate) to fence the node, rather than having the node take
action itself.

>  I guess I am a little confused about what the heuristics/scoring are
> meant to do.  Can you explain the role of the master partition and
> what the expected outcome of an insufficient score should be?

The master node is a node with a sufficient score to declare itself
online according to the heuristics that you supply in the qdisk
configuration. Assuming it maintains its score, it arbitrates which other
nodes join the "master" partition. If a node becomes part of the master
partition, that node advertises the quorum device's votes to CMAN.

Insufficient scores should cause a node to remove itself from the master
partition and tell CMAN that the quorum device is offline.  This should
cause CMAN on a node in the qdisk master partition to fence the node
(assuming that this causes the node to transition from
quorate->inquorate).
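The quorate->inquorate case can be sketched the same way. This is a simplification under stated assumptions: each node counts only the votes it can currently see, the quorum device's votes count only while qdisk reports it online, and quorum is a simple majority of expected votes, floor(expected/2) + 1. The function names and the two-node example are hypothetical.

```python
def quorum_threshold(expected_votes: int) -> int:
    # Simple majority of the expected votes.
    return expected_votes // 2 + 1


def is_quorate(visible_node_votes: int, qdisk_votes: int,
               qdisk_online: bool, expected_votes: int) -> bool:
    # Quorum device votes only count while qdisk reports the device online,
    # i.e. while this node's score keeps it in the master partition.
    total = visible_node_votes + (qdisk_votes if qdisk_online else 0)
    return total >= quorum_threshold(expected_votes)


def should_fence(was_quorate: bool, now_quorate: bool) -> bool:
    # The transition discussed here: quorate -> inquorate means a node in
    # the master partition is expected to fence this node.
    return was_quorate and not now_quorate


if __name__ == "__main__":
    # Hypothetical two-node cluster: 1 vote per node, 1-vote quorum device.
    before = is_quorate(1, 1, qdisk_online=True, expected_votes=3)
    after = is_quorate(1, 1, qdisk_online=False, expected_votes=3)
    print(before, after, should_fence(before, after))
```

In that example an isolated node with qdisk online holds the 2 votes it needs; once its score drops and qdisk goes offline it holds 1, which is exactly the transition that should get it fenced.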

My guess is that, in your case, CMAN still sees the node (even though it
is inquorate) and is not fencing it. Is that right? A transition from
quorate->inquorate should cause the node to get fenced.

That sounds like a bug (pretty easy to fix, too).

-- Lon



