[Linux-cluster] multipathed quorum disk

Lon Hohberger lhh at redhat.com
Wed May 28 15:45:40 UTC 2008


On Wed, 2008-05-28 at 12:57 +1100, Darrin De Groot wrote:
> 
> Hi, 
> 
> I am running a 4 node cluster with a multipathed quorum disk,
> configured to use the path /dev/dm-1. The problem that I am having is
> that if I lose one path to the disk (am testing by pulling one fibre),
> the node is almost always fenced (one node, once, managed to stay up,
> out of more than 10 attempts). Is there some timeout that needs
> changing to give qdiskd the time to realise that a path is down? I
> have tried an interval of 3 seconds with a TKO of 10, with no
> success, and a token timeout set at 45000ms: 
> 
> <totem consensus="4800" join="60" token="45000"
> token_retransmits_before_loss_const="20"/> 
>         <quorumd device="/dev/dm-1" interval="3" min_score="1"
> tko="10" votes="3"/> 
> 

As a general rule, you want qdiskd's timeout (interval * tko, in
seconds) to exceed the path failover time, with some time to spare for
the I/Os to get out after a path failover completes.  As a rule of
thumb, totem's token timeout needs to be approximately double the qdisk
timeout.  E.g.:

  <totem token="120000" ... /> 
  <quorumd device="/dev/dm-1"
   interval="3" min_score="1" tko="20" votes="3"
  />
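
The arithmetic behind those numbers can be sketched as follows. This is
only an illustration of the rule of thumb above, not anything qdiskd or
totem does itself: the function names are hypothetical, the qdisk
timeout is assumed to be interval * tko seconds, and the factor of 2 is
just the "approximately double" guideline.

```python
def qdisk_timeout_seconds(interval: int, tko: int) -> int:
    """Assumed qdisk timeout: tko missed cycles of `interval` seconds each."""
    return interval * tko


def suggested_token_ms(interval: int, tko: int, factor: float = 2.0) -> int:
    """Rule-of-thumb totem token timeout in milliseconds.

    `factor` defaults to 2.0 ("approximately double the qdisk timeout");
    it is a guideline, not a hard requirement.
    """
    return int(qdisk_timeout_seconds(interval, tko) * factor * 1000)


if __name__ == "__main__":
    # Settings from the example above: interval="3", tko="20"
    print(qdisk_timeout_seconds(3, 20))  # qdisk timeout in seconds
    print(suggested_token_ms(3, 20))     # token timeout in milliseconds
```

With interval="3" and tko="20" this gives a 60 second qdisk timeout and
a 120000 ms token, matching the example.  Whether 60 seconds is long
enough still depends on how long your multipath failover actually
takes.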

[Note: I think qdiskd should, in the future, determine fairly optimal
timings algorithmically from the totem token timeout.]

-- Lon



