[Linux-cluster] CS4 Update 2 / problem with systems dump ?

Wed Mar 22 15:24:41 UTC 2006

David Teigland wrote:
> On Wed, Mar 22, 2006 at 08:59:17AM +0100, Alain Moulle wrote:
> 
>>>>You might set a fencing delay that would allow the dump to complete, e.g.
>>>>  <fence_daemon post_fail_delay="10">
>>>>  </fence_daemon>
> 
> 
>>OK, but does that mean that once we have patched this, the peer node will
>>wait for this delay in all cases before fencing the failed node, even
>>if that node is not dumping, right?
> 
> 
> When fenced goes to fence a failed node, it waits 10s before actually
> killing it.  That applies to all nodes that fail.
> 
> 
>>So, the workaround that you propose is to be used only this way :
>>1. a node has crashed and was about to dump but has been fenced.
>>2. patch the post_fail_delay
>>3. re-start CS4 on both nodes
>>4. wait for a new crash and dump, and in this case, the failover
>>   will take at least the post_fail_delay value.
> 
> 
> I'm not sure what you mean by this, but it doesn't sound right.
> post_fail_delay would be added permanently to cluster.conf which
> is the same on all nodes... you don't change it.
>
> Dave
>
>

Yes, that's what I understood. But since a dump can take, let's say,
20 minutes, I would have to set <fence_daemon post_fail_delay="1200">,
and only when there is a real problem, so that the failed node can finish
its dump. We can only consider this once we have already had a system crash,
because the failover would then take so long; otherwise we must keep the 10s
fence delay. That's why I proposed the list above.
Right?
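
To make this concrete, the temporary setting would look something like the
fragment below. This is only a sketch: 1200 is my estimate of 20 minutes in
seconds, not a measured dump time, and the rest of cluster.conf is assumed
to stay unchanged.

```xml
<!-- temporary value: let a crashed node take up to 20 min (1200 s) to finish its dump -->
<fence_daemon post_fail_delay="1200">
</fence_daemon>
```

Once the dump has been collected, the value would go back to 10.
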
Alain