[Linux-cluster] post_fail_delay versus deadnode_timeout

Riaan van Niekerk riaan at obsidian.co.za
Thu Sep 14 13:54:13 UTC 2006


Riaan van Niekerk wrote:
> hi
> 
> We are trying to capture diskdumps when a lock_dlm kernel panic happens 
> and need to increase either post_fail_delay or deadnode_timeout to 
> prevent the dumping node from being fenced.
> 
> Are there any advantages or disadvantages to using either? Which is 
> recommended?
> 
> post_fail_delay and diskdump have come up previously, with some good 
> answers from David
> http://www.redhat.com/archives/linux-cluster/2006-June/msg00037.html
> 
> note: for capturing a "sysrq t", we manually increase deadnode_timeout, 
> and decrease it back again, but don't have this luxury with a kernel 
> panic (which can happen at any time).
> 
> Riaan
> 

Having spent some time researching this, and with some help from Red Hat 
Support, here is an attempt at an answer. I use power-fencing. Some of 
these might not apply to I/O fencing:


post_fail_delay

Pros:
- single place to change it (cluster.conf) makes it global across the 
cluster
- If a failed node is detected, resources will be relocated immediately 
(instead of waiting for the deadnode_timeout to be reached and only then 
relocating)
- use case: post-kernel panic, when you need to capture a disk-/netdump

Cons:
- Fence daemon needs to be restarted to apply the change (i.e. in all 
likelihood you need to reboot all nodes)
- Slight annoyance: depending on how long you set the post_fail_delay, a 
node may be restarting already, and is then fenced, requiring another 
restart.
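
For reference, a minimal sketch of where this setting lives and how to 
check it. The attribute sits on the fence_daemon element of cluster.conf. 
The value 300, the default path /etc/cluster/cluster.conf and the helper 
name get_post_fail_delay are assumptions for illustration, not taken 
from this thread:

```shell
# The setting is an attribute of the fence_daemon element, e.g.:
#   <fence_daemon post_fail_delay="300" post_join_delay="3"/>
# (300 and 3 are illustrative values, in seconds.)

# ASSUMPTION: usual location of the cluster configuration file.
CLUSTER_CONF="${CLUSTER_CONF:-/etc/cluster/cluster.conf}"

# get_post_fail_delay (hypothetical helper): print the configured value,
# or nothing if the attribute is not set explicitly.
get_post_fail_delay() {
    sed -n 's/.*<fence_daemon[^>]*post_fail_delay="\([0-9]*\)".*/\1/p' "$CLUSTER_CONF"
}
```

Running get_post_fail_delay on each node is a quick way to confirm that 
every node sees the same value after the file has been propagated.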


deadnode_timeout

Pros:
- can be set dynamically
- useful if you have warning that the problem will materialize (we have 
a scenario like that)
- use case: when you need to run "sysrq t" or some other intrusive command 
which would otherwise cause the node to be fenced: increase, sysrq, decrease

Cons:
- needs to be set on every node individually
- Not persistent across reboots. The cman init script has to be hacked to 
reapply it.
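
The increase / sysrq / decrease sequence can be wrapped in a small 
helper, sketched below. It assumes the tunable is exposed at 
/proc/cluster/config/cman/deadnode_timeout (adjust DEADNODE_FILE if your 
cman version puts it elsewhere); the helper name with_deadnode_timeout 
is made up for this example:

```shell
# ASSUMPTION: location of the tunable; override DEADNODE_FILE if it differs.
DEADNODE_FILE="${DEADNODE_FILE:-/proc/cluster/config/cman/deadnode_timeout}"

# with_deadnode_timeout SECONDS COMMAND [ARGS...]   (hypothetical helper)
# Raises the timeout, runs the command, then restores the previous value
# and returns the command's exit status.
with_deadnode_timeout() {
    new_timeout="$1"; shift
    old_timeout=$(cat "$DEADNODE_FILE") || return 1
    echo "$new_timeout" > "$DEADNODE_FILE" || return 1
    "$@"
    status=$?
    # Restore the original value even if the command failed.
    echo "$old_timeout" > "$DEADNODE_FILE"
    return "$status"
}
```

Example: with_deadnode_timeout 300 sh -c 'echo t > /proc/sysrq-trigger'. 
The helper only changes the local node, so the timeout still has to be 
raised on the other nodes for the duration of the command.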


corrections/additions welcome