[Cluster-devel] Patch: making DLM more robust

David Teigland teigland at redhat.com
Tue Nov 30 17:30:51 UTC 2010


On Tue, Nov 30, 2010 at 05:57:50PM +0100, Menyhart Zoltan wrote:
> Hi,
> 
> An easy first step to make DLM more robust can be adding time out protection
> to the lock space creation operation, while waiting for a "dlm_controld" action.
> A new member "ci_dlm_controld_secs" is added to "dlm_config" to set the time out
> in seconds; the default, DEFAULT_DLM_CTRL_SECS, is 5 seconds.
> 
> At the same time, signals can be enabled and handled, too.
> 
> DLM_USER_CREATE_LOCKSPACE will be able to return new error codes:
> -EINTR or -ETIMEDOUT.
> 
> Could you please tell me why the signals are blocked within "device_write()"?
> I think it is safe to allow signals, at least in the original code sequences
> that wait in an uninterruptible way.

Thanks, I'll take a look; as long as it's disabled by default I don't
expect I'd object much.  There are two main problems with this idea,
though, that need to be handled before it's generally usable:

1. The kernel can wait on user space indefinitely during completely normal
situations, e.g. the loss of quorum or fencing failures can delay
completion indefinitely.  This means you can easily introduce false
failures when using a timeout.  EINTR, since it's driven by user
intervention, is a better idea, e.g. killing a mount process.

2. The difficulty, even with EINTR, is correctly and cleanly unwinding the
dlm_controld state.
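
For illustration only, here is a rough user-space sketch of how the result
of a wait_event_interruptible_timeout()-style wait could be mapped onto the
two proposed error codes.  The helper name map_wait_result() is hypothetical
and not taken from the patch; it only assumes the usual kernel convention
that such a wait returns a positive value when the condition became true,
0 when the timeout expired, and a negative value when a signal arrived.

```c
#include <errno.h>

/*
 * Hypothetical helper (not from the patch): translate the return value
 * of a wait_event_interruptible_timeout()-style wait into the error
 * codes proposed for DLM_USER_CREATE_LOCKSPACE.
 */
int map_wait_result(long rv)
{
	if (rv > 0)
		return 0;		/* condition became true in time */
	if (rv == 0)
		return -ETIMEDOUT;	/* timeout expired, condition still false */
	return -EINTR;			/* negative: the wait was interrupted by a signal */
}
```

Note that this covers only the return code.  It does nothing about the two
problems above: a -ETIMEDOUT here may still be a false failure, and either
error path still has to unwind the dlm_controld state.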

Dave



