[Linux-cluster] Some GDLM questions
David Teigland
teigland at redhat.com
Mon Jul 5 02:39:47 UTC 2004
> I understand the above but it's still not clear to me how a
> locking application would get fenced. On startup the application
> could check that the cluster member has joined the fence domain.
> This will ensure that it gets fenced if something goes wrong.
>
> What's not clear is how the fence process will shut down (or
> suspend) the locking application while fencing the node. Fencing
> seems to be related to blocking access to I/O devices.
I'm not entirely sure what you're asking, but I hope a long and broad answer
will cover it.
Say there's a two-node cluster of nodes A and B.
Both nodes are running cman, fence, dlm and some application using the dlm.
1. node A: hangs and is unresponsive
2. node B: cman detects that A has failed
3. node B: all cluster services are stopped/suspended
   (these services are fence and dlm in this example)
4. node B: while dlm service is stopped, it blocks all lock requests
5. node B: cluster still has quorum because of special "two_node" config
6. node B: fence service is started/enabled
7. node B: fence service fences node A
8. node B: dlm service is started/enabled
9. node B: dlm service recovers the application's lock space and
   lock requests proceed as usual
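
The sequence above can be sketched as a toy model of node B. This is only an
illustration of the ordering, not the real cman/fence/dlm interfaces: the class
ToyNodeB, its methods, and the quorum rule used here (two_node lets a single
surviving node stay quorate; without quorum the services stay stopped) are all
assumptions made for the example.

```python
class ToyNodeB:
    """Hypothetical model of node B; not the real cman/fence/dlm APIs."""

    EXPECTED_NODES = 2  # assumption: the two-node cluster from the example

    def __init__(self, two_node=True):
        self.two_node = two_node
        self.members = {"A", "B"}
        self.fence_running = True
        self.dlm_running = True
        self.fenced = set()
        self.blocked = []   # lock requests held while dlm is stopped
        self.granted = []

    def request_lock(self, resource):
        # step 4: while the dlm service is stopped, requests block
        if not self.dlm_running:
            self.blocked.append(resource)
            return "blocked"
        self.granted.append(resource)
        return "granted"

    def node_failed(self, node):
        # steps 2-3: failure detected, all cluster services stopped
        self.members.discard(node)
        self.fence_running = False
        self.dlm_running = False

    def recover(self, node):
        # step 5: quorum check (assumed rule, see lead-in)
        quorate = self.two_node or len(self.members) > self.EXPECTED_NODES // 2
        if not quorate:
            return False
        # steps 6-7: fence service starts and fences the failed node
        self.fence_running = True
        self.fenced.add(node)
        # steps 8-9: dlm starts, recovers, and blocked requests proceed
        self.dlm_running = True
        while self.blocked:
            self.granted.append(self.blocked.pop(0))
        return True
```

The point of the ordering is that the dlm is only restarted after the fence
step has completed, so no lock is granted on B while A's state is unknown.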
If the fencing method in step 7 only blocks access to i/o devices from node A,
node A could potentially "revive" and continue running. The dlm on node B no
longer accepts A as a member of the lockspace so any dlm messages from A will
be ignored by B.
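
As a rough illustration of "messages from A will be ignored", here is a minimal
sketch of a membership filter. The function deliver and its arguments are
hypothetical names for this example; the real dlm does this inside its own
messaging layer.

```python
def deliver(lockspace_members, sender, message, inbox):
    """Accept a message only if the sender is still a lockspace member.

    Hypothetical helper for illustration; not a dlm function.
    """
    if sender not in lockspace_members:
        return False  # e.g. a revived node A after it was fenced and removed
    inbox.append((sender, message))
    return True


# After recovery, only B remains in the lockspace.
members = {"B"}
inbox = []
deliver(members, "A", "convert lock", inbox)   # dropped
deliver(members, "B", "request lock", inbox)   # accepted
```

Note that this only protects the dlm itself. Anything the application on A does
outside the dlm is not covered by it.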
Depending on the application this may not be sufficient to prevent a revived
node A from causing problems. If so, the simplest thing is to use a fencing
method that resets the power on node A rather than simply blocking its device
i/o.
--
Dave Teigland <teigland at redhat.com>