[Cluster-devel] unfencing
David Teigland
teigland at redhat.com
Fri Feb 20 21:44:32 UTC 2009
Fencing devices that do not reboot a node, but simply cut off its storage
access, have always required the impractical manual step of re-enabling that
access after the node has been reset. We've never provided a mechanism to
automate this unfencing.
Below is an outline of how we might automate unfencing with some simple
extensions to the existing fencing library, config scheme and agents. It does
not involve the fencing daemon (fenced). Nodes would unfence themselves when
they start up. We might also consider a scheme where a node is unfenced by
*other* nodes when it starts up, if that has any advantage over
self-unfencing.
cluster3 is the context, but a similar thing would apply to a next generation
unified fencing system, e.g.
https://www.redhat.com/archives/cluster-devel/2008-October/msg00005.html
init.d/cman would run:

  cman_tool join
  fence_node -U <ourname>
  qdiskd
  groupd
  fenced
  dlm_controld
  gfs_controld
  fence_tool join
The new step fence_node -U <name> would call libfence:fence_node_undo(name).
[fence_node <name> currently calls libfence:fence_node(name) to fence a node.]
libfence:fence_node_undo(node_name) logic:

  for each device_name under the given node_name,
    if an unfencedevice exists with name=device_name, then
      run the unfencedevice agent with a first arg of "undo"
      and the other args the normal combination of node and device args
      (any agent used with unfencing must recognize/support "undo")

[logic derived from cluster.conf structure and similar to fence_node logic]
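The selection step could be sketched like this (a Python sketch, not the C
libfence code; the two dicts standing in for the parsed cluster.conf are
assumptions of this illustration):

```python
def unfence_devices(node_name, clusternodes, unfencedevices):
    """Return (device_name, agent) pairs that fence_node_undo would run.

    clusternodes: node name -> list of device names referenced under
    that node's <fence> section (all methods flattened).
    unfencedevices: device name -> agent, from <unfencedevices>.
    """
    runs = []
    for dev in clusternodes.get(node_name, []):
        agent = unfencedevices.get(dev)
        if agent is None:
            continue  # no matching <unfencedevice>: do not unfence it
        runs.append((dev, agent))
    return runs
```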
Example 1:
  <clusternode name="foo" nodeid="3">
          <fence>
                  <method name="1">
                          <device name="san" node="foo"/>
                  </method>
          </fence>
  </clusternode>

  <fencedevices>
          <fencedevice name="san" agent="fence_scsi"/>
  </fencedevices>

  <unfencedevices>
          <unfencedevice name="san" agent="fence_scsi"/>
  </unfencedevices>
fence_node_undo("foo") would:
- fork fence_scsi
- pass arg string: undo node="foo" agent="fence_scsi"
[Note: we've talked about fence_scsi getting a device list from
/etc/cluster/fence_scsi.conf instead of from clvm. It would require
more user configuration, but would create fewer problems and should
be more robust.]
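For an agent to "recognize/support undo", it would need to pick the action
out of the arg string it's given. A sketch of that parsing (the exact way
"undo" is passed, as a bare leading token ahead of the name=value args, is an
assumption taken from the examples in this note):

```python
import shlex

def parse_agent_args(arg_string):
    """Split an agent arg string like
    'undo port="4" agent="fence_brocade" ipaddr="1.1.1.1"'
    into (action, args-dict). A bare first token (no '=') is the
    action; otherwise the agent performs its normal fencing action.
    """
    tokens = shlex.split(arg_string)
    action = "fence"
    if tokens and "=" not in tokens[0]:
        action = tokens.pop(0)
    args = dict(tok.split("=", 1) for tok in tokens)
    return action, args
```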
Example 2:
  <clusternode name="bar" nodeid="4">
          <fence>
                  <method name="1">
                          <device name="switch1" port="4"/>
                          <device name="switch2" port="6"/>
                  </method>
                  <method name="2">
                          <device name="apc" port="4"/>
                  </method>
          </fence>
  </clusternode>

  <fencedevices>
          <fencedevice name="switch1" agent="fence_brocade" ipaddr="1.1.1.1"/>
          <fencedevice name="switch2" agent="fence_brocade" ipaddr="2.2.2.2"/>
          <fencedevice name="apc" agent="fence_apc" ipaddr="3.3.3.3"/>
  </fencedevices>

  <unfencedevices>
          <unfencedevice name="switch1" agent="fence_brocade" ipaddr="1.1.1.1"/>
          <unfencedevice name="switch2" agent="fence_brocade" ipaddr="2.2.2.2"/>
  </unfencedevices>
fence_node_undo("bar") would:
- fork fence_brocade
- pass arg string: undo port="4" agent="fence_brocade" ipaddr="1.1.1.1"
- fork fence_brocade
- pass arg string: undo port="6" agent="fence_brocade" ipaddr="2.2.2.2"
- ignore device "apc" because it's not found under <unfencedevices>
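Building one of those arg strings, the "normal combination of node and device
args", could look like this (again a sketch; the dict inputs stand in for the
attributes parsed from cluster.conf, with the linking "name" attribute
dropped from the final string):

```python
def format_undo_args(device_args, unfence_args):
    """Combine a node's per-device args (e.g. port) with the matching
    <unfencedevice> args (e.g. agent, ipaddr) into the agent arg
    string, prefixed with the "undo" action."""
    merged = {**device_args, **unfence_args}
    merged.pop("name", None)  # "name" only links the two config sections
    return "undo " + " ".join('%s="%s"' % kv for kv in merged.items())
```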