[Linux-cluster] [PATCH] WTI RSM serial passthrough for fence_wti

Thu Mar 15 20:00:04 UTC 2007

Problem statement:

Fencing fails if WTI device loses network connectivity.  It sure would
be Nice(tm) to have a backup method to access the fence device.

Solution:

With this patch, you can use your WTI RSM serial port server as a backup
method to access your WTI IPS/NPS/NBB/TPS/etc. remote power controller in
the event that the IPS/*/etc. loses its network connectivity (I'll just
say "IPS" from now on to mean "all of 'em").

IMPORTANT: If you want it to work with your particular serial port
server (that is not a WTI RSM), I welcome your patch or your
hardware. ;)

This patch was tested with an IPS-800 as the back-end and an RSM-8 as the
front-end.  It *should* work with any RSM and any supported WTI remote
power switch (e.g. NPS, IPS, NBB, TPS series).

Configuration notes:

* You must enable "Direct Connect" access for the port connected to the
IPS (/P [number], option 31).  It should be set to "On - Password".

* The Perl Net::Telnet module does not work with the RSM's standard
telnet port; I don't know why (nor do I care to figure it out :) ).
Because of this, you must enable "Raw Socket Access" (/N, option 31);
the standard direct access telnet port will not work.

* You may use a script to retrieve the RSM password (use the
rsm_passwd_script option instead of rsm_passwd).

* You must add a new fence device (manually) to the cluster.conf
(example below).  There is no UI support for this.

...
                <clusternode name="green" nodeid="2" votes="1">
                        <multicast addr="225.0.0.12" interface="eth0"/>
                        <fence>
                                <method name="1">
                                        <device name="ips-rack9"
                                                port="2"/>
                                </method>
                                <method name="2">
                                        <device name="ips-rack9-backup"
                                                port="2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <fencedevices>
               <fencedevice agent="fence_wti" 
                            name="ips-rack9"
                            passwd="wti"
                            ipaddr="ips-rack9"/>
               <fencedevice agent="fence_wti"
                            name="ips-rack9-backup"
                            passwd="wti"
                            ipaddr="rsm-rack9"
                            rsm_enable="1"
                            rsm_login="super"
                            rsm_passwd="super"
                            tcpport="3108"/>
        </fencedevices>
...

In my case, I plugged the IPS into port 8 on the RSM - so the raw port
becomes '3108'.  Connecting to the raw port directly reduces the
complexity of the patch (when compared to parsing RSM output).  Note that
the passwd, agent, and port (plug number) stay the same, but the host
we're talking to changes, and the rsm_* options and tcpport are added so
the agent can get through the RSM before it even reaches the IPS.
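
The port arithmetic above can be sketched in a few lines of Python.  This
is only an illustration: the 3100 base is an assumption inferred from the
single data point in this mail (RSM port 8 gives raw port 3108), and
rsm_raw_port is a made-up helper, not something in fence_wti.  Check the
/N settings on your own RSM before trusting the number.

```python
# Assumed base for the RSM's raw socket ports.  Inferred from the one
# example in this mail (serial port 8 -> TCP 3108), not from WTI docs.
ASSUMED_RAW_BASE = 3100


def rsm_raw_port(serial_port, base=ASSUMED_RAW_BASE):
    """Return the value to put in tcpport= for a given RSM serial port."""
    if serial_port < 1:
        raise ValueError("serial port numbers start at 1")
    return base + serial_port


if __name__ == "__main__":
    # The IPS in the example is plugged into RSM port 8.
    print(rsm_raw_port(8))
```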

How to test:

* Apply patch to CVS/head; rebuild / install fence_wti.

* Configure your cluster similarly to the example above (don't forget to
run ccs_tool update).

* Pull the network jack from your WTI power switch.

* Run fence_node <nodename>.  The first fence level will fail, but the
second one should succeed.
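
Before pulling the cable, it may be worth confirming that the RSM's raw
port accepts TCP connections at all.  Here is a minimal sketch of such a
check; raw_port_reachable is a hypothetical helper, and the host name and
port are simply the ones from the example cluster.conf.  It only opens
and closes a connection; it does not log in to the RSM or the IPS.

```python
import socket


def raw_port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


if __name__ == "__main__":
    # Host name and port taken from the example cluster.conf above.
    ok = raw_port_reachable("rsm-rack9", 3108)
    print("raw port reachable" if ok else "raw port NOT reachable")
```

If this reports the port as unreachable, the second fence level has no
chance of working, so fix that before testing with fence_node.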

Other information:

* This patch has been tested with the same RSM with direct access for
the port set to 'no password' as well.  In this case, the relevant extra
fence device would more or less look like the following:

...
               <fencedevice agent="fence_wti"
                            name="ips-rack9-backup"
                            passwd="wti"
                            ipaddr="rsm-rack9"
                            serial="1"
                            tcpport="3108"/>
...

I do not recommend this configuration because it is possible to log out
of the RSM without being logged out of the IPS.  The effect is that
someone might be able to reconnect to the existing IPS session without
being prompted for any password (that's bad!).  The upside is that raw
serial mode *MIGHT* work with other serial port server appliances which
support raw / unpassworded / direct telnet access.

FAQ:

Q: What units was this patch developed on?
A: WTI RSM-8 & WTI IPS-800

Q: What versions of Linux-cluster will this work with?
A: Currently, just CVS - this is a new patch.

Q: The RSM supports SSH - why is there no SSH support?
A: Because it would increase the complexity of the fence_wti agent
significantly.  Patches accepted.

Q: Why not just have two hosts (rsm_host, for example) and do the
retrying internally?
A: fenced already retries and already has provisions for doing this;
implementing it within fence agents is redundant (not to mention, it
adds needless complexity).
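
To illustrate the point: with two <method> blocks in cluster.conf, the
fallback logic lives in one place instead of in every agent.  The sketch
below is a toy model of that idea only; it is not fenced's code, and the
function and stand-in callables are invented for the example.

```python
def fence_with_fallback(methods):
    """Try each (name, action) pair in order; return the name that worked.

    Each action is a callable returning True on success.  Returns None
    if every method fails.
    """
    for name, action in methods:
        try:
            if action():
                return name
        except Exception:
            # Treat an agent that blows up the same as one that fails.
            pass
    return None


if __name__ == "__main__":
    def via_network():
        # Stand-in for ips-rack9 with its network cable pulled.
        return False

    def via_rsm():
        # Stand-in for ips-rack9-backup going through the RSM.
        return True

    print(fence_with_fallback([("1", via_network), ("2", via_rsm)]))
```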

Q: Why did you choose the WTI RSM?
A: First, because people like solutions from as few vendors as possible
- so, the RSM + IPS was a natural fit in that regard.  Second, because
it's what I have on-hand.

Q: Why don't you support the [insert your favorite serial server here]?
A: Because I don't have one and because it adds even more complexity to
the agent.  Taking patches (should amount to changing the login/password
expect strings in the rsm-login section...).

-- Lon

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fence-wti-rsm-serial-passthrough.patch
Type: text/x-patch
Size: 4968 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20070315/62af5335/attachment.bin>