[Linux-cluster] iLO device as fencing device

James Parsons jparsons at redhat.com
Tue Dec 12 21:01:11 UTC 2006


Eric Kerin wrote:

> Coman Iliut wrote:
>
>> We are using it too, with good results. We had to write our own 
>> fencing method. The one supplied is too slow. ILO allows you to send 
>> a RESET command that is faster.
>>
>> Also, we wanted a more efficient use of the secure socket (the 
>> default fence_ilo lets the socket time out, then reconnects, etc) and 
>> we wanted to detect the case when node 1 cannot access the ILO of 
>> node 2 because node 2 is not there anymore (powered off, for example) 
>> or because node 1 lost network access.
>
> How are you differentiating between node 2 being powered off vs the 
> network cord used for iLO on node 2 being unplugged? 

The fence_ilo agent has an optional param that can be included in the 
cluster.conf file: 'force="1"' will have the agent immediately power 
off the node to be fenced, and only then check for status. The default 
action for the agent is to first check status and then begin the fence 
action. Using this parameter reduces fence time using iLO to around 7 
seconds.

Why is this param action not the default action for the agent? All of 
the agents employ a similar approach to fencing: first, status is 
checked to make certain that the node to be fenced is even up, then the 
fence action is made, then the kill is confirmed, then, if the action is 
a reboot, the system is brought up, and then its status is confirmed one 
final time. Any problems along the way are logged. On the iLO, whose SSL 
connection cannot be kept open between actions, building up and tearing 
down the connection 5 times (along with the necessary actions) takes 
about 40 seconds - yuk. Hence the force option, which just shoots it 
first and asks questions later. Most other agents can run through our 
paranoid multiple status check methodology in just a couple of seconds, 
as they use telnet and allow you to keep the connection open between 
actions.
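
To make the connection counting above concrete, here is a rough Python 
sketch. It is not the fence_ilo source; the step names and the 
connections_needed() helper are made up for illustration, and the 
forced sequence only covers the "power off, then check status" part 
described above. It just counts how many times a connection has to be 
set up for each sequence.

```python
# Hypothetical model of the fencing sequences described in this message.
# Not taken from fence_ilo; step names and helper are illustrative only.

# Default reboot: check status, power off, confirm off, power on, confirm on.
DEFAULT_REBOOT = ["status", "off", "status", "on", "status"]

# With force="1" as described here: power off first, then check status.
FORCED_OFF = ["off", "status"]


def connections_needed(steps, persistent):
    """Count connection setups required to run a list of agent steps.

    persistent=True models a telnet-style agent that keeps one session
    open across steps; persistent=False models a device that needs a
    fresh connection for every step.
    """
    if not steps:
        return 0
    return 1 if persistent else len(steps)


if __name__ == "__main__":
    for label, steps in (("default reboot", DEFAULT_REBOOT),
                         ("forced off", FORCED_OFF)):
        print(label,
              "| reconnect per step:", connections_needed(steps, False),
              "| persistent session:", connections_needed(steps, True))
```

The default reboot sequence needs five connection setups when every 
step reconnects, against one for an agent that holds its session open, 
which is where the time goes.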

I wish we could use another connection method for iLO that was faster - 
say, SNMP - but SNMP support in iLO is read-only; you cannot power a 
system down with a MIB command. At least, that was the way it was before 
iLO 2. Maybe things have changed for the better with iLO 2.

For a deeper description of the 'force' param for iLO, please see:
http://sources.redhat.com/cluster/doc/cluster_schema.html

Regards,

-Jim



