[Linux-cluster] Error messages during Fence operation

James Parsons jparsons at redhat.com
Wed Jan 2 20:34:39 UTC 2008


Randy Brown wrote:

> I forgot to mention: I'm using CentOS 5 with the latest patches and kernel.
>
> Randy Brown wrote:
>
>> I am using an APC Masterswitch Plus as my fencing device.  I am 
>> seeing this in my logs now when fencing occurs:
>>
>> Dec 31 11:36:26 nfs1-cluster fenced[3848]: agent "fence_apc" reports: 
>> Traceback (most recent call last):   File "/sbin/fence_apc", line 
>> 829, in ?     main()   File "/sbin/fence_apc", line 289, in main     
>> do_login(sock)   File "/sbin/fence_apc", line 444, in do_login     i, 
>> mo, txt = sock.expect(regex_list, TELNET_TIMEOUT)
>> Dec 31 11:36:26 nfs1-cluster fenced[3848]: agent "fence_apc" 
>> reports:   File "/usr/lib/python2.4/telnetlib.py", line 620, in 
>> expect     text = self.read_very_lazy()   File 
>> "/usr/lib/python2.4/telnetlib.py", line 400, in read_very_lazy     
>> raise EOFError, 'telnet connection closed' EOFError: telnet 
>> connection closed
>> Dec 31 11:36:26 nfs1-cluster fenced[3848]: fence 
>> "nfs2-cluster.nws.noaa.gov" failed
>>
>> This used to work just fine.  If I run `fence_apc -a 192.168.42.30 -l 
>> cluster -n 1:7 -o Reboot -p <my password>` from the command line, 
>> fencing works as expected.  The relevant lines from my cluster.conf 
>> file are below.  I will gladly provide more information as necessary.
>
Is it possible that you are already logged into the switch over telnet 
from a terminal or similar when the fence attempt takes place? APC 
switches allow only one login at a time, so a second session gets 
dropped, which would explain the "telnet connection closed" EOFError 
in your traceback. I will add a log message that mentions this as a 
possible cause.
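
One rough way to check this from the cluster node is to open a plain TCP 
connection to the switch's telnet port and see whether anything comes 
back. The sketch below is not part of fence_apc; the function name 
probe_telnet and its return labels are made up for illustration, and it 
assumes the switch listens on the default telnet port 23. A "closed" 
result would be consistent with another session holding the login, but 
it does not prove it.

```python
import socket


def probe_telnet(host, port=23, timeout=5.0):
    """Classify how a telnet endpoint reacts to a new connection.

    Hypothetical helper, not part of fence_apc. Returns one of:
      "unreachable" - the TCP connection could not be established
      "closed"      - connected, but the peer closed or reset the
                      connection before sending any data
      "banner"      - the peer sent some bytes (e.g. a login prompt)
      "silent"      - connected, but nothing arrived within the timeout
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, unreachable hosts, connect timeouts.
        return "unreachable"
    try:
        sock.settimeout(timeout)
        try:
            data = sock.recv(1024)
        except socket.timeout:
            return "silent"
        except OSError:
            # For example, a connection reset by the peer.
            return "closed"
        # recv() returning b"" means the peer closed the connection.
        return "banner" if data else "closed"
    finally:
        sock.close()
```

Running probe_telnet("192.168.42.30") once with no other telnet session 
open and once while you are logged into the switch should show whether 
the two cases differ on your hardware.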

If this is not the issue, well, we can keep digging...

-J



