[Linux-cluster] sudden unfencing problem
laurence.schuler
laurence.schuler at nasa.gov
Sun Mar 24 19:42:52 UTC 2013
I will try the tcpdump. I've been able to get fence_sanbox2 to work most
of the time by doing stupid stuff to the Python script (repeating the
issue/expect sequences, increasing the Python delaybeforesend value),
but I don't believe the script code is wrong. Something else is going
on; perhaps the switch is in a funny state. Now, how would I detect
*that*?
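One visible symptom in the quoted verbose output is the prompt appearing
twice on the final line ("r3fc1 #> r3fc1 #> Failed: ..."), which would be
consistent with the script and the switch getting out of step. Below is a
minimal Python sketch for flagging that in a saved transcript. It assumes
the prompt string is "r3fc1 #>" as in the quoted output, and
find_doubled_prompts is a hypothetical helper name, not part of
fence_sanbox2. It only inspects captured text; it does not talk to the
switch.

```python
def find_doubled_prompts(transcript, prompt="r3fc1 #>"):
    """Return (line_number, line) pairs where the prompt occurs more than once.

    A doubled prompt on one line suggests that input was sent (or a newline
    was echoed) before the previous prompt had been consumed.
    The default prompt is an assumption taken from the quoted session.
    """
    hits = []
    for number, line in enumerate(transcript.splitlines(), start=1):
        # str.count counts non-overlapping occurrences of the prompt text
        if line.count(prompt) > 1:
            hits.append((number, line))
    return hits


if __name__ == "__main__":
    sample = (
        "The alarm log is empty.\n"
        "\n"
        "r3fc1 #> r3fc1 #> Failed: Unable to switch to admin section\n"
    )
    for number, line in find_doubled_prompts(sample):
        print("line %d: %s" % (number, line))
```

Running the same check on the output of a manual telnet session that works
might show whether the doubled prompt only appears in the scripted case.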
Thanks,
--Larry
On 03/24/2013 12:00 PM, James Washer wrote:
> You might want to grab a tcpdump of the connection. Perhaps you'll be
> able to see a bit more of the conversation.
>
> On Sat, Mar 23, 2013 at 5:55 AM, Laurence Schuler
> <laurence.schuler at nasa.gov> wrote:
>
> I have a two-node cluster that has been running fine for a couple of
> months (with few or no reboots, though). We recently updated the
> software to the latest CentOS 6 packages, but now the cluster will not
> start. It keeps throwing errors during startup when attempting to
> unfence the disks. I have hard reset the fiber switch and reset both
> hosts, but when I run fence_sanbox2, I am unable to enable, disable, or
> even get the status of the switch ports. This is the error I get:
>
> > [root at web1 lschule3]# /usr/sbin/fence_sanbox2 -a 192.168.1.190 -l
> > admin -S FCpass.sh -o enable -n 5 -v
> > telnet> set binary
> > Negotiating binary mode with remote host.
> > telnet> open 192.168.1.190 -23
> > Trying 192.168.1.190...
> > Connected to 192.168.1.190.
> > Escape character is '^]'.
> >
> > Firmware V8.0.13.8.0
> >
> > r3fc1 login:
> >
> >
> > Establishing connection... Please wait.
> >
> > *****************************************************
> > * *
> > * Command Line Interface SHell (CLISH) *
> > * *
> > *****************************************************
> >
> > SystemDescription SANbox 5800 FC Switch
> > HostName r3fc1
> > EthIPv4NetworkAddr 192.168.1.190
> > EthIPv6NetworkAddr fe80::2c0:00:00:90b
> > MACAddress 00:c0:dd:77:10:0b
> > WorldWideName 10:00:00:c0:dd:24:09:0b
> > SerialNumber 1236H00833
> > SymbolicName r3fc1
> > ActiveSWVersion V8.0.13.8.0
> > ActiveTimestamp Mon Apr 2 18:32:33 2012
> > POSTStatus Passed
> > LicensedPorts 12
> > SwitchMode Full Fabric
> >
> > The alarm log is empty.
> >
> > r3fc1 #> r3fc1 #> Failed: Unable to switch to admin section
> > [root at web1 lschule3]#
>
> I can manually telnet into the FC switch and execute the appropriate
> commands to enable/disable ports, but the fence_sanbox2 script will
> not. The fence_sanbox2 code has not changed; however, python has been
> upgraded from 2.6.6-29 to 2.6.6-36.
>
> Has anyone else seen this? Know of a fix? Am I doing/not doing
> something stupid? I seem to recall running this command before, during
> setup, and it worked just fine then.
>
> Thanks for any help!
>
> --
> Laurence Schuler (Larry)
> Laurence.Schuler at nasa.gov
> Systems Support
> ADNET Systems, Inc
> Scientific Visualization Studio
> http://svs.gsfc.nasa.gov
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>
>
> --
>
>
> - jim