[Linux-cluster] need help - Fencing problem

Ben Turner bturner at redhat.com
Tue Sep 14 20:58:44 UTC 2010


Yep, that's what I see too.  The hostname of each fence device should be different from the hostname of the node it fences: hostname= in a fencedevice entry has to point at the iLO, not at the node itself.  You probably used the correct iLO address when you ran fence_ilo manually, which is why those runs succeeded.  Try changing hostname= in cluster.conf to what you used in the tests where the reboot was successful.

-Ben
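
A quick way to spot this kind of mismatch is to compare the clusternode names against the fencedevice hostname= values. The following is only a sketch, not part of the cluster tooling: the function name fence_hostname_collisions is made up for illustration, the sample XML is a trimmed-down stand-in for the posted cluster.conf, and 10.0.0.12 is a hypothetical iLO address. A match is a hint to check the address, not proof of a misconfiguration.

```python
import xml.etree.ElementTree as ET


def fence_hostname_collisions(cluster_conf_xml):
    """Return (fencedevice name, hostname) pairs whose hostname is also
    used as a clusternode name in the same cluster.conf text."""
    root = ET.fromstring(cluster_conf_xml)
    # Collect every node name declared under <clusternodes>.
    node_names = {n.get("name", "").strip() for n in root.iter("clusternode")}
    collisions = []
    for dev in root.iter("fencedevice"):
        host = (dev.get("hostname") or "").strip()
        if host and host in node_names:
            collisions.append((dev.get("name", ""), host))
    return collisions


# Trimmed-down sample; 10.0.0.12 is a hypothetical iLO address.
SAMPLE = """<cluster name="girish">
  <clusternodes>
    <clusternode name="node1.drctmb.com" nodeid="2" votes="1"/>
    <clusternode name="node2.drctmb.com" nodeid="1" votes="1"/>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ilo" hostname="node1.drctmb.com" login="root" name="NODE1" passwd="..."/>
    <fencedevice agent="fence_ilo" hostname="10.0.0.12" login="root" name="NODE2" passwd="..."/>
  </fencedevices>
</cluster>"""

if __name__ == "__main__":
    for name, host in fence_hostname_collisions(SAMPLE):
        print("fencedevice %s uses node name %s as hostname" % (name, host))
```

In the sample only NODE1 is reported, because its hostname= repeats a node name while NODE2 points at a separate address.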


----- "umesh susvirkar" <susvirkar.3616 at gmail.com> wrote:

> Hi,
>
> From your cluster.conf file:
>
> <?xml version="1.0"?>
> <clusternode name="node2.drctmb.com" nodeid="1" votes="1">
> <clusternode name="node1.drctmb.com" nodeid="2" votes="1">
> <fencedevices>
> <fencedevice agent="fence_ilo" hostname="node1.drctmb.com"
> login="root" name="NODE1" passwd="redhat123"/>
> <fencedevice agent="fence_ilo" hostname="node2.drctmb.com"
> login="root" name="NODE2" passwd="redhat123"/>
> </fencedevices>
> 
> 
> Your node names and fence device hostnames are the same; they should be
> different.
>
> As you mentioned, the following command is working:
>
> fence_ilo -a "IP" -l "login" -p "Pass" -o status
>
> Replace the hostname of each fencedevice with the IP you pass to the -a
> option, then check again.
>
> On Mon, Sep 13, 2010 at 1:14 PM, Girish Prajapati
> <girishpati at yahoo.com> wrote:
>
> Hello,
>
> Any update, sir?
>
> From: Girish Prajapati <girishpati at yahoo.com>
> To: linux clustering <linux-cluster at redhat.com>
> Sent: Fri, September 10, 2010 11:32:35 AM
> Subject: Re: [Linux-cluster] need help - Fencing problem
>
> Hello Sir,
> 
> The 1st and 2nd steps passed successfully.
> I also tried running the command with the iLO's name and it ran
> successfully, so there is no DNS issue.
>
> i) When I try to run the fence_node command I get the following error:
> 
> [root at node1 ~]# fence_node node2.drctmb.com
> agent "fence_ilo" reports: Unable to connect/login to fencing device
> 
> ii) When I try to fence through Luci I get the following error:
> 
> Sep 10 11:13:10 tmb luci[24270]: Unable to retrieve batch 1700106142
> status from node2.drctmb.com:11111 : fence_node failed:
> 
> Please let me know if there is any other way to troubleshoot this.
> 
> Thank you.
> 
> Regards,
> Girishkumar
> 
> 
> 
> 
> From: Ben Turner < bturner at redhat.com >
> To: linux clustering < linux-cluster at redhat.com >
> Sent: Thu, September 9, 2010 9:28:45 PM
> Subject: Re: [Linux-cluster] need help - Fencing problem
> 
> Judging from:
> 
> "Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" reports:
> Unable to connect/login to fencing device"
> 
> Chances are you are not using the correct username/password/IP or the
> ilo is not configured for telnet logins. Try the following:
> 
> 1. Login to the ilo via telnet from the command line. Be sure to use
> the username/password/IP you have in cluster.conf.
> 
> 2. If that is successful try:
> 
> # fence_ilo -v -a "Ilo IP from cluster.conf" -l "Ilo user from
> cluster.conf" -p "Ilo passwd from cluster.conf" -o status
> 
> The -v will display exactly what the fence agent sees and is very
> useful for debugging failing fences. If the status fails send me the
> output.
> 
> 3. If the fence_ilo status check is successful, try:
> 
> # fence_node <node name from cluster.conf>
> 
> If all 3 are successful then fencing is set up properly and there may
> be a problem running it from Luci. If any of the 3 fail, post the error
> back to the list and I'll look at it.
> 
> -Ben
> 
> 
> 
> 
> 
> ----- "Girish Prajapati" < girishpati at yahoo.com > wrote:
> 
> > Hello,
> > I can run the following command successfully from another node, but I am
> > still getting the same error message:
> >
> > fence_ilo -a "Ilo IP" -l "Ilo user" -p "Ilo passwd" -o reboot
> >
> > Sep 9 14:37:00 node2 openais[2904]: [CLM ] Members Joined:
> > Sep 9 14:37:00 node2 openais[2904]: [SYNC ] This node is within the
> > primary component and will provide service.
> > Sep 9 14:37:00 node2 openais[2904]: [TOTEM] entering OPERATIONAL
> > state.
> > Sep 9 14:37:00 node2 openais[2904]: [CLM ] got nodejoin message
> > 192.168.0.28
> > Sep 9 14:37:00 node2 openais[2904]: [CPG ] got joinlist message from
> > node 1
> > Sep 9 14:37:00 node2 fenced[2923]: node1.drctmb.com not a cluster
> > member after 0 sec post_fail_delay
> > Sep 9 14:37:00 node2 fenced[2923]: fencing node "node1.drctmb.com"
> > Sep 9 14:37:10 node2 fenced[2923]: agent "fence_ilo" reports: Unable
> > to connect/login to fencing device
> > Sep 9 14:37:10 node2 fenced[2923]: fence "node1.drctmb.com" failed
> > Sep 9 14:37:15 node2 fenced[2923]: fencing node "node1.drctmb.com"
> > Sep 9 14:37:26 node2 fenced[2923]: agent "fence_ilo" reports: Unable
> > to connect/login to fencing device
> >
> > node1 rebooted and reconnected to the cluster, but now my webby service
> > is not working; see the log below:
> >
> > Broadcast message from root (Thu Sep 9 14:32:41 2010):
> > The system is going down for system halt NOW!
> > Sep 9 14:19:22 node1 last message repeated 17 times
> > Sep 9 14:32:41 node1 shutdown[25506]: shutting down for system halt
> > Sep 9 14:32:41 node1 pcscd: winscard.c:304:SCardConnect() Reader
> > E-Gate 0 0 Not Found
> > Sep 9 14:32:43 node1 modclusterd: shutdown succeeded
> > Sep 9 14:32:43 node1 rgmanager: [25593]: <notice> Shutting down
> > Cluster Service Manager...
> > Sep 9 14:32:43 node1 clurgmgrd[3457]: <notice> Shutting down
> > Sep 9 14:32:43 node1 clurgmgrd[3457]: <notice> Shutting down
> > Sep 9 14:32:43 node1 clurgmgrd[3457]: <notice> Stopping service
> > service:webby
> > Sep 9 14:32:44 node1 avahi-daemon[3378]: Withdrawing address record
> > for 192.168.0.30 on eth0.
> > Read from remote host node1: Connection reset by peer
> > .
> > .
> > .
> > Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/hda, packet devices
> > [this device CD/DVD] not SMART capable
> > Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/sda, opened
> > Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/sda, IE (SMART) not
> > enabled, skip device Try 'smartctl -s on /dev/sda' to turn on SMART
> > features
> > Sep 9 14:35:42 node1 smartd[3585]: Monitoring 0 ATA and 0 SCSI
> > devices
> > Sep 9 14:35:42 node1 smartd[3604]: smartd has fork()ed into
> > background mode. New PID=3604.
> > Sep 9 14:35:42 node1 avahi-daemon[3412]: Service "SFTP File Transfer
> > on node1" (/services/sftp-ssh.service) successfully established.
> > Sep 9 14:35:45 node1 pcscd: winscard.c:304:SCardConnect() Reader
> > E-Gate 0 0 Not Found
> > Sep 9 14:35:45 node1 last message repeated 3 times
> > Sep 9 14:35:45 node1 kernel: mtrr: type mismatch for
> > d8000000,2000000 old: uncachable new: write-combining
> > Sep 9 14:35:46 node1 clurgmgrd: [3491]: <err> Checking Existence Of
> > File /var/run/cluster/apache/apache:httpd.pid [apache:httpd] > Failed
> > - File Doesn't Exist
> >
> >
> >
> > It seems that there is a problem in the fencing device configuration.
> > Here is my cluster.conf:
> >
> >
> > <?xml version="1.0"?>
> > <cluster alias="girish" config_version="21" name="girish">
> >   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
> >   <clusternodes>
> >     <clusternode name="node2.drctmb.com" nodeid="1" votes="1">
> >       <fence>
> >         <method name="1">
> >           <device name="NODE2"/>
> >         </method>
> >       </fence>
> >     </clusternode>
> >     <clusternode name="node1.drctmb.com" nodeid="2" votes="1">
> >       <fence>
> >         <method name="1">
> >           <device name="NODE1"/>
> >         </method>
> >       </fence>
> >     </clusternode>
> >   </clusternodes>
> >   <cman expected_votes="1" two_node="1"/>
> >   <fencedevices>
> >     <fencedevice agent="fence_ilo" hostname="node1.drctmb.com"
> >                  login="root" name="NODE1" passwd="redhat123"/>
> >     <fencedevice agent="fence_ilo" hostname="node2.drctmb.com"
> >                  login="root" name="NODE2" passwd="redhat123"/>
> >   </fencedevices>
> >   <rm>
> >     <failoverdomains>
> >       <failoverdomain name="prefer_node1" nofailback="0" ordered="1"
> >                       restricted="1">
> >         <failoverdomainnode name="node2.drctmb.com" priority="2"/>
> >         <failoverdomainnode name="node1.drctmb.com" priority="1"/>
> >       </failoverdomain>
> >     </failoverdomains>
> >     <resources>
> >       <fs device="/dev/sda1" force_fsck="0" force_unmount="0" fsid="8669"
> >           fstype="ext3" mountpoint="/var/www/html" name="docroot"
> >           self_fence="0"/>
> >       <ip address="192.168.0.30" monitor_link="1"/>
> >       <apache config_file="conf/httpd.conf" name="httpd"
> >               server_root="/etc/httpd" shutdown_wait="5"/>
> >     </resources>
> >     <service autostart="1" domain="prefer_node1" exclusive="0"
> >              name="webby" recovery="relocate">
> >       <ip ref="192.168.0.30"/>
> >       <fs ref="docroot"/>
> >       <apache ref="httpd"/>
> >     </service>
> >   </rm>
> >   <fence_xvmd/>
> > </cluster>
> >
> > This is the first time I am working with clustering, so please help me.
> > I appreciate your help.
> >
> > Thank you.
> >
> >
> >
> > From: Brem Belguebli < brem.belguebli at gmail.com >
> > To: linux clustering < linux-cluster at redhat.com >
> > Sent: Thu, September 9, 2010 11:30:28 AM
> > Subject: Re: [Linux-cluster] need help - Fencing problem
> >
> > Try running this from another node of the cluster:
> >
> > fence_ilo -a "Ilo IP" -l "Ilo user" -p "Ilo passwd" -o reboot
> >
> >
> > Additionally, by connecting to the iLO over HTTP, you should be able to
> > see the iLO logs (in the general tab) and check whether the failure is
> > due to a lack of licensing.
> >
> >
> > On Wed, 2010-09-08 at 22:29 -0700, Girish Prajapati wrote:
> > > Hello...
> > >
> > > I have already configured the BIOS for iLO, but I am not sure why it
> > > should not be shared.
> > > Can anybody please help me with this problem?
> > > Do I need any extra setup for the fencing device?
> > > Thanks
> > >
> > > ______________________________________________________________________
> > > From: ESGLinux < esggrupos at gmail.com >
> > > To: linux clustering < linux-cluster at redhat.com >
> > > Sent: Wed, September 8, 2010 2:57:25 PM
> > > Subject: Re: [Linux-cluster] need help - Fencing problem
> > >
> > > Hello,
> > >
> > > Have you configured the iLO devices by entering the BIOS?
> > >
> > > I remember I had to set up the user/pass in the iLO and mark the
> > > iLO as not shared.
> > >
> > > HTH,
> > >
> > > ESG
> > >
> > > 2010/9/8 Girish Prajapati < girishpati at yahoo.com >
> > > Hello Everybody,
> > > I am having a problem fencing a cluster node. Let me explain in
> > > detail:
> > > I have installed RHEL 5.4 on HP ProLiant DL280 G5 servers with
> > > iLO 2 as the fencing device. I am managing the cluster through Luci
> > > (Conga). It seems everything is working fine. I can reboot
> > > cluster nodes through Luci and the service gets transferred to the
> > > other node. After rebooting, the node reconnects to the cluster
> > > automatically without any error.
> > > The problem is that I cannot fence a node through Luci. When I
> > > try to fence any node I get the following error:
> > >
> > > Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo"
> > > reports: Unable to connect/login to fencing device
> > > Sep 8 14:51:16 node2 fence_node[9106]: Fence of
> > > "node1.drctmb.com" was unsuccessful
> > >
> > > My iLO license is: iLO 2 Advanced Evaluation
> > > Do I need a full iLO license, or is there a problem in the
> > > configuration of the cluster?
> > > How can I check the cluster log in detail?
> > >
> > > Appreciate your help.
> > > Thank you in advance.
> > >
> > > Regards,
> > > Girishkumar R Prajapati
> > >
> > >
> > >
> > > --
> > > Linux-cluster mailing list
> > > Linux-cluster at redhat.com
> > > https://www.redhat.com/mailman/listinfo/linux-cluster
> > >
> > >
> > >
> > > --
> > > Linux-cluster mailing list
> > > Linux-cluster at redhat.com
> > > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> >
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster




More information about the Linux-cluster mailing list