[Linux-cluster] need help - Fencing problem

umesh susvirkar susvirkar.3616 at gmail.com
Mon Sep 13 16:37:59 UTC 2010


Hi,

From your cluster.conf file:


<?xml version="1.0"?>
<clusternode name="node2.drctmb.com" nodeid="1" votes="1">
<clusternode name="node1.drctmb.com" nodeid="2" votes="1">
<fencedevices>
        <fencedevice agent="fence_ilo" hostname="node1.drctmb.com" login="root" name="NODE1" passwd="redhat123"/>
        <fencedevice agent="fence_ilo" hostname="node2.drctmb.com" login="root" name="NODE2" passwd="redhat123"/>
</fencedevices>

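Your node names and fence device hostnames are the same. They should be
different: the hostname of each fencedevice has to be the address of that
node's iLO, not the address of the node itself.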

As you mentioned, the following command is working:

fence_ilo -a "IP" -l "login" -p "Pass" -o status

Replace the hostname of each fencedevice with the IP you specify with the -a
option, then check again.
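
As a rough sketch of that change, the fencedevices section could look
something like the following. The two addresses are placeholders from the
documentation range (192.0.2.0/24), not values taken from this thread; put in
the real iLO IP of each node, i.e. the one that works with -a on the command
line. Everything else is copied unchanged from your file.

```xml
<fencedevices>
        <!-- hostname now points at the iLO of node1, not at node1 itself.
             192.0.2.11 is a placeholder, not an address from this thread. -->
        <fencedevice agent="fence_ilo" hostname="192.0.2.11" login="root" name="NODE1" passwd="redhat123"/>
        <!-- same idea for node2; 192.0.2.12 is likewise a placeholder -->
        <fencedevice agent="fence_ilo" hostname="192.0.2.12" login="root" name="NODE2" passwd="redhat123"/>
</fencedevices>
```

Remember to increase config_version in the <cluster> tag when you change the
file, and then try fence_node on a node name again.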

On Mon, Sep 13, 2010 at 1:14 PM, Girish Prajapati <girishpati at yahoo.com> wrote:

> Hello,
>
> Any update sir ?
>
>
>
>  ------------------------------
> *From:* Girish Prajapati <girishpati at yahoo.com>
>
> *To:* linux clustering <linux-cluster at redhat.com>
> *Sent:* Fri, September 10, 2010 11:32:35 AM
>
> *Subject:* Re: [Linux-cluster] need help - Fencing problem
>
>  Hello Sir,
>
>  The 1st and 2nd options passed successfully.
> I also tried running the command with the iLO's name and it ran successfully,
> so there is no DNS issue.
>
> i) When I try to run the fence_node command I get the following error:
>
> [root@node1 ~]# fence_node node2.drctmb.com
> agent "fence_ilo" reports: Unable to connect/login to fencing device
>
> ii) When I try to fence through Luci I get the following error:
>
> Sep 10 11:13:10 tmb luci[24270]: Unable to retrieve batch 1700106142 status
> from node2.drctmb.com:11111: fence_node failed:
>
> Please let me know if there is any other way to troubleshoot this.
>
> Thank you.
>
> Regards,
> Girishkumar
>
>  ------------------------------
> *From:* Ben Turner <bturner at redhat.com>
> *To:* linux clustering <linux-cluster at redhat.com>
> *Sent:* Thu, September 9, 2010 9:28:45 PM
> *Subject:* Re: [Linux-cluster] need help - Fencing problem
>
> Judging from:
>
> "Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" reports: Unable
> to connect/login to fencing device"
>
> Chances are you are not using the correct username/password/IP or the ilo
> is not configured for telnet logins.  Try the following:
>
> 1.  Login to the ilo via telnet from the command line.  Be sure to use the
> username/password/IP you have in cluster.conf.
>
> 2.  If that is successful try:
>
> # fence_ilo -v -a "Ilo IP from cluster.conf" -l "Ilo user from
> cluster.conf" -p "Ilo passwd from cluster.conf" -o status
>
> The -v will display exactly what the fence agent sees and is very useful
> for debugging failing fences.  If the status fails send me the output.
>
> 3.  If the fence_ilo status check is successful, try:
>
> # fence_node <node name from cluster.conf>
>
> If all 3 are successful then fencing is set up properly and there may be a
> problem running it from Luci. If any of the 3 fail, post the error back to
> the list and I'll look at it.
>
> -Ben
>
>
>
>
>
> ----- "Girish Prajapati" <girishpati at yahoo.com> wrote:
>
> > Hello,
> > I can run the following command successfully from another node, but I am
> > still getting the same error message:
> >
> > fence_ilo -a "Ilo IP" -l "Ilo user" -p "Ilo passwd" -o reboot
> >
> > Sep 9 14:37:00 node2 openais[2904]: [CLM ] Members Joined:
> > Sep 9 14:37:00 node2 openais[2904]: [SYNC ] This node is within the
> > primary component and will provide service.
> > Sep 9 14:37:00 node2 openais[2904]: [TOTEM] entering OPERATIONAL
> > state.
> > Sep 9 14:37:00 node2 openais[2904]: [CLM ] got nodejoin message
> > 192.168.0.28
> > Sep 9 14:37:00 node2 openais[2904]: [CPG ] got joinlist message from
> > node 1
> > Sep 9 14:37:00 node2 fenced[2923]: node1.drctmb.com not a cluster
> > member after 0 sec post_fail_delay
> > Sep 9 14:37:00 node2 fenced[2923]: fencing node "node1.drctmb.com"
> > Sep 9 14:37:10 node2 fenced[2923]: agent "fence_ilo" reports: Unable
> > to connect/login to fencing device
> > Sep 9 14:37:10 node2 fenced[2923]: fence "node1.drctmb.com" failed
> > Sep 9 14:37:15 node2 fenced[2923]: fencing node "node1.drctmb.com"
> > Sep 9 14:37:26 node2 fenced[2923]: agent "fence_ilo" reports: Unable
> > to connect/login to fencing device
> >
> > node1 rebooted and reconnected to the cluster, but now my webby service is
> > not working. See the log below:
> >
> > Broadcast message from root (Thu Sep 9 14:32:41 2010):
> > The system is going down for system halt NOW!
> > Sep 9 14:19:22 node1 last message repeated 17 times
> > Sep 9 14:32:41 node1 shutdown[25506]: shutting down for system halt
> > Sep 9 14:32:41 node1 pcscd: winscard.c:304:SCardConnect() Reader
> > E-Gate 0 0 Not Found
> > Sep 9 14:32:43 node1 modclusterd: shutdown succeeded
> > Sep 9 14:32:43 node1 rgmanager: [25593]: <notice> Shutting down
> > Cluster Service Manager...
> > Sep 9 14:32:43 node1 clurgmgrd[3457]: <notice> Shutting down
> > Sep 9 14:32:43 node1 clurgmgrd[3457]: <notice> Shutting down
> > Sep 9 14:32:43 node1 clurgmgrd[3457]: <notice> Stopping service
> > service:webby
> > Sep 9 14:32:44 node1 avahi-daemon[3378]: Withdrawing address record
> > for 192.168.0.30 on eth0.
> > Read from remote host node1: Connection reset by peer
> > .
> > .
> > .
> > Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/hda, packet devices
> > [this device CD/DVD] not SMART capable
> > Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/sda, opened
> > Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/sda, IE (SMART) not
> > enabled, skip device Try 'smartctl -s on /dev/sda' to turn on SMART
> > features
> > Sep 9 14:35:42 node1 smartd[3585]: Monitoring 0 ATA and 0 SCSI devices
> > Sep 9 14:35:42 node1 smartd[3604]: smartd has fork()ed into background
> > mode. New PID=3604.
> > Sep 9 14:35:42 node1 avahi-daemon[3412]: Service "SFTP File Transfer
> > on node1" (/services/sftp-ssh.service) successfully established.
> > Sep 9 14:35:45 node1 pcscd: winscard.c:304:SCardConnect() Reader
> > E-Gate 0 0 Not Found
> > Sep 9 14:35:45 node1 last message repeated 3 times
> > Sep 9 14:35:45 node1 kernel: mtrr: type mismatch for d8000000,2000000
> > old: uncachable new: write-combining
> > Sep 9 14:35:46 node1 clurgmgrd: [3491]: <err> Checking Existence Of
> > File /var/run/cluster/apache/apache:httpd.pid [apache:httpd] > Failed
> > - File Doesn't Exist
> >
> >
> >
> > It seems that there is a problem in the fencing device configuration.
> > Please find my cluster.conf here:
> >
> >
> > <?xml version="1.0"?>
> > <cluster alias="girish" config_version="21" name="girish">
> > <fence_daemon clean_start="0" post_fail_delay="0"
> > post_join_delay="3"/>
> > <clusternodes>
> > <clusternode name="node2.drctmb.com" nodeid="1" votes="1">
> > <fence>
> > <method name="1">
> > <device name="NODE2"/>
> > </method>
> > </fence>
> > </clusternode>
> > <clusternode name="node1.drctmb.com" nodeid="2" votes="1">
> > <fence>
> > <method name="1">
> > <device name="NODE1"/>
> > </method>
> > </fence>
> > </clusternode>
> > </clusternodes>
> > <cman expected_votes="1" two_node="1"/>
> > <fencedevices>
> > <fencedevice agent="fence_ilo" hostname="node1.drctmb.com"
> > login="root" name="NODE1" passwd="redhat123"/>
> > <fencedevice agent="fence_ilo" hostname="node2.drctmb.com"
> > login="root" name="NODE2" passwd="redhat123"/>
> > </fencedevices>
> > <rm>
> > <failoverdomains>
> > <failoverdomain name="prefer_node1" nofailback="0" ordered="1"
> > restricted="1">
> > <failoverdomainnode name="node2.drctmb.com" priority="2"/>
> > <failoverdomainnode name="node1.drctmb.com" priority="1"/>
> > </failoverdomain>
> > </failoverdomains>
> > <resources>
> > <fs device="/dev/sda1" force_fsck="0" force_unmount="0" fsid="8669"
> > fstype="ext3" mountpoint="/var/www/html" name="docroot"
> > self_fence="0"/>
> > <ip address="192.168.0.30" monitor_link="1"/>
> > <apache config_file="conf/httpd.conf" name="httpd"
> > server_root="/etc/httpd" shutdown_wait="5"/>
> > </resources>
> > <service autostart="1" domain="prefer_node1" exclusive="0"
> > name="webby" recovery="relocate">
> > <ip ref="192.168.0.30"/>
> > <fs ref="docroot"/>
> > <apache ref="httpd"/>
> > </service>
> > </rm>
> > <fence_xvmd/>
> > </cluster>
> > ~
> >
> > This is the first time I am working on clustering, so please help me.
> > Appreciate your help.
> >
> > Thank you.
> >
> >
> >
> > From: Brem Belguebli <brem.belguebli at gmail.com>
> > To: linux clustering <linux-cluster at redhat.com>
> > Sent: Thu, September 9, 2010 11:30:28 AM
> > Subject: Re: [Linux-cluster] need help - Fencing problem
> >
> > Try running this from another node of the cluster:
> >
> > fence_ilo -a "Ilo IP" -l "Ilo user" -p "Ilo passwd" -o reboot
> >
> >
> > Additionally, by connecting through HTTP to the iLO, you should be able to
> > see the iLO logs (in the general tab) and see if it is due to a lack of
> > licensing.
> >
> >
> > On Wed, 2010-09-08 at 22:29 -0700, Girish Prajapati wrote:
> > > Hello...
> > >
> > > > I have already configure BIOS for iLO.. but am not sure why i don need
> > > > to shared ??
> > > please anybody can help me out for this problem.
> > > Do i need any extra setup for fencing device ?
> > > thanks
> > >
> > >
> > >
> > >
> > ______________________________________________________________________
> > > From: ESGLinux < esggrupos at gmail.com >
> > > To: linux clustering < linux-cluster at redhat.com >
> > > Sent: Wed, September 8, 2010 2:57:25 PM
> > > Subject: Re: [Linux-cluster] need help - Fencing problem
> > >
> > > Hello,
> > >
> > >
> > > Have you configured the iLO devices entering in the BIOS?
> > >
> > >
> > > > I remember I had to set up the user/pass in the iLO and mark the
> > > > iLO as not shared.
> > >
> > >
> > >
> > >
> > > HTH,
> > >
> > >
> > > ESG
> > >
> > > 2010/9/8 Girish Prajapati < girishpati at yahoo.com >
> > > Hello Everybody,
> > > > I am having a problem fencing a cluster node. Let me explain
> > > > in detail:
> > > > I have installed RHEL 5.4 on HP ProLiant DL280 G5 servers with
> > > > iLO 2 as the fencing device. I am managing the cluster through Luci
> > > > (Conga). It seems everything is working fine. I can reboot
> > > > cluster nodes through Luci and the service gets transferred to the
> > > > other node. After rebooting, the node reconnects to the cluster
> > > > automatically without any error.
> > > > The problem is that I cannot fence a node through Luci. When I
> > > > try to fence any node I get the following error:
> > >
> > > Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo"
> > > reports: Unable to connect/login to fencing device
> > > > Sep 8 14:51:16 node2 fence_node[9106]: Fence of
> > > > "node1.drctmb.com" was unsuccessful
> > >
> > > > My iLO license is: iLO 2 Advanced Evaluation
> > > > Do I need a full iLO license, or is there a problem in the
> > > > configuration of the cluster?
> > > > How can I check the cluster log in detail?
> > >
> > > Appreciate your help.
> > > Thank you in advance.
> > >
> > > Regards,
> > > Girishkumar R Prajapati
> > >
> > >
> > >
> > > --
> > > Linux-cluster mailing list
> > > Linux-cluster at redhat.com
> > > https://www.redhat.com/mailman/listinfo/linux-cluster
> > >
> > >
> > >
> > > --
> > > Linux-cluster mailing list
> > > Linux-cluster at redhat.com
> > > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> >
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>