[Linux-cluster] Hi
Parvez Shaikh
parvez.h.shaikh at gmail.com
Wed Oct 3 05:23:17 UTC 2012
A curious observation: there has been a sudden surge of emails sent to private
addresses rather than to the mailing list.
Please send your doubts and questions to the mailing list
(linux-cluster at redhat.com) instead of addressing people personally.
Regarding the configuration for manual fencing - I don't have it with me; it
was available with RHEL 5.5. Check in the system-config-cluster tool whether
you can add manual fencing.
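For reference, below is a minimal sketch of what a fence_manual setup looked
like in a RHEL 5-era cluster.conf. I am writing this from memory, so treat the
cluster name, node names, and device name ("human") as placeholders rather than
a tested configuration for your release:

```xml
<?xml version="1.0"?>
<cluster name="testcluster" config_version="1">
  <!-- two_node/expected_votes let a 2-node cluster keep quorum with one vote -->
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="manual">
          <device name="human" nodename="node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="manual">
          <device name="human" nodename="node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <!-- fence_manual: a human operator must confirm the failed node is down -->
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>
```

Note that with manual fencing, recovery blocks until you acknowledge the fence
by hand on the surviving node (fence_ack_manual), which is why it is only
suitable for testing, never production. For your APC question: an APC switched
PDU can be used for power fencing via the fence_apc agent, but a plain UPS
without per-outlet control cannot.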
Thanks,
Parvez
On Wed, Oct 3, 2012 at 10:46 AM, Renchu Mathew <renchumv at gmail.com> wrote:
> Hi Parvez,
>
> I am trying to set up a test cluster environment, but I haven't done
> fencing. Please find the error messages below. Some time after the
> nodes restart, the other node goes down. Can you please send me
> the configuration for manual fencing?
>
>
>> > Please find my cluster setup attached. It is not stable,
>> > and /var/log/messages shows the errors below.
>> >
>> >
>> > Sep 11 08:49:10 node1 corosync[1814]: [QUORUM] Members[2]: 1 2
>> > Sep 11 08:49:10 node1 corosync[1814]: [QUORUM] Members[2]: 1 2
>> > Sep 11 08:49:10 node1 corosync[1814]: [CPG ] chosen downlist:
>> > sender r(0) ip(192.168.1.251) ; members(old:2 left:1)
>> > Sep 11 08:49:10 node1 corosync[1814]: [MAIN ] Completed service
>> > synchronization, ready to provide service.
>> > Sep 11 08:49:11 node1 corosync[1814]: cman killed by node 2 because we
>> > were killed by cman_tool or other application
>> > Sep 11 08:49:11 node1 fenced[1875]: telling cman to remove nodeid 2
>> > from cluster
>> > Sep 11 08:49:11 node1 fenced[1875]: cluster is down, exiting
>> > Sep 11 08:49:11 node1 gfs_controld[1950]: cluster is down, exiting
>> > Sep 11 08:49:11 node1 gfs_controld[1950]: daemon cpg_dispatch error 2
>> > Sep 11 08:49:11 node1 gfs_controld[1950]: cpg_dispatch error 2
>> > Sep 11 08:49:11 node1 dlm_controld[1889]: cluster is down, exiting
>> > Sep 11 08:49:11 node1 dlm_controld[1889]: daemon cpg_dispatch error 2
>> > Sep 11 08:49:11 node1 dlm_controld[1889]: cpg_dispatch error 2
>> > Sep 11 08:49:11 node1 dlm_controld[1889]: cpg_dispatch error 2
>> > Sep 11 08:49:11 node1 dlm_controld[1889]: cpg_dispatch error 2
>> > Sep 11 08:49:11 node1 fenced[1875]: daemon cpg_dispatch error 2
>> > Sep 11 08:49:11 node1 rgmanager[2409]: #67: Shutting down uncleanly
>> > Sep 11 08:49:11 node1 rgmanager[17059]: [clusterfs] unmounting /Data
>> > Sep 11 08:49:11 node1 rgmanager[17068]: [clusterfs] Sending SIGTERM to
>> > processes on /Data
>> > Sep 11 08:49:16 node1 rgmanager[17104]: [clusterfs] unmounting /Data
>> > Sep 11 08:49:16 node1 rgmanager[17113]: [clusterfs] Sending SIGKILL to
>> > processes on /Data
>> > Sep 11 08:49:19 node1 kernel: dlm: closing connection to node 2
>> > Sep 11 08:49:19 node1 kernel: dlm: closing connection to node 1
>> > Sep 11 08:49:19 node1 kernel: dlm: gfs2: no userland control daemon,
>> > stopping lockspace
>> > Sep 11 08:49:22 node1 rgmanager[17149]: [clusterfs] unmounting /Data
>> > Sep 11 08:49:22 node1 rgmanager[17158]: [clusterfs] Sending SIGKILL to
>> > processes on /Data
>> >
>> >
>> >
>> > Also, when I try to restart the cman service, the error below appears.
>> > Starting cluster:
>> > Checking if cluster has been disabled at boot... [ OK ]
>> > Checking Network Manager... [ OK ]
>> > Global setup... [ OK ]
>> > Loading kernel modules... [ OK ]
>> > Mounting configfs... [ OK ]
>> > Starting cman... [ OK ]
>> > Waiting for quorum... [ OK ]
>> > Starting fenced... [ OK ]
>> > Starting dlm_controld... [ OK ]
>> > Starting gfs_controld... [ OK ]
>> > Unfencing self... fence_node: cannot connect to cman
>> > [FAILED]
>> > Stopping cluster:
>> > Leaving fence domain... [ OK ]
>> > Stopping gfs_controld... [ OK ]
>> > Stopping dlm_controld... [ OK ]
>> > Stopping fenced... [ OK ]
>> > Stopping cman... [ OK ]
>> > Unloading kernel modules... [ OK ]
>> > Unmounting configfs... [ OK ]
>> >
>> > Thanks again.
>> > Renchu Mathew
>> > On Tue, Sep 11, 2012 at 9:10 PM, Arun Eapen CISSP, RHCA
>> > <arun at redhat.com> wrote:
>> >
>> > Put fenced in debug mode and copy the error messages so I can
>> > debug them.
>> >
>> > On Tue, 2012-09-11 at 11:52 +0400, Renchu Mathew wrote:
>> > > Hi Arun,
>> > >
>> > > I have done the RH436 course conducted by you at Red Hat
>> > > Bangalore. How are you?
>> > >
>> > > I have configured a 2-node failover cluster setup (almost the
>> > > same as our RH436 lab setup in Bangalore). It is mostly OK except
>> > > for fencing. If I pull the active node's network cable, the
>> > > services do not switch to the other node automatically; the
>> > > cluster hangs and I have to fail them over manually. Is there any
>> > > script for creating dummy fencing in RHCS that will restart or
>> > > shut down the other node? Please find my cluster.conf file
>> > > attached. Is there any way we can power fence using an APC UPS?
>> > >
>> > > Could you please help me if you get some time?
>> > >
>> > > Thanks and regards
>> > > Renchu Mathew
>> > >
>> > >
>> > >
>> >
>> >
>> >
>> > --
>> > Arun Eapen
>> > CISSP, RHC{A,DS,E,I,SS,VA,X}
>> > Senior Technical Consultant & Certification Poobah
>> > Red Hat India Pvt. Ltd.,
>> > No - 4/1, Bannergatta Road,
>> > IBC Knowledge Park,
>> > 11th floor, Tower D,
>> > Bangalore - 560029, INDIA.
>> >
>> >
>> >
>
>
>