[Linux-cluster] Hi
Parvez Shaikh
parvez.h.shaikh at gmail.com
Wed Oct 3 05:23:17 UTC 2012
A curious observation: there has been a sudden surge of emails sent to private
addresses rather than to the mailing list.
Please send your doubts and questions to the mailing list
(linux-cluster at redhat.com) instead of addressing people personally.
Regarding the configuration for manual fencing - I don't have it with me; it
was available with RHEL 5.5. Check in the system-config-cluster tool whether
you can add manual fencing.
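For reference, below is a minimal sketch of what a fence_manual setup looked
like in a RHEL 5-era cluster.conf. I am writing this from memory, so treat the
cluster name, node names, and device name ("human") as placeholders rather than
a tested configuration for your release:

```xml
<?xml version="1.0"?>
<cluster name="testcluster" config_version="1">
  <!-- two_node/expected_votes let a 2-node cluster keep quorum with one vote -->
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="manual">
          <device name="human" nodename="node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="manual">
          <device name="human" nodename="node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <!-- fence_manual: a human operator must confirm the failed node is down -->
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>
```

Note that with manual fencing, recovery blocks until you acknowledge the fence
by hand on the surviving node (fence_ack_manual), which is why it is only
suitable for testing, never production. For your APC question: an APC switched
PDU can be used for power fencing via the fence_apc agent, but a plain UPS
without per-outlet control cannot.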
Thanks,
Parvez
On Wed, Oct 3, 2012 at 10:46 AM, Renchu Mathew <renchumv at gmail.com> wrote:
> Hi Parvez,
>
> I am trying to set up a test cluster environment, but I haven't done
> fencing. Please find the error messages below. Some time after the
> nodes restart, the other node goes down. Can you please send me
> the configuration for manual fencing?
>
>
>> > Please find my cluster setup attached. It is not stable,
>> > and /var/log/messages shows the errors below.
>> >
>> >
>> > Sep 11 08:49:10 node1 corosync[1814]: [QUORUM] Members[2]: 1 2
>> > Sep 11 08:49:10 node1 corosync[1814]: [QUORUM] Members[2]: 1 2
>> > Sep 11 08:49:10 node1 corosync[1814]: [CPG ] chosen downlist:
>> > sender r(0) ip(192.168.1.251) ; members(old:2 left:1)
>> > Sep 11 08:49:10 node1 corosync[1814]: [MAIN ] Completed service
>> > synchronization, ready to provide service.
>> > Sep 11 08:49:11 node1 corosync[1814]: cman killed by node 2 because we
>> > were killed by cman_tool or other application
>> > Sep 11 08:49:11 node1 fenced[1875]: telling cman to remove nodeid 2
>> > from cluster
>> > Sep 11 08:49:11 node1 fenced[1875]: cluster is down, exiting
>> > Sep 11 08:49:11 node1 gfs_controld[1950]: cluster is down, exiting
>> > Sep 11 08:49:11 node1 gfs_controld[1950]: daemon cpg_dispatch error 2
>> > Sep 11 08:49:11 node1 gfs_controld[1950]: cpg_dispatch error 2
>> > Sep 11 08:49:11 node1 dlm_controld[1889]: cluster is down, exiting
>> > Sep 11 08:49:11 node1 dlm_controld[1889]: daemon cpg_dispatch error 2
>> > Sep 11 08:49:11 node1 dlm_controld[1889]: cpg_dispatch error 2
>> > Sep 11 08:49:11 node1 dlm_controld[1889]: cpg_dispatch error 2
>> > Sep 11 08:49:11 node1 dlm_controld[1889]: cpg_dispatch error 2
>> > Sep 11 08:49:11 node1 fenced[1875]: daemon cpg_dispatch error 2
>> > Sep 11 08:49:11 node1 rgmanager[2409]: #67: Shutting down uncleanly
>> > Sep 11 08:49:11 node1 rgmanager[17059]: [clusterfs] unmounting /Data
>> > Sep 11 08:49:11 node1 rgmanager[17068]: [clusterfs] Sending SIGTERM to
>> > processes on /Data
>> > Sep 11 08:49:16 node1 rgmanager[17104]: [clusterfs] unmounting /Data
>> > Sep 11 08:49:16 node1 rgmanager[17113]: [clusterfs] Sending SIGKILL to
>> > processes on /Data
>> > Sep 11 08:49:19 node1 kernel: dlm: closing connection to node 2
>> > Sep 11 08:49:19 node1 kernel: dlm: closing connection to node 1
>> > Sep 11 08:49:19 node1 kernel: dlm: gfs2: no userland control daemon,
>> > stopping lockspace
>> > Sep 11 08:49:22 node1 rgmanager[17149]: [clusterfs] unmounting /Data
>> > Sep 11 08:49:22 node1 rgmanager[17158]: [clusterfs] Sending SIGKILL to
>> > processes on /Data
>> >
>> >
>> >
>> > Also, when I try to restart the cman service, the error below appears.
>> > Starting cluster:
>> > Checking if cluster has been disabled at boot... [ OK ]
>> > Checking Network Manager... [ OK ]
>> > Global setup... [ OK ]
>> > Loading kernel modules... [ OK ]
>> > Mounting configfs... [ OK ]
>> > Starting cman... [ OK ]
>> > Waiting for quorum... [ OK ]
>> > Starting fenced... [ OK ]
>> > Starting dlm_controld... [ OK ]
>> > Starting gfs_controld... [ OK ]
>> > Unfencing self... fence_node: cannot connect to cman
>> > [FAILED]
>> > Stopping cluster:
>> > Leaving fence domain... [ OK ]
>> > Stopping gfs_controld... [ OK ]
>> > Stopping dlm_controld... [ OK ]
>> > Stopping fenced... [ OK ]
>> > Stopping cman... [ OK ]
>> > Unloading kernel modules... [ OK ]
>> > Unmounting configfs... [ OK ]
>> >
>> > Thanks again.
>> > Renchu Mathew
>> > On Tue, Sep 11, 2012 at 9:10 PM, Arun Eapen CISSP, RHCA
>> > <arun at redhat.com> wrote:
>> >
>> > Put fenced in debug mode and copy the error messages so I can
>> > debug them.
>> >
>> > On Tue, 2012-09-11 at 11:52 +0400, Renchu Mathew wrote:
>> > > Hi Arun,
>> > >
>> > > I have done the RH436 course conducted by you at Red Hat
>> > > Bangalore. How are you?
>> > >
>> > > I have configured a 2-node failover cluster setup (almost the
>> > > same as our RH436 lab setup in Bangalore). It is mostly OK except
>> > > for fencing. If I pull the active node's network cable, the
>> > > services do not switch to the other node automatically; the
>> > > cluster hangs and I have to fail them over manually. Is there any
>> > > script for creating dummy fencing in RHCS that will restart or
>> > > shut down the other node? Please find my cluster.conf file
>> > > attached. Is there any way we can power fence using an APC UPS?
>> > >
>> > > Could you please help me if you get some time?
>> > >
>> > > Thanks and regards
>> > > Renchu Mathew
>> > >
>> > >
>> > >
>> >
>> >
>> >
>> > --
>> > Arun Eapen
>> > CISSP, RHC{A,DS,E,I,SS,VA,X}
>> > Senior Technical Consultant & Certification Poobah
>> > Red Hat India Pvt. Ltd.,
>> > No - 4/1, Bannergatta Road,
>> > IBC Knowledge Park,
>> > 11th floor, Tower D,
>> > Bangalore - 560029, INDIA.
>> >
>> >
>> >
>
>
>