[Linux-cluster] Two-node DRBD Active/Passive Fail-Over Cluster.
vincent.blondel at ing.be
Wed Feb 16 05:55:35 UTC 2011
>>> below the cluster.conf file ...
>>>
>>>
>>> <?xml version="1.0"?>
>>> <cluster name="cluster" config_version="6">
>>> <!-- post_join_delay: number of seconds the daemon will wait before
>>> fencing any victims after a node joins the domain
>>> post_fail_delay: number of seconds the daemon will wait before
>>> fencing any victims after a domain member fails
>>> clean_start : prevent any startup fencing the daemon might do.
>>> It indicates that the daemon should assume all nodes
>>> are in a clean state to start. -->
>>> <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>> <clusternodes>
>>> <clusternode name="reporter1.lab.intranet" votes="1" nodeid="1">
>>> <fence>
>>> <!-- Handle fencing manually -->
>>> <method name="human">
>>> <device name="human" nodename="reporter1.lab.intranet"/>
>>> </method>
>>> </fence>
>>> </clusternode>
>>> <clusternode name="reporter2.lab.intranet" votes="1" nodeid="2">
>>> <fence>
>>> <!-- Handle fencing manually -->
>>> <method name="human">
>>> <device name="human" nodename="reporter2.lab.intranet"/>
>>> </method>
>>> </fence>
>>> </clusternode>
>>> </clusternodes>
>>> <!-- cman two nodes specification -->
>>> <cman expected_votes="1" two_node="1"/>
>>> <fencedevices>
>>> <!-- Define manual fencing -->
>>> <fencedevice name="human" agent="fence_manual"/>
>>> </fencedevices>
>>> <rm>
>>> <failoverdomains>
>>> <failoverdomain name="example_pri" nofailback="0" ordered="1" restricted="0">
>>> <failoverdomainnode name="reporter1.lab.intranet" priority="1"/>
>>> <failoverdomainnode name="reporter2.lab.intranet" priority="2"/>
>>> </failoverdomain>
>>> </failoverdomains>
>>> <resources>
>>> <ip address="10.30.30.92" monitor_link="on" sleeptime="10"/>
>>> <apache config_file="conf/httpd.conf" name="example_server" server_root="/etc/httpd" shutdown_wait="0"/>
>>> </resources>
>>> <service autostart="1" domain="example_pri" exclusive="0" name="example_apache" recovery="relocate">
>>> <ip ref="10.30.30.92"/>
>>> <apache ref="example_server"/>
>>> </service>
>>> </rm>
>>> </cluster>
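[A quick offline sanity check of a configuration like the one above can be scripted. This is only a sketch using the Python standard library, not an official RHCS validation tool (that would be `ccs_config_validate`); the embedded XML is a trimmed copy of the config quoted above.]

```python
# Sketch: offline sanity check of a two-node cluster.conf,
# using only the standard library. Not an official RHCS tool.
import xml.etree.ElementTree as ET

CLUSTER_CONF = """\
<cluster name="cluster" config_version="6">
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="reporter1.lab.intranet" votes="1" nodeid="1"/>
    <clusternode name="reporter2.lab.intranet" votes="1" nodeid="2"/>
  </clusternodes>
  <rm>
    <service autostart="1" domain="example_pri" name="example_apache"
             recovery="relocate"/>
  </rm>
</cluster>
"""

def check(conf_xml):
    """Return a list of problems found; an empty list means the basics look sane."""
    root = ET.fromstring(conf_xml)
    problems = []
    cman = root.find("cman")
    nodes = root.findall("./clusternodes/clusternode")
    # cman's special two-node mode requires exactly two nodes
    # and expected_votes="1".
    if cman is not None and cman.get("two_node") == "1":
        if len(nodes) != 2:
            problems.append("two_node=1 but %d nodes defined" % len(nodes))
        if cman.get("expected_votes") != "1":
            problems.append("two_node=1 requires expected_votes=1")
    return problems

print(check(CLUSTER_CONF))  # an empty list for the config above
```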
>>>
>>> and this is the result I get on both servers ...
>>>
>>> [root@reporter1 ~]# clustat
>>> Cluster Status for cluster @ Mon Feb 14 22:22:53 2011
>>> Member Status: Quorate
>>>
>>> Member Name ID Status
>>> ------ ---- ---- ------
>>> reporter1.lab.intranet 1 Online, Local, rgmanager
>>> reporter2.lab.intranet 2 Online, rgmanager
>>>
>>> Service Name Owner (Last) State
>>> ------- ---- ----- ------ -----
>>> service:example_apache (none) stopped
>>>
>>> as you can see, everything is stopped; in other words, nothing runs. So my questions are:
>
>Having a read through /var/log/messages for possible causes would be a
>good start.
>
this is what I see in the /var/log/messages file ...
Feb 16 07:36:54 reporter1 corosync[1250]: [MAIN ] Corosync Cluster Engine ('1.2.3'): started and ready to provide service.
Feb 16 07:36:54 reporter1 corosync[1250]: [MAIN ] Corosync built-in features: nss rdma
Feb 16 07:36:54 reporter1 corosync[1250]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Feb 16 07:36:54 reporter1 corosync[1250]: [MAIN ] Successfully parsed cman config
Feb 16 07:36:54 reporter1 corosync[1250]: [TOTEM ] Initializing transport (UDP/IP).
Feb 16 07:36:54 reporter1 corosync[1250]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Feb 16 07:36:55 reporter1 corosync[1250]: [TOTEM ] The network interface [10.30.30.90] is now up.
Feb 16 07:36:55 reporter1 corosync[1250]: [QUORUM] Using quorum provider quorum_cman
Feb 16 07:36:55 reporter1 corosync[1250]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Feb 16 07:36:55 reporter1 corosync[1250]: [CMAN ] CMAN 3.0.12 (built Aug 17 2010 14:08:49) started
Feb 16 07:36:55 reporter1 corosync[1250]: [SERV ] Service engine loaded: corosync CMAN membership service 2.90
Feb 16 07:36:55 reporter1 corosync[1250]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
Feb 16 07:36:55 reporter1 corosync[1250]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Feb 16 07:36:55 reporter1 corosync[1250]: [SERV ] Service engine loaded: corosync configuration service
Feb 16 07:36:55 reporter1 corosync[1250]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Feb 16 07:36:55 reporter1 corosync[1250]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Feb 16 07:36:55 reporter1 corosync[1250]: [SERV ] Service engine loaded: corosync profile loading service
Feb 16 07:36:55 reporter1 corosync[1250]: [QUORUM] Using quorum provider quorum_cman
Feb 16 07:36:55 reporter1 corosync[1250]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Feb 16 07:36:55 reporter1 corosync[1250]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Feb 16 07:36:55 reporter1 corosync[1250]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 16 07:36:55 reporter1 corosync[1250]: [CMAN ] quorum regained, resuming activity
Feb 16 07:36:55 reporter1 corosync[1250]: [QUORUM] This node is within the primary component and will provide service.
Feb 16 07:36:55 reporter1 corosync[1250]: [QUORUM] Members[1]: 1
Feb 16 07:36:55 reporter1 corosync[1250]: [QUORUM] Members[1]: 1
Feb 16 07:36:55 reporter1 corosync[1250]: [CPG ] downlist received left_list: 0
Feb 16 07:36:55 reporter1 corosync[1250]: [CPG ] chosen downlist from node r(0) ip(10.30.30.90)
Feb 16 07:36:55 reporter1 corosync[1250]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 16 07:36:56 reporter1 fenced[1302]: fenced 3.0.12 started
Feb 16 07:36:57 reporter1 dlm_controld[1319]: dlm_controld 3.0.12 started
Feb 16 07:36:57 reporter1 gfs_controld[1374]: gfs_controld 3.0.12 started
Feb 16 07:37:03 reporter1 corosync[1250]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 16 07:37:03 reporter1 corosync[1250]: [QUORUM] Members[2]: 1 2
Feb 16 07:37:03 reporter1 corosync[1250]: [QUORUM] Members[2]: 1 2
Feb 16 07:37:03 reporter1 corosync[1250]: [CPG ] downlist received left_list: 0
Feb 16 07:37:03 reporter1 corosync[1250]: [CPG ] downlist received left_list: 0
Feb 16 07:37:03 reporter1 corosync[1250]: [CPG ] chosen downlist from node r(0) ip(10.30.30.90)
>>> do I have to manually configure the service IP 10.30.30.92 as an alias IP on both sides, or is it assigned automatically by Red Hat Cluster?
>
>RHCS will automatically assign the IP to an interface that is on the
>same subnet. You most definitely shouldn't create the IP manually on any
>of the nodes.
>
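[The subnet-matching behaviour described here can be illustrated with a small sketch. This is illustrative only; the real logic lives in rgmanager's ip resource agent, which is a shell script. The /24 prefix and interface names below are assumptions, not taken from the thread.]

```python
# Sketch of the interface-selection idea: the service IP is brought up on
# whichever local interface already has an address in the same subnet.
# Illustrative only; rgmanager's actual ip resource agent does this in shell.
import ipaddress

def pick_interface(service_ip, interfaces):
    """interfaces maps name -> 'addr/prefix'; return the first matching name."""
    ip = ipaddress.ip_address(service_ip)
    for name, cidr in interfaces.items():
        if ip in ipaddress.ip_interface(cidr).network:
            return name
    return None

# With the node address seen in the logs (10.30.30.90, /24 assumed for eth0):
print(pick_interface("10.30.30.92",
                     {"eth0": "10.30.30.90/24", "eth1": "192.168.1.5/24"}))
# -> eth0, so 10.30.30.92 would be added on eth0 automatically
```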
>>> I just made a simple try with Apache, but nowhere in the examples do I find a reference to the start/stop script for Apache. Is that normal?
>>> do you have any best practices for this setup?
>
>I'm not familiar with the <apache> tag in cluster.conf; I usually
>configure most things as init script resources.
>
>Gordan
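[For reference, the init-script approach mentioned above would look roughly like this in cluster.conf. This is a sketch, not a tested configuration: the resource name `httpd_init` is invented here, and `/etc/init.d/httpd` is the usual RHEL path but should be verified locally.]

```xml
<rm>
  <resources>
    <ip address="10.30.30.92" monitor_link="on"/>
    <!-- Use the distribution init script instead of the <apache> agent -->
    <script name="httpd_init" file="/etc/init.d/httpd"/>
  </resources>
  <service autostart="1" domain="example_pri" name="example_apache" recovery="relocate">
    <ip ref="10.30.30.92"/>
    <script ref="httpd_init"/>
  </service>
</rm>
```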