[Linux-cluster] RHEL Cluster node fencing and cluster membership

POWERBALL ONLINE sakect at gmail.com
Sat Jun 26 16:54:33 UTC 2010


Hi ,

Did you select "Do not fail back" in the failover domain settings?
Which tool did you use to create the cluster: luci or system-config-cluster?
Do you have a quorum disk?
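
To answer these questions from the configuration itself, here is a minimal Python sketch that reads a cluster.conf fragment and reports whether the failover domain is ordered, whether "Do not fail back" is set (assumed here to be the `nofailback` attribute on `<failoverdomain>`, as in the RHEL 5 schema), the node preference order, and whether a quorum disk (`<quorumd>`) is configured. Only the node names and priorities come from the cluster.conf lines quoted below; the domain name `prefer_237` and the other attribute values are hypothetical. To check a real node, pass the contents of /etc/cluster/cluster.conf instead of the sample string.

```python
import xml.etree.ElementTree as ET

# Hypothetical cluster.conf fragment: the domain name and the ordered/
# nofailback values are assumptions; node names and priorities are from the post.
CLUSTER_CONF = """
<cluster name="cluster1" config_version="32">
  <cman two_node="1" expected_votes="1"/>
  <rm>
    <failoverdomains>
      <failoverdomain name="prefer_237" ordered="1" restricted="0" nofailback="0">
        <failoverdomainnode name="usrylxap237.merck.com" priority="1"/>
        <failoverdomainnode name="usrylxap238.merck.com" priority="2"/>
      </failoverdomain>
    </failoverdomains>
  </rm>
</cluster>
"""


def summarize(conf_text):
    """Return a per-domain summary and whether a quorum disk is configured."""
    root = ET.fromstring(conf_text.strip())
    domains = {}
    for dom in root.iter("failoverdomain"):
        # In an ordered domain, a lower priority number means more preferred.
        nodes = sorted(
            dom.findall("failoverdomainnode"),
            key=lambda n: int(n.get("priority", "1")),
        )
        domains[dom.get("name")] = {
            "ordered": dom.get("ordered") == "1",
            "nofailback": dom.get("nofailback") == "1",
            "preference": [n.get("name") for n in nodes],
        }
    has_qdisk = root.find(".//quorumd") is not None
    return domains, has_qdisk


if __name__ == "__main__":
    domains, has_qdisk = summarize(CLUSTER_CONF)
    for name, info in domains.items():
        print(name, info)
    print("quorum disk configured:", has_qdisk)
```

If `nofailback` is "0" or absent in an ordered domain, rgmanager would be expected to move services back to usrylxap237 whenever it rejoins, which matches the behaviour described after the reboot.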

Regards,

Somsak (Linux Specialist HP Thailand)

On Sat, Jun 26, 2010 at 10:39 PM, Rajkumar, Anoop
<anoop_rajkumar at merck.com> wrote:

>  Hi
>
> I have two DL585 servers with shared storage from an MSA 1000 in a two-node
> RHEL 5.3 cluster. The priorities in cluster.conf are as follows:
>
> <failoverdomainnode name="usrylxap237.merck.com" priority="1"/>
> <failoverdomainnode name="usrylxap238.merck.com" priority="2"/>
>
> Whenever the lower-priority node, usrylxap238, is rebooted, it kills cman on
> usrylxap237 (the higher-priority node) and fences it, causing it to reboot.
> The messages I see in /var/log/messages on the higher-priority node are:
>
> Jun 26 11:02:36 usrylxap237 openais[4750]: [CMAN ] cman killed by node 2 because we rejoined the cluster without a full restart
>
> Jun 26 11:03:57 usrylxap237 openais[27373]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application
>
> After the reboot, when the higher-priority node usrylxap237 comes back up, it
> transfers the services from the lower-priority node to itself and everything
> works fine for some time. Then I see the following messages in
> /var/log/messages on the higher-priority node, which is running the services:
>
> Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] The token was lost in the OPERATIONAL state.
> Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
> Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
> Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] entering GATHER state from 2.
> Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] Creating commit token because I am the rep.
> Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] Saving state aru 17 high seq received 17
> Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] Storing new sequence id for ring 420
> Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] entering COMMIT state.
> Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] The token was lost in the COMMIT state.
> Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] entering GATHER state from 4.
> Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] Creating commit token because I am the rep.
> Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] Storing new sequence id for ring 424
> Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] entering COMMIT state.
> Jun 26 09:24:46 usrylxap237 openais[5792]: [TOTEM] The token was lost in the COMMIT state.
> Jun 26 09:24:46 usrylxap237 openais[5792]: [TOTEM] entering GATHER state from 4.
> Jun 26 09:24:46 usrylxap237 openais[5792]: [TOTEM] Creating commit token because I am the rep.
> Jun 26 09:24:46 usrylxap237 openais[5792]: [TOTEM] Storing new sequence id for ring 428
> Jun 26 09:24:46 usrylxap237 openais[5792]: [TOTEM] entering COMMIT state.
> Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] entering RECOVERY state.
> Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] position [0] member 54.3.254.237:
> Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] previous ring seq 1052 rep 54.3.254.237
> Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] aru 17 high delivered 17 received flag 1
> Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] position [1] member 54.3.254.238:
> Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] previous ring seq 1052 rep 54.3.254.237
> Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] aru 17 high delivered 17 received flag 1
> Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] Did not need to originate any messages in recovery.
> Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] Sending initial ORF token
> Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] CLM CONFIGURATION CHANGE
> Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] New Configuration:
> Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ]      r(0) ip(54.3.254.237)
> Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ]      r(0) ip(54.3.254.238)
> Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] Members Left:
> Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] Members Joined:
> Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] CLM CONFIGURATION CHANGE
> Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] New Configuration:
> Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ]      r(0) ip(54.3.254.237)
> Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ]      r(0) ip(54.3.254.238)
> Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] Members Left:
> Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] Members Joined:
> Jun 26 09:24:54 usrylxap237 openais[5792]: [SYNC ] This node is within the primary component and will provide service.
> Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] entering OPERATIONAL state.
> Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] got nodejoin message 54.3.254.237
> Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM  ] got nodejoin message 54.3.254.238
> Jun 26 09:24:54 usrylxap237 openais[5792]: [CPG  ] got joinlist message from node 1
> Jun 26 09:24:54 usrylxap237 openais[5792]: [CPG  ] got joinlist message from node 2
> Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] The token was lost in the OPERATIONAL state.
> Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
> Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
> Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] entering GATHER state from 2.
> Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] Creating commit token because I am the rep.
> Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] Saving state aru 17 high seq received 17
> Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] Storing new sequence id for ring 42c
> Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] entering COMMIT state.
> Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] Creating commit token because I am the rep.
> Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] Storing new sequence id for ring 430
> Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] entering COMMIT state.
> Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] entering GATHER state from 13.
> Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] Creating commit token because I am the rep.
> Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] Storing new sequence id for ring 434
> Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] entering COMMIT state.
> Jun 26 09:25:43 usrylxap237 openais[5792]: [TOTEM] Creating commit token because I am the rep.
> Jun 26 09:25:43 usrylxap237 openais[5792]: [TOTEM] Storing new sequence id for ring 438
> Jun 26 09:25:43 usrylxap237 openais[5792]: [TOTEM] entering COMMIT state.
> Jun 26 09:25:43 usrylxap237 openais[5792]: [TOTEM] entering GATHER state from 13.
> Jun 26 09:25:43 usrylxap237 openais[5792]: [TOTEM] Creating commit token because I am the rep.
>
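
In the log above, the "token was lost" entries on usrylxap237 are spaced about ten seconds apart (09:24:26, 09:24:36, 09:24:46), which is consistent with the totem token timeout expiring again and again while the ring tries to re-form. A minimal Python sketch for pulling those events out of /var/log/messages and printing the gaps; it assumes the openais line format shown above with each entry on a single line (not wrapped as in this mail), and it supplies the year (assumed 2010) because syslog timestamps omit it.

```python
import re
from datetime import datetime

# Matches lines such as:
# "Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] The token was lost in the COMMIT state."
TOKEN_LOST = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) openais\[\d+\]: "
    r"\[TOTEM\] The token was lost in the (?P<state>\w+) state\."
)


def token_loss_gaps(lines, year=2010):
    """Return (events, gaps_in_seconds) for consecutive token-loss entries.

    events is a list of (datetime, totem_state) tuples.
    """
    events = []
    for line in lines:
        m = TOKEN_LOST.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group("state")))
    gaps = [
        (later[0] - earlier[0]).total_seconds()
        for earlier, later in zip(events, events[1:])
    ]
    return events, gaps


# Three entries copied from the usrylxap237 log, unwrapped.
SAMPLE = [
    "Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] The token was lost in the OPERATIONAL state.",
    "Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] The token was lost in the COMMIT state.",
    "Jun 26 09:24:46 usrylxap237 openais[5792]: [TOTEM] The token was lost in the COMMIT state.",
]

if __name__ == "__main__":
    events, gaps = token_loss_gaps(SAMPLE)
    for ts, state in events:
        print(ts.time(), state)
    print("gaps (s):", gaps)
```

Running it over the messages file from each node and comparing the two lists makes it easier to see whether both nodes lose the token at the same moments.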
> On the second node I see:
>
> Jun 26 09:24:26 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 12.
> Jun 26 09:24:26 usrylxap238 openais[5725]: [TOTEM] Saving state aru 17 high seq received 17
> Jun 26 09:24:26 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 420
> Jun 26 09:24:26 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
> Jun 26 09:24:36 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 13.
> Jun 26 09:24:36 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 424
> Jun 26 09:24:36 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
> Jun 26 09:24:46 usrylxap238 openais[5725]: [TOTEM] The token was lost in the COMMIT state.
> Jun 26 09:24:46 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 4.
> Jun 26 09:24:46 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 428
> Jun 26 09:24:46 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
> Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] entering RECOVERY state.
> Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] position [0] member 54.3.254.237:
> Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] previous ring seq 1052 rep 54.3.254.237
> Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] aru 17 high delivered 17 received flag 1
> Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] position [1] member 54.3.254.238:
> Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] previous ring seq 1052 rep 54.3.254.237
> Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] aru 17 high delivered 17 received flag 1
> Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] Did not need to originate any messages in recovery.
> Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] CLM CONFIGURATION CHANGE
> Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] New Configuration:
> Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ]      r(0) ip(54.3.254.237)
> Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ]      r(0) ip(54.3.254.238)
> Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] Members Left:
> Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] Members Joined:
> Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] CLM CONFIGURATION CHANGE
> Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] New Configuration:
> Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ]      r(0) ip(54.3.254.237)
> Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ]      r(0) ip(54.3.254.238)
> Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] Members Left:
> Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] Members Joined:
> Jun 26 09:24:54 usrylxap238 openais[5725]: [SYNC ] This node is within the primary component and will provide service.
> Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] entering OPERATIONAL state.
> Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] got nodejoin message 54.3.254.237
> Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM  ] got nodejoin message 54.3.254.238
> Jun 26 09:24:54 usrylxap238 openais[5725]: [CPG  ] got joinlist message from node 1
> Jun 26 09:24:54 usrylxap238 openais[5725]: [CPG  ] got joinlist message from node 2
> Jun 26 09:25:23 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 12.
> Jun 26 09:25:23 usrylxap238 openais[5725]: [TOTEM] Saving state aru 17 high seq received 17
> Jun 26 09:25:23 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 42c
> Jun 26 09:25:23 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
> Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] The token was lost in the COMMIT state.
> Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 4.
> Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 430
> Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
> Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 13.
> Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 434
> Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
> Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] The token was lost in the COMMIT state.
> Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 4.
> Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 438
> Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
> Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 13.
> Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 43c
> Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
> Jun 26 09:25:53 usrylxap238 openais[5725]: [TOTEM] The token was lost in the COMMIT state.
> Jun 26 09:25:53 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 4.
> Jun 26 09:25:53 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 440
> Jun 26 09:25:53 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
> Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] entering RECOVERY state.
> Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] position [0] member 54.3.254.237:
> Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] previous ring seq 1064 rep 54.3.254.237
> Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] aru 17 high delivered 17 received flag 1
> Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] position [1] member 54.3.254.238:
> Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] previous ring seq 1064 rep 54.3.254.237
> Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] aru 17 high delivered 17 received flag 1
> Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] Did not need to originate any messages in recovery.
> Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] CLM CONFIGURATION CHANGE
> Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] New Configuration:
> Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ]      r(0) ip(54.3.254.237)
> Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ]      r(0) ip(54.3.254.238)
> Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] Members Left:
> Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] Members Joined:
> Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] CLM CONFIGURATION CHANGE
> Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] New Configuration:
> Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ]      r(0) ip(54.3.254.237)
> Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ]      r(0) ip(54.3.254.238)
> Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] Members Left:
> Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] Members Joined:
> Jun 26 09:25:54 usrylxap238 openais[5725]: [SYNC ] This node is within the primary component and will provide service.
> Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] entering OPERATIONAL state.
> Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] got nodejoin message 54.3.254.237
> Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM  ] got nodejoin message 54.3.254.238
> Jun 26 09:25:54 usrylxap238 openais[5725]: [CPG  ] got joinlist message from node 1
> Jun 26 09:25:54 usrylxap238 openais[5725]: [CPG  ] got joinlist message from node 2
>
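
The logs print ring identifiers in two bases: "Storing new sequence id for ring 428" is hexadecimal, while "previous ring seq 1064" is decimal. Converting them shows that 0x41c = 1052 is the ring both nodes were in before the 09:24 reformation, and 0x428 = 1064, the last ring stored at 09:24:46, is what both nodes report as the "previous ring" at 09:25:54. So the two nodes appear to agree on membership at each recovery; the trouble is that the token keeps getting lost afterwards. A small illustrative Python sketch of that check, using lines copied from usrylxap238's log:

```python
import re

# Two number formats appear in the openais logs:
#   "Storing new sequence id for ring 428"  -> hexadecimal
#   "previous ring seq 1064 rep ..."        -> decimal
STORED = re.compile(r"Storing new sequence id for ring ([0-9a-f]+)")
PREVIOUS = re.compile(r"previous ring seq (\d+)")


def ring_ids(lines):
    """Return (stored, previous) ring ids from log lines, both as ints."""
    stored, previous = [], []
    for line in lines:
        m = STORED.search(line)
        if m:
            stored.append(int(m.group(1), 16))
        m = PREVIOUS.search(line)
        if m:
            previous.append(int(m.group(1)))
    return stored, previous


# Lines copied from usrylxap238's log, unwrapped onto single lines.
SAMPLE = [
    "Jun 26 09:24:26 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 420",
    "Jun 26 09:24:36 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 424",
    "Jun 26 09:24:46 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 428",
    "Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] previous ring seq 1052 rep 54.3.254.237",
    "Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] previous ring seq 1064 rep 54.3.254.237",
]

if __name__ == "__main__":
    stored, previous = ring_ids(SAMPLE)
    print("stored:  ", [hex(r) for r in stored])
    print("previous:", [hex(r) for r in previous])
    # The ring stored last before the 09:24:54 recovery is the one
    # reported as "previous" at the next recovery (09:25:54).
    print("last stored == later previous:", stored[-1] == previous[-1])
```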
> Now my cluster is in a bad state, even though clustat and cman_tool show
> everything as fine: I cannot move services between the nodes (they are
> running fine on the current node), and clusvcadm does not even give an error
> message when I try to move them.
>
> [root at usrylxap238 ~]# clustat
> Cluster Status for cluster1 @ Sat Jun 26 11:25:12 2010
> Member Status: Quorate
>
>  Member Name                             ID   Status
>  ------ ----                             ---- ------
>  usrylxap237.merck.com                       1 Online, rgmanager
>  usrylxap238.merck.com                       2 Online, Local, rgmanager
>
>  Service Name                   Owner (Last)                   State
>  ------- ----                   ----- ------                   -----
>  service:http-service           usrylxap237.merck.com          started
>  service:mysql                  usrylxap237.merck.com          started
> [root at usrylxap238 ~]# cman_tool status
> Version: 6.1.0
> Config Version: 32
> Cluster Name: cluster1
> Cluster Id: 26777
> Cluster Member: Yes
> Cluster Generation: 1276
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 1
> Total votes: 2
> Quorum: 1
> Active subsystems: 9
> Flags: 2node Dirty
> Ports Bound: 0 11 177
> Node name: usrylxap238.merck.com
> Node ID: 2
> Multicast addresses: 239.192.104.2
> Node addresses: 54.3.254.238
>
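
The `cman_tool status` output above is a simple `Key: value` listing. A minimal Python sketch (assuming every line keeps that format, as it does here) that turns it into a dictionary so quorum-related fields can be checked from a script; the field names are copied from the output, and the checks are illustrative. With the `2node` flag, Expected votes and Quorum are both 1, so either node on its own is quorate, which is why a communication split between the two nodes can end in a fencing race rather than a loss of quorum.

```python
# Output of `cman_tool status` as quoted above (from usrylxap238).
SAMPLE = """\
Version: 6.1.0
Config Version: 32
Cluster Name: cluster1
Cluster Id: 26777
Cluster Member: Yes
Cluster Generation: 1276
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 9
Flags: 2node Dirty
Ports Bound: 0 11 177
Node name: usrylxap238.merck.com
Node ID: 2
Multicast addresses: 239.192.104.2
Node addresses: 54.3.254.238
"""


def parse_cman_status(text):
    """Split each 'Key: value' line at the first colon into a dict entry."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            status[key.strip()] = value.strip()
    return status


if __name__ == "__main__":
    status = parse_cman_status(SAMPLE)
    flags = status["Flags"].split()
    # In 2node mode one vote is enough for quorum, so each node is
    # quorate on its own and a split can end in a fencing race.
    print("two_node mode:", "2node" in flags)
    print("quorum:", status["Quorum"], "with", status["Total votes"], "total votes")
```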
> I have clvmd running with locking_type = 3 and a GFS2 file system mounted
> (using DLM). It is now hanging on the higher-priority node but is fine on the
> lower-priority node (which now seems not to be part of the cluster).
>
> [root at usrylxap237 ~]# service gfs2 status
> Active GFS2 mountpoints:
> /oracluster1
>
> [root at usrylxap238 ~]# service gfs2 status
> Configured GFS2 mountpoints:
> /oracluster1
> Active GFS2 mountpoints:
> /oracluster1
>
> I am not sure why the cluster is losing membership and going stale, and why
> the GFS2 file system is not accessible.
>
> Thanks
> Anoop
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

