[Linux-cluster] Necessary a delay to restart cman?
Chrissie Caulfield
ccaulfie at redhat.com
Wed May 6 12:01:21 UTC 2009
Miguel Sanchez wrote:
> Hi. I have a CentOS 5.3 cluster with two nodes. If I run service cman
> restart on one node, or stop + start it after a few seconds, the other
> node doesn't recognize that it has rejoined, and keeps showing its peer
> as offline forever.
>
> For example:
>
> * Before cman restart:
>
> node1# cman_tool status
> Version: 6.1.0
> Config Version: 6
> Cluster Name: CSVirtualizacion
> Cluster Id: 42648
> Cluster Member: Yes
> Cluster Generation: 202600
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 1
> Total votes: 2
> Quorum: 1
> Active subsystems: 7
> Flags: 2node Dirty
> Ports Bound: 0
> Node name: patty
> Node ID: 1
> Multicast addresses: 224.0.0.133
> Node addresses: 138.100.8.70
>
> * After cman stop on node2 (before the token timeout has elapsed)
>
> node1# cman_tool status
> Version: 6.1.0
> Config Version: 6
> Cluster Name: CSVirtualizacion
> Cluster Id: 42648
> Cluster Member: Yes
> Cluster Generation: 202600
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 1
> Total votes: 1
> Quorum: 1
> Active subsystems: 7
> Flags: 2node Dirty
> Ports Bound: 0
> Node name: patty
> Node ID: 1
> Multicast addresses: 224.0.0.133
> Node addresses: 138.100.8.70
> Wed May 6 12:29:38 CEST 2009
>
> * After cman stop on node2 (after the token timeout has elapsed)
>
> node1# date; cman_tool status
> Version: 6.1.0
> Config Version: 6
> Cluster Name: CSVirtualizacion
> Cluster Id: 42648
> Cluster Member: Yes
> Cluster Generation: 202604
> Membership state: Cluster-Member
> Nodes: 1
> Expected votes: 1
> Total votes: 1
> Quorum: 1
> Active subsystems: 7
> Flags: 2node Dirty
> Ports Bound: 0
> Node name: patty
> Node ID: 1
> Multicast addresses: 224.0.0.133
> Node addresses: 138.100.8.70
> Wed May 6 12:29:47 CEST 2009
>
> /var/log/messages:
> May 6 12:35:20 node2 openais[17262]: [TOTEM] The token was lost in the
> OPERATIONAL state.
> May 6 12:35:20 node2 openais[17262]: [TOTEM] Receive multicast socket
> recv buffer size (288000 bytes).
> May 6 12:35:20 node2 openais[17262]: [TOTEM] Transmit multicast socket
> send buffer size (262142 bytes).
> May 6 12:35:20 node2 openais[17262]: [TOTEM] entering GATHER state from 2.
> May 6 12:35:25 node2 openais[17262]: [TOTEM] entering GATHER state from 0.
> May 6 12:35:25 node2 openais[17262]: [TOTEM] Creating commit token
> because I am the rep.
> May 6 12:35:25 node2 openais[17262]: [TOTEM] Saving state aru 26 high
> seq received 26
> May 6 12:35:25 node2 openais[17262]: [TOTEM] Storing new sequence id
> for ring 31780
> May 6 12:35:25 node2 openais[17262]: [TOTEM] entering COMMIT state.
> May 6 12:35:25 node2 openais[17262]: [TOTEM] entering RECOVERY state.
> May 6 12:35:25 node2 openais[17262]: [TOTEM] position [0] member
> 10.10.8.70:
> May 6 12:35:25 node2 openais[17262]: [TOTEM] previous ring seq 202620
> rep 10.10.8.70
> May 6 12:35:25 node2 openais[17262]: [TOTEM] aru 26 high delivered 26
> received flag 1
> May 6 12:35:25 node2 openais[17262]: [TOTEM] Did not need to originate
> any messages in recovery.
> May 6 12:35:25 node2 openais[17262]: [TOTEM] Sending initial ORF token
> May 6 12:35:25 node2 openais[17262]: [CLM ] CLM CONFIGURATION CHANGE
> May 6 12:35:25 node2 openais[17262]: [CLM ] New Configuration:
> May 6 12:35:25 node2 openais[17262]: [CLM ] r(0) ip(10.10.8.70)
> May 6 12:35:25 node2 openais[17262]: [CLM ] Members Left:
> May 6 12:35:25 node2 openais[17262]: [CLM ] r(0) ip(10.10.8.71)
> May 6 12:35:25 node2 openais[17262]: [CLM ] Members Joined:
> May 6 12:35:25 node2 openais[17262]: [CLM ] CLM CONFIGURATION CHANGE
> May 6 12:35:25 node2 openais[17262]: [CLM ] New Configuration:
> May 6 12:35:25 node2 openais[17262]: [CLM ] r(0) ip(10.10.8.70)
> May 6 12:35:25 node2 openais[17262]: [CLM ] Members Left:
> May 6 12:35:25 node2 openais[17262]: [CLM ] Members Joined:
> May 6 12:35:25 node2 openais[17262]: [SYNC ] This node is within the
> primary component and will provide service.
> May 6 12:35:25 node2 openais[17262]: [TOTEM] entering OPERATIONAL state.
> May 6 12:35:25 node2 kernel: dlm: closing connection to node 2
> May 6 12:35:25 node2 openais[17262]: [CLM ] got nodejoin message
> 10.10.8.70
> May 6 12:35:25 node2 openais[17262]: [CPG ] got joinlist message from
> node 1
>
>
> If node2 runs cman start before node1 has detected the loss of the
> operational token, node1 sees node2 as offline forever.
> Subsequent cman restarts don't change this state:
> node1# cman_tool nodes
> Node  Sts   Inc     Joined               Name
>    1   M    202616  2009-05-06 12:34:43  node1
>    2   X    202628                       node2
> node2# cman_tool nodes
> Node  Sts   Inc     Joined               Name
>    1   M    202644  2009-05-06 12:51:04  node1
>    2   M    202640  2009-05-06 12:51:04  node2
>
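
The two listings above disagree about node 2: node1 reports it as X while
node2 reports itself as M. A rough way to spot that kind of mismatch
automatically is to compare the "cman_tool nodes" output collected on each
node. The following is only a sketch in Python; the function names are made
up for illustration, and it assumes the column layout shown in your output
(node ID, status letter, incarnation, optional join time, name).

```python
import re

# One data row of `cman_tool nodes`: node ID, status letter, incarnation,
# an optional join timestamp, and the node name.
ROW = re.compile(
    r"^\s*(\d+)\s+([A-Za-z])\s+(\d+)\s+"
    r"(?:(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+)?(\S+)\s*$"
)


def parse_nodes(output):
    """Return {node_id: status_letter} parsed from `cman_tool nodes` text."""
    statuses = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:  # the header line does not start with a number and is skipped
            statuses[int(match.group(1))] = match.group(2)
    return statuses


def disagreements(views):
    """Compare views ({observer: output text}) and return the node IDs
    whose status differs between observers, as {node_id: {observer: status}}."""
    parsed = {observer: parse_nodes(text) for observer, text in views.items()}
    node_ids = sorted({nid for table in parsed.values() for nid in table})
    result = {}
    for nid in node_ids:
        seen = {observer: table.get(nid) for observer, table in parsed.items()}
        if len(set(seen.values())) > 1:
            result[nid] = seen
    return result
```

Fed with the two listings above, disagreements() would flag node 2 only,
since both nodes agree that node 1 is a member.
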
>
> Is a delay between cman stop and cman start necessary to avoid this
> inconsistent state, or is this really a bug?
I suspect it's an instance of this known bug. Check that CentOS has the
appropriate patch available:
https://bugzilla.redhat.com/show_bug.cgi?id=485026
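
If the patch is not available to you yet, then going by your own observation
(the rejoin works when node1 has already noticed that node2 left), a possible
stopgap is to hold off on "cman start" until the surviving node reports one
node fewer. Below is a minimal sketch in Python, not a tested procedure.
fetch_status is a hypothetical callable that you would supply yourself; it
should return the text of "cman_tool status" as seen on the surviving node.
The only thing the code relies on is the "Nodes:" line shown in your output.

```python
import re
import time


def node_count(status_output):
    """Return the integer from the 'Nodes:' line of `cman_tool status`
    output, or None if no such line is present."""
    match = re.search(r"^\s*Nodes:\s*(\d+)\s*$", status_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def wait_for_node_count(fetch_status, expected, timeout=60.0, interval=1.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it reports `expected` nodes.

    Returns True once the count matches, or False if `timeout` seconds
    pass first. `sleep` and `clock` are injectable so the loop can be
    exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if node_count(fetch_status()) == expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

On a two-node cluster you would stop cman on node2, call
wait_for_node_count(fetch_status, expected=1) against node1, and start cman
on node2 only if it returns True. This only works around the symptom; the
patch is still the proper fix.
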
Chrissie