[Linux-cluster] Necessary a delay to restart cman?

Harri.Paivaniemi at tieto.com Harri.Paivaniemi at tieto.com
Wed May 6 14:30:45 UTC 2009


Hi,

Just fyi:

I had a similar problem in the past and opened a support request with RH support. They said you have to wait for the totem token time after a stop before starting again, or it's not going to work...

I wonder if this is correct...
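
For anyone who wants to try that advice, here is a minimal sketch of a restart that pauses for longer than the token timeout. It is not something RH support supplied. TOKEN_MS, the 2-second margin, the helper names and the DRY_RUN switch are all assumptions for illustration; set TOKEN_MS to the totem token value (in milliseconds) that your own cluster actually uses.

```shell
#!/bin/sh
# Sketch: restart cman with a pause longer than the totem token timeout.
# TOKEN_MS is a placeholder (assumption), not a value read from the cluster.
TOKEN_MS="${TOKEN_MS:-10000}"

# Print the command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Convert a millisecond timeout to whole seconds (rounded up) plus a margin.
token_wait_seconds() {
    ms="$1"
    margin="${2:-2}"
    echo $(( (ms + 999) / 1000 + margin ))
}

# Stop cman, wait out the token timeout, then start it again.
restart_cman_with_delay() {
    wait_s="$(token_wait_seconds "$TOKEN_MS")"
    run service cman stop
    run sleep "$wait_s"
    run service cman start
}
```

Calling restart_cman_with_delay with the defaults only prints the three commands; with DRY_RUN=0 it runs them for real, so try that on a test node first.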


-hjp
 


-----Original Message-----
From: linux-cluster-bounces at redhat.com on behalf of Adam Hough
Sent: Wed 5/6/2009 15:59
To: linux clustering
Subject: Re: [Linux-cluster] Necessary a delay to restart cman?
 
On Wed, May 6, 2009 at 7:01 AM, Chrissie Caulfield <ccaulfie at redhat.com> wrote:
> Miguel Sanchez wrote:
>> Hi. I have a CentOS 5.3 cluster with two nodes. If I execute service
>> cman restart on one node, or stop + start within a few seconds, the other
>> node doesn't recognize that the node has rejoined, and its peer stays
>> offline forever.
>>
>> For example:
>>
>> * Before cman restart:
>>
>> node1# cman_tool status
>> Version: 6.1.0
>> Config Version: 6
>> Cluster Name: CSVirtualizacion
>> Cluster Id: 42648
>> Cluster Member: Yes
>> Cluster Generation: 202600
>> Membership state: Cluster-Member
>> Nodes: 2
>> Expected votes: 1
>> Total votes: 2
>> Quorum: 1
>> Active subsystems: 7
>> Flags: 2node Dirty
>> Ports Bound: 0
>> Node name: patty
>> Node ID: 1
>> Multicast addresses: 224.0.0.133
>> Node addresses: 138.100.8.70
>>
>> * After cman stop on node2 (fewer seconds than the token parameter have elapsed)
>>
>> node1# cman_tool status
>> Version: 6.1.0
>> Config Version: 6
>> Cluster Name: CSVirtualizacion
>> Cluster Id: 42648
>> Cluster Member: Yes
>> Cluster Generation: 202600
>> Membership state: Cluster-Member
>> Nodes: 2
>> Expected votes: 1
>> Total votes: 1
>> Quorum: 1
>> Active subsystems: 7
>> Flags: 2node Dirty
>> Ports Bound: 0
>> Node name: patty
>> Node ID: 1
>> Multicast addresses: 224.0.0.133
>> Node addresses: 138.100.8.70
>> Wed May  6 12:29:38 CEST 2009
>>
>> * After cman stop on node2 (more seconds than the token parameter have elapsed)
>>
>> node1# date; cman_tool status
>> Version: 6.1.0
>> Config Version: 6
>> Cluster Name: CSVirtualizacion
>> Cluster Id: 42648
>> Cluster Member: Yes
>> Cluster Generation: 202604
>> Membership state: Cluster-Member
>> Nodes: 1
>> Expected votes: 1
>> Total votes: 1
>> Quorum: 1
>> Active subsystems: 7
>> Flags: 2node Dirty
>> Ports Bound: 0
>> Node name: patty
>> Node ID: 1
>> Multicast addresses: 224.0.0.133
>> Node addresses: 138.100.8.70
>> Wed May  6 12:29:47 CEST 2009
>>
>> /var/log/messages:
>> May  6 12:35:20 node2 openais[17262]: [TOTEM] The token was lost in the
>> OPERATIONAL state.
>> May  6 12:35:20 node2 openais[17262]: [TOTEM] Receive multicast socket
>> recv buffer size (288000 bytes).
>> May  6 12:35:20 node2 openais[17262]: [TOTEM] Transmit multicast socket
>> send buffer size (262142 bytes).
>> May  6 12:35:20 node2 openais[17262]: [TOTEM] entering GATHER state from 2.
>> May  6 12:35:25 node2 openais[17262]: [TOTEM] entering GATHER state from 0.
>> May  6 12:35:25 node2 openais[17262]: [TOTEM] Creating commit token
>> because I am the rep.
>> May  6 12:35:25 node2 openais[17262]: [TOTEM] Saving state aru 26 high
>> seq received 26
>> May  6 12:35:25 node2 openais[17262]: [TOTEM] Storing new sequence id
>> for ring 31780
>> May  6 12:35:25 node2 openais[17262]: [TOTEM] entering COMMIT state.
>> May  6 12:35:25 node2 openais[17262]: [TOTEM] entering RECOVERY state.
>> May  6 12:35:25 node2 openais[17262]: [TOTEM] position [0] member
>> 10.10.8.70:
>> May  6 12:35:25 node2 openais[17262]: [TOTEM] previous ring seq 202620
>> rep 10.10.8.70
>> May  6 12:35:25 node2 openais[17262]: [TOTEM] aru 26 high delivered 26
>> received flag 1
>> May  6 12:35:25 node2 openais[17262]: [TOTEM] Did not need to originate
>> any messages in recovery.
>> May  6 12:35:25 node2 openais[17262]: [TOTEM] Sending initial ORF token
>> May  6 12:35:25 node2 openais[17262]: [CLM  ] CLM CONFIGURATION CHANGE
>> May  6 12:35:25 node2 openais[17262]: [CLM  ] New Configuration:
>> May  6 12:35:25 node2 openais[17262]: [CLM  ]   r(0) ip(10.10.8.70)
>> May  6 12:35:25 node2 openais[17262]: [CLM  ] Members Left:
>> May  6 12:35:25 node2 openais[17262]: [CLM  ]   r(0) ip(10.10.8.71)
>> May  6 12:35:25 node2 openais[17262]: [CLM  ] Members Joined:
>> May  6 12:35:25 node2 openais[17262]: [CLM  ] CLM CONFIGURATION CHANGE
>> May  6 12:35:25 node2 openais[17262]: [CLM  ] New Configuration:
>> May  6 12:35:25 node2 openais[17262]: [CLM  ]   r(0) ip(10.10.8.70)
>> May  6 12:35:25 node2 openais[17262]: [CLM  ] Members Left:
>> May  6 12:35:25 node2 openais[17262]: [CLM  ] Members Joined:
>> May  6 12:35:25 node2 openais[17262]: [SYNC ] This node is within the
>> primary component and will provide service.
>> May  6 12:35:25 node2 openais[17262]: [TOTEM] entering OPERATIONAL state.
>> May  6 12:35:25 node2 kernel: dlm: closing connection to node 2
>> May  6 12:35:25 node2 openais[17262]: [CLM  ] got nodejoin message
>> 10.10.8.70
>> May  6 12:35:25 node2 openais[17262]: [CPG  ] got joinlist message from
>> node 1
>>
>>
>> If node2 runs cman start before the loss of the operational token has
>> been detected, node1 sees node2 as offline forever.
>> Subsequent cman restarts don't change this state:
>> node1# cman_tool nodes
>> Node  Sts   Inc   Joined               Name
>>   1   M  202616   2009-05-06 12:34:43  node1
>>   2   X  202628                        node2
>> node2# cman_tool nodes
>> Node  Sts   Inc   Joined               Name
>>   1   M  202644   2009-05-06 12:51:04  node1
>>   2   M  202640   2009-05-06 12:51:04  node2
>>
>>
>> Is a delay between cman stop and start necessary to avoid this
>> inconsistent state, or is it really a bug?
>
>
> I suspect it's an instance of this known bug. Check that CentOS has the
> appropriate patch available:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=485026
>
> Chrissie
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


When restarting cman, I have always had to stop cman and then manually
stop openais before trying to start cman again. If I do not follow
these steps, the node either never rejoins the cluster or might
fence the other node.

