[Linux-cluster] Help needed

Fri Jun 1 02:05:17 UTC 2012

Please send your cluster.conf, editing out only the passwords. Please also
include your network configs.
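
As a rough sketch of the advice quoted below (raise config_version above
the running version, and configure fencing), a revised cluster.conf might
look something like the following. The version 5 assumes 'cman_tool version'
confirms the running version is 4, as the "remote config version id=4" log
line suggests. The fence_ipmilan agent, the device names, the addresses and
the credentials are all placeholders picked for illustration, not details
taken from this cluster. The same file has to be present on both nodes.

```xml
<?xml version="1.0"?>
<!-- Sketch only. config_version is 5 on the assumption that the running
     version is 4; check with 'cman_tool version' first. -->
<cluster config_version="5" name="vmcluster">
      <logging debug="on"/>
      <cman expected_votes="1" two_node="1"/>
      <clusternodes>
            <clusternode name="shr289.cup.hp.com" nodeid="1">
                  <fence>
                        <!-- Hypothetical method and device names. -->
                        <method name="ipmi">
                              <device name="ipmi_shr289" action="reboot"/>
                        </method>
                  </fence>
            </clusternode>
            <clusternode name="shr295.cup.hp.com" nodeid="2">
                  <fence>
                        <method name="ipmi">
                              <device name="ipmi_shr295" action="reboot"/>
                        </method>
                  </fence>
            </clusternode>
      </clusternodes>
      <fencedevices>
            <!-- Assumed agent: fence_ipmilan. ipaddr, login and passwd are
                 placeholders; substitute whatever fence hardware you have. -->
            <fencedevice name="ipmi_shr289" agent="fence_ipmilan" ipaddr="192.0.2.11" login="admin" passwd="secret"/>
            <fencedevice name="ipmi_shr295" agent="fence_ipmilan" ipaddr="192.0.2.12" login="admin" passwd="secret"/>
      </fencedevices>
      <rm>
      </rm>
</cluster>
```

Each device name under a clusternode has to match a fencedevice name, and
the config_version value has to be identical on both nodes before cman is
started.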

On 05/31/2012 08:12 PM, Chen, Ming Ming wrote:
> Hi Digimer,
> Thanks for your comment. I've got rid of the first problem, and now I have the following messages. Any idea?
> Thanks in advance.
> Ming
> 
> [root@shr295 ~]# tail -f /var/log/messages
> May 31 16:56:01 shr295 dlm_controld[3375]: dlm_controld 3.0.12.1 started
> May 31 16:56:11 shr295 fenced[3353]: daemon cpg_join error retrying
> May 31 16:56:12 shr295 gfs_controld[3447]: gfs_controld 3.0.12.1 started
> May 31 16:56:12 shr295 dlm_controld[3375]: daemon cpg_join error retrying
> May 31 16:56:21 shr295 fenced[3353]: daemon cpg_join error retrying
> May 31 16:56:22 shr295 dlm_controld[3375]: daemon cpg_join error retrying
> May 31 16:56:22 shr295 gfs_controld[3447]: daemon cpg_join error retrying
> May 31 16:56:31 shr295 fenced[3353]: daemon cpg_join error retrying
> May 31 16:56:32 shr295 dlm_controld[3375]: daemon cpg_join error retrying
> May 31 16:56:32 shr295 gfs_controld[3447]: daemon cpg_join error retrying
> May 31 16:56:41 shr295 fenced[3353]: daemon cpg_join error retrying
> May 31 16:56:42 shr295 dlm_controld[3375]: daemon cpg_join error retrying
> May 31 16:56:42 shr295 gfs_controld[3447]: daemon cpg_join error retrying
> 
> -----Original Message-----
> From: Digimer [mailto:lists at alteeve.ca]
> Sent: Thursday, May 31, 2012 10:13 AM
> To: Chen, Ming Ming
> Cc: linux clustering
> Subject: Re: [Linux-cluster] Help needed
> 
> On 05/31/2012 12:27 PM, Chen, Ming Ming wrote:
>> Hi, I have the following simple cluster config, just to try things out on CentOS 6.2:
>>
>> <?xml version="1.0"?>
>> <cluster config_version="2" name="vmcluster">
>>       <logging debug="on"/>
>>       <cman expected_votes="1" two_node="1"/>
>>       <clusternodes>
>>             <clusternode name="shr289.cup.hp.com" nodeid="1">
>>                   <fence>
>>                   </fence>
>>             </clusternode>
>>             <clusternode name="shr295.cup.hp.com" nodeid="2">
>>                   <fence>
>>                   </fence>
>>             </clusternode>
>>       </clusternodes>
>>       <fencedevices>
>>       </fencedevices>
>>       <rm>
>>       </rm>
>> </cluster>
>>
>>
>> I got the following error messages when I ran "service cman start". I got the same messages on both nodes.
>> Any help will be appreciated.
>>
>> May 31 09:08:04 corosync [TOTEM ] RRP multicast threshold (100 problem count)
>> May 31 09:08:05 shr295 corosync[3542]:   [MAIN  ] Completed service synchronization, ready to provide service.
>> May 31 09:08:05 shr295 corosync[3542]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Unable to load new config in corosync: New configuration version has to be newer than current running configuration
>> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Can't get updated config version 4: New configuration version has to be newer than current running configuration#012.
>> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Activity suspended on this node
>> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Error reloading the configuration, will retry every second
>> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Node 1 conflict, remote config version id=4, local=2
>> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Unable to load new config in corosync: New configuration version has to be newer than current running configuration
>> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Can't get updated config version 4: New configuration version has to be newer than current running configuration#012.
>> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Activity suspended on this node
>>
> 
> Run 'cman_tool version' to get the current version of the configuration,
> then increase the config_version="x" to be one higher.
> 
> Also, configure fencing! If you don't, your cluster will hang the first
> time anything goes wrong.
> 
> --
> Digimer
> Papers and Projects: https://alteeve.com

-- 
Digimer
Papers and Projects: https://alteeve.com