[Linux-cluster] Help needed

Fri Jun 1 00:12:31 UTC 2012

Hi Digimer,
Thanks for your comment. I got rid of the first problem, but now I see the following messages. Any ideas?
Thanks in advance.
Ming

[root@shr295 ~]# tail -f /var/log/messages
May 31 16:56:01 shr295 dlm_controld[3375]: dlm_controld 3.0.12.1 started
May 31 16:56:11 shr295 fenced[3353]: daemon cpg_join error retrying
May 31 16:56:12 shr295 gfs_controld[3447]: gfs_controld 3.0.12.1 started
May 31 16:56:12 shr295 dlm_controld[3375]: daemon cpg_join error retrying
May 31 16:56:21 shr295 fenced[3353]: daemon cpg_join error retrying
May 31 16:56:22 shr295 dlm_controld[3375]: daemon cpg_join error retrying
May 31 16:56:22 shr295 gfs_controld[3447]: daemon cpg_join error retrying
May 31 16:56:31 shr295 fenced[3353]: daemon cpg_join error retrying
May 31 16:56:32 shr295 dlm_controld[3375]: daemon cpg_join error retrying
May 31 16:56:32 shr295 gfs_controld[3447]: daemon cpg_join error retrying
May 31 16:56:41 shr295 fenced[3353]: daemon cpg_join error retrying
May 31 16:56:42 shr295 dlm_controld[3375]: daemon cpg_join error retrying
May 31 16:56:42 shr295 gfs_controld[3447]: daemon cpg_join error retrying

-----Original Message-----
From: Digimer [mailto:lists at alteeve.ca]
Sent: Thursday, May 31, 2012 10:13 AM
To: Chen, Ming Ming
Cc: linux clustering
Subject: Re: [Linux-cluster] Help needed

On 05/31/2012 12:27 PM, Chen, Ming Ming wrote:
>  Hi, I have the following simple cluster config, just to try things out on CentOS 6.2:
>
> <?xml version="1.0"?>
> <cluster config_version="2" name="vmcluster">
>       <logging debug="on"/>
>       <cman expected_votes="1" two_node="1"/>
>       <clusternodes>
>             <clusternode name="shr289.cup.hp.com" nodeid="1">
>                   <fence>
>                   </fence>
>             </clusternode>
>             <clusternode name="shr295.cup.hp.com" nodeid="2">
>                   <fence>
>                   </fence>
>             </clusternode>
>       </clusternodes>
>       <fencedevices>
>       </fencedevices>
>       <rm>
>       </rm>
> </cluster>
>
>
> I got the following error messages when I ran "service cman start". The same messages appear on both nodes.
> Any help will be appreciated.
>
> May 31 09:08:04 corosync [TOTEM ] RRP multicast threshold (100 problem count)
> May 31 09:08:05 shr295 corosync[3542]:   [MAIN  ] Completed service synchronization, ready to provide service.
> May 31 09:08:05 shr295 corosync[3542]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Unable to load new config in corosync: New configuration version has to be newer than current running configuration
> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Can't get updated config version 4: New configuration version has to be newer than current running configuration#012.
> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Activity suspended on this node
> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Error reloading the configuration, will retry every second
> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Node 1 conflict, remote config version id=4, local=2
> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Unable to load new config in corosync: New configuration version has to be newer than current running configuration
> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Can't get updated config version 4: New configuration version has to be newer than current running configuration#012.
> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Activity suspended on this node
>

Run 'cman_tool version' to get the current version of the configuration,
then increase the config_version="x" to be one higher.
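
As a rough sketch of that change, applied to the config quoted above: the value 5 is an assumption, taken from the "remote config version id=4" log line. Use whatever 'cman_tool version' actually reports, plus one, and keep the file identical on both nodes.

```xml
<?xml version="1.0"?>
<!-- config_version raised from 2 to 5, assuming 'cman_tool version' reports 4 -->
<cluster config_version="5" name="vmcluster">
      <!-- remainder of the configuration unchanged -->
</cluster>
```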

Also, configure fencing! If you don't, your cluster will hang the first
time anything goes wrong.
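
For illustration only, a fence setup for one node might look roughly like the fragment below. It assumes IPMI-based fencing via the fence_ipmilan agent; the thread does not say what hardware is in use, so the agent choice, the device and method names, the address, and the credentials are all hypothetical placeholders. The second node would need a matching entry of its own.

```xml
<clusternodes>
      <clusternode name="shr289.cup.hp.com" nodeid="1">
            <fence>
                  <!-- method and device names are hypothetical -->
                  <method name="ipmi">
                        <device name="ipmi_shr289" action="reboot"/>
                  </method>
            </fence>
      </clusternode>
</clusternodes>
<fencedevices>
      <!-- placeholder address and credentials; replace with the node's real IPMI details -->
      <fencedevice name="ipmi_shr289" agent="fence_ipmilan" ipaddr="192.0.2.10" login="admin" passwd="secret"/>
</fencedevices>
```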

--
Digimer
Papers and Projects: https://alteeve.com