[Linux-cluster] Problems on new cluster

Fri Nov 26 10:03:55 UTC 2010

We have a fresh RHEL 5.4 installation on two nodes, with the
"Clustering" and "Cluster Storage" package groups and Oracle RAC
installed on both systems.
When we try to configure the cluster (Cluster Suite, I mean) using the
Luci web interface, the operation fails just after we press the
"Create Cluster" button, and we find these messages in
/var/log/messages:

[...]
Nov 26 10:33:39 sdbsap01 openais[16218]: [MAIN ] ERROR: Could not
accept Library connection: (null) - prior to this log entry, openais
logger dropped '5' messages because of overflow.
Nov 26 10:33:39 sdbsap01 openais[16218]: [MAIN ] ERROR: Could not
accept Library connection: (null) - prior to this log entry, openais
logger dropped '5' messages because of overflow.
Nov 26 10:33:39 sdbsap01 openais[16218]: [MAIN ] ERROR: Could not
accept Library connection: (null) - prior to this log entry, openais
logger dropped '65' messages because of overflow.
[...]

and

[...]
Nov 26 10:38:03 sdbsap01 last message repeated 2 times
Nov 26 10:38:06 sdbsap01 ccsd[16212]: Unable to connect to cluster
infrastructure after 270 seconds.
Nov 26 10:38:36 sdbsap01 ccsd[16212]: Unable to connect to cluster
infrastructure after 300 seconds.
[...]

service cman start returns:

[root@sdbsap01 ~]# service cman start
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... failed

[FAILED]
[root@sdbsap01 ~]#

and this is our cluster.conf file:

[root@sdbsap01 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="cludbsap01" config_version="1" name="cludbsap01">
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="sdbsap01-priv" nodeid="1" votes="1"/>
<clusternode name="sdbsap02-priv" nodeid="2" votes="1"/>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices/>
<rm/>
</cluster>
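
The node names in this file are the private interconnect names, so one
thing that can be checked is whether each of them resolves on both
nodes. A minimal sketch, assuming Python 3 is available (it is not the
stock interpreter on RHEL 5); the helper names cluster_node_names,
resolve and report are made up for illustration, and this is not
claimed to be the cause of the failure:

```python
import socket
import xml.etree.ElementTree as ET

# Copy of the cluster.conf shown above, embedded so the sketch is self-contained.
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster alias="cludbsap01" config_version="1" name="cludbsap01">
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="sdbsap01-priv" nodeid="1" votes="1"/>
<clusternode name="sdbsap02-priv" nodeid="2" votes="1"/>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices/>
<rm/>
</cluster>
"""


def cluster_node_names(conf_xml):
    """Return the name attribute of every <clusternode> element."""
    root = ET.fromstring(conf_xml)
    return [node.get("name") for node in root.iter("clusternode")]


def resolve(name):
    """Return the IPv4 address a name resolves to, or None if it does not."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def report(conf_xml):
    """Print each configured node name with its resolved address (hypothetical helper)."""
    for name in cluster_node_names(conf_xml):
        print(name, "->", resolve(name) or "does not resolve")


# Only list the names here; report() does the lookups and is meant to be
# called on the nodes themselves.
print(cluster_node_names(CLUSTER_CONF))
```

On a node, report(open("/etc/cluster/cluster.conf").read()) would show
which address, if any, each name maps to there.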

According to tcpdump, multicast traffic seems to be allowed between
the nodes:

10:33:39.836081 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF],
proto: IGMP (2), length: 40, options ( RA (148) len 4 )) 192.1.1.26 >
224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 239.192.158.231
to_in, 0 source(s)]
10:33:41.933008 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF],
proto: IGMP (2), length: 40, options ( RA (148) len 4 )) 192.1.1.27 >
224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 239.192.158.231
to_ex, 0 source(s)]
10:33:41.965995 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto:
UDP (17), length: 176) 192.1.1.27.5149 > 239.192.158.231.netsupport:
UDP, length 148
10:33:42.188485 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto:
UDP (17), length: 146) 192.1.1.27.5149 > 239.192.158.231.netsupport:
UDP, length 118
10:33:42.507042 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF],
proto: IGMP (2), length: 40, options ( RA (148) len 4 )) 192.1.1.27 >
224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 239.192.158.231
to_in, 0 source(s)]
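
To make the capture easier to read, the packets can be tallied by
protocol, source and destination. This is only a rough sketch,
assuming Python 3 and tcpdump -v style output wrapped the way it is in
this mail; the function names packet_counts and udp_senders are
hypothetical:

```python
import re
from collections import Counter

# The capture from this mail, wrapped exactly as pasted.
CAPTURE = """
10:33:39.836081 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF],
proto: IGMP (2), length: 40, options ( RA (148) len 4 )) 192.1.1.26 >
224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 239.192.158.231
to_in, 0 source(s)]
10:33:41.933008 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF],
proto: IGMP (2), length: 40, options ( RA (148) len 4 )) 192.1.1.27 >
224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 239.192.158.231
to_ex, 0 source(s)]
10:33:41.965995 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto:
UDP (17), length: 176) 192.1.1.27.5149 > 239.192.158.231.netsupport:
UDP, length 148
10:33:42.188485 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto:
UDP (17), length: 146) 192.1.1.27.5149 > 239.192.158.231.netsupport:
UDP, length 118
10:33:42.507042 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF],
proto: IGMP (2), length: 40, options ( RA (148) len 4 )) 192.1.1.27 >
224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 239.192.158.231
to_in, 0 source(s)]
"""

IPV4 = r"\d+\.\d+\.\d+\.\d+"
# protocol name, then the first "src[.port] > dst" pair that follows it
PACKET = re.compile(
    r"proto: (\w+) \(\d+\).*?(" + IPV4 + r")(?:\.\w+)? > (" + IPV4 + r")"
)


def packet_counts(text):
    """Count packets per (protocol, source address, destination address)."""
    flat = " ".join(text.split())  # undo the line wrapping
    return Counter(m.groups() for m in PACKET.finditer(flat))


def udp_senders(text, group):
    """Return the set of source addresses that sent UDP to the given group."""
    return {
        src
        for (proto, src, dst) in packet_counts(text)
        if proto == "UDP" and dst == group
    }


for key, count in sorted(packet_counts(CAPTURE).items()):
    print(count, *key)
print("UDP senders:", sorted(udp_senders(CAPTURE, "239.192.158.231")))
```

In this excerpt the only UDP packets sent to 239.192.158.231 come from
192.1.1.27; 192.1.1.26 appears only in an IGMP report. The excerpt is
short, so that may not mean anything.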

We have already tried removing all the packages (yum groupremove
"Clustering" "Cluster Storage") along with the /etc/cluster,
/var/lib/luci and /var/lib/ricci directories, and then reinstalling
the cluster suite, but we still have this problem.

Any suggestions?
Regards
Marco
-- 
bizza
http://www.rm-rf.eu/