[Linux-cluster] Unable to retrieve batch 1493885544 status from x.x.x.86:11111: module scheduled for execution

Tue Nov 25 05:43:13 UTC 2008

Hi,

Are you sure multicast is working on your switch?

OpenAIS uses it. I have seen several really odd misbehaviours caused by multicast not working.
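
One rough way to check this is to send a multicast datagram from one node and see whether the other receives it. The following is a minimal sketch, assuming Python is available on both nodes. The group address and port are placeholders chosen for illustration, not values taken from this cluster, and the function names are hypothetical; "cman_tool status" should list the multicast address the cluster actually uses.

```python
import socket
import struct

# Placeholder values for illustration only; substitute the multicast
# address your cluster reports and a UDP port that is free on both nodes.
GROUP = "239.192.0.1"
PORT = 50000


def membership_request(group, iface="0.0.0.0"):
    """Build the ip_mreq structure used to join a multicast group."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def send_probe(group=GROUP, port=PORT, message=b"multicast-probe", ttl=1):
    """Send a single UDP datagram to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # TTL 1 keeps the probe on the local network segment.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(message, (group, port))
    finally:
        sock.close()


def wait_for_probe(group=GROUP, port=PORT, timeout=30.0):
    """Join the group and return (data, sender), or None on timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)
        try:
            return sock.recvfrom(1024)
        except socket.timeout:
            return None
    finally:
        sock.close()
```

Call wait_for_probe() on one node, then send_probe() on the other within the timeout, and repeat with the roles swapped. A None result in either direction suggests multicast is not getting through, although a local firewall rule could produce the same symptom.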

-hjp

-----Original Message-----
From: linux-cluster-bounces at redhat.com on behalf of eric rosel
Sent: Mon 11/24/2008 19:02
To: linux-cluster at redhat.com
Subject: [Linux-cluster] Unable to retrieve batch 1493885544 status from x.x.x.86:11111: module scheduled for execution

Hi All,

I'm trying to set up a 2-node cluster using luci.  As of now, I have only configured it with a single service, with a single resource: an IP address, so I could test if the IP address fails over to the other node.  So far, it doesn't.

In /var/log/messages of the first node, it says: "Unable to retrieve batch 1493885544 status from x.x.x.86:11111: module scheduled for execution"

It seems that each node is unaware of the other. "cman_tool nodes" reports the following on the first and second node, respectively:

===<snip>===
Node  Sts   Inc   Joined               Name
   1   M     48   2008-11-24 23:46:04  x.x.x.85
   2   X      0                        x.x.x.86
===<snip>===
Node  Sts   Inc   Joined               Name
   1   X      0                        x.x.x.85
   2   M     72   2008-11-24 23:32:43  x.x.x.86
===<snip>===
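
The two listings above can be compared mechanically. This is a minimal sketch, assuming the column layout shown here and assuming that "M" in the Sts column marks a current member; the function names are hypothetical.

```python
def parse_cman_nodes(output):
    """Parse the text printed by `cman_tool nodes` into a list of dicts."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the header row and any line that does not start with a node ID.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        nodes.append({
            "id": int(fields[0]),
            "status": fields[1],
            "incarnation": int(fields[2]),
            # The Joined column is empty for non-members, so take the
            # name from the last field rather than a fixed position.
            "name": fields[-1],
        })
    return nodes


def non_members(nodes):
    """Return the names of nodes whose status is anything other than 'M'."""
    return [node["name"] for node in nodes if node["status"] != "M"]
```

Fed the first listing, non_members() returns the x.x.x.86 entry; fed the second, it returns x.x.x.85, which matches each node seeing only itself.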

My /etc/cluster/cluster.conf contains:
===<snip>===
<?xml version="1.0"?>
<cluster alias="binary.cluster" config_version="18" name="binary.cluster">
        <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="202.81.160.85" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="202.81.160.86" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices/>
        <rm>
                <resources>
                        <ip address="202.81.160.87" monitor_link="0"/>
                </resources>
                <service autostart="1" exclusive="1" name="binary.service" recovery="relocate">
                        <ip ref="202.81.160.87"/>
                </service>
                <failoverdomains/>
        </rm>
</cluster>
===<snip>===
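
A few properties of this file can be checked with a short script. The sketch below is only an illustration, assuming the element and attribute names shown in the snippet above; check_cluster_conf is a hypothetical helper, and it reports what is configured rather than diagnosing the membership problem.

```python
import xml.etree.ElementTree as ET


def check_cluster_conf(xml_text):
    """Return a list of warnings about a cluster.conf document."""
    root = ET.fromstring(xml_text)
    warnings = []

    nodes = root.findall("clusternodes/clusternode")
    cman = root.find("cman")

    # two_node="1" is meant for exactly two nodes with expected_votes="1".
    if cman is not None and cman.get("two_node") == "1":
        if cman.get("expected_votes") != "1":
            warnings.append("two_node is set but expected_votes is not 1")
        if len(nodes) != 2:
            warnings.append("two_node is set but %d nodes are defined" % len(nodes))

    # An empty <fence/> element means no fence method for that node.
    for node in nodes:
        fence = node.find("fence")
        if fence is None or len(fence) == 0:
            warnings.append("node %s has no fence method" % node.get("name"))

    if not root.findall("fencedevices/fencedevice"):
        warnings.append("no fence devices are defined")

    return warnings
```

For the configuration above this would report the two empty fence blocks and the empty fencedevices section, while the two_node and expected_votes settings pass.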

I've already tried some things mentioned in the list archives:
1. "ccs_test connect" returns "Connect successful." on both nodes.

2. Although I'm using IP addresses in cluster.conf, I've added hostname definitions in /etc/hosts on both nodes:
===<snip>===
x.x.x.85   node1.domain.com    node1
x.x.x.86   node2.domain.com    node2
===<snip>===

3. When I manually copy /etc/cluster/cluster.conf to both nodes and do a "cman_tool version -r <version_number>", luci shows both nodes' "Status" as "Cluster Member". But when I try to make any changes using luci, the second node becomes "Not a Cluster Member"; and doing a "Have node join cluster" doesn't make it a member.
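
For the manual copy in step 3, the config_version attribute has to be raised before the file is distributed, and the same number is then passed to "cman_tool version -r". A minimal sketch of that bump, assuming the attribute appears once in the file as in the snippet above; bump_config_version is a hypothetical helper and it works on the text so the file layout is left untouched.

```python
import re


def bump_config_version(xml_text):
    """Increment config_version by one; return (new_text, new_version)."""
    pattern = re.compile(r'(config_version=")(\d+)(")')
    match = pattern.search(xml_text)
    if match is None:
        raise ValueError("no config_version attribute found")
    new_version = int(match.group(2)) + 1
    # Replace only the first occurrence and keep the rest of the file as-is.
    updated = pattern.sub(
        lambda m: m.group(1) + str(new_version) + m.group(3),
        xml_text,
        count=1,
    )
    return updated, new_version
```

The returned text would be written to /etc/cluster/cluster.conf on both nodes, and the returned number used as the <version_number> argument.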

I'm running on CentOS 5.2 with:
luci-0.12.0-7.el5.centos.3
ricci-0.12.0-7.el5.centos.3
cman-2.0.84-2.el5_2.1
rgmanager-2.0.38-2.el5_2.1

Any pointers on how to make this work?

TIA,
-eric

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
