[Linux-cluster] Unable to retrieve batch 1493885544 status fromx.x.x.86:11111: module scheduled for execution

Bevan Broun Bevan.Broun at ardec.com.au
Tue Nov 25 05:59:01 UTC 2008


Regarding multicast: a symptom I had when multicast was disabled was that "ccs_tool update cluster.conf" didn't work, i.e. it didn't push out the updated cluster.conf (which makes some sense!). So you can check by bumping the version of cluster.conf on one member and running that command (from within /etc/cluster).
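For reference, that check looks something like this (a sketch; the version number is a placeholder, and the commands assume the stock RHEL/CentOS 5 cluster tools):

```shell
# On one member, first bump config_version in cluster.conf
# (e.g. config_version="18" -> config_version="19"), then push it out:
cd /etc/cluster
ccs_tool update ./cluster.conf

# If multicast is broken the new file won't arrive on the other node;
# compare the config version each member thinks it is running:
cman_tool version
```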

Have you got all your firewall rules in place, or the firewall turned off?
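For RHEL/CentOS 5 clusters the ports involved are roughly these (a hedged sketch; check the exact list against the Red Hat Cluster Suite docs for your release before relying on it):

```shell
# Allow cluster traffic between the nodes (run on each node):
iptables -A INPUT -p udp --dport 5404:5405   -j ACCEPT  # cman / openais (multicast)
iptables -A INPUT -p tcp --dport 11111       -j ACCEPT  # ricci
iptables -A INPUT -p tcp --dport 16851       -j ACCEPT  # modclusterd
iptables -A INPUT -p tcp --dport 21064       -j ACCEPT  # dlm
iptables -A INPUT -p tcp --dport 41966:41969 -j ACCEPT  # rgmanager
service iptables save

# Or, for testing only, rule the firewall out entirely:
service iptables stop
```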

Your error message seems to point to a problem with ricci (that's the port 11111). On RH 5.1, on a two-node cluster, I had too many issues with luci and ricci misbehaving or giving wrong information. The command-line tools worked much better for me: "cman_tool status" :-)

Bevan Broun
Solutions Architect
Ardec International
 
http://www.ardec.com.au
http://www.lisasoft.com
http://www.terrapages.com
Sydney
-----------------------
Suite 112,The Lower Deck
19-21 Jones Bay Wharf
Pirrama Road, Pyrmont 2009
Ph:  +61 2 8570 5000
Fax: +61 2 8570 5099 
-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Harri.Paivaniemi at tietoenator.com
Sent: Tuesday, 25 November 2008 4:43 PM
To: neuroticimbecile at yahoo.com; linux-cluster at redhat.com
Subject: RE: [Linux-cluster] Unable to retrieve batch 1493885544 status fromx.x.x.86:11111: module scheduled for execution

Hi,

Are you sure multicast is working in your switch?

openais uses it, and I have had several really odd misbehaviours caused by multicast not working...
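One quick way to see whether multicast traffic actually reaches the other node (a sketch; the interface name is an assumption, and cman picks its own 239.192.x.x multicast address, which "cman_tool status" reports):

```shell
# On node B, watch the cluster interface for openais multicast traffic:
tcpdump -i eth0 -n 'multicast and udp port 5405'

# Node A's running cluster stack should already be generating this traffic.
# If nothing shows up on node B, the switch is likely dropping multicast
# (IGMP snooping with no IGMP querier is a common culprit).
```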


-hjp



-----Original Message-----
From: linux-cluster-bounces at redhat.com on behalf of eric rosel
Sent: Mon 11/24/2008 19:02
To: linux-cluster at redhat.com
Subject: [Linux-cluster] Unable to retrieve batch 1493885544 status fromx.x.x.86:11111: module scheduled for execution
 
Hi All,

I'm trying to set up a 2-node cluster using luci.  As of now, I have only configured it with a single service, with a single resource: an IP address, so I could test if the IP address fails over to the other node.  So far, it doesn't.

In /var/log/messages of the first node, it says: "Unable to retrieve batch 1493885544 status from x.x.x.86:11111: module scheduled for execution"


It seems that each node is unaware of the other; "cman_tool nodes" says, respectively:

===<snip>===
Node  Sts   Inc   Joined               Name
   1   M     48   2008-11-24 23:46:04  x.x.x.85
   2   X      0                        x.x.x.86
===<snip>===
Node  Sts   Inc   Joined               Name
   1   X      0                        x.x.x.85
   2   M     72   2008-11-24 23:32:43  x.x.x.86
===<snip>===


My /etc/cluster/cluster.conf contains:
===<snip>===
<?xml version="1.0"?>
<cluster alias="binary.cluster" config_version="18" name="binary.cluster">
        <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="202.81.160.85" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="202.81.160.86" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices/>
        <rm>
                <resources>
                        <ip address="202.81.160.87" monitor_link="0"/>
                </resources>
                <service autostart="1" exclusive="1" name="binary.service" recovery="relocate">
                        <ip ref="202.81.160.87"/>
                </service>
                <failoverdomains/>
        </rm>
</cluster>
===<snip>===


I've already tried some things mentioned in the list archives:
1. "ccs_test connect" returns "Connect successful." on both nodes.

2. Although I'm using IP addresses in cluster.conf, I've added hostname definitions in /etc/hosts on both nodes:
===<snip>===
x.x.x.85   node1.domain.com    node1
x.x.x.86   node2.domain.com    node2
===<snip>===

3. When I manually copy /etc/cluster/cluster.conf to both nodes and do a "cman_tool version -r <version_number>", luci shows both nodes' "Status" as "Cluster Member". But when I try to make any changes using luci, the second node becomes "Not a Cluster Member"; and doing a "Have node join cluster" doesn't make it a member.
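For reference, the manual propagation in point 3 looks like this (a sketch; the hostname and version number are placeholders):

```shell
# Bump config_version in /etc/cluster/cluster.conf (e.g. to 19), copy it over:
scp /etc/cluster/cluster.conf root@node2:/etc/cluster/cluster.conf

# Then tell cman to activate the new version cluster-wide:
cman_tool version -r 19
```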


I'm running on CentOS 5.2 with:
luci-0.12.0-7.el5.centos.3
ricci-0.12.0-7.el5.centos.3
cman-2.0.84-2.el5_2.1
rgmanager-2.0.38-2.el5_2.1


Any pointers on how to make this work?

TIA,
-eric


      

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster




