[Linux-cluster] Latest cvs

Fri Jun 17 15:19:15 UTC 2005

David Teigland wrote:

>On Thu, Jun 16, 2005 at 03:57:14PM +0200, Ion Alberdi wrote:
>
>>I tried to update my cluster (kernel space and userspace) to the latest 
>>cvs version with the linux 2.6.11.12 kernel
>
>Use this:  http://people.redhat.com/teigland/cluster-2.6.11.tar.bz2
>
>The cvs head isn't ready for general use yet, and the RHEL4/FC4 cvs
>branches don't work with 2.6.11.
>
>>using the patch method and I experienced some problems.
>
>We don't keep the patches updated, so that method doesn't work any longer.
>Build everything (including kernel modules) within the cluster directory.
>
>Dave

OK, thank you!
I installed this version and the cluster can be launched now.
But I still have the same problem with rgmanager, which I have not
managed to debug:
I launch the cluster on the two nodes (buba and gump) (ccsd, cman, fence).
I launch rgmanager on the two nodes.
I activate/deactivate the simplest possible service (#!/bin/sh

exit 0)
on gump, and it works,
whereas when I try to do the same on buba I get the following error:

Jun 17 17:00:18 buba clurgmgrd[24643]: <notice> Starting disabled service datamover
Jun 17 17:00:18 buba clurgmgrd[24643]: <warning> #68: Failed to start datamover; return value: 1
Jun 17 17:00:18 buba clurgmgrd[24643]: <notice> Stopping service datamover
Jun 17 17:00:18 buba clurgmgrd[24643]: <crit> #12: RG datamover failed to stop; intervention required
Jun 17 17:00:18 buba clurgmgrd[24643]: <notice> Service datamover is failed

When I look this error up in rgmanager/errors.txt, I find:
#68: Failed to start <name>; return value: <integer>

The resource group <name> failed to start and returned the value <integer>.
This could indicate missing resources on the node or an improperly
configured resource group.  Check your resource group's configuration
against your hardware and software configuration and ensure that it is
correct.

What I don't understand is that buba and gump have the same cluster
components and the same install,
so it's weird that it works on gump and not on buba.

Another thing that is really weird: here is the script launched
by the rgmanager:

#!/bin/sh

exit 0

So it can never return 1, which contradicts message #68:
Failed to start datamover; return value: 1.
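
One way to narrow this down is to run the script by hand on each node, outside rgmanager, and compare the results. The sketch below assumes that rgmanager calls a script resource with an init-style action argument (start, status, stop); the helper name check_script is hypothetical. It first checks that the file exists and is executable, because a script that only runs "exit 0" can still appear to fail if it is never actually executed.

```shell
#!/bin/sh
# Hypothetical helper: run a script resource by hand with the actions
# rgmanager is assumed to pass (start, status, stop) and print each
# exit code.
check_script() {
    script="$1"
    if [ ! -f "$script" ]; then
        echo "$script: missing"
        return 2
    fi
    if [ ! -x "$script" ]; then
        echo "$script: not executable"
        return 3
    fi
    failed=0
    for action in start status stop; do
        "$script" "$action"
        rc=$?
        echo "$script $action -> $rc"
        [ "$rc" -eq 0 ] || failed=1
    done
    return "$failed"
}

# Run against a real path only when one is given, e.g.
#   sh check_script.sh /etc/init.d/simple
if [ "$#" -gt 0 ]; then
    check_script "$1"
fi
```

If this prints three zero exit codes on gump but reports a missing or non-executable file on buba, the two installs are not identical after all.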

Here is my cluster.conf:

<?xml version="1.0"?>
<cluster name="cluster1" config_version="1">

  <cman two_node="1" expected_votes="1">
  </cman>

  <clusternodes>
    <clusternode name="buba_cluster" votes="1">
      <fence>
        <method name="single">
          <device name="human" ipaddr="192.168.0.1"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="gump_cluster" votes="1">
      <fence>
        <method name="single">
          <device name="human" ipaddr="192.168.0.2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>

  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>

  <rm>

    <failoverdomains>
      <failoverdomain name="datamoverdomain">
        <failoverdomainnode name="gump_cluster" priority="1"/>
        <failoverdomainnode name="buba_cluster" priority="1"/>
      </failoverdomain>
    </failoverdomains>

    <resources>
     <script name="simple" file="/etc/init.d/simple"/>
    </resources>

    <resourcegroup name="datamover" domain="datamoverdomain">
      <script ref="simple"/>
    </resourcegroup>

  </rm>

</cluster>

If anyone has an idea, that would be great! I would also be very
thankful if anybody could give me some debugging techniques
to see what happens there. I tried running clurgmgrd under gdb with a
breakpoint on group_op, but I lost all hope when I saw that the process
detached and that threads were launched. (I don't know whether, or how,
we can debug multithreaded programs with gdb.)
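
On the gdb question: gdb can attach to a daemon that is already running, which avoids the detach problem, and it can list threads and print a backtrace for each of them. The sketch below is a minimal example of that approach, not a recipe specific to rgmanager; the helper name dump_threads is hypothetical, and it assumes pidof and gdb are installed on the node.

```shell
#!/bin/sh
# Hypothetical helper: attach gdb to every running process with the
# given name, list its threads, print a backtrace of each thread, and
# then detach again (-batch makes gdb exit after the commands).
dump_threads() {
    name="$1"
    pids=$(pidof "$name") || { echo "$name is not running"; return 1; }
    for pid in $pids; do
        echo "== $name pid $pid =="
        gdb -p "$pid" -batch -ex "info threads" -ex "thread apply all bt"
    done
}

# Example, run as root on the failing node:
#   dump_threads clurgmgrd
```

For an interactive session, "gdb -p PID" followed by a breakpoint and "continue" works on the already-detached process. When starting the program from gdb instead, "set follow-fork-mode child" tells gdb to follow the child across a fork rather than staying with the parent.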