[Linux-cluster] Latest cvs

Tue Jun 21 11:59:20 UTC 2005

>> what happens in gump:
>>
>> Jun 20 11:30:11 gump clurgmgrd[7231]: <notice> find_root_by_ref 
>> parameters:resources 134613668,groupname datamover
>> Jun 20 11:30:11 gump clurgmgrd[7231]: <notice> We are in 
>> find_root_by_ref we enter the loop
>> Jun 20 11:30:11 gump clurgmgrd[7231]: <notice> 134642544
>> Jun 20 11:30:11 gump clurgmgrd[7231]: <notice> curr->r_rule->rr_root 
>> == 0
>> Jun 20 11:30:11 gump clurgmgrd[7231]: <notice> 134617456
>> Jun 20 11:30:11 gump clurgmgrd[7231]: <notice> find_root_by_ref in 
>> groups.c returned res 134617456
>> Jun 20 11:30:11 gump clurgmgrd[7231]: <notice> res_exec in restree.c 
>> returned rv 0
>> Jun 20 11:30:11 gump clurgmgrd[7231]: <notice> res_exec in restree.c 
>> returned rv 0
>> Jun 20 11:30:11 gump clurgmgrd[7231]: <notice> _res_op_by_level in 
>> restree.c returned rv 0
>> Jun 20 11:30:11 gump clurgmgrd[7231]: <notice> res_start in groups.c 
>> returned ret 0
>> Jun 20 11:30:11 gump clurgmgrd[7231]: <notice> group_op in rg_state.c 
>> returned ret 0
>> Jun 20 11:30:11 gump clurgmgrd[7231]: <notice> Service datamover started
>>
>> what happens in buba:
>>
>> Jun 20 11:31:29 buba clurgmgrd[10706]: <notice> find_root_by_ref 
>> parameters:resources 134613668,groupname datamover
>> Jun 20 11:31:29 buba clurgmgrd[10706]: <notice> We are in 
>> find_root_by_ref we enter the loop
>> Jun 20 11:31:29 buba clurgmgrd[10706]: <notice> 134616568
>> Jun 20 11:31:29 buba clurgmgrd[10706]: <notice> curr->r_rule->rr_root 
>> == 0
>> Jun 20 11:31:29 buba clurgmgrd[10706]: <notice> find_root_by_ref in 
>> groups.c returned res 0
>> Jun 20 11:31:29 buba clurgmgrd[10706]: <notice> group_op in 
>> rg_state.c returned ret -1
>> Jun 20 11:31:29 buba clurgmgrd[10706]: <warning> #68: Failed to start 
>> datamover; return value: 1
>> Jun 20 11:31:29 buba clurgmgrd[10706]: <notice> Stopping service 
>> datamover
>> Jun 20 11:31:29 buba clurgmgrd[10706]: <notice> find_root_by_ref 
>> parameters:resources 134613668,groupname datamover
>> Jun 20 11:31:29 buba clurgmgrd[10706]: <notice> We are in 
>> find_root_by_ref we enter the loop
>> Jun 20 11:31:29 buba clurgmgrd[10706]: <notice> 134616568
>> Jun 20 11:31:29 buba clurgmgrd[10706]: <notice> curr->r_rule->rr_root 
>> == 0
>> Jun 20 11:31:29 buba clurgmgrd[10706]: <notice> find_root_by_ref in 
>> groups.c returned res 0
>> Jun 20 11:31:29 buba clurgmgrd[10706]: <crit> #12: RG datamover 
>> failed to stop; intervention required
>> Jun 20 11:31:29 buba clurgmgrd[10706]: <notice> Service datamover is 
>> failed
>> Jun 20 11:31:29 buba clurgmgrd[10706]: <crit> #13: Service datamover 
>> failed to stop cleanly
>>
>> The reslist does not seem to be the same on the two nodes.
>> I'll check my cluster.conf on both nodes.
>>
> Sorry, I should have waited before sending the previous mail:
> the cluster.conf files are identical.
> So my question is: what do these lists represent, and which
> files do I have to check to see whether they are identical
> on my two nodes?
>
I have completely reinstalled buba (I formatted the root partition,
reinstalled the OS, and reinstalled the cluster kernel and userspace components),
and I still get the same error. My question is not complicated: I want to
know what the
resource_t **reslist
parameter of
find_root_by_ref
represents. The comment says "List of resources to traverse", fine, but is the
resource list what you put in cluster.conf between
<resources> and </resources>?
Can anyone give me a clue as to why the program behaves differently
on the two nodes when the files
(cluster.conf, /usr/share/cluster/*.sh and the executables) are the same?
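One way to double-check that the files really are the same is to print a checksum for each of them on both nodes and diff the two outputs. The following is only a minimal sketch: the helper names sha256_of and checksum_lines are made up for illustration, and the file patterns are whatever you pass on the command line.

```python
import glob
import hashlib
import os
import sys


def sha256_of(path):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_lines(patterns):
    """Expand each glob pattern and return sorted 'digest  path' lines."""
    lines = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            if os.path.isfile(path):  # skip directories matched by the pattern
                lines.append(sha256_of(path) + "  " + path)
    return lines


if __name__ == "__main__":
    # Patterns come from the command line; quote them so the shell
    # does not expand them first.
    for line in checksum_lines(sys.argv[1:]):
        print(line)
```

Run it on each node with the same arguments (for example the path to cluster.conf and '/usr/share/cluster/*.sh'), save the output, and compare the two outputs line by line.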
Here is my cluster.conf:
<?xml version="1.0"?>
<cluster name="cluster1" config_version="1">

  <cman two_node="1" expected_votes="1">
  </cman>
 
  <clusternodes>
    <clusternode name="buba_cluster" votes="1">
      <fence>
        <method name="single">
          <device name="human" ipaddr="192.168.0.1"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="gump_cluster" votes="1">
      <fence>
        <method name="single">
          <device name="human" ipaddr="192.168.0.2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>

  <rm>
 
    <failoverdomains>
       <failoverdomain name="datamoverdomain">
        <failoverdomainnode name="gump_cluster" priority="1"/>
        <failoverdomainnode name="buba_cluster" priority="1"/>
       </failoverdomain>
    </failoverdomains>
 
    <resources>
     <script name="simple" file="/etc/init.d/simple"/>
    </resources>
     
    <resourcegroup name="datamover" domain="datamoverdomain">
      <script ref="simple"/>
    </resourcegroup>
  
  </rm>

</cluster>
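
To make the question concrete, here is how I currently read the lookup, written as a small Python model rather than the real C code. It is only a sketch under assumptions: the Rule and Resource classes are simplified stand-ins for the rgmanager structures, and I am guessing that the first list entry is the script "simple" and the second is the resourcegroup "datamover". What comes from my logs is only this: gump walks two entries, skips one with rr_root == 0 and returns the second, while buba walks a single entry, skips it and returns 0.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    rr_type: str
    rr_root: bool  # True only for rule types allowed at the top of a tree


@dataclass
class Resource:
    r_rule: Rule
    primary: str  # value of the primary attribute (assumed to be the name)


def find_root_by_ref(reslist: List[Resource], ref: str) -> Optional[Resource]:
    """Return the first root-capable resource whose primary value is ref."""
    for curr in reslist:
        if not curr.r_rule.rr_root:
            # Corresponds to the "curr->r_rule->rr_root == 0" log line.
            continue
        if curr.primary == ref:
            return curr
    return None  # logged as "returned res 0"


# Hypothetical rule definitions matching my cluster.conf.
script_rule = Rule("script", rr_root=False)
group_rule = Rule("resourcegroup", rr_root=True)

# gump: two entries in the list, the second one is found.
gump_list = [Resource(script_rule, "simple"), Resource(group_rule, "datamover")]
# buba: only one entry, which is skipped, so nothing is found.
buba_list = [Resource(script_rule, "simple")]

print(find_root_by_ref(gump_list, "datamover"))
print(find_root_by_ref(buba_list, "datamover"))
```

If this reading is right, the list on buba is missing the resourcegroup entry altogether, so the real question becomes why that entry is loaded on gump but not on buba.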