[Linux-cluster] Cannot make cluster after upgrade

Abed-nego G. Escobal, Jr. abednegoyulo at yahoo.com
Wed Jul 8 06:53:37 UTC 2009


After upgrading from 5.2 to 5.3, our cluster, GFSCluster, no longer seems to form a cluster. GFSCluster is a two-node cluster using iSCSI, cman, CLVM, and GFS, and it was working fine on 5.2. The configuration on both nodes is as follows (passwords removed):

<?xml version="1.0"?>
<cluster name="GFSCluster" config_version="5">
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="node01.company.com" votes="1" nodeid="1">
      <fence>
        <method name="single">
          <device name="node01_ipmi"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node02.company.com" votes="1" nodeid="2">
      <fence>
        <method name="single">
          <device name="node02_ipmi"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="node01_ipmi" agent="fence_ipmilan" ipaddr="10.1.0.5" login="root" passwd="********"/>
    <fencedevice name="node02_ipmi" agent="fence_ipmilan" ipaddr="10.1.0.7" login="root" passwd="********"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
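To double-check which IPMI address each node's fence method resolves to, here is a minimal sketch using Python's standard-library ElementTree. `fence_targets` is a hypothetical helper, not part of the cman/ccs tooling, and the embedded copy of the configuration leaves out `login`/`passwd`:

```python
import xml.etree.ElementTree as ET

# Copy of the cluster.conf above, trimmed to the elements this check reads
# (login/passwd omitted).
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster name="GFSCluster" config_version="5">
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="node01.company.com" votes="1" nodeid="1">
      <fence><method name="single"><device name="node01_ipmi"/></method></fence>
    </clusternode>
    <clusternode name="node02.company.com" votes="1" nodeid="2">
      <fence><method name="single"><device name="node02_ipmi"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="node01_ipmi" agent="fence_ipmilan" ipaddr="10.1.0.5"/>
    <fencedevice name="node02_ipmi" agent="fence_ipmilan" ipaddr="10.1.0.7"/>
  </fencedevices>
</cluster>"""


def fence_targets(conf_text):
    """Hypothetical helper: map each clusternode to the (agent, ipaddr) pairs
    of the fence devices its fence methods reference."""
    root = ET.fromstring(conf_text)
    devices = {d.get("name"): d for d in root.iter("fencedevice")}
    targets = {}
    for node in root.iter("clusternode"):
        pairs = []
        for ref in node.iter("device"):
            dev = devices.get(ref.get("name"))
            if dev is None:
                # A reference to an undefined device would leave the node unfenceable.
                raise ValueError(f"{node.get('name')}: undefined fence device {ref.get('name')!r}")
            pairs.append((dev.get("agent"), dev.get("ipaddr")))
        targets[node.get("name")] = pairs
    return targets


if __name__ == "__main__":
    for name, pairs in fence_targets(CLUSTER_CONF).items():
        print(name, pairs)
```

With this configuration, node02.company.com is fenced through 10.1.0.7 and node01.company.com through 10.1.0.5, which are the addresses that appear in the fenced log messages.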

When starting the cman service, both nodes hang at the "Starting fencing" step:

Starting cluster: 
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... 

After 5 minutes the step finishes with "done", but clustat reports:

==== As root on web01.company.com ====
  Cluster Status for GFSCluster @ Wed Jul  8 01:00:24 2009
  Member Status: Quorate
  
   Member Name                             ID   Status
   ------ ----                             ---- ------
   node01.company.com                         1 Online, Local
   node02.company.com                         2 Offline
  

==== As root on web02.company.com ====
  Cluster Status for GFSCluster @ Wed Jul  8 01:00:26 2009
  Member Status: Quorate
  
   Member Name                             ID   Status
   ------ ----                             ---- ------
   node01.company.com                         1 Offline
   node02.company.com                         2 Online, Local

Both nodes are quorate, but each one sees only itself as Online, so each has formed its own single-member cluster.
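The two clustat views can be compared mechanically. The following is a rough heuristic sketch, not a cman tool: `parse_view` and `looks_split` are hypothetical names, and the regular expression assumes the member-row layout shown above (name, ID, status). It flags the case where more than one node is quorate while the nodes disagree about who is Online:

```python
import re

# clustat member rows look like: "node01.company.com   1 Online, Local"
MEMBER_RE = re.compile(r"^\s*(\S+)\s+(\d+)\s+(Online|Offline)\b")

WEB01_VIEW = """Member Status: Quorate
 node01.company.com                         1 Online, Local
 node02.company.com                         2 Offline"""

WEB02_VIEW = """Member Status: Quorate
 node01.company.com                         1 Offline
 node02.company.com                         2 Online, Local"""


def parse_view(clustat_text):
    """Return (quorate, frozenset of members reported Online) for one clustat output."""
    quorate = "Member Status: Quorate" in clustat_text
    online = frozenset(
        m.group(1)
        for m in map(MEMBER_RE.match, clustat_text.splitlines())
        if m and m.group(3) == "Online"
    )
    return quorate, online


def looks_split(views):
    """Rough heuristic: True if at least two quorate views disagree on who is Online."""
    quorate_sets = {online for quorate, online in views if quorate}
    return len(quorate_sets) > 1


if __name__ == "__main__":
    views = [parse_view(WEB01_VIEW), parse_view(WEB02_VIEW)]
    print(views, looks_split(views))
```

With `two_node="1"` and `expected_votes="1"`, a single node is enough for quorum, which is why both views above report Quorate at the same time.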

In web01's logs I found these messages repeating:

Jul  8 00:55:27 web01 fenced[21872]: node02.company.com not a cluster member after 6 sec post_join_delay
Jul  8 00:55:27 web01 fenced[21872]: fencing node "node02.company.com"
Jul  8 00:55:52 web01 fenced[21872]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.1.0.7...ipmilan: Failed to connect after 30 seconds Failed 


web02's logs show the same repeating messages, with the node names and IPMI address reversed:

Jul  8 00:55:27 web02 fenced[6363]: node01.company.com not a cluster member after 6 sec post_join_delay
Jul  8 00:55:27 web02 fenced[6363]: fencing node "node01.company.com"
Jul  8 00:55:53 web02 fenced[6363]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.1.0.5...ipmilan: Failed to connect after 30 seconds Failed
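When these messages repeat for a long time, a small summary can make the pattern easier to see. Here is a minimal sketch, assuming the message formats are exactly those quoted above; `summarize` and both regular expressions are hypothetical and only cover these two fenced messages:

```python
import re
from collections import Counter

# Patterns for the two fenced messages quoted above.
FENCING_RE = re.compile(r'fenced\[\d+\]: fencing node "([^"]+)"')
IPMI_FAIL_RE = re.compile(r'reports: .*@ IPMI:(\S+?)\.\.\..*Failed')

LOG = [
    'Jul  8 00:55:27 web01 fenced[21872]: node02.company.com not a cluster member after 6 sec post_join_delay',
    'Jul  8 00:55:27 web01 fenced[21872]: fencing node "node02.company.com"',
    'Jul  8 00:55:52 web01 fenced[21872]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.1.0.7...ipmilan: Failed to connect after 30 seconds Failed',
    'Jul  8 00:55:27 web02 fenced[6363]: node01.company.com not a cluster member after 6 sec post_join_delay',
    'Jul  8 00:55:27 web02 fenced[6363]: fencing node "node01.company.com"',
    'Jul  8 00:55:53 web02 fenced[6363]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.1.0.5...ipmilan: Failed to connect after 30 seconds Failed',
]


def summarize(lines):
    """Count fencing attempts per target node and failed IPMI connections per address."""
    attempts, failures = Counter(), Counter()
    for line in lines:
        m = FENCING_RE.search(line)
        if m:
            attempts[m.group(1)] += 1
            continue
        m = IPMI_FAIL_RE.search(line)
        if m:
            failures[m.group(1)] += 1
    return attempts, failures


if __name__ == "__main__":
    print(summarize(LOG))
```

For the excerpts above, each node shows one fencing attempt against its peer and one failed IPMI connection to the peer's address.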


Is there a known bug in 5.3 related to clustering?
Are there any workarounds?
