[Linux-cluster] cluster down network

sahai srichock saza_thi at yahoo.com
Fri Jan 11 11:56:03 UTC 2008


I have a two-node cluster.


/etc/cluster/cluster.conf

<?xml version="1.0"?>
<cluster alias="saza" config_version="38" name="saza">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="node1.network.com" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="node2.network.com" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="cenall" ordered="1" restricted="1">
                                <failoverdomainnode name="node1.network.com" priority="1"/>
                                <failoverdomainnode name="node2.network.com" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <script file="/etc/init.d/httpd" name="httpd"/>
                        <fs device="/dev/sda3" force_fsck="0" force_unmount="0" fsid="6443" fstype="ext3" mountpoint="/var/www/html" name="httpd-content" options="" self_fence="0"/>
                        <ip address="10.28.99.81" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="cenall" name="webserver">
                        <script ref="httpd"/>
                        <fs ref="httpd-content"/>
                        <ip ref="10.28.99.81"/>
                </service>
        </rm>
</cluster>
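Note that the config above defines no fence devices: the `<fencedevices/>` element and each node's `<fence/>` block are empty. For comparison only, a minimal manual-fencing setup would look roughly like the fragment below (the device name `manual` is a placeholder I made up, not something from my setup):

```
<clusternode name="node2.network.com" nodeid="2" votes="1">
        <fence>
                <method name="1">
                        <device name="manual" nodename="node2.network.com"/>
                </method>
        </fence>
</clusternode>
...
<fencedevices>
        <fencedevice agent="fence_manual" name="manual"/>
</fencedevices>
```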

When I restart the network on node2:

service network stop
service network start

Afterwards, cman will not start on node2; it fails with:

start fencing ... failed

The message log on node2 shows:


openais[22433]: [SYNC ] Not using a virtual synchrony filter. 
Jan 12 02:05:02 clus2 groupd[22442]: found uncontrolled kernel object rgmanager in /sys/kernel/dlm
Jan 12 02:05:02 clus2 openais[22433]: [TOTEM] Creating commit token because I am the rep. 
Jan 12 02:05:02 clus2 groupd[22442]: local node must be reset to clear 1 uncontrolled instances of gfs and/or dlm
Jan 12 02:05:02 clus2 openais[22433]: [TOTEM] Saving state aru 0 high seq received 0 
Jan 12 02:05:02 clus2 fence_node[22467]: Fence of "node2.network.com" was unsuccessful 
Jan 12 02:05:02 clus2 openais[22433]: [TOTEM] entering COMMIT state. 
Jan 12 02:05:02 clus2 gfs_controld[22460]: groupd_dispatch error -1 errno 104
Jan 12 02:05:02 clus2 dlm_controld[22454]: groupd is down, exiting
Jan 12 02:05:02 clus2 fenced[22448]: groupd is down, exiting
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] entering RECOVERY state. 
Jan 12 02:05:03 clus2 gfs_controld[22460]: groupd connection died
Jan 12 02:05:03 clus2 kernel: dlm: closing connection to node 2
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] position [0] member 192.168.100.2: 
Jan 12 02:05:03 clus2 gfs_controld[22460]: cluster is down, exiting
Jan 12 02:05:03 clus2 kernel: dlm: closing connection to node 1
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] previous ring seq 0 rep 192.168.100.2 
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] aru 0 high delivered 0 received flag 0 
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] Did not need to originate any messages in recovery. 
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] Storing new sequence id for ring 4 
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] Sending initial ORF token 
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] CLM CONFIGURATION CHANGE 
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] New Configuration: 
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] Members Left: 
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] Members Joined: 
Jan 12 02:05:03 clus2 openais[22433]: [SYNC ] This node is within the primary component and will provide service. 
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] CLM CONFIGURATION CHANGE 
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] New Configuration: 
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ]   r(0) ip(192.168.100.2)  
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] Members Left: 
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] Members Joined: 
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ]   r(0) ip(192.168.100.2)  
Jan 12 02:05:03 clus2 openais[22433]: [SYNC ] This node is within the primary component and will provide service. 
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] entering OPERATIONAL state. 
Jan 12 02:05:03 clus2 openais[22433]: [CMAN ] quorum regained, resuming activity 
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] got nodejoin message 192.168.100.2 
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] entering GATHER state from 11. 
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] Saving state aru 9 high seq received 9 
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] entering COMMIT state. 
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] entering RECOVERY state. 
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] position [0] member 192.168.100.1: 
Jan 12 02:05:04 clus2 openais[22433]: [TOTEM] previous ring seq 132 rep 192.168.100.1 
Jan 12 02:05:04 clus2 openais[22433]: [TOTEM] aru c high delivered c received flag 0 
Jan 12 02:05:04 clus2 openais[22433]: [TOTEM] position [1] member 192.168.100.2: 
Jan 12 02:05:04 clus2 openais[22433]: [TOTEM] previous ring seq 4 rep 192.168.100.2 
Jan 12 02:05:04 clus2 openais[22433]: [TOTEM] aru 9 high delivered 9 received flag 0 
Jan 12 02:05:04 clus2 openais[22433]: [TOTEM] Did not need to originate any messages in recovery. 
Jan 12 02:05:04 clus2 openais[22433]: [TOTEM] Storing new sequence id for ring 88 
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] CLM CONFIGURATION CHANGE 
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] New Configuration: 
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ]   r(0) ip(192.168.100.2)  
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] Members Left: 
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] Members Joined: 
Jan 12 02:05:04 clus2 openais[22433]: [SYNC ] This node is within the primary component and will provide service. 
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] CLM CONFIGURATION CHANGE 
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] New Configuration: 
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ]   r(0) ip(192.168.100.1)  
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ]   r(0) ip(192.168.100.2)  
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] Members Left: 
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] Members Joined: 
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ]   r(0) ip(192.168.100.1)  
Jan 12 02:05:04 clus2 openais[22433]: [SYNC ] This node is within the primary component and will provide service. 
Jan 12 02:05:04 clus2 openais[22433]: [TOTEM] entering OPERATIONAL state. 
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] got nodejoin message 192.168.100.1 
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] got nodejoin message 192.168.100.2 
Jan 12 02:05:04 clus2 openais[22433]: [CPG  ] got joinlist message from node 1 
Jan 12 02:05:04 clus2 openais[22433]: [CMAN ] cman killed by node 2 for reason 2 
Jan 12 02:05:32 clus2 ccsd[22427]: Unable to connect to cluster infrastructure after 30 seconds. 


On node1, however, cman restarts successfully, although its message log shows:

fence node2.network.com failed

How do I get node2 to rejoin the cluster?


