[Linux-cluster] Re: Fencing test
Paras pradhan
pradhanparas at gmail.com
Mon Jan 5 18:11:24 UTC 2009
Hi,
On Mon, Jan 5, 2009 at 8:23 AM, Rajagopal Swaminathan
<raju.rajsand at gmail.com> wrote:
> Greetings,
>
> On Sat, Jan 3, 2009 at 4:18 AM, Paras pradhan <pradhanparas at gmail.com> wrote:
>>
>> Here I am using 4 nodes.
>>
>> Node 1) That runs luci
>> Node 2) This is my iSCSI shared storage, where my virtual machine(s) reside
>> Node 3) First node in my two node cluster
>> Node 4) Second node in my two node cluster
>>
>> All of them are connected simply to an unmanaged 16 port switch.
>
> Luci does not require a separate node to run. It can run on one of the
> member nodes (node 3 | 4).
OK.
>
> what does clustat say?
Here is my clustat output:
-----------
[root@ha1lx ~]# clustat
Cluster Status for ipmicluster @ Mon Jan 5 12:00:10 2009
Member Status: Quorate

 Member Name          ID   Status
 ------ ----          ---- ------
 10.42.21.29             1 Online, rgmanager
 10.42.21.27             2 Online, Local, rgmanager

 Service Name         Owner (Last)         State
 ------- ----         ----- ------         -----
 vm:linux64           10.42.21.27          started
[root@ha1lx ~]#
------------------------
10.42.21.27 is node3 and 10.42.21.29 is node4
>
> Can you post your cluster.conf here?
Here is my cluster.conf
--
[root@ha1lx cluster]# more cluster.conf
<?xml version="1.0"?>
<cluster alias="ipmicluster" config_version="8" name="ipmicluster">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="10.42.21.29" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="fence2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="10.42.21.27" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="fence1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.42.21.28"
                 login="admin" name="fence1" passwd="admin"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.42.21.30"
                 login="admin" name="fence2" passwd="admin"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="myfd" nofailback="0" ordered="1" restricted="0">
        <failoverdomainnode name="10.42.21.29" priority="2"/>
        <failoverdomainnode name="10.42.21.27" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources/>
    <vm autostart="1" domain="myfd" exclusive="0" migrate="live"
        name="linux64" path="/guest_roots" recovery="restart"/>
  </rm>
</cluster>
------
Here:
10.42.21.28 is IPMI interface in node3
10.42.21.30 is IPMI interface in node4
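
For anyone who wants to cross-check which IPMI address each node is fenced through, here is a minimal sketch in Python. It is not part of the cluster tooling: the function name fence_map is made up for illustration, and the embedded XML is a trimmed copy of the cluster.conf above (fencing parts only, login and passwd attributes left out).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted cluster.conf (assumption: only the
# fencing-related elements matter for this check).
CLUSTER_CONF = """\
<cluster alias="ipmicluster" config_version="8" name="ipmicluster">
  <clusternodes>
    <clusternode name="10.42.21.29" nodeid="1" votes="1">
      <fence><method name="1"><device name="fence2"/></method></fence>
    </clusternode>
    <clusternode name="10.42.21.27" nodeid="2" votes="1">
      <fence><method name="1"><device name="fence1"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.42.21.28" name="fence1"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.42.21.30" name="fence2"/>
  </fencedevices>
</cluster>
"""


def fence_map(conf_xml):
    """Return {node name: [(fence device name, device ipaddr), ...]}."""
    root = ET.fromstring(conf_xml)
    # Look up table: fence device name -> IPMI address.
    devices = {d.get("name"): d.get("ipaddr") for d in root.iter("fencedevice")}
    result = {}
    for node in root.iter("clusternode"):
        names = [dev.get("name") for dev in node.iter("device")]
        # A device missing from <fencedevices> shows up with address None.
        result[node.get("name")] = [(n, devices.get(n)) for n in names]
    return result


if __name__ == "__main__":
    for node, devs in fence_map(CLUSTER_CONF).items():
        for name, ip in devs:
            print(f"{node} is fenced via {name} at {ip}")
```

With the configuration as posted, each node resolves to the IPMI interface of its own machine, which matches the address list above.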
>
> When you pull out the network cable *and* plug it back in on, say, node 3,
> what messages (if any) appear in /var/log/messages on node 4?
> (Sorry for the repetition, but the messages are necessary to make any
> sense of the situation.)
>
OK, here is the log on node 4 after I disconnect the network cable on node 3.
-----------
Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] The token was lost in the
OPERATIONAL state.
Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] Receive multicast socket
recv buffer size (288000 bytes).
Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] Transmit multicast socket
send buffer size (262142 bytes).
Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] entering GATHER state from 2.
Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering GATHER state from 0.
Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Creating commit token
because I am the rep.
Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Saving state aru 76 high
seq received 76
Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Storing new sequence id
for ring ac
Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering COMMIT state.
Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering RECOVERY state.
Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] position [0] member 10.42.21.29:
Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] previous ring seq 168 rep
10.42.21.27
Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] aru 76 high delivered 76
received flag 1
Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Did not need to originate
any messages in recovery.
Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Sending initial ORF token
Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE
Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] New Configuration:
Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29)
Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Left:
Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27)
Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Joined:
Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE
Jan 5 12:05:28 ha2lx kernel: dlm: closing connection to node 2
Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] New Configuration:
Jan 5 12:05:28 ha2lx fenced[5004]: 10.42.21.27 not a cluster member
after 0 sec post_fail_delay
Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29)
Jan 5 12:05:28 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0:
jid=1: Trying to acquire journal lock...
Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Left:
Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Joined:
Jan 5 12:05:28 ha2lx openais[4988]: [SYNC ] This node is within the
primary component and will provide service.
Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering OPERATIONAL state.
Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] got nodejoin message 10.42.21.29
Jan 5 12:05:28 ha2lx openais[4988]: [CPG ] got joinlist message from node 1
Jan 5 12:05:28 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0:
jid=1: Looking at journal...
Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0:
jid=1: Acquiring the transaction lock...
Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0:
jid=1: Replaying journal...
Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0:
jid=1: Replayed 0 of 0 blocks
Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0:
jid=1: Found 0 revoke tags
Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0:
jid=1: Journal replayed in 1s
Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Done
------------------
Now when I plug the cable back into node 3, node 4 reboots. Here is the
log I quickly grabbed on node 4:
--
Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering GATHER state from 11.
Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Saving state aru 1d high
seq received 1d
Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Storing new sequence id
for ring b0
Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering COMMIT state.
Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering RECOVERY state.
Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] position [0] member 10.42.21.27:
Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] previous ring seq 172 rep
10.42.21.27
Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] aru 16 high delivered 16
received flag 1
Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] position [1] member 10.42.21.29:
Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] previous ring seq 172 rep
10.42.21.29
Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] aru 1d high delivered 1d
received flag 1
Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Did not need to originate
any messages in recovery.
Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE
Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] New Configuration:
Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29)
Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Left:
Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Joined:
Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE
Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] New Configuration:
Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27)
Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29)
Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Left:
Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Joined:
Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27)
Jan 5 12:07:12 ha2lx openais[4988]: [SYNC ] This node is within the
primary component and will provide service.
Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering OPERATIONAL state.
Jan 5 12:07:12 ha2lx openais[4988]: [MAIN ] Killing node 10.42.21.27
because it has rejoined the cluster with existing state
Jan 5 12:07:12 ha2lx openais[4988]: [CMAN ] cman killed by node 2
because we rejoined the cluster without a full restart
Jan 5 12:07:12 ha2lx gfs_controld[5016]: groupd_dispatch error -1 errno 11
Jan 5 12:07:12 ha2lx gfs_controld[5016]: groupd connection died
Jan 5 12:07:12 ha2lx gfs_controld[5016]: cluster is down, exiting
Jan 5 12:07:12 ha2lx dlm_controld[5010]: cluster is down, exiting
Jan 5 12:07:12 ha2lx kernel: dlm: closing connection to node 1
Jan 5 12:07:12 ha2lx fenced[5004]: cluster is down, exiting
-------
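
Because the mail client wrapped the syslog lines, the interesting entries are easy to miss. The following is a rough sketch in Python that rejoins wrapped lines and keeps only the fencing and membership-kill messages. The helper names (unwrap, key_events) and the keyword list are my own choices for illustration, not anything provided by openais or cman, and the sample is a short excerpt of the log above.

```python
import re

# A syslog entry starts with a timestamp such as "Jan 5 12:07:12 ".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} ")

# Substrings treated as "interesting" (an assumption, adjust as needed).
KEYWORDS = (
    "Killing node",
    "cman killed by",
    "cluster is down",
    "not a cluster member",
    'fence "',
)

SAMPLE = """\
Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering OPERATIONAL state.
Jan 5 12:07:12 ha2lx openais[4988]: [MAIN ] Killing node 10.42.21.27
because it has rejoined the cluster with existing state
Jan 5 12:07:12 ha2lx openais[4988]: [CMAN ] cman killed by node 2
because we rejoined the cluster without a full restart
Jan 5 12:07:12 ha2lx fenced[5004]: cluster is down, exiting
"""


def unwrap(lines):
    """Join continuation lines (no leading timestamp) onto the previous entry."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line.strip()
    return entries


def key_events(lines):
    """Return the unwrapped entries that contain any of KEYWORDS."""
    return [e for e in unwrap(lines) if any(k in e for k in KEYWORDS)]


if __name__ == "__main__":
    for event in key_events(SAMPLE.splitlines()):
        print(event)
```

To run it against a real log, pass the open file instead of the sample, for example key_events(open("/var/log/messages")).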
Also here is the log of node3:
--
[root@ha1lx ~]# tail -f /var/log/messages
Jan 5 12:07:24 ha1lx openais[26029]: [TOTEM] entering OPERATIONAL state.
Jan 5 12:07:24 ha1lx openais[26029]: [CLM ] got nodejoin message 10.42.21.27
Jan 5 12:07:24 ha1lx openais[26029]: [CLM ] got nodejoin message 10.42.21.27
Jan 5 12:07:24 ha1lx openais[26029]: [CPG ] got joinlist message from node 2
Jan 5 12:07:27 ha1lx ccsd[26019]: Attempt to close an unopened CCS
descriptor (4520670).
Jan 5 12:07:27 ha1lx ccsd[26019]: Error while processing disconnect:
Invalid request descriptor
Jan 5 12:07:27 ha1lx fenced[26045]: fence "10.42.21.29" success
Jan 5 12:07:27 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1:
jid=0: Trying to acquire journal lock...
Jan 5 12:07:27 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1:
jid=0: Looking at journal...
Jan 5 12:07:28 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1: jid=0: Done
----------------
> HTH
>
> With warm regards
>
> Rajagopal
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
Thanks a lot
Paras.