[Linux-cluster] GFS2 2 Node Cluster - lost Node - Mount not writeable
Thomas Börnert
tb at tbits.net
Tue Feb 26 22:40:30 UTC 2008
Hi List,
I have two servers, connected with a crossover cable.
Installed RPMs:
gfs2-utils-0.1.38-1.el5
gfs-utils-0.1.12-1.el5
kmod-gfs2-1.52-1.16.el5
cman-2.0.73-1.el5_1.1
My cluster.conf (on both nodes):
---------------------------------------------------------------------------------
<?xml version="1.0"?>
<cluster name="cluster" config_version="2">
<cman two_node="1" expected_votes="1">
</cman>
<clusternodes>
<clusternode name="node1" votes="1" nodeid="1">
<fence>
<method name="human">
<device name="human" nodename="node1"/>
</method>
</fence>
</clusternode>
<clusternode name="node2" votes="1" nodeid="2">
<fence>
<method name="human">
<device name="human" nodename="node2"/>
</method>
</fence>
</clusternode>
</clusternodes>
<fencedevices>
<fencedevice name="human" agent="fence_manual"/>
</fencedevices>
</cluster>
---------------------------------------------------------------------------------------
My hosts file (on both nodes):
192.168.0.1 node1
192.168.0.2 node2
Filesystem creation and mount:
mkfs.gfs2 -p lock_dlm -t cluster:drbd -j 2 /dev/drbd0
mount -t gfs2 -o noatime,nodiratime /dev/drbd0 /test
(By the way, DRBD works fine as Primary/Primary.)
OK, I can use /test on both nodes, write to files, and so on.
cman_tool nodes
--------------------------------------------------------------------------------------
Node Sts Inc Joined Name
1 M 364 2008-02-26 23:20:16 node1
2 M 360 2008-02-26 23:20:16 node2
cman_tool status
-------------------------------------------------------------------------------------
Version: 6.0.1
Config Version: 3
Cluster Name: cluster
Cluster Id: 34996
Cluster Member: Yes
Cluster Generation: 364
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 6
Flags: 2node
Ports Bound: 0
Node name: node2
Node ID: 2
Multicast addresses: 239.192.136.61
Node addresses: 192.168.0.2
NOW: I power node1 off!
The log on node2 shows:
-----------------------------------------------------------------------------------------
==> /var/log/messages <==
Feb 26 23:27:22 node2 last message repeated 13 times
==> /var/log/kernel <==
Feb 26 23:27:31 node2 kernel: tg3: eth1: Link is down.
Feb 26 23:27:32 node2 kernel: tg3: eth1: Link is up at 100 Mbps, full duplex.
Feb 26 23:27:32 node2 kernel: tg3: eth1: Flow control is off for TX and off
for RX.
Feb 26 23:27:36 node2 kernel: drbd0: PingAck did not arrive in time.
Feb 26 23:27:36 node2 kernel: drbd0: peer( Primary -> Unknown ) conn(
Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Feb 26 23:27:36 node2 kernel: drbd0: Creating new current UUID
Feb 26 23:27:36 node2 kernel: drbd0: asender terminated
Feb 26 23:27:36 node2 kernel: drbd0: short read expecting header on sock:
r=-512
Feb 26 23:27:36 node2 kernel: drbd0: tl_clear()
Feb 26 23:27:36 node2 kernel: drbd0: Connection closed
Feb 26 23:27:36 node2 kernel: drbd0: Writing meta data super block now.
Feb 26 23:27:36 node2 kernel: drbd0: conn( NetworkFailure -> Unconnected )
Feb 26 23:27:36 node2 kernel: drbd0: receiver terminated
Feb 26 23:27:36 node2 kernel: drbd0: receiver (re)started
Feb 26 23:27:36 node2 kernel: drbd0: conn( Unconnected -> WFConnection )
==> /var/log/messages <==
Feb 26 23:27:37 node2 last message repeated 3 times
Feb 26 23:27:40 node2 openais[3288]: [TOTEM] The token was lost in the
OPERATIONAL state.
Feb 26 23:27:40 node2 openais[3288]: [TOTEM] Receive multicast socket recv
buffer size (288000 bytes).
Feb 26 23:27:40 node2 openais[3288]: [TOTEM] Transmit multicast socket send
buffer size (262142 bytes).
Feb 26 23:27:40 node2 openais[3288]: [TOTEM] entering GATHER state from 2.
Feb 26 23:27:42 node2 root: Process did not exit cleanly, returned 2 with
signal 0
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] entering GATHER state from 0.
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] Creating commit token because I
am the rep.
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] Saving state aru 31 high seq
received 31
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] Storing new sequence id for ring
170
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] entering COMMIT state.
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] entering RECOVERY state.
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] position [0] member 192.168.0.2:
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] previous ring seq 364 rep
192.168.0.1
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] aru 31 high delivered 31 received
flag 1
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] Did not need to originate any
messages in recovery.
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] Sending initial ORF token
Feb 26 23:27:44 node2 openais[3288]: [CLM ] CLM CONFIGURATION CHANGE
Feb 26 23:27:44 node2 openais[3288]: [CLM ] New Configuration:
Feb 26 23:27:44 node2 fenced[3307]: node1 not a cluster member after 0 sec
post_fail_delay
Feb 26 23:27:44 node2 openais[3288]: [CLM ] r(0) ip(192.168.0.2)
Feb 26 23:27:44 node2 fenced[3307]: fencing node "node1"
==> /var/log/kernel <==
Feb 26 23:27:44 node2 kernel: dlm: closing connection to node 1
==> /var/log/messages <==
Feb 26 23:27:44 node2 openais[3288]: [CLM ] Members Left:
Feb 26 23:27:45 node2 openais[3288]: [CLM ] r(0) ip(192.168.0.1)
Feb 26 23:27:45 node2 fence_manual: Node node1 needs to be reset before
recovery can procede. Waiting for node1 to rejoin the cluster or for manual
acknowledgement that it has been reset (i.e. fence_ack_manual -n node1)
Feb 26 23:27:45 node2 openais[3288]: [CLM ] Members Joined:
Feb 26 23:27:45 node2 openais[3288]: [CLM ] CLM CONFIGURATION CHANGE
Feb 26 23:27:45 node2 openais[3288]: [CLM ] New Configuration:
Feb 26 23:27:45 node2 openais[3288]: [CLM ] r(0) ip(192.168.0.2)
Feb 26 23:27:45 node2 openais[3288]: [CLM ] Members Left:
Feb 26 23:27:45 node2 openais[3288]: [CLM ] Members Joined:
Feb 26 23:27:45 node2 openais[3288]: [SYNC ] This node is within the primary
component and will provide service.
Feb 26 23:27:45 node2 openais[3288]: [TOTEM] entering OPERATIONAL state.
Feb 26 23:27:45 node2 openais[3288]: [CLM ] got nodejoin message 192.168.0.2
Feb 26 23:27:45 node2 openais[3288]: [CPG ] got joinlist message from node 2
Feb 26 23:27:47 node2 root: Process did not exit cleanly, returned 2 with
signal 0
-------------------------------------------------------------------------------------------------------------
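The fence_manual entry in the log above says that recovery waits until node1 rejoins or the reset is acknowledged manually. A minimal Python sketch for spotting such a pending manual fence in syslog lines follows. It assumes the message wording shown in that entry; the function name pending_manual_fences is hypothetical, and the script only prints the command the log itself names, it does not run anything.

```python
import re

# Assumed message format, copied from the fence_manual line in the log above.
_PENDING = re.compile(r"fence_manual: Node (\S+) needs to be reset")


def pending_manual_fences(log_lines):
    """Return node names that fence_manual reports as awaiting a reset."""
    nodes = []
    for line in log_lines:
        match = _PENDING.search(line)
        if match and match.group(1) not in nodes:
            nodes.append(match.group(1))
    return nodes


sample = [
    'Feb 26 23:27:44 node2 fenced[3307]: fencing node "node1"',
    "Feb 26 23:27:45 node2 fence_manual: Node node1 needs to be reset before",
]
for node in pending_manual_fences(sample):
    # Echo the acknowledgement command quoted in the log; nothing is executed.
    print("pending manual fence for %s (log suggests: fence_ack_manual -n %s)"
          % (node, node))
```

Whether acknowledging is safe depends on node1 really being powered off, so this is only meant as a way to notice the pending state.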
ls /test works,
BUT
touch /test/testfile hangs ...
cman_tool nodes shows
------------------------------------------------------------------------------------------------------------------
Node Sts Inc Joined Name
1 X 364 node1
2 M 360 2008-02-26 23:20:16 node2
-----------------------------------------------------------------------------------------------------------------
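To pick out the lost node from this output automatically, something like the following Python sketch might do. It assumes the column layout shown above (Node, Sts, Inc, Joined, Name, with Joined empty for a lost node) and treats any status other than M as "not a member"; the function name non_member_nodes is hypothetical.

```python
def non_member_nodes(cman_nodes_output):
    """Return names of nodes whose Sts column is not 'M' (member)."""
    missing = []
    for line in cman_nodes_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric node ID; skip header and ruler lines.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        status, name = fields[1], fields[-1]
        if status != "M":
            missing.append(name)
    return missing


sample = """Node Sts Inc Joined Name
1 X 364 node1
2 M 360 2008-02-26 23:20:16 node2
"""
print(non_member_nodes(sample))  # prints ['node1']
```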
cman_tool status shows
-----------------------------------------------------------------------------------------------------------------
Version: 6.0.1
Config Version: 3
Cluster Name: cluster
Cluster Id: 34996
Cluster Member: Yes
Cluster Generation: 368
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Quorum: 1
Active subsystems: 6
Flags: 2node
Ports Bound: 0
Node name: node2
Node ID: 2
Multicast addresses: 239.192.136.61
Node addresses: 192.168.0.2
------------------------------------------------------------------------------------------------------------------
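Reading the status output above, Total votes (1) still meets Quorum (1), so node2 appears to remain quorate on its own, which is what I would expect with two_node="1". A small Python sketch of that check is below. It assumes the "Key: value" layout shown above; the function names parse_cman_status and is_quorate are hypothetical.

```python
def parse_cman_status(text):
    """Parse 'Key: value' lines from cman_tool status into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def is_quorate(status):
    """True when the votes currently present reach the quorum threshold."""
    return int(status["Total votes"]) >= int(status["Quorum"])


sample = """Nodes: 1
Expected votes: 1
Total votes: 1
Quorum: 1
Flags: 2node
"""
print(is_quorate(parse_cman_status(sample)))  # prints True
```

If this reasoning holds, the hanging write would not be a quorum problem.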
DRBD is not the problem; its state is still Primary (StandAlone).
Why can't I write to a GFS2 filesystem while the other node is lost?
Now I power node1 back on!
DRBD is no problem; it recovers.
Then I start cman, and the hanging touch finishes ...
Thanks for any ideas and help
-Thomas