[Linux-cluster] Problem in fencing, gfs2 freezes

Digimer lists at alteeve.ca
Mon Oct 29 17:28:21 UTC 2012


Please see the answer given on the DRBD Users list to this question.

digimer

On 10/29/2012 04:23 AM, Zohair Raza wrote:
> Hi, 
> 
> I have set up a Primary/Primary cluster with GFS2.
> 
> Everything works well if I shut down either node cleanly, but when I pull
> the power on either node, GFS2 freezes and I cannot access the device.
> 
> I tried to use http://people.redhat.com/lhh/obliterate
> 
> This is what I see in the logs:
> 
> Oct 29 08:05:41 node1 kernel: d-con res0: PingAck did not arrive in time.
> Oct 29 08:05:41 node1 kernel: d-con res0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
> Oct 29 08:05:41 node1 kernel: d-con res0: asender terminated
> Oct 29 08:05:41 node1 kernel: d-con res0: Terminating asender thread
> Oct 29 08:05:41 node1 kernel: d-con res0: Connection closed
> Oct 29 08:05:41 node1 kernel: d-con res0: conn( NetworkFailure -> Unconnected )
> Oct 29 08:05:41 node1 kernel: d-con res0: receiver terminated
> Oct 29 08:05:41 node1 kernel: d-con res0: Restarting receiver thread
> Oct 29 08:05:41 node1 kernel: d-con res0: receiver (re)started
> Oct 29 08:05:41 node1 kernel: d-con res0: conn( Unconnected -> WFConnection )
> Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0
> Oct 29 08:05:41 node1 fence_node[1912]: fence node2 failed
> Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0 exit code 1 (0x100)
> Oct 29 08:05:41 node1 kernel: d-con res0: fence-peer helper broken, returned 1
> Oct 29 08:05:48 node1 corosync[1346]:   [TOTEM ] A processor failed, forming new configuration.
> Oct 29 08:05:53 node1 corosync[1346]:   [QUORUM] Members[1]: 1
> Oct 29 08:05:53 node1 corosync[1346]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Oct 29 08:05:53 node1 corosync[1346]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.23.128) ; members(old:2 left:1)
> Oct 29 08:05:53 node1 corosync[1346]:   [MAIN  ] Completed service synchronization, ready to provide service.
> Oct 29 08:05:53 node1 kernel: dlm: closing connection to node 2
> Oct 29 08:05:53 node1 fenced[1401]: fencing node node2
> Oct 29 08:05:53 node1 kernel: GFS2: fsid=cluster-setup:res0.0: jid=1: Trying to acquire journal lock...
> Oct 29 08:05:53 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
> Oct 29 08:05:53 node1 fenced[1401]: fence node2 failed
> Oct 29 08:05:56 node1 fenced[1401]: fencing node node2
> Oct 29 08:05:56 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
> Oct 29 08:05:56 node1 fenced[1401]: fence node2 failed
> Oct 29 08:05:59 node1 fenced[1401]: fencing node node2
> Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
> Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed
> 
> Regards,
> Zohair Raza 
> 
> 
> 

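For anyone reading the log above: "exit code 1 (0x100)" reports a raw wait status whose high byte (0x100 >> 8 = 1) is the handler's actual exit code. Below is a minimal sketch of how that status can be decoded. It assumes the fence-peer exit-code table documented for DRBD 8.4 in drbd.conf(5); the dictionary and function names are hypothetical, for illustration only.

```python
# Hypothetical helper: classify the status returned by a DRBD fence-peer handler.
# ASSUMPTION: the code table follows the drbd.conf(5) man page for DRBD 8.4;
# DRBD reports any value outside it as a broken helper.
FENCE_PEER_CODES = {
    3: "peer disk was already Inconsistent",
    4: "peer disk set to Outdated (or already Outdated)",
    5: "peer could not be reached",
    6: "peer refused to be outdated (resource is Primary)",
    7: "peer fenced off the cluster",
}


def exit_code_from_wait_status(raw_status: int) -> int:
    """Extract the exit code from a raw wait status, like WEXITSTATUS on Linux."""
    return (raw_status >> 8) & 0xFF


def describe_fence_peer_status(raw_status: int) -> str:
    """Map a raw status such as 0x100 to a human-readable outcome."""
    code = exit_code_from_wait_status(raw_status)
    return FENCE_PEER_CODES.get(code, f"helper broken, returned {code}")


if __name__ == "__main__":
    # The status from the log above: "exit code 1 (0x100)".
    print(describe_fence_peer_status(0x100))
```

Because 1 is not one of the documented codes, DRBD reports the helper as broken, which matches the "fence-peer helper broken, returned 1" line; a handler that actually fenced the peer would be expected to return 7.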

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



