[Linux-cluster] dual fence redux

Chris Harms chris at cmiware.com
Tue Jul 3 23:20:23 UTC 2007


To recap:
I am attempting to set up a 2-node cluster in which each node runs a 
database and an Apache service that can fail over between them.  Both 
nodes are fenced via Dell DRAC cards connected through the system NICs 
(this contributes to the issue, but manual fencing is broken).
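For reference, the topology described above corresponds roughly to a 
cluster.conf like the one below.  This is only a sketch: the cluster name, 
DRAC IP addresses, and credentials are placeholders, not my actual 
configuration.

```xml
<?xml version="1.0"?>
<cluster name="testcluster" config_version="1">
  <!-- two_node mode: either node alone keeps quorum (placeholder values) -->
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="nodeA" nodeid="1">
      <fence>
        <method name="1">
          <device name="drac-a"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="nodeB" nodeid="2">
      <fence>
        <method name="1">
          <device name="drac-b"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <!-- DRAC agents reached over the same system NICs, as described above -->
    <fencedevice name="drac-a" agent="fence_drac" ipaddr="10.0.0.1"
                 login="root" passwd="secret"/>
    <fencedevice name="drac-b" agent="fence_drac" ipaddr="10.0.0.2"
                 login="root" passwd="secret"/>
  </fencedevices>
</cluster>
```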

My test case so far is to unplug the network cables from one node and 
then reconnect them.  For some reason, both machines get halted instead 
of just the one node being fenced.  The expected outcome (only one node 
fenced) has occurred just once in this scenario.

I previously suspected DRBD as the culprit, but I can now rule it out: 
I performed the cable-pull test without RHCS running, with DRBD in every 
configuration the cluster could put it in, including a split brain 
(which should be impossible in my setup, since services do not fail over 
until fencing occurs).

Is there any component of the cluster system that would issue the 
shutdown command shown in the log entry below?
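To narrow down which process issued the halt, one approach is to pull the 
cluster-kill and shutdown events out of syslog together with their PIDs. 
This is just a sketch run against an inline sample of my log lines; on a 
real node you would point the grep at /var/log/messages instead.

```shell
#!/bin/sh
# Write a small sample of the syslog lines in question (stand-in for
# /var/log/messages on the affected node).
cat > /tmp/messages.sample <<'EOF'
Jul  3 17:36:20 nodeA openais[3504]: [MAIN ] Killing node nodeB because it has rejoined the cluster without cman_tool join
Jul  3 17:36:21 nodeA shutdown[18845]: shutting down for system halt
EOF

# Show membership kills and shutdown invocations, with line numbers,
# so the ordering and the PID of each event are visible at a glance.
grep -nE 'Killing node|shutdown\[' /tmp/messages.sample
```

On the real log, correlating the shutdown PID (18845 here) with surrounding 
entries from the same second is what would reveal which daemon invoked it.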

[From logs on Node A]

Jul  3 17:36:20 nodeA openais[3504]: [MAIN ] Killing node nodeB because 
it has rejoined the cluster without cman_tool join
Jul  3 17:36:20 nodeA kernel: drbd0: peer( Unknown -> Secondary ) conn( 
WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Jul  3 17:36:21 nodeA kernel: drbd0: Writing meta data super block now.
Jul  3 17:36:21 nodeA kernel: drbd0: conn( WFBitMapS -> SyncSource ) 
pdsk( UpToDate -> Inconsistent )
Jul  3 17:36:21 nodeA kernel: drbd0: Began resync as SyncSource (will 
sync 56 KB [14 bits set]).
Jul  3 17:36:21 nodeA kernel: drbd0: Writing meta data super block now.
Jul  3 17:36:21 nodeA kernel: drbd0: Resync done (total 1 sec; paused 0 
sec; 56 K/sec)
Jul  3 17:36:21 nodeA kernel: drbd0: conn( SyncSource -> Connected ) 
pdsk( Inconsistent -> UpToDate )
Jul  3 17:36:21 nodeA kernel: drbd0: Writing meta data super block now.
Jul  3 17:36:21 nodeA shutdown[18845]: shutting down for system halt


Thanks to a hardware issue on NodeB, I am currently unable to retrieve 
its logs.





