[Linux-cluster] qdiskd questions

Tue Mar 9 11:10:20 UTC 2010

Hi,

I have a two-node cluster providing NFS. I'm using a small partition on the
shared storage as a quorum disk, with a single heuristic to ping the default
gateway on the network. Both nodes are connected to the network with bonded
interfaces, but I have all of the heartbeat/cluster traffic running over a
crossover cable between the two. The hardware is a pair of Dell PowerEdge
servers with an MD3000 array between them and I'm using the DRAC interface
as the fence device. My cluster.conf looks like the following:

<cluster name="storage" config_version="46">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <quorumd interval="1" tko="10" votes="1" device="/dev/mapper/md3000p1">
                <heuristic program="ping 192.168.30.254 -c3 -t2" score="1" interval="2" tko="3"/>
        </quorumd>
        <clusternodes>
                <clusternode name="node1-xover" votes="1" nodeid="1">
                        <fence>
                                ...
                        </fence>
                </clusternode>
                <clusternode name="node2-xover" votes="1" nodeid="2">
                        <fence>
                                ...
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="3"/>
        ...
</cluster>

(It's exactly as per the qdisk(5) man page example)

It's been up and running for ages with no trouble, but recently I
had a problem where the default gateway, despite being an active/passive
pair of Cisco ASA firewalls configured for failover, took at least 30
seconds to fail over when the primary device developed a problem. The
heuristic therefore failed for longer than its tko allows, and both
nodes rebooted simultaneously, causing a loss of service. All I can see
in the logs is:

---8<---
Mar  5 07:38:50 node1 qdiskd[7967]: <info> Heuristic: 'ping 192.168.30.254 -c3 -t2' DOWN (3/3)
Mar  5 07:38:50 node1 qdiskd[7967]: <notice> Score insufficient for master operation (0/1; required=1); downgrading
Mar  5 07:38:50 node1 kernel: md: stopping all md devices.
Mar  5 07:38:51 node1 kernel: Synchronizing SCSI cache for disk sdd: 
Mar  5 07:38:51 node1 kernel: Synchronizing SCSI cache for disk sdb: 
Mar  5 07:38:51 node1 kernel: Synchronizing SCSI cache for disk sda: 
Mar  5 07:38:51 node1 kernel: ACPI: PCI interrupt for device 0000:0a:00.0 disabled
Mar  5 07:38:51 node1 kernel: hub 1-1:1.0: cannot reset port 2 (err = -71)
Mar  5 07:42:34 node1 syslogd 1.4.1: restart.
---8<---

The time difference between the last two messages is obviously where the
node is rebooting. The timestamps on the logs from both nodes are identical
apart from a few seconds on that last message.

I'm a bit unsure what actually did the rebooting in this case: was it
qdiskd, or each node shooting the other? Ideally I would like to prevent
this situation from happening again. Is it simply a case of adding
reboot="0" to the <quorumd> directive, and would that introduce any
other problems?
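
To be concrete, this is a sketch of the change I'm asking about. It
assumes reboot is accepted as an attribute of <quorumd>, which is how I
read qdisk(5); every other attribute is copied unchanged from the config
above, and config_version would need incrementing as with any edit:

```xml
<!-- Sketch only: reboot="0" is the single change under discussion -->
<quorumd interval="1" tko="10" votes="1" device="/dev/mapper/md3000p1" reboot="0">
        <!-- Heuristic left exactly as in the running config -->
        <heuristic program="ping 192.168.30.254 -c3 -t2" score="1" interval="2" tko="3"/>
</quorumd>
```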

Thanks

Matt