[Linux-cluster] Qdiskd issue over EMC CX3-20 Storage + EMC PowerPath multipathing software

Celso K. Webber celso at webbertek.com.br
Mon Sep 24 13:52:49 UTC 2007


Hello all,

I'm having an issue with an RHCS4 cluster. Here is some version information:
* Storage: EMC CX3-20, latest FLARE code applied;
* HBAs: 2 x QLogic 2462, latest/certified BIOS by EMC (v1.24);
* Servers: 2 x Dell PowerEdge 2950, each with 2 quad-core processors and 8 GB 
of RAM, all available firmware updates applied;
* OS: RHEL v4 Update 4 with kernel 2.6.9-42.0.10.ELsmp (latest kernel 
certified by EMC for RHEL4). RHEL4u5 is not certified by EMC yet, so we 
installed RHEL4u4 and upgraded the kernel only to the latest certified release;
* Processor Architecture: everything x86_64;
* RH Cluster Suite: latest non-kernel-specific packages; the kernel-dependent 
packages (cman-kernel, dlm-kernel) are the builds matching the 2.6.9-42.0.10.ELsmp kernel;
* Multipath/storage software: EMC PowerPath v5.0.0.157, Navisphere Agent 
v6.24.0.6.13.


We are experiencing a problem during our tests of the multipathing software. 
If we pull the fibre cable from one of the HBAs on one server, that node 
removes itself from the cluster because it loses access to the shared quorum 
partition (expected behaviour when access really is lost). But since we are 
pointing the qdisk daemon at an EMC PowerPath pseudo device (/dev/emcpowerXX), 
we expected the multipathing layer to take care of the fibre channel outage.

So I ask: are there any specific timers I should configure in cman or qdiskd 
to give PowerPath enough time to reconfigure the available paths? The storage 
administrator verified that all storage paths are active and functional.
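
For reference, this is the kind of change I had in mind, purely as a sketch on 
my part (nobody has recommended these numbers to me): stretch the qdisk failure 
window from the current interval="1" tko="10" (about 10 seconds) to something 
comfortably longer than a PowerPath path failover, for example:

	<!-- hypothetical values: qdisk timeout = interval x tko = 2 x 30 = about 60 seconds -->
	<quorumd log_facility="local6" device="/dev/emcpowere1" interval="2"
		 min_score="0" tko="30" votes="1"/>

I also wonder whether cman's own membership/dead-node timeout has to be raised 
in step with the qdisk timeout so the two layers don't fight each other, but I 
don't know the correct relationship or how to set it on RHEL4, so any pointers 
there are welcome too.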

By the way: I'm configuring qdiskd with no heuristics at all, since we don't 
have any reliable "router" available to act as an IP tiebreaker for the 
cluster. The Cluster FAQ 
(http://sources.redhat.com/cluster/faq.html#quorumdiskonly) states in 
question #23 (last paragraph) that as of RHCS4U5 it is possible to run with no 
heuristics at all, so we are trying that in this installation for the first time.
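
Just so the intent is clear, this is roughly what we would have used if a 
reliable tiebreaker address existed (the gateway IP below is only a 
placeholder, and the syntax is from my reading of the qdisk(5) examples, so 
please correct me if it is wrong):

	<quorumd log_facility="local6" device="/dev/emcpowere1" interval="1"
		 min_score="1" tko="10" votes="1">
		<heuristic program="ping -c1 -w1 192.168.0.254" score="1" interval="2"/>
	</quorumd>

Since we have nothing sensible to ping, we left min_score="0" and no 
<heuristic/> entries.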

Below I post the relevant part of my cluster.conf file:

<?xml version="1.0"?>
<cluster config_version="9" name="clu_xxxxxx">
	<quorumd log_facility="local6" device="/dev/emcpowere1" interval="1"
		 min_score="0" tko="10" votes="1"/>
	<fence_daemon post_fail_delay="10" post_join_delay="3"/>
	<clusternodes>
		<clusternode name="node1" votes="1">
			<fence>
				<method name="1">
					<device lanplus="" name="node1-ipmi"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="node2" votes="1">
			<fence>
				<method name="1">
					<device lanplus="" name="node2-ipmi"/>
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<cman/>
	<fencedevices>
		<fencedevice agent="fence_ipmilan" auth="none" ipaddr="hercules01-ipmi"
			login="root" name="node1-ipmi" passwd="clusterprosper"/>
		<fencedevice agent="fence_ipmilan" auth="none" ipaddr="hercules02-ipmi"
			login="root" name="node2-ipmi" passwd="clusterprosper"/>
	</fencedevices>
...
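
As I mentioned above, the quorumd settings give roughly a 10 second window 
(interval x tko) before the node is evicted. My untested assumption is that, 
after a cable pull, PowerPath can occasionally take longer than that to 
re-route I/O to the surviving HBA, which would explain what we are seeing; I 
have not measured the actual failover time, though, so please treat that as a 
guess.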


Thank you very much for any ideas on this issue.

Regards,

Celso.

-- 
*Celso Kopp Webber*

celso at webbertek.com.br <mailto:celso at webbertek.com.br>

*Webbertek - Opensource Knowledge*
(41) 8813-1919 - celular
(41) 4063-8448, ramal 102 - fixo





