Lon Hohberger lhh at redhat.com
Mon Sep 24 17:25:06 UTC 2007

On Mon, Sep 24, 2007 at 10:52:49AM -0300, Celso K. Webber wrote:
> I'm having an issue with a RHCS4 Cluster. Here is some version 
> information:
> * Storage: EMC CX3-20, latest FLARE code applied;
> * HBAs: 2 x QLogic 2462, latest/certified BIOS by EMC (v1.24);
> * Servers: 2 Dell PowerEdge 2950, 2 quad-core processors, 8 GB of RAM, all 
> available firmware updates applied;
> * OS: RHEL v4 Update 4 with kernel 2.6.9-42.0.10.ELsmp (latest kernel 
> certified by EMC for RHEL4). RHEL4u5 is not certified by EMC yet, so we 
> installed RHEL4u4 and upgraded the kernel only to the latest certified 
> release;
> * Processor Architecture: everything x86_64;
> * RH Cluster Suite: latest non-kernel specific packages, the other packages 
> (cman-kernel, dlm-kernel) are specific for the 2.6.9-42.0.10.ELsmp kernel;
> * Multipath/storage software: EMC PowerPath v5.0.0.157, Navisphere Agent 
> v6.
>
> We are experiencing a problem during our tests with the multipathing 
> software. If we take out the fiber cable from one of the HBAs from one 
> server, it removes itself from the Cluster because of losing access to the 
> shared partition (this is an expected behaviour). But since we are pointing 
> the Qdisk daemon to an EMC Power device (/dev/emcpowerXX), we expected that 
> the multipathing should take care of the fibre channel outage.

Yes, it should.

> So, I ask: are there any specific timers I should configure in cman or 
> qdiskd so that I can give enough time for PowerPath to reconfigure the 
> available paths? The Storage Administrator verified that all storage paths 
> are active and functional.

Yes, you can adjust interval + TKO count.  See the qdisk(5) man page.
Note that the qdisk timeout (interval * tko) should be less than half
the cluster timeout, so you will need to adjust your cluster timeout
accordingly:

   <cman deadnode_timeout="..." .../>

> By the way: I'm configuring qdiskd with no heuristics at all, since we 
> didn't have any reliable "router" available to work as an IP tiebreaker for 
> the cluster. Since the Cluster FAQ 
> (http://sources.redhat.com/cluster/faq.html#quorumdiskonly) states in 
> question #23 (last paragraph) that in RHCS4U5 it is possible to have no 
> heuristics at all, we are trying it in this installation for the first time.

Correct, but it's nice to have them :)

> <?xml version="1.0"?>
> <cluster config_version="9" name="clu_xxxxxx">
> 	<quorumd log_facility="local6" device="/dev/emcpowere1" interval="1" 
> min_score="0" tko="10" votes="1"/>

  interval*tko = qdisk timeout (in seconds)

> 	<cman/>

   <cman deadnode_timeout="X"/>

   ... where X = 2 * interval * tko + 1
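
To make the arithmetic concrete, here is a small sketch that plugs the
interval="1" and tko="10" values from the quoted <quorumd> line into the
two rules above. The function names are made up for illustration; they
are not part of any cluster tool.

```python
def qdisk_timeout(interval: int, tko: int) -> int:
    """Seconds of missed updates before qdiskd gives up: interval * tko."""
    return interval * tko


def deadnode_timeout(interval: int, tko: int) -> int:
    """cman deadnode_timeout per the rule above: 2 * interval * tko + 1."""
    return 2 * qdisk_timeout(interval, tko) + 1


# Values taken from the quoted <quorumd> line: interval="1", tko="10".
interval, tko = 1, 10
print(f"qdisk timeout: {qdisk_timeout(interval, tko)} s")
print(f'<cman deadnode_timeout="{deadnode_timeout(interval, tko)}"/>')
```

With those values the qdisk timeout is 10 seconds and X comes out to 21,
which keeps the qdisk timeout under half of the cluster timeout.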

The qdisk timeout should be set to something which exceeds the PowerPath
failure detection timeout; I don't know what that is...
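
Once you have measured how long PowerPath takes to fail over, something
like the sketch below could turn that number into settings. The 30
second figure is purely a placeholder assumption (the real value has to
come from EMC or from your own cable-pull tests), and min_tko is a
hypothetical helper, not an existing tool.

```python
import math


def min_tko(path_failover_s: float, interval: int) -> int:
    """Smallest tko for which interval * tko strictly exceeds path_failover_s."""
    return math.floor(path_failover_s / interval) + 1


# ASSUMPTION: placeholder only, not a documented PowerPath value.
PATH_FAILOVER_S = 30

interval = 1
tko = min_tko(PATH_FAILOVER_S, interval)
deadnode = 2 * interval * tko + 1  # X = 2 * interval * tko + 1

print(f'<quorumd interval="{interval}" tko="{tko}" .../>')
print(f'<cman deadnode_timeout="{deadnode}"/>')
```

In practice you would probably want some margin on top of the measured
failover time rather than the bare minimum this computes.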

