[Linux-cluster] Qdiskd issue over EMC CX3-20 Storage + EMC PowerPath multipathing software
Lon Hohberger
lhh at redhat.com
Mon Sep 24 17:25:06 UTC 2007
On Mon, Sep 24, 2007 at 10:52:49AM -0300, Celso K. Webber wrote:
> Hello all,
>
> I'm having an issue with a RHCS4 Cluster. Here is some version
> information:
> * Storage: EMC CX3-20, latest FLARE code applied;
> * HBAs: 2 x QLogic 2462, latest/certified BIOS by EMC (v1.24);
> * Servers: 2 Dell PowerEdge 2950, 2 quad-core processors, 8 GB of RAM, all
> available firmware updates applied;
> * OS: RHEL v4 Update 4 with kernel 2.6.9-42.0.10.ELsmp (latest kernel
> certified by EMC for RHEL4). RHEL4u5 is not certified by EMC yet, so we
> installed RHEL4u4 and upgraded the kernel only to the latest certified
> release;
> * Processor Architecture: everything x86_64;
> * RH Cluster Suite: latest non-kernel specific packages, the other packages
> (cman-kernel, dlm-kernel) are specific for the 2.6.9-42.0.10.ELsmp kernel;
> * Multipath/storage software: EMC PowerPath v5.0.0.157, Navisphere Agent
> v6.24.0.6.13.
>
>
> We are experiencing a problem during our tests with the multipathing
> software. If we take out the fiber cable from one of the HBAs from one
> server, it removes itself from the Cluster because of losing access to the
> shared partition (this is an expected behaviour). But since we are pointing
> the Qdisk daemon to an EMC Power device (/dev/emcpowerXX), we expected that
> the multipathing should take care of the fibre channel outage.
Yes, it should.
>
> So, I ask: are there any specific timers I should configure in cman or
> qdiskd so that I can give enough time for PowerPath to reconfigure the
> available paths? The Storage Administrator verified that all storage paths
> are active and functional.
Yes, you can adjust the interval and the TKO count (the "interval" and
"tko" attributes of quorumd). See the qdisk(5) man page.
Note that the qdisk timeout (interval * tko) should be
< (0.5 * cluster_timeout), so you will need to adjust your cluster
timeout accordingly:
<cman deadnode_timeout="..." .../>
> By the way: I'm configuring qdiskd with no heuristics at all, since we
> didn't have any reliable "router" available to work as an IP tiebreaker for
> the cluster. Since the Cluster FAQ
> (http://sources.redhat.com/cluster/faq.html#quorumdiskonly) states in
> question #23 (last paragraph) that in RHCS4U5 it is possible to have no
> heuristics at all, we are trying it in this installation for the first time.
Correct, but it's nice to have them :)
>
> <?xml version="1.0"?>
> <cluster config_version="9" name="clu_xxxxxx">
> <quorumd log_facility="local6" device="/dev/emcpowere1" interval="1"
> min_score="0" tko="10" votes="1"/>
interval*tko = qdisk timeout (in seconds)
> <cman/>
<cman deadnode_timeout="X"/>
... where X = 2 * interval * tko + 1
The qdisk timeout should be set to something that exceeds the PowerPath
failure detection timeout; I don't know what that is...
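To make the arithmetic above concrete, here is a small sketch in Python. It is only an illustration of the two rules stated in this reply (qdisk timeout = interval * tko, and deadnode_timeout = 2 * interval * tko + 1). The function names are made up for this example and are not part of qdiskd or cman.

```python
# Illustrative helpers only; these names are hypothetical and do not
# correspond to any qdiskd or cman API.

def qdisk_timeout(interval: int, tko: int) -> int:
    """Qdisk timeout in seconds: interval * tko."""
    return interval * tko


def min_deadnode_timeout(interval: int, tko: int) -> int:
    """Value X for <cman deadnode_timeout="X"/>: 2 * interval * tko + 1."""
    return 2 * qdisk_timeout(interval, tko) + 1


def timings_ok(interval: int, tko: int, deadnode_timeout: int) -> bool:
    """Check the rule: qdisk timeout < 0.5 * cluster timeout."""
    return qdisk_timeout(interval, tko) < 0.5 * deadnode_timeout


if __name__ == "__main__":
    # Values taken from the quorumd line quoted above: interval="1" tko="10"
    interval, tko = 1, 10
    print("qdisk timeout (s):", qdisk_timeout(interval, tko))
    print("deadnode_timeout (s):", min_deadnode_timeout(interval, tko))
```

If PowerPath needs longer than interval * tko seconds to fail over, raise interval or tko first, then recompute deadnode_timeout from the new values.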
--
Lon Hohberger - Software Engineer - Red Hat, Inc.