[Linux-cluster] (new) problems with qdisk, running test rpms
Lon Hohberger
lhh at redhat.com
Mon May 7 14:49:41 UTC 2007
On Wed, May 02, 2007 at 02:44:06PM +0100, Frederik Ferner wrote:
> Hi,
>
> finally I had a chance to experiment with the test rpms for cman[1] that
> should solve the problem with multiple masters I had...
>
> For these tests I was using the following rpms on RHEL4U4:
>
> kernel-smp-2.6.9-42.0.3.EL
> cman-kernel-smp-2.6.9-45.8.1TEST
> cman-1.0.11-0.4.1qdisk
> rgmanager-1.9.54-1
>
> To test this I have two servers connected to one switch with nothing else
> connected and one uplink. As heuristics for qdiskd I'm pinging a few IP
> addresses outside of this switch. When I unplug the uplink with the old
> cman installed, qdiskd on both servers immediately notices this and lowers
> the score accordingly.
> With the new version of qdiskd it seems the heuristics are not tested
> anymore once a sufficient score has been reached. When the outside
> network is lost, qdiskd on both servers still claims the same score in the
> status file and both servers report the votes for the qdisk to cman.
Hmm, could you add 'tko="1"' to your cluster.conf for the heuristics? I
wonder if it's an initialization problem.
> If qdiskd is started while the outside network is unreachable, the score
> starts out without the contributions of the failing heuristics. Once the
> network is restored, the score jumps to at least the minimum required for
> operation and once again stays there.
>
> Is this a bug that will be fixed in the upcoming RHEL4U5 release or
> could there be something else wrong with my setup?
This seems to work for me:
[10538] debug: Heuristic: 'ping 192.168.79.254 -c1 -t3' missed (1/3)
[10538] debug: Heuristic: 'ping 192.168.79.254 -c1 -t3' missed (2/3)
[10538] info: Heuristic: 'ping 192.168.79.254 -c1 -t3' DOWN (3/3)
[10537] notice: Score insufficient for master operation (0/11; required=6); downgrading
Message from syslogd@green at Mon May 7 10:36:43 2007 ...
green clurgmgrd[7305]: <emerg> #1: Quorum Dissolved
(machine rebooted)
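The numbers in that log can be reproduced with a small sketch. It assumes the default minimum score documented in qdisk(5), floor((n+1)/2), where n is the sum of all heuristic scores. It also assumes that a heuristic is declared DOWN after tko consecutive misses, as the (1/3), (2/3), (3/3) lines suggest. The function and class names are made up for illustration; this is not qdiskd's actual code.

```python
def default_min_score(max_score: int) -> int:
    """Default minimum score per qdisk(5): floor((n + 1) / 2)."""
    return (max_score + 1) // 2


class Heuristic:
    """Hypothetical model of one heuristic with a consecutive-miss counter."""

    def __init__(self, score: int, tko: int) -> None:
        self.score = score
        self.tko = tko
        self.misses = 0
        self.up = True  # assumption: the heuristic starts out UP

    def report(self, success: bool) -> None:
        """Record the result of one run of the heuristic program."""
        if success:
            self.misses = 0
            self.up = True
        else:
            self.misses += 1
            if self.misses >= self.tko:
                self.up = False  # corresponds to the "DOWN (3/3)" log line


def current_score(heuristics) -> int:
    """Sum the scores of all heuristics that are currently UP."""
    return sum(h.score for h in heuristics if h.up)
```

With a maximum score of 11 this gives a required score of 6, matching the log above. For the eight score-1 heuristics in the quoted config it gives 4.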
> Here's my quorumd section from cluster.conf
>
> -----
> <quorumd interval="1" tko="5" votes="3" log_level="9"
> log_facility="local4" status_file="/tmp/qdisk_status"
> device="/dev/emcpowerq1">
> <heuristic program="ping 172.23.4.254 -c1 -t1" score="1"
> interval="2"/>
> <heuristic program="ping 130.246.8.13 -c1 -t3" score="1"
> interval="2"/>
> <heuristic program="ping 130.246.72.21 -c1 -t3" score="1"
> interval="2"/>
> <heuristic program="ping 172.23.5.120 -c1 -t1" score="1"
> interval="2"/>
> <heuristic program="ping 172.23.6.229 -c1 -t1" score="1"
> interval="2"/>
> <heuristic program="ping 172.23.7.34 -c1 -t1" score="1"
> interval="2"/>
> <heuristic program="ping 172.23.7.35 -c1 -t1" score="1"
> interval="2"/>
> <heuristic program="ping 172.23.6.233 -c1 -t1" score="1"
> interval="2"/>
> </quorumd>
> -----
> If you need any more information, I'm happy to provide it.
Hmm, try adding tko="3" to each of your ping heuristics, like this:
<heuristic program="ping 172.23.6.233 -c1 -t1" score="1"
interval="2" tko="3"/>
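If editing eight heuristic lines by hand is tedious, the same change can be scripted. This is a minimal sketch using Python's standard xml.etree.ElementTree; the function name add_heuristic_tko is hypothetical. It only sets tko on heuristic elements that do not already have one, and it works on an XML string rather than touching /etc/cluster/cluster.conf directly.

```python
import xml.etree.ElementTree as ET


def add_heuristic_tko(conf_xml: str, tko: int = 3) -> str:
    """Return conf_xml with tko set on every <heuristic> that lacks it."""
    root = ET.fromstring(conf_xml)
    for heuristic in root.iter("heuristic"):
        # Leave an explicitly configured per-heuristic tko untouched.
        if "tko" not in heuristic.attrib:
            heuristic.set("tko", str(tko))
    return ET.tostring(root, encoding="unicode")
```

The quorumd-level tko is left alone, since only heuristic elements are modified. Review the output before putting it into your real cluster.conf.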
-- Lon
--
Lon Hohberger - Software Engineer - Red Hat, Inc.