[Cluster-devel] qdiskd hangs cluster activity
Fabio Massimo Di Nitto
fabbione at ubuntu.com
Tue Oct 30 13:39:02 UTC 2007
The culprit is a patch missing from the openais 0.82 release.
https://bugzilla.redhat.com/show_bug.cgi?id=314641
Fabio
Fabio Massimo Di Nitto wrote:
> Hi Lon,
>
> I found a very interesting bug that manages to hang the entire cluster.
>
> The setup is a 3-node cluster with nothing fancy running at all (I will be able
> to show it to you next Wed, as it lives on my laptop ;)).
>
> <quorumd label="test1">
> <heuristic program="ping 192.168.1.1 -c1 -t1" score="1" interval="2" tko="3"/>
> </quorumd>
>
> test1 is a 1GB AoE device shared between the 3 nodes.
>
> The cluster starts without problems. After firing up qdiskd -f -d:
>
> qdiskd -f -d
> [12681] debug: Loading configuration information
> [12681] debug: Heuristic: 'ping 192.168.1.1 -c1 -t1' score=1 interval=2 tko=3
> [12681] debug: 1 heuristics loaded
> [12681] debug: Quorum Daemon: 1 heuristics, 1 interval, 10 tko, 0 votes
> open_partition: seek: Invalid argument
> qdisk_validate: open of /dev/sda2 for RDWR failed: Illegal seek
> qdisk_verify: Illegal seek
> [12681] info: Quorum Partition: /dev/etherd/e1.0 Label: test1
> [12681] info: Quorum Daemon Initializing
> [12682] info: Heuristic: 'ping 192.168.1.1 -c1 -t1' UP
> [12681] debug: Node 2 is UP
> [12681] debug: Node 3 is UP
> [12681] info: Initial score 1/1
> [12681] info: Initialization complete
> [12681] notice: Score sufficient for master operation (1/1; required=1); upgrading
> [12681] debug: Making bid for master
> [12681] info: Assuming master role
>
> A few seconds after the node assumes the master role, it hangs. The others
> follow within seconds.
>
> aisexec is stalled in recv(..
>
> There is no way to recover; a kill -9 all over is required.
>
> Attached is a qdiskd strace from all 3 nodes, started at exactly the same
> time.
>
> Fabio
>
> PS: I wonder if we are hitting this:
>
> From qdisk/disk.c:
>
> /*
> * All IOs must be of size which is a multiple of 512. Here we
> * just add in enough extra to accommodate.
> * XXX - if the on-disk offsets don't provide enough room we're cooked!
> */
>
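For reference, the padding that the quoted comment describes can be sketched roughly as below. This is only an illustration, not the code from qdisk/disk.c: the helper names round_up_to_sector() and padded_io_fits() and the "slot size" parameter are hypothetical, and the 512-byte unit is taken from the comment itself.

```c
#include <stddef.h>

/* Unit named in the quoted comment: every I/O must be a multiple of 512. */
#define SECTOR_SIZE ((size_t)512)

/*
 * Hypothetical helper (not from qdisk/disk.c): round len up to the next
 * multiple of SECTOR_SIZE. A length that is already aligned is unchanged.
 * Assumes len is small enough that len + 511 does not overflow size_t.
 */
size_t round_up_to_sector(size_t len)
{
    return (len + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}

/*
 * Hypothetical check for the "XXX" case in the comment: returns 1 if the
 * padded I/O still fits in the space reserved on disk, 0 otherwise.
 */
int padded_io_fits(size_t len, size_t slot_size)
{
    return round_up_to_sector(len) <= slot_size;
}
```

With this sketch, a 513-byte record is padded to 1024 bytes, so it would no longer fit in a region that only reserves 512 bytes on disk, which is the situation the XXX note warns about.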
--
I'm going to make him an offer he can't refuse.