[Cluster-devel] qdiskd hangs cluster activity

Fabio Massimo Di Nitto fabbione at ubuntu.com
Tue Oct 30 13:39:02 UTC 2007


The culprit is a patch missing from the openais 0.82 release.

https://bugzilla.redhat.com/show_bug.cgi?id=314641

Fabio

Fabio Massimo Di Nitto wrote:
> Hi Lon,
> 
> I found a very interesting bug that manages to hang the entire cluster.
> 
> The setup is a 3-node cluster with nothing fancy running at all (I will be able
> to show it to you next Wed, as it lives on my laptop ;)).
> 
> <quorumd label="test1">
>  <heuristic program="ping 192.168.1.1 -c1 -t1" score="1" interval="2" tko="3"/>
> </quorumd>
> 
> test1 is a 1GB AoE device shared between the 3 nodes.
> 
> The cluster starts without problems. After firing up qdiskd -f -d:
> 
> qdiskd -f -d
> [12681] debug: Loading configuration information
> [12681] debug: Heuristic: 'ping 192.168.1.1 -c1 -t1' score=1 interval=2 tko=3
> [12681] debug: 1 heuristics loaded
> [12681] debug: Quorum Daemon: 1 heuristics, 1 interval, 10 tko, 0 votes
> open_partition: seek: Invalid argument
> qdisk_validate: open of /dev/sda2 for RDWR failed: Illegal seek
> qdisk_verify: Illegal seek
> [12681] info: Quorum Partition: /dev/etherd/e1.0 Label: test1
> [12681] info: Quorum Daemon Initializing
> [12682] info: Heuristic: 'ping 192.168.1.1 -c1 -t1' UP
> [12681] debug: Node 2 is UP
> [12681] debug: Node 3 is UP
> [12681] info: Initial score 1/1
> [12681] info: Initialization complete
> [12681] notice: Score sufficient for master operation (1/1; required=1); upgrading
> [12681] debug: Making bid for master
> [12681] info: Assuming master role
> 
> A few seconds after the node assumes the master role, it hangs. The others
> follow within a matter of seconds.
> 
> aisexec is stalled in recv(..
> 
> No way to recover. kill -9 all over is required.
> 
> Attached is a qdiskd strace from all 3 nodes, started at the exact same
> time.
> 
> Fabio
> 
> PS: I wonder if we are hitting this:
> 
> from qdisk/disk.c:
> 
>         /*
>          * All IOs must be of size which is a multiple of 512.  Here we
>          * just add in enough extra to accommodate.
>          * XXX - if the on-disk offsets don't provide enough room we're cooked!
>          */
> 

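For anyone reading the quoted comment without the source at hand, the padding it describes amounts to rounding each I/O length up to the next multiple of 512. Below is a minimal sketch of that idea only. It is not the code from qdisk/disk.c: the names SECTOR_SIZE and round_up_to_sector are hypothetical, and the 512 is taken straight from the comment.

```c
#include <stddef.h>

/* Assumed I/O granularity, taken from the quoted comment ("multiple of 512").
 * The macro name is hypothetical and not from qdisk/disk.c. */
#define SECTOR_SIZE ((size_t)512)

/*
 * Round len up to the next multiple of SECTOR_SIZE.
 * Hypothetical helper for illustration; a length that is already a
 * multiple of SECTOR_SIZE (including 0) is returned unchanged.
 */
static size_t round_up_to_sector(size_t len)
{
    size_t rem = len % SECTOR_SIZE;

    return rem ? len + (SECTOR_SIZE - rem) : len;
}
```

With this helper, a 513-byte record would occupy 1024 bytes on disk. That is the "enough room" concern in the XXX note: if the on-disk offsets were laid out for the unpadded size, the padded write would spill into whatever follows.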

-- 
I'm going to make him an offer he can't refuse.