[Linux-cluster] Hang on start fence_tool join with qdisk

Eugene Melnichuk doc at mts.com.ua
Wed Jul 25 15:00:05 UTC 2007


Hi,

Without qdisk (with two_node=1) the cluster works fine, but I need qdisk for a 
later transition from 2 -> 3 nodes without a cluster restart.
I have now built a test cluster on the same hardware and can reproduce the 
problem there.
The dlm messages appear only from time to time; often I get no messages at all 
after "Quorum formed, starting".
If I set clean_start=1, fencing starts fine, but I still hang on access 
to the cman_admin socket.
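
For context, a minimal sketch of the two cluster.conf variants being compared. This is not the actual configuration from the affected cluster: the label "testqdisk" and the interval, tko and post_join_delay values are placeholder assumptions, and only the relevant elements are shown.

```xml
<!-- Variant A: plain two-node mode, no quorum disk (works fine) -->
<cman two_node="1" expected_votes="1"/>

<!-- Variant B: two nodes plus a quorum disk (hangs at "Starting fencing").
     two_node is switched off; expected_votes = 2 node votes + 1 qdisk vote. -->
<cman two_node="0" expected_votes="3"/>
<!-- label, interval and tko are hypothetical values for illustration -->
<quorumd interval="1" tko="10" votes="1" label="testqdisk"/>
<!-- clean_start="1" makes fenced skip startup fencing; post_join_delay is a placeholder -->
<fence_daemon clean_start="1" post_join_delay="3"/>
```

With variant B a third node can be added later by raising expected_votes, which is why the quorum disk is needed for the 2 -> 3 node transition.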

So, if you have any suggestions or new development packages for testing, I 
can install them and gather debug information.
I could open an official ticket for this, but since I have installed unsupported 
packages that may be the wrong way :)
But without your unofficial packages I still have a non-working qdisk and 
ccs_tool update ...

PS: I have already tried installing the new kernel from 
http://people.redhat.com/dzickus/el5/36.el5/x86_64/   (which contains many 
fixes in DLM), but without luck...


--
Eugene Melnichuk
Leading Engineer
email: doc at umc.ua
mob: +380503304043
pbx: +380501105731
CJSC Ukrainian Mobile Communications
49/2 Pobedy ave., room 4.26, 03680, Kyiv, Ukraine



Lon Hohberger wrote:
> On Mon, Jul 23, 2007 at 04:37:42PM +0300, Eugene Melnichuk wrote:
>   
>> I have a problem with my cluster running on RHEL5 + updates from  
>> http://people.redhat.com/lhh/rhel5-test/  
>>
>> I have a 2-node cluster with a shared quorum disk; qdiskd is running, but 
>> when I start the cman service it hangs on "Starting fencing".
>> In my logs I have messages about regained quorum:
>>
>> Jul 21 15:50:18 arf-web1 qdiskd[7326]: <info> Assuming master role
>> Jul 21 15:50:19 arf-web1 ccsd[8188]: Cluster is not quorate.  Refusing connection.
>> Jul 21 15:50:19 arf-web1 ccsd[8188]: Error while processing connect: Connection refused
>> Jul 21 15:50:19 arf-web1 openais[8200]: [CMAN ] quorum regained, resuming activity
>> Jul 21 15:50:20 arf-web1 clurgmgrd[7746]: <notice> Quorum formed, starting
>> Jul 21 15:50:20 arf-web1 kernel: dlm: no local IP address has been set
>> Jul 21 15:50:20 arf-web1 kernel: dlm: cannot start dlm lowcomms -12
>>     
>
> The cause here is probably the problem.  Does this happen without qdisk?
> I don't understand why qdisk would cause this.
>
>   
