[Linux-cluster] Hang on start fence_tool join with qdisk
Eugene Melnichuk
doc at mts.com.ua
Wed Jul 25 15:00:05 UTC 2007
Hi,
Without qdisk (with two_node=1) the cluster works fine. But I need qdisk for
a later transition from 2 to 3 nodes without a cluster restart.
I have now built a test cluster on the same hardware and reproduced the
problem there.
The dlm messages appear only from time to time; often there are no messages
at all after "Quorum formed, starting".
If I set clean_start=1, fencing starts fine, but I still hang on access
to the cman_admin socket.
So, if you have any suggestions or new development packages for testing, I
can install them and gather debug information.
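For gathering that debug information, a minimal Python sketch (an illustration only, not part of any cluster package) could pull the cluster-related lines out of syslog and list the dlm kernel messages that follow "Quorum formed". The function names and the list of daemons are assumptions, taken from the log excerpt quoted below; the sample lines are the same excerpt with the mail wrapping undone.

```python
# Hypothetical helper for collecting cluster-related syslog lines.
# The daemon names come from the quoted log excerpt; nothing here is
# an API of cman, qdiskd or dlm.
DAEMONS = ("qdiskd", "ccsd", "openais", "clurgmgrd", "kernel: dlm")

SAMPLE = [
    "Jul 21 15:50:18 arf-web1 qdiskd[7326]: <info> Assuming master role",
    "Jul 21 15:50:19 arf-web1 openais[8200]: [CMAN ] quorum regained, resuming activity",
    "Jul 21 15:50:20 arf-web1 clurgmgrd[7746]: <notice> Quorum formed, starting",
    "Jul 21 15:50:20 arf-web1 kernel: dlm: no local IP address has been set",
    "Jul 21 15:50:20 arf-web1 kernel: dlm: cannot start dlm lowcomms -12",
]


def cluster_lines(lines):
    """Return the lines that mention any of the cluster daemons."""
    return [ln.rstrip("\n") for ln in lines if any(d in ln for d in DAEMONS)]


def dlm_errors_after_quorum(lines):
    """Return dlm kernel messages logged after the first 'Quorum formed' line."""
    seen_quorum = False
    hits = []
    for ln in lines:
        if "Quorum formed" in ln:
            seen_quorum = True
        elif seen_quorum and "kernel: dlm:" in ln:
            hits.append(ln.rstrip("\n"))
    return hits


if __name__ == "__main__":
    for line in dlm_errors_after_quorum(SAMPLE):
        print(line)
```

To run it against a real log, pass an open file object instead of SAMPLE; the log path (for example /var/log/messages) depends on the syslog configuration and is an assumption here.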
I can open an official ticket for this, but since I installed unsupported
packages that may be the wrong way :)
But without your unofficial packages I still have a non-working qdisk and
ccs_tool update ...
PS: I already tried installing the new kernel from
http://people.redhat.com/dzickus/el5/36.el5/x86_64/ (which contains many
DLM fixes), but without luck...
--
Eugene Melnichuk
Leading Engineer
email: doc at umc.ua
mob: +380503304043
pbx: +380501105731
CJSC Ukrainian Mobile Communications
49/2 Pobedy ave., room 4.26, 03680, Kyiv, Ukraine
Lon Hohberger wrote:
> On Mon, Jul 23, 2007 at 04:37:42PM +0300, Eugene Melnichuk wrote:
>
>> I have a problem with my cluster running on RHEL5 + updates from
>> http://people.redhat.com/lhh/rhel5-test/
>>
>> I have a 2-node cluster with a shared quorum disk; qdiskd is running, but
>> when I start the cman service it hangs at "Starting fencing".
>> In my logs I have messages about regained quorum:
>>
>> Jul 21 15:50:18 arf-web1 qdiskd[7326]: <info> Assuming master role
>> Jul 21 15:50:19 arf-web1 ccsd[8188]: Cluster is not quorate. Refusing connection.
>> Jul 21 15:50:19 arf-web1 ccsd[8188]: Error while processing connect: Connection refused
>> Jul 21 15:50:19 arf-web1 openais[8200]: [CMAN ] quorum regained, resuming activity
>> Jul 21 15:50:20 arf-web1 clurgmgrd[7746]: <notice> Quorum formed, starting
>> Jul 21 15:50:20 arf-web1 kernel: dlm: no local IP address has been set
>> Jul 21 15:50:20 arf-web1 kernel: dlm: cannot start dlm lowcomms -12
>>
>
> The cause here is probably the problem. Does this happen without qdisk?
> I don't understand why qdisk would cause this.
>
>