[Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1
Harri.Paivaniemi at tietoenator.com
Fri Apr 18 04:23:29 UTC 2008
Oh my dear Alex,
It really does work that way! I just can't believe it. You are one hell of a genius.
I had no clue that it could be something this simple. It really works. I feel stupid.
So, I was really going crazy with this cluster ver 5 yesterday, but now it seems that both of my problems are solved:
1. Unable to bring just one node up in a 2-node cluster: it hangs in fencing / fence failed
Reason: cman was told (by RH) to be started before qdisk, and this is the wrong order.
Qdisk has to be started first in this situation, so that fence_tool does not wonder why the cluster is not quorate ;)
2. Restart of cluster daemons not successful
Reason: You have to wait for the "token timeout" to pass before starting again ;)
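For anyone who wants to script point 1, here is a minimal Python sketch of the start order, not a tested init script. It assumes the stock init script names qdiskd, cman and rgmanager, and that rgmanager goes last; the helper names start_commands and start_cluster are made up for illustration.

```python
import subprocess

# Assumed start order for a two-node cluster with a quorum disk, following
# the fix described above: qdiskd before cman. Placing rgmanager last is an
# assumption, not something stated in this thread.
START_ORDER = ("qdiskd", "cman", "rgmanager")


def start_commands(services=START_ORDER):
    """Return the `service <name> start` commands in the order they should run."""
    return [["service", name, "start"] for name in services]


def start_cluster(run=subprocess.check_call):
    """Run each start command in order.

    With the default runner, a failing command raises CalledProcessError,
    so later services are not started after a failure.
    """
    for cmd in start_commands():
        run(cmd)
```

Calling start_cluster() as root on the node would run the three commands in sequence; passing a different run callable lets you dry-run the order first.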
Great.
Thanks to all of you. RH support has been thinking about these problems for 3 weeks now without success.
-hjp
-----Original Message-----
From: linux-cluster-bounces at redhat.com on behalf of Alex Kompel
Sent: Fri 4/18/2008 4:10
To: linux clustering
Subject: Re: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1
2008/4/17 Harri Päiväniemi <harri.paivaniemi at tietoenator.com>:
>
> The 2nd problem that still exists is:
>
> When nodes a and b are running and everything is ok, I stop node b's
> cluster daemons. When I start node b again, this situation stays
> forever:
>
> ----------------
> node a - clustat
> Member Status: Quorate
>
> Member Name      ID   Status
> ------ ----      ---- ------
> areenasql1          1 Online, Local, rgmanager
> areenasql2          2 Offline
> /dev/sda            0 Online, Quorum Disk
>
> Service Name     Owner (Last)     State
> ------- ----     ----- ------     -----
> service:areena   areenasql1       started
>
> -------------------
>
> node b - clustat
>
> Member Status: Quorate
>
> Member Name      ID   Status
> ------ ----      ---- ------
> areenasql1          1 Online, rgmanager
> areenasql2          2 Online, Local, rgmanager
> /dev/sda            0 Offline, Quorum Disk
>
> Service Name     Owner (Last)     State
> ------- ----     ----- ------     -----
> service:areena   areenasql1       started
>
>
> So node b's quorum disk is offline, although the log says it registered ok
> and the heuristic is UP... node a sees node b as offline. If I reboot node b,
> it works ok and joins ok...
Now that you have mentioned it, I remember stumbling upon a similar
problem. It happens if you restart the cluster services before the
cluster realizes the node is dead. I guess it is a bug, since the node
is in some sort of limbo state at that moment: it reports itself as
part of the cluster while the cluster does not recognize it as a
member. If you wait 70 seconds ( cluster.conf: <totem token="70000"/>
) before starting the cluster services, then it will come up fine. The
reboot works for you because it takes longer than 70 sec (correct me if
I am wrong). So try stopping node b's cluster services, wait 70 secs, and
then start them back up.
-Alex
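
The stop, wait, start sequence above could be sketched in Python as follows. This is only an illustration: it assumes the token value in cluster.conf is in milliseconds (70000 matching the 70 seconds mentioned above), the function names and the 5-second safety margin are made up, and the stop and start callables are whatever you already use to stop and start the cluster services.

```python
import time
import xml.etree.ElementTree as ET


def token_timeout_seconds(cluster_conf_xml):
    """Return the totem token timeout from cluster.conf text, in seconds.

    Assumes the token attribute is given in milliseconds. Raises ValueError
    if no <totem token="..."/> element is present, rather than guessing a
    default.
    """
    root = ET.fromstring(cluster_conf_xml)
    totem = root.find(".//totem")
    if totem is None or totem.get("token") is None:
        raise ValueError("no <totem token=...> found in cluster.conf")
    return int(totem.get("token")) / 1000.0


def restart_after_token_timeout(cluster_conf_xml, stop, start,
                                sleep=time.sleep, margin=5.0):
    """Stop services, wait out the token timeout plus a margin, then start.

    stop and start are caller-supplied callables (hypothetical here);
    margin is an arbitrary safety allowance in seconds.
    """
    stop()
    sleep(token_timeout_seconds(cluster_conf_xml) + margin)
    start()
```

With the configuration quoted above, the wait would come out as 70 seconds plus the margin before the services are started again.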