[Linux-cluster] cluster instability

GS R gsrlinux at gmail.com
Tue Jun 17 03:31:14 UTC 2008


On 6/16/08, Shawn Hood <shawnlhood at gmail.com> wrote:
>
> All,
>
> This message was sent out to my office, so the voice may seem a bit
> odd.  We have a 4-node cluster running RHEL4U6 on Dell PowerEdge
> 1950s.  Fencing is done via DRAC.
>
> Using packages (from RHN):
>
> cman-kernel-smp-2.6.9-53.13
> cman-1.0.17-0.el4_6.5
> ccs-1.0.11-1.el4_6.1
> fence-1.32.50-2.el4_6.1
> lvm2-cluster-2.02.27-2.el4_6.2
> dlm-kernel-smp-2.6.9-52.9
> dlm-kernheaders-2.6.9-52.9
>
> Our cluster became unstable on Saturday morning.  Apparently
> hugin stopped sending out heartbeats, causing it to become fenced.  hugin
> was under heavy load (~10) at the time:
>
> 03:30:02 AM         6       453      9.35     10.29     10.51
> 03:40:01 AM        12       465     11.02     11.00     10.75
> 03:50:02 AM         3       446      9.75     10.80     10.86
> 04:00:01 AM         5       430      9.23      9.47     10.07
> Average:            7       455     10.19     10.32     10.28
>
> 04:09:35 AM       LINUX RESTART
>
> As you can see, hugin was fenced at 4:09.  The other nodes then began
> logging the following:
>
> Jun 14 04:08:06 munin kernel: CMAN: Initiating transition, generation 58
> Jun 14 04:08:21 munin kernel: CMAN: Initiating transition, generation 59
> Jun 14 04:08:36 munin kernel: CMAN: Initiating transition, generation 60
> Jun 14 04:08:51 munin kernel: CMAN: Initiating transition, generation 61
> Jun 14 04:09:06 munin kernel: CMAN: too many transition restarts - will die
> Jun 14 04:09:06 munin kernel: CMAN: we are leaving the cluster. Inconsistent cluster view


I suspect this is a network issue, even though utilization was low when
this was logged. It looks like the node was not able to receive cluster
messages.
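
One way to look at the timing is to pull the transition messages out of
syslog and print the gap between restarts. Below is a rough Python sketch,
not a tested tool: the function name is made up for illustration, the
sample lines are copied from the log quoted above, and the year is an
assumption because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Sample lines copied from the log quoted above.
LOG = """\
Jun 14 04:08:06 munin kernel: CMAN: Initiating transition, generation 58
Jun 14 04:08:21 munin kernel: CMAN: Initiating transition, generation 59
Jun 14 04:08:36 munin kernel: CMAN: Initiating transition, generation 60
Jun 14 04:08:51 munin kernel: CMAN: Initiating transition, generation 61
Jun 14 04:09:06 munin kernel: CMAN: too many transition restarts - will die
"""

# Matches only the "Initiating transition" lines: timestamp, host, generation.
PATTERN = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) kernel: "
    r"CMAN: Initiating transition, generation (\d+)"
)


def transition_intervals(text, year=2008):
    """Return (generation, seconds since previous transition) pairs.

    The year is an assumption: syslog lines do not include it.
    """
    events = []
    for line in text.splitlines():
        match = PATTERN.match(line)
        if not match:
            continue  # skip lines that are not transition starts
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        events.append((stamp, int(match.group(3))))
    # Pair each event with the one before it and take the time difference.
    return [
        (gen, (stamp - prev_stamp).total_seconds())
        for (prev_stamp, _), (stamp, gen) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for generation, gap in transition_intervals(LOG):
        print(f"generation {generation}: {gap:.0f}s after previous transition")
```

On the sample above every restart comes 15 seconds after the previous one.
To run it against a real node, read /var/log/messages into a string and
pass that in instead of LOG, then compare the output across the nodes.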

> After so many 'initiating transition' messages, the cluster died.  Our
> network utilization was very low at the time.
>
> Any ideas?
>
> Shawn

Thanks
Gowrishankar Rajaiyan | Senior Quality Analyst