[Linux-cluster] Corosync memory problem
sdake at redhat.com
Tue Dec 27 16:00:25 UTC 2011
On 12/21/2011 11:04 AM, Chris Alexander wrote:
> An update in case anyone ever runs into something like this - we had
> corosync-notify running on the servers and once we removed that and
> restarted the cluster stack, corosync seemed to return to normal.
> Additionally, according to the corosync mailing list, the cluster 1.2.3
> version is basically very similar to (if not the same as) the 1.4 that
> they currently have released; someone's been backporting.
The upstream 1.2.3 version hasn't had any backports applied to it. Only
the RHEL 1.2.3-z versions have been backported.
> On 19 December 2011 19:01, Chris Alexander <chris.alexander at kusiri.com> wrote:
> Hi all,
> You may remember our recent issue; I believe this is being worsened,
> if not caused, by another problem we have encountered.
> Every few days our nodes are (non-simultaneously) being fenced due
> to corosync taking up vast amounts of memory (i.e. 100% of the box).
> Please see a sample log message (we have several just like this),
> which occurs when this happens. Note that it is not always corosync
> being killed - but it is clearly corosync eating all the memory (see
> top output from three servers at various times since their last
> reboot).
> The corosync version is 1.2.3:
> [g at cluster1 ~]$ corosync -v
> Corosync Cluster Engine, version '1.2.3'
> Copyright (c) 2006-2009 Red Hat, Inc.
> We had a bit of a dig around and there are a significant number of
> bugfix updates which address various segfaults, crashes, memory
> leaks etc. in this minor as well as subsequent minor versions.  
> We're trialling the Fedora 14 (fc14) RPMs for corosync and
> corosynclib (v1.4.2) to see if it fixes the particular issue we are
> seeing (i.e. whether or not the memory keeps spiralling way out of
> control).
> Has anyone else seen an issue like this, and is there any known way
> to debug or fix it? If we can assist debugging by providing further
> information, please specify what this is (and, if non-obvious, how
> to get it).
> Thanks again for your help
>  http://pastebin.com/CbyERaRT
>  http://pastebin.com/uk9ZGL7H
>  http://pastebin.com/H4w5Zg46
>  http://pastebin.com/KPZxL6UB
>  http://rhn.redhat.com/errata/RHBA-2011-1361.html
>  http://rhn.redhat.com/errata/RHBA-2011-1515.html