[Linux-cluster] Corosync memory problem

Chris Alexander chris.alexander at kusiri.com
Mon Dec 19 19:01:05 UTC 2011


Hi all,

You may remember our recent issue. I believe it is being worsened, if not
caused, by another problem we have encountered.

Every few days our nodes are being fenced (not simultaneously) because
corosync takes up vast amounts of memory (i.e. 100% of the box). A sample
of the log message that appears when this happens is at [1]; we have
several just like it. Note that corosync is not always the process that
gets killed, but it is clearly corosync eating all the memory (see the top
output from three servers at various times since their last reboot,
[2] [3] [4]).

The corosync version is 1.2.3:
[g at cluster1 ~]$ corosync -v
Corosync Cluster Engine, version '1.2.3'
Copyright (c) 2006-2009 Red Hat, Inc.

We had a bit of a dig around and there are a significant number of bugfix
updates which address various segfaults, crashes, memory leaks etc. in this
minor version as well as in subsequent minor versions. [5] [6]

We're trialling the Fedora 14 (fc14) RPMs for corosync and corosynclib
(v1.4.2) to see if they fix the particular issue we are seeing (i.e.
whether or not the memory keeps spiralling way out of control).
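
For anyone who wants to compare numbers, here is a minimal sketch of how the
memory growth could be logged over time while the trial runs. It is an
illustration rather than something we already have in place: it assumes a
Linux /proc filesystem and reads the Name and VmRSS fields from
/proc/<pid>/status. The function names (parse_status, sample_rss, monitor)
and the default interval are made up for this example.

```python
import os
import time


def parse_status(status_text):
    """Return (name, rss_kb) parsed from the text of /proc/<pid>/status.

    rss_kb is None when there is no VmRSS line (e.g. kernel threads).
    """
    name, rss_kb = None, None
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            # The value looks like "   123456 kB"; keep the number only.
            rss_kb = int(value.split()[0])
    return name, rss_kb


def sample_rss(name="corosync", proc_root="/proc"):
    """Return {pid: rss_kb} for every process whose Name field equals name."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                proc_name, rss_kb = parse_status(f.read())
        except (IOError, OSError, ValueError):
            continue  # process exited, or its status file was unreadable
        if proc_name == name and rss_kb is not None:
            result[int(entry)] = rss_kb
    return result


def monitor(name="corosync", interval_s=60, samples=10):
    """Print one timestamped line per matching process, `samples` times."""
    for _ in range(samples):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for pid, rss_kb in sorted(sample_rss(name).items()):
            print("%s pid=%d rss_kb=%d" % (stamp, pid, rss_kb))
        time.sleep(interval_s)
```

Calling monitor("corosync", 60, 1440) would log roughly a day of one-minute
samples; redirecting the output to a file on each node should make it easy
to see whether the resident size climbs steadily or jumps at particular
times.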

Has anyone else seen an issue like this, and is there any known way to
debug or fix it? If we can assist debugging by providing further
information, please specify what this is (and, if non-obvious, how to get
it).

Thanks again for your help

Chris

[1] http://pastebin.com/CbyERaRT
[2] http://pastebin.com/uk9ZGL7H
[3] http://pastebin.com/H4w5Zg46
[4] http://pastebin.com/KPZxL6UB
[5] http://rhn.redhat.com/errata/RHBA-2011-1361.html
[6] http://rhn.redhat.com/errata/RHBA-2011-1515.html