[Linux-cluster] problems with gfs and openais cluster.

Sat Aug 15 06:11:52 UTC 2009

Folks,
    I have little experience with RH and even less with RH clusters,
but I kind of know my way around. I am concerned about an RH cluster
where I am getting nonstop errors like these:

Aug 15 05:45:42 XXXXX   openais[5283]: [MAIN ] Received message has invalid digest... ignoring.
Aug 15 05:45:42 XXXXX   openais[5283]: [MAIN ] Invalid packet data

We are using Red Hat's GFS, and I believe the error should be fixed by
following this post; however, I have a couple of questions:

http://www.mail-archive.com/linux-cluster@redhat.com/msg06172.html

a) here is my /etc/ais/openais.conf file (same config for both hosts
in the cluster)...
======================================================
totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.2.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}
logging {
        debug: off
        timestamp: on
}
amf {
        mode: disabled
}
===================================================
but here is a list of my IP addresses:

============================
ifconfig -a | grep inet | grep Mask
          inet addr:10.205.226.91  Bcast:10.205.239.255  Mask:255.255.240.0
          inet addr:10.205.39.26  Bcast:10.205.39.255  Mask:255.255.255.0
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0 (virbr0)
============================
First of all, I believe the bindnetaddr parameter should have been
set to 192.168.122.0, but I am not sure.
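
One way to see which bindnetaddr values are even possible on this host is
to compute the network address (IP AND netmask) of every interface listed
above and compare it with the configured 192.168.2.0. A minimal sketch in
Python; the language and the helper names (network_address,
candidate_bindnetaddrs) are illustrative choices, not anything shipped with
openais, and it assumes bindnetaddr is meant to be the network address of
the interface that should carry the totem traffic.

```python
import ipaddress

# (address, netmask) pairs copied from the ifconfig output above
INTERFACES = [
    ("10.205.226.91", "255.255.240.0"),
    ("10.205.39.26", "255.255.255.0"),
    ("127.0.0.1", "255.0.0.0"),
    ("192.168.122.1", "255.255.255.0"),  # virbr0
]

CONFIGURED_BINDNETADDR = "192.168.2.0"  # value from openais.conf


def network_address(addr, mask):
    """Return the network address (addr AND mask) as a dotted string."""
    iface = ipaddress.ip_interface(f"{addr}/{mask}")
    return str(iface.network.network_address)


def candidate_bindnetaddrs(interfaces):
    """Network address of every interface, in the order given."""
    return [network_address(addr, mask) for addr, mask in interfaces]


if __name__ == "__main__":
    candidates = candidate_bindnetaddrs(INTERFACES)
    for (addr, mask), net in zip(INTERFACES, candidates):
        print(f"{addr:15} mask {mask:15} -> network {net}")
    # Does the configured value correspond to any local interface?
    print("configured value matches an interface:",
          CONFIGURED_BINDNETADDR in candidates)
```

With the addresses above, none of the computed networks equals 192.168.2.0,
which at least confirms that the configured value does not correspond to
any interface on this host.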

Next, there is a 2nd cluster with exactly the same configuration, apart
from the logging/debug settings, but with different IP addresses.

Here it is:
========================================
totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.2.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}
logging {
        debug: on
        logfile: /var/log/openais.log
        timestamp: on
}

amf {
        mode: disabled
}
===================================================
and here are the IP addresses found in the server:
============================================
#ifconfig -a | grep inet | grep Mask
          inet addr:10.205.26.38  Bcast:10.205.26.255  Mask:255.255.255.0
          inet addr:10.205.39.28  Bcast:10.205.39.255  Mask:255.255.255.0
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0 (virbr0)
============================================

Given that 10.205.26.38 and 10.205.226.91 are on different VLANs, I
doubt those interfaces are the problem. It could, however, be the
adapters in the 10.205.39.x address range, since they are in the same
VLAN: because the interface binding is incorrect, the cluster may have
picked up that interface and clashed with the existing server. Is that
correct?
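
The "same VLAN" reasoning can be checked mechanically by intersecting the
networks of one host from each cluster. Both configs use the same mcastaddr
(226.94.1.1) and mcastport (5405), so any network the two clusters really
share is a place where their totem traffic could meet. A minimal Python
sketch; the names HOST_A, HOST_B, networks and shared_networks are made up
for illustration, and loopback is left out on purpose.

```python
import ipaddress

# (address, netmask) pairs from the two ifconfig listings, loopback omitted
HOST_A = [
    ("10.205.226.91", "255.255.240.0"),
    ("10.205.39.26", "255.255.255.0"),
    ("192.168.122.1", "255.255.255.0"),  # virbr0
]
HOST_B = [
    ("10.205.26.38", "255.255.255.0"),
    ("10.205.39.28", "255.255.255.0"),
    ("192.168.122.1", "255.255.255.0"),  # virbr0
]


def networks(interfaces):
    """Set of IPv4Network objects, one per (address, netmask) pair."""
    return {ipaddress.ip_interface(f"{a}/{m}").network for a, m in interfaces}


def shared_networks(host_a, host_b):
    """Networks that appear, with identical prefix, on both hosts."""
    return sorted(networks(host_a) & networks(host_b))


if __name__ == "__main__":
    for net in shared_networks(HOST_A, HOST_B):
        print(net)
```

Identical numbering does not prove a shared wire: 192.168.122.0/24 shows up
on both hosts only because it is the default libvirt bridge (virbr0), which
is normally local to each machine. That leaves 10.205.39.0/24 as the one
network where the two clusters could plausibly see each other's packets.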

2nd question: I see the openais service is running even though
chkconfig --list openais shows it should be off in every runlevel:

chkconfig --list openais
openais         0:off   1:off   2:off   3:off   4:off   5:off   6:off

So who is starting it?
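
A service that is off in every runlevel can still be launched indirectly by
another init script, so one way to narrow this down is to list what *is*
enabled in the current runlevel and inspect those scripts. A minimal Python
sketch that parses chkconfig --list style text; the function names are
invented for illustration, the openais line is the real one from above, and
the cman line is a HYPOTHETICAL sample (which script actually starts the
daemon is not established here).

```python
def parse_chkconfig(text):
    """Map service name -> set of runlevels in which it is 'on'."""
    services = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything not shaped like "name 0:off 1:off ..."
        if len(fields) < 2 or ":" not in fields[1]:
            continue
        on_levels = set()
        for field in fields[1:]:
            level, _, state = field.partition(":")
            if state == "on":
                on_levels.add(int(level))
        services[fields[0]] = on_levels
    return services


def enabled_in(services, runlevel):
    """Sorted names of services that are 'on' in the given runlevel."""
    return sorted(name for name, on in services.items() if runlevel in on)


# First line: real output from this host. Second line: hypothetical sample.
SAMPLE = """\
openais         0:off   1:off   2:off   3:off   4:off   5:off   6:off
cman            0:off   1:off   2:on    3:on    4:on    5:on    6:off
"""

if __name__ == "__main__":
    parsed = parse_chkconfig(SAMPLE)
    print("openais enabled in:", sorted(parsed["openais"]))
    print("enabled in runlevel 3:", enabled_in(parsed, 3))
```

Feeding it the full chkconfig --list output from the affected host would
give the list of init scripts worth reading for a line that starts the
openais daemon.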

thank you,
enrique sanchez.

-- 
Enrique Sanchez Vela
------------------------------------------
"What you have been obliged to discover
by yourself leaves a path in your mind
which you can use again when the need
arises."    --G. C. Lichtenberg
http://themathcircle.org/