[Linux-cluster] Cluster environment issue
Kaloyan Kovachev
kkovachev at varna.net
Fri Jun 3 08:48:31 UTC 2011
Hi,
On Thu, 2 Jun 2011 08:37:07 -0700 (PDT), Srija <swap_project at yahoo.com>
wrote:
> Thank you so much for your reply again.
>
> --- On Tue, 5/31/11, Kaloyan Kovachev <kkovachev at varna.net> wrote:
>
>
> >
>> If it is a switch restart you will have in your logs the
>> interface going
>> down/up, but more problematic is to find a short drop of
>> the multicast
>
> I checked all the nodes and did not find anything about the interface,
> but all the nodes are reporting that server19 (node 12) / server18
> (node 11) is the problematic one. Here are the logs from three nodes
> (out of 16 nodes):
>
> May 24 18:04:59 server7 openais[6113]: [TOTEM] entering GATHER state
> from 12.
> May 24 18:05:01 server7 crond[5068]: (root) CMD (
> /opt/hp/hp-health/bin/check-for-restart-requests)
> May 24 18:05:19 server7 openais[6113]: [TOTEM] entering GATHER state
> from 11.
>
> May 24 18:04:59 server1 openais[6148]: [TOTEM] entering GATHER state
> from 12.
> May 24 18:05:01 server1 crond[2275]: (root) CMD (
> /opt/hp/hp-health/bin/check-for-restart-requests)
> May 24 18:05:19 server1 openais[6148]: [TOTEM] entering GATHER state
> from 11.
>
> May 24 18:04:59 server8 openais[6279]: [TOTEM] entering GATHER state
> from 12.
> May 24 18:05:01 server8 crond[11125]: (root) CMD (
> /opt/hp/hp-health/bin/check-for-restart-requests)
> May 24 18:05:19 server8 openais[6279]: [TOTEM] entering GATHER state
> from 11.
>
>
> Here is some lines from node12 , at the same time
> ___________________________________________________
>
>
> May 24 18:04:59 server19 openais[5950]: [TOTEM] The token was lost in the OPERATIONAL state.
> May 24 18:04:59 server19 openais[5950]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
> May 24 18:04:59 server19 openais[5950]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
> May 24 18:04:59 server19 openais[5950]: [TOTEM] entering GATHER state from 2.
> May 24 18:05:19 server19 openais[5950]: [TOTEM] entering GATHER state from 11.
> May 24 18:05:20 server19 openais[5950]: [TOTEM] Saving state aru 39a8f high seq received 39a8f
> May 24 18:05:20 server19 openais[5950]: [TOTEM] Storing new sequence id for ring 2af0
> May 24 18:05:20 server19 openais[5950]: [TOTEM] entering COMMIT state.
> May 24 18:05:20 server19 openais[5950]: [TOTEM] entering RECOVERY state.
>
>
> Here is few lines on node11 ie server18
> ------------------------------------------
>
> May 24 18:04:48 server18
> May 24 18:10:14 server18 syslog-ng[5619]: syslog-ng starting up;
> version='2.0.10'
> May 24 18:10:14 server18 Bootdata ok (command line is ro
> root=/dev/vgroot_xen/lvroot rhgb quiet)
>
>
> So it seems that node11 rebooted just a few minutes after all the
> problems appeared in the logs of all the nodes.
>
>
> > You may ask the network people to check for STP changes and
>> double check
>> the multicast configuration and you may also try to use
>> broadcast instead
>> of multicast or use a dedicated switch.
>
> As for the dedicated switch, I don't think it is possible as per the
> network team. I asked about the STP changes; their answer is:
>
> "there are no stp changes for the private network as there are no
> redundant devices in the environment. the multicast configs is igmp
> snooping with Pim"
>
> I have talked to the network team about using broadcast instead of
> multicast; as per them, it can be set.
>
> Please comment on this...
>
To use broadcast (if the private addresses are in the same VLAN/subnet)
you just need to set it in cluster.conf, in the cman section, but I am
not sure if it can be done on a running cluster (without stopping or
breaking it).
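For reference, the change itself is small; a sketch of the relevant
cluster.conf fragment (only the broadcast attribute is the point here,
the rest of the file is left as it already is on your cluster):

```xml
<!-- cluster.conf fragment (sketch): tell cman to use broadcast
     instead of multicast for totem traffic -->
<cman broadcast="yes"/>
```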
> > your interface and multicast address)
>> ping -I ethX -b -L 239.x.x.x -c 1
>> and finally run this script until the cluster gets broken
>
> Yes, I have checked it; it is working fine now. I have also set up a
> cron job for this script on one node.
There is no need for cron if you haven't changed the script; that will
start several processes and your network will be overloaded!
The script was made to run on a console (or via screen) and it will exit
_only_ when multicast is lost.
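The behaviour described above (run in the foreground, exit only when
multicast stops answering) can be sketched roughly like this. The
interface and group defaults below are placeholders, not values from
this thread; substitute the ones from your cluster.conf:

```shell
#!/bin/sh
# Sketch of a foreground multicast watchdog. eth1 and 239.192.1.1 are
# placeholder defaults, not taken from this cluster.
IFACE="${1:-eth1}"
GROUP="${2:-239.192.1.1}"

# One probe of the multicast group: -b allows pinging a
# broadcast/multicast address, -L suppresses the local loopback copy,
# -W 2 gives up after 2 seconds without a reply.
mcast_alive() {
    ping -I "$IFACE" -b -L -c 1 -W 2 "$GROUP" >/dev/null 2>&1
}

# Loop while probes succeed; report and return non-zero on the first
# failure, so the watchdog only ever exits when multicast is lost.
watch_mcast() {
    while mcast_alive; do
        sleep 1
    done
    echo "multicast lost on $IFACE ($GROUP)"
    return 1
}

# Run interactively (console or screen), e.g.:
#   watch_mcast
```

Started from a console or inside screen, this stays quiet while the
group answers and prints a single line (with non-zero exit status) the
moment a probe times out, which is when you would correlate the
timestamp with the openais GATHER messages.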
>
> I have a few questions regarding the cluster configuration...
>
>
> - We are using CLVM in the cluster environment. As I understand, it
> is active-active. The environment is Xen: all the Xen hosts are in
> the cluster and each host has its guests. We are keeping the option
> to live migrate the guests from one host to another.
>
> - I was looking into the Red Hat knowledgebase document
> https://access.redhat.com/kb/docs/DOC-3068;
> as per that document, which do you think would be the better
> choice, CLVM or HA-LVM?
>
> Please advise.
I can't comment on this, sorry.
>
>
> Thanks and regards again.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster