[Linux-cluster] Nodes are getting Down while relocating service

Tue Jan 31 17:25:41 UTC 2012

Hi
Well, I'm just not fully sure what logging that was.

Anyway, to help clarify: if the cluster works OK up until you start
services, I'll investigate the services.

Can you post the output of
cman_tool services

when the cluster is running OK?
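
If it is easier, something like the following would grab that output plus a few related views in one file. This is only a rough sketch: the `collect` helper and the output path are made up for illustration, and I am assuming `cman_tool` and `clustat` are installed on the nodes. Tools that are missing are just noted and skipped.

```shell
#!/bin/sh
# Collect cluster state into one file that can be pasted into a reply.
# "collect" is a hypothetical helper, not part of the cluster suite.
collect() {
    out=$1; shift
    : > "$out"                        # start with an empty file
    for cmd in "$@"; do
        echo "### $cmd" >> "$out"
        bin=${cmd%% *}                # first word is the binary name
        if command -v "$bin" >/dev/null 2>&1; then
            $cmd >> "$out" 2>&1       # unquoted on purpose: split into words
        else
            echo "(not found: $bin)" >> "$out"
        fi
    done
}

# Output path is an assumption; change it to wherever suits you.
collect "${TMPDIR:-/tmp}/cluster-state.txt" \
    "cman_tool services" "cman_tool nodes" "cman_tool status" "clustat"
```

Run it on both nodes while the cluster is healthy, before touching any service, so there is a baseline to compare against.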

Cheers
Jose

> Hello Jose
>
> If you look at the cluster.conf you can see he isn't using drbd.
>
> Like I said before:
> ===================================================
> [network_problem]
> ===================================================
> Jan 28 15:50:05 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE
> Jan 28 15:50:05 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state from 6.
> Jan 28 15:50:05 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE
> Jan 28 15:50:05 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state from 6.
> Jan 28 15:50:06 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE
> Jan 28 15:50:06 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state from 6.
> Jan 28 15:50:06 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE
> Jan 28 15:50:06 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state from 6.
> Jan 28 15:50:07 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE
> Jan 28 15:50:07 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state from 6.
> Jan 28 15:50:07 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE
> Jan 28 15:50:07 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state from 6.
> Jan 28 15:50:08 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE
> Jan 28 15:50:08 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state from 6.
> Jan 28 15:50:08 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE
> Jan 28 15:50:08 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state from 6.
> Jan 28 15:50:09 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE
> Jan 28 15:50:09 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state from 6.
> Jan 28 15:50:09 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE
> Jan 28 15:50:09 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state from 6.
> Jan 28 15:50:10 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE
> Jan 28 15:50:10 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state from 6.
> Jan 28 15:50:10 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE
> Jan 28 15:50:10 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state from 6.
> Jan 28 15:50:11 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE
> Jan 28 15:50:11 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state from 6.
> Jan 28 15:50:11 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE
> Jan 28 15:50:11 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state from 6.
> ==================================================================
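
To see how long a burst like the one quoted above lasts, the failures can be counted per minute. A minimal sketch, assuming unquoted syslog-style lines (month, day, time as the first three fields) such as those in /var/log/messages; the function name `totem_failures` is only illustrative.

```shell
# Count "[TOTEM] FAILED TO RECEIVE" lines per minute from syslog lines on stdin.
# Relies on the log being in time order, so uniq only needs adjacent lines.
totem_failures() {
    grep -F '[TOTEM] FAILED TO RECEIVE' |
        awk '{ print $1, $2, substr($3, 1, 5) }' |   # keep "Mon DD HH:MM"
        uniq -c
}
```

Usage would be along the lines of `totem_failures < /var/log/messages` on each node; comparing the minutes on both nodes against the time of the relocate attempt should show whether the token loss starts before or after the service moves.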
>
> The first thing that could be useful is to stop iptables.
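
Before stopping the firewall outright, it may be worth checking whether the current rules let the cluster traffic through. The sketch below is a heuristic only: it assumes openais is using UDP ports 5404 and 5405 (the usual defaults, not confirmed for this cluster.conf), and `allows_totem_ports` is a made-up helper name. It reads `iptables-save` output on stdin.

```shell
# Heuristic: succeed if any rule on stdin (iptables-save format) is a UDP
# ACCEPT rule that mentions port 5404 or 5405 (assumed openais defaults).
# It does not evaluate rule order or multicast matches, so treat a "yes"
# as a hint, not proof.
allows_totem_ports() {
    grep -E -- '-p udp .*540[45].*-j ACCEPT' >/dev/null
}
```

As root, `iptables-save | allows_totem_ports && echo "rule present" || echo "no matching rule"` gives a quick answer on each node. If no rule matches, stopping iptables on both nodes and retrying the relocate is a reasonable next test.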
>
> 2012/1/31 jose nuno neto <jose.neto at liber4e.com>
>
>> Hello
>>
>> Took a quick look at the messages and see no fence reference. There's a
>> break in token messages, a recovery, a cluster.conf change, then
>> communication lost again....
>> Could be the service shutdown, after the cluster.conf update, forcing
>> the shutdown.
>>
>> Do you have drbd running too?
>>
>> Cheers
>> Jose Neto
>>
>> > Hi,
>> >
>> > We are facing an issue while configuring a cluster on CentOS 5.5.
>> >
>> >
>> > Here is the scenario where we got stuck.
>> >
>> > Issue:
>> >
>> > All nodes in the cluster turn off if cluster services are restarted,
>> > disabled, or enabled.
>> >
>> > Three services should run as clustered services:
>> >
>> > 1.     Postgresql.
>> > 2.     GFS (1TB SAN space which is mounted on /var/lib/pgsql)
>> > 3.     Virtual IP (common IP): 10.242.108.42
>> >
>> > Even when we tried adding only the Virtual IP as a cluster service and ran
>> >
>> > #clusvcadm  -r DBService -m ssdgblade2.db2   (from ssdgblade1.db1)
>> >
>> > the service could not be relocated and both nodes turned off.
>> >
>> > Environment
>> >
>> > CentOS 5.5
>> > Postgresql 8.3.3
>> > Kernel version 2.6.18-194
>> > CentOS Cluster Suite.
>> >
>> > Hardware:
>> >
>> > 1.    Chassis: IBM BladeCenter E.
>> > 2.    IBM HS22 blades (8 units); clustering is done on blade1 and
>> > blade2
>> > 3.    Blade Management Module IP is 10.242.108.58
>> > 4.    Fence device: IBM BladeCenter (login successful via telnet and
>> > web browser to the management module).
>> > 5.    Cisco Catalyst 2960G Switch.
>> >
>> > IP:
>> >
>> > 10.242.108.41 (ssdgblade1.db1)
>> > 10.242.108.43 (ssdgblade2.db2)
>> >
>> > Virtual IP 10.242.108.42
>> > Multicast IP 239.192.247.38
>> >
>> >
>> > Diagnostic Steps followed:
>> >
>> > 1.     Removed postgresql and GFS from the cluster service and rebooted
>> > both servers with only the VIP service. The problem still exists: we
>> > cannot relocate the service.
>> > 2.    Tested fencing with:
>> >
>> > #fence_node ssdgblade2.db2   (from db1)
>> > #fence_node ssdgblade1.db1   (from db2)
>> >
>> > We can fence the given node, but during boot-up it fences the other node.
>> >
>> > Please find the attachment for your reference.
>> > --
>> >
>> >
>> > Thanks & Regards,
>> >
>> > *Arun K P*
>> >
>> > System Administrator
>> >
>> > *HCL Infosystems Ltd*.
>> >
>> > *Kolkata*
>> >
>> > Mob: +91- 9903361422
>> >
>> > *www.hclinfosystems.in* <http://www.hclinfosystems.in/>
>> >
>> > *Technology that touches lives* *TM*
>> >
>> > --
>> > This message has been scanned for viruses and
>> > dangerous content by MailScanner, and is
>> > believed to be clean.
>> >
>> > --
>> > Linux-cluster mailing list
>> > Linux-cluster at redhat.com
>> > https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>>
>>
>
>
>
> --
> this is my life and I live it for as long as God wills
>

