[Linux-cluster] Network failure results cluster environment unstable & fragile
Pena, Francisco Javier
francisco_javier.pena at roche.com
Mon Feb 27 08:22:19 UTC 2006
Hi Deval,
If you are using iLO fencing, you could try the latest fence package
(1.32.10). I have seen a similar problem, and it is caused by recent iLO
firmware versions behaving a little differently (they attempt a soft
restart instead of a hard reboot).
At least one of the nodes should get properly killed, and the surviving
one should keep all services.
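
To illustrate why the firmware behaviour matters, here is a minimal
Python sketch of a two-node fence race. It is a toy model, not how the
fence daemon is actually implemented: the function name fence_race, the
request times and the kill_delay values are all assumptions made up for
illustration. The idea is only that a slow (soft) restart leaves the
target alive long enough to fence back, while a fast (hard) power-off
does not.

```python
def fence_race(request_times, kill_delay):
    """Toy model of two nodes fencing each other after losing contact.

    request_times: dict mapping node name -> time (seconds) at which that
                   node would send its fence request against the peer.
    kill_delay:    seconds between a fence request being sent and the
                   target actually stopping (hypothetical values).
    Returns the list of surviving node names.
    """
    if len(request_times) != 2:
        raise ValueError("this sketch only models a two-node cluster")

    # Order the nodes by when they send their fence request.
    (first, t_first), (second, t_second) = sorted(
        request_times.items(), key=lambda item: item[1]
    )

    # The slower node stops running kill_delay seconds after the faster
    # node's request went out.
    second_stops_at = t_first + kill_delay

    if t_second < second_stops_at:
        # The slower node was still alive when its turn came, so it also
        # sent a fence request: both nodes end up being taken down.
        return []

    # The slower node was stopped before it could fence back.
    return [first]


# Hard power-off: the target stops almost immediately, one node survives.
print(fence_race({"node1": 0.0, "node2": 0.5}, kill_delay=0.1))   # ['node1']

# Soft restart: the target keeps running while it shuts down, so it
# fences back and neither node survives.
print(fence_race({"node1": 0.0, "node2": 0.5}, kill_delay=30.0))  # []
```

In this model the second case corresponds to what you describe: both
nodes report a successful fence and both end up shutting down.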
Hope this helps. Regards,
Javier
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Deval
> kulshrestha
> Sent: Saturday, February 25, 2006 6:33 AM
> To: 'linux clustering'
> Subject: RE: [Linux-cluster] Network failure results cluster
> environment unstable & fragile
>
>
> Please help me to resolve my problem
>
>
> If the network goes down on node1, node1 starts the services that
> were not running on it and mounts the shared storage, even though
> node2 is already running those same services with the shared storage
> mounted; the two nodes simply cannot communicate with each other.
> Because of fencing, both nodes then try to kill each other. Both of
> them hang at "Stoping Cluster manager Services.". In
> /var/log/messages, it shows fencing s1, fence successful.
>
> If we disable fencing, then:
>
> When the network comes back, the nodes don't synchronize with each
> other. The shared storage mount point is available to both servers.
> If they try to access the storage at the same time, the storage gives
> I/O errors. Hence this entire setup becomes very unstable and fragile.
>
> --- Deval kulshrestha
> <deval.kulshrestha at progression.com> wrote:
>
> > Hi
> >
> > I am struggling to get some help with the following
> > configuration. This setup is intended to go live in a data
> > center running 24 x 7 x 365, so any issue that makes my
> > environment unstable is very critical here.
> >
> > My HA Cluster Setup details
> >
> > 1. HP DL 360 G4p Server 2nos.
> > 2. HP MSA 500 G2 (SAN) 1nos.
> > 3. RedHat Enterprise Linux 4 ES
> > 4. Red Hat Cluster Suite 4
> >
> >
> > Each server has an HP SCSI HBA. The MSA 500 G2 is a SCSI-based
> > SAN. Both servers are connected to the SAN using SCSI VHDCI
> > cables. I used a network switch to establish network connectivity
> > for the servers. I created a disk array of three HDDs on the SAN
> > with two logical volumes, then installed RHEL 4 Update 1 on both
> > servers (the servers are configured with RAID 1), then installed
> > all HP drivers and management agents. After server configuration
> > and OS installation I installed Red Hat Cluster Suite v 4 on both
> > machines.
> >
> >
> >
> > Then I configured the cluster using the Cluster Configuration
> > Manager: added member hosts, configured the fence device and
> > assigned it to the member hosts (HP iLO is certified as a fence
> > device), configured a failover domain with node priority,
> > configured resources such as a floating IP address, file system
> > and script, then configured the services which need to run in
> > HA mode.
> >
> >
> >
> > After configuring this I tested various scenarios. HA is working
> > properly: whenever either machine is powered off, services fail
> > over to the available node.
> >
> > With Regard
> >
> > Deval
> >
> > Progression Infonet Pvt. Ltd.
> > 55, Independent Electronic Modules,
> > Sector - 18, Electronic City,
> > Gurgaon - 122015
> >
> > India
> > Tel : - 0124 - 2455070, Ext. 215, Fax:
> > 91-124-2398647
> > Mobile : - 98186 -82509
> > URL : - www.progression.com
> >
> >
> >
> >
> > ===========================================================
> > Privileged or confidential information may be contained
> > in this message. If you are not the addressee indicated
> > in this message (or responsible for delivery of the
> > message to such person), please delete this message and
> > kindly notify the sender by an emailed reply. Opinions,
> > conclusions and other information in this message that do not
> > relate to the official business of Progression and its
> > associate entities shall be understood as neither given nor
> > endorsed by them.
> >
> >
> > -------------------------------------------------------------
> > Progression Infonet Private Limited, Gurgaon (Haryana), India
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster