[Linux-cluster] GFS 6.0 node without quorum tries to fence
Schumacher, Bernd
bernd.schumacher at hp.com
Wed Aug 4 14:06:32 UTC 2004
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of AJ Lewis
> Sent: Mittwoch, 4. August 2004 15:54
> To: Discussion of clustering software components including GFS
> Subject: Re: [Linux-cluster] GFS 6.0 node without quorum tries to fence
>
>
> On Wed, Aug 04, 2004 at 08:12:51AM +0200, Schumacher, Bernd wrote:
> > So, what I have learned from all the answers is very bad news for me.
> > It seems that what happened is what most of you expected. But this means:
> >
> > -----------------------------------------------------------------------
> > --- One single point of failure in one node can stop the whole GFS. ---
> > -----------------------------------------------------------------------
> >
> > The single point of failure is:
> > The LAN card specified in "nodes.ccs:ip_interfaces" stops working on
> > one node, no matter whether this node was master or slave.
> >
> > The whole GFS is stopped:
> > The rest of the cluster seems to need time to form a new cluster. The
> > bad node does not need as much time to switch to Arbitrating mode, so
> > the bad node has enough time to fence all the other nodes before it
> > would be fenced by the new master.
> >
> > The bad node lives on, but it cannot form a cluster. GFS is not working.
> >
> > Now all the other nodes will reboot. But after the reboot they cannot
> > join the cluster, because they cannot contact the bad node. The LAN
> > card is still broken. GFS is not working.
> >
> > Did I miss something?
> > Please tell me that I am wrong!
>
> Well, I guess I'm confused how the node with the bad LAN card
> can contact the fencing device to fence the other nodes. If
> it can't communicate with the other nodes because its NIC is
> down, it can't contact the fencing device over that NIC
> either, right? Or are you using some alternate transport to
> contact the fencing device?
There is a second admin LAN which is used for fencing.
Could I use this second admin LAN for the GFS heartbeats as well? Can I
define two LAN cards in "nodes.ccs:ip_interfaces"? If this works, I would
no longer have a single point of failure. The documentation, however, does
not seem to allow this.
I will test this tomorrow; a rough sketch of what I plan to try is below.
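This is only a sketch of the nodes.ccs entry I have in mind, assuming lock_gulm
accepts more than one entry in "ip_interfaces" (which is exactly what the test
should show). The admin-LAN address eth1 = "192.168.200.241" is made up for
illustration and is not from my real setup:

nodes {
    oben {
        ip_interfaces {
            eth0 = "192.168.100.241"
            eth1 = "192.168.200.241"
        }
        ...
    }
    ...
}

If lock_gulmd really falls back to the second interface when the first one
fails, the heartbeat would no longer depend on a single LAN card.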
>
> > > -----Original Message-----
> > > From: linux-cluster-bounces at redhat.com
> > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of
> > > Schumacher, Bernd
> > > Sent: Dienstag, 3. August 2004 13:56
> > > To: linux-cluster at redhat.com
> > > Subject: [Linux-cluster] GFS 6.0 node without quorum tries to fence
> > >
> > >
> > > Hi,
> > > I have three nodes: oben, mitte and unten.
> > >
> > > Test:
> > > I have disabled eth0 on mitte, so that mitte will be excluded.
> > >
> > > Result:
> > > Oben and unten are trying to fence mitte and build a new
> > > cluster. OK! But mitte tries to fence oben and unten. PROBLEM!
> > >
> > > Why can this happen? Mitte knows that it cannot build a
> > > cluster. See the log file from mitte: "Have 1, need 2"
> > >
> > > Logfile from mitte:
> > > Aug 3 12:53:17 mitte lock_gulmd_core[1845]: Client (oben) expired
> > > Aug 3 12:53:17 mitte lock_gulmd_core[1845]: Core lost slave quorum. Have 1, need 2. Switching to Arbitrating.
> > > Aug 3 12:53:17 mitte lock_gulmd_core[2120]: Gonna exec fence_node oben
> > > Aug 3 12:53:17 mitte lock_gulmd_core[1845]: Forked [2120] fence_node oben with a 0 pause.
> > > Aug 3 12:53:17 mitte fence_node[2120]: Performing fence method, manual, on oben.
> > >
> > > cluster.ccs:
> > > cluster {
> > >     name = "tom"
> > >     lock_gulm {
> > >         servers = ["oben", "mitte", "unten"]
> > >     }
> > > }
> > >
> > > fence.ccs:
> > > fence_devices {
> > >     manual_oben {
> > >         agent = "fence_manual"
> > >     }
> > >     manual_mitte ...
> > >
> > >
> > > nodes.ccs:
> > > nodes {
> > >     oben {
> > >         ip_interfaces {
> > >             eth0 = "192.168.100.241"
> > >         }
> > >         fence {
> > >             manual {
> > >                 manual_oben {
> > >                     ipaddr = "192.168.100.241"
> > >                 }
> > >             }
> > >         }
> > >     }
> > >     mitte ...
> > >
> > > regards
> > > Bernd Schumacher
> > >
>
> --
> AJ Lewis                               Voice:  612-638-0500
> Red Hat Inc.                           E-Mail: alewis at redhat.com
> 720 Washington Ave. SE, Suite 200
> Minneapolis, MN 55414
>
> Current GPG fingerprint = D9F8 EDCE 4242 855F A03D 9B63 F50C 54A8 578C 8715
> Grab the key at: http://people.redhat.com/alewis/gpg.html or one of the many
> keyservers out there...
>