[Linux-cluster] fencing issue - with attach logs&conf

Sat Mar 6 05:48:39 UTC 2010

Hi.

I agree with you about giving the admin the choice of whether or not to adopt the
paranoid approach of not failing over the services.

In the past I supported Tru64 clusters, and nowadays HP Serviceguard (HP-UX
and Linux).

HP decided not to develop Serviceguard on Linux anymore, so we have now started
using Red Hat Cluster.

It seems that for very critical customers you need at least two fencing
methods!
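A minimal sketch of why a second, independent fencing method helps, following Lon's reasoning quoted below: methods are tried in order, the node counts as fenced only if some method positively confirms it, and if every method fails the services stay where they are, because "no power" and "no network" look the same from outside. As we understand it, Red Hat Cluster expresses this ordering by listing several fence methods per node in cluster.conf; the method names (`drac`, `switched_pdu`) and the `FenceResult` type here are hypothetical, for illustration only.

```python
from enum import Enum
from typing import Callable, Sequence, Tuple


class FenceResult(Enum):
    SUCCESS = "success"  # the method confirmed the node is powered off
    FAILED = "failed"    # the method could not reach or act on the node


# Hypothetical model: a fence method is a (name, callable) pair.
FenceMethod = Tuple[str, Callable[[], FenceResult]]


def fence_node(methods: Sequence[FenceMethod]) -> Tuple[bool, str]:
    """Try each fence method in order; stop at the first confirmed success.

    Returns (fenced, detail). If no method succeeds we cannot tell a dead
    node (no power) from an isolated but running one (no network).
    """
    for name, method in methods:
        if method() is FenceResult.SUCCESS:
            return True, name
    return False, "all methods failed"


def may_fail_over(methods: Sequence[FenceMethod]) -> bool:
    # Failing over is only safe once the node is known to be fenced.
    fenced, _ = fence_node(methods)
    return fenced


if __name__ == "__main__":
    # Both power cables pulled: the DRAC is unreachable, but a hypothetical
    # switched PDU on a separate path can still act on the node's outlet.
    drac = ("drac", lambda: FenceResult.FAILED)
    pdu = ("switched_pdu", lambda: FenceResult.SUCCESS)

    print(may_fail_over([drac]))       # False: one method only, services stay put
    print(may_fail_over([drac, pdu]))  # True: the second method confirms
```

The point of the second method is that it must not share a failure mode with the first (separate power and network path), otherwise it fails in exactly the same situations.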

And there is another thing that should be fixed ASAP: when using HA-LVM, the
need to compare which file is newer, lvm.conf or the initrd image.
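The comparison mentioned above boils down to file modification times: after `volume_list` in lvm.conf is changed for HA-LVM, the initrd has to be rebuilt so that it carries the same setting, and (as far as we understand) the HA-LVM setup check refuses to proceed when lvm.conf is newer than the initrd image. A minimal sketch of that check; the default paths (`/etc/lvm/lvm.conf`, `/boot/initrd-<kernel release>.img`) are assumptions based on a RHEL 5 style layout and may differ on other systems, and the function names are hypothetical.

```python
import os
import platform
from typing import Optional

# Assumed RHEL 5 style location; adjust for your distribution.
DEFAULT_LVM_CONF = "/etc/lvm/lvm.conf"


def default_initrd_path() -> str:
    # Assumed naming scheme: /boot/initrd-<running kernel release>.img
    return "/boot/initrd-%s.img" % platform.release()


def initrd_is_current(lvm_conf: str = DEFAULT_LVM_CONF,
                      initrd: Optional[str] = None) -> bool:
    """Return True if the initrd image was modified after lvm.conf.

    If lvm.conf is newer (or equally old), the initrd may still contain
    the old volume_list and should be rebuilt before relying on HA-LVM.
    """
    if initrd is None:
        initrd = default_initrd_path()
    return os.path.getmtime(initrd) > os.path.getmtime(lvm_conf)


if __name__ == "__main__":
    try:
        ok = initrd_is_current()
    except OSError as exc:  # file missing or unreadable
        print("cannot compare: %s" % exc)
    else:
        print("initrd is newer than lvm.conf" if ok
              else "lvm.conf is newer: rebuild the initrd")
```

Comparing timestamps only shows that the initrd was rebuilt later, not that it actually contains the current lvm.conf, which is part of why this requirement feels fragile.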

Regards.

Shalom.

On Fri, Mar 5, 2010 at 10:01 PM, brem belguebli <brem.belguebli at gmail.com> wrote:

> Corey,
>
> Hi Corey
>
> I was talking about a watchdog, not a kernel panic (SysRq...). On
> common (x86) hardware, most server vendors implement embedded hardware
> watchdog chips that could be used.
>
> Indeed, SCSI-3 reservation/registration could be combined with all of this
> to be sure about the nodes' sanity.
>
> I think the admin should be given the choice of whether or not to adopt the
> paranoid approach of not failing over the services.
>
>
>
> 2010/3/4 Corey Kovacs <corey.kovacs at gmail.com>:
> > Brem,
> >
> > It's been my understanding that the kernel panic technique you are
> > describing is essentially undesirable because the kernel is in an
> > unknown state. Basically anything can happen. The OS doesn't have to do a
> > sync for an HBA to flush, etc. Since Red Hat isn't in the business of
> > building their own hardware like HP (DEC), Sun, and IBM, they take the next
> > best route to ensure that nothing from that problematic machine can affect
> > the storage, and the only way to guarantee that is to remove power from the
> > whole machine.
> >
> > VMS and Tru64 use the panic method, but the other nodes will issue a
> > reservation on the SCSI bus against that node to protect the storage. They
> > can do that because they know exactly how their hardware and implementation
> > of reservations work.
> >
> > Corey
> >
> > On Thu, Mar 4, 2010 at 5:32 AM, שלום קלמר <sklemer at gmail.com> wrote:
> >>
> >> Thanks to all!
> >>
> >> Shalom.klemer at hp.com
> >>
> >> On Thu, Mar 4, 2010 at 12:00 AM, Lon Hohberger <lhh at redhat.com> wrote:
> >>>
> >>> On Wed, 2010-03-03 at 13:10 +0200, שלום קלמר wrote:
> >>> > Hi.
> >>> >
> > >>> > I have two power supplies. But if someone pulls the power cables by
> > >>> > mistake, does that mean that the services will not fail over?
> >>>
> >>> The problem is:
> >>>
> >>> no power = no ping + no DRAC access
> >>> no network = no ping, no DRAC access
> >>>
> >>> If there's no power, then it is safe to fail over.
> >>>
> >>> If there is no network (and power is OK), then it is not safe to fail
> >>> over.  Failover in this case is very likely to produce data corruption!
> >>>
> > >>> Because we cannot tell which case happened, we do not fail over.
> >>>
> >>> -- Lon
> >>>
> >>>
> >>> --
> >>> Linux-cluster mailing list
> >>> Linux-cluster at redhat.com
> >>> https://www.redhat.com/mailman/listinfo/linux-cluster
> >>
> >
>
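Brem's watchdog idea relies on the standard Linux watchdog device: once `/dev/watchdog` is opened the timer is armed, any write resets it, and writing the magic character `V` just before closing disarms it on drivers that support "magic close". A minimal sketch of a self-fencing loop under those assumptions; the health check you pass in (for example a hypothetical `have_quorum()`) is up to you. Surviving nodes would also have to wait longer than the watchdog timeout before failing over, which this sketch does not show, and the device path is a parameter so the loop can be tried against an ordinary file.

```python
import time
from typing import Callable


def watchdog_loop(health_check: Callable[[], bool],
                  device: str = "/dev/watchdog",
                  interval: float = 10.0,
                  max_pings: int = -1) -> bool:
    """Ping the watchdog for as long as health_check() keeps passing.

    Returns True after a clean, requested stop (timer disarmed with the
    magic 'V'), or False when the health check failed. In the False case
    we stop pinging and close WITHOUT the magic character, so on drivers
    with magic close (or built with nowayout) the timer keeps running and
    the node resets itself. max_pings < 0 means run forever.
    """
    pings = 0
    # Opening the device arms the timer; buffering=0 sends each write now.
    with open(device, "wb", buffering=0) as wd:
        while max_pings < 0 or pings < max_pings:
            if not health_check():
                return False      # stop pinging; timer stays armed
            wd.write(b"\0")       # any write resets the timer
            pings += 1
            time.sleep(interval)
        wd.write(b"V")            # magic close: disarm before closing
    return True
```

On a driver without magic close support, closing the device disables the timer, so a real daemon would keep the device open instead of returning when the check fails.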