Ext3 corruption using cluster
Andreas Dilger
adilger at sun.com
Sat May 9 08:58:28 UTC 2009
On May 09, 2009 10:25 +0200, Stefano Cislaghi wrote:
> Maybe... looking around, some solutions could be:
> - maximize journal size
> - journaling all data and metadata (mount -o data=journal)
No, these have nothing to do with your problem. If you are running
in a failover environment you need to STONITH (Shoot The Other Node
In The Head, i.e. fence) the failing server BEFORE the backup server
tries to take over.
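To make the ordering concrete, here is a minimal Python sketch of the takeover step on the backup node. fence_node() and mount_shared_fs() are hypothetical placeholders for whatever STONITH agent and mount procedure the cluster actually uses. The only point being illustrated is that the mount is refused unless fencing of the failed node has been confirmed first.

```python
"""Sketch of the takeover ordering on the backup node.

fence_node() and mount_shared_fs() are hypothetical stand-ins for a real
STONITH agent and the real mount step; they are passed in as callables.
"""
from typing import Callable, List


def take_over(fence_node: Callable[[str], bool],
              mount_shared_fs: Callable[[], None],
              failed_node: str) -> bool:
    """Mount the shared filesystem only after the failed node is fenced.

    Returns True if the takeover proceeded, False if it was refused.
    """
    # Step 1: power off / isolate the failed node so it cannot issue
    # any more writes to the shared disk.
    if not fence_node(failed_node):
        # Fencing not confirmed: mounting now risks two nodes writing
        # to the same ext3 filesystem, so refuse the takeover.
        return False
    # Step 2: only now is it safe to mount (and replay the journal).
    mount_shared_fs()
    return True


if __name__ == "__main__":
    log: List[str] = []
    ok = take_over(lambda node: log.append(f"fenced {node}") or True,
                   lambda: log.append("mounted"),
                   "node1")
    print(ok, log)  # True ['fenced node1', 'mounted']
```

Refusing to mount when fencing fails is deliberate: a takeover that never happens is an outage, while two nodes writing to one ext3 filesystem is corruption.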
>
> Ste
>
>
> 2009/5/9 Christian Kujau <lists at nerdbynature.de>
>
> > On Thu, 7 May 2009, Stefano Cislaghi wrote:
> > > During a normal switch, operations done are:
> > > - oracle shutdown abort
> > > - oracle listener shutdown
> > > - umount fs (using umount -l)
Using "umount -l" is just a way to NOT unmount the filesystem,
because some process is keeping it busy. All this does is hide
the mountpoint until the busy process goes away. It is definitely
a bad sign if you need this to do any failover.
Try "lsof" to see which processes are keeping the mountpoint busy.
At minimum these need to be stopped or killed, and then the
filesystem needs a proper unmount.
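Below is a rough Python approximation of what "lsof" (or "fuser -m") reports for a mountpoint, followed by a plain, non-lazy unmount. It is a sketch, not a replacement for those tools. It is Linux-only and scans /proc/<pid>/{cwd,root,exe,fd}, comparing path strings. As a result it misses memory-mapped files and processes you lack permission to inspect (run it as root). It may also over-report paths on a different filesystem mounted beneath the mountpoint. busy_pids() and proper_unmount() are hypothetical helper names.

```python
"""Sketch: find PIDs keeping a mountpoint busy, then unmount without -l.

Linux-only approximation of "lsof"/"fuser -m": it resolves each process's
cwd, root, exe and open fds under /proc and compares the path strings.
Memory-mapped files are not checked; run as root to inspect all processes.
"""
import os
import subprocess
from typing import List


def is_under(path: str, mountpoint: str) -> bool:
    """True if path is the mountpoint itself or lies beneath it."""
    mountpoint = os.path.normpath(mountpoint)
    path = os.path.normpath(path)
    if mountpoint == "/":
        return path.startswith("/")
    return path == mountpoint or path.startswith(mountpoint + "/")


def busy_pids(mountpoint: str, proc: str = "/proc") -> List[int]:
    """Return sorted PIDs whose cwd, root, exe or an open fd is under mountpoint."""
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, etc.
        base = os.path.join(proc, entry)
        links = [os.path.join(base, name) for name in ("cwd", "root", "exe")]
        fd_dir = os.path.join(base, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited, or permission denied
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue  # link vanished or is not readable
            if is_under(target, mountpoint):
                pids.append(int(entry))
                break  # one hit is enough for this PID
    return sorted(pids)


def proper_unmount(mountpoint: str) -> None:
    """Refuse to unmount while the fs is busy; never falls back to "umount -l"."""
    holders = busy_pids(mountpoint)
    if holders:
        raise RuntimeError(f"{mountpoint} is busy, stop/kill these PIDs first: {holders}")
    # Plain umount: if something still holds the fs, this fails (check=True
    # raises CalledProcessError) instead of silently detaching it.
    subprocess.run(["umount", mountpoint], check=True)
```

For example, running busy_pids("/u01") on the active node before a switch (where /u01 is a hypothetical Oracle data mountpoint) would list the processes that must be stopped before the real umount can succeed.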
> > I'm not all too Oracle cluster savvy, but this lazy umount looks
> > kinda suspicious. From the manpage:
> >
> > > Detach the filesystem from the filesystem hierarchy now, and cleanup
> > > all references to the filesystem as soon as it is not busy anymore
> >
> > My wild guess: node1 has been shut down, did a lazy umount, so that
> > node2 could mount it but node1 was still writing to the fs (i.e. it was
> > still in use)?
> >
> > Christian.
> > --
> > Bruce Schneier's first program was encrypt world.
> >
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.