Corrupt inodes on shared disk...

Paul Fitzmaurice pfitzmaurice at aveksa.com
Wed Apr 4 03:34:17 UTC 2007


Thanks for the info. Could you help confirm something? It appears that in some fail-over situations we are mounting the shared partition while the node going down has not yet completely shut down and done the umount!

So one node still has the partition mounted rw while it is shutting down, and the other node is mounting it and starting up. Could this cause inode and journal corruption?
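
As a rough illustration of the check described above, the node that is shutting down could confirm that the shared device is no longer mounted read-write before it gives up the service. This is only a sketch: the function name `rw_mounts` and the device path `/dev/mapper/shared1` are made up for the example, and it only inspects the local node's mount table (the usual /proc/mounts layout of device, mount point, type, options). It cannot tell what the other node is doing, which is what fencing is for.

```python
def rw_mounts(mounts_text, device):
    """Return the mount points where `device` is mounted read-write.

    `mounts_text` is the content of /proc/mounts, one mount per line:
    device, mount point, filesystem type, comma-separated options, ...
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        dev, mountpoint, _fstype, options = fields[:4]
        if dev == device and "rw" in options.split(","):
            hits.append(mountpoint)
    return hits


# Example input in /proc/mounts format (the device name is hypothetical).
sample = (
    "/dev/sda1 / ext3 rw 0 0\n"
    "/dev/mapper/shared1 /data ext3 rw,data=ordered 0 0\n"
)
still_mounted = rw_mounts(sample, "/dev/mapper/shared1")
```

On a live node the text would come from reading /proc/mounts; a non-empty result means the umount has not happened yet and the other node should not mount the partition.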


----- Original Message -----
From: Stephen Samuel <darkonc at gmail.com>
To: Paul Fitzmaurice
Cc: ext3-users at redhat.com <ext3-users at redhat.com>
Sent: Tue Apr 03 15:40:03 2007
Subject: Re: Corrupt inodes on shared disk...

I don't know much about RHCS, but I think that this is more likely
to be a Red Hat problem than an ext3 problem.

1) *IF* RHCS properly locks out the 'dead' system, and it doesn't
manage (at some time after the backup system takes over) to write
caches to the shared drive,

2) and *IF* the failover software isn't too stupid to do things like
run the journal, and otherwise do sane FSCK things before mounting,
then you shouldn't have a problem.
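
To make point 2) concrete, here is a minimal sketch of how a failover script might decide whether to mount after running e2fsck. The bit values are the exit status bits documented in the e2fsck man page; the names `describe` and `safe_to_mount`, and the policy of mounting only on status 0 or 1, are assumptions for illustration and not anything RHCS is known to do.

```python
# e2fsck reports its result as a bitmask in the exit status.
E2FSCK_BITS = {
    1: "errors corrected",
    2: "errors corrected, reboot recommended",
    4: "errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "cancelled by user request",
    128: "shared library error",
}


def describe(status):
    """Return the list of conditions encoded in an e2fsck exit status."""
    return [msg for bit, msg in sorted(E2FSCK_BITS.items()) if status & bit]


def safe_to_mount(status):
    """Hypothetical policy: mount only if clean (0) or fully repaired (1)."""
    return status in (0, 1)
```

A script could obtain the status with something like `subprocess.call(["e2fsck", "-p", device])` on the unmounted device and refuse to mount when `safe_to_mount` returns False. None of this helps if the other node can still write to the disk, so it complements fencing rather than replacing it.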

My best guess is that 2) is relatively unlikely, which leaves 1) as
the probable cause.

If your primary system does *ANY* writes after the failover starts,
then you can probably expect problems like you've seen here. (does
RHCS _physically_ lock out the second system, or is it a software
lockout?)

The other question I have is: why is the system failing over?  Other
than testing, a well built HA system should almost *never* actually
need to fail over. (we're not talking Windows servers here :-} )  HA
should be like insurance ... You pay up front for it and work to make
sure that you never actually have to use what you've paid for.


On 4/3/07, Paul Fitzmaurice <pfitzmaurice at aveksa.com> wrote:
> I am having problems when using a Dell PowerVault MD3000 with multipath from
> a Dell PowerEdge 1950.  I have 2 cables connected and mount the partition on
> the DAS Array.  I am using RHEL 4.4 with RHCS and a two node cluster.  Only
> one node is "Active" at a time, it creates a mount to the partition, and if
> there is an issue RHCS will fence the device and then the other node will
> mount the partition.
>
> I have now run into a problem twice where my ext3 (with Journaling) has
> corrupt inodes.  This actually has resulted in a filesystem with #xxxxxxxxx
> files and directories.
