
Re: Ext3 strangeness data loss

On Tue, Feb 04, 2003 at 12:47:06PM -0000, Bodrogi Viktor wrote:
> So I decided to switch to reiserfs, which has performance advantages too.
> After about fifth reboot I could mount /var, and copied it to a new
> partition together with root partition.
> And, terrible, I had the same problem with /usr/sbin/sshd startup, without
> the binary changes, according to a diff with a probably-good backup (who can
> be sure of anything after all this...).
> So the conclusion is that possibly this has nothing to do with ext3.

It would be polite for me to not say, "I told you so", so I won't.  :-)

> It's not openssh because I had problems with other files/dirs, too...
> Maybe it's evms?
> Maybe it's the kernel?
> It's a stock 2.4.19, only with evms and vserver patches.

As I said earlier, it's probably a hardware problem, or perhaps a
combination of hardware and kernel (i.e., the kernel tries to be too
aggressive with the IDE DMA configuration, as Stephen conjectured).

> > In any case, the scenarios I described (a controller/cable problem, or
> > incorrectly configured IDE DMA settings) are all still possible
> > with RAID; RAID does not help you prevent these sorts of problems.
> It's SW RAID-1, disks are on the same controller,
> but different buses / cables.
> Am I right, that in this case HW errors are *very* unlikely?
> That would mean that there are exactly the same bits of errors at exactly
> the same time on different cables/disks...

Nope, you're incorrect here.  When you read from a SW-RAID-1 array,
the Software RAID driver picks one or the other disk (whichever one is
available) and reads from that particular disk.  It does *not*
read the block from both disks, and compare the blocks read from both
disks to make sure they are identical, as you seem to believe.

What this means is that using Software RAID-1, you get higher
performance on reads, since you can take advantage of the I/O read
speeds of both disks.  It also means that if one of the cables is
defective, whether any given read comes back corrupted is random,
since it is roughly a 50-50 chance whether the Software RAID driver
will read from disk #1 or disk #2.
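
To make the point concrete, here is a toy simulation of the read path
described above.  It is only a sketch and not the real md driver: the
names (raid1_read, good_disk, bad_cable_disk, simulate) are made up for
illustration, the mirror is assumed to be chosen uniformly at random
(the real driver balances reads on availability), and the bad cable is
assumed to corrupt every read that goes through it.

```python
import random


def good_disk(block):
    """Hypothetical healthy mirror: returns the correct data for a block."""
    return f"data-{block}"


def bad_cable_disk(block):
    """Hypothetical mirror behind a defective cable.

    Assumption for the sketch: every read through this cable is corrupted.
    """
    return f"garbage-{block}"


def raid1_read(mirrors, block, rng):
    """Read one block from ONE mirror only.

    The result is returned as-is; it is never compared against the copy
    on the other mirror, which is the behaviour described in the text.
    """
    disk = rng.choice(mirrors)  # assumption: uniform random choice
    return disk(block)


def simulate(mirrors, reads=10000, seed=0):
    """Return the fraction of reads that came back corrupted."""
    rng = random.Random(seed)
    corrupted = sum(
        raid1_read(mirrors, block, rng) != good_disk(block)
        for block in range(reads)
    )
    return corrupted / reads


if __name__ == "__main__":
    fraction = simulate([good_disk, bad_cable_disk])
    print(f"fraction of corrupted reads: {fraction:.3f}")
```

With one bad mirror out of two, about half of the reads come back
wrong and nothing in the read path notices, which would look like
exactly the kind of intermittent corruption reported in this thread.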

						- Ted
