Intermittent ext3 corruption on external firewire Micronet1.5Tb RAID on FC3

Fri May 27 15:24:24 UTC 2005

Hi,

On Tue, 2005-05-17 at 08:42, Joseph D. Wagner wrote:
> > What I would do (if you don't mind overwriting the disk, presumably
> > not if it is just new and doesn't contain important data) is to
> > write a small test program to write the byte offset at the start of
> > every 4kB block on the disk, then read them all back and verify it
> > is correct.
> 
> That's what badblocks is for when doing a destructive write test.

No, badblocks just tells you if an IO succeeded.  It's really not
designed to make sure that the IO went to the correct disk block in the
presence of block aliasing, which is what you need to detect wraps.
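
To illustrate the aliasing problem, here is a toy model in Python. It is not taken from any real tool: the WrappingDisk class and both test functions are hypothetical names invented for this sketch. The "disk" advertises more blocks than it stores, so high block numbers wrap onto low ones. A test that writes the same fill pattern everywhere sees nothing wrong, while a test that stamps each block with its own byte offset catches the wrap.

```python
import struct

class WrappingDisk:
    """Toy device (hypothetical): advertises `blocks` blocks, stores only `real`."""
    def __init__(self, blocks, real, block_size=4096):
        self.blocks, self.real, self.block_size = blocks, real, block_size
        self.store = bytearray(real * block_size)

    def write(self, block, data):
        # Addresses silently wrap: every write "succeeds".
        off = (block % self.real) * self.block_size
        self.store[off:off + self.block_size] = data

    def read(self, block):
        off = (block % self.real) * self.block_size
        return bytes(self.store[off:off + self.block_size])

def uniform_test(disk, pattern=0xAA):
    """Write one fill pattern everywhere; return blocks that read back wrong."""
    data = bytes([pattern]) * disk.block_size
    for b in range(disk.blocks):
        disk.write(b, data)
    return [b for b in range(disk.blocks) if disk.read(b) != data]

def stamped_test(disk):
    """Stamp each block with its byte offset; return blocks holding a foreign stamp."""
    for b in range(disk.blocks):
        stamp = struct.pack("<Q", b * disk.block_size)
        disk.write(b, stamp.ljust(disk.block_size, b"\0"))
    return [b for b in range(disk.blocks)
            if struct.unpack_from("<Q", disk.read(b))[0] != b * disk.block_size]
```

With a 16-block device that really holds 8, uniform_test() reports no bad blocks, while stamped_test() flags the low blocks that were overwritten by the wrapped writes.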

I wrote a program to test such things a couple of months ago, and have
recently been polishing it up and writing documentation for it for
public consumption.  It's called "verify-data", and it does a
write-then-read verify scan designed for large block devices.  It uses
1MB IOs by default, with the buffer carefully constructed to be easily
recognisable: buffers contain a repeating pattern of block offset, byte
offset, magic number and pass number, so any IOs going astray are
instantly recognisable.  Everything should be 64-bit safe, and I've used
it on block devices up to 13TB in size.
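
As a rough sketch of the self-describing buffer idea (not verify-data's actual on-disk layout, which is not reproduced here), the Python below fills a buffer with repeating 32-byte records of block offset, byte offset, magic number and pass number, all as 64-bit fields. The record layout, the MAGIC value, the 4kB BLOCK_SIZE and the function names are assumptions made for illustration only.

```python
import struct

# All of these are illustrative assumptions, not verify-data's real format.
MAGIC = 0x7665726966794441
BLOCK_SIZE = 4096
RECORD = struct.Struct("<QQQQ")  # block offset, byte offset, magic, pass number

def build_buffer(start_byte, length, pass_no):
    """Fill `length` bytes (a multiple of 32) with records naming their own location."""
    out = bytearray()
    for pos in range(start_byte, start_byte + length, RECORD.size):
        out += RECORD.pack(pos // BLOCK_SIZE, pos, MAGIC, pass_no)
    return bytes(out)

def find_mismatch(buf, start_byte, pass_no):
    """Return (expected_byte_offset, found_record) for the first bad record, else None."""
    for i in range(0, len(buf), RECORD.size):
        pos = start_byte + i
        found = RECORD.unpack_from(buf, i)
        if found != (pos // BLOCK_SIZE, pos, MAGIC, pass_no):
            return pos, found
    return None
```

Because every record names the offset it was written to, a mismatch shows not only that data is wrong but also where the stray IO was originally aimed, and the pass number separates stale data from an earlier run.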

By default it just writes and verifies one chunk every 128GB throughout
the device, but you can tell it to walk the whole device (MUCH
slower!).  I've found it very good for detecting edge-conditions, wraps
etc. on large block devices.  (It also includes a query mode, -Q, to
interrogate the BLKGETSIZE[64] ioctls.)
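
The default sampling amounts to simple offset arithmetic, sketched below in Python. The function name and parameters are hypothetical, and the rule of skipping any chunk that would not fit entirely inside the device is an assumption of this sketch, not documented verify-data behaviour.

```python
GB = 1 << 30
MB = 1 << 20

def sample_offsets(device_size, stride=128 * GB, chunk=1 * MB):
    """Byte offsets of the chunks to test: one chunk-sized IO per stride.

    Only offsets where a whole chunk fits on the device are returned
    (an assumption of this sketch). Python integers are unbounded, so
    there is no 32-bit wrap in the arithmetic itself.
    """
    return list(range(0, device_size - chunk + 1, stride))
```

For a 1.5TB device (1536 * GB bytes) this gives 12 chunks, at 0, 128GB, 256GB and so on up to 1408GB. Passing stride=chunk walks the whole device instead.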

It can be found at

        http://people.redhat.com/sct/src/verify-data/

I've got it in git locally, and can push the git repo to http too if
people find it useful.

Cheers, 
 Stephen