Could drbd randomly flip bits? Was: Database page corruption on disk occurring during mysqldump on a fresh database and Was: Spontaneous development of supremely large files on different ext3 filesystems

Andreas Dilger adilger at clusterfs.com
Wed Sep 19 17:11:06 UTC 2007


On Sep 17, 2007  11:59 -0700, Jeremy Cole wrote:
> >> Inode 16257874, i_size is 18014398562775391, should be 53297152
> 
> 53297152:
> 
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0011 0010 1101 0100 0000 0000 0000
> 
> 18014398562775391:
> 
> 0000 0000 0100 0000 0000 0000 0000 0000
> 0000 0011 0010 1101 0011 0001 0101 1111

Actually, since e2fsck doesn't know the correct file size, it rounds
i_size up to the end of the last valid block. That accounts for the
differences in the low 12 bits and the increment just above them, so
the real damage here is a single flipped high bit.

> >> Inode 2121855, i_size is 35184386120704, should be 14032896.
> 
> 14032896:
> 
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 1101 0110 0010 0000 0000 0000
> 
> 35184386120704:
> 
> 0000 0000 0000 0000 0010 0000 0000 0000
> 0000 0000 1101 0110 0001 1100 0000 0000

Same.
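
To make the arithmetic concrete, here is a rough Python sketch of the check
described above. It is only an illustration, not e2fsck's actual logic: it
assumes 4 KiB blocks (hence the 12 low bits) and that the suggested size is
simply the uncorrupted size rounded up to a block boundary. The names
BLOCK_SIZE, round_up_to_block and find_flipped_bit are made up for this
example.

```python
BLOCK_SIZE = 4096  # assumption: 4 KiB blocks, i.e. the "last 12 bits"


def round_up_to_block(size, block_size=BLOCK_SIZE):
    """Round size up to the next multiple of block_size."""
    return -(-size // block_size) * block_size


def find_flipped_bit(bad_size, fsck_size, block_size=BLOCK_SIZE):
    """Return (bit, plausible_size) pairs for every single set bit in
    bad_size that, once cleared, leaves a size which rounds up to the
    value e2fsck suggested."""
    matches = []
    for bit in range(64):
        mask = 1 << bit
        if bad_size & mask:
            candidate = bad_size & ~mask
            if round_up_to_block(candidate, block_size) == fsck_size:
                matches.append((bit, candidate))
    return matches


# The two inodes quoted above: (reported i_size, "should be" value).
reports = [
    (18014398562775391, 53297152),
    (35184386120704, 14032896),
]

for bad, suggested in reports:
    for bit, size in find_flipped_bit(bad, suggested):
        print(f"bit {bit} set spuriously; plausible size {size} "
              f"rounds up to {suggested}")
```

Under those assumptions each report is explained by exactly one stray bit:
bit 54 for inode 16257874 and bit 45 for inode 2121855.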
 
> I would suspect higher-level corruption than the actual disks (typical
> single bit or double bit flips, and generally 1->0 only) but lower than
> the OS (typical entire page corruptions of 4k-64k).
> 
> That leaves network, SATA controller, various system buses, and possibly 
> stupid errors in DRBD (although I'd call this unlikely).
> 
> Do note that data on e.g. the PCI bus is not protected by any sort of 
> checksum.  I've seen this cause corruption problems with PCI risers and 
> RAID cards.  Are you using a PCI riser card?  Note that LSI does *not* 
> certify their cards to be used on risers if you are custom building a 
> machine.
>

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



