Second Block on Partition overwritten with 0xFF

Tomas Pospisek ML tpo2 at sourcepole.ch
Wed Sep 5 08:58:15 UTC 2007


Can anybody here give me a hint about the problem? Particularly:

> My question is: does the ext3 driver _ever_ write outside of its own
> space on disk - i.e. into 0x000-0x400? That is, can we say with
> certainty that it's _not_ the ext3 driver causing the problem?

?
*t

On 9/3/2007, "Tomas Pospisek ML" <tpo2 at sourcepole.ch> wrote:

>
>Hello everybody
>
>we're running a small population of lightly embedded machines with the
>following specs:
>
>System: +- standard intel box
>FS: ext3 (defaults,errors=remount-ro,noatime)
>HD: TRANSCEND, ATA DISK drive, Compact Flash (CF), 2000880 sectors (1024
>MB) w/2KiB Cache, CHS=1985/16/63
>Driver: Standard IDE Driver
>            ICH4: chipset revision 2
>            ICH4: not 100% native mode: will probe irqs later
>               ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:pio,
>hdb:pio
>               ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:pio,
>hdd:pio
>kernel: 2.6.15.6 #1 PREEMPT Sat Mar 11 00:56:41 CET 2006 i686 GNU/Linux
>
>ext3 was chosen in the hope of making the system more resilient to
>power failures. The systems run on a UPS, but unfortunately some
>operators will just pull the power plug (although they're instructed
>not to).
>
>What we have now experienced multiple times is that the systems run just
>fine, with absolutely no complaints in dmesg/kern.log, until they are
>rebooted (shutdown -r now). At that point, *very rarely*, GRUB is no
>longer able to read the boot filesystem (Error 17).
>
>I've checked the on-disk data and have discovered that 0x200-0x1c00 is
>overwritten with 0xff, followed by a single 0x0f and after that 0x00
>until 0x207f.
>
>That is, the second to the sixteenth on-disk blocks have been overwritten:
>
>000001e0  53 59 53 4d 53 44 4f 53  20 20 20 53 59 53 7f 01  |SYSMSDOS   SYS..|
>000001f0  00 41 bb 00 07 60 66 6a  00 e9 3b ff 00 00 00 00  |.A»..`fj.é;ÿ....|
>00000200  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ|
>*
>00001c00  ff 0f 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |ÿ...............|
>00001c10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
>*
>00002080  ed 41 00 00 00 04 00 00  1e 39 a0 46 a6 6a dd 45  |íA.......9 F¦jÝE|
>
>Our project does no hardware-level operations. All access is through
>regular file operations only. Thus we're not aware of any way our
>software could be changing on-disk blocks directly.
>
>What's striking about the problem above is that the first affected block
>(0x200) starts _before_ the on-disk filesystem, which starts at 0x400.
>
>My question is: does the ext3 driver _ever_ write outside of its own
>space on disk - i.e. into 0x000-0x400? That is, can we say with
>certainty that it's _not_ the ext3 driver causing the problem?
>
>What else could cause the problem then? We don't see any sign of a
>problem before the reboot, only after. Could the IDE driver be the
>problem? Or is it the IDE CF card hardware?
>
>I've done a dd if=/dev/hdc of=/dev/null and there was absolutely no
>trouble visible (nothing in kern.log/dmesg), thus the card does not seem
>to be broken on the physical level and doesn't have bad blocks that would
>fail on read.
>
>Does this ring a bell with anybody?
>*t
>
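
The check described in the quoted message (which byte ranges were
overwritten, and whether they begin before 0x400) can be sketched in
Python. This is only a minimal sketch under stated assumptions: the
function names are hypothetical, a 512-byte sector size is assumed from
the 0x200 step in the dump, and the 0x400 boundary is taken from the post
itself. It works on a raw image file (for example one copied off the card
with dd), not on the live device.

```python
SECTOR_SIZE = 512                # assumption: 512-byte sectors (0x200)
FS_START_OFFSET = 0x400          # where the post says the filesystem starts


def constant_runs(data, min_len=SECTOR_SIZE):
    """Return (start, end_exclusive, byte_value) for every run of a single
    repeated byte that is at least min_len bytes long."""
    runs = []
    i = 0
    n = len(data)
    while i < n:
        j = i + 1
        while j < n and data[j] == data[i]:
            j += 1
        if j - i >= min_len:
            runs.append((i, j, data[i]))
        i = j
    return runs


def describe_run(start, end, value):
    """Summarise one run: byte range, fill value, zero-based sector range,
    and whether the run begins before the 0x400 boundary."""
    return {
        "range": (hex(start), hex(end - 1)),
        "value": hex(value),
        "sectors": (start // SECTOR_SIZE, (end - 1) // SECTOR_SIZE),
        "starts_before_fs": start < FS_START_OFFSET,
    }


def scan_image(path, length=0x4000):
    """Read the first `length` bytes of a raw image file (hypothetical
    path) and describe the constant-byte runs found in it."""
    with open(path, "rb") as f:
        data = f.read(length)
    return [describe_run(*run) for run in constant_runs(data)]
```

Calling scan_image() on an image of an affected card would list each
long run of identical bytes together with a starts_before_fs flag, which
makes it easy to compare several cards. It only reports what is on disk;
it says nothing about which component wrote it.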



