CD driver causes errors by reading too far ahead

Fri Mar 11 22:35:32 UTC 2005

summary: reading from a CD block device generates spurious errors near
the end of the data.  These errors make ordinary tasks, such as
verifying a burned disc, fail.

I think that this behaviour is new with LINUX 2.6.  I'm using Fedora
Core 3 with kernel 2.6.10-1.770_FC3 on x86_64.

I burned a CD from a .iso (you may ignore the details):

    I used Fedora Core 3's Nautilus desktop to burn (right-click on .iso,
    choose "Write to Disc...").
    -rw-rw-r--  1 hugh hugh  582391808 Feb  4 19:37 w2k3sp1_1433_usa_x64fre_pro.iso

I attempted to check the result.  I moved the burnt CD to another drive, then
used the command
	cmp w2k3sp1_1433_usa_x64fre_pro.iso /dev/hdd
to test the result.

I got what looks like failure:
	cmp: /dev/hdd: Input/output error

[I later tried the same test on Fedora Core 1 with a 2.4 kernel.  The
disc tested as correct (using device /dev/scd0).]

Let's investigate the error:

dmesg of error:
    hdd: media error (bad sector): status=0x51 { DriveReady SeekComplete Error }
    hdd: media error (bad sector): error=0x30
    ide: failed opcode was 100
    ATAPI device hdd:
      Error: Medium error -- (Sense key=0x03)
      Unrecovered read error -- (asc=0x11, ascq=0x00)
      The failed "Read 10" packet command was: 
      "28 00 00 04 56 b4 00 00 21 00 00 00 00 00 00 00 "
    end_request: I/O error, dev hdd, sector 1137360
    Buffer I/O error on device hdd, logical block 284340
    Buffer I/O error on device hdd, logical block 284341
    Buffer I/O error on device hdd, logical block 284342
    Buffer I/O error on device hdd, logical block 284343
    Buffer I/O error on device hdd, logical block 284344
    Buffer I/O error on device hdd, logical block 284345
    Buffer I/O error on device hdd, logical block 284346
    Buffer I/O error on device hdd, logical block 284347
    Buffer I/O error on device hdd, logical block 284348
    Buffer I/O error on device hdd, logical block 284349
    Buffer I/O error on device hdd, logical block 284350
    Buffer I/O error on device hdd, logical block 284351
    Buffer I/O error on device hdd, logical block 284352
    Buffer I/O error on device hdd, logical block 284353
    Buffer I/O error on device hdd, logical block 284354
    Buffer I/O error on device hdd, logical block 284355
    Buffer I/O error on device hdd, logical block 284356
    Buffer I/O error on device hdd, logical block 284357
    Buffer I/O error on device hdd, logical block 284358
    Buffer I/O error on device hdd, logical block 284359
    Buffer I/O error on device hdd, logical block 284360
    Buffer I/O error on device hdd, logical block 284361
    Buffer I/O error on device hdd, logical block 284362
    Buffer I/O error on device hdd, logical block 284363
    Buffer I/O error on device hdd, logical block 284364
    Buffer I/O error on device hdd, logical block 284365
    Buffer I/O error on device hdd, logical block 284366
    Buffer I/O error on device hdd, logical block 284367
    Buffer I/O error on device hdd, logical block 284368
    Buffer I/O error on device hdd, logical block 284369
    Buffer I/O error on device hdd, logical block 284370
    Buffer I/O error on device hdd, logical block 284371
    Buffer I/O error on device hdd, logical block 284372

Let's decode the failing command:

	28 00 00 04 56 b4 00 00 21 00 00 00 00 00 00 00

	28 	opcode: READ(10) [as the message said] [a 10-byte command]
	00 	Logical Unit = 0, DPO=0, FUA=0, RelAdr=0
	00 04 56 b4 	Logical Block Address = 284340
	00 	reserved
	00 21	Transfer Length = 33
	00	Control
	00 00 00 00 00 00	padding???  (READ(10) itself is only 10 bytes)
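
The decode above can be reproduced mechanically.  Here is a minimal
sketch in Python, assuming the usual READ(10) layout (opcode, flags
byte, 4-byte big-endian LBA, reserved byte, 2-byte big-endian transfer
length, control byte).  The name decode_read10 is made up for this
illustration; it is not anything from the kernel.

```python
import struct

def decode_read10(hex_bytes):
    """Decode the first 10 bytes of a READ(10) command given as hex text."""
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    # ">" = big-endian, no padding:
    # B opcode, B flags, I LBA, B reserved, H transfer length, B control
    opcode, flags, lba, _reserved, length, control = struct.unpack(">BBIBHB", raw[:10])
    if opcode != 0x28:
        raise ValueError("not a READ(10) command")
    return {"lba": lba, "length": length, "flags": flags,
            "control": control, "trailing": raw[10:]}

cmd = decode_read10("28 00 00 04 56 b4 00 00 21 00 00 00 00 00 00 00")
print(cmd["lba"], cmd["length"])   # 284340 33
```

The six bytes left over after the 10-byte command end up in
"trailing"; in this dump they are all zero.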

This is a read request, asking for 33 blocks, starting at block number
284340.

The .iso is 582391808 bytes or 284371 blocks of 2k.

blocks in .iso - block for start of command == 284371 - 284340 == 31

So 31 good blocks should be found from 284340 onward (284340 through
284370).  But the read is for 33 blocks.

The request is asking for blocks beyond the end of the .iso.  No
wonder the request is failing: you cannot read runout blocks!
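
The arithmetic can be checked with a few lines of Python.  This is
only a sketch of the calculation; the numbers are the .iso size from
ls and the LBA and transfer length from the failed command, and it
assumes 2048-byte data blocks.

```python
BLOCK = 2048                       # CD data block size in bytes
iso_bytes = 582391808              # size of the .iso as reported by ls
start, count = 284340, 33          # LBA and transfer length of the failed READ(10)

iso_blocks, leftover = divmod(iso_bytes, BLOCK)
valid = iso_blocks - start         # image blocks at or after `start`
overrun = start + count - iso_blocks

print(iso_blocks, leftover)        # 284371 0
print(valid, overrun)              # 31 2
```

So the request runs two blocks past the end of the image data.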

==> the system should not be reading blocks it was not asked to.
    Or, if it wants to read them, it should not return an error
    when the error is for blocks that were not requested.

Notice that the error is reported as being on sector 1137360 and block
284340.  I'm pretty sure that it is actually on block 284372.

==> the error message ought to have the correct block number, if
    possible
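
The reported sector and block both correspond to the start of the
request, not to the block that failed.  A small sketch of the mapping,
assuming end_request counts 512-byte sectors (four per 2048-byte CD
block), which is consistent with the numbers in the log; request_span
is a made-up helper name.

```python
SECTORS_PER_BLOCK = 2048 // 512    # assumption: kernel sectors are 512 bytes

def request_span(sector, count):
    """Return (first, last) 2048-byte block numbers covered by a request."""
    first = sector // SECTORS_PER_BLOCK
    return first, first + count - 1

first, last = request_span(1137360, 33)
print(first, last)                 # 284340 284372
```

These are exactly the first and last block numbers in the run of
"Buffer I/O error" lines in the first dmesg listing above.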

I claim this behaviour is wrong and broken.  But how this happens is
complicated.  Where should the fix be?

I now try with dd(1), hoping to control the readahead that is getting
us into the runout (cmp probably uses stdio which naturally tries to
read large chunks but dd should use unbuffered I/O).

    [hugh@redclaw hugh]$ dd if=/dev/hdd bs=2048 skip=284340 count=1 of=0
    dd: reading `/dev/hdd': Input/output error
    0+0 records in
    0+0 records out

dmesg shows:
    hdd: media error (bad sector): status=0x51 { DriveReady SeekComplete Error }
    hdd: media error (bad sector): error=0x30
    ide: failed opcode was 100
    ATAPI device hdd:
      Error: Medium error -- (Sense key=0x03)
      Unrecovered read error -- (asc=0x11, ascq=0x00)
      The failed "Read 10" packet command was: 
      "28 00 00 04 56 b4 00 00 21 00 00 00 00 00 00 00 "
    end_request: I/O error, dev hdd, sector 1137360
    Buffer I/O error on device hdd, logical block 284340

Notice that even though I specified a count of 1, the failing SCSI
command shows a count of 33!  This, itself, seems like a bug (perhaps
I have UNIX expectations of LINUX).

Here's another dd experiment, meant to avoid the dreaded readahead.
Attempt to read several blocks, in one read, starting at 284339.  It
turns out that 284339 is divisible by 11, so we can try to read 11
blocks with the following command:
	dd if=/dev/hdd bs=22528 skip=25849 count=1 of=0

The command was successful, but only 2048 bytes were read.  So the
block device is acting like one: it will limit a read to one physical
block.
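
For reference, this is how the bs and skip values above fall out.  dd
skips in units of bs, so the starting block must be a multiple of the
number of blocks per read.  A sketch, with dd_params a made-up helper
name:

```python
BLOCK = 2048                       # CD data block size in bytes

def dd_params(first_block, nblocks):
    """Return (bs, skip) so that dd reads nblocks starting at first_block in one read."""
    if first_block % nblocks:
        raise ValueError("first_block must be a multiple of nblocks")
    return nblocks * BLOCK, first_block // nblocks

bs, skip = dd_params(284339, 11)
print(bs, skip)                    # 22528 25849
```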

Is there any way I can stop this stupid readahead?  I say stupid
because it causes an I/O error by reading something that I never asked
it to.  It compounds the mistake by reporting the error as happening
on a legitimate block.
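
One thing that might be worth trying is to hint to the kernel that
access is random before reading, using posix_fadvise(2) with
POSIX_FADV_RANDOM.  The sketch below is untested against this drive
and kernel; the hint is advisory, so the kernel may ignore it, and it
may not change the size of the READ(10) that is actually issued.
read_blocks is a made-up helper name.

```python
import os

def read_blocks(path, first_block, nblocks, block=2048):
    """Read nblocks from path after hinting that access will be random."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            # Advisory only: asks the kernel not to read ahead on this fd.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
        # A single read at an explicit offset; may return fewer bytes
        # than requested near the end of the device or file.
        return os.pread(fd, nblocks * block, first_block * block)
    finally:
        os.close(fd)
```

A call such as read_blocks("/dev/hdd", 284340, 1) would be the
equivalent of the first dd experiment above.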