hda: lost interrupt

Mon Apr 5 16:23:03 UTC 2004

On Sun, Apr 04, 2004 at 11:10:34PM -0400, jludwig wrote:
> On Sun, 2004-04-04 at 13:59, Alexander Dalloz wrote:
> > On Sun, 04.04.2004, Timothy Murphy wrote at 14:41:
> > 
> > > Probably OT, but I'm getting the above error message
> > > more and more frequently, followed by the machine seizing up.

More and more frequently sounds important.  Pay attention....  It may
be as simple as a bad region of the disk that happens to hold a
frequently used file system block.  Or the hardware is falling
apart as we type ;-(

> > > (1) Is this definitely a sign of a dying disk?
> > 
> > Not necessarily. Do you get "lost interrupt" messages followed by a
> > message line like "hda: read_intr: status=0x50 { DriveReady SeekComplete
> > }"?
> > 
> > What hard drive brand/type do you have? Is it loud, or does it make
> > any ill sounds?....
....
> This sounds like a drive going bad. The read_intr: indicates that a
> sector could not be read properly and the drive is being reset. In any
> case, download and back up NOW!!!
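
The status byte in a line such as "hda: read_intr: status=0x50 { DriveReady
SeekComplete }" is the ATA status register, and the names in braces are
just its bits.  A minimal bash sketch, modelled on the way the Linux IDE
driver prints that register (decode_ata_status is a hypothetical helper
name), shows that 0x50 sets only DriveReady (0x40) and SeekComplete
(0x10).  The Error bit (0x01) is clear, so that status on its own does
not report a failed read.

```shell
#!/bin/bash
# Hypothetical helper: decode an ATA status byte into the flag names the
# Linux IDE driver prints, e.g. "status=0x50 { DriveReady SeekComplete }".
# Bits: 0x80 BSY, 0x40 DRDY, 0x20 DF, 0x10 DSC, 0x08 DRQ, 0x04 CORR,
#       0x02 IDX, 0x01 ERR.
decode_ata_status() {
  local stat=$(( $1 ))            # accepts hex input such as 0x50
  local names=()
  if (( stat & 0x80 )); then
    # While Busy is set, the other bits are not meaningful.
    names+=("Busy")
  else
    (( stat & 0x40 )) && names+=("DriveReady")      # DRDY
    (( stat & 0x20 )) && names+=("DeviceFault")     # DF (write fault)
    (( stat & 0x10 )) && names+=("SeekComplete")    # DSC
    (( stat & 0x08 )) && names+=("DataRequest")     # DRQ
    (( stat & 0x04 )) && names+=("CorrectedError")  # CORR
    (( stat & 0x02 )) && names+=("Index")           # IDX
    (( stat & 0x01 )) && names+=("Error")           # ERR: error register has details
  fi
  echo "{ ${names[*]} }"
}

decode_ata_status 0x50   # prints: { DriveReady SeekComplete }
decode_ata_status 0x51   # prints: { DriveReady SeekComplete Error }
```

When the Error bit is set, the driver normally follows up with an
"error=0x.." line, and that is the one that says what actually failed.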

A backup of your own data is always a good idea...
Review your setup notes and decisions as part of your backup.

Check your disk drive vendor's site for disk tools.
Most come on a simple boot floppy.

Newer drives keep a log, and vendor tools will let you see that
log.  After checking the log, a simple read-only disk scan makes
sense.
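
A read-only scan just reads every block once and notes which ones fail.
The e2fsprogs badblocks program does this properly (its default mode is
read-only, e.g. "badblocks -sv /dev/hda"); the bash sketch below only
illustrates the idea with dd.  readonly_scan is a hypothetical helper, the
64 KiB block size is an arbitrary choice, and blockdev --getsize64
(util-linux) is assumed to be available for block devices.

```shell
#!/bin/bash
# Hypothetical helper: read-only surface scan. Every block is read once
# and discarded; nothing is ever written. DEVICE may be a block device
# such as /dev/hda (run as root) or a plain file.
readonly_scan() {
  local dev=$1 bs=${2:-65536}     # assumed default block size: 64 KiB
  local size blocks i bad=0
  if [ -b "$dev" ]; then
    size=$(blockdev --getsize64 "$dev")   # assumes util-linux blockdev
  else
    size=$(wc -c < "$dev")
  fi
  blocks=$(( (size + bs - 1) / bs ))
  for (( i = 0; i < blocks; i++ )); do
    # dd exits non-zero when the read hits an I/O error.
    if ! dd if="$dev" of=/dev/null bs="$bs" skip="$i" count=1 2>/dev/null; then
      echo "unreadable block $i (bytes $(( i * bs ))..$(( (i + 1) * bs - 1 )))"
      bad=$(( bad + 1 ))
    fi
  done
  echo "$bad unreadable block(s) out of $blocks"
  [ "$bad" -eq 0 ]                # exit status 0 only for a clean scan
}

# Usage (as root, ideally on an unmounted disk): readonly_scan /dev/hda
```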

Since ALL disks have media defects, there is always a strategy for
mapping bad sectors to good spares and for recovering data within
limits.  You may simply have a minor defect that needs to be mapped
to a spare.
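
Whether defects are already being mapped to spares usually shows up in
SMART attributes: Reallocated_Sector_Ct counts sectors the drive has
already remapped, and Current_Pending_Sector counts unreadable sectors
waiting to be remapped on the next write.  A sketch that pulls both raw
values out of "smartctl -A" output (smartmontools), assuming the current
table layout with RAW_VALUE in column 10; smart_raw and report_spares are
hypothetical helper names, and not every drive reports both attributes.

```shell
#!/bin/bash
# Hypothetical helper: print the RAW_VALUE (column 10) of one SMART
# attribute from `smartctl -A` output on stdin; exit 1 if it is absent.
# Column positions are an assumption based on the smartmontools table.
smart_raw() {
  awk -v name="$1" '$2 == name { print $10; found = 1 } END { exit !found }'
}

# Hypothetical helper: summarize spare-sector usage.
report_spares() {
  local smart_output realloc pending
  smart_output=$(cat)
  realloc=$(printf '%s\n' "$smart_output" | smart_raw Reallocated_Sector_Ct) || realloc=unknown
  pending=$(printf '%s\n' "$smart_output" | smart_raw Current_Pending_Sector) || pending=unknown
  echo "reallocated=$realloc pending=$pending"
}

# Usage (as root): smartctl -A /dev/hda | report_spares
```

Comparing the numbers between runs shows whether the counts are growing.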

If the drive has the option, enable read-after-write verification.
Some people turn this off to go faster.  In this case, speed is a
secondary goal.

The most reliable common strategy is to spare the track on a write.
This way, the read after a write can detect the error, and the spare
track can then be written with the good data still in the buffer.
With this in mind, some disk service tools have a non-destructive
read/write/read test.  If the first read can recover the data one way
or another, the write/read pair can trigger the assignment of a spare
for the bad region.
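
The read/write/read idea, reduced to a single block, might look like the
bash sketch below.  rwr_block is a hypothetical helper; it rewrites data
in place, so only point it at an unmounted device after a backup.  It
cannot see the drive's internal sparing, and OS or drive caches may serve
the second read, so for real use prefer a vendor tool or badblocks -n
(its non-destructive read-write mode).

```shell
#!/bin/bash
# Hypothetical helper: non-destructive read/write/read check of ONE block.
#   1. read the block into a buffer,
#   2. write the same bytes back in place (on a real drive, this write is
#      what gives the firmware a chance to assign a spare),
#   3. read the block again and compare it with the buffer.
# Caches may satisfy step 3, so this illustrates the sequence only.
rwr_block() {
  local dev=$1 blk=$2 bs=${3:-512}
  local buf1 buf2 rc=0
  buf1=$(mktemp) && buf2=$(mktemp) || return 2
  if ! dd if="$dev" of="$buf1" bs="$bs" skip="$blk" count=1 2>/dev/null; then
    echo "block $blk: initial read failed"; rc=1
  elif ! dd if="$buf1" of="$dev" bs="$bs" seek="$blk" count=1 conv=notrunc 2>/dev/null; then
    echo "block $blk: write-back failed"; rc=1
  elif ! dd if="$dev" of="$buf2" bs="$bs" skip="$blk" count=1 2>/dev/null; then
    echo "block $blk: re-read failed"; rc=1
  elif ! cmp -s "$buf1" "$buf2"; then
    echo "block $blk: data changed after rewrite"; rc=1
  else
    echo "block $blk: ok"
  fi
  rm -f "$buf1" "$buf2"
  return "$rc"
}

# Usage (as root, unmounted device, after a backup): rwr_block /dev/hda 12345
```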

Today each vendor and each model is different, so take some time
to search the vendor site and gather what tools and hints you can
for your specific disk drive.

At the same time, get the drive serial number and check warranty and
recall information.  I have found that name-brand drive vendors do the
right thing when you do your homework.  The vendor tool disk is often
the touchstone for a repair or replacement.
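
The model and serial number needed for a warranty or recall lookup can
be read without opening the case; hdparm -i and smartctl -i both report
them.  A sketch that extracts them from "smartctl -i" output, assuming its
"Device Model:" and "Serial Number:" label lines; drive_id is a
hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print model and serial number from `smartctl -i`
# output on stdin. Assumes smartmontools' "Label:   value" line format;
# exits 1 if no serial number line is found.
drive_id() {
  awk -F': *' '
    /^Device Model:/  { model = $2 }
    /^Serial Number:/ { serial = $2 }
    END {
      if (serial == "") exit 1
      print "model=" model " serial=" serial
    }'
}

# Usage (as root): smartctl -i /dev/hda | drive_id
```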

Research the SMART capabilities of your disk.  SMART is
"Self-Monitoring, Analysis and Reporting Technology" and can often tell
you important things about your system:

  http://www.seagate.com/support/kb/disc/smart.html

For example, an elevated temperature in a drive may indicate a bad or
blocked fan or an airflow problem in the box.

    $ chkconfig --list | grep smart
    smartd          0:off   1:off   2:on    3:on    4:on    5:on    6:off
    $ rpm -q --whatprovides /usr/sbin/smartd
    kernel-utils-2.4-9.1.101.fedora
    ...

RTFM... there is more than disks here.  Also check your BIOS notes.
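
With smartmontools installed, the temperature point above can be checked
from a script.  A sketch, assuming "smartctl -A" reports attribute 194
Temperature_Celsius with the raw reading in column 10; check_drive_temp is
a hypothetical name, and the 50 C default limit is an illustrative value,
not a vendor figure, so look up the rated range for your model.

```shell
#!/bin/bash
# Hypothetical helper: warn when a drive runs hot, reading `smartctl -A`
# output on stdin. Column 10 (RAW_VALUE) is an assumed layout; a raw value
# like "36 (Min/Max 20/45)" still yields 36 as field 10.
check_drive_temp() {
  local limit=${1:-50} temp       # 50 C is an assumed example limit
  temp=$(awk '$2 == "Temperature_Celsius" { print $10; exit }')
  if [ -z "$temp" ]; then
    echo "no Temperature_Celsius attribute reported"
    return 2
  fi
  if [ "$temp" -gt "$limit" ]; then
    echo "WARNING: drive at ${temp}C (limit ${limit}C): check fans and airflow"
    return 1
  fi
  echo "drive at ${temp}C (limit ${limit}C)"
}

# Usage (as root): smartctl -A /dev/hda | check_drive_temp 50
```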

See also 
    http://www.t13.org/project/d1321r1c.pdf
    http://smartmontools.sourceforge.net/

SUMMARY: vendor tools.... good stuff and better each year.

-- 
	T o m  M i t c h e l l 
	/dev/null the ultimate in secure storage.