smartctl -l error /dev/... When to worry ?

Mon Jul 19 20:06:32 UTC 2004

Hi Randy!

Randy Kelsoe wrote:
> Hannes Mayer wrote:
> 
>> Hi all!
>>
>> I just discovered smartctl and the interesting output with
>> # smartctl -l error /dev/hda
>>
>> hda reports no errors (newest disk), but hdb and hdf do report some
>> errors every few hours (see outputs below)
>>
>> When do I actually have to start to worry about errors ?
>> I mean, is an error every few hours normal for older (1-2 years) disks ?
>>
>> ########################### hdb #################################
>>
>> smartctl version 5.21 Copyright (C) 2002-3 Bruce Allen
>> Home page is http://smartmontools.sourceforge.net/
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART Error Log Version: 1
>> Warning: ATA error count 60 inconsistent with error log pointer 3
>>
>> Error 60 occurred at disk power-on lifetime: 5386 hours
>>   When the command that caused the error occurred, the device was 
>> active or idle.
>> Error 59 occurred at disk power-on lifetime: 5386 hours
>>   When the command that caused the error occurred, the device was 
>> active or idle.
>> Error 58 occurred at disk power-on lifetime: 5386 hours
>>   When the command that caused the error occurred, the device was 
>> active or idle.
>> Error 57 occurred at disk power-on lifetime: 5385 hours
>>   When the command that caused the error occurred, the device was 
>> active or idle.
>> Error 56 occurred at disk power-on lifetime: 5375 hours
>>   When the command that caused the error occurred, the device was 
>> active or idle.
>>
> I have trimmed the above errors to show that the most recent 5 errors on 
> hdb were within 11 hours of each other. You need to look at a 'smartctl 
> -a /dev/hdb |grep -i power_on' to get the current age of the disk, then 
> compare it to the number of power-on hours when the error occurred. The 
> above errors occurred when your drive was 224 days old, so they might be 
> old errors, might have occurred during a power failure, etc.
> 
> You might also want to get a newer version of smartmontools. The latest 
> is 5.32, and you are running 5.21.

I get this for hdb:
   9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       646543
which I assume is in minutes, so that would be almost 449 days.
The errors occurred at day 224, so they are pretty old.
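
To double-check that arithmetic, here is a minimal Python sketch. The
function name raw_to_days is made up for illustration, and treating the
hdb raw value as minutes is only my assumption from above, not something
smartctl reports:

```python
def raw_to_days(raw_value, unit="minutes"):
    """Convert a raw power-on counter to days for the given unit."""
    per_day = {"minutes": 60 * 24, "hours": 24}
    return raw_value / per_day[unit]

# Raw Power_On_Hours value for hdb, ASSUMED to be counted in minutes.
hdb_days = raw_to_days(646543, unit="minutes")

# The error log reports the power-on lifetime in hours.
error_days = raw_to_days(5386, unit="hours")

print(f"hdb age: {hdb_days:.1f} days, newest error at day {error_days:.1f}")
```

If the raw value were really in hours, the same number would mean a drive
tens of thousands of days old, which is why minutes looks like the more
plausible reading for hdb.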

 >> Error 1482 occurred at disk power-on lifetime: 5368 hours
 >>   When the command that caused the error occurred, the device was in
 >> an unknown state.
 >>   Commands leading to the command that caused the error were:
 >>   CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
 >>   -- -- -- -- -- -- -- --   ---------  --------------------
 >>   08 00 00 01 00 00 b0 00     220.000  DEVICE RESET
 >>   ec 00 01 01 00 00 b0 00     213.264  IDENTIFY DEVICE
 >>
 >> #################################################
> See a theme here? What device do or did you have as the master device on 
> the IDE bus with this drive? This looks like a device on the bus had a 
> problem, and the system tried to recover by resetting devices on the 
> bus. If you have a CDROM drive on the same bus, and it had problems, you 
> might see something like this in your error logs. Again, do the 
> 'smartctl -a /dev/hdf | grep -i power_on' and compare the age of the 
> drive to when the error occurred.

hdb is the slave on my IDE bus, currently with FC2.
hda is the master, with windoze on it; it sits idle 99% of the time, and I
just mount it from FC2 from time to time.

hdf was the slave with hdb, when hdb still had windoze on it.

OK, so for hdf we have:
5368 hours = 224 days
   9 Power_On_Hours          0x0032   237   237   000    Old_age   Always       -       16981
16981 hours = 707.5 days

So these errors are pretty old too.
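
The same comparison can be sketched in Python by pulling the raw value
out of the attribute line. The helper name raw_value is hypothetical, and
the sketch assumes the last field of the line is a plain integer counted
in hours, as it appears to be for hdf; raw values in other formats are
not handled:

```python
def raw_value(attribute_line):
    """Return the last whitespace-separated field of an attribute line as an int."""
    return int(attribute_line.split()[-1])

# Attribute line for hdf, copied from the smartctl output above.
line = ("  9 Power_On_Hours          0x0032   237   237   000    "
        "Old_age   Always       -       16981")

current_hours = raw_value(line)   # assumed to be in hours for hdf
error_hours = 5368                # power-on lifetime of error 1482
hours_since_error = current_hours - error_hours

print(f"{hours_since_error} power-on hours "
      f"(about {hours_since_error / 24:.0f} days) since the error")
```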

But I am confused by the Power_On_Hours. Can this value be altered in some way ?
hdb shows 449 days and hdf 707 days, but hdb has been in use for much longer...

Thank you very much Randy!

Cheers,
Hannes.