How can I tell if my disk is going bad?

David Tonhofer redhatter at m-plify.net
Thu Oct 30 11:19:23 UTC 2008


Ryan Golhar wrote:
> I usually watch the logwatch messages. Any time I've had a disk go
> bad, there have been kernel I/O errors pointing to /dev/sda (my hard disk).
>
> Rohit khaladkar wrote:
>> Hi All,
>>
>> I am using Red Hat Linux 5.2. I wanted to know if there is any way
>> to find out whether my disk is going bad. I looked for a command and
>> found "badblocks", but it takes a lot of time to run.
>>
>> Please let me know if anyone knows of another way to find out
>> whether the disk is going bad.
>>
>> Thanks!
>> Rohit Khaladkar
>
You can use "smartctl" if the driver passes SMART commands through to the disk.

This is the case for SATA, but generally not for SCSI.

1) Enable SMART (this switches on online testing) and switch off automatic
offline testing (heavy data collection):

/usr/sbin/smartctl -s on  /dev/hde
/usr/sbin/smartctl -o off /dev/hde

2) Then run regular self-tests from crontab (a long test every day at 06:00,
and a short test every hour except from 06:00 to 08:59):

00 06 * * * /usr/sbin/smartctl --test long /dev/hde > /dev/null
00 0-5,9-23 * * * /usr/sbin/smartctl --test short /dev/hde > /dev/null

3) Check the results with "smartctl -a /dev/hde". The output looks like this:

smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD3200SB-01KMA0
Serial Number:    WD-WCAMR3474349
Firmware Version: 08.05J08
User Capacity:    320,072,933,376 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   6
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Oct 30 12:18:58 2008 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x05) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 241) Self-test routine in progress...
                                        10% of test remaining.
Total time to complete Offline
data collection:                 (9000) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 110) minutes.
Conveyance self-test routine
recommended polling time:        (   6) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   189   187   021    Pre-fail  Always       -       5516
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       18
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       9911
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       18
194 Temperature_Celsius     0x0022   118   099   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      9910         -
# 2  Short offline       Completed without error       00%      9908         -
# 3  Short offline       Aborted by host               10%      9908         -
# 4  Extended offline    Completed without error       00%      9907         -
# 5  Short offline       Aborted by host               10%      9904         -
# 6  Short offline       Completed without error       00%      9902         -
# 7  Short offline       Aborted by host               10%      9902         -
# 8  Short offline       Aborted by host               10%      9901         -
# 9  Short offline       Completed without error       00%      9900         -
#10  Short offline       Completed without error       00%      9898         -
#11  Short offline       Completed without error       00%      9897         -
#12  Short offline       Completed without error       00%      9896         -
#13  Short offline       Aborted by host               10%      9896         -
#14  Short offline       Completed without error       00%      9894         -
#15  Short offline       Completed without error       00%      9893         -
#16  Short offline       Completed without error       00%      9893         -
#17  Short offline       Completed without error       00%      9892         -
#18  Short offline       Completed without error       00%      9890         -
#19  Short offline       Completed without error       00%      9890         -
#20  Short offline       Completed without error       00%      9888         -
#21  Short offline       Completed without error       00%      9887         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

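Rather than reading the full output every time, a script can look at the exit status of smartctl. According to the EXIT STATUS section of the smartctl(8) man page, that status is a bit mask (check the man page of your installed version). The function below is a minimal sketch; its name "decode_smartctl_status" is hypothetical. It prints one line for each bit that is set.

```shell
# Hypothetical helper (POSIX sh): decode the bit mask that smartctl returns
# as its exit status. Bit meanings follow the EXIT STATUS section of smartctl(8).
decode_smartctl_status() {
    status=$1
    bit=0
    # Messages are listed in bit order, bit 0 first.
    for msg in \
        "command line did not parse" \
        "device open failed" \
        "a SMART or ATA command to the disk failed" \
        "SMART status check returned DISK FAILING" \
        "prefail attribute(s) at or below threshold" \
        "attribute(s) were at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        # Test whether bit number $bit is set in $status.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}
```

For example, run "/usr/sbin/smartctl -a /dev/hde > /dev/null" and then pass "$?" to the function. An exit status of 0 prints nothing.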

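The attribute table in the output above can also be checked from a script. SMART treats an attribute as failing when its normalized VALUE drops to or below a non-zero THRESH. A common rule of thumb, which is an assumption here and not something from the post, is also to watch the raw counts of Reallocated_Sector_Ct (5), Reallocated_Event_Count (196), Current_Pending_Sector (197) and Offline_Uncorrectable (198), because these usually count remapped or unreadable sectors. Raw-value meanings are vendor-specific in general. The function name "check_smart_attributes" is hypothetical.

```shell
# Hypothetical helper (POSIX sh + awk): read "smartctl -A" or "smartctl -a"
# output on stdin and report
#   (a) attributes whose normalized VALUE is at or below a non-zero THRESH,
#   (b) a non-zero raw count for attributes 5, 196, 197 or 198
#       (this attribute selection is an assumed rule of thumb).
check_smart_attributes() {
    awk '
        # Attribute rows start with a numeric ID and have at least 10 fields:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            id = $1; name = $2; value = $4 + 0; thresh = $6 + 0; raw = $10
            if (thresh > 0 && value <= thresh)
                printf "%s %s: VALUE %d <= THRESH %d\n", id, name, value, thresh
            if ((id == 5 || id == 196 || id == 197 || id == 198) && raw + 0 > 0)
                printf "%s %s: raw count %s\n", id, name, raw
        }
    '
}
```

Usage would be "/usr/sbin/smartctl -A /dev/hde | check_smart_attributes". For the disk shown above it prints nothing, because every VALUE is above its THRESH and all four raw counts are 0.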

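The self-test log in the output above shows only clean completions and tests "Aborted by host". A failing disk would instead show an entry such as "Completed: read failure", often with an LBA in the last column. The sketch below prints any self-test entry that is not a clean completion, an abort or interruption by the host, or a test still in progress. The function name "check_selftest_log" and the exact set of statuses it ignores are assumptions.

```shell
# Hypothetical helper (POSIX sh + awk): read "smartctl -l selftest" or
# "smartctl -a" output on stdin and print self-test entries whose status is
# not a clean completion, a host abort/interruption, or a test in progress.
check_selftest_log() {
    awk '
        # Self-test entries look like "# 1  Short offline ..." or "#10  ...".
        /^# *[0-9]+ / {
            if ($0 ~ /Completed without error|Aborted by host|Interrupted|in progress/) next
            print
        }
    '
}
```

For example, "/usr/sbin/smartctl -l selftest /dev/hde | check_selftest_log" prints nothing for the log shown above.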

More information about the redhat-list mailing list