Catastrophic disk failure, where was smartd?
Roger Heflin
rogerheflin at gmail.com
Wed Mar 26 18:28:01 UTC 2008
Bruno Wolff III wrote:
> On Wed, Mar 26, 2008 at 08:35:49 -0500,
> "David G. Mackay" <mackay_d at bellsouth.net> wrote:
>> Shouldn't there have been some indication of problems prior to the
>> failure?
>
> Only if you are lucky. Someone at Google published some information about
> smart around a year ago. In cases where catastrophic failures occur, for a high
> percentage there is no warning from smart.
>
The big issue is that most SMART implementations don't scan the disk for
bad blocks. In my experience several years ago, with 1000+ disks in
service, the #1 failure was bad blocks, and SMART did little to catch
that. The #2 failure was failure to spin up at all, but this seemed to be
confined to certain batches.
One thing that I would do is run a simple "dd if=/dev/sdx of=/dev/null bs=1M" on
all of my disks maybe 1x per week or 1x per month to force a full read scan. If
the disk detects a sector getting too many errors (still correctable with the
extra bits it has), it will move the data from the bad sector to a spare and
mark the bad sector bad, and I believe SMART counts when this has been done.
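
A rough sketch of how that periodic scan could be scripted, in case it helps.
The function name scan_disks and the device names are just placeholders, and
it assumes GNU dd (for bs=1M). It only reads; nothing is written to the disks.

```shell
#!/bin/sh
# scan_disks: read every block of each device given as an argument and
# throw the data away. Returns non-zero if any device had a read error.
# (scan_disks is a made-up name for this sketch, not an existing tool.)
scan_disks() {
    status=0
    for dev in "$@"; do
        # dd exits non-zero if it cannot open or read the input
        if dd if="$dev" of=/dev/null bs=1M 2>/dev/null; then
            echo "$dev: read OK"
        else
            echo "$dev: READ ERROR" >&2
            status=1
        fi
    done
    return $status
}
```

Called as something like "scan_disks /dev/sda /dev/sdb" from a weekly or
monthly cron job, run as root. Afterwards "smartctl -A /dev/sda" should show
whether the reallocated sector count (attribute 5 on most drives) went up.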
Roger