Seagate disk problems (NCQ bug???)

D. Hugh Redelmeier hugh at mimosa.com
Mon May 11 17:21:15 UTC 2009


| From: Wolfgang S. Rupprecht <wolfgang.rupprecht+gnus200905 at gmail.com>

| "D. Hugh Redelmeier" <hugh at mimosa.com> writes:
| > "Wolfgang S. Rupprecht" <wolfgang.rupprecht+gnus200904 at gmail.com>:
| > >After running flawlessly for 6+ months I just had my Seagate
| > >ST31500343AS (w. SD35 firmware) flake out.
| >
| > And so it goes.  I infer that this cycle goes on until the power is
| > turned off.
| 
| Yes, the system gets progressively wonkier until it is rebooted.  At
| some point the drive just locks up hard and linux is dead in the water.

Ahh.  I had not noticed that your system got the disk working between
episodes.  I should have inferred this from the timestamps.

| The funny part is that I can't find anything on Seagate's site admitting
| to a problem with the version of the drive that I have.

I infer that Seagate generally doesn't disclose problems or even
fixes.  You have to report a problem to support, and perhaps even ask
explicitly for a firmware update to be offered one.

Note: messages on the Seagate forum are for user-to-user communication
and support people apparently never read them!  There are Seagate
moderators but they are not technical and are not support.

The one exception is the particular firmware update that prevents the
bricking of 7200.11 drives.  They announced that fix and made the
firmware available for download.  My guess is that they released it
because the recovery from bricking is painful to Seagate: once it
happens, any normal user must RMA the drive.

I, for example, asked for and got a firmware update in mid January to
fix a performance problem with my ST31500343AS.  This firmware was not
announced, but I asked for it because I knew it existed.  This was
just before the bricking problem was announced.  The firmware I got
was not the unbricking firmware.

There is much confusion on the Seagate forum.  Many users think that
the released firmware is supposed to fix their particular problem.
In fact, it was released to fix one particular bug.  It might happen
to fix other bugs (because it was based on later firmware than their
drive came with), but that was not the point.

Seagate has leaked, but not announced, details of the bug fixed by
that firmware.  Amazingly, some users have figured out how to revive
a bricked drive using a serial diagnostic console.
  http://www.msfn.org/board/index.php?showtopic=128807

If you have a lot of time on your hands, it might be interesting to
see what the serial diagnostic console has to say when your drive is
misbehaving.

|  There was some
| other guy with the same model and same firmware that also noticed the
| drive locking up once in a while.  Mine does it within 6 hours of me
| streaming data to the drive.  Seems that the data direction is very
| important.  It needs to be a write of several 1GB files for my system to
| lock up.

Do report this to Seagate support.  They might have a fix.

Do report this on the Seagate Forum (or somewhere else that you think
more suitable).  If you do, post that fact in this thread so other
interested folks can find it.

If you can reliably reproduce this problem, that in itself is very
interesting.  The reports on the Seagate forum have not been very
useful.

I'm very interested because I have two of these drives (with CC1J and
SD1A firmware) and they are mostly shelved because I don't trust their
reliability.  I'm playing with one in a MythTV box.  I'd like
to be able to use a recipe to see if I can drive my disks into bad
behaviour.
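
A recipe along those lines might look like the sketch below.  It only
assumes what was described above: sustained writes of several 1GB
files.  Whether plain sequential writes are enough to trigger the
lockup is not established in this thread.  The function name
stream_files, the file names and the default sizes are all
hypothetical choices made for illustration.

```python
import os
import time


def stream_files(target_dir, count=4, size_bytes=1 << 30, chunk_bytes=1 << 20):
    """Write `count` files of `size_bytes` each into `target_dir`.

    Each file is flushed and fsync'ed so the data really goes to the
    disk rather than sitting in the page cache.  Returns a list of
    (path, seconds_taken) tuples, one per file.
    """
    os.makedirs(target_dir, exist_ok=True)
    chunk = os.urandom(chunk_bytes)  # one random buffer, reused for every write
    results = []
    for i in range(count):
        path = os.path.join(target_dir, "stream-%03d.bin" % i)
        start = time.monotonic()
        with open(path, "wb") as f:
            remaining = size_bytes
            while remaining > 0:
                n = min(remaining, len(chunk))
                f.write(chunk[:n])
                remaining -= n
            f.flush()
            os.fsync(f.fileno())
        results.append((path, time.monotonic() - start))
    return results


# Hypothetical usage, pointing at a directory on the suspect drive:
#   for path, secs in stream_files("/mnt/suspect/stress"):
#       print("%s written in %.1f s" % (path, secs))
```

Run it in a loop with the kernel log open in another window (dmesg or
/var/log/messages), since a sudden jump in the per-file time is likely
to show up alongside the ATA error messages.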

| At this point, I'd be happy to just turn off whatever feature is
| triggering the drive's firmware bug.  I'd turn off NCQ from the linux end
| if there were any documentation on how to do it.

You didn't explain why you thought that the problem is related to NCQ.
Have you seen reports of NCQ problems?

This FAQ claims to tell you how to turn off NCQ:
  http://linux-ata.org/faq.html
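
As far as I know, the method described there is to set the device's
queue depth to 1 through sysfs, which effectively turns NCQ off for
that drive until the next reboot.  Here is a minimal sketch of that,
assuming the usual /sys/block/<dev>/device/queue_depth file and root
privileges.  The function names and the sysfs_root parameter are
hypothetical; the parameter exists only so the code can be tried
against a scratch directory.

```python
import os


def queue_depth_path(device, sysfs_root="/sys"):
    """Return the sysfs file holding the queue depth for e.g. 'sda'."""
    return os.path.join(sysfs_root, "block", device, "device", "queue_depth")


def get_queue_depth(device, sysfs_root="/sys"):
    """Read the current queue depth (a value above 1 suggests NCQ is in use)."""
    with open(queue_depth_path(device, sysfs_root)) as f:
        return int(f.read().strip())


def disable_ncq(device, sysfs_root="/sys"):
    """Set the queue depth to 1 and return the previous depth.

    Writing the real sysfs file needs root.  The setting does not
    survive a reboot.
    """
    old = get_queue_depth(device, sysfs_root)
    with open(queue_depth_path(device, sysfs_root), "w") as f:
        f.write("1\n")
    return old


# Hypothetical usage, as root:
#   previous = disable_ncq("sdb")
#   print("queue depth was", previous, "now", get_queue_depth("sdb"))
```

If the lockups stop with the queue depth at 1, that would be decent
evidence for the NCQ theory; if they continue, NCQ is probably not
the culprit.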

Do consider doing a S.M.A.R.T. scan of the drive.  I've found that
bad blocks can do odd things to disk behaviour.
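
One way to do that is with smartctl from smartmontools: "smartctl -t
long /dev/sdX" starts a long self-test and "smartctl -A /dev/sdX"
prints the attribute table.  The sketch below pulls out the raw
values of the attributes that usually point at bad blocks.  It
assumes smartctl is installed and that the table has the usual ten
columns; the function names and the choice of watched attributes are
illustrative, not anything Seagate recommends.

```python
import subprocess

# Attributes that commonly indicate bad or suspect sectors.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_attributes(text):
    """Extract raw values of the watched attributes from `smartctl -A` output."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have ten columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            if fields[9].isdigit():
                found[fields[1]] = int(fields[9])
    return found


def read_attributes(device):
    """Run `smartctl -A` on e.g. '/dev/sdb' (needs root) and parse the result.

    smartctl's exit status is a bitmask and can be non-zero even when
    the table was printed, so it is deliberately not checked here.
    """
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_attributes(result.stdout)


# Hypothetical usage, as root:
#   print(read_attributes("/dev/sdb"))
```

Non-zero raw values for any of these are worth following up before
blaming the firmware.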




More information about the fedora-list mailing list